On February 28, 2017, the Amazon Simple Storage Service (S3) located in the Northern Virginia (US-EAST-1) Region went down due to an incorrect command issued by a technician. A lot of websites and applications that rely on the S3 service went down with it. The full information about the outage can be found here: https://aws.amazon.com/message/41926/
While Amazon Web Services (AWS) could have prevented this outage, a well-architected site should not have been affected by this outage. Amazon allows subscribers to use multiple availability zones (and even redundancy in multiple regions), so that when one goes down, the applications are still able to run on the others.
It is very important to have a well-architected framework when using the cloud. AWS provides one that is based on five pillars:
- Security – The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
- Reliability – The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
- Performance Efficiency – The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
- Cost Optimization – The ability to avoid or eliminate unneeded cost or suboptimal resources.
- Operational Excellence – The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.
For those companies who were affected by the outage, applying the “reliability” principle (by utilizing multiple availability zones, or using replication to different regions), could have shielded them from the outage.