I recently passed the AWS Certified Solutions Architect – Associate exam. This certification validates my knowledge of the cloud and my ability to help companies that use, or plan to use, AWS as their cloud provider. For more information about AWS Certification, please visit the official site here: https://aws.amazon.com/certification/
Initially used by developers and start-up companies, AWS has grown into a solid and robust cloud services provider. Big enterprises are realizing the value of AWS, and more and more companies are extending their data centers to the cloud. For most companies, traditional on-premises infrastructure may no longer be sufficient as business users demand more from IT, including faster, more scalable services.
Getting AWS-certified requires hard work. You need to read the book, enroll in a training class (if possible), take practice tests, and get hands-on experience in AWS. In addition, you should have a working knowledge of networking, virtualization, databases, storage, servers, scripting/programming, and software applications. IT professionals should invest in cloud skills or run the risk of becoming obsolete.
AWS services have many capabilities. When migrating existing applications to the cloud or creating new applications for the cloud, it is important to know these AWS capabilities in order to architect the most resilient, efficient, and scalable solution for your applications.
Cloud architecture and on-premises architecture differ in many ways. In the cloud, you treat the infrastructure as configurable, flexible software rather than hardware. You need a different mindset when architecting in the cloud because the cloud has a different way of solving problems.
You have to consider the following design principles in AWS cloud:
- Design for failure by implementing redundancy everywhere. Components fail all the time; even entire sites fail sometimes. For example, if you implement redundancy for your web/application servers across different availability zones, your application will be more resilient when one availability zone fails.
- Implement scalability. One of the advantages of the cloud versus on-premises is the ability to grow and shrink your resources depending on demand. AWS supports scaling your resources both vertically and horizontally, and can even automate it with Auto Scaling.
- Use the AWS storage service that fits your use case. AWS has several storage services with different properties, costs, and functionality. Amazon S3 is used for web applications that need large-scale storage capacity and performance; it is also used for backup and disaster recovery. Amazon Glacier is used for data archiving and long-term backup. Amazon EBS is block storage used for mission-critical applications. Amazon EFS (Elastic File System) provides shared file storage over NFS.
- Choose the right database solution. Match technology to the workload: Amazon RDS is for relational databases. Amazon DynamoDB is for NoSQL databases and Amazon Redshift is for data warehousing.
- Use caching to improve the end-user experience. Caching minimizes redundant data retrieval operations, making future requests faster. Amazon CloudFront is a content delivery network that caches your website at edge locations around the world. Amazon ElastiCache provides in-memory caching to offload read-heavy workloads from mission-critical database applications.
- Implement defense-in-depth security. This means building security at every layer. Under the AWS Shared Responsibility model, AWS is in charge of securing the cloud infrastructure (including the physical and hypervisor layers), while the customer is in charge of the majority of the layers from the operating system up to the application layer. This means the customer is still responsible for patching the OS and making the application as secure as possible. AWS provides security tools that help you secure your application, such as IAM, security groups, network ACLs, CloudTrail, etc.
- Utilize parallel processing. For instance, issue requests on concurrent threads instead of sequentially. Another example is to deploy multiple web or application servers behind load balancers so that requests can be processed by multiple servers at once.
- Decouple your applications. IT systems should be designed in a way that reduces inter-dependencies, so that a change or failure in one component does not cascade to other components. Let the components interact with each other only through standard APIs.
- Automate your environment. Remove manual processes to improve your system's stability and consistency. AWS offers many automation tools to ensure that your infrastructure can respond quickly to changes.
- Optimize for cost. Ensure that your resources are sized appropriately (they can scale in and out based on need), and that you are taking advantage of different pricing options.
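To make the caching principle above concrete, here is a minimal local sketch in Python. It uses the standard library's `functools.lru_cache` to stand in for a cache layer like ElastiCache; the "database lookup" and product data are made up for illustration, and in a real system the cache would live in a separate service, not in-process.

```python
from functools import lru_cache

backend_hits = 0  # counts how many times the simulated database is actually queried


@lru_cache(maxsize=128)
def get_product(product_id: int) -> str:
    """Simulate an expensive database lookup; repeats are served from the cache."""
    global backend_hits
    backend_hits += 1
    return f"product-{product_id}"


# The first call misses the cache and hits the backend; the repeat call is
# answered from memory, so the backend is only queried once per unique key.
first = get_product(42)
second = get_product(42)
print(backend_hits)  # prints 1
```

The same idea scales up: whether the cache is in-process, in CloudFront, or in ElastiCache, the win is that repeated reads never reach the slow backend.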
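The parallel-processing principle can also be sketched locally. This example uses Python's `concurrent.futures.ThreadPoolExecutor` to process a batch of requests on concurrent threads instead of sequentially; the `handle_request` function is a hypothetical stand-in for an I/O-bound call such as fetching a URL or querying a service.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(n: int) -> int:
    # Stand-in for an I/O-bound operation (e.g., an HTTP call); while one
    # thread waits on I/O, the others can make progress.
    return n * n


requests = [1, 2, 3, 4, 5]

# A pool of worker threads processes the batch concurrently; map() still
# returns results in the original request order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, requests))

print(results)  # prints [1, 4, 9, 16, 25]
```

This is the single-machine analogue of putting multiple servers behind a load balancer: more workers, same interface.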
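Decoupling can likewise be shown in miniature with a queue between a producer and a consumer, the same pattern a managed service like Amazon SQS provides at scale. The queue is the only contract between the two sides, so either one can fail, restart, or scale without the other knowing. The message names below are invented for illustration.

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
processed = []


def consumer() -> None:
    # The consumer only knows about the queue, never about the producer.
    while True:
        msg = jobs.get()
        if msg is None:  # sentinel value: no more work
            break
        processed.append(msg.upper())


worker = threading.Thread(target=consumer)
worker.start()

# The producer enqueues work and never calls the consumer directly.
for order in ["order-1", "order-2", "order-3"]:
    jobs.put(order)
jobs.put(None)
worker.join()

print(processed)  # prints ['ORDER-1', 'ORDER-2', 'ORDER-3']
```

If the consumer crashes, messages simply wait in the queue; nothing cascades back to the producer, which is exactly the failure isolation the principle is after.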
Sources: AWS Certified Solutions Architect Official Study Guide; Global Knowledge Architecting on AWS 5.1 Student Guide
On February 28, 2017, the Amazon Simple Storage Service (S3) located in the Northern Virginia (US-EAST-1) Region went down due to an incorrect command issued by a technician. A lot of websites and applications that rely on the S3 service went down with it. The full information about the outage can be found here: https://aws.amazon.com/message/41926/
While Amazon Web Services (AWS) could have prevented this outage, a well-architected site should not have been affected by it. Amazon allows subscribers to use multiple availability zones (and even redundancy across multiple regions), so that when one goes down, applications can continue running on the others.
It is very important to have a well-architected framework when using the cloud. AWS provides one that is based on five pillars:
- Security – The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.
- Reliability – The ability of a system to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
- Performance Efficiency – The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.
- Cost Optimization – The ability to avoid or eliminate unneeded cost or suboptimal resources.
- Operational Excellence – The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.
For those companies affected by the outage, applying the "Reliability" pillar (by utilizing multiple availability zones, or replicating to a different region) could have shielded them from the outage.
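The multi-zone failover idea can be illustrated with a toy sketch: try replicas in more than one availability zone and serve from the first healthy one. The zone names and health flags below are made up for illustration and have nothing to do with real AWS status; a real deployment would get this behavior from Route 53 health checks or a multi-AZ load balancer rather than application code.

```python
# Hypothetical health map: pretend one zone is down, as in the S3 outage.
zone_health = {
    "zone-a": False,  # failed zone
    "zone-b": True,   # healthy replica in a second availability zone
}


def serve(request: str, zones: list) -> str:
    """Return a response from the first healthy zone, skipping failed ones."""
    for zone in zones:
        if zone_health.get(zone, False):
            return f"served {request} from {zone}"
    raise RuntimeError("all zones are down")


response = serve("GET /index.html", ["zone-a", "zone-b"])
print(response)  # prints "served GET /index.html from zone-b"
```

With only `zone-a` configured, the same request would raise an error; that single-zone setup is what left many sites dark during the outage.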