High Availability on AWS: Are You Prepared?
AWS forced reboots are a good reminder for cloud adopters… if you did not architect for high availability, you may see some downtime on AWS periodically. If you took the simple steps below, you almost certainly saw no downtime.
Running a single server on AWS is much more reliable than the same thing on-premise, but it is still not fault-proof. Hardware fails, networks hiccup, and AWS can have data-center-wide issues. Just running a server on AWS does not make it highly available.
In AWS, each “region” has multiple “availability zones” (AZs), which are separate data centers with almost no shared points of failure. It is very rare for an AWS issue (or a natural disaster) to span multiple AZs at the same time. When AWS did its recent reboots, it was careful to touch only one AZ per night.
Their SLA for a single AZ is only 99.95%… that’s about 5 minutes of downtime per week! If your application can’t tolerate that, you need to architect for high availability. There are many things you can do on AWS for high availability, but for a basic two-tier web application, there are two simple and often misunderstood steps.
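To make the arithmetic behind that figure concrete, here is a minimal sketch. The function names are made up for illustration, and the second function assumes that failures in different AZs are fully independent, which is an idealization rather than anything AWS promises.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def downtime_minutes(availability_pct, period_minutes=MINUTES_PER_WEEK):
    """Minutes of downtime a given availability percentage allows per period."""
    return period_minutes * (1 - availability_pct / 100)


def combined_availability_pct(availability_pct, copies):
    """Availability of N redundant copies.

    ASSUMPTION: failures are fully independent, so the whole system is down
    only when every copy is down at the same moment.
    """
    unavailable = 1 - availability_pct / 100
    return 100 * (1 - unavailable ** copies)


if __name__ == "__main__":
    # 99.95% leaves about 5.04 minutes of downtime per week.
    print(f"{downtime_minutes(99.95):.2f} minutes per week at 99.95%")
    # Two independent copies push the theoretical figure much higher.
    print(f"{combined_availability_pct(99.95, 2):.6f}% with two copies")
```

Real-world numbers will be lower than the idealized two-copy figure, since failover takes time and some failure modes are shared, but it shows why a second AZ is worth the small extra cost.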
Load balanced web servers across AZs:
When using AWS’s Elastic Load Balancer, it is trivial to load balance across AZs. Just make sure to launch each instance in a different AZ. Even if your site does not get high traffic, why not choose a small instance size and run two, rather than a single larger size?
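As a rough sketch of “launch each instance in a different AZ”, the helper below spreads instances round-robin across zones and builds one request per instance. The dictionaries are shaped like the keyword arguments of boto3’s EC2 run_instances call; the function names, the AMI ID, the instance type, and the zone names are all placeholders chosen for illustration, not values from a real account.

```python
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign each instance an AZ round-robin so no single AZ holds them all."""
    if instance_count < 2 or len(set(zones)) < 2:
        raise ValueError("need at least two instances and two distinct AZs")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def launch_requests(image_id, instance_type, zones, instance_count=2):
    """Build one launch request per instance, each pinned to its own AZ.

    Each dict could be passed as keyword arguments to an EC2 client's
    run_instances call; nothing here talks to AWS.
    """
    return [
        {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "MinCount": 1,
            "MaxCount": 1,
            "Placement": {"AvailabilityZone": zone},
        }
        for zone in spread_across_zones(instance_count, zones)
    ]


if __name__ == "__main__":
    # "ami-EXAMPLE" is a hypothetical placeholder, not a real image ID.
    for request in launch_requests(
        "ami-EXAMPLE", "t2.small", ["us-east-1a", "us-east-1b"]
    ):
        print(request["Placement"]["AvailabilityZone"], request["InstanceType"])
```

The load balancer also needs to be enabled for each of those AZs, or it will never route traffic to the instances in them.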
Run your database in RDS:
Unless you have highly advanced requirements (and almost everyone who says they do really doesn’t), you should run your database in RDS. Then you simply select the “Multi-AZ” checkbox. (You can do this with Oracle, SQL Server, MySQL, or PostgreSQL.) This sets up a synchronous standby replica in a second AZ, and it automatically fails over in the event of an issue, with perhaps 60 seconds of downtime. It’s fully managed, there’s no extra effort on your part, and it just works. Just make sure your application does a good job of handling connections… use proper retry logic, and don’t cache the database endpoint’s DNS lookup for too long, because failover works by repointing that endpoint at the standby. Multi-AZ takes all the pain out of database replication and makes your database AZ-redundant, alongside your web servers.
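What “proper retry logic” might look like is sketched below, under some assumptions: connect is any zero-argument callable you supply that opens a fresh connection (so the hostname is looked up again on each attempt), and the function name, defaults, and exception types are illustrative only. Real database drivers raise their own exception classes, which you would pass in through retry_on.

```python
import random
import time


def connect_with_retry(
    connect,
    attempts=5,
    base_delay=0.5,
    max_delay=8.0,
    sleep=time.sleep,
    retry_on=(ConnectionError, TimeoutError),
):
    """Call connect() until it succeeds, backing off exponentially between tries.

    ASSUMPTION: connect() opens a brand-new connection each time rather than
    reusing a cached address, so a failover to the standby is picked up.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so many clients do not all reconnect at the same instant.
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect():
        # Simulates a database that is unreachable for the first two attempts.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("database is failing over")
        return "connected"

    print(connect_with_retry(flaky_connect, base_delay=0.01))
```

With the defaults, the waits add up to somewhere between 7.5 and about 11 seconds across five attempts, so covering a failover of around a minute would take more attempts or a longer max_delay.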
That’s it. If you want a highly available application on AWS, make sure to do these two simple steps right away.
Oh, and something else to keep in mind… ever tried building a data center-redundant environment on-premise? Multiple data center contracts, multiple hardware deployments, special software, an expensive network link… it’s a really painful enterprise. This is the beauty of IaaS in general, and AWS specifically… data center redundancy with a few clicks.