The Limitations of Scaling with EC2

October 8, 2008

Like any platform, EC2 has its own limitations. They are often different from, and harder to overcome than, the ones you run into on your own hardware. Without proper planning and development, these limitations can be extremely detrimental to the well-being and scalability of your website or service.

There are quite a few blogs, articles and reviews out there that mention all the positive aspects of EC2, and I have written a few of them myself. However, I think users need to be informed of the negative aspects of a platform as well as the positive. I will keep this post brief, as my next one will focus on designing an architecture around these limitations.

The biggest limitations of Amazon’s EC2 at the moment, in my experience, are the latency between instances, the latency between instances and storage (both local and EBS), and the lack of powerful instances with more than 15GB of RAM and 4 virtual CPUs.

The latency issues can all be traced back to the same root cause: a shared LAN with thousands of non-localized instances all competing for bandwidth. Normally, one would think a LAN would be quick, and it generally is, especially when the servers sit right next to each other with a single switch between them. However, Amazon’s network is much more extensive than most LANs, and chances are your packets hit multiple switches and routers on their way from one instance to another. Every extra hop between instances adds to the packet’s round-trip time. You can think of Amazon’s LAN as a really small Internet: its layout is similar, with no cohesiveness or localization of instances in relation to one another, so lots of data has to go from one end of the LAN to the other. Data ends up traveling much farther than it needs to, and the congestion problems found on the Internet can be found on Amazon’s LAN too.
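One rough way to see this on your own instances is to time TCP connection setup between two of them. The sketch below is only an illustration, not something from our setup: the function names are made up for this example, and it measures the TCP handshake (roughly one round trip plus some local overhead) rather than a true ICMP ping.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host, port, samples=5, timeout_s=2.0):
    """Time TCP connection setup to host:port, once per sample, in milliseconds."""
    rtts = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection blocks until the TCP handshake completes,
        # which takes roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed = time.perf_counter() - start
        rtts.append(elapsed * 1000.0)
    return rtts


def summarize(rtts):
    """Reduce a list of samples to min / median / max."""
    return {
        "min": min(rtts),
        "median": statistics.median(rtts),
        "max": max(rtts),
    }
```

Calling something like `summarize(tcp_connect_rtt_ms("10.0.0.12", 3306))` from a web instance (the address and port here are placeholders for a database instance) gives a feel for how much delay each request pays before any work is done.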

For computationally intensive tasks this isn’t too big a deal, but for those who rely on speedy database calls, every millisecond added per request really adds up when a page makes lots of requests. When the CitySquares site moved from our own local servers to EC2, we noticed a 4-10x increase in query times, which we attribute mainly to the high latency of the LAN. Since our servers are no longer within feet of each other, we have to contend with longer distances between instances and congestion on the LAN.
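To make the "adds up" point concrete, here is a back-of-the-envelope model, assuming the queries for a page run one after another. All of the numbers are hypothetical and chosen only for illustration; they are not measurements from CitySquares.

```python
def page_query_time_ms(queries_per_page, compute_ms, rtt_ms):
    """Total time one page spends on sequential database queries.

    Each query pays the database's own work (compute_ms) plus one
    network round trip (rtt_ms).
    """
    return queries_per_page * (compute_ms + rtt_ms)


# Hypothetical inputs: 50 queries per page, 0.5 ms of database work each.
local_ms = page_query_time_ms(50, compute_ms=0.5, rtt_ms=0.25)  # servers side by side
cloud_ms = page_query_time_ms(50, compute_ms=0.5, rtt_ms=2.0)   # several hops apart

print(local_ms)  # 37.5
print(cloud_ms)  # 125.0
```

The database does exactly the same work in both cases; only the round trip changed, yet the page spends more than three times as long waiting on queries.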

Another thing to take into consideration is the network latency of Amazon’s EBS. For applications that move around a lot of data, EBS is probably a godsend, as it offers high bandwidth. In CitySquares’ case, however, we wind up doing a lot of small file transfers to and from our NFS server as well as our EBS volumes. So while there is a lot of bandwidth available to us, we can’t really take advantage of it, since we have to contend with the latency and overhead of transferring many small files. Small files are not our only issue: we also run our MySQL database off an EBS volume. Swapping to disk has always been a critical issue for databases, but the added overhead of network traffic can wreak havoc on your database load far more than normal disk swapping. You can think of the difference in access times between a local disk and a disk over the network as a book on a bookcase vs. a book somewhere down the hall in storage room B. Clearly the second takes far longer to fetch, and that’s what you have to work with if you want the peace of mind of persistent storage.
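The small-file problem can be sketched with a simple model in which every file pays a fixed per-request latency before its bytes move at full bandwidth. The figures below (2 ms per request, 100 MB/s of bandwidth) are assumptions picked for illustration, not EBS or NFS measurements, and the model ignores caching and parallel requests.

```python
def transfer_time_s(n_files, file_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move n_files sequentially, each paying one fixed latency."""
    per_file = latency_s + file_bytes / bandwidth_bytes_per_s
    return n_files * per_file


def effective_throughput(n_files, file_bytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved once latency is included."""
    total_bytes = n_files * file_bytes
    return total_bytes / transfer_time_s(
        n_files, file_bytes, latency_s, bandwidth_bytes_per_s
    )


BANDWIDTH = 100e6  # assumed: 100 MB/s
LATENCY = 0.002    # assumed: 2 ms per request

# The same 40 MB, moved as 10,000 small files or as one large file.
many_small = transfer_time_s(10_000, 4_000, LATENCY, BANDWIDTH)
one_large = transfer_time_s(1, 40_000_000, LATENCY, BANDWIDTH)

print(f"10,000 x 4 KB files: {many_small:.1f} s")
print(f"1 x 40 MB file:      {one_large:.3f} s")
```

Under these assumptions the small-file case spends almost all of its time waiting on latency, so the available bandwidth barely matters.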

The last and most important limitation for us at CitySquares was the lack of an all-powerful machine. The largest instance Amazon has to offer has just 15GB of RAM and 4 virtual CPUs. In a day and age where you can easily find machines with 64GB of RAM and 16 CPUs, you are definitely limited by Amazon. In our case, it would be much easier to just throw hardware at our database to scale up, but all we have at our disposal is a paltry 15GB of RAM. How can this be the biggest machine they offer? Instead of dividing one of those machines into quarters, just give me the whole thing. It seems ludicrous to me that the largest machine they offer is not much more powerful than the computer I’m using right now.

Long story short, just because you start using AWS doesn’t mean you can scale. Make sure your architecture is tolerant of higher latencies and can scale out across lots of little machines, because that’s all you have to work with.

3 Responses to “The Limitations of Scaling with EC2”

  1. James Court Says:

    We’re seriously considering EC2 for hosting our new Drupal site. Is it something you’d recommend overall? I appreciate you’re trying to provide a balanced view but do you think it was the right choice for you?

    Also did you look at any other cloud services? We’ve looked at mosso but can’t see how it can work without support for custom daemons like our search platform requires.

    I’d love to hear what you think of EC2 and how/why you made the decision to select them instead of another company.

  2. Clay vanSchalkwijk Says:

    I believe the problem stems from the web application in general. EC2 is a horizontal scaling platform and applications must be treated with that design in mind. Unfortunately, a lot of design still revolves around vertical scaling, and while EC2 is a very powerful platform most users thinking of switching over aren’t ready to leverage the platform for what it is. You just can’t make a square peg fit in a round hole.


  3. […] in the air. After these benchmarks we decided that the multi-core index that had served us well on Amazon’s EC2 no longer worked well for us on our new managed hosting. We are currently running a single index at […]

