Single vs. multi-core sharded index. Which one is the right one? There is not a whole lot of information out there, especially when it comes to hard numbers and comparisons. There are a couple of reasons for this. The first that comes to mind is that the multi-core functionality offered by Apache SOLR is very new; it was only recently introduced with SOLR v1.3 and hasn’t had much time to be adopted by the SOLR community. Second, the results depend on your schema, index size, query types and user load, and these factors can account for widely varying performance results. As evidenced by the following benchmarks, a multi-core SOLR index has the potential to speed up your application, or to cut throughput and scalability by approximately the inverse of the number of cores.

i.e., for n cores the maximum throughput is roughly 1/n that of a single index.

With multi-core sharded indexes the underlying assumption is that search performance improves by splitting your index into smaller chunks. These smaller shards are then faster and more efficient to search and index. However, you never get anything for free: the performance increase comes at the cost of higher CPU utilization. Breaking the index into multiple smaller pieces makes searching and indexing each smaller subset faster, but you need to query every core for every search. Whereas a single index runs one slightly slower query, a multi-core sharded query runs n queries in parallel and then combines the results.
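To make the mechanics concrete, here is a minimal sketch of a distributed query in SOLR 1.3 using the standard shards parameter, which lists the cores to fan out to and lets SOLR merge the results. The host names, core names and query are assumptions for illustration only.

```php
<?php
// Sketch of a distributed query against a multi-core sharded index.
// Host names, core names and the query itself are placeholders.
$shards = array(
    'localhost:8983/solr/core0',
    'localhost:8983/solr/core1',
    'localhost:8983/solr/core2',
);

// SOLR 1.3 fans the query out to every shard listed in the "shards"
// parameter and merges the results before returning them.
$params = array(
    'q'      => 'category:restaurants AND city:boston',
    'shards' => implode(',', $shards),
    'rows'   => 10,
    'wt'     => 'json',
);

$url = 'http://localhost:8983/solr/core0/select?' . http_build_query($params);
$response = json_decode(file_get_contents($url), true);

echo $response['response']['numFound'] . " documents found across all shards\n";
?>
```

The single-index case is the same request without the shards parameter; the difference in CPU cost comes from the n parallel sub-queries and the merge step.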


Is your SOLR installation running slower than you think it should? Are performance, throughput and scalability not what you were expecting or hoping for? Do you constantly see others reporting much higher SOLR query performance and scalability than you do? All it might take to fix your woes is a simple schema or query change.

The scenario I am about to describe is proof positive that you should always take the time to understand the underlying functionality of whatever operating system, programming language or application you are using. Let my oversight and ‘quick fix solution’ be a lesson to you: it is almost always worth the upfront cost of doing something right the first time so you don’t have to keep revisiting the same issue.

Quality GIS data sometimes comes with a lot more precision than is usable for Google Maps (or other mapping software). The problem lies in the number of points representing a polygon that you want to overlay. A county representation for a state might include 100,000 points, which is not usable without some form of reduction. Luckily there is an algorithm that solves that problem: Douglas-Peucker.

The algorithm simplifies a polyline by removing vertices that do not contribute (sufficiently) to the overall shape. It is a recursive process that finds the most important vertices for each reduction. First, the most basic reduction is assumed: a single segment connecting the beginning and end of the original polyline. Then the recursion starts: the most significant vertex (the most distant) for this segment is found and, when the distance from this vertex to the segment exceeds the reduction tolerance, the segment is split into two sub-segments, each inheriting a subset of the original vertex list. Each segment continues to subdivide until none of the vertices in its local list are farther away than the tolerance value.
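In code, that recursion looks roughly like the following sketch, a bare-bones PHP illustration of the algorithm rather than the Cartmell class mentioned below. Points are simple array($x, $y) pairs and the tolerance is in the same units as the coordinates.

```php
<?php
// Perpendicular distance from point $p to the segment $a-$b.
function perpendicular_distance($p, $a, $b) {
    $dx = $b[0] - $a[0];
    $dy = $b[1] - $a[1];
    $len = sqrt($dx * $dx + $dy * $dy);
    if ($len == 0.0) {
        // Degenerate segment: fall back to point-to-point distance.
        return sqrt(pow($p[0] - $a[0], 2) + pow($p[1] - $a[1], 2));
    }
    return abs($dy * $p[0] - $dx * $p[1] + $b[0] * $a[1] - $b[1] * $a[0]) / $len;
}

function douglas_peucker($points, $tolerance) {
    $count = count($points);
    if ($count < 3) {
        return $points;
    }
    // Find the vertex farthest from the segment joining the two endpoints.
    $maxDist = 0.0;
    $index = 0;
    for ($i = 1; $i < $count - 1; $i++) {
        $d = perpendicular_distance($points[$i], $points[0], $points[$count - 1]);
        if ($d > $maxDist) {
            $maxDist = $d;
            $index = $i;
        }
    }
    if ($maxDist <= $tolerance) {
        // Every vertex is within tolerance: keep only the endpoints.
        return array($points[0], $points[$count - 1]);
    }
    // Split at the most significant vertex and recurse on both halves.
    $left  = douglas_peucker(array_slice($points, 0, $index + 1), $tolerance);
    $right = douglas_peucker(array_slice($points, $index), $tolerance);
    // Drop the duplicated split point before merging the two halves.
    return array_merge(array_slice($left, 0, -1), $right);
}
?>
```

Calling douglas_peucker($points, $tolerance) with a larger tolerance removes more vertices, which is exactly the size-versus-fidelity trade-off discussed below.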

There is a PHP class that does just this: Douglas-Peucker Polyline Simplification in PHP by Anthony Cartmell. Depending on the original quality of the data and the tolerance level, I was able to achieve a 90-93% reduction in size. This reduction allows me to present significantly more data to clients at a reasonable performance level. Keep in mind that this reduction removes data from the coordinate array, so the quality of your representation will degrade as the tolerance and reduction increase. I highly suggest that you play around with the tolerance until you find a good balance between data size and image quality.

Limitations of EC2 as a web platform:

  • Price – An m1.xlarge instance will run you roughly $600 a month once data transfer costs are included. Managed hosting solutions run cheaper, especially if you plan on purchasing in bulk. The grid is designed for on-demand computation, not as a cost-efficient web-services platform.
  • Configuration – There are a limited number of instance options and you will not be able to tailor the hardware to your application. Databases over 10GB in size will have performance issues, since that is effectively the memory cap.
  • Network storage – The primary disk offers limited storage; additional volumes need to be attached across the network and at additional cost.
  • Software – No hardware-based solutions for load balancing or custom application servers. The model is software driven, so all needs must be met with a software solution. In a managed hosting or colocation setup you at least have the option of adding hardware and having a private network; on EC2 there is no dedicated switching, and no routers, firewalls, or load balancers.

EC2 might be right for you if:

  • Distribution awareness – Your application was designed to scale horizontally from the get-go and you can take advantage of grid computing.
  • Research and development – EC2 and RightScale allow you to bring up new servers, test configurations, and scale quickly. If you are not sure what your hardware demands or the scope of the project will be, they give you the flexibility to get this right before committing to the rather lengthy contracts that come with other hosting options.
  • Disaster recovery – You need an off-site mirror of your site that you can keep dormant and activate as needed.

Over the past several months I have been doing extensive development using Amazon’s EC2 as my hardware infrastructure. I was tasked with taking CitySquares.com from a New England area hyper-local search and business directory to a national site in a few months. Because of the memory limitations of EC2 instances (an m1.xlarge provides only 15GB), the jump from a 15GB database to anything larger becomes very costly. While everything could be contained on two servers in a master/slave environment, we were able to provide redundancy, performance, and easy management of the database. The final estimates for the national roll-out put our core data at 50GB, far too large for any one EC2 instance. Going to disk was not an option either, since everything works off of EBS-attached storage and any disk write means traveling over the network, an additional overhead that degrades performance even further once you fall out of RAM. Then there was the nature of the data itself: on any page load, any piece of data could be requested.

With OS overhead, index storage, and ndb overhead, each m1.xlarge instance gave about 12GB of usable storage. Include replication, and that comes down to 6GB of storage per node. Storing the ~50GB database I had planned for required eight storage nodes, two management servers, and two mysqld API servers. This is where it became important to understand the advantages of vertical scaling versus horizontal scaling. EC2 provides fast horizontal scaling and configuration: servers can be launched on demand and their configurations scripted. While I appreciated that aspect of cloud computing, and being able to bring that many servers up and configure each one relatively quickly, I really just needed two decent boxes with 64GB of RAM each in a master/slave setup. The operating cost for the cluster was $6,000/month, a hefty bill considering I could have bought all the hardware needed to run the cluster for just a few months’ worth of EC2 payments.
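As a rough sanity check, the sizing math works out like this. The 12GB usable figure and the two-way replication are the estimates from this post rather than measured values, so treat the numbers as assumptions.

```php
<?php
// Back-of-the-envelope sizing for NDB data nodes on m1.xlarge instances.
function ndb_data_nodes_needed($database_gb, $usable_gb_per_node = 12, $replicas = 2) {
    // With two replicas, every gigabyte of data is stored twice, so a node's
    // effective capacity is usable / replicas (12GB / 2 = 6GB here).
    $effective_per_node = $usable_gb_per_node / $replicas;
    $nodes = (int) ceil($database_gb / $effective_per_node);
    // NDB expects the data node count to be a multiple of the replica count.
    if ($nodes % $replicas != 0) {
        $nodes += $replicas - ($nodes % $replicas);
    }
    return $nodes;
}

// Roughly 48GB of core data works out to the eight data nodes described above.
echo ndb_data_nodes_needed(48) . " data nodes\n";
?>
```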

Twelve servers and a working cluster later, we were able to successfully roll out our MySQL Cluster with minimal performance loss. A lot of that was due to tweaking every query top to bottom. The site runs on a Drupal core, which meant many of the queries, both in our in-house code and in the core itself, were not designed with distribution awareness. This was another growing pain: the network overhead of running 12 servers on shared resources, with mediocre latency and throughput limitations, amplified flaws in the database design, and every poorly designed query and join degraded performance significantly.

To give credit where credit is due, EC2 did allow us to scale the site up rather quickly. We were able to test server configurations and new applications, and management was easy. It would not have been possible for us to push out the data, handle the influx of new traffic, and expand as fast as we did without it. In the long term, however, it made absolutely no sense to stay on EC2 once we were finished scaling up. It is a great platform for start-ups that need to configure and launch servers for their service or application and grow rapidly.

As a database platform, until EC2 offers more hardware configuration options and the ability to increase memory, the 15GB cap will make EC2 problematic for any database that plans on growing past it. It is important to understand your application and database needs before considering EC2. It is no surprise to me that, even with the RightScale interface and the easy management EC2 provides, websites are reluctant to move off their own hardware or managed hosting.

Both Sun and Continuent are pursuing MySQL clustering on cloud computing. As of this post, Continuent for EC2 is still in closed beta and testing, and Sun is doing its own research into offering more support for database clusters on compute clouds. Maybe this will be something to revisit in the future, but it would require more ndb configuration options at the network layer to cope with shared bandwidth, and additional hardware configurations from Amazon (someday).

The MySQL Cluster 6.4 release, which is in beta now, offers new features that would make clustering more attractive on a cloud computing architecture:

  • Ability to add nodes and node groups online. This will allow the database to scale up without taking the cluster down.
  • Data node multithreading support. The m1.xlarge instance comes with 4 cores.

If you are setting up a MySQL cluster, the following resources will help get you up and running quickly:

Just as with any platform you choose, EC2 has its own limitations. These limitations are often different from, and harder to overcome than, what you might find while running your own hardware. Without proper planning and development, they can wind up being extremely detrimental to the well-being and scalability of your website or service.

There are quite a few blogs, articles and reviews out there that mention all the positive aspects of EC2, and I have written a few of them myself. However, I think users need to be informed of the negative aspects of a platform as well as the positive. I will be brief with this post, as my next will focus on designing an architecture around these limitations.

The biggest limitations of Amazon’s EC2 at the moment, as I have experienced them, are the latency between instances, the latency between instances and storage (local and EBS), and the lack of powerful instances with more than 15GB of RAM and 4 virtual CPUs.

All the latency issues can be traced back to the same root cause: a shared LAN with thousands of non-localized instances all competing for bandwidth. Normally one would think a LAN would be quick… and they generally are, especially when the servers are sitting right next to each other with a single switch between them. However, Amazon’s network is much more extensive than most local LANs, and chances are your packets are hitting multiple switches and routers on their way from one instance to another. Every extra hop between instances adds another few milliseconds to the packet’s round-trip time. You can think of Amazon’s LAN as a really small Internet. Its layout is very similar to that of the Internet: there is no cohesiveness or localization of instances in relation to one another, so lots of data has to go from one end of the LAN to the other, just as on the Internet. Data ends up traveling much farther than it needs to, and all the congestion problems found on the Internet can be found on Amazon’s LAN.

For computationally intensive tasks this really isn’t too big a deal, but for those who rely on speedy database calls, every millisecond added per request starts adding up when you have lots of requests per page. When the CitySquares site moved from our own local servers to EC2 we noticed a 4-10x increase in query times, which we attribute mainly to the high latency of the LAN. Since our servers are no longer within feet of each other, we have to contend with longer distances between instances and congestion on the LAN.
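A quick way to quantify this for yourself is to time a trivial query in a tight loop from a web instance and compare the average round trip against your old environment. A rough sketch, with placeholder connection details:

```php
<?php
// Measure the average round-trip time of a trivial query from a web instance.
// Host, credentials and database name are placeholders.
$link = mysqli_connect('db.internal.example', 'user', 'secret', 'citysquares');
if (!$link) {
    die('connect failed: ' . mysqli_connect_error() . "\n");
}

$runs = 200;
$start = microtime(true);
for ($i = 0; $i < $runs; $i++) {
    // SELECT 1 does almost no work server-side, so the elapsed time is
    // dominated by the network round trip to the database.
    mysqli_query($link, 'SELECT 1');
}
$avgMs = (microtime(true) - $start) / $runs * 1000;
printf("average round trip: %.2f ms over %d queries\n", $avgMs, $runs);
?>
```

Multiply that per-query figure by the number of queries a typical page issues and the 4-10x slowdown becomes easy to account for.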

Another thing to take into consideration is the network latency of Amazon’s EBS. For applications that move around a lot of data, EBS is probably a godsend, since it has a lot of bandwidth available. In CitySquares’ case, however, we wind up doing a lot of small file transfers to and from our NFS server as well as our EBS volumes. So while there is plenty of bandwidth available to us, we can’t really take advantage of it, since we have to contend with the latency and overhead of transferring many small files. Not only are small files an issue for us, but we also run our MySQL database off of an EBS volume. Swapping to disk has always been a critical issue for databases, and the added overhead of network traffic can wreak havoc on your database load far more than normal disk swapping. You can think of the difference in access times between local disk and disk over a network as a book on your bookcase vs. a book somewhere down the hall in storage room B. Clearly the second option takes far longer to find what you are looking for, and that is what you have to work with if you want the peace of mind of persistent storage.

The last and most important limitation for us at CitySquares was the lack of an all-powerful machine. The largest instance Amazon has to offer comes with just 15GB of RAM and 4 virtual CPUs. In a day and age where you can easily find machines with 64GB of RAM and 16 CPUs, you are definitely limited by Amazon. In our case it would be much easier just to throw hardware at our database to scale up, but the only thing we have at our disposal is a paltry 15GB of RAM. How can this be the biggest machine they offer? Instead of dividing one of those machines into quarters, just give me the whole thing. It seems ludicrous to me that the largest machine they offer is not much more powerful than the computer I’m using right now.

Long story short, just because you start using Amazon’s AWS doesn’t mean you can scale. Make sure your architecture is tolerant of higher latencies and can scale with lots of little machines because that’s all you have to work with.

This week I’ve been reminded of a very important lesson… No matter how abstracted you are from your hardware, you still inherently rely on its smooth and consistent operation.

This past week CitySquares‘ NFS server went down for the count and was completely unresponsive to any type of communication. In fact, the EC2 instance was so FUBAR we couldn’t even terminate it from our RightScale dashboard; a post on Amazon’s EC2 board was required to terminate it. It turns out the actual hardware our instance was running on had a catastrophic failure of some sort. Otherwise, at least so I’m told, server images are usually migrated off of machines running in a degraded state automatically.

Needless to say, the very reasons we decided against running our own hardware have come back to plague us. Granted, we weren’t responsible for replacing the hardware, but we were still affected by the troublesome machine. And we weren’t just slightly affected by the loss of our NFS server, either. Since we run off of a heavily modified Drupal CMS, our web servers depend on having a writable files directory. As it turned out, Apache just spun waiting for a response from the file system, and our web services ground to a halt waiting on a machine that was never going to respond… ever. Talk about a single point of failure! A non-critical component, serving mainly images and photos, managed to take down our entire production deployment.

This event has prompted us to move forward with a rewrite of Drupal’s core file handling functionality. The rewrite will include automatically directing file uploads to a separate domain name like csimg.com or something similar; Yahoo goes into more detail in their performance best practices. However, editing the Drupal core is generally frowned upon and heavily discouraged, since it usually conflicts with the upgrade path and makes maintaining the Drupal core much more difficult. While we haven’t stayed out of the Drupal core entirely, the changes we have made are minor and only for performance improvements. I believe it is possible to stay out of the core file handling by hooking into it with the nodeapi, but it seems like more trouble than it’s worth.

The idea behind the file handling rewrite is to serve our images and photos directly from our co-location while keeping a local files directory on each EC2 instance for things users don’t commit, like the CSS and JS aggregation cache and other simple cache-related items coming from the Drupal core. This rewrite will allow us to run one less EC2 instance, saving us some money, as well as remove our dependence on a catastrophic single point of failure.
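In spirit, the routing decision looks something like the hypothetical helper below. This is not Drupal’s file API, just a sketch of splitting user-contributed files from locally generated assets; the csimg.example.com host and the path checks are assumptions.

```php
<?php
// Hypothetical helper sketching the split described above. Host name and
// path layout are assumptions, not part of Drupal's actual file handling.
define('CSIMG_STATIC_HOST', 'http://csimg.example.com');

function csimg_file_url($filepath) {
    // Generated assets (aggregated CSS/JS, cache files) stay in the local
    // files directory so each EC2 instance can write and serve them itself.
    if (preg_match('#^files/(css|js)/#', $filepath)) {
        return '/' . ltrim($filepath, '/');
    }
    // User-contributed images and photos are served from the co-location
    // under a dedicated image domain, per Yahoo's performance guidelines.
    return CSIMG_STATIC_HOST . '/' . ltrim($filepath, '/');
}

// Example: an uploaded photo vs. an aggregated stylesheet.
echo csimg_file_url('files/photos/storefront.jpg') . "\n"; // served from the image domain
echo csimg_file_url('files/css/aggregate.css') . "\n";     // served locally
?>
```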

For the time being we have set up another NFS server, this time backed by Amazon’s new EBS product, which I spoke about in a previous post. One of the issues we had when the last NFS server went down was the loss of user-generated content. Once the instance went down, all the storage associated with it went down too. There was no way to recover from the loss; it was just gone. This is just one of the many possible problems you can run into with the cloud. On the pro side you don’t have to worry about owning your own hardware, but the con is that you can’t recover from failures the way you can with your own hardware. This is a very distinct difference and should be seriously considered before dumping your current architecture for the cloud.

Nuances of EC2 and RightScale

September 5, 2008

So here it is: about two weeks have passed since CitySquares officially migrated its server infrastructure over to EC2 and RightScale. All in all, everything went relatively well. There were a few hiccups on the cut-over day that left users with some error pages. Most of these issues were related to the DNS changeover and a little confusion over whether to set up the DNS records with Amazon’s internal IPs or the elastic external IPs. Common sense said to set the DNS to the external IPs, but it turns out we were supposed to use the internal IPs (10.0.0.0/8 and not the elastic IPs in 75.0.0.0/8) when referencing machines within the Amazon network. Oops.

Other than that, I’ve spent the last couple weeks smoothing everything out and getting things working at 100%. There were a few bugs that cropped up at first, mainly IT stuff: Apache configs, htaccess issues, HAProxy issues, making sure MySQL and our NFS server were backing up correctly. All these things took precedence, but lately I’ve been working on trying to increase performance. At this moment I’m not entirely sure why, but our MySQL database is running queries extremely slowly. At this point it could be anything from network latency, to slow machines, to an improperly tuned config. However, MySQL performance tuning is out of the scope of this post and will be the topic of a future entry. (If a MySQL DBA is reading this and would like the opportunity to play around with EC2 and RightScale, please get in touch with me.)

In preparation for the tuning, not only of the MySQL server but of the Apache servers as well, I have been setting up a separate development environment that is exactly identical to production. With RightScale’s clone feature I was able to easily duplicate everything from one deployment to the other. That said, let me make it clear that it will copy everything. After changing all the necessary script inputs for the dev deployment I figured I was ready to start launching the new servers… WRONG. After booting the dev master DB server as well as our dev load balancer and dev NFS server, I realized that they had stolen all the elastic IPs from our production deployment! Bad news! Needless to say, CitySquares was down for the count for the few minutes it took me to figure out what had happened, fix the mistake and then wait for Amazon to reassign the elastic IPs. So here is a friendly reminder: check the server info tab before launching and make sure it isn’t going to clobber your existing elastic IPs.

Another somewhat annoying issue I ran into while trying to copy our MySQL S3 backup from the production bucket to the development bucket was the lack of a decent copy function. RightScale provides copy and move functionality on a fairly basic level: you can move or copy files either one or many at a time. However, there is a limitation. Each file you copy appends its location to the URL, and each directory path is somewhat long, so eventually you hit the maximum URL length and all the effort you put into selecting the files is for nothing. Not only do you have to select every file you want to copy, you have to manually assign it to the new location, which means lots of copying and pasting. If you have a directory with hundreds of files in it, good luck; you are better off just uploading it to a new bucket. Either way, this could have been easily solved with a copy-bucket or copy-directory option. Problem solved.

While these few things are annoying, they aren’t show stoppers, but they are definitely things to keep in mind when using these services. I’d like to end on a positive note, so I’ll mention the exceptional monitoring services that are installed and configured by default on every server image we have used so far. I am extremely impressed with the out-of-the-box functionality of the graphs, and they definitely make up for the other shortcomings. They have everything I could ever want to look at and then some: everything from standard CPU load to the number of I/Os per second, with yearly, quarterly, monthly, daily and hourly time frames in three sizes (small, medium and large), all browsable via up-to-date thumbnail previews.

If you are considering cloud computing, I would recommend taking a look at RightScale and Amazon’s web services.

I wrote just yesterday about running your own hardware vs. using EC2 and RightScale, and one of the major issues I found with EC2 was the lack of a persistent storage medium. Well, I knew the folks over at Amazon were hard at work on a new service that would allow persistent storage, and it turns out I received this email in my mailbox this morning:

Dear AWS Developer,

We are pleased to announce the release of a significant new Amazon EC2 feature, Amazon Elastic Block Store (EBS), which provides persistent storage for your Amazon EC2 instances. With Amazon EBS, storage volumes can be programmatically created, attached to Amazon EC2 instances, and if even more durability is desired, can be backed with a snapshot to the Amazon Simple Storage Service (Amazon S3).

Prior to Amazon EBS, block storage within an Amazon EC2 instance was tied to the instance itself so that when the instance was terminated, the data within the instance was lost. Now with Amazon EBS, users can chose to allocate storage volumes that persist reliably and independently from Amazon EC2 instances. Amazon EBS volumes can be created in any size between 1 GB and 1 TB, and multiple volumes can be attached to a single instance. Additionally, for even more durable backups and an easy way to create new volumes, Amazon EBS provides the ability to create point-in-time, consistent snapshots of volumes that are then stored to Amazon S3.

Amazon EBS is well suited for databases, as well as many other applications that require running a file system or access to raw block-level storage. As Amazon EC2 instances are started and stopped, the information saved in your database or application is preserved in much the same way it is with traditional physical servers. Amazon EBS can be accessed through the latest Amazon EC2 APIs, and is now available in public beta.

We hope you enjoy this new feature and we look forward to your feedback.

Sincerely,

The Amazon EC2 team

So this is indeed good news and removes the biggest con I mentioned about the EC2 platform!

A couple weeks ago I began working with EC2 and RightScale in preparation for our big IT infrastructure change-over. I’ll start by giving a brief overview of our hardware infrastructure. Currently we’re running the CitySquares website on our own hardware in a Somerville co-location not too far from our headquarters in Boston’s trendy South End neighborhood.

From the very beginning our contract IT guy set us up with an extremely robust and flexible IT infrastructure. It consists of a few machines running the Xen hypervisor with Gentoo as the main host OS. Running Gentoo allows us to be as efficient as possible by specifically optimizing and compiling only the things we need. While this is a good step, it is Xen that really makes the big difference: it allows us to shuffle resources around as we see fit, more memory here, more virtual CPUs there, all on the fly. For a startup, or any company with limited resources, this is rather essential; you never know where you are going to need to allocate resources in the months to come.

While this is all well and good, we are still limited when it comes to scaling with increasing traffic or adding resource-intensive features. We have a set amount of available hardware, and adding more is an expensive upfront capital investment. Not only that, but for us to really take advantage of Xen and use it to its full potential we were presented with an expensive option: the purchase of a SAN and more servers. For those in the industry, I don’t think I need to mention that these get expensive in a hurry. This would have been a huge upfront cost for us, one we didn’t want to budget for. The second option, which is the one we eventually went with, was to drop our current hardware solution and make the plunge into cloud computing with Amazon’s EC2.

Here I am now, a couple of weeks into the switch with a lot of lessons learned. There are definitely pros and cons to each platform, whether going with EC2 or rolling your own architecture. Before I get into the details I want to make clear that there are many factors involved in choosing a technology platform. I am only going to scratch the surface, touching upon the major pros and cons from my own point of view, with CitySquares’ best interests in mind.

Let me begin by starting with the pros for running your own hardware:

  • The biggest pro is most definitely persistence across reboots. I cannot stress the importance of this one enough. You really take for granted the ability to edit a file and expect it to be there the next time the machine is restarted.

    • You only need to configure the software once. Once it’s running you don’t really care what you did to make it work; it just works, every time you reboot.

    • UPDATE 8/21/08: Amazon releases persistent storage.
  • Complete and utter control over everything that is running. This extends from the OS to the amount of RAM, CPU specs, hard drive specs, NICs, etc. Whether you build an economy or a performance server is entirely up to you.

  • Rather stable and unchanging architecture. Server host keys stay the same, the same number of servers are running today as there were yesterday and as there will be tomorrow.

  • Reboot times. For those times when something is just AFU you can hit the reset button and be back up and running in a few minutes.

  • You can physically touch it… it’s not just in the cloud somewhere.

Some cons for running your own hardware:

  • Companies with limited resources usually end up with architectures that exhibit single points of failure.

    • As an aside, you can be plagued by hardware failures at any time. This usually is accompanied by angry emails, texts and calls at 3am on Saturday morning.

  • Limited scalability options. For a rapidly expanding and growing website, the couple weeks it takes to order and install new hardware can be detrimental to your potential traffic and revenue stream.

  • Management of physical pieces of hardware. It’s a royal pain to have to go to a co-location to upgrade or fix anything that needs maintenance, not to mention the potential downtime.

    • Also, there are many hidden costs associated with IT maintenance.

  • Up front capital expenditures can be quite costly. This is especially true from a cash flow perspective.

  • Servers and other supporting hardware are rendered obsolete every few years, requiring the purchase of new equipment.

These pros and cons of running your own hardware are pretty straightforward. Some people might mention managed hosting solutions, which would mostly eliminate the cons related to server maintenance and hardware failures. However, that added service comes with an added price tag for the hosting; whether it is right for you or your company is something to look into. We decided to skip this intermediate solution and go straight to the latest and greatest, which is cloud computing. To be specific, we sided with Amazon’s EC2 (Elastic Compute Cloud), using RightScale as our management tool.

Some of the pros for using EC2 in conjunction with the RightScale dashboard are as follows:

  • Near-infinite resources (server instances, Amazon’s S3 storage, etc.) available nearly instantaneously. No more being flattened by a Slashdotting if everything is properly configured and set to introduce more servers automatically. (RightScale Benefit)

  • No upfront costs; everything is usage based. If in the middle of the night you are only utilizing one server, that’s all you pay for. Likewise, if during peak hours you’re running twenty servers, you pay for those twenty servers. (Amazon Benefit, RightScale is a monthly service)

  • No hardware to think of. If fifty servers go down at Amazon we won’t even know about it. No more angry calls at 3am. (Amazon Benefit)

  • Multiple availability zones. This allows us to run our master database in one zone which is completely separate from our slave database. So if there is an actual fire or power outage in one zone the others will theoretically be unaffected. The single points of failure mentioned before are a thing of the past and this is just one example. (Amazon Benefit)

  • Ability to clone whole deployments to create testing and development environments that exactly mirror the current production when you need them. (RightScale Benefit)

  • Security updates are taken care of for the most part. RightScale provides base server images which are customized upon boot with the latest software updates. (RightScale Benefit)

  • Monitoring and alerting tools are very good and highly customizable. (RightScale Benefit)

Some of the cons for using EC2 and RightScale:

  • No persistence after reboot. I can’t stress this one enough! All local changes will be wiped and you’ll start with a blank slate!

    • All user contributed changes must be backed up to a persistent storage medium or they will be lost! We back up incrementally every 15 minutes with a full backup every night.

    • UPDATE 8/21/08: Amazon releases persistent storage.
  • Writing scripts to configure everything upon boot is a time-consuming and tedious process requiring a lot of trial and error.

  • Every reboot takes approximately 10-20 minutes depending on the number and complexity of packages installed on boot, making the previous bullet point that much more painful.

  • A few of the pre-configured scripts are written quite well. The one for MySQL is as good as they get: you upload a config file complete with special tags for easy on-the-fly regular-expression customization. The Apache scripts, on the other hand, are about as bad as they get; everything must be configured after the fact.

    • With Apache, however, you’ll be writing regular expressions to match other regular expressions. Needless to say, this is a royal pain and you usually end up with unreadable gibberish.

So there you have it; take it as you wish. For CitySquares, EC2 and RightScale were the best options. They allow us to scale nearly effortlessly once configured. It is also a much cheaper option up front, whereas owning your own hardware is generally cheaper in the long run. We did trade away a lot of the pros of owning your own hardware to get the scalability and hardware abstraction of EC2. It was a tough decision to switch away from our old architecture, but in the end it will most likely be the best decision we’ve made. The flexibility and scalability of the EC2 and RightScale platform are by far the biggest advantages of switching, and in the end that’s what CitySquares needs.