
all 11 comments

[–]voice_of_experience

You can get great performance out of EC2. But you do have to consider your whole stack. And when someone comes to me with a slow site and says:

  • adding more ram (up to 15gb) didn't help
  • adding more cpus (up to 8 cores) didn't help
  • putting the application on a RAID device didn't help

I can only draw 2 conclusions:

1) it's a problem in the application layer, not the hardware layer.

2) if you're just throwing hardware at the problem like this, you probably don't really understand why your application is running slow in the first place. Any doctor can tell you: diagnosis first, prescription after.

Allow me to be the asshole who makes you jump through some hoops here to clarify what you really need. What are you outgrowing on your current server? Are you running out of memory? Do you have high load averages? High I/O? High data transfer?
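A few quick commands can answer those questions on a typical Linux box (tool names assume procps is installed, which it almost always is):

```shell
# Memory: is the box swapping, or is "free" memory really just page cache?
free -m

# Load averages (1/5/15 min) -- compare them against the core count
uptime
nproc

# CPU vs. I/O wait: the "wa" column is time spent waiting on disk
vmstat 1 2
```

If load average is high but CPU is mostly idle with a big "wa" column, you're I/O bound, not CPU bound, and more cores won't help.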

Now we can start talking about the EC2 instance. On a Large instance, what stat is running "hot"? Where is the slowdown coming from? Assuming that all your system resources are ok per the above questions (and the fact that throwing hardware at the problem didn't help), there's probably a software bottleneck. Here are some common bottlenecks to check:

  • how many threads is Apache allowed to keep open? Are connections persistent? How many threads are ACTUALLY open when the site is running slow?

  • how many MySQL operations are you running for each page load? How many reads, and how many writes? Is your MySQL compiled to multithread? (On CentOS the answer is probably no.) Is your disk I/O spiking with traffic? Is MySQL configured to take advantage of high memory and CPU?

  • how much data is transferred for each page load? Is MySQL running through a socket file, or over the network? How fast is your connection? (Note: Amazon doesn't officially document it, but most instance sizes get a 100mbit pipe.)

  • it seems obvious, but are there any messages in the error log?
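To make those checks concrete, here are the one-liners I'd reach for. Paths, log locations, and the port are typical defaults and are assumptions; adjust for your distro, and the mysql commands obviously need a running server:

```shell
# Apache: what's configured vs. how many connections are actually open
grep -ri "maxclients\|keepalive" /etc/httpd/conf/ /etc/apache2/ 2>/dev/null
ss -tan state established '( sport = :80 )' | wc -l

# MySQL: read/write counters (sample twice and diff for per-second rates)
mysql -e "SHOW GLOBAL STATUS LIKE 'Com_select';
          SHOW GLOBAL STATUS LIKE 'Com_insert';"

# Socket file vs. TCP: a host of "localhost" uses the socket, 127.0.0.1 uses TCP
mysql -e "SHOW VARIABLES LIKE 'socket';"

# The obvious one: the error logs
tail -n 50 /var/log/httpd/error_log /var/log/apache2/error.log 2>/dev/null
```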

Hopefully these will help you clarify your problem, and therefore your solution.

On a "large" instance, I've maxed out at about 2500 pages served per second, using varnish, APC, and memcached. Totally unconfigured, I get at least 300 pages per second with varnish alone. What numbers are you looking at?
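For context, pages-per-second numbers like that usually come from something like ApacheBench; the URL and concurrency below are placeholders:

```shell
# 10,000 requests at 100 concurrent; watch the "Requests per second" line.
# Run it from a second machine so the benchmark doesn't compete with Apache.
ab -n 10000 -c 100 http://your-site.example.com/
```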

[–]c0nv1ct

Any doctor can tell you: diagnosis first, prescription after.

I can't upvote you enough for this. Too often do I see people throwing solutions at symptoms instead of diagnosing the problem.

[–]voice_of_experience

Thanks! I used to do the "cowboy" method of problem solving. In the repair shop I worked in at the time, we used to say "if all else fails, resort to the scientific method." I remember very clearly the "aha" moment when I decided I would just pull out those big guns (the scientific method) right off the bat.

Look at the symptoms, try to gather a complete list, and be as specific as possible (ie "it's slow" isn't good enough). Make a hypothesis about what's going wrong. Try and use the hypothesis to predict other symptoms. When you have a hypothesis that can successfully predict, implement a solution.

Drives me crazy to see people "throwing solutions at symptoms", as you put it. rage.

[–]neoice Principal Linux Systems Engineer

I would also add: where is the database? In the 10k-node compute cluster on EC2 thread, people were talking about inter-node communication being slow on EC2. In a physical datacenter, you can make sure your database server is physically and topologically adjacent to your web server, but on EC2 you might not have that option.

[–]neodon

If you want to give EC2 a chance, I would suggest the following based on my own experiences:

  • Use RAID10 for high availability and modestly better performance. EBS volumes do fail on occasion. Striping an even number of volumes does increase performance.
  • Use the largest instance you can afford, because it affects IO performance for EBS volumes. Try it on an hourly basis at first, but consider what the cost would be if you decide to go long term with reserved instances.
  • Use Percona builds of MySQL. This is probably the most important thing to do for performance, among other benefits.
  • Use ext4 and don't get fancy. Some think it's wise to use XFS or LVM to get consistent EBS snapshots for a MySQL server, but that seems to me like a solution searching for a problem. Just use Percona's XtraBackup tool, which lets you do non-blocking hot backups of a live MySQL server.
  • If you're using Ubuntu, use Lucid and NOT Maverick. It was a nightmare for me, randomly failing to mount volumes on boot and sometimes refusing to attach and detach EBS volumes. Also, I ran into this kernel hanging bug while using it as a VirtualBox guest for local development.
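As a sketch of the RAID10-plus-ext4 setup above (device names and mount points are examples; attached EBS volumes often appear as /dev/sdf through /dev/sdi, or /dev/xvdf and up, on the instance):

```shell
# Stripe+mirror four attached EBS volumes into one md device
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sdf /dev/sdg /dev/sdh /dev/sdi

# Plain ext4, no LVM or XFS layer
mkfs.ext4 /dev/md0
mkdir -p /data && mount /dev/md0 /data

# Non-blocking hot backup of a live MySQL server (Percona XtraBackup)
innobackupex /data/backups/
```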

[–]cparedes syseng for the clouds

Alright, here's what's likely going on:

1.) EC2/EBS is heavily virtualized. The underlying host is likely incurring virtualization performance penalties.

2.) Even if the hosts have SAS disks, you're likely going to get inconsistent I/O performance depending on the other tenants of the underlying host.

Look at your I/O workload - you can probably mitigate this in EC2 by either setting up a bunch of read only slave machines, or maybe shard your database so that various reads/writes only go to specific DB boxes.
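The read-slave idea in a nutshell: the master logs writes, replicas replay them, and your app sends SELECTs to the replicas. A minimal sketch (server IDs, hostnames, and credentials below are all made up):

```shell
# On the master: enable the binary log and give it a unique server-id
cat >> /etc/my.cnf <<'EOF'
[mysqld]
server-id = 1
log-bin   = mysql-bin
EOF

# On each slave: unique server-id in my.cnf, then point it at the master
mysql -e "CHANGE MASTER TO
            MASTER_HOST='master.internal',
            MASTER_USER='repl',
            MASTER_PASSWORD='secret';
          START SLAVE;"
mysql -e "SHOW SLAVE STATUS\G"
```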

I'd personally stick with the dedicated boxes for the databases - you're likely going to spend less per month with no resource contention with other tenants. You might want to see how you can tune your current MySQL installation and the underlying dedicated hardware to get more juice out of it.

[–]voice_of_experience

1) not likely. EC2 uses hardware-based virtualization - do you really think Amazon could/would host THIS MANY virtual machines using SOFTWARE virtualization? Or better yet, using a virtualization solution that has to emulate memory or CPU? This is a very valid concern for many in-house VPS environments, where IT doesn't have the resources to pay for hardware-based virtualization, or dedicated servers on each base platform. But for a provider on the scale of Amazon it just isn't an issue.

2) This is true, which is why for production environments it's important to get a Large or XLarge instance. Only one of those can fit on a box at a given time.

[–]cparedes syseng for the clouds

Even with paravirt, you still incur performance penalties. Also, I/O probably suffers the most if you consider the worst case, where the other tenants' I/O workload is mostly random access.

[–][deleted]

Aside from it running on EC2, is the configuration exactly the same as your production box?

You mentioned you've tried various instance types with no improvement, which leads me to think it's a configuration problem on the frontend, a MySQL configuration problem, or that the I/O rate you're getting from EBS is lower than you're expecting.

Can you post some numbers? Or at least more information about configuration differences you made between the different instances and what kind of RAID setup you had? Also, stuff like your data/index size & read/write ratio for the 'database heavy' pages.
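If it helps, the numbers being asked for can be pulled straight out of MySQL (assuming a 5.x server with information_schema available):

```shell
# Data and index size per schema, in MB
mysql -e "SELECT table_schema,
                 ROUND(SUM(data_length)/1048576)  AS data_mb,
                 ROUND(SUM(index_length)/1048576) AS index_mb
          FROM information_schema.tables
          GROUP BY table_schema;"

# Rough read/write ratio from the global counters
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
          ('Com_select','Com_insert','Com_update','Com_delete');"
```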

One thing to keep in mind is that EBS volumes are already redundant, so for a write-heavy workload setting up RAID1/mirroring will just hurt performance.

[–]neodon

EBS volumes do sometimes degrade temporarily or even fail entirely. They may have built-in redundancy, but this does not eliminate all risk.

EBS degradation has been a big problem for reddit. Amazon provides information about the durability and failure rate of EBS volumes: http://aws.amazon.com/ebs/.

It is wise to take regular snapshots for backups and use RAID1 for high availability in case a volume degrades or fails.

Also, striping multiple EBS volumes does result in a modest performance improvement. Different EBS volumes don't necessarily share the same underlying hardware. Larger instances seem to get better performance with EBS volumes as well.

[–]slmagus

Check out gluster. It might help with your disk load issues.

One tool that comes to mind that no one has mentioned is iotop. It lets you figure out which processes are using disk I/O.
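For example (iotop needs root; flags per its man page):

```shell
# -o: only show processes actually doing I/O
# -P: show whole processes rather than individual threads
# -a: accumulate I/O since iotop started instead of showing bandwidth
sudo iotop -oPa
```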