Rails Performance Analysis

 

By Terry Heath

 

Introduction

One of my favorite aspects of development is performance work. The task, with the associated profiling and benchmark tools, lends itself well to scientific analysis.
Usability and appearance are always subjective, and have, at best, fuzzy guidelines. Performance measurements are much more precise.
I’m going to give an overview of how to approach performance analysis, and the tools associated with the different pieces that form an application. I haven’t gone into anything arcane, because that would take this article from its present form to a three-pound paperback at Barnes & Noble.

Measurement

Numbers, not Feelings

Before I started reading about performance, but was tasked with optimizing something, I’d go with whether or not it “felt” fast. While this is a somewhat acceptable way to determine if something needs to be optimized, it’s not a good measure of optimization.
After putting hard work into some optimization, you’re going to want to see improvement. So much so that, if left only to your own senses, odds are you’re going to see improvement, even if you’ve made things worse. Because of this, it’s important to go by benchmarking and profiling tools, and not feelings.
Another reason it’s important to rely on tools and not feelings is that they allow you to home in on what’s actually slow. A lot goes on in a web request. For example, you send a request to Apache, which forwards it to Mongrel, which spins up some Ruby action, and the response is then piped back to your client. You might see something in your backend code and say, “I know that’s slow, I’m going to speed it up.” Unfortunately, without a baseline measurement, you (1) don’t know how much the improvement will help, and (2) can’t be sure that it needs improvement.
Numbers justify everything. Without them, it’s hard to explain to others what exactly you’ve been doing for a week.

Statistics 101

I was lucky enough to take a statistics class in college. Lucky in that it was an easy A. Unfortunately, I don’t remember much else about it. I’m assuming you’re in about the same position.
Though I think it’s taught as an axiom or something about sample size, in casual conversation I’ve heard it referenced as the “law of small numbers.” Essentially, if you don’t have enough samples, you can’t be entirely sure what you’re measuring. You could be measuring the time of Ruby’s garbage collector, when you really want to see how long a regex match is taking. This wouldn’t only lead to inaccurate results, but it might misguide your optimization efforts. In short, running a test more times is better.
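As a rough illustration of “more samples is better,” here is a minimal sketch that times the same block many times and keeps every sample. The helper names (time_once, collect_samples) and the regex workload are made up for this example; the warm-up pass is an assumption about how to keep one-time costs out of the numbers.

```ruby
# Time a single execution of the given block, in seconds, using a
# monotonic clock so system clock adjustments don't affect the result.
def time_once
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Run the block `warmup` times without recording, then `runs` times
# while recording each duration. Returns an array of samples.
def collect_samples(runs, warmup = 1, &block)
  warmup.times { block.call }
  Array.new(runs) { time_once(&block) }
end

# Hypothetical workload: a regex match repeated enough to be measurable.
text = "GET /users/42 HTTP/1.1 " * 100
samples = collect_samples(50) do
  1_000.times { text =~ /users\/(\d+)/ }
end

mean = samples.inject(0.0) { |sum, s| sum + s } / samples.size
puts "runs: #{samples.size}"
puts "mean: %.6fs  min: %.6fs  max: %.6fs" % [mean, samples.min, samples.max]
```

A wide gap between min and max is a hint that something other than the code under test (garbage collection, other processes) is showing up in the samples.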
While taking statistical measurements, it’s important to reduce confounding factors. Anything that could interfere with the numbers you’re gathering can skew your data. If you’re running a benchmark locally, close down other applications. Again, it’s hard to know if you’re actually measuring Ruby’s regex matching speeds if you’ve got Lord of the Rings playing on iTunes and are playing Tower Defense on Firefox. Maybe placing that water tower just hogged some CPU time, making your Ruby slow down. The timers won’t know, and neither will you.
If you’re testing server throughput, be sure that you’re testing as close as possible to the machine. If you have a sibling server that’s 5 feet from it, that’s better, because you’re not measuring other router speeds or black hole internet spots.
Lastly, when presenting your measurements, calculate a standard deviation along with the mean. This is incredibly important. A standard deviation indicates how far measurements typically deviate from the mean. For roughly normally distributed data, about 68% of the points fall within one standard deviation of the mean, and about 95% fall within two. Though there’s no built-in Ruby standard deviation calculation, I’ve provided one below [0].
If you have a request that shows it’s only taking half a second on average, you can think, “this is great, our application is so fast!” But if you couple that with the related standard deviation, and it’s 12 seconds, you know some people are waiting a lot longer than half a second. This could reveal something like some backend code hanging or a race condition that a mean alone wouldn’t show.
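For reference, here is a minimal sketch of a mean and standard deviation calculation in plain Ruby. It is not necessarily the same as the version referenced in [0]; the method names and the sample response times are made up for illustration. By default it computes the population standard deviation, and passing true as the second argument divides by n - 1 instead.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

# Standard deviation of a list of numbers.
# sample = false divides by n (population); sample = true divides by n - 1.
def standard_deviation(values, sample = false)
  m = mean(values)
  sum_of_squares = values.inject(0.0) { |sum, v| sum + (v - m)**2 }
  divisor = sample ? values.size - 1 : values.size
  Math.sqrt(sum_of_squares / divisor)
end

# Hypothetical response times in seconds: mostly fast, one very slow outlier.
times = [0.2, 0.3, 0.25, 0.3, 0.2, 12.0]
puts "mean: %.2fs, stddev: %.2fs" % [mean(times), standard_deviation(times)]
```

With data like this, the mean alone hides the fact that one request took far longer than the rest; the standard deviation makes that spread visible.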

Rails

It’s important to note that all three areas discussed (backend, frontend, server config) can directly and significantly affect performance. Luckily, both Rails and the frontend can be diagnosed and profiled individually, so we don’t have to play dominoes with our tweaks.
Server configuration and tweaking (for example, the number of Mongrels to run on a server) can’t be done in isolation. Because backend processing can increase both a Mongrel’s memory consumption and the time it takes a Mongrel to unblock, the Rails side of things needs to be tweaked first.

Where to look?

The first task is to figure out what needs optimizing. A good place to look is the production logs, since they’ll show where people are spending time and how much time the server is taking to do it.
There’s a nice-looking tool called PL Analyzer [1], part of the Rails Analyzer suite, but it doesn’t work out of the box on OS X, which I work on. It also provides a different set of data, so I go with one I wrote a while ago, called logpwnr [2].
Logpwnr will provide thorough measurements of all actions being used in your app. If you have several production servers, you’ll need to concatenate the logs before parsing them with logpwnr. Run it like this:
./logpwnr.rb production.log > output.csv
This will provide a CSV you can import to any spreadsheet. Here’s a sample of the output:
Here we can start looking at what’s used the most, and then figure out which of those actions is a good place to start optimizing based on total request time (not provided on the screenshot, but it’s the sum of the means of the action, render, and db times). Try to keep the numbers in the context of usage. There might be a horrendously slow action in your app, but it’s only been used 5 times in as many months. Optimizing rarely used actions is not worth your time until your popular ones are sped up.
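One way to keep the numbers in the context of usage is to weight each action’s total request time by how often it is hit. The sketch below assumes that weighting (request count times total time), which is my own choice rather than something logpwnr does, and the field names and figures are hypothetical; they are not logpwnr’s actual CSV columns.

```ruby
# Hypothetical per-action statistics. Field names and values are
# illustrative only and do not reflect logpwnr's real output format.
rows = [
  { :action => "UsersController#show",     :count => 12_000, :action_mean => 0.08, :render_mean => 0.05, :db_mean => 0.03 },
  { :action => "ReportsController#export", :count => 5,      :action_mean => 4.0,  :render_mean => 2.0,  :db_mean => 3.0  },
  { :action => "PostsController#index",    :count => 8_000,  :action_mean => 0.1,  :render_mean => 0.2,  :db_mean => 0.1  }
]

# Total request time as described above: the sum of the mean action,
# render, and db times.
def total_request_time(row)
  row[:action_mean] + row[:render_mean] + row[:db_mean]
end

# Assumed priority: time spent across all requests to the action.
def usage_weighted_cost(row)
  row[:count] * total_request_time(row)
end

ranked = rows.sort_by { |row| -usage_weighted_cost(row) }
ranked.each do |row|
  puts "%-28s %8.1fs total across %d requests" %
       [row[:action], usage_weighted_cost(row), row[:count]]
end
```

In this made-up data the export action is by far the slowest per request, but it lands at the bottom of the list because almost nobody uses it.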

Further Down the Rabbit Hole

Once we find an action that looks appropriate to optimize, we can figure out how to approach the problem with RubyProf. RubyProf is partly maintained by Charlie Savage [3], who put together one of the best quick guides for speeding up Rails [4]. He also provides a fantastic installation guide for RubyProf [5].
One caveat is that if you’re using Rails <= 1.1.6, you’ll have to either alias the methods yourself or just put in alias_method_chain in the plugin. I went with the latter, and just put this snippet from the Rails codebase [6] at the top of the plugin.
Again, it’s important to measure all improvements you make so you can justify the refactorings and find where to pursue optimization efforts, but if you take nothing else from this section, Charlie provides these guidelines for backend performance:
Don't use ActiveRecord#attributes - ActiveRecord clones all attributes associated with an object whenever this is accessed
Get your ActiveRecord :includes right - over-aggressive includes can cause unnecessary database joins and marshaling, while under-using :includes can lead to extra database queries
Don't check template timestamps (cache_template_loading = true)
Don't use url_for - looking through the routes table every time is slow
Don't let Rails parse timestamps - Ruby’s Date (and Rails’ monkeypatches on top of it) is painfully slow
Don't symbolize keys (local_assigns_support_string_keys = false)
And a few of my own:
Always do if-checks on logging statements, e.g.: logger.debug "in Controller#new" if logger.debug? This prevents unnecessary and sometimes expensive to_s calls, and also short-circuits extra method calls in a production environment. Don’t comment out logger statements, as they’re useful for, you know, debugging
Avoid useless Rails helpers (HTML’s <img> tag is faster and just as easy as Rails’s image_tag)
Avoid unneeded object copying (like with gsub) when possible, using destructive alternatives (gsub!)
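To make the logging guideline concrete, here is a small sketch using Ruby’s standard Logger. The ExpensiveReport class is hypothetical and exists only to count how often its to_s is called; the logger is set to INFO to mimic a production log level.

```ruby
require 'logger'
require 'stringio'

# Hypothetical object whose string form is costly to build.
# It counts how many times to_s is called.
class ExpensiveReport
  attr_reader :to_s_calls

  def initialize
    @to_s_calls = 0
  end

  def to_s
    @to_s_calls += 1
    "report body"
  end
end

logger = Logger.new(StringIO.new)
logger.level = Logger::INFO # production-like: debug messages are dropped

report = ExpensiveReport.new

# Unguarded: the string is interpolated (calling to_s) before the
# logger decides to throw the message away.
logger.debug "in Controller#new: #{report}"
calls_unguarded = report.to_s_calls

# Guarded: the modifier-if is evaluated first, so the string is never built.
logger.debug "in Controller#new: #{report}" if logger.debug?
calls_guarded = report.to_s_calls - calls_unguarded

puts "to_s calls without guard: #{calls_unguarded}, with guard: #{calls_guarded}"
```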

Frontend Optimization

A fantastic analytical tool for load times for a web page is YSlow [7], a tool put out by Yahoo as an addon to Firebug [8]. YSlow scores your page across several criteria and makes recommendations for how to speed up your site.
A super fast site on the backend with a terrible YSlow score will generally seem sluggish to users.
One of the easiest things to do is shrink your javascript and CSS. If you use a tool like asset packager [9], you can have your production javascript and CSS files concatenated and minified, requiring only 2 extra downloads for your users. This is a big win, because it’s both shrinking bandwidth requirements and the number of downloads necessary to view a page. Most browsers come prepared to only download 2 files from a host at a time, so fewer downloads is almost always helpful.
YSlow essentially looks at things that block page downloads and ways to speed up download times. To that end, a few quick Apache tweaks can go a long way.

Apache Tweakage

Most mainstream browsers accept gzip’d content. So zip that up, using Apache’s mod_deflate [10]. Also, static assets should have a really far ahead expires header. You can just put
    ExpiresDefault "access plus 10 years"
in your Apache config, and then all static assets will be served with that header. At work, we found a problem with the ExpiresDefault clause and IE7, where even requests proxied through to Mongrel were being cached, so we went more explicit, replacing the ExpiresDefault clause with:
    ExpiresByType image/gif "access 10 years"
    ExpiresByType image/jpeg "access 10 years"
    ExpiresByType image/png "access 10 years"
    ExpiresByType text/css "access 10 years"
    ExpiresByType text/javascript "access 10 years"
One “gotcha” with this approach is that caching can cause problems if you’re updating your images or CSS or javascripts. Asset packager solves this for your CSS and javascripts, and you can follow the same solution with your images: change the filename whenever the file changes. An svn or git revision number of some sort at the end of the file works great.
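A minimal sketch of the rename-on-change idea follows. The helper name, the sample path, and the choice to put the revision just before the file extension are all assumptions made for illustration; any scheme works as long as the filename changes whenever the contents do.

```ruby
# Insert a revision string before the file extension, so that a changed
# file gets a new URL and far-future Expires headers stay safe.
def versioned_filename(path, revision)
  ext  = File.extname(path)                  # ".png", or "" if there is none
  base = path[0, path.length - ext.length]   # everything before the extension
  "#{base}-#{revision}#{ext}"
end

# Hypothetical asset path and revision identifier.
puts versioned_filename("/images/logo.png", "r1234")
```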
Lastly, ETag configuration can be tweaked on Apache. Specifically, it can be turned off. This is especially important once your site gets big enough to span across multiple asset servers. The default ETag hashing mechanism is machine-dependent as opposed to strictly data-dependent, so assets across different servers will have different ETags. This equates to both unnecessary cache invalidation and unnecessary server processing to create the ETag. To turn it off in Apache, just put
    FileETag none
in your httpd.conf.
On the Rails end, however, ETags are a lot more useful. Rails has a built-in ETag mechanism that is safe to consider consistent across machines, and in Rails 2.2 the code’s been greatly simplified. You can specify expiry conditions and make an entire action conditional like this:
if stale?(:last_modified => @user.updated_at, :etag => @user)
    <code>
end
And <code> won’t be executed unless the modified time or the ETag indicates it needs to be.
After you’ve made these changes, work through the YSlow rubric and see what you can improve. YSlow provides excellent links that explain both the problems with your page and the best way to fix them [11].

Server/HTTP Tweaks

It doesn’t seem that Phusion Passenger has this same Mongrel tweaking problem, but if you need to proxy to multiple Apaches, or just need to see how your server responds under heavy load, this section will be helpful.
I’m not sure how most people set up multiple boxes of Mongrels, but it was recently discovered on one of our applications that we had it set up poorly. We had something like:
Ignore the only 2 (it should be 6) blue arrows coming from the Apache boxes; that’s my own laziness. The issue here was that we had one Apache instance proxying to 4 other Apache instances, which then turned around and proxied to mongrels or served up static content.
httperf [12] analysis (coming in the next few paragraphs!) showed that, for small requests, the difference was negligible, but as requests per second started to stack up, proxying needlessly to more Apaches became a bottleneck. Proxying directly to the Mongrels from the load-balancing Apache box showed about a 25% performance improvement under heavy load (500 req/s for 10 seconds).
As a quick refresher, Mongrel ran in a synchronized thread until Rails 2.2. This means that, for Rails, a Mongrel instance can only handle 1 request at a time, and when it’s processing that request, it’s blocking all other requests coming to it. This makes for an obvious scaling solution (to a point): more Mongrels!
But, before even worrying about the number of Mongrels right for your machine, you should be sure you’re not using Mongrel for things it’s not made to do. Mongrel isn’t nearly as good at serving up static files as Apache is, so be sure that any file in your public directory that’s requested gets served right back by Apache, precluding any Mongrel intervention. Put this in your vhost config (I’m assuming the rest of your rewrite rules are already in place):
<Directory "/your/apps/public/directory">
    Options FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

# Rewrite index to check for static
RewriteRule ^/$ /index.html [QSA]

# Rewrite to check for Rails cached page
RewriteRule ^([^.]+)$ $1.html [QSA]

RewriteRule ".*(images|stylesheets|javascripts)/?(.*)" "$0" [L]
After an Apache reload, your assets will be served up by Apache, which means it’s time to tweak our Mongrel instances.

Enter httperf

Now, while most of these things should be done during off hours or with caution, this last test seems like it can be done during the day. That would be wrong. If you happen to slam your server well enough, you can bring down everything (I’m speaking from experience when I naively httperf-slammed a production box a few years back). Do this test during low-usage times.
This is essentially a recap of Zed Shaw’s (of Mongrel, and then “Rails is a Ghetto” fame) incredibly helpful mailing list post [13].
First, find a machine that’s close (both on the network and in proximity) to the server you want to test, but that is not the same machine (testing loopback doesn’t help so much with proxies and whatnot).
Next up, you’ll want to test a static asset on that machine. The good news is, if you’ve already gone through this guide, all of your static Rails assets are hosted by Apache. This gives you a best-case baseline against which you can measure your Mongrels.
Start out running a command like this:
httperf --hog --server=<your server> --uri=<your resource, like /happy.jpg> --num-conns 100
You’ll get some output, and what you want to pay attention to first is test-duration. Zed recommends 10 seconds, which works well for providing an ample test case, and against a static asset, lots of requests. On a run-of-the-mill production server we have, I ended up with num-conns at 1200.
After you’ve found your magic 10 second number, try running 1 request (--num-conns 1) against an action (--uri <action>) on your Rails app that’s a decent measure of how your application runs. You don’t want to hit the slowest action, and just hitting static assets isn’t going to help anyone. Be sure to find a page that doesn’t need logins, as that’ll just provide useless redirects.
If your 1 request went really slow, then you probably have your Mongrels set up incorrectly, or something else is screwed up. It should be *at least* as fast as things are in development locally, and probably a lot faster. If your Rails single request is faster than a single request of a static asset, then that asset probably isn’t static – Apache is faster than Mongrel. That’s a truism.
After the single request looks good, try running with --num-conns equal to whatever number you found works for 10 seconds for your static asset, and set --rate to num-conns/10. This, ideally, provides you with a 10 second test, though in practice it’s usually longer. Next, run it again. It’s important that your caches get populated and all the Mongrels are brought up to speed before doing a performance test. Never use numbers from a first-run test.
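As a worked example of the rate arithmetic: with the 1200 connections found earlier, a 10 second test means a rate of 1200 / 10 = 120 connections per second. The sketch below just assembles the command line; the helper name, server name, and URI are placeholders, and only flags already used in this article plus --rate appear in it.

```ruby
# Build an httperf command for a test that should last about `seconds`
# seconds, by setting --rate to num_conns / seconds.
def httperf_command(server, uri, num_conns, seconds = 10)
  rate = num_conns / seconds
  "httperf --hog --server=#{server} --uri=#{uri} " \
    "--num-conns=#{num_conns} --rate=#{rate}"
end

# Placeholder server and action; substitute your own.
puts httperf_command("app.example.com", "/products", 1200)
```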
Now, try adding a Mongrel instance, restart things, and run the test again. If you saw an improvement in the overall test time, you’re making progress.
There’s an ideal number of Mongrels for a processor, and you’ll find it pretty easily: keep increasing Mongrels by 1. As soon as you pass the tipping point, you’ll see a sharp decrease in performance. I’ve seen it decrease by as much as 25%, just from adding one Mongrel. When you reach that number, remove 1 Mongrel, and you have the right number for your current app and your current server setup.
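The search itself can be sketched as a simple loop. Everything here is hypothetical: the measure callable stands in for “reconfigure to N Mongrels, restart, run the httperf test twice, and report requests per second,” and the throughput figures are made up to show the tipping point.

```ruby
# Add Mongrels one at a time until throughput drops, then return the
# last count that was still an improvement (or at least no worse).
# `measure` is a hypothetical callable: Mongrel count in, req/s out.
def ideal_mongrel_count(measure, start = 1, max = 16)
  best_count = start
  best_rate  = measure.call(start)
  (start + 1).upto(max) do |count|
    rate = measure.call(count)
    break if rate < best_rate # past the tipping point: keep the previous count
    best_count = count
    best_rate  = rate
  end
  best_count
end

# Made-up throughput numbers (req/s) standing in for real test runs.
fake_results = { 1 => 40.0, 2 => 75.0, 3 => 100.0, 4 => 110.0, 5 => 82.0, 6 => 80.0 }
puts ideal_mongrel_count(lambda { |n| fake_results[n] }, 1, 6)
```

In practice each measurement is a manual restart and test run, so the loop is more of a checklist than something to automate blindly against a production box.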
If you go through the rest of the links from here, you should see some pretty noticeable (both quantitative and qualitative) performance gains.



Published in Issue #1: The Beginning
