Creating a Production Ruby on Rails: nginx

I was talking to someone about a new rails project, and we asked about which server would be the best one to work on. Of course, me being a person who loves apt-get and hates RPM, I said one word.

Debian

But then I had to point him to a place to go to make things easier. The problem is that deploying a Rails app is rarely easy, especially one that is actually meant to not die. Not only that, but I don’t always agree with all the tutorials. So I’m going to write a series of blog posts about how I do it. I’d recommend reading this post when it comes to installing ruby, rails and gems on Debian/Ubuntu, but forget any of that Capistrano/Apache stuff that they talk about. I recommend vlad and nginx for setting up and deploying rails apps. We’re going to talk about nginx in this post.

So, the first thing that is done when setting up a production server is to choose the webserver. Now, conventional wisdom says that when you’re running Linux, you will most likely use Apache. Conventional Wisdom is very wrong, since Apache is a giant 800lb Gorilla of a webserver that has more features than you’ll ever possibly need. It’s great if you want to load things like mod_php or mod_python (which you would do with django, but that’s the topic for another post), but it sucks if you want to use it for Ruby, since we’re going to be forwarding everything to the mongrels anyway.

So, what do we use? We’re going to use the Big Red Webserver from Russia, nginx. nginx is a nice http/reverse proxy server, with small, human-readable config files. The first thing that we’re going to do is install it on Debian. Sudo as root and do this:


apt-get install nginx

See, isn’t apt-get the coolest thing ever! Beats the crap out of yum! Anyway, what this just did was install nginx, so in /etc/nginx, you are now going to have to delete your stock nginx.conf file and create a new one. The first thing that you do is specify the user. It’s best to run nginx as a dedicated unprivileged user such as www-data, which Debian already provides.


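user www-data;
worker_processes 1;
error_log /var/log/nginx/error.log debug;
# nginx will not start without an events block.
# 1024 is an assumed value; the original setting is not shown.
events {
worker_connections 1024;
}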

Note, we also set the log files. Now, we have to set some basic settings, such as the mime-type includes, the connections that we will accept, and gzipping your data. Simple, commonsense stuff. This begins the http configuration block:


http {
include /etc/nginx/mime.types;
default_type application/octet-stream;
sendfile on;
#tcp_nopush on;
keepalive_timeout 65;
tcp_nodelay on;
gzip on;
gzip_min_length 1100;
gzip_buffers 4 8k;
gzip_types text/plain;

OK, so far so good. Now, let’s specify the mongrel cluster. Depending on your app, you may want more or fewer mongrels to balance the load. I’d ideally say at least 2 per processor, but sometimes you may want to run fewer of these for some weird reason. So, here’s what I have set up for a dual-processor machine.


upstream mongrel {
server 127.0.0.1:3000;
server 127.0.0.1:3001;
server 127.0.0.1:3002;
server 127.0.0.1:3003;
}
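The two-per-processor rule of thumb above can be sketched in Ruby. This is only an illustration: the helper name upstream_block and its parameters (mongrels_per_cpu, base_port, name) are made up here, with defaults taken from the example config.

```ruby
# Sketch: build an nginx upstream block from a processor count.
# upstream_block and its keyword arguments are hypothetical names;
# the defaults (2 per CPU, ports from 3000) mirror the example above.
def upstream_block(cpus, mongrels_per_cpu: 2, base_port: 3000, name: 'mongrel')
  ports = (0...(cpus * mongrels_per_cpu)).map { |i| base_port + i }
  servers = ports.map { |port| "server 127.0.0.1:#{port};" }
  (["upstream #{name} {"] + servers + ['}']).join("\n")
end

# A dual-processor machine gets four mongrels on ports 3000 to 3003.
puts upstream_block(2)
```

Whatever ports you list here have to match the ports the mongrels actually listen on.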

We’re going to show how to set this up in mongrel later. This is what we have currently. Now, we have to specify the server.


server {
listen 80;
server_name www.dogsridingrails.com;
root /var/www/dogonrails/current/public;
index index.html index.htm;
location / {
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_redirect off;
if (-f $request_filename/index.html) {
rewrite (.*) $1/index.html break;
}
if (-f $request_filename.html) {
rewrite (.*) $1.html break;
}
if (!-f $request_filename) {
proxy_pass http://mongrel;
break;
}
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
root html;
}
}

Not much to see here. We’re using nginx as a proxy to the mongrel servers. We point root at public, just as we would for any rails application going to production, and we specify what happens when a file is requested. If the request maps to a directory with an index.html, or to a cached .html page, nginx serves that file directly. If the requested file doesn’t exist at all, we pass the request to mongrel. Then we use a closing brace to finish the http scope.


}
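To make the decision order in the location / block easier to follow, here is a rough Ruby model of it. It is not how nginx works internally, and route_for is a hypothetical helper; it only mirrors the three file checks in the order the config lists them.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: mirrors the three `if` checks in the location block.
# Returns [:static, path] when nginx would serve a file itself,
# or [:proxy, path] when the request would go to the mongrels.
def route_for(public_root, request_path)
  target = File.join(public_root, request_path)
  if File.file?(File.join(target, 'index.html'))
    [:static, File.join(request_path, 'index.html')]
  elsif File.file?("#{target}.html")
    [:static, "#{request_path}.html"]
  elsif File.file?(target)
    [:static, request_path]
  else
    [:proxy, request_path]
  end
end

# Try it against a throwaway public directory.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'posts'))
  File.write(File.join(root, 'index.html'), 'home')
  File.write(File.join(root, 'posts', '1.html'), 'cached page')

  p route_for(root, '/')         # static index page
  p route_for(root, '/posts/1')  # cached .html page
  p route_for(root, '/posts/2')  # nothing on disk, goes to mongrel
end
```

This is why Rails page caching works with this setup: once a cached .html file lands in public, nginx serves it without touching a mongrel.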

Now, that was MUCH simpler than the beasts of Apache config files that you’d have to wade through to do the same thing. It’s interesting to note that nginx is a lightweight proxying server, and is actually designed to do this, as opposed to Apache, which is more general purpose and is meant to load web apps in-process using shared libraries, which is usually faster than proxying to something like mongrel.

I’m not saying that nginx is the right tool for every job. In fact, I would think seriously about using Apache for a Python/Django project, but that’s the topic of another post entirely. Stay tuned for my next post about Vlad the Deployer!

S3

In the life of a web application, there comes a point where that shared hosting account just isn’t good enough (and you found out because your provider kicked you off), or your server just isn’t able to pull the queries from the database fast enough. Then one day, you finally get the filesystem error EMLINK, which you have a VERY hard time googling.

This is simple: you just hit the maximum number of subdirectories that you can have in a directory. This is surprisingly not a common issue with file_column, acts_as_attachment or attachment_fu, although I’m shocked as to why it’s not. So, what do you do when you’re faced with scalability issues, and your image handling plugin is broken?

THROW IT ALL AWAY!

That’s what I had to do. Recently we worked on a site and decided that, because it was getting hammered, we would put the images on S3. Then we found the ultimate weakness of S3, which is that it’s not able to easily handle batch processing. We used the AWS::S3 library for most of the movement of the files, but we found that if we made a mistake, it would cost us hours to get them back.

Eventually, the day was saved with JetS3t and Cockpit. Using JetS3t, we were finally able to get through all the S3 issues. (Actually, Dave saved the day at the end, my computer kept running out of memory). But we managed to get S3 support into the app, and all we had to do was sacrifice File Column and replace it with this:


def user_image=(blob)
  # Establish the S3 connection
  AWS::S3::Base.establish_connection!(:access_key_id => AWS_ACCESS_KEY_ID, :secret_access_key => AWS_SECRET_ACCESS_KEY)
  datestamp = Time.now.strftime('%d%m%Y')
  identifier = UUID.random_create.to_s
  object_path = 'images/' + datestamp + '/' + identifier + '/'
  object_key = object_path + blob.original_filename
  self.image = blob.original_filename
  self.image_dir = 'http://s3.amazonaws.com/bucket/' + object_path
  image_data = blob.read

  # Send the original file to S3
  AWS::S3::S3Object.store(object_key, image_data, 'bucket', :access => :public_read)

  # Resize to a thumbnail here (at most 96x96, keeping the aspect ratio)
  img = Magick::Image.from_blob(image_data).first
  thumbnail = img.resize_to_fit!(96, 96)

  # Set the thumbnail key under the same path
  thumb_key = object_path + 'thumb/' + self.image

  # Send the thumbnail to S3
  AWS::S3::S3Object.store(thumb_key, thumbnail.to_blob, 'bucket', :access => :public_read)
end

If you have to do S3, I would highly recommend using a long key so that you can sort your results better based on this key! The biggest gotcha I found when adding S3 integration to my rails app was including AWS::S3. If you both require it and include it, it will break your routing, which can cause hours of headaches, especially if you are changing something else at the same time. In the end, we learned that the “Simple” in S3 is a misnomer. For a large number of files, it’s far from simple.
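On the point about sortable keys: S3 lists keys in lexicographic order, so the order of the date fields in the key matters. The method above uses '%d%m%Y', which sorts by day of month first. Here is a minimal sketch of a year-first variant. The helper sortable_image_key is hypothetical, and it swaps in Ruby’s standard SecureRandom.uuid for UUID.random_create so that it runs without extra gems.

```ruby
require 'securerandom'

# Hypothetical helper, not part of AWS::S3: builds a key whose date prefix
# sorts chronologically because the year comes first (YYYYMMDD).
# SecureRandom.uuid stands in for the UUID library used in the post.
def sortable_image_key(filename, time = Time.now, id = SecureRandom.uuid)
  datestamp = time.strftime('%Y%m%d')
  "images/#{datestamp}/#{id}/#{filename}"
end

# Fixed times and ids are used here only to make the ordering visible.
keys = [
  sortable_image_key('a.png', Time.utc(2008, 12, 1), 'id-1'),
  sortable_image_key('b.png', Time.utc(2009, 1, 15), 'id-2'),
  sortable_image_key('c.png', Time.utc(2008, 2, 28), 'id-3')
]

# A plain string sort now matches upload order by date.
puts keys.sort
```

With a date-first prefix like this, listing a bucket by prefix (for example everything under images/2008) becomes a cheap way to batch through a subset of files.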