Distributed Ruby Workers on EC2
Whether you're writing a Rails application or a plain Ruby library, every once in a while you will have to run a long computational task. In such cases, the relatively short HTTP request cycle and usability constraints create a number of problems for the developer. However, as usual, Ruby offers us a solution: dRuby (Distributed Ruby, or DRb for short). Simply put, DRb allows you to interact with remote objects over TCP as if they were located right on your system. Hence, to avoid locking up our server, we will simply get another computer to perform the time-consuming task for us. Of course, now you're saying: where am I going to get another computer? Well, how about Amazon's EC2:
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
But enough talking, let's get down to the code.
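Before bringing in any plugin, here is a minimal sketch of plain DRb, just to show what "remote objects as if they were local" looks like. The Echo class and its echo method are made up for illustration, and the server runs in a forked child process on the same machine (so this assumes a Unix-like system); on EC2 the server half would live on the remote box.

```ruby
require 'drb'

# Hypothetical service object, used only for this illustration.
class Echo
  def echo(string)
    "Echo back: #{string}"
  end
end

# Run the DRb server in a child process so the call really crosses a socket.
reader, writer = IO.pipe
pid = fork do
  reader.close
  # Port 0 asks the OS for any free port; DRb.uri reports the one it got.
  DRb.start_service('druby://127.0.0.1:0', Echo.new)
  writer.puts DRb.uri
  writer.close
  DRb.thread.join
end
writer.close

# The parent plays the client: it only needs the server's URI.
uri = reader.gets.chomp
remote = DRbObject.new_with_uri(uri)
reply = remote.echo('Hey DRb!') # looks like a local call, runs in the child
puts reply

# Shut the server process down again.
Process.kill('TERM', pid)
Process.wait(pid)
```

The client never loads or instantiates Echo itself; it only holds a proxy that forwards the method call to the server process.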
BackgrounDRb and getting started on EC2
Instead of building our DRb solution from scratch, we will make use of Ezra's BackgrounDRb plugin. For a great explanation of it, by the author himself, head to: Introduction to BackgrounDRb. To speed things up, I will assume that you have your EC2 server running and that you are logged in to its console over SSH. If you need help with these steps, check Amazon's Getting Started guide.
Setting up DRb on EC2
Depending on the image you booted, your server's hostname may or may not be set to your public handle, and we need to fix this for a reason you will see shortly. A quick bash script should do the task:
# Retrieve and store our ip / hostname
wget -q -O /tmp/public-ip http://169.254.169.254/latest/meta-data/public-ipv4
wget -q -O /tmp/public-hostname http://169.254.169.254/latest/meta-data/public-hostname
hostname -F /tmp/public-hostname
echo $(hostname) > /etc/hostname
You can also manually execute these commands in your console. You don't really need the public IP, but I kept it for debugging purposes. After performing these steps, executing 'hostname' on your command line should return your full ec2.xxxx.com handle. Now we're ready to install BackgrounDRb:
svn checkout http://svn.devjavu.com/backgroundrb/trunk backgroundrb
mkdir backgroundrb/workers
Next, we will create a dummy echo worker which will simply append a string to our request, and store it as a final result. With a little imagination, you can extend this example to convert video files, create PDFs, and the list goes on!
class EchoWorker < BackgrounDRb::Worker::Base
  def do_work(string)
    logger.info("\tTest query: #{string}")
    results[:status] = "Echo back: #{string}"
    self.delete
  end
end
EchoWorker.register
Save the worker in your 'workers' directory and we are almost done on the server side. To simplify the command line, we will also create a configuration file for our DRb server:
---
:port: 2000
:protocol: druby
:worker_dir: workers
:pool_size: 10
We will run our server on port 2000, via the druby (TCP) protocol, and allow at most 10 workers. Now we're ready to launch the server. However, there is a small caveat you need to be aware of: internally, all EC2 servers are addressed via local NAT'ed IP addresses. The servers are unaware of their public IPs, and hence won't bind to them, but specifying the public hostname will do the trick (this is why we fixed the hostname earlier):
script/backgroundrb start -- -c server.conf -h $(hostname)
Voila, you have turned an EC2 box into a DRb server. Having said that, do check your log directory for any error messages!
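Besides reading the logs, it can help to confirm from the client machine that the port is actually reachable. The following is a minimal sketch, not part of BackgrounDRb: drb_uri and port_open? are hypothetical helpers, and the configuration is inlined as a string (with the same values as the file above) so the example is self-contained.

```ruby
require 'socket'
require 'yaml'

# Same settings as the configuration file above, inlined for the sketch.
CONFIG_TEXT = <<~CONF
  ---
  :port: 2000
  :protocol: druby
  :worker_dir: workers
  :pool_size: 10
CONF

# Hypothetical helper: the URI a client would use for a given host.
def drb_uri(config, host)
  "#{config[:protocol]}://#{host}:#{config[:port]}"
end

# Hypothetical helper: true if a TCP connection to host:port can be
# opened within `timeout` seconds, false otherwise.
def port_open?(host, port, timeout = 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

config = YAML.load(CONFIG_TEXT)
# On the EC2 box, Socket.gethostname returns the hostname set earlier.
puts drb_uri(config, Socket.gethostname)
```

From the client you would call port_open? with the server's public hostname and config[:port]; a false result suggests looking at the server logs or at the network path between the two machines before debugging the Ruby code.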
Connecting with a client
Connecting to our server is much simpler; in fact, we don't even need any additional libraries. DRb is bundled with Ruby by default, hence our client is a single file:
require 'rubygems'
require 'drb'
remote_worker = DRbObject.new(nil, 'druby://ec2-xx-xx-xx-xx.z-1.compute-1.amazonaws.com:2000')
key = remote_worker.new_worker(:class => :echo_worker, :args => 'Hey EC2!')
puts "Job key: #{key}\n"
# Circumventing a bug: http://backgroundrb.devjavu.com/ticket/50
result = remote_worker.worker(:backgroundrb_results).get_result(key, :status)
puts "\t Reply :: #{result}\n"
puts "Done"
Our standalone client will create a remote worker, store its temporary key, and then ask the server for the result - simple as that. Rails developers have it even easier: install the same plugin in your application and follow the configuration instructions in BackgrounDRb's RDoc. Once it is configured, you will have access to a 'MiddleMan' object in your application to both create remote workers and retrieve their results! Executing our standalone client produces:
Job key: 21d52258feec520d3f911b323e1e8b20
	 Reply :: Echo back: Hey EC2!
Done
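The client asks for the result right after creating the worker, which is fine for an instant echo but may be too early for a long-running job. Here is a minimal sketch of a polling helper for that case. poll_for_result is a hypothetical name, not part of BackgrounDRb, and the sketch assumes the result lookup returns nil until the worker has stored a value; a local counter stands in for the remote call so the example runs on its own.

```ruby
# Hypothetical helper: calls the block every `interval` seconds until it
# returns a non-nil value, or gives up and returns nil after `timeout` seconds.
def poll_for_result(timeout: 30, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result unless result.nil?
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Local stand-in for a slow job: the "result" only appears on the third try.
attempts = 0
result = poll_for_result(timeout: 5, interval: 0.01) do
  attempts += 1
  attempts < 3 ? nil : 'Echo back: Hey EC2!'
end
puts result
```

With the standalone client, the block body would be the same get_result call used there, for example: poll_for_result { remote_worker.worker(:backgroundrb_results).get_result(key, :status) }.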
Voila, now you can take advantage of the on-demand, scalable EC2 infrastructure in your Ruby/Rails application!