Scaling ActiveRecord with MySQLPlus
Late last week the Rails core team announced the release of Rails 2.2RC1, which among other things slips in 'thread safety' and a connection pool implementation for our beloved ActiveRecord. Pratik Naik and Charles Nutter have both covered the implications of these contributions, but the executive summary is as follows: a single Ruby process (Mongrel), plus many MySQL connections (the ActiveRecord connection pool), means concurrent execution within that Mongrel process. This also means memory conservation, better throughput, and a much simplified deployment model. Sounds like something worth thinking about!
ActiveRecord Under The Hood
The ConnectionPool class introduced in Rails 2.2RC1 is effectively a simple wrapper for guaranteeing thread-safe access to a predefined set of connections (in our case, MySQL sessions). Let's take it for a spin outside of Rails:
require "rubygems"
require "active_record"

ActiveRecord::Base.establish_connection(
  :adapter => "mysql",
  :username => "root",
  :database => "database",
  :pool => 5
)

threads = []
10.times do |n|
  threads << Thread.new {
    ActiveRecord::Base.connection_pool.with_connection do |conn|
      res = conn.execute("select sleep(1)")
    end
  }
end

# block and wait for all threads to finish
threads.each { |t| t.join }
# [root@localhost tm]# time ruby activerecord-pool-orig.rb
#
# real 0m10.644s
# user 0m0.405s
# sys 0m0.211s
Hold on, what happened? We've defined a pool with five MySQL connections, spooled up ten Ruby threads, and simulated some long running queries (one second each). How come it took 10 seconds to finish? It looks like it's executing our commands in serial!
To those of you familiar with the underlying gems this is hardly surprising. The problem is that the core MySQL gem uses blocking IO and every call to the MySQL server results in a completely locked up Ruby process - hence, no parallelism is possible. So much for our ConnectionPool?
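The pool logic itself is not at fault. As a minimal sketch (the `TinyPool` class and `run_workers` helper below are hypothetical stand-ins, not ActiveRecord's implementation), here is the same experiment with a toy pool and `sleep` in place of a query. Because `sleep` lets other Ruby threads run, ten workers sharing five "connections" should finish in about two waves rather than ten:

```ruby
require "thread"

# Hypothetical stand-in for a connection pool: a fixed set of "connections"
# kept in a thread-safe queue. This is not how ActiveRecord implements it.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { |i| @available << "conn-#{i}" }
  end

  # Check a connection out, yield it, and always hand it back.
  def with_connection
    conn = @available.pop # blocks only the calling thread until one is free
    begin
      yield conn
    ensure
      @available << conn
    end
  end
end

# Start `workers` threads that each hold a connection for `seconds`,
# then return the total wall-clock time.
def run_workers(pool, workers, seconds)
  started = Time.now
  threads = Array.new(workers) do
    Thread.new do
      # sleep stands in for a query that does not freeze the whole process
      pool.with_connection { |_conn| sleep(seconds) }
    end
  end
  threads.each(&:join)
  Time.now - started
end

elapsed = run_workers(TinyPool.new(5), 10, 0.2)
puts format("10 workers, pool of 5: %.2fs", elapsed)
```

If the work inside `with_connection` froze the entire process instead, the same run would take ten times the per-query delay, which is the behaviour the timing above shows for the stock MySQL gem.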
Asynchronous MySQL with MySQLPlus
Thankfully, we have an alternative: MySQLPlus. Coming out of the Neverblock project, it attempts to solve exactly this problem by providing an 'asynchronous interface and threaded access support'. First, let's get it installed on our system:
$ git clone git://github.com/oldmoe/mysqlplus.git
$ cd mysqlplus
$ gem build mysqlplus.gemspec
$ gem install mysqlplus-0.1.0.gem
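Before patching anything, it may help to see the general shape of an asynchronous query call: send the query, wait for the socket to become readable, then fetch the result. The sketch below is an illustration only. `FakeConnection` and its method names are hypothetical, a pipe stands in for the database socket, and no MySQLPlus calls are made:

```ruby
# Hypothetical stand-in for a database connection. The "server" replies on a
# pipe after `delay` seconds; nothing here talks to MySQL or mysqlplus.
class FakeConnection
  # Hand the "query" to the server and return immediately.
  def send_query(delay)
    @reader, writer = IO.pipe
    Thread.new do
      sleep(delay)          # the server is busy working on the query
      writer.write("1 row") # the reply arrives on the socket
      writer.close
    end
  end

  # Read the reply that is already waiting on the socket.
  def get_result
    result = @reader.read
    @reader.close
    result
  end

  def async_query(delay)
    send_query(delay)    # 1. fire off the query without waiting for the reply
    IO.select([@reader]) # 2. wait for readiness; other Ruby threads keep running
    get_result           # 3. collect the result once the socket is readable
  end
end

started = Time.now
results = Array.new(5) do
  Thread.new { FakeConnection.new.async_query(0.2) }
end.map(&:value)
elapsed = Time.now - started

puts results.inspect
puts format("5 queries in %.2fs", elapsed)
```

The point of the pattern is step 2: waiting happens in a call the Ruby scheduler knows about, so one slow query does not stall every other thread in the process.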
Now, let's see if we can patch ActiveRecord to behave as we want:
require "rubygems"
require "active_record"

# Connection pool logic works, but the underlying driver
# is still blocking. However, aliasing the query method
# to use mysqlplus produces expected results!
require 'mysqlplus'
class Mysql; alias :query :async_query; end

ActiveRecord::Base.establish_connection(
  :adapter => "mysql",
  :username => "root",
  :database => "database",
  :pool => 5
)

threads = []
10.times do |n|
  threads << Thread.new {
    ActiveRecord::Base.connection_pool.with_connection do |conn|
      res = conn.execute("select sleep(1)")
    end
  }
end

# block and wait for all threads to finish
threads.each { |t| t.join }
#[root@localhost tm]# time ruby activerecord-pool.rb
#
# real 0m2.663s
# user 0m0.413s
# sys 0m0.223s
Much better! Instead of processing in serial order we were able to make use of our connection pool and complete the entire test in a little over two seconds: ten one-second queries spread over five connections run in two waves. To achieve this, we've had to redefine the primary 'query' method in the MySQL gem to use the asynchronous interface provided by MySQLPlus, which is arguably not the cleanest way to handle the case. Having said that, MySQLPlus is a stable gem, and one you should watch as there are a number of interesting performance patches in the works.
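One way to make the alias trick a little less opaque is to keep the original method reachable under another name before overwriting it. The sketch below shows the idea on a hypothetical `FakeDriver` class rather than the real `Mysql` class, and the `blocking_query` name is our own invention:

```ruby
# Hypothetical driver standing in for the Mysql class; not the real gem.
class FakeDriver
  def query(sql)
    "blocking: #{sql}"
  end

  def async_query(sql)
    "async: #{sql}"
  end
end

# Reopen the class, as the one-line patch above does, but save the original first.
class FakeDriver
  alias_method :blocking_query, :query # keep the old behaviour reachable
  alias_method :query, :async_query    # callers of #query now take the async path
end

driver = FakeDriver.new
puts driver.query("select 1")
puts driver.blocking_query("select 1")
```

Existing callers keep calling `query` and need no changes, while the saved alias leaves a way back to the blocking version if the asynchronous path misbehaves.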
Does this mean it works with Rails?
Almost. Sneaking MySQLPlus under the covers of ActiveRecord gets us pretty far, but Rails is not yet ready for production use with this pattern. First, Mongrel needs to be patched, and then the real bug hunt begins:
# simple controller to simulate a long running request
class SamplesController < ApplicationController
  def slow
    render :text => Sample.find_by_sql("select sleep(5)").to_s
  end

  def slow_pool
    ActiveRecord::Base.connection_pool.with_connection do |conn|
      res = conn.execute("select sleep(5)")
    end
    render :text => 'done'
  end
end
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# append to config/environments/production.rb
config.threadsafe!
require 'mysqlplus'
class Mysql; alias :query :async_query; end
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# apache bench test, which issues two concurrent requests
#[root@localhost threadsafe]# ab -c 2 -n 2 localhost:3000/samples/slow
#
# Server Software: Mongrel
# Server Hostname: localhost
# Server Port: 3000
#
# Document Path: /samples/slow
# Concurrency Level: 2
# Time taken for tests: 5.23008 seconds
# Complete requests: 2
As you can see, with a few simple changes our single Mongrel process is now capable of serving several parallel requests: two five-second requests completed in roughly five seconds. More rigorous testing still shows hung threads and broken processing patterns (it is RC1 for a reason), but the bottom line is that we're making great progress!