HandlerSocket: The NoSQL MySQL & Ruby
The end of an architectural era, time for a complete rewrite? Is it really the case that by attempting to be a "one size fits all", the RDBMS "systems of the past" excel at nothing? The Cambrian explosion of alternative database engines certainly lends some credibility to that view. However, amidst all the hype, it is also easy to overlook the fundamentals: normalized or not, or with or without a "structured query language" (SQL), a B-Tree is still one of the best performing data structures when it comes to indexing data.
With that in mind, Yoshinori Matsunobu and a few of his collaborators at DeNA (one of the largest social game platform providers in Japan), decided to build the literal "NoSQL" directly into MySQL! The HandlerSocket plugin, which they have released to the public can be installed into any existing MySQL server to provide an optimized protocol for reading and writing data directly from the underlying storage engine (such as InnoDB) without any SQL overhead. And the results are stunning: faster than memcached, more flexible, and no cache coherency problems.
HandlerSocket: Under the Hood
The core insight behind HandlerSocket is that for in-memory workloads where data is accessed via an index, the overhead imposed by SQL parsing, locks and concurrency controls has nothing to do with the reading or writing of data from the underlying storage engine. In other words, if all you need is direct access to the index, then you can bypass the SQL layer altogether - that is exactly what HandlerSocket provides.
The plugin is a daemon which can be loaded into any MySQL server, and which opens additional ports on the server to accept direct reads and writes to the underlying storage engine. The protocol is incredibly simple, and most importantly, HandlerSocket runs inside of your existing MySQL server, accessing the same underlying data. This means that you have the full expressive power of SQL, the persistence and error recovery of the underlying engine, and an optimized read/write protocol for where you need it.
Benchmarks are a subjective sport, but Yoshi's tests show that direct access to InnoDB via the HandlerSocket protocol yields a significant improvement even when compared to a raw memcached connection! Perhaps it is memcached we should be optimizing, not running away from our RDBMS engines?
Hands on with HandlerSocket
To get started, follow the installation instructions provided in the HandlerSocket repo, or simply pickup one of the latest builds of the Percona Server - it now ships with HandlerSocket built in! Once you're up and running, you can tweak your settings for the plugin in your my.cnf
:
[mysqld]
plugin-load=handlersocket.so # or mysql> install plugin handlersocket soname 'handlersocket.so';
loose_handlersocket_port = 9998
loose_handlersocket_port_wr = 9999
loose_handlersocket_threads = 16
loose_handlersocket_threads_wr = 1
open_files_limit = 65535
If all is well, launch your mysql CLI, and do a show processlist
to see the open HandlerSocket connections. Now, if you are curious, you can telnet directly to the read or write port and issue some queries, or pick one of the existing clients (PHP, Java, Python, Node.js, Ruby) and do it from the comfort of your favorite language.
Ruby & HandlerSocket
HandlerSocket ships with a C++ library (libhsclient) for which there are several Ruby wrappers: ruby-handlersocket and handlersocket. If native extensions are not a problem, then gem install handlersocket
:
require 'handlersocket'
h = HandlerSocket.new(:host => '127.0.0.1', :port => '9998')
# open PRIMARY index on widgets.user table, assign it ID #1
h.open_index(1, 'widgets', 'user', 'PRIMARY', 'user_name,user_email,created')
# fetch record from index ID 1, where PRIMARY key is equal to 1
p h.execute_single(1, '=', [1])
# > [0, [["Ilya", "ilya@igvita.com", "2010-01-01 00:00:00"]]]
# open 'id_created' index on widgets.user table, assign it ID #2
p h.open_index(2, 'widgets', 'user', 'id_created', 'user_name,user_email,created')
# fetch record from index ID 2, where id >= 2, and date >= 2010-01-03
p h.execute_single(2, '>=', [2, '2010-01-03'], 10)
# > [0, [["Bob", "bob@example.com", "2010-01-03 00:00:00"]]]
Alternatively, if you are looking for a non-blocking version, you can also use em-handlersocket which does not require any native extensions. Non-blocking reactor, combined with the fact that HandlerSocket allows pipelined execution, makes a for a very fast API! A simple example of connecting to a MySQL server and doing a range scan on a composite InnoDB index:
EM.run do
c = EM::HandlerSocket.new
idx = {:id => 0, :db => 'widgets', :table => 'user', :index_name => 'id_created', :columns => 'user_name'}
d = c.open_index(idx)
d.callback do |s|
# Query index 0 for records where id >= 2, and created_at >= 2010-01-03
d = c.query(:id => 0, :op => '>=', :key => ['2', '2010-01-03'])
d.callback do |data|
p [:received, data]
end
end
end
Stop the revolution! Hold the rewrite!
If HandlerSocket can fetch data faster than memcached, then this is a game changer. Stop the revolution! Hold the rewrite! After all, caching data in another datastore leads to data duplication, cache coherency problems and additional operational complexity. Perhaps it is too early to simply throw away the billions of dollars we have invested into developing and optimizing the underlying RDBMS engines!