Parallelism is a Myth in Ruby
Update: post has been renamed to "Parallelism is a Myth in Ruby"... Why? Because concurrency is not parallelism - highly recommend this talk by Rob Pike. Concurrency is the composition of independently executing things (which is possible in Ruby), and parallelism is the simultaneous execution of multiple things (which is not possible in MRI, see below). One is about structure, and another is about execution.
Concurrency is a stepping stone towards enabling parallelism in our applications, and threading is, of course, one way to introduce execution parallelism. But, as it turns out, execution parallelism is not enabled by threading in Ruby (MRI). Instead, if you are looking for parallelism in your Ruby applications, then you should be looking at process parallelism instead. Why is that?
Ruby under the covers: Global Interpreter Lock
To understand what's going on, we need to take a closer look at the Ruby runtime. Whenever you launch a Ruby application, an instance of a Ruby interpreter is launched to parse your code, build an AST tree, and then execute the application you've requested - thankfully, all of this is transparent to the user. However, as part of this runtime, the interpreter also instantiates an instance of a Global Interpreter Lock (or more affectionately known as GIL), which is the culprit of our lack of concurrency:
Global Interpreter Lock is a mutual exclusion lock held by a programming language interpreter thread to avoid sharing code that is not thread-safe with other threads. There is always one GIL for one interpreter process.Usage of a Global Interpreter Lock in a language effectively limits concurrency of a single interpreter process with multiple threads -- there is no or very little increase in speed when running the process on a multiprocessor machine.
Deciphering the Global Interpreter Lock
To make this a little less abstract, let's first look at Ruby 1.8. First, a single OS thread is allocated for the Ruby interpreter, a GIL lock is instantiated, and Ruby threads ('Green Threads'), are spooled up by our program. As you may have guessed, there is no way for this Ruby process to take advantage of multiple cores: there is only one kernel thread available, hence only one Ruby thread can execute at a time.
Ruby 1.9 looks much more promising! Now we have many native threads attached to our Ruby interpreter, but now the GIL is the bottleneck. The interpreter guards itself against non thread-safe code (your code, and native extensions) by only allowing a single thread to execute at a time. End effect: Ruby MRI process, or any other language which has a Global Interpreter Lock (Python, for example, has a very similar threading model to Ruby 1.9) will never take advantage of multiple cores! If you have a dual core CPU, you'll have to run two separate processes.
JRuby is, in fact, the only Ruby implementation that will allow you to natively scale your Ruby code across multiple cores. By compiling Ruby to bytecode and executing it on the JVM, Ruby threads are mapped to OS threads without a GIL in between - that's at least one reason to look into JRuby.
Process parallelism
The implications of the GIL are surprising at first, but it turns out the solution to this problem is not all that complex: instead of thinking in threads, think how you could split the workload between different processes. Not only will you bypass an entire class of problems associated with concurrent programming (it's hard!), but you are also much more likely to end up with a horizontally scalable architecture for your application. Here are the steps: 1. Partition the work, or decompose your application 2. Add a communications / work queue (Starling, Beanstalkd, RabbitMQ) 3. Fork, or run multiple instances of you application
Not surprisingly, many of the Ruby applications have already adopted this strategy: a typical Rails deployments is powered by a cluster of app servers (Mongrel, Ebb, Thin), and alternative strategies like EventMachine, and Revactor (equivalents of Twisted in Python) are gaining ground as a simple way to defer and parallelize your network IO without introducing threads into your application.