MySQL Conf - Memcached Internals
Brian Aker and Alan Kasindorf gave a great talk on Memcached - every developer's favorite caching layer - at the MySQL User Conference last week. The slides are available online, and they're well worth a bookmark. Brian and Alan promised to keep updating the deck as they hone their presentation for OSCON and other events. In fact, their next iteration will include the Nginx integration trick, and a new and improved section on consistent hashing. In other words, check back frequently.
Memcached Best Practices
At its core, Memcached is a high-performance, distributed caching system. It is application-neutral, and is currently used on many large-scale web sites such as Facebook (2TB of cache, circa Q1 2008), LiveJournal, Mixi, Hi5, etc. However, it is also an extremely simple piece of software: all of the logic is client-side, and there is no security model, failover, backup mechanism, or persistence (although the last is on the roadmap). But that hasn't stopped developers from deploying it in all kinds of environments, and here are a few best practices suggested by Brian:
- Don't think row-level (database) caching, think complex objects
- Don't run memcached on your database server, give your database all the memory it can get
- Don't obsess about TCP latency - localhost TCP/IP is optimized down to an in-memory copy
- Think multi-get - run things in parallel whenever you can
- Not all memcached client libraries are created equal, so do some research on yours - hint: use Brian's.
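The "think multi-get" advice above can be sketched as follows. A plain dict stands in for the memcached client, and `get_many`, `fetch_profiles`, and the `profile:` key scheme are illustrative names of my choosing, not part of any memcached API:

```python
# Minimal sketch of the multi-get pattern: fetch a whole batch of keys in
# one logical round trip, then fall back to the database only for misses.

def get_many(cache, keys):
    """Fetch all keys at once instead of issuing one get per key."""
    return {k: cache[k] for k in keys if k in cache}

def fetch_profiles(cache, user_ids, load_from_db):
    keys = ["profile:%d" % uid for uid in user_ids]
    found = get_many(cache, keys)           # one multi-get, not N gets
    missing = [k for k in keys if k not in found]
    for k in missing:                       # only the misses hit the database
        found[k] = cache[k] = load_from_db(k)
    return [found["profile:%d" % uid] for uid in user_ids]

cache = {"profile:1": {"name": "alice"}}
profiles = fetch_profiles(cache, [1, 2], lambda k: {"name": "from-db"})
```

Real client libraries expose this as a batched call (often named something like `get_multi`); the point is that the misses are handled as a group rather than serializing a round trip per key.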
Slab Allocator - Heart of Memcached
The heart of Memcached is its memory slab allocator. A little daunting at first sight, it is actually a very elegant solution once you understand the motivation and the tradeoffs of its architecture:
- The maximum amount of memory allocated to Memcached is limited only by the architecture (32/64-bit)
- Key size is limited to 250 bytes; data (value) size is limited to 1MB
- On startup, Memcached grabs its memory and holds on to it - memory is traded for performance
- Allocated memory is split into variable-sized buckets by the slab allocator
- The default slab allocator will create 32-39 buckets for objects up to 1MB in size
- You can customize the page size at compile time, and slab sizes at startup
- Each object is stored in the closest-size bucket - yes, memory is wasted
- Fragmentation can be a problem - either customize slab sizes, or evict/flush your cache every so often
- If there is unused memory in a different slab class, that memory will be re-allocated if required
- To guarantee no paging, disable swap on your OS - in practice, just keep swap tiny, so as to avoid disasters
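To make the bucket scheme concrete, here is a sketch of how the slab allocator could derive its chunk sizes: start small and multiply by a growth factor until the 1MB page size is reached. The 80-byte start and 1.25 factor are memcached's historical defaults, but treat the exact numbers as an assumption - they are tunable and vary between versions:

```python
# Sketch of slab-class (bucket) size derivation: a geometric progression
# of chunk sizes, capped by the 1MB page size.

def slab_class_sizes(start=80, factor=1.25, page_size=1024 * 1024):
    sizes, size = [], start
    while size < page_size:
        sizes.append(size)
        size = int(size * factor)
        size += size % 2  # round odd chunk sizes up to even
    sizes.append(page_size)  # the final class holds the largest (1MB) objects
    return sizes

def class_for(item_size, sizes):
    """An item lands in the closest bucket that fits -- the gap is wasted."""
    for s in sizes:
        if item_size <= s:
            return s
    raise ValueError("item larger than the 1MB limit")

sizes = slab_class_sizes()
# a 101-byte item doesn't fit the 100-byte class, so it takes the next
# one up and the difference is simply wasted - the tradeoff noted above
```

Tightening the growth factor gives finer-grained classes (less waste per object) at the cost of more classes, each with its own free lists to manage.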
Memory Management
The memory is managed via a Least Recently Used (LRU) algorithm:
- Each slab class has its own LRU - the eviction target depends on the size of the object
- Expiration timestamps are checked once every second - the minimum lifespan is 1 second
- Objects marked for deletion are handled asynchronously - checked and evicted once every 5 seconds
- Inconsistency between the two timers listed above can result in a suboptimal eviction policy
- The LRU can be disabled entirely - do so at your own risk
Best Practices: Invalidations and Expiry
Memcached does not provide any mechanism for deleting a set of associated keys (object, name, etc.). For better or worse, you could implement this functionality yourself with the help of the prepend and append commands; however, be careful with the 1MB limit! A much cleaner way to handle this situation is to forgo the invalidation process altogether:
- Instead of invalidating your data, expire it whenever you can - memcached will do all the work
- Generate smart keys - ex. on update, increment a version number, which will become part of the key
- For bonus points, store the version number in memcached - call it generation
- The latter will be added to Memcached soon - as soon as Brian gets around to it
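The smart-key/generation trick above can be sketched in a few lines. A dict stands in for the memcached client, and the `gen:user:` and `user:` key names are hypothetical - pick whatever scheme fits your application:

```python
# Sketch of generation-based invalidation: store a version number under a
# well-known key and fold it into every data key. Bumping the version makes
# the old keys unreachable; stale entries simply age out via the LRU.

cache = {}

def user_key(user_id):
    gen = cache.setdefault("gen:user:%d" % user_id, 1)
    return "user:%d:v%d" % (user_id, gen)

def invalidate_user(user_id):
    cache["gen:user:%d" % user_id] += 1  # old keys are now orphaned

cache[user_key(7)] = {"name": "alice"}   # stored as "user:7:v1"
invalidate_user(7)
# user_key(7) now yields "user:7:v2" - a clean miss, so fresh data is loaded
```

Against a real server you would bump the generation with memcached's atomic `incr` command rather than a read-modify-write, so concurrent invalidations don't race.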
Roadmap and Future Ahead
Memcached is a force to be reckoned with already, and the roadmap is only going to solidify this position. Amongst the big objectives:
- Binary protocol is in the works
- Support for generations - as described in the section above
- Multi-engine support: char based, durable (persistence!), queue
- Facebook has overhauled the core to be highly-threaded, and is expected to contribute the changes
Kudos to Brian and Alan for a great presentation! Don't forget to bookmark the slides and check back every so often, as they will be updated with time to describe best practices, common mistakes, and the roadmap ahead.