Pseudo Reverse Indexes in MySQL
When the database grows at a rate of several hundred thousand records a day (AideRSS) you inevitably run into interesting 'features' of the underlying engine. To my own surprise, reverse indexes became such a 'feature' much faster than I anticipated. As I recently discovered, MySQL currently only supports storage of index values in ascending order:
An index_col_name specification can end with ASC or DESC. These keywords are allowed for future extensions for specifying ascending or descending index value storage. Currently, they are parsed but ignored; index values are always stored in ascending order.
Of course, now you're thinking: who cares, just add an order by clause, surely the SQL server will be smart enough to read the index backwards. And in fact, you're mostly right. There is a hit on performance, but it's not a show-stopper for most datasets. The caveat: most (read moderately sized) datasets.
Degrading performance & filesort woes
It took all of three weeks for AideRSS to hit the 3 million plus indexed blog posts, and in the process I could feel the site getting more sluggish: the descending order by clause was killing us. In the worst case, merging a union of several queries meant the performance hit of a filesort operation! Queries that took 0.001 seconds with 100,000 posts in the system now took over 4 seconds each.
The 2035 version of the Y2K bug
Following Peter Zaitsev's advice on faking a reverse index, I decided to sidestep our problem by creating a separate reverse timestamp for the publication time of an indexed blog post. The trick is, since all indexes are stored in ascending order, instead of storing the publication date, you need to store a 'countdown' value from some date in the future. A few SQL queries will do the trick:
/* Add a new column for the reverse index */
ALTER TABLE posts ADD pubdate_reverse INT(9) AFTER pubdate;
/* Convert all pubdate values into a countdown to year 2035 (arbitrary decision) */
UPDATE posts set pubdate_reverse = timestampdiff(SECOND, pubdate,'2035-01-01');
/* Create a new index on reverse timestamp */
ALTER TABLE posts ADD INDEX index_pubdate_reverse (feed_id, pubdate_reverse)
/* Lets see what we get...
---------------------------------- */
/* Will return posts in ascending order by default (force old index) */
SELECT * FROM posts use (index_pubdate) where feed_id = 5;
/* Will return posts in descending order by default (force new index) */
SELECT * FROM posts use (index_pubdate_reverse) where feed_id = 5;
I can hear you cry: Y2K all over again! I made an arbitrary decision and picked 2035 as my countdown date (caveat: almost arbitrary, Time.parse in Ruby will report out of bounds for anything over year 2038, and I didn't want to resort to DateTime). Having said that, hopefully by the year 2035 MySQL folks will add support for reverse indexes. I'm keeping my fingers crossed.
In all, for a little bit of work, and some extra future headaches in terms of index consistency, we get a major improvement in speed. I'm happy to say that AideRSS is running much smoother following this update last night.