I redesigned the schema for one of our MongoDB clusters today. In doing so, I managed to reduce request times by more than an order of magnitude!
We’re now sustaining request times of roughly 30ms. The first MongoDB schema averaged about 400ms, with occasional bursts to over a second. 30ms is also much better than the original MySQL-based solution.
In the original MongoDB schema, I was just stuffing all data for the last 30 days into two big collections. This caused problems because the resulting indexes were huge. As it turned out, the active data set only needed the most recent hour’s worth of data.
So I moved the active set into its own collection with a TTL index set to expire documents older than an hour. This means the indexes are now absolutely tiny.
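For anyone who hasn't used TTL expiry before, here is a minimal sketch of the idea, assuming pymongo. The `created_at` field name and the helper names are made up for illustration and are not the actual schema; `collection` is expected to be a pymongo `Collection`.

```python
from datetime import datetime, timezone

ONE_HOUR_SECONDS = 60 * 60


def ensure_active_ttl_index(collection, field="created_at"):
    """Create a TTL index so MongoDB deletes documents about an hour after `field`.

    `collection` is assumed to be a pymongo Collection; `field` is a
    hypothetical name and must hold a date value for expiry to apply.
    """
    return collection.create_index(field, expireAfterSeconds=ONE_HOUR_SECONDS)


def new_active_document(payload, now=None):
    """Return a copy of `payload` stamped with the time the TTL index keys on."""
    now = now or datetime.now(timezone.utc)
    return {**payload, "created_at": now}
```

Expiry is not exact: MongoDB removes expired documents with a background task that runs periodically, so a document can outlive its hour by a short while.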
We still need data over an hour old, but exclusively for read-only use. I moved this into datestamped collections: each day’s data is now in its own collection. As we retrieve data in day-sized chunks, this makes both retrieving a day and deleting old data extremely simple and fast: deleting a day is just dropping its collection.
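A rough sketch of the per-day collection scheme, again assuming pymongo. The `events_` prefix, the `YYYYMMDD` suffix and the function names are illustrative assumptions rather than the real naming; the 30-day retention matches the window mentioned above. `db` is expected to be a pymongo `Database`.

```python
from datetime import date, datetime, timedelta

PREFIX = "events_"  # hypothetical collection name prefix


def collection_name_for(day):
    """Name of the collection holding one day's data, e.g. events_20240105."""
    return f"{PREFIX}{day:%Y%m%d}"


def expired_collection_names(existing_names, today, retention_days=30):
    """Pick out the daily collections older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing_names:
        if not name.startswith(PREFIX):
            continue  # not one of the daily collections
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a date, leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


def drop_expired(db, today, retention_days=30):
    """Drop whole collections instead of deleting documents one by one."""
    names = expired_collection_names(
        db.list_collection_names(), today, retention_days
    )
    for name in names:
        db.drop_collection(name)
    return names
```

Reading a day is then a plain scan of `db[collection_name_for(day)]`, and dropping a collection removes its indexes along with it, so there is no large delete to run against a shared index.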