MongoDB: Agile + Scaling

As I wrote last week, I created Preditter as a project to use some of the technologies that we have invested in. I blogged on Monday about how amazingly easy it is to use the Twilio API. Up today: MongoDB (created by the team at 10gen). Here too, I was blown away by the ease of getting going. You can literally be up and running with MongoDB in minutes (on my Mac laptop I just downloaded it, and on my Slice running CentOS I used yum). But the thing that I was most pleased about is how truly effective the document model is in supporting agile development.

I remember how, around one of the early releases, I wrote to Kristina about the PHP driver, asking why it was based on arrays as opposed to PHP objects. Kristina rightly pointed out at the time that arrays are far more flexible and that others can construct object mappers on top of arrays (which is exactly what has happened with several active projects). But as it turns out, at least for me, developing with arrays instead of full-blown objects is also a great way to get started on a project.

I wanted most of the code for Preditter to be generic, as opposed to NFL specific, so that it would be easy to add other categories later (e.g., the upcoming elections). In Preditter, predictions are about events, which are stored in an events collection. The beauty of the document model and array-based development is that the only fields all my events need to have in common are things like category (for now: sports) and subcategory (for now: football), plus a tag array (for such things as “regular season”, “nfl”, etc.). Everything else can vary across events, with fields that make the most sense for the specific category and subcategory. Instead of having to pre-plan an object hierarchy and think through members and inheritance and methods and all that, I can just sling arrays around through most of the code and dispatch to specific handlers as needed (based on category and subcategory).
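To make the dispatch idea concrete, here is a minimal sketch in Python, where dictionaries play the role of PHP arrays. Only category, subcategory and tags come from the description above; every other field name, the sample values, and the handler functions are made up for illustration and are not Preditter's actual code.

```python
# Hypothetical event documents. Only category, subcategory and tags are shared;
# all other fields are invented for this sketch and differ per kind of event.
nfl_game = {
    "category": "sports",
    "subcategory": "football",
    "tags": ["regular season", "nfl"],
    "home_team": "Giants",
    "away_team": "Cowboys",
}
election = {
    "category": "politics",
    "subcategory": "elections",
    "tags": ["midterm"],
    "office": "Senate",
    "state": "NY",
}


def describe_football(event):
    # Handler that knows about the football-specific fields.
    return f"{event['away_team']} at {event['home_team']}"


def describe_election(event):
    # Handler that knows about the election-specific fields.
    return f"{event['office']} race in {event['state']}"


# Dispatch table keyed on the two fields every event has in common.
HANDLERS = {
    ("sports", "football"): describe_football,
    ("politics", "elections"): describe_election,
}


def describe(event):
    # Generic code: looks only at category and subcategory, then delegates.
    handler = HANDLERS[(event["category"], event["subcategory"])]
    return handler(event)
```

Adding a new kind of event in this sketch means writing one more handler and one more entry in the table; the generic code stays as it is.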

But it gets better. In the predictions collection, instead of referring to the event by a foreign key (as you would in a normalized relational design), MongoDB makes denormalization a cinch. Every prediction simply includes the event that it refers to as an embedded document. The code for doing that can be entirely generic! It just stuffs an event into a prediction and doesn’t need to know a thing about what’s inside the event. Similarly, when retrieving predictions (for instance, all the predictions made by a specific user), the code can retrieve the associated (read: embedded) events without any knowledge of the type of event that the prediction is about.
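As a rough sketch of what "stuffing an event into a prediction" can look like, the Python below builds prediction documents as plain dictionaries. The user and outcome fields, the function names, and the sample data are assumptions made for illustration; a list stands in for the predictions collection so the example runs without a database.

```python
import copy


def make_prediction(user, event, outcome):
    # Embed a copy of the whole event; nothing here inspects its contents.
    return {"user": user, "outcome": outcome, "event": copy.deepcopy(event)}


def events_predicted_by(predictions, user):
    # Return the embedded events for one user, whatever kind of event they are.
    return [p["event"] for p in predictions if p["user"] == user]


# Hypothetical sample data of two different kinds of event.
game = {"category": "sports", "subcategory": "football",
        "tags": ["nfl"], "home_team": "Giants", "away_team": "Cowboys"}
vote = {"category": "politics", "subcategory": "elections",
        "tags": ["midterm"], "office": "Senate", "state": "NY"}

# A plain list stands in for the predictions collection.
predictions = [
    make_prediction("alice", game, "Giants win"),
    make_prediction("bob", game, "Cowboys win"),
    make_prediction("alice", vote, "incumbent holds the seat"),
]

alice_events = events_predicted_by(predictions, "alice")
```

Because each prediction carries its own copy of the event, reading a user's predictions needs no second lookup in the events collection. The trade-off is that a later change to the event has to be written to every prediction that embeds it.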

But wait, it gets even better. The embedded documents are first-class citizens when it comes to querying! For instance, finding all the predictions that are about a game involving a specific team or on a given date (or what have you) becomes a single query against the predictions collection that looks inside the embedded events (and you can have secondary indexes to make those queries performant). So here the true power is revealed. Not only does MongoDB superbly support agile development, but it simultaneously allows for scaling. Should Preditter unexpectedly take off wildly, I can even shard the predictions collection without touching a single line of code!
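MongoDB reaches into embedded documents with dot notation, so a filter such as {"event.home_team": "Giants"} matches on a field inside the embedded event. The Python sketch below imitates only simple equality matching in memory, to show the shape of such a query. The field names and sample data are hypothetical, and the real matching (and any index use) happens inside the database server.

```python
def get_path(doc, path):
    # Follow a dotted path such as "event.home_team" through nested dicts.
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(doc, query):
    # Simplified: every dotted path in the query must equal the given value.
    return all(get_path(doc, path) == value for path, value in query.items())


def find(predictions, query):
    # In-memory stand-in for a find on the predictions collection.
    return [p for p in predictions if matches(p, query)]


# Hypothetical predictions with embedded events.
predictions = [
    {"user": "alice", "event": {"home_team": "Giants", "date": "2010-10-25"}},
    {"user": "bob", "event": {"home_team": "Jets", "date": "2010-10-25"}},
    {"user": "carol", "event": {"home_team": "Giants", "date": "2010-11-07"}},
]

giants_home = find(predictions, {"event.home_team": "Giants"})
same_day = find(predictions, {"event.date": "2010-10-25"})
both = find(predictions, {"event.home_team": "Giants",
                          "event.date": "2010-10-25"})
```

With a driver such as PyMongo, the same filter dictionaries would be passed to the collection's find method, and an index on the embedded field could be requested with create_index("event.home_team").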