Just as I was about to start this post, I saw Marco’s entry “On Database Joins”, where he quotes a tweet by @codinghorror on joins being super expensive and explains how Tumblr avoids them. Tech folks at our portfolio companies have probably all heard me say that “joins are evil” because they make it really difficult for your service or site to scale past a certain level. Of course, once you start to program without joins, you pretty quickly demote your database engine to plain object (or key-value pair) storage. And that begins to sound an awful lot like what Amazon’s SimpleDB and Google AppEngine’s Datastore do (and what 10gen is about to do).
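To make the trade-off concrete, here is a minimal sketch of the same read done both ways. Everything in it is an assumption made up for illustration: the `users`/`posts` schema, the `user:<id>` and `post:<id>` key scheme, and the function names. A plain Python dict stands in for a key-value store; it is not the API of SimpleDB, Datastore, or any other product.

```python
import sqlite3


def posts_with_join(conn, user_id):
    """Relational approach: a single query joins posts to their author."""
    rows = conn.execute(
        "SELECT p.title, u.name FROM posts p "
        "JOIN users u ON u.id = p.user_id "
        "WHERE p.user_id = ? ORDER BY p.id",
        (user_id,),
    ).fetchall()
    return [{"title": title, "author": name} for title, name in rows]


def posts_without_join(kv, user_id):
    """Join-free approach: the store only answers get-by-key lookups.

    The user record carries the ids of its posts (denormalized), and the
    application stitches the result together itself.
    """
    user = kv[f"user:{user_id}"]
    return [
        {"title": kv[f"post:{post_id}"]["title"], "author": user["name"]}
        for post_id in user["post_ids"]
    ]


def build_demo():
    """Create the same hypothetical data in both representations."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(10, 1, "first"), (11, 1, "second")],
    )
    kv = {
        "user:1": {"name": "alice", "post_ids": [10, 11]},
        "post:10": {"title": "first"},
        "post:11": {"title": "second"},
    }
    return conn, kv


if __name__ == "__main__":
    conn, kv = build_demo()
    print(posts_with_join(conn, 1))
    print(posts_without_join(kv, 1))
```

Both functions return the same list, but the second one never asks the storage layer for anything beyond "give me the value for this key". That is roughly the access pattern a key-value store offers, and it is why the engine underneath ends up doing little more than object storage.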
Yet to date very few of our portfolio companies make use of those technologies (the notable exception being AdaptiveBlue, with Alex Iskold explaining their usage of SimpleDB in some detail). One obvious reason is that these data stores are in their infancy. But there is also a sense that to date they are quite proprietary and would result in significant lock-in. Relational databases, on the other hand, have SQL and give the impression that one could switch from, say, Postgres to Oracle if one wanted to. The reality is that I have seen very few such switches take place. Partially this is due to people starting with fairly standard SQL but eventually making use of proprietary aspects of a particular database.
I believe the adoption of cloud databases will benefit greatly from a standard that can play a role similar to the one SQL plays for relational databases. At DLD, I talked with Werner Vogels, who professed some surprise that there are so few of these cloud database offerings out there to date (the team at 10gen is working on changing that). He said he would like to see more competition to SimpleDB and is interested in seeing some kind of standard emerge. The current state of affairs is that we are far from this. When you google for “cloud database standard”, only the first entry is immediately relevant, and it is a blog post arguing that Atom is the right standard for cloud databases.