Last night I managed to fix my ReadyNAS NV+ with a replacement PSU purchased from PC Warehouse (bypassing customer support but shelling out a bit over $100). I updated the forums with the relevant links and am happy to once again have a running RAID array for backup at home.
More importantly though, this incident made it amply clear that in addition to my local backup I absolutely need a cloud backup as well. Would love to get recommendations on what to use.
Ideally, I would like a single provider that lets me address both use cases from my ReadyNAS: media storage (all my photos, videos and music are on there) and incremental backup using TimeMachine. Maybe trying to get both from a single provider is asking too much (and I am not even sure TimeMachine can work with truly remote volumes). So I am also open to separate recommendations.
Also, ideally, I would want to have a single master account that lets me connect many different machines as opposed to having to buy and set up separate plans for each of the many machines we have at home (half a dozen at last count). Again, that’s a nice to have and not absolutely critical.
Finally, once stuff is in the cloud it should also be accessible from mobile devices. That’s especially true for the media files (which I tend not to have local copies of due to size constraints), but would also be nice for the incremental backup.
Looking to hear what other people are using.
Historically in the software business high gross margins were considered essential. It is not clear to me that the same logic will apply for cloud-based services. Their cost structure might wind up being quite different. Much smaller development teams can accomplish amazing things on top of a cloud stack substantially reducing the fixed cost component. On the other hand, COGS may increase significantly as the underlying cloud services are paid on a variable basis. With that in mind, it could well be that some of the most successful cloud businesses will have small or even tiny margins (just a tiny bit of a mark-up / premium to the underlying infrastructure) but at potentially huge scale! This is just a short post to help me with my own thinking on this, but I am hoping to dig deeper with some actual numbers and maybe case studies along these lines.
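To make the trade-off concrete for myself, here is a rough sketch in Python. All figures are hypothetical placeholders chosen only to illustrate the shape of the argument; they are not data from any real company.

```python
def gross_margin(revenue, cogs):
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue


def operating_profit(revenue, cogs, fixed_costs):
    """Profit left after variable costs (COGS) and fixed costs."""
    return revenue - cogs - fixed_costs


# Hypothetical figures, for illustration only.
# "traditional": high margin, large fixed cost base (big development team).
# "cloud": thin mark-up over infrastructure, small team, much larger volume.
businesses = {
    "traditional": {"revenue": 10_000_000, "cogs": 1_000_000, "fixed_costs": 7_000_000},
    "cloud": {"revenue": 100_000_000, "cogs": 95_000_000, "fixed_costs": 2_000_000},
}

for name, b in businesses.items():
    margin = gross_margin(b["revenue"], b["cogs"])
    profit = operating_profit(**b)
    print(f"{name}: gross margin {margin:.0%}, operating profit {profit:,}")
```

With these made-up inputs the thin-margin business ends up with the larger absolute profit, but only because of the assumed scale. Shrink the assumed revenue and the picture flips.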
We were on family vacation through Sunday, so when our iPad arrived via UPS on Saturday nobody was home, which meant that we had to wait until Monday for it to be re-delivered. I wish I could share first impressions this morning, as Bijan did, but we frankly didn’t get that far. The iPad is a beautiful physical object, but when you turn it on it greets you in a very disappointing way: an icon asking you to connect it to iTunes via a cable! What a contrast with my experience with Google’s Nexus One phone. On that, all I had to do to get started was, well, nothing. I was able to turn the phone on and start using it right away. When I wanted to sync it up to gmail, gcal, etc. all I had to do was enter my google account info once! Yes, Apple has MobileMe, but it is a paid service and as far as I can tell covers a small fraction of all Apple users. I believe that for a slate to be truly useful, syncing to the cloud is a critical requirement.
In January I wrote a post saying that we need a standard for cloud databases. That post is now the #2 search result on google for “cloud database standards” and there are still no results that even hint at emerging standards. So it will be fun today to moderate a panel on this topic at the Glue Conference with Alex Iskold and Stu Charlton. I am planning on a lot of audience participation as there are a lot of technical folks at the conference. Here are some of the questions I am planning to cover:
- What exactly is a cloud database?
- Do we really need these? There is a recent paper which compares Map Reduce performance with parallel relational databases and finds the latter to perform 3-5x faster and require less code.
- What about the approach pursued by Drizzle? Can’t we hang on to SQL that way?
- What about key value stores? Those seem all the rage right now.
- Is the need for cloud databases all about performance or is it also about ease of development?
- Why do we need standards? Is portability really important? Who has ever migrated their current relational db? Is it the learning curve?
- Is there a native data format for the cloud? Is it XML? Is it JSON? Something else?
- In the cloud, how will “things” be identified (certainly not autoincrement ID column)?
- How does this relate to IP and licensing? SQL was invented by IBM in the early 70s and not standardized until the mid 80s.
- Any candidates for a cloud database standard?
If anyone has other questions they would like to see discussed (and is not at the conference), just add them as a comment. Also, of course, any answers to the questions above would be great as comments.
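On the question of how "things" get identified without an autoincrement column, one common option is an identifier that any node can generate without coordinating with a central database. A minimal sketch using Python's standard `uuid` module follows; the function names are my own, for illustration only, and this is just one possible answer to the question.

```python
import uuid


def new_record_id():
    """Random identifier that any node can mint without a central counter."""
    return str(uuid.uuid4())


def id_for_url(url):
    """Deterministic identifier derived from a URL (same URL, same id)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, url))


print(new_record_id())
print(id_for_url("http://example.com/photos/1"))
```

Random ids suit records created independently on many machines, while name-based ids let two machines that see the same URL agree on its identifier without talking to each other.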
Just as I was about to start this post, I saw Marco’s entry “On Database Joins” where he quotes a tweet by @codinghorror on joins being super expensive and explains how Tumblr avoids them. Tech folks at our portfolio companies have probably all heard me say that “joins are evil” because they will make it really difficult for your service or site to scale past a certain level. Of course, once you start to program without joins, you pretty quickly demote your database engine to performing object (or key-value pair) storage. And that begins to sound an awful lot like what Amazon’s SimpleDB and Google AppEngine’s Datastore do (and 10gen is about to do).
Yet to date very few of our portfolio companies make use of those technologies (the notable exception being AdaptiveBlue, with Alex Iskold explaining their usage of SimpleDB in some detail). One obvious reason is that these data stores are in their infancy. But there is also a sense that to date they are quite proprietary and would result in significant lock-in. Relational databases on the other hand have SQL and give the impression that one could switch from, say, Postgres to Oracle if one wanted to. The reality is that I have seen very few such switches take place. Partially this is due to people starting with fairly standard SQL but eventually making use of proprietary aspects of a particular database.
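To illustrate what programming without joins can look like, here is a small Python sketch. It is not Tumblr's actual approach or any real product's API. A plain dict stands in for the key-value store, and the key scheme and helper names are hypothetical.

```python
import json

# Stand-in for a key-value store; a real service would replace this dict.
store = {}


def put(key, value):
    """Store a value as a JSON string under a key."""
    store[key] = json.dumps(value)


def get(key):
    """Fetch and decode the value stored under a key."""
    return json.loads(store[key])


def save_post(post_id, author_id, title):
    """Copy the author's name into the post at write time (denormalization)."""
    author = get(f"user:{author_id}")
    put(f"post:{post_id}", {
        "title": title,
        "author_id": author_id,
        "author_name": author["name"],  # duplicated so reads need no join
    })


put("user:1", {"name": "Ada"})
save_post(10, 1, "Hello")

# Rendering the post is a single key lookup; no join against the users data.
post = get("post:10")
print(post["title"], "by", post["author_name"])
```

The price is paid on writes: if a user changes their name, every copy stored inside their posts has to be rewritten or tolerated as stale.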
The adoption of cloud databases I believe will benefit greatly from a standard that can play a similar role. At DLD, I talked with Werner Vogels, who professed some surprise that there are so few of these cloud database offerings out there to date (the team at 10gen is working on changing that) and said he would like to see more competition to SimpleDB and is interested in seeing some kind of standard emerge. The current state of affairs is that we are far from this. When you google for “cloud database standard”, only the first entry is immediately relevant and it is a blog post arguing that Atom is the right standard for cloud databases.
Yesterday, Microsoft announced the Azure Services Platform, which is their entry into the cloud computing market. I have spent quite a bit of time reading through the materials, starting with Microsoft’s press release which is full of buzzwords, but otherwise about as uninformative as could be. In usual Microsoft fashion, they have thrown the kitchen sink into the announcement trying to tie together .NET services, with Live Services, with loads of other services that have gone through multiple different names before even fully launching (e.g., the data services). It was fascinating to skim the blogosphere, as so many posts simply rehashed the language from the Microsoft press release. And no surprise. There simply was not a clear breakthrough announcement here.
The only meatier piece to read on the Azure main site is a white paper by an outside party. You can find it in the white paper section. I would link to it directly, but again in typical Microsoft fashion, the white paper is available as a .docx file only (does anyone at Microsoft see the irony in this?). From the “Introducing the Azure Services Platform” white paper, it seems that the core execution engine essentially consists of a load balancer that distributes http requests to application instances that run inside of a Windows VM. Various parts of the site refer to automatic scaling, but one of the instructional videos shows someone editing a config file to influence the number of running instances. Other than better integration with Microsoft development tools, this does not seem to go beyond what one can actually do right now with the beta Windows version of Amazon Web Services, as The Register points out in their coverage.
Throughout the Azure site, there is a lot of mention of “industry standard SOAP, REST and XML protocols”. For instance, the data services have a URL scheme with RESTful access, sort of like CouchDB does. That’s nice, but at this point it should go without saying that in the cloud everything should be exposed as a web service, including data storage. So the real reason for repeating SOAP, REST and XML like a mantra throughout the site appears to be to mask the fact that everything else about Azure is closed and extremely proprietary. Yes, several of the diagrams show Ruby, Python and PHP as coming soon and I would like to see Microsoft provide support for these languages through some of their efforts to bring dynamic languages to the CLR. But I have a strong feeling that even then it will be in some proprietary fashion.
In the meantime, for anyone interested in multi-language framework-based cloud computing running on a VM (the JVM), get going and support the open source effort underway at our portfolio company 10gen.com!
This has been a big week in cloud computing. Most importantly, Amazon removed the “Beta” label from EC2 and is now offering an SLA for EC2. This is essential for businesses that want to move core systems onto EC2. As I have stated before, though, EC2 is not really a cloud platform in and of itself, but rather a highly advanced server provisioning infrastructure. Amazon also took several steps towards moving closer to a true cloud platform. In particular, they have added support for load balancing and automated scaling. Both of these are key ingredients for running a platform such as 10gen which completely abstracts servers for the majority of applications. I have not yet had a chance to read up in detail on how these have been implemented, but was intrigued by one line on the AWS blog which indicated that load balancing and scaling were exposed as web services, which makes them programmatically accessible for others. That’s the right way to do things. And yes, they are also offering Windows support now, but so what.
The other big announcements this week came from Rackspace, which acquired Slicehost and JungleDisk. Rackspace has a bit of a piggy bank from the IPO, albeit much smaller than they had hoped for (the IPO raised $200 million which was half of the original plan). Whether or not these acquisitions are smart remains to be seen. They could be if Rackspace can cheaply acquire customers and move them to their Mosso platform. There are two big ifs in that sentence. With margins being compressed it’s hard to know if you bought something cheaply, and migrating customers to a single platform can be painful (maintaining multiple platforms will kill you on support costs). It is also possible that these companies have real technology that Rackspace can use for Mosso, but here too one has to worry about the substantial integration risks.
I sure hope that Rackspace succeeds because competition will be healthy and we would like to see many different places where folks can run 10gen.
Finally getting around to my fourth cloud principle: In the cloud it should be easy to consume and produce web services. When you build something in the cloud, you should be able to focus on the essential innovation of your site or service and be able to rely on other web services for everything else. Conversely, it should be easy for you to expose any part of your site or service as a web service that someone else can consume. Together this is what I call “ease of assembly.”
We have made huge progress in this area, but much still needs to be done. First the progress. There was a time when programming languages consisted (almost) entirely of the primitives of the language and developers were essentially starting from scratch every time. By contrast, most of the languages in common use today come with extensive libraries that facilitate reuse of large chunks of code for complicated tasks. In the last decade we also managed to come up with ways of delivering such functionality over the web in the form of a web service. For some tasks this is a big improvement over using libraries, as the web services bundle more functionality and run on someone else’s hardware. For instance, take the problem of finding references to companies in an XML document. Sure you could take an XML library, load the document, look for keywords, maybe check those against a database of company names, etc. or you could use Reuters’ Open Calais web service which does it all for you.
Now for the still to do part. Consuming web services at a basic level is very simple, often just a few lines of code. But when you do it at scale and in something that might be a commercial offering, there are lots of things you’d like to get that those few lines of code won’t get you, such as stats on your requests (how many did you send? how many errors were generated? what was the response time? etc). Producing web services at scale is even harder. Now you need to deal with issues such as authentication, throttling requests, potentially providing an SLA, etc.
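As a rough sketch of the consumer-side stats mentioned above (requests sent, errors, response time), here is a small Python wrapper. `CallStats`, `instrumented` and `fake_service` are hypothetical names for illustration. The fake service stands in for a real HTTP call so the example runs without a network.

```python
import time
from dataclasses import dataclass


@dataclass
class CallStats:
    """Running totals for calls made through an instrumented client."""
    requests: int = 0
    errors: int = 0
    total_seconds: float = 0.0

    @property
    def mean_seconds(self):
        return self.total_seconds / self.requests if self.requests else 0.0


def instrumented(call, stats):
    """Wrap a callable so every invocation is counted and timed."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        stats.requests += 1
        try:
            return call(*args, **kwargs)
        except Exception:
            stats.errors += 1
            raise  # the caller still sees the original error
        finally:
            stats.total_seconds += time.perf_counter() - start
    return wrapper


def fake_service(text):
    """Stand-in for a remote web service call; fails on empty input."""
    if not text:
        raise ValueError("empty document")
    return text.upper()


stats = CallStats()
tagged = instrumented(fake_service, stats)
tagged("acme corp")
try:
    tagged("")
except ValueError:
    pass

print(stats.requests, stats.errors, stats.mean_seconds)
```

A few lines like these are easy to write for one service. The point is that a cloud platform could provide this bookkeeping for every outbound call so each developer does not have to.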
Startups such as Mashery are beginning to address some of these problems. But consuming and producing web services is a crucial aspect of cloud computing and so these should be features that are integral to a true cloud platform as opposed to requiring an add-on service.
Today I get to give a talk about a topic that I am very excited about: cloud computing. As I scanned the program at Web 2.0 Expo I found that there are 10 presentations with ‘cloud’ in the title and I suspect that many more will mention it. While it has become a buzzword and may therefore suffer the fate of other buzzwords, I am convinced that the basic ideas behind it are sound and will have a profound impact on everything we do (I already wrote a series of posts on that but will surely revisit it often). In my talk today I am going to set out 4 principles that together I believe define cloud computing. I will post these here over the next few days.
Most of the companies in the USV portfolio started out quite small. Some even with just a single server, many with a couple of web servers and a single DB server. At that stage testing is straightforward. In fact developers generally have a complete setup on their laptops and stuff that works there is highly likely to work in production.
Fast forward a year or two (in some cases only a few months) and the production setup includes master and slave DBs, dedicated servers for API and RSS, sophisticated load balancing, and so on. Developers can no longer have anything like the real deal on their laptops and code that looks fine locally may simply not work at all in production. Of course at this stage it becomes essential to have a separate testing environment and possibly an additional staging environment. With virtualization this no longer necessarily means having lots of physical boxes around (although for load testing that might help), but it still requires a lot of additional effort.
A cloud computing platform should eliminate this issue entirely. First, in a proper cloud platform, such as Google App Engine or 10gen, the code is completely isolated from the hardware and network topology. So if something works on the SDK on a developer’s machine, it will run on the cloud. Second, the testing, staging and production environments are all the same! They are simply different instances but all running on the same underlying cloud infrastructure. 10gen in fact has taken this to a level where you can simply flick a switch to cut over (and back should you need to).
The net result of this is that in a cloud computing world many of the current hurdles to proper testing simply evaporate. Can’t wait to get there!