At OSCON, I listened to a bunch of talks about distributed file systems (including HDFS) and scalable databases (including Hypertable). What is intriguing is that both need to solve a similar problem at their core, because both store stuff across many machines: they need a way of tracking where stuff is. In HDFS that's the job of the namenode, and in Hypertable it's Hyperspace; in both systems these are currently single servers. (A toy sketch of this tracking problem follows below.)

The current approach is to build the scalable DB on top of the distributed filesystem. Google's Bigtable runs on top of GFS, and Hypertable and HBase run on top of HDFS (although Kosmos FS can be used as an alternative for Hypertable). This raises the question of whether one could solve the tracking problem only once (instead of twice) by doing things the other way round, i.e. building the scalable DB first and then sticking files into the DB.

Of course the recent history of trying to do just that on a single machine (never mind in the cloud) offers a cautionary tale: Microsoft's failed attempt (WinFS) to make a database the underpinning for all storage in what was then called Longhorn (and eventually became Vista). It would be interesting to know whether Microsoft's failure was the result of a fundamental flaw in this approach or "simply" due to process and organizational problems.
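To make the tracking problem concrete, here is a minimal sketch in Python of a namenode/Hyperspace-style metadata service that maps chunks of a file to the machines holding them. All names here (MetadataService, the round-robin placement, the chunk size) are illustrative assumptions, not the actual HDFS or Hypertable protocols, which are far more involved:

```python
import itertools

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size in GFS/HDFS

class MetadataService:
    """Toy single-server metadata tracker, in the spirit of HDFS's
    namenode or Hypertable's Hyperspace: it answers the one question
    both layers must answer, namely "which machines hold this data?"."""

    def __init__(self, datanodes, replicas=3):
        self.datanodes = datanodes
        self.replicas = replicas
        self.chunk_map = {}  # (path, chunk_index) -> [datanode, ...]
        self._rr = itertools.cycle(datanodes)  # naive round-robin placement

    def allocate(self, path, size):
        """Assign each chunk of a new file to `replicas` datanodes."""
        n_chunks = max(1, -(-size // CHUNK_SIZE))  # ceiling division
        for i in range(n_chunks):
            self.chunk_map[(path, i)] = [
                next(self._rr) for _ in range(self.replicas)
            ]

    def locate(self, path, offset):
        """Look up which machines hold the chunk containing this byte."""
        return self.chunk_map[(path, offset // CHUNK_SIZE)]


svc = MetadataService(["node-a", "node-b", "node-c", "node-d"])
svc.allocate("/logs/2008-07-25.log", 200 * 1024 * 1024)       # ~4 chunks
print(svc.locate("/logs/2008-07-25.log", 150 * 1024 * 1024))  # e.g. ['node-c', 'node-d', 'node-a']
```

Note that nothing in this sketch cares whether the things being tracked are filesystem blocks or database tablets, which is exactly why layering a DB on a distributed FS ends up solving the same problem twice.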