Friday, January 27, 2012
Thinking About Alternatives to SOPA/PIPA
With SOPA and PIPA shelved at least for the moment, it is time to start thinking about alternatives. It would be a shame if we limited our collective thinking here to slightly different versions of those bills instead of exploring a different approach to copyright, one that doesn’t try to fight the characteristics of the Internet but rather embraces them, providing value for rights creators/holders, technology companies and end users.
One interesting entry here is the proposal by Ian Rogers (of Topspin Media) for a rights and media registry. It’s worth reading the entire post and also the comments, which include good questions from Andy and clarifying answers from Ian. In essence such a registry would enable tech companies to deliver innovative user experiences on top of content, as long as they respect the prices set by the rights holders. Rights holders would be entitled to enforcement only if they participate in the registry.
I believe this direction is very promising and is also something that was recommended by a report that the UK government’s copyright office had commissioned. An important addition though would be that this should not be a centralized registry (which then requires an operator and becomes a single point of control and failure) but rather a standard for publication that would allow for a decentralized implementation.
Tags: sopa pipa copyright
Thursday, January 26, 2012
Apple Is Slow Boiling Developers
How do you boil a frog? Slowly. Apparently the same is true for end users and even software developers. That at least is what Apple seems to believe. And while this has been debunked for frogs (they do jump out as the water gets too warm), it’s not clear that the same is true for humans. We seem all too willing to trade off having a shiny device for accepting ever more restrictions on what we can do with that device.
I wonder how long it will take before people realize how much they are losing when instead of a general purpose computer they have a locked down device controlled by a central choke point. I am especially curious when developers like Marco will conclude that this is no longer in their interest. And I am fascinated to see Gruber write a long post arguing that Apple’s new ebook “standard” is not a classic case of embrace, extend and extinguish. What line of control does Apple have to cross for him to say it’s actually a step too far?
The latest tightening of control by Apple is making some APIs accessible only to applications sold through their store. I am not talking about apps for the iPhone or iPad here but applications for laptops and the Mac Mini. You can read more about it here. This whole direction is rather upsetting because I really like my MacBook. But I don’t enjoy being boiled, not even slowly.
Tags: apple control general_purpose_computing
Wednesday, January 25, 2012
Supermodularity And Service Bundling
This will be a bit of a wonky and short post with a longer and less technical one to follow some time soon. Google has just announced a coming update to their privacy policy which will essentially make it possible for Google to integrate all the information it has about a user across its many different services. This comes at the same time as the revelation that Larry Page apparently explicitly stated the goal of building “a single unified, ‘beautiful’ product across everything.”
While one can come up with many possible verbal explanations for why Google might want to go this direction, there is some powerful math that lies at the heart of it: supermodularity. Here is the definition:
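A function $f\colon \mathbb{R}^k \to \mathbb{R}$ is supermodular if

$$f(x \vee y) + f(x \wedge y) \ge f(x) + f(y)$$

for all $x, y \in \mathbb{R}^k$, where $x \vee y$ denotes the componentwise maximum and $x \wedge y$ the componentwise minimum of $x$ and $y$.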
If a production function is supermodular then its inputs are strongly complementary: having more of one input raises the payoff from having more of another. If you want to read the bible on this consult Don Topkis’s “Supermodularity and Complementarity.”
A firm such as Google for which the production function relies almost exclusively on information (yes, there are servers and people as well) will exhibit supermodularity almost by definition. Why? Because if X and Y are different information vectors, then as long as they carry some joint signal, the inequality will be met as you can always choose to discard additional information (meaning you always have access to the componentwise minimum). In plain English: if you have access to both the search history (X) and the social graph (Y) of a user, you can always “do better” than two separate services that only have access to one of these respectively.
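To make the inequality concrete, here is a minimal sketch that checks the supermodularity condition by brute force over a small grid of points. The two example functions are illustrative assumptions only (they are not meant as a model of any real firm's production function): a product of two inputs, which is the textbook case of complementarity, and its negative, which fails the test.

```python
from itertools import product


def join(x, y):
    """Componentwise maximum of two equal-length tuples."""
    return tuple(max(a, b) for a, b in zip(x, y))


def meet(x, y):
    """Componentwise minimum of two equal-length tuples."""
    return tuple(min(a, b) for a, b in zip(x, y))


def is_supermodular(f, points, tol=1e-12):
    """Check f(join) + f(meet) >= f(x) + f(y) for every pair of points."""
    return all(
        f(join(x, y)) + f(meet(x, y)) >= f(x) + f(y) - tol
        for x in points
        for y in points
    )


# A small grid in R^2; a finite check like this can only refute
# supermodularity, it cannot prove it for all of R^k.
grid = list(product(range(4), repeat=2))


def complements(v):
    """Hypothetical payoff where the two inputs reinforce each other."""
    return v[0] * v[1]


def substitutes(v):
    """Hypothetical payoff where the two inputs work against each other."""
    return -v[0] * v[1]


print(is_supermodular(complements, grid))
print(is_supermodular(substitutes, grid))
```

For the product function, take x = (1, 0) and y = (0, 1): the componentwise maximum is (1, 1) and the minimum is (0, 0), so the left side of the inequality is 1 while the right side is 0. Having both inputs together is worth more than having each one separately.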
Tags: wonky economics google
Tuesday, January 24, 2012
Tech Tuesday: DNS
Today we are continuing with the web cycle that I outlined two weeks ago. After a URL has been parsed in Step 1, the browser needs to determine the IP address for the domain as Step 2. Reprising the previous example, let’s consider the domain name dailylit.com. How does the browser determine that in order to retrieve information from this domain it should access a server at IP address 72.32.133.224? This is accomplished via a system called DNS, which stands for Domain Name System and is essentially the equivalent of a telephone book that lists IP addresses (telephone numbers) for domain names (people names).
In ARPANET, the predecessor to the Internet, there were so few domain names that this telephone book was simply a file called HOSTS.TXT that was retrieved from a computer at SRI and stored locally. There were only a few domains (mostly universities) and the file was relatively short. Today on the Internet there are over 200 million domain names of the type dailylit.com, which are further subdivided through subdomains such as blog.dailylit.com. So the idea of having every computer maintain a complete and up-to-date copy of the telephone book locally doesn’t make sense any more.
Thankfully in the early 1980s, which depending on your perspective is either ancient pre-history or not that long ago, DNS was born as a service that would allow the registration of domain names and maintain a mapping between the names and IP addresses in a robust fashion. In fact, without DNS it would be hard to imagine the Internet having grown as dramatically and we probably wouldn’t have nearly as many domains to begin with.
There are many ingenious ideas in the design of DNS and I won’t be able to cover them all here. Instead, I will focus on some key concepts. The first and central one is that there is a hierarchy of authority which allows for the delegation of both registration of domain names and the lookup of IP addresses. The hierarchy starts with the 13 root servers which together make up the so-called root zone from which all authority flows. It is here that the so-called Top Level Domains or TLDs get resolved. Going back to blog.dailylit.com, the TLD is the “.com” part. You can think of a domain name like nested Russian dolls, where the outermost doll, the TLD, is the rightmost part of the name.
The most common TLDs are .com and .net which together account for about half of all domain names. There is of course also .org, .gov, .edu and an ever increasing number of other TLDs such as most recently .xxx. And then there are TLDs for countries which all consist of two letters, such as .uk for the UK (duh) or .ly for Libya, popularized by bit.ly, and .us for the US, which made the domain del.icio.us possible. Each TLD has one or more registrars associated with it who are in charge of letting people and companies reserve names in that domain.
The root servers point to name servers for each of these TLDs. Since blog.dailylit.com is in the .com domain the next place to look is the .com name servers. The .com name servers in turn point to the name servers for dailylit.com itself. Currently those name servers are at Rackspace. Since Susan and I registered and control dailylit.com, we are the ones who get to decide which nameservers should be queried to find the IP address for dailylit.com and its subdomains, such as blog.dailylit.com. The way this generally happens is by logging into a system run by a registrar and setting which nameservers are to be the authoritative sources of IP addresses for the dailylit.com domain. That then gets recorded in the nameserver for the corresponding TLD.
The lookup process that started with the root and then went to the .com TLD is now at the dailylit.com nameservers at Rackspace. They in turn contain information on dailylit.com itself and its subdomains, such as blog.dailylit.com. The whole process of starting at the root and working towards the subdomain (right to left) in a series of separate lookups across different servers is called a “recursive lookup.” If this sounds complicated to you, that’s because it is. It is so complicated and resource intensive that we don’t want the web browser to have to do this each time it encounters a domain name. It would not only be slow, but it would also swamp the root servers, the TLD servers and possibly even the name servers for dailylit itself.
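The chain of referrals described above can be sketched as a toy model. Everything here is an assumption made for illustration: the server labels, the dictionary layout and the `resolve` function are hypothetical, no real DNS queries are sent, and the address for blog.dailylit.com is a placeholder from the range reserved for documentation. Only the dailylit.com address is the one quoted earlier in this post.

```python
# Toy model of DNS delegation. Each "server" knows some final records
# and some delegations that point to the next server to ask.
ZONES = {
    "root": {
        "records": {},
        "delegations": {"com": "com-tld-servers"},
    },
    "com-tld-servers": {
        "records": {},
        "delegations": {"dailylit.com": "dailylit-nameservers"},
    },
    "dailylit-nameservers": {
        "records": {
            "dailylit.com": "72.32.133.224",    # address quoted in the post
            "blog.dailylit.com": "192.0.2.10",  # placeholder, not the real one
        },
        "delegations": {},
    },
}


def resolve(name):
    """Walk from the root towards the name; return (address, servers asked)."""
    server = "root"
    visited = [server]
    while True:
        zone = ZONES[server]
        if name in zone["records"]:
            return zone["records"][name], visited
        # Follow the delegation whose suffix matches the right-hand
        # side of the name (the outer Russian doll first).
        next_server = next(
            (
                target
                for suffix, target in zone["delegations"].items()
                if name == suffix or name.endswith("." + suffix)
            ),
            None,
        )
        if next_server is None:
            raise LookupError(f"no server knows about {name}")
        server = next_server
        visited.append(server)


address, hops = resolve("blog.dailylit.com")
print(address, hops)
```

Even in this toy version a single name costs three separate questions to three different servers, which is why repeating the full walk for every request would be wasteful.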
So instead of doing a recursive lookup every time, the results of these lookups are stored on so-called DNS cache servers. For instance, most ISPs through which you access the Internet will operate their own cache servers. After they have looked up blog.dailylit.com once, these servers will “cache” (meaning temporarily store) the result of the lookup, thus providing a much faster lookup the next time. In fact, your own computer will often cache the results of lookups locally for super fast access. This is important because even a single web page generally involves multiple requests (e.g. for images) to the same server. The duration for which the results of a recursive lookup can be cached locally is known as the Time To Live or TTL and is controlled by the owner of the domain (and generally honored by the cache servers).
The existence of cache servers (sometimes also referred to as non-authoritative servers, although technically not exactly the same) introduces a critical security vulnerability for DNS. Let’s say you have gone to your favorite coffee shop and logged on to the Wi-Fi network there. Where do your domain lookups go? Well, to the cache server of whatever ISP the coffee shop uses or possibly even cache servers on the coffee shop’s own network. An attacker with access to those local cache servers could insert falsified records that could have the effect of say pointing chase.com to some rogue server that wants to steal your bank username and password. This would allow for a so-called man-in-the-middle attack (more on this in a future post). Fortunately, some security additions to DNS known as DNSSEC will in the future prevent these kinds of attacks. As more and more of our access to the Internet is over wireless networks this becomes particularly important.
If you made it this far, I hope you have a (newfound) appreciation for the complexity of a system that is used billions of times per day behind the scenes of nearly every access to the Internet. In addition to the technical issues there are also important political issues surrounding DNS. Most recently the proposed SOPA and PIPA legislation would have required nameserver operators to make changes that would have interfered with the implementation of DNSSEC. Then there is also the question as to who really controls the root zone which turns out to be the US Department of Commerce. Yes, for the *entire* Internet, which is all the more reason why we should make DNS better not worse.
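Here is a minimal sketch of the caching idea, assuming a simple in-memory store. The `TTLCache` class and its method names are hypothetical and not taken from any real resolver. An answer is remembered together with an expiry time and is forgotten once its TTL has run out.

```python
import time


class TTLCache:
    """Remember lookup results until their time to live expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the example can fake time
        self._entries = {}    # name -> (address, expires_at)

    def put(self, name, address, ttl_seconds):
        """Store an answer along with the moment it stops being valid."""
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached address, or None if missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: force a fresh lookup
            return None
        return address


# Usage with a fake clock so that no real waiting is needed.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("dailylit.com", "72.32.133.224", ttl_seconds=300)

print(cache.get("dailylit.com"))  # within the TTL: served from the cache
now[0] = 301.0
print(cache.get("dailylit.com"))  # after the TTL: None, look it up again
```

A short TTL lets the owner of a domain change IP addresses quickly, while a long one reduces load on the nameservers. That tradeoff is why the TTL is left to the domain owner.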
Tags: tech_tuesday web dns
Monday, January 23, 2012
Why Software As A Service (A Personal Reminder)
I am in the process of migrating a bunch of stuff off an ancient server which has been up continuously for 5 years and is being sunset (the hardware is tired and RedHat is ending support for RHEL4). The migration process has been a potent reminder of the many hidden costs of installed software. For instance, at the time that the old machine was set up I used Subversion for version control and Trac for documentation and ticketing. Once these were working nicely for my purposes, I stopped upgrading them. Now you could say that’s a pretty stupid thing to do, but when you have very limited time would you rather fix a bug or upgrade your tools, especially when those are working?
So now of course I had a brand new machine running RHEL5 and much more up-to-date versions of Subversion and Trac. I have been using Git and Github for a more recent project but didn’t want to make a switch here as there is a lot of info in Trac that is nicely sync’d to the source. So how hard could it be? Well, the Subversion upgrade was a cinch. I dumped the repository on the old machine and loaded it on the new machine. Including time to set up Subversion, configure Apache to serve up the repo, and my laptop to connect, this all took less than 1 hour.
But then I got around to Trac where I was on version 0.10.4 and the current version is 0.12.2. As it turns out in between they switched templating engines and went from SQLite2 to SQLite3. I had to upgrade the old machine through several versions before I was able to export my data in a format that the new install could consume. Even then the documentation on how to do this was spotty at best. At one point I got so frustrated that I thought it might be easier to downgrade my new install to 0.10.4 but an attempt to install that old version on RHEL5 went nowhere. In the end I got it all done but it took almost 3 hours. Granted that someone who does this more often than once every other year might have been much faster, but my Google searches suggest that I wasn’t the only one to find this a bit tricky.
I still have quite a few other things to migrate to the new machine which I am sure will produce some similar problems. I could of course try to find someone to do all of this, but (a) that’s not easy given that this is a small one-off project and (b) I like to stay connected to how things actually work. And this object lesson in the cost of installed software was a good reminder for when this topic comes up with startups. For anything that’s not core to your success, consider using a Software As A Service offering over installing and running your own.
Tags:
Friday, January 20, 2012
Some Quick Observations on MegaUpload
Yesterday an international police operation resulted in the shutdown of MegaUpload and the arrest of at least four MegaUpload employees in Auckland, New Zealand. This action resulted in a large scale DDoS attack by the group known as Anonymous on web sites including the MPAA, RIAA, DoJ and even the White House. While I don’t have time today for a full scale analysis here are some salient points:
1. The fact that this shutdown and the arrests were possible shows quite clearly that existing laws already provide a meaningful ability to deal with large scale copyright infringement even when sites operate from abroad. That’s all the more reason why we don’t need additional new legislation.
2. According to ArsTechnica, MegaUpload was brazenly flouting the DMCA by only disabling links to infringing content instead of actually removing it or blocking access to it entirely. That is a violation of both the letter and the spirit of that law and should not be allowed to continue.
3. As with any digital locker site, there were also legitimate uses of MegaUpload. Many people who had work or personal files on MegaUpload have taken to Twitter to complain about a lack of access to their files. This operation and others before it (such as the server seizure that brought down Curbed, Pinboard and Instapaper) raise the question how to minimize “collateral damage.”
4. The retaliation by Anonymous has the potential to meaningfully escalate the push for government intervention in the Internet for cybersecurity reasons. This comes at a bad time as we are trying hard to keep the government out of controlling the Internet.
What a week this has been! Apple too dropped another interesting copyright bomb yesterday by claiming sales rights to any books created with iBooks Author. It seems like we are at the beginning of what Cory Doctorow has characterized as the “Coming War on General Purpose Computing.” We live in interesting times indeed.
Tags: megaupload copyright sopa
Thursday, January 19, 2012
The Day the Internet Stood Still
Yesterday (Wednesday, January 18) has a good chance of being remembered as the day that the Internet first truly showed its political clout in the US. So far we have largely pointed at events abroad when discussing the Internet’s potential to shift power. Web sites and services large and small (including Continuations) either forcefully alerted their users to the problems with SOPA/PIPA or blacked themselves out entirely. At the 12th hour even Facebook’s Mark Zuckerberg took a (by now very safe) stand on the issue.
The results from a political perspective were impressive. SOPA had already been stalled a bit but PIPA support was still strong. Following yesterday though 18 Senators including 7 co-sponsors withdrew their support for PIPA. Someone with a better knowledge of the history of American politics will probably know the correct statistics but this is a massive erosion in the support for a bill. Together with the White House’s stance against the bills in their current versions I believe that there is now a good chance to stop both SOPA and PIPA.
What’s next? First, as Ron Wyden makes clear in his terrific letter to the Internet there is still one more vote coming up on PIPA on January 24th so it is too early to declare victory. Second, it is worth reading the MPAA’s reaction to yesterday’s expressions to see just how cynical their view of what happened is. Third, unless we want a new wave of slightly different versions of these bills following the next election we need to proactively outline an alternative that is not based on government intervention in the Internet. Fourth, and maybe most importantly, we need to start the long work on using the Internet to shift political power back to the voters and away from special interests more generally and not just with respect to bills that directly affect the Internet.
Tags: politics internet
Tuesday, January 17, 2012
Tech Tuesday: Anatomy of a URL
Last week’s overview of “How the Web Works” introduced the URL (Uniform Resource Locator) as the fundamental way things are addressed on the web. Before we pick apart some actual URLs, it is worth looking at the name itself. The promise behind “Uniform” is that this addressing scheme can be used across all kinds of resources and that explains why URLs are so powerful - they can be used to address content such as a blog but also services such as the Twilio telephony API. On the web a blog entry and an incoming phone call are both simply “resources”. That means a resource is a highly abstracted concept and as you will learn if you stick with Tech Tuesday, abstraction is amazingly powerful. And on the web the URL is the most powerful abstraction of them all!
So here is a URL to pick apart:
http://blog.dailylit.com/2012/01/16/in-honor-of-dr-martin-luther-king-jr/
The very first part, the “http:”, indicates which protocol to use to access this resource. What other protocols might we find there? The obvious one is “https:”, the secure (meaning encrypted) version of “http:”. Here are some other protocols that you may have encountered around the web: “mailto:”, which indicates that the resource that follows is an email address and the protocol to speak to it is SMTP, and “ftp:”, for resources that are accessible via File Transfer Protocol (FTP). Another protocol supported by many browsers is “file:”, which means that the resource that follows is a file on the machine on which the browser is running.
Following the “http:” are two forward slashes “//” — these indicate that this URL starts with a domain name, which in this case is “blog.dailylit.com” — we will dissect domain names in more detail in the Tech Tuesday on DNS. There we will investigate the relationship between domains and actual servers but for now it is worth pointing out that grouping resources by domain serves an important trust purpose. Your expectations about accessing content at chase.com are meaningfully different from wepretendtobechase.com. Of course it’s not always that obvious and people go to great lengths to pretend to be someone else. There is a good test of your knowledge of which domains to trust.
Following the domain name is the location of the resource within that domain. This is the “/2012/01/16/in-honor-of-dr-martin-luther-king-jr/” part in the URL above. There are several things going on here that are worth noting. First, this location is structured in an easily human readable and comprehensible form. Just by looking at the URL you can infer that this is a post about Martin Luther King on MLK day. We call this kind of location a “pretty URL.” Having pretty URLs is a good idea not just because it helps humans figure out what they are likely to get when they access the resource but also because search engines, especially Google, make pages with pretty URLs rank higher in search results (assuming that the page content actually appears to be a match for the URL).
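The three parts discussed so far (protocol, domain name and location) can be pulled apart programmatically. This is a small sketch using `urlsplit` from Python's standard library. The choice of Python is an assumption made for illustration; a browser does the equivalent internally.

```python
from urllib.parse import urlsplit

url = "http://blog.dailylit.com/2012/01/16/in-honor-of-dr-martin-luther-king-jr/"
parts = urlsplit(url)

print(parts.scheme)    # the protocol: http
print(parts.hostname)  # the domain name: blog.dailylit.com
print(parts.path)      # the location within the domain

# The path is hierarchical, so it can be broken up at the slashes.
segments = [segment for segment in parts.path.split("/") if segment]
print(segments)
```

The list of path segments starts with the year, month and day, followed by the human readable title of the post, which is what makes this a “pretty URL.”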
But there is even more to a pretty URL like “/2012/01/16/in-honor-of-dr-martin-luther-king-jr/”: the slashes “/” in the URL indicate some notion of hierarchy or of a path to the resource. It also suggests that the following shorter URL should point to something useful: http://blog.dailylit.com/2012. In fact this retrieves all the blog posts from 2012. There is no requirement that the domain fulfilling the request understand this shorter URL, but the fact that it does both corresponds with intuition and allows for additional degrees of automation and discovery. For instance, without any further knowledge you should be able to construct the URL for finding all the blog posts from November 2011. Here it is: http://blog.dailylit.com/2011/11/ . Again, there is no requirement on the server to respond to this with a list of posts and it could instead respond with say a 404 Page Not Found. The http protocol does not speak to this, which is one of its many strengths as it lets the person or organization controlling the resource decide how to respond.
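The kind of automation this hierarchy allows can be sketched in a few lines. The helper name `ancestor_urls` is hypothetical. It simply strips one path segment at a time, and whether a server returns something useful for each shorter URL is entirely up to that server.

```python
from urllib.parse import urlsplit, urlunsplit


def ancestor_urls(url):
    """Return the shorter URLs obtained by dropping trailing path segments."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    shorter = []
    for depth in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:depth]) + "/"
        shorter.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return shorter


post = "http://blog.dailylit.com/2012/01/16/in-honor-of-dr-martin-luther-king-jr/"
for candidate in ancestor_urls(post):
    print(candidate)
```

For the post above this produces the day, month and year URLs in that order. A crawler or a curious reader can try each one, keeping in mind that a 404 is a perfectly legitimate answer.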
Now not every URL starts with a “//” — there are also URLs that don’t contain a domain but instead just a path to a resource. Consider for instance the following http:/ — where the resource pointed to by this URL is located depends on the context in which it is encountered. This is an example of a relative URL. It points to a resource within the context of another resource. If you are reading this in the context of the Tumblr dashboard, the link will take you to your dashboard. If you are reading this on my blog, which is at the domain “continuations.com” it will take you to the home page of my blog. Relative URLs allow for more compact expression of the location of a resource but they can also introduce interesting errors. For instance, think about what resource that relative URL will point to if you simply copy it and send it to someone via email and they open it in a web mail client!
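How a relative URL is resolved against its context can be illustrated with `urljoin` from Python's standard library. The relative URL used here, a single “/”, and all three page addresses are assumptions chosen for the example. The point is only that the same relative URL ends up in a different place depending on where it is encountered.

```python
from urllib.parse import urljoin

relative = "/"  # assumed relative URL: the root of whatever site we are on

# Hypothetical pages on which the same link might be encountered.
contexts = [
    "http://continuations.com/some-post",
    "http://www.tumblr.com/dashboard",
    "https://mail.example.com/inbox/message/42",  # a made-up web mail client
]

resolved = [urljoin(base, relative) for base in contexts]
for base, target in zip(contexts, resolved):
    print(base, "->", target)
```

The last line shows the error mentioned above: copied into an email and opened in a web mail client, the link resolves against the mail site rather than the blog it came from.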
This post is getting quite long and I haven’t yet covered fragment identifiers or query strings. Instead of going on, I will cover fragment identifiers in the context of HTML and query strings when describing how URLs can be used to transmit additional information that can be used by the server in deciding how to respond to the request to the resource, so keep following Tech Tuesday!
Tags: tech_tuesday url web
Monday, January 16, 2012
Covestor: Getting the Message Out!
Our portfolio company Covestor today rolled out a tremendous overhaul of their web presence. The goal was to dramatically simplify Covestor’s message to make it easier for that message to spread and to improve conversion. The team has done an amazing job with a process that used extensive quantitative and qualitative research into consumer reactions to inform the redesign. While it’s too soon to tell how well they will hit their numeric goals, my immediate reaction to the new site is: wow! I encourage everyone who has wondered in the past what Covestor is about to go and check out the new Covestor site.
The overhaul touched every aspect of the site from updating the logo to changing pretty much all the copy. Here is the old logo on the left and the new one on the right:


What is externally visible though is only the tip of the iceberg. Internally, Covestor has done amazing work over several years to be in a position to roll out this new site. For instance, they have recruited hundreds of model managers to the platform. They have also developed a sophisticated risk score to help match models to investor strategies. And they have honed the trade replication engine that powers covesting.
Congrats to the entire Covestor team!
P.S. If you want to know more about Covestor in person, you can meet the team and others interested in covesting at their “Take Stock Mixer” on January 31st from 6-8pm.
Tags: covestor investing
Friday, January 13, 2012
Moving Back to New York City and Homeschooling
In the middle of 2010 we started to seriously consider moving back to New York City. At the time one of the considerations was that it would be possible to experiment with homeschooling the kids. I am excited to report that we are doing both. We have a place in Chelsea that is a short walk from the Union Square Ventures office and almost as importantly around the corner from Murray’s Bagels. We are also homeschooling our kids for at least the next six months. Now the “we” here shouldn’t be read to imply that Susan and I are doing the teaching - we both work full time. Instead, we have worked with Teri and Melissa from QED to recruit some amazing tutors. Susan has all the details on that over at a special blog about our homeschooling experiment.
Tags: new_york_city personal homeschooling