The Next Frontier for Peer Production: Open Machine Learning Services

Part of what makes Google such a formidable company is its access to vast amounts of data combined with top talent in machine learning. For instance, every time someone picks a specific link from a Google results page, that action makes Google just a tiny bit smarter about the relevance of that result for the particular query (for this type of person). As a shareholder in Google, I have been very happy with the returns this has produced. But from an overall social progress and market structure perspective, I would be thrilled to see either Google or another company open up some machine learning services to third parties.

What do I mean by that? For instance, knowledge of the English language (or other languages for that matter) lies at the heart of many interesting services. By analyzing vast amounts of written text on the web using neural nets and other techniques, Google has accumulated an incredible understanding of language. This gets surfaced, as one example, in the predictive typing feature in Android. It fits the “any sufficiently advanced technology is indistinguishable from magic” quote. When I type, Android often presents the entire next word before I have typed a single letter, just based on the words that have come before. If you want to see an example of how much can be learned about the implicit structure of language this way, you can examine a map of the 2,500 most common English words that has come out of a research project.
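To make the idea of predicting the next word from the preceding ones concrete, here is a minimal sketch of a bigram predictor. It is a toy illustration only and is not how Android's feature works: the function names `train_bigrams` and `predict_next` and the tiny corpus are made up for this example, and a real system would learn from vastly more text with far richer models.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(following, prev_word, k=3):
    """Return up to k of the most frequent continuations of prev_word."""
    counts = following.get(prev_word.lower(), Counter())
    return [word for word, _ in counts.most_common(k)]

# Hypothetical toy corpus; the predictions are only as good as the data.
corpus = "thank you very much . thank you for coming . thank you very kindly"
model = train_bigrams(corpus)
print(predict_next(model, "thank"))
print(predict_next(model, "you"))
```

Even this crude counting scheme suggests a whole word before a single letter is typed, which is why more data contributed back into such a service would directly improve it.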

Now imagine how much interesting innovation could be unlocked if these types of machine learning services were broadly available to third parties. This could happen either by Google opening up APIs or by third parties emerging. Ideally, many of these services would be (mostly) free with a quid pro quo of contributing data back into them. Why mostly free? Because the services themselves benefit from scale and because many use cases might be monetized only mildly or not at all. For instance, it is easy to think of awesome educational use cases for deep understanding of language, such as helping students grow their active vocabulary by suggesting alternative words and phrases in the context of writing (rather than learning them separately). Also, as we have seen from Wikipedia, peer production tends to flourish more easily in environments without strong monetary incentives.

Once we start to think about these kinds of open machine learning services, there is suddenly a new frontier for what is possible with peer production. Consider for example a differential diagnosis service. Given a set of symptoms, it would provide predictions of possible conditions and suggest further questions or tests to help narrow down the diagnosis. Building such a service is entirely within our technical capabilities. The question now is not whether it can be built, but how it will be built in a way that provides for wide access and rapid data growth. It might, in fact, not be a single service but a combination of separate services, such as one to classify images of skin conditions. If these services are built in an open fashion, they will be able to work together to provide more powerful results. Going back to the language example for a second: a phoneme classifier works much better in conjunction with a word predictor than by itself. Because many phonemes sound so similar, telling them apart depends on context. Similarly, classifying images of skin conditions can benefit from knowing other symptoms and vice versa.
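The point about combining services can be sketched in a few lines. The snippet below is a hedged illustration, not a description of any real system: it assumes one service returns a likelihood for each candidate word based on sound alone, a second returns a probability for each word given the preceding text, and the two are combined by multiplying and renormalizing (a simple Bayesian-style combination). The function name `combine` and all of the numbers are invented for the example.

```python
def combine(acoustic, context):
    """Multiply per-word acoustic scores by context priors, then renormalize."""
    joint = {word: score * context.get(word, 0.0) for word, score in acoustic.items()}
    total = sum(joint.values())
    if total == 0:
        return joint  # no candidate is supported by both sources
    return {word: p / total for word, p in joint.items()}

# Hypothetical scores: by sound alone, "pat" narrowly beats "bat".
acoustic = {"bat": 0.45, "pat": 0.55}
# Hypothetical word predictor output after the words "he swung the".
context = {"bat": 0.9, "pat": 0.1}

posterior = combine(acoustic, context)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))
```

With these made-up numbers the sound-only guess is "pat", but the combined guess flips to "bat". The same pattern would apply to pairing a skin-image classifier with a symptom-based predictor, provided both expose their scores in an open way.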

I should point out that I don’t believe these services need to be provided by not-for-profits. They simply need to be provided by companies that recognize the importance of scale and know how to foster peer production. I would love nothing more than to see a series of B corporations that are dedicated to building and operating open machine learning services across many different domains.