Decentralized Identity and Content Attribution: We Need a Specification

The power of the web has been that it has given us permissionless publishing. Want to publish your opinion? You no longer need to submit a letter to the editor of a newspaper in hope of it being published – you can just put it on your own website. Through additions to the original web specifications we have managed to extend this permissionless nature of innovation from content to entire applications.

But the web specifications have been missing a crucial ingredient. We don’t have a standard way of attributing content. For instance, the content here on Continuations is Creative Commons licensed. The particular license I have chosen allows non-commercial redistribution and also remixing and transformation of the content.

Let’s take the simplest example of republishing an entire post. One way to deal with attribution is to require a link to the original. That works somewhat as long as the original persists and as long as there is an original in the first place. But one can easily imagine a situation where I write something for another publication (not on Continuations) and still want that to be attributed to me.

Here too decentralized identity, such as Blockchain ID, could play a critical role. What I am envisioning is a spec that allows content to have one or more associated Blockchain IDs. It might look something like this:

id goes here
signature data goes here
BY-NC-SA 4.0

…. actual content goes here …

The key idea here is that each Blockchain ID comes equipped with a key pair (and more key pairs can be attached to it). That allows a creator to sign their content. By specifying an attribution type, this kind of spec could also support any other ID that has a notion of key pairs and can be publicly verified (although Blockchain ID has yet another advantage as we will see shortly).

The spec would provide for different methods of signing and for first creating a canonical version of the content. For example, for HTML content a first cut at canonicalization might involve stripping all the HTML tags and whitespace from the content and then signing only the remaining text (note: the canonicalization should include the license so that it can’t be changed).
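
As a rough illustration of that first cut, here is a minimal Python sketch using only the standard library. The function names (canonicalize, content_digest), the choice of SHA-256, and the way the license is prepended are all assumptions made for this example and are not part of any existing spec. The digest is what a creator would then sign with their key.

```python
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def canonicalize(html: str, license_id: str) -> bytes:
    """Hypothetical canonical form: the license, a newline, then the text
    with all tags and all whitespace removed."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes whitespace entirely.
    text = "".join("".join(extractor.parts).split())
    # Including the license in the signed bytes means it cannot be swapped
    # without invalidating the signature.
    return (license_id + "\n" + text).encode("utf-8")


def content_digest(html: str, license_id: str) -> str:
    """SHA-256 hex digest of the canonical form (the hash is an assumption)."""
    return hashlib.sha256(canonicalize(html, license_id)).hexdigest()
```

With this sketch, the same words wrapped in different markup produce the same digest, while changing the license string produces a different one. A real canonicalization would need more care, for instance around script and style elements, which this version treats as ordinary text.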

What would such a specification be good for? To begin with, no matter where you found content in this format you would be able to verify who created it. Web browsers could do this in an automated fashion by verifying the signature attached to each piece of content against the attributed ID. By virtue of including a license you would also know what you can do with the content.
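
The check a browser would run could look roughly like the following Python sketch. Everything named here is hypothetical: AttributedContent, lookup_public_key and verify_signature are placeholders invented for the example. In particular, the signature scheme and the way a public key is resolved from an ID are left as functions passed in by the caller, since the spec imagined here does not exist yet and this sketch does not assume any particular ID system's API.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttributedContent:
    """Hypothetical in-memory form of one attributed piece of content."""
    blockchain_id: str
    signature: bytes
    license_id: str
    canonical_text: str  # content after canonicalization


def verify_attribution(
    item: AttributedContent,
    lookup_public_key: Callable[[str], Optional[bytes]],
    verify_signature: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Return True only if the signature checks out against the attributed ID.

    lookup_public_key(id) is assumed to return the ID's public key, or None
    if the ID is unknown. verify_signature(public_key, digest, signature) is
    assumed to wrap whatever signature scheme the spec ends up choosing.
    """
    public_key = lookup_public_key(item.blockchain_id)
    if public_key is None:
        return False
    # The license is part of the signed message, so altering it fails the check.
    message = (item.license_id + "\n" + item.canonical_text).encode("utf-8")
    digest = hashlib.sha256(message).digest()
    return verify_signature(public_key, digest, item.signature)
```

Keeping the key lookup and the signature check as injected functions mirrors the idea that other publicly verifiable IDs with key pairs could be supported alongside Blockchain ID.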

There is no attempt at DRM here. You could still rip the content portion and then do with it what you wanted. Of course, the more content gets published in this format, the easier it will be to create automated systems that track down content that has been ripped or is likely to have been ripped.

Having strong attribution also allows novel payment systems for content to be created. It would make possible subscriptions either to individuals or to content of some specified quality level. For individuals, I could simply instruct my browser to pay some number of BTC every time I consume content that is attributed to a specific Blockchain ID (this is the added benefit of Blockchain IDs – they come bundled with bitcoin addresses – the result would be similar to a decentralized version of Patreon). Or I could instruct my browser to pay some number of BTC every time I consume content attributed to someone who has a quality score above a threshold that I have set using a third-party rating service (this would be similar to a decentralized version of Netflix or Spotify, and there could be multiple competing rating services for the same content type). Both of these approaches help overcome the problem with most micropayment schemes that I have written about.
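
To make the two kinds of instruction concrete, here is a small Python sketch of how a browser might decide what to pay for one piece of content. PaymentPolicy and payment_due are names made up for this example. Amounts are held as integer satoshis (1 BTC is 100,000,000 satoshis) to avoid floating-point rounding, and the quality score is assumed to come from whichever rating service the reader has chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PaymentPolicy:
    """Hypothetical reader-side payment rules held by the browser."""
    # Per-creator rule: Blockchain ID -> satoshis to pay per item consumed.
    per_creator: Dict[str, int] = field(default_factory=dict)
    # Quality rule: pay quality_payment when the score exceeds the threshold.
    quality_threshold: Optional[float] = None
    quality_payment: int = 0


def payment_due(
    policy: PaymentPolicy,
    blockchain_id: str,
    quality_score: Optional[float] = None,
) -> int:
    """Satoshis owed for one item attributed to blockchain_id.

    Assumes the attribution has already been verified. An explicit
    per-creator rule takes precedence over the quality rule.
    """
    if blockchain_id in policy.per_creator:
        return policy.per_creator[blockchain_id]
    if (
        policy.quality_threshold is not None
        and quality_score is not None
        and quality_score > policy.quality_threshold
    ):
        return policy.quality_payment
    return 0
```

Giving the per-creator rule precedence is just one possible design choice; a reader might equally want the two rules to add up.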

There are a great many problems that need to be addressed which are not covered in my example, such as whether these should be tags from a custom HTML namespace or use something more along the lines of microformats. There is the question of how to attribute work to multiple individuals, how to deal with links to original work for derivative work, and how to deal with content other than text, say images, videos or music. But just getting started with single-author text content and making it work might be good enough to get some adoption going. After all, the web specs we use today have come a long way too.