China is going through a crackdown on social media and is apparently now requiring verified real identities online. This is very clearly an attempt to silence criticism and suppress unrest by making it easy to track down who said what online. I am strongly in support of pseudonyms, which is one of several reasons why I prefer Twitter to Facebook. It is also why recently proposed bills to outlaw online impersonation are potentially problematic.
But what about going the opposite direction? What about building out crypto-based systems that let individuals separate themselves entirely from their speech to provide real anonymity? On one hand I am sympathetic to the need for such systems in dictatorships. But I continue to worry that broader use and dissemination of crypto as in, say, Silent Circle and Bitcoin will lead us down a path towards a spy-vs-spy society everywhere.
If we want governments to be more transparent shouldn’t we be living that change already? If we embrace crypto everywhere we are just providing ammunition to those who say that government agencies need more powers and money and less supervision in snooping on us.
So what is the balance that I have in mind? An open society where you can create a pseudonymous online account (Twitter and elsewhere) and have the expectation that your service provider will not hand data to the government or others that can be used to reveal your true identity without proper checks and balances. It should require presenting evidence to an independent court that access to this data is needed. That’s why we badly need an update to the Electronic Communications Privacy Act and why the renewal of FISA without supervision is such a disappointment.
About 3 years ago, I wrote how phone numbers might be an important part of mobile identity. I started to change my mind on this last year. Now I am thinking that phone numbers may in fact be approaching the end of their useful life much more rapidly.
Why? Because of my recent experience in London with a data-only SIM card. Having tired of crazy international roaming charges, I picked up a 1 GB data-only SIM card from a vending machine at Heathrow for 20 pounds and popped it into my phone.
I happily used foursquare, Twitter, kik and email all day long. But I was wondering how others would reach me who are not yet connected to me. I had a couple of different options. I could run Skype for voice. I could publicize my kik handle. Fortunately, I am “albertwenger” on both services so people could also just guess.
I believe that there are a bunch of ways this could go. There could be a dominant global provider of a unified namespace for people if one of the existing big guys like Facebook or Twitter or one of the up and comers like kik becomes ubiquitous. Or there could be one or more providers of proprietary “phonebooks” that make it possible for people to let themselves be found on the services they want to be found on. Or a new standard could emerge that is the equivalent of DNS but just for individuals.
Would love to know what others are thinking about this. How do you think of your mobile identity? Still phone number first or some handle? And how would you feel about a phone book approach that lets you register multiple handles? Should that be an open standard?
Companies across our portfolio are spending a lot of resources on suppressing various types of undesirable behavior ranging from comment spam to outright financial fraud. Much of this could be avoided through a probabilistic identity system. I have written about this idea before, but nobody has cracked the nut on it and the problem has only grown since. The solution to identity on the Internet is not to try for certainty. Systems aiming for certainty will fail because they need to impose too many restrictions on users (the same effect that DRM has had on the user experience created by publishers and the music industry).
Instead, what is needed is a service that takes many signals as inputs, including Facebook, Twitter, Disqus, general web presence, and so on. That may be enough for a new social media service to establish whether someone is a real user or just a rapidly created fake account. For services such as banking that need higher degrees of certainty, it should be possible given these inputs to dynamically generate a few questions for the user to answer. Questions might be like “which of these three pictures did you publish to Twitter?” or “which of these three bands do you like best?”
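A minimal sketch of how such a service might combine signals, assuming a simple log-odds model. The signal names, the weights, and the `identity_score` and `needs_extra_questions` helpers are all hypothetical and chosen only for illustration; a real service would have to estimate weights from data.

```python
import math

# Hypothetical signals and weights (log-likelihood ratios in favour of
# "real user"). The numbers are illustrative assumptions, not measurements.
SIGNAL_WEIGHTS = {
    "facebook_account_older_than_1y": 1.5,
    "twitter_has_mutual_followers": 1.0,
    "disqus_comment_history": 0.7,
    "general_web_presence": 0.5,
}


def identity_score(signals, prior=0.5):
    """Estimate the probability that an account belongs to a real user.

    `signals` maps a signal name to True (observed) or False (checked and
    absent). Signals that are not in the dict are unknown and ignored.
    """
    log_odds = math.log(prior / (1 - prior))
    for name, present in signals.items():
        weight = SIGNAL_WEIGHTS.get(name, 0.0)
        log_odds += weight if present else -weight
    return 1 / (1 + math.exp(-log_odds))


def needs_extra_questions(score, required):
    """True if the relying service should ask follow-up questions."""
    return score < required


observed = {
    "facebook_account_older_than_1y": True,
    "twitter_has_mutual_followers": True,
    "disqus_comment_history": False,
}
score = identity_score(observed)
# A social service might accept a modest score; a bank would demand more
# and fall back to dynamically generated questions.
print(round(score, 3), needs_extra_questions(score, required=0.99))
```

The point of the sketch is only that the output is a probability, and that each relying service picks its own threshold rather than demanding certainty.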
The best product in this direction seems to be Lexis Nexis InstantID which draws mostly on public records about where people have lived. Adding that and additional offline data into a web service could further improve its accuracy. If this kind of service were reasonably priced, I am convinced that it would find massive adoption.
The question as to what should represent people online has generated some mighty fine posts and comments recently. Here is a short list of people strongly supporting some form of pseudonymity or even anonymity:
- Caterina Fake: Anonymity and Pseudonyms in Social Software
- Andy Weissman: Everybody wants to be special here (in praise of pseudonymity)
- Jyri Engeström: untitled G+ post
- Chris Poole / moot: 4chan Creator Doubles Down on Web Anonymity with Canvas (via The Atlantic)
- Jillian York: A Case for Pseudonyms (on the EFF blog)
- Fred Wilson: Are Real Names Required For Real Socializing
I firmly agree with the need for both pseudonyms and even anonymous expression on the Internet. That does not mean that every service has to support it. That should be the choice of the service provider and if someone like Facebook wants to have only real names they should be able to do that.
As it turns out though, the potential to construct any kind of anonymous service is under attack at this very moment with HR1981, another over-reaching bill coming out of Washington. The bill should more aptly be numbered as HR1984 because it combines a motherhood and apple-pie title “Protecting Children from Internet Pornographers” (who wouldn’t want to do that?) with far reaching data retention requirements for ISPs.
In essence, HR1981 would require ISPs to maintain for each account (which of course has real identity and billing information), an 18-month record of the assigned IP addresses. Because the IP address is visible to every service, any kind of breach or government access to the ISP data has the potential to completely de-anonymize the users.
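To make the risk concrete, here is a rough sketch of the join that retained ISP records would allow. Both logs, the names in them, and the `deanonymize` function are made up for illustration (the IP address is from a range reserved for documentation).

```python
from datetime import datetime

# Hypothetical ISP retention log: (subscriber, ip, lease_start, lease_end).
ISP_LOG = [
    ("Alice Example", "203.0.113.7",
     datetime(2011, 7, 1, 8), datetime(2011, 7, 1, 20)),
    ("Bob Example", "203.0.113.7",
     datetime(2011, 7, 1, 20), datetime(2011, 7, 2, 8)),
]

# Hypothetical server log of a pseudonymous service: (handle, ip, seen_at).
SERVICE_LOG = [
    ("quiet_critic", "203.0.113.7", datetime(2011, 7, 1, 21, 30)),
]


def deanonymize(service_log, isp_log):
    """Match each handle to the subscriber who held its IP at that time."""
    matches = {}
    for handle, ip, seen_at in service_log:
        for subscriber, leased_ip, start, end in isp_log:
            if ip == leased_ip and start <= seen_at < end:
                matches[handle] = subscriber
    return matches


print(deanonymize(SERVICE_LOG, ISP_LOG))
```

Without the retained lease history the IP address alone is ambiguous, since the same address is handed to different subscribers over time. With it, the match is a trivial lookup.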
I have used PopVox to express my opposition to HR1981 and encourage everyone else to do the same (or write directly to your representative).
Last night Susan and I watched “Catfish,” a movie from last year about a case of a fabricated online identity. The movie starts with a young man who is engaged in an online correspondence with what he believes to be a young girl who appears to be a painting prodigy. He also chats with the girl’s mother and carries on several increasingly steamy exchanges with her older sister. [Spoiler Alert!] Eventually, a variety of inconsistencies surface and he becomes suspicious. Together with his brother, who is documenting everything, and another friend they go on a roadtrip to find the family. It turns out that all of his correspondence is with a middle aged woman who has manufactured the other characters including a variety of online friends for them. She does have a young daughter but the paintings are all hers.
The movie is in the style of a documentary and there is some considerable debate online about whether or not it is an actual documentary or just a complete fabrication. The possibility that it is fabricated of course neatly parallels the plot line of the movie itself (something that seems lost on most people commenting about this online). We live in a digital age where it is not safe to assume that anything is what it purports to be. There is no certainty only different probabilities.
This has important implications for identity. We have to embrace a probabilistic identity online. On one extreme are people we interact with in person every day which is as close to “known” identity as we can get. On the other are anonymous comments on blogs. Everything in-between is a continuum. Just because someone has pictures of “themselves” and friends online doesn’t mean they are who they claim to be. Aside from Catfish, the recent exposure of two lesbian female online personas as fictitious and created by two males provides plenty of evidence. Those cases also make clear though that the data is out there that would be needed to assign a probability score to identities. Much of it is buried inside of systems of companies (most notably Facebook), but quite a bit of it is accessible online. Enough so that a bit of sleuthing by some folks who were willing to dig resulted in not just questioning the validity of these personas but also locating the people behind them. New services, such as Qwerly, Klout or PeerIndex and others yet to emerge might pull the information together in a way that lets them provide an explicit score.
I used to think that (mobile) phone numbers might play an important role in identity as they are quite stable for people. For instance, I have had my current number since 1999. But now I am wondering whether phone numbers will become meaningless entirely. I receive very few calls that I did not ask for on my mobile; most calls instead are pre-arranged via email. For any pre-arranged call, I could eventually use some form of IP telephony that will be namespace based (e.g., I already often do these kinds of calls via Skype).
That leaves people who need to reach me in an “emergency” (e.g., the kids’ school when one of them shows up at the school nurse). If I got a new phone number it wouldn’t be that hard to think about the half dozen places or so that need to know about it. Ideally at some point in the future though I could simply tell the school to call me at “albertwenger.”
In a way phone numbers are IP addresses for people. If we had the web services equivalent of a gigantic telephone book (with some kind of permissioning) phone numbers would be pretty meaningless. That transition is, however, likely to take a very long time because we are missing the right standards. We need something like DNS and SMTP but aimed at the individual namespace and real-time communication. Paging DARPA?
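As a rough illustration of the "telephone book with permissioning" idea, here is a toy in-memory directory. The `PeopleDirectory` class, the endpoint format, and the caller labels are assumptions made for this sketch; no such standard exists, which is exactly the gap described above.

```python
class PeopleDirectory:
    """Toy name-to-endpoint directory with a per-entry allow list."""

    def __init__(self):
        # name -> {"endpoints": {...}, "allowed": set of caller names}
        self._entries = {}

    def publish(self, name, endpoints, allowed_callers):
        """Register how `name` can be reached and who may look it up."""
        self._entries[name] = {
            "endpoints": dict(endpoints),
            "allowed": set(allowed_callers),
        }

    def lookup(self, name, caller):
        """Return the endpoints for `name`, or None if not permitted."""
        entry = self._entries.get(name)
        if entry is None or caller not in entry["allowed"]:
            return None
        return entry["endpoints"]


directory = PeopleDirectory()
directory.publish(
    "albertwenger",
    {"voice": "skype:albertwenger"},  # hypothetical endpoint format
    allowed_callers={"kids-school"},
)
print(directory.lookup("albertwenger", caller="kids-school"))
print(directory.lookup("albertwenger", caller="telemarketer"))
```

The name stays stable while the endpoints behind it can change, which is the same indirection DNS provides between domain names and IP addresses.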
So given the previous parts of this series (1, 2, 3), what might an alternative solution to the global namespace for people look like? First, we should have some criteria for how the system operates. Here are the ones that come to mind:
- Secure decentralized operation that’s not controlled by a single entity
- Human readable/memorable names
- Globally unique
Until yesterday I was blissfully unaware that there has been a bunch of work on this starting with an assertion by Zooko (talk about a unique name) that you can’t actually have all three in a naming system, followed by a post from yesterday (!) by Aaron Swartz proposing a solution (thanks to e.p.c. for pointing me to this). I have read Aaron’s post a couple of times but that’s only made me realize how much crypto background I lack to judge its merit. So rather than spend more time on that let me flesh out a bit of an alternative model.
We are a bit shy of 7 billion people in the world. Apparently, U.S. Census Bureau statistics suggest that there are about 150K different last names and about 5K different first names in common use in the United States. So from that alone one could theoretically generate 750 million unique combinations. Throw in a single middle initial and you’d be at 25 times that or 18.75 billion. Obviously people’s names are already distributed but this is just to try to get a handle on size (there are, for instance, also > 150K different words in the English language).
Another way to estimate how many readable/memorable usernames one could generate is by looking at the entropy rate of the English language, which is around 1 to 1.5 bits per letter. Now I think it is safe to assume that by throwing in digits and the generally somewhat higher unpredictability of the next character in a username (based on the previous characters) we could have an entropy rate of 2 bits. With let’s say up to 20 characters per username that would be 40 bits or over 1 trillion usernames. In reality many usernames will be shorter, so the actual number is probably lower (but then again maybe the entropy rate is even higher). In any case I think this is close enough to support the idea that human readable/memorable and globally unique are not at odds with each other (btw, I don’t think that the namespace needs to be restricted to English at all; I am simply using numbers on English to show that it’s not a crazy idea).
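The entropy estimate can be checked in a few lines. The 2 bits per character and the 20-character length are the assumptions from the paragraph above; `namespace_size` is just a helper name for this sketch.

```python
BITS_PER_CHAR = 2      # assumed entropy rate of a username, in bits
MAX_LENGTH = 20        # assumed maximum username length, in characters
WORLD_POPULATION = 7_000_000_000


def namespace_size(bits_per_char, length):
    """Number of distinct names carrying bits_per_char * length bits."""
    return 2 ** (bits_per_char * length)


total = namespace_size(BITS_PER_CHAR, MAX_LENGTH)
print(total)                      # 1099511627776, i.e. about 1.1 trillion
print(total // WORLD_POPULATION)  # 157 usernames per person
```

Shorter names shrink the space quickly (10 characters at the same rate gives only about a million names), which is why the caveat about typical username length matters.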
So what about passing these usernames out and managing them in a distributed manner? I think a system modeled on domain registrations would make sense. First, you are issued a globally unique numeric UserID by a registrar (more on registrars later). Then you can log in at any point and pick a globally unique human readable username for your UserID. Having both UserIDs and usernames would allow users to change their usernames. The registrars are in charge of assuring global uniqueness of both UserIDs and usernames and also add your record to the GPN (the global people namespace). This mirrors a bit the relationship between domain names and IP addresses.
Each record in the GPN would consist of at least the following (UserID, username, authprovider, authprotocol). The latter two would be how people actually use their usernames when signing up for a service — generally this will initially be the registrar itself but the user should be able to move that to other providers. You simply enter your globally unique username and hit a generic “Register” button, which using your record in the GPN can figure out where to send you to authenticate.
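Here is a minimal single-process sketch of the record and the three operations described above: issuing a UserID, changing a username, and resolving a username to an auth provider. The class and method names, the provider host, and the protocol value are illustrative assumptions. A real GPN would be distributed across registrars, which this toy deliberately ignores.

```python
from dataclasses import dataclass


@dataclass
class GPNRecord:
    user_id: int          # globally unique, never changes
    username: str         # globally unique, can be changed by the user
    auth_provider: str    # where to send the user to authenticate
    auth_protocol: str    # how to talk to that provider


class GlobalPeopleNamespace:
    """Toy in-memory stand-in for the GPN (not distributed)."""

    def __init__(self):
        self._by_id = {}
        self._id_by_name = {}
        self._next_id = 1

    def register(self, username, auth_provider, auth_protocol):
        """Issue a new UserID and claim `username` for it."""
        if username in self._id_by_name:
            raise ValueError(f"username already taken: {username}")
        record = GPNRecord(self._next_id, username,
                           auth_provider, auth_protocol)
        self._by_id[record.user_id] = record
        self._id_by_name[username] = record.user_id
        self._next_id += 1
        return record

    def rename(self, user_id, new_username):
        """Change the username while keeping the UserID stable."""
        if new_username in self._id_by_name:
            raise ValueError(f"username already taken: {new_username}")
        record = self._by_id[user_id]
        del self._id_by_name[record.username]
        record.username = new_username
        self._id_by_name[new_username] = user_id

    def resolve(self, username):
        """What a generic "Register" button would look up."""
        return self._by_id[self._id_by_name[username]]


gpn = GlobalPeopleNamespace()
me = gpn.register("albertwenger", "registrar.example", "openid")
print(gpn.resolve("albertwenger").auth_provider)
```

Because relying services can key on `user_id`, a rename does not break existing accounts, much as changing a DNS record does not change the domain name people type.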
I think one reasonable objection to all of this would be that it is simply too late. The cat’s out of the bag. People already have usernames on tons of services. But I don’t think we should give up, at least not yet (apparently Jeff Atwood thinks so also). If a system such as this came into being in the next couple of years and was launched with some of the currently biggest username providers as initial registrars, then many people could get exactly the username they already have on these (and many other) services. That would allow the existing namespaces to be converged with the GPN. What would be needed for all of this is someone like ICANN (ICANN itself?) to figure out a scheme for how UserIDs get partitioned across registrars and what database mechanism to use for assuring global uniqueness of usernames. The model for the database itself should most likely be DNS (someone with more in depth knowledge of DNS might be able to tell whether DNS itself could be used as it is today).
I would love to hear from folks whether they think this is completely crazy in being undesirable (e.g., because not entirely distributed), not technically feasible, not politically/commercially accomplishable, etc.
P.S. Since people are likely to bring it up: webfinger may be a much more pragmatic way to get to a similar place in the end. It does, however, give up on a single global namespace for people that is separate from the domain name system (and with that most likely gives up on individuals truly controlling their usernames).
I ended yesterday’s post by saying that I would write about potential solutions to the global namespace problem for people (parts 1 and 2). So here we go with a post on what seems to be happening (to be followed by a final post on possible alternatives).
The de facto solution for an increasing number of services appears to be delegating their namespace to Facebook or Twitter or both. Facebook originally did not have usernames but only real names. The problem with that approach was that the URLs for profile pages were based on User IDs, which don’t look good, aren’t memorable and don’t SEO at all. So now I am albertwenger on Facebook as well. I believe usernames are still optional on Facebook, although this is not entirely clear from the relevant help page (does anyone know the answer to this?).
For relying services there are a few issues. First, it looks like there could be at least two de facto namespaces (Facebook, Twitter) and there might be a couple more that matter (e.g., Skype). As a kludge, one could “subnamespace” by having URLs of the form service.com/twitter/username but then one still has to solve how to show usernames. One option for display (e.g., Quora) is to display real names everywhere and try to provide additional information to help with disambiguation. Another would be to continue the kludge and show usernames of the form username.tw or username.fb which would be pretty ugly (of course one could restrict it to usernames that would otherwise have a conflict). Alternatively, for presentation one could punt altogether and let username conflicts exist and rely on avatars and other additional information for disambiguation. Or one could decide to just force everyone through a single service.
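The "suffix only on conflict" variant could look roughly like this. The `.tw` and `.fb` suffixes come from the text; the `display_names` function and its input format are assumptions for the sketch.

```python
from collections import Counter

# Suffixes as suggested in the text above.
SUFFIX = {"twitter": "tw", "facebook": "fb"}


def display_names(accounts):
    """Return one display string per (provider, username) pair.

    A suffix is appended only when the same username arrives from more
    than one provider, so most users keep a clean name.
    """
    counts = Counter(username for _, username in accounts)
    return [
        f"{username}.{SUFFIX[provider]}" if counts[username] > 1 else username
        for provider, username in accounts
    ]


accounts = [
    ("twitter", "sam"),
    ("facebook", "sam"),
    ("twitter", "albertwenger"),
]
print(display_names(accounts))  # ['sam.tw', 'sam.fb', 'albertwenger']
```

One drawback visible even in this toy: a user's display name can change when someone else with the same username joins from a different provider.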
A second problem for relying services is how to deal with changes in usernames. Facebook allows a single change in username. Twitter allows ongoing changes. I don’t think either has implemented a protocol that would call relying parties to inform them of username changes. Instead a service has to implement logic for tracking the change based on the underlying user ids as users come back to the service.
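A sketch of that tracking logic, assuming the provider hands back a stable numeric user id along with the current username at each login. The `RelyingService` class and its `on_login` method are hypothetical names, not part of any Facebook or Twitter API.

```python
class RelyingService:
    """Detects upstream username changes by keying on the stable user id."""

    def __init__(self):
        # (provider, provider_user_id) -> username seen at the last login
        self._username_by_key = {}

    def on_login(self, provider, provider_user_id, current_username):
        """Record the login; return the old username if it has changed."""
        key = (provider, provider_user_id)
        previous = self._username_by_key.get(key)
        self._username_by_key[key] = current_username
        if previous is not None and previous != current_username:
            return previous
        return None


service = RelyingService()
service.on_login("twitter", 42, "old_handle")
renamed_from = service.on_login("twitter", 42, "new_handle")
print(renamed_from)  # old_handle
```

The weakness is that the change is only noticed when the user comes back, so profile URLs and mentions can stay stale in the meantime.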
A third problem is whether or not to choose complete delegation in the first place. What about users that would just want to sign up for the service and either don’t have a Facebook or Twitter account or don’t want the linkage that is implied by delegating the namespace? Should a service support separate accounts and usernames at all (opening another door for username conflicts) or force everyone to come in through a service with a namespace? There is of course also the question of the dependency on a commercial third party that is created by delegating one’s namespace. While this may be somewhat hypothetical, it is not entirely clear who “owns” the username: the person who created it? The service where it originated? Both?
Would love to hear from folks whether they think these are real problems with the current de facto solution and how services should deal with them (ideally with examples of good or bad implementations).
Yesterday, I started a short series of posts on the global namespace for consumer web services. My quick and highly unscientific survey of readers showed a surprisingly high usage of usernames based on real names. I suspect that to be highly skewed by the audience for my blog.
An important historic alternative has been to pick a distinctive username that bears no relationship to one’s first name and/or last name. Such user “handles” go back at least as far as CB Radio, dial-up bulletin board services and a bit later services such as IRC. In a pre-web world, each of those networks was closed and separate and there was no notion of search or discovery across them. At that time, there was also a seemingly clear separation between real space and cyberspace and for a long time separate offline and online “personas” were relatively easy to maintain.
That separation seems to be collapsing now as services are increasingly becoming social. If I want to find my real world friends on a service I need to be able to recognize them. Mostly this is accomplished by comparing the user base of a new service to some existing social graph, which could be in the form of one’s email contacts, Facebook friends or Twitter follows. In all of these cases once I discover potential matches, the actual likelihood of recognition is increased by the use of real names and — as several people pointed out in yesterday’s comments — recognizable and consistent avatars.
Add to this the overall explosion of consumer services that all require a username because they have “social” features and want to drive user engagement (even something minimal such as “favorites” generally requires a named user profile) and I believe that we have a global personal namespace problem. In fact, there is some indication that we already have a sufficient level of “username anxiety” that some people drop out of registering for a new service if they are asked to pick a username. I assume that is because they are worried about having to come up with yet another username should their desired one be already taken.
Artists have faced some of the same problems of needing a unique yet memorable name for a much longer time and stage names seem to be a common response. Quick and for 10 points, what is Lady Gaga’s real name? Or Prince’s? I am sure some people know this, but I had to search for both.
In the next post, I am planning to look at different emerging solutions to this global namespace problem, including the use of Twitter and Facebook as de-facto namespace standards.