Having grown up in Germany, I am well acquainted with “Datenschutz” or literally “data protection” — a set of attitudes, laws and regulations around how companies should deal with data collected from individuals. In the wake of the NSA disclosures there has been a lot of debate about how to further strengthen these protections. Much of this is well meaning but it seems to me a classic case of the road to hell being paved with good intentions.
We can never perfectly protect data nor should we try to. Perfect data protection is a physical impossibility — even black holes radiate information as Stephen Hawking famously figured out. Yes we can encrypt data as it travels over the network or is stored on disk. And yes, we may even be able to do some analysis while retaining encryption (due to the marvels of homomorphic encryption). But that does not really solve the issue — it merely shifts it to the new issue of the management of keys, which are themselves data that needs to be kept somewhere.
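To make the homomorphic encryption point concrete, here is a toy illustration (emphatically not secure crypto — the key is tiny and there is no padding): textbook RSA happens to be multiplicatively homomorphic, so a server can multiply two encrypted numbers without ever decrypting them.

```python
# Toy illustration of a homomorphic property (NOT real-world crypto):
# unpadded textbook RSA is multiplicatively homomorphic, so a server can
# multiply two values while only ever seeing their ciphertexts.

n, e = 3233, 17   # demo public key: n = 61 * 53
d = 2753          # matching private exponent: e * d = 1 (mod 3120)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 6, 7
# Multiply the ciphertexts only; the plaintexts never appear here.
product_cipher = (encrypt(a) * encrypt(b)) % n
print(decrypt(product_cipher))  # prints 42, i.e. 6 * 7
```

The catch, as the paragraph above notes, is that someone still holds `d` — the private key is itself data that has to live somewhere.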
So in the end, there will always be leaks of data. Whether that’s because someone broke into the system and copied the keys, or socially engineered their way into the system, or intercepted the data as it is being presented to end users (which is the hole for all cryptography just as it is for all DRM) doesn’t matter. It is not a question of if a million patient records will be leaked. It is a question of when.
Does that mean we shouldn’t encrypt at all? No (at least not for now and probably not for quite some time). We tend to lock the front doors to our homes and close the windows while we are away. That to me is the equivalent of practices such as connecting to your bank, or email, or healthcare over an encrypted connection. We are making it just hard enough so that economic conditions, social norms and laws and their enforcement can do the rest. I am listing a whole bunch of different things here because they all work together to give us the relatively low property crime environment we currently live in.
Let’s come back to the road to hell part of data protection. Why and how could more data protection come to hurt us? First, by messing up the Internet. I was surprised to hear a member of the Chaos Computer Club endorse European data domicile regulation. This re-imposes the old geographic boundaries and divides on the Internet and thus reinforces existing political power structures rather than disrupting them.
Second, by providing an argument for and movement towards “trusted chip” architectures — the idea that, in order to solve the infinite recursion of keys being again just data that needs to be stored somehow, we will store the keys in hardware that we can trust. That too has a way of concentrating instead of diffusing power. Why? Because it is exactly what is necessary for hardware to be locked down and for vendors to control what you can do on a device that you have purchased.
Third, it further cements existing information asymmetries. Companies and governments that already have a lot of secrets will find it easier to keep them, not harder. Somehow the proponents of these new rules seem to ignore that they will also help to better protect corruption, abuse of power, and the like.
Fourth, it takes our eyes off the ball from everything that could be gained by sharing more data and from creating the economic and social conditions, plus laws and regulations, that would enable more, rather than less, sharing. Let’s look at medical data for a moment. At a time when my medical record existed as a paper file, two things were true. First, it was relatively easy to physically protect my file by literally locking it up. Making copies was a physical activity that required access. So the cost of protecting it was small. Conversely, there was very limited upside to sharing it. Why? Because there was no easy way to get the world to look at it — I could physically bring it or mail it, but only to a known group of people. So it made perfect sense to try to keep medical records private.
Now the situation is reversed. Protecting the digital file is hard (impossible) and very costly. But the potential upside from it being public is huge. A disease might be diagnosed, or a treatment proposed, by a doctor (or someone else) whom I have never even heard of before! And a large public collection of healthcare records would enable rapid advances in medicine for everyone. In this new world we should want to publish our medical information.
So then why are we so afraid of our medical records being public? Because right now we look at them at best as a potential source of embarrassment and at worst as a threat to our livelihood, as we might lose our job or access to healthcare. This is what we really need to be working on — creating the social norms, economic conditions and laws and regulations that remove the stigma and the threat. We need to focus on protecting people from the potentially negative consequences of data about them, not on working harder and harder to protect their data.
None of this will happen overnight. It will take us a long time to go from our current culture, society and economy to one in which some or all of us freely share our healthcare data and reap the benefits for all of humanity (instead of for a few large pharma companies with the power and wherewithal to buy all the data they need). And we shouldn’t force people into either. That’s why we need some level of encryption, at least for now.
PS Tech Tuesday will resume next week.
So yesterday the Feds busted the guy behind Silk Road, the marketplace for drugs and other illegal things paid for in bitcoin. The indictment reads like a screenplay for a movie or a Breaking Bad style television series (not that I watched Breaking Bad, just basing this on the inevitable flurry of tweets about it that I had in my stream). There should be one key takeaway here: law enforcement is easier online, not harder as government would generally have us believe. It is basically impossible to operate in modern life without leaving lots of digital footprints. Now admittedly “Dread Pirate Roberts” (really?) made some pretty glaring mistakes, such as apparently posting a question on Stack Overflow under his real name, then replacing it with a handle and later using that handle in one of his keys.
Of course government seems hell-bent on screwing up the very advantage that online provides. By taking a highly adversarial position toward service providers and disrupting the trust between service providers and their endusers, government is fueling a spy-versus-spy arms race which is pushing both legal and illegal activities offshore and into deeper crypto. To see how counterproductive this is, we are now beginning to learn what happened at Lavabit, the encrypted email service. The court ordered a turnover of the keys and wholesale access to the data, with the agency promising to filter only relevant data. Rather than comply, the founder shut down the service and is now helping to bring the litigation to light. This is a case that deserves to make its way to the US Supreme Court.
If we want any kind of network analysis at all (and I have argued that we might), then it has to be based on transparency and be done in a way that doesn’t pit service providers against their endusers or force them to shut down. At the moment we are doing the exact opposite, which is a continuation of bad policies in past actions against Craigslist.
My blog post on Monday about privacy and DRM was read by some as suggesting that we abandon any and all notions of privacy overnight. That was not the point I was trying to make, so let me try again, this time with an analogy. We secure our homes by closing and (generally) locking the front door. That serves as a demarcation and keeps out a completely opportunistic thief. It does not, however, prevent anybody even remotely determined from entering. For that we rely on some combination of social norms and laws together with law enforcement. It hasn’t always been that way. There was a time when people tried to protect their belongings by building castles and fortresses. Obviously this was an expensive strategy and only accessible to those few living behind the walls. It also turned out to be a futile strategy as far back as the city of Troy.
So when it comes to privacy and encryption I feel much the same way. Of course our bank balances or medical records shouldn’t be public web pages by default and we should use authentication and something like SSL when we interact with those pages to prevent the casual sniffer from observing them, but beyond that the benefits from applying more crypto diminish incredibly rapidly. For instance, should the bank encrypt their disks? Maybe, but will that block someone who is carrying out a focused attack from the inside? Unlikely. The same goes for medical records. Search queries. And so on. There will be more leaks of more data in the future because ultimately none of these systems can be secured perfectly (among other things against Trojans).
From an overall perspective then (and using a heuristic for prioritization that I wrote about just last week), we should not be applying our talents to ever more clever encryption schemes where we face dramatically diminishing returns. Instead, we should be working on laws and social norms. First and foremost among those right now should be that the government cannot conduct any secret broad scale surveillance. Second we should expand any non-discrimination provisions that we have to explicitly include known medical conditions. There is a lot more and it will provide great subject matter for many posts to come.
In the future we can’t have privacy for the same reason that the record labels and Hollywood can’t have DRM, and that’s ultimately a good thing. DRM has proven a fool’s errand because it is not compatible with general purpose computing. At some point a song or movie has to be decrypted in order to be played back, and short of tamper-proof and “trusted” hardware, at that moment it can be digitally copied (and even if we had all of that, it could be re-recorded or re-filmed during playback). That is by now reasonably well understood even by Hollywood and the music industry.
Yet we seem to be making the same mistake when it comes to our personal information. Too many of us, including well meaning “privacy advocates” and governments (especially in the EU), want our personal data to sit encrypted somewhere and want to control who can access it when and keep some unforgeable record of how the data was accessed and changed over time. That is as big a fool’s errand as DRM.
Unfortunately even Larry Lessig, whom I greatly respect, seems to believe in this possibility. In a recent piece he writes:
But trust and verify, with high-quality encryption, could. And there are companies, such as Palantir, developing technologies that could give us, and more importantly, reviewing courts, a very high level of confidence that data collected or surveilled was not collected or used in an improper way. Think of it as a massive audit log, recording how and who used what data for what purpose. We could code the Net in a string of obvious ways to give us even better privacy, while also enabling better security.
This is, I am sorry to say, a fantasy. There are simply too many subsystems and intermediate components involved, many of which have the data in the clear out of necessity (including keyboards and screens). Most keyboards and screens are already quite sophisticated. Adding a bit of circuitry to send the information elsewhere over a wireless connection would be easy.
Also, all of encryption relies on keys. And those keys too have to be stored somewhere — in places that cannot ultimately be trusted because you don’t know what’s in the silicon they are running on. Are you really going to store your private keys on a USB stick you keep around your neck? Way too much risk of losing everything, and even that stick could be designed to leak your keys. Of course most people keep their keys online, encrypted and protected with a password. But wait: you type that password on a keyboard, which runs on, well, you really have no idea what. There is always a hole at the end that you *cannot* close. This is the nature of information and computing.
It is only a question of time before some highly encrypted database of millions of medical records gets leaked or stolen along with its keys to show just how big a charade this all is. We incur all this cost and in the end it will turn out to be for naught. The music and movie industries have been learning this lesson the hard way for decades now and yet as society at large we seem doomed to repeat it.
There is only one way forward. Start constructing a society where it doesn’t matter that your personal medical record was just put online (by you or someone else). Or that your song was copied by a million people. That is the real challenge for this and many generations to come.
There is so much happening with privacy right now that it is enough to make my head spin. What is clear though is that individuals, companies and government all want it both ways.
Some parts of government want private enterprise to do a better job of protecting individuals’ information from other individuals and companies. At the same time other parts of government are looking for wholesale access to individual data bypassing any and all privacy policies and constitutional rights. Sorry, you can’t have it both ways.
Similarly individuals want to be able to share information with more than 500 of their best friends on Facebook and yet have it be private. Sorry, you can’t have it both ways.
Companies want to gather tons of information about their customers but not disclose much or anything about their own activities or have third parties collect that information. Sorry, you can’t have it both ways.
We are entering uncharted territory here because of our amazing information gathering and sharing capabilities. As I have said before, we need to start with a discussion of values first. I don’t believe that privacy is a value in and of itself. If you want to see my own grappling with this complicated topic, here are all my past posts on privacy.
One of the major issues we are struggling with in this flood of data is the question of what data belongs to whom and in particular how much access and control endusers should have over their data (or for that matter what “their” data even means). In California a “Right to Know" bill has been introduced that would require companies to let endusers access what data has been stored about them and which third parties that data has been shared with. The definition of personal data in the bill is quite broad "including inferences or conclusions drawn from other information" if those are shared with third parties. The bill has the support of EFF and the ACLU. Not surprisingly over a dozen companies and several trade groups have come out against this arguing that it would put an unreasonable burden on them.
I actually think this kind of regulation could be very helpful even though there are some details that need to be thought through. For instance, it will generally be easier for large companies to comply with this as they have more resources than smaller companies, so there might be some time period or scale threshold below which companies would be exempted. I also think it is critical that a completely electronic request (a button in the user’s profile or settings page) and electronic delivery in plain text or even something like JSON can be used to satisfy the requirement. The current draft still mentions such things as “addresses,” as if it should or could be possible for users to request this over the phone or by mail.
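To give a sense of what electronic delivery might look like, here is a hypothetical sketch of a machine-readable disclosure. All field names, company names and values are illustrative inventions, not taken from the bill.

```python
import json

# Hypothetical sketch of a machine-readable "Right to Know" response.
# Every field name and value here is illustrative, not from the bill.
disclosure = {
    "user_id": "12345",
    "categories_retained": [
        {"name": "location history", "retained_since": "2012-06-01"},
        {"name": "purchase history", "retained_since": "2011-03-15"},
    ],
    "shared_with_third_parties": [
        # includes "inferences or conclusions drawn from other information"
        {"party": "AdNetworkCo", "data": ["inferred interests"], "date": "2013-01-20"},
    ],
}

print(json.dumps(disclosure, indent=2))
```

A format along these lines would let a single button in the settings page satisfy the request, with no phone calls or mail involved.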
This kind of act could be particularly powerful in conjunction with another set of regulation that I would really like to see: legalizing personal internet bots. By that I mean a law that makes it clear that as an enduser I can authorize a third party service to interact with a service on my behalf. And if I have explicitly authorized the third party service then this cannot be a terms of service violation. The combination would allow for the emergence of third party services that monitor information on my behalf across other services. This would be all that we need for market solutions to emerge around privacy. With the right amount of work both of these bills could be quite concise.
When I first heard about SnapChat I was immediately reminded of the attempts of a friend of mine to establish a DRM'd email platform that would let recipients read an email but not do anything else with it that hadn't been explicitly authorized (printing, forwarding, etc). There was some amazingly fancy crypto technology involved. In the end though it was never possible to close up the many holes inside general purpose computing technology. Those of course include the ultimate hole — the so-called A-Hole, which here stands for Analog Hole: the ability to take a picture of the screen. Music and video companies have of course found the same truth the hard way in their own DRM efforts. So I was not at all surprised to see this headline: Not-So-Ephemeral Messaging: New SnapChat “Hack” Lets Users Save Photos Forever. Anything that gives you a false sense of security or control will come to bite you eventually. Personally I consider this a feature and not a bug of digital technology.
Yesterday I posted about how the current Do Not Track debate is muddling the underlying issues. I got a great reply from Mike Yang on Twitter that rightly pointed out that P3P had been a mess in part because Microsoft jumped the gun on it. That got me thinking about an even broader context here, which is the shift to mobile. A Do Not Track battle on the web alone is even more absurd in the context of a rapid shift in where people spend time and can be tracked even better (at least in the new iOS).
Mike then pointed out that Android has a pretty good permission system, which I agree with. The system is easy for end users to understand and doesn’t get in the way because you need to approve it once when you start using an application and then only if an upgrade to that application wants more permissions.
So the better idea here might be to start with mobile and then extend that model to the web. When you first visit a site there would be a one-time permission dialog. Websites in the EU already do this with regard to cookies. Now one might think that this becomes very cumbersome. But with a standard, browsers could be configured so that you don’t even see the dialog if a site is only requesting permissions that you are willing to always grant.
Of course web sites don’t operate in a sandbox so there would be some trust involved. But a standard like this would also make it possible to construct automated services that can crawl the web, register for sites and services, monitor marketing systems and see if sites are abiding by their requested permissions.
The permissions themselves should be things that can be worded in relatively plain English (substitute your language here), such as “Permission to send emails to you” or “Permission to share anonymized information with third parties for marketing purposes.” This approach would also make it possible to weave in things that are one off now, such as sites permitting access to location or potentially local storage.
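A sketch of how such a standard might fit together, assuming a site publishes a "permission manifest" the browser fetches on first visit. The format, field names and site are all hypothetical illustrations, not an existing spec.

```python
import json

# Hypothetical sketch of a site "permission manifest" a browser could
# fetch on first visit. Format and field names are illustrative only.
manifest = {
    "site": "example.com",
    "permissions": [
        {"id": "email", "text": "Permission to send emails to you"},
        {"id": "share-anon", "text": "Permission to share anonymized "
                                     "information with third parties "
                                     "for marketing purposes"},
        {"id": "location", "text": "Permission to access your location"},
    ],
}

# The browser auto-grants anything on the user's always-allow list and
# only shows the one-time dialog for the remaining permissions.
always_allow = {"location"}
needs_dialog = [p for p in manifest["permissions"]
                if p["id"] not in always_allow]

print(json.dumps([p["id"] for p in needs_dialog]))
```

The plain-English `text` field is what the dialog would display; the stable `id` is what automated services could crawl and audit against a site's actual behavior.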
It will take some work, but I think one could come up with something that works across both mobile and the web with the same language. That would be a real win for consumers and also provide operating clarity for companies.
The Do Not Track discussions that are currently going on are fascinating because they highlight the huge gap between how the internet actually works and how people are talking about policy. Politicians are giving consumers a false sense that there is an easy “on/off” switch for tracking. And industry groups aren’t helping the debate by making it easy to argue that they are using self policing as a fig leaf. All of this completely drowns out the difficulty of the underlying problem: our online activities leave a huge data footprint because of the many different connected systems that data passes through. To really not be trackable, consumers would have to start using a network such as Tor, which is clearly not a mass market behavior. Anything else implies some level of trackability. So the question really is more one of who does tracking and to what ends. For industry this means more transparency and more consumer friendly tools for understanding and changing their browser behavior. It may be time to revisit prior efforts along those lines such as P3P.
I wrote a blog post last week about the current Privacy Theater in the US, where the government is simultaneously pushing stricter privacy regulations and huge backdoors that would completely undermine privacy. The backdoors come in the form of the Cyber Intelligence Sharing and Protection Act or CISPA. The folks at Lumin Consulting have put together a good infographic that illustrates how CISPA undermines privacy:
I am actually sympathetic to the basic idea behind CISPA, which is to make it easier to share incident data as a way to identify and protect against attacks. But the way that CISPA goes about it is wrong on two important levels. First, it would stuff the incident information into the existing agency and vendor world instead of making it widely available on the Internet. Wide availability would let researchers, hobbyists and new vendors all work on improving security. In other words it would enable the Internet to help protect the Internet.
The second big mistake in CISPA is that it uses broad language when what we need is a tight and well specified sharing protocol. I am not suggesting that such a protocol can be devised to cover all types of attacks and attack related information, but rather that by starting with something tight we can go from no public data to a lot of public data. For instance, reporting IPs involved in DDoS attacks would be a great and very precise starting point. The way the government can help here is by helping to define the reporting standard and starting small instead of shooting for some all-encompassing solution.
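To illustrate how tight such a starting point could be, here is a hypothetical sketch of a minimal DDoS incident report. The schema is an invention for illustration (the IPs are from the RFC 5737 documentation ranges), not an existing standard.

```python
import json
from datetime import datetime, timezone

# Hypothetical sketch of a minimal, tightly specified DDoS incident
# report. The schema is illustrative, not an existing standard; the
# IPs are reserved documentation addresses (RFC 5737).
report = {
    "report_type": "ddos",
    "observed_at": datetime(2013, 4, 1, 12, 0,
                            tzinfo=timezone.utc).isoformat(),
    "target": "198.51.100.7",
    "source_ips": ["203.0.113.5", "203.0.113.9"],
    "peak_requests_per_second": 120000,
}

print(json.dumps(report, indent=2))
```

A schema this small could be published openly, letting researchers, hobbyists and new vendors all aggregate reports, rather than stuffing the data into the existing agency and vendor world.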