Yesterday it was revealed that Microsoft is clearly using Google search results and people’s clicks as an input to Bing’s algorithms. That raises a bunch of interesting questions about who owns what online.
First, let’s talk about where Microsoft could be getting this data. There are two possible sources: Internet Explorer and any Microsoft browser plugins (e.g., conceivably something like Silverlight could do this). In either case this may or may not be a violation of the terms of service that people agreed to when installing the software. Since nobody reads these, and since they tend to allow companies to do pretty much anything, it is unlikely that this is a legal infraction. It may come as a surprise to users, though. Then again, maybe they won’t care. I continue to be amazed at how little people know or care about what their browser does. Personally, I use multiple browsers and lots of private sessions to minimize my footprint. Following this episode I will investigate running some kind of sniffer at the TCP/IP level to figure out what else my browsers may be communicating. I generally don’t have a problem with services collecting data to make the service better, but I like to feel somewhat in control. I am still hoping that eventually we can shift the locus of control of this data to the end user, as Seth Goldstein tried years ago with Attention Trust (5 years ago!).
Second, what about Google’s ownership of their search results? Google aggressively crawls sites – in fact so aggressively that they use links passed around in Gmail to seed their crawler (I know this because DailyLit links that are entirely hidden on the site but get sent via email were being crawled until I adjusted robots.txt accordingly). Now generally that is to the sites’ advantage – they want to be found, after all. Here it is a case of a direct competitor doing the same to Google, but in a distributed fashion via browsers, so it is (a) harder to detect and (b) impossible to stop via robots.txt (which in any case is only voluntary). The results pages that associate a specific search term with a specific set of results are likely to be protected by copyright – this is not a “fact” like, say, my home address and telephone number. More importantly, these results pages constitute some meaningful value, which is why people come there in the first place. Extracting that value without contributing anything back (no traffic, no data) is where I feel a line is being crossed (there is a different question as to whether Google has crossed this line in other instances making the basis for this rant).
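To make the "only voluntary" point concrete, here is a minimal sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The rules, the `/private-links/` path, the `ExampleBot` user agent, and the `is_allowed` helper are all hypothetical and chosen for illustration; they are not DailyLit's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the /private-links/ path is an assumption for
# illustration, not the real DailyLit robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-links/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler asks before fetching. Nothing forces it to ask.
    for url in (
        "https://example.com/private-links/abc",
        "https://example.com/books/",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

The check lives entirely in the crawler's own code: a crawler that never runs it, or a browser reporting what a user clicked, is not constrained by the file at all.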
It will be interesting to see how this one plays out on both fronts – end-user control and Google’s ownership. In the meantime, I love the whole cloak-and-dagger aspect of Google’s honeypot operation and (one can only surmise) the initial secret effort by Microsoft.