Binging Google

Yesterday it was revealed that Microsoft is clearly using Google search results and people’s clicks as an input to Bing’s algorithms. That raises a bunch of interesting questions about who owns what online.

First, let’s talk about where Microsoft could be getting this data. There are two possible sources: Internet Explorer and any Microsoft browser plugins (e.g., conceivably something like Silverlight could do this). In either case this may or may not be a violation of the terms of service that people agreed to when installing the software. Since nobody reads these, and since they tend to allow companies to do pretty much anything, it is not that likely that this is a legal infraction. It may be a surprise to users though. Then again, maybe they won’t care. I continue to be amazed by how little people know or care about what their browser does. Personally, I use multiple browsers and lots of private sessions to minimize my footprint. Following this episode I will investigate running some kind of sniffer at the TCP/IP level to figure out what else my browsers may be communicating. I generally don’t have a problem with services collecting data to make the service better, but I like to feel somewhat in control. I am still hoping that eventually we can shift the locus of control of this data to the end user, as Seth Goldstein tried years ago with Attention Trust (5 years ago!).

Second, what about Google’s ownership of their search results? Google aggressively crawls sites, in fact so aggressively that they use links passed around in Gmail to seed their crawler (I know this because DailyLit links that are entirely hidden on the site but get sent via email were being crawled until I adjusted robots.txt accordingly). Now generally that is to the sites’ advantage: they want to be found after all. Here it is a case of a direct competitor doing the same to Google, but in a distributed fashion via browsers, so it (a) is harder to detect and (b) can’t be stopped via robots.txt (which in any case is only voluntary). The results pages that associate a specific search term with a specific set of results are likely to be protected by copyright; this is not a “fact” like, say, my home address and telephone number. More importantly, these results pages constitute some meaningful value, which is why people come there in the first place. Extracting that value without contributing anything back (no traffic, no data) is where I feel a line is being crossed (there is a different question as to whether Google has crossed this line in other instances making the basis for
