Limiting the power of the Googlebots
Monday, December 3, 2007
A coalition of online publishers is joining together in an effort to increase its members' control over when and how much of their content appears on search engines, Time reports.
Currently, websites can exert some control over which of their pages search engines may access by placing a text file known as robots.txt in the site's root directory. These files contain a set of instructions to the web crawlers that search engines use to map and index the web, allowing a website to block indexing of individual pages, specific directories or the entire site.
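To see how these instructions work in practice, here is a small sketch using Python's standard-library `urllib.robotparser` to interpret a hypothetical robots.txt (the domain, directory names and "BadBot" user agent are illustrative, not taken from any real site):

```python
import urllib.robotparser

# A hypothetical robots.txt: block one directory for all crawlers,
# and block the entire site for a crawler called "BadBot".
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Googlebot may fetch ordinary pages but not the blocked directory;
# BadBot is shut out of the whole site.
print(parser.can_fetch("Googlebot", "https://example.com/index.html"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))   # False
print(parser.can_fetch("BadBot", "https://example.com/index.html"))     # False
```

Note that robots.txt is purely advisory: well-behaved crawlers honor it, but nothing technically prevents a crawler from ignoring it, which is part of why publishers want something with more expressive force.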
The coalition wants to extend the kinds of commands that can go into these robots.txt files, expanding publishers' control over their content by, for instance, limiting how long search engines may retain copies in their indexes, or telling a crawler not to follow any of the links that appear within a page.
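As a purely illustrative sketch of what such an extension might look like (the directive names below are hypothetical, invented for this example, and not part of any adopted standard), an extended robots.txt could read:

```
User-agent: *
Disallow: /subscribers/

# Hypothetical extended directives of the kind the publishers propose:
Cache-expires: /news/ 30d    # drop cached copies of news pages after 30 days
Nofollow: /news/             # index these pages, but follow no links on them
```

The point of the proposal is that today's robots.txt vocabulary stops at allow/disallow; directives like these would let a publisher express time limits and usage conditions in a form crawlers could act on automatically.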
The publishers say this will better enable them to express terms and conditions on the access and use of their content. In particular, they're concerned about their information remaining on search engines long after they've locked it away on their own sites, and about excerpts and headlines being used without their permission.