Author: M.Prince (9 May 07 1:11pm)
We identify Google's robots through http:BL. Any result returned from the http:BL DNS query with a "0" as the fourth (last) octet means that the robot in question belongs to a search engine.
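To make the octet check concrete, here is a minimal Python sketch rather than an official client. The function names, the placeholder access key, and the zone name dnsbl.httpbl.org are assumptions on my part, as is the usual DNSBL convention of reversing the visitor's IP octets in the query name; check them against the http:BL documentation before relying on them.

```python
import ipaddress

# Assumed zone name for http:BL lookups; verify against the http:BL docs.
HTTPBL_ZONE = "dnsbl.httpbl.org"


def build_query(access_key: str, visitor_ip: str) -> str:
    """Build the DNS name to look up: access key, reversed IPv4 octets, zone."""
    ip = ipaddress.IPv4Address(visitor_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{access_key}.{reversed_octets}.{HTTPBL_ZONE}"


def is_search_engine(answer: str) -> bool:
    """Return True when the fourth (last) octet of the DNS answer is 0."""
    octets = [int(part) for part in answer.split(".")]
    if len(octets) != 4 or octets[0] != 127:
        raise ValueError(f"not an http:BL answer: {answer!r}")
    return octets[3] == 0
```

In use, you would resolve the name from build_query (for example with socket.gethostbyname, which raises socket.gaierror when there is no record for that visitor) and pass the returned address to is_search_engine. What you do with a True result is the policy decision left to the site admin.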
Web admins can do what they want with this information. Some may not want Google crawling certain parts of their site. Others may want to hide email addresses from Google, since harvesting the Google cache directly is an increasingly common way for spammers to collect email addresses. Some may want to welcome Google's robots with open arms. http:BL leaves the choice to you and, if installed correctly, there shouldn't be any way for Google to tell whether it's running or not.
I'm a fan of empirical examples, so here's one. The site that has been running http:BL the longest is Project Honey Pot. During that time we have gone from PageRank 6 to PageRank 8. Make of the results what you will.