Author: C.Dijkgraaf (24 Feb 05 7:39pm)
There is at least one bot (maybe more) out there that sends a random string as its user agent.
I've been thinking about how to trap these, as they show harvester characteristics (i.e. hitting a page that has an e-mail address on it, and only that page).
A regular expression of [b-dB-Df-hF-Hk-nK-Np-tP-Tv-zV-Z]{5} (a run of five letters from a consonant class) tends to trap it quite well. Only one other bot matches that expression, the one with the agent "NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html; nutch-agent@lis", due to the "tchCVS" part of the string.
So possibly using that regular expression first, and then a second regular expression to exclude Nutch and any other legit bots, would trap this bot with the fake agent string.
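
A rough sketch of that two-step check in Python, in case it helps. The consonant pattern is the one above; the exclusion list, the function name, and the sample agent strings are only assumptions for illustration, and the list would need to grow as other legit bots turn up.

```python
import re

# Pattern from the post: five consecutive letters from the consonant class.
# Note the class as written leaves out "j" and treats "y" as a consonant.
CONSONANT_RUN = re.compile(r"[b-dB-Df-hF-Hk-nK-Np-tP-Tv-zV-Z]{5}")

# Hypothetical exclusion list: legit bots known to trip the pattern.
KNOWN_GOOD = re.compile(r"Nutch")


def looks_like_fake_agent(agent: str) -> bool:
    """Return True if the agent has a five-consonant run and is not excluded."""
    if KNOWN_GOOD.search(agent):
        return False
    return CONSONANT_RUN.search(agent) is not None


if __name__ == "__main__":
    samples = [
        "xkqtrbnw",  # made-up random-looking agent
        "NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html; nutch-agent@lis",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ]
    for agent in samples:
        print(looks_like_fake_agent(agent), agent)
```

Checking the exclusion list first keeps the false positives out before the consonant test runs. It is only a heuristic, so it may still flag a legit agent that nobody has listed yet.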