Author: A.Z4 (6 Jun 07 6:25pm)
Alex,
Your bot (MJ12bot) gets caught in spider traps. It does not follow robots.txt all the time.
Just picture this: there is a page with a pixel link on it. A human can't see it, and you are not Google (yet). So if your bot follows the pixel link, it gets banned and reported.
It is possible that some bad folks use fake user agents; they do it all the time. Unfortunately, these days if someone's bot comes to a site from dynamic IP ranges and claims to be nice, it will certainly get banned on sites like mine in a heartbeat.
The user agent could say it is a Slurp bot, but unless there is a way to verify it, it's not Slurp.
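To make the trap concrete, here is a minimal sketch of the check a bot should run before following any link, assuming the trap directory is disallowed in robots.txt. The `/trap/` path, the `example.com` host, and the `may_follow` helper are made up for illustration; only `urllib.robotparser` is real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that hides a pixel link to /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

def may_follow(robots_txt, user_agent, url):
    """Return True only if robots.txt allows this user agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for link in ("http://example.com/index.html",
                 "http://example.com/trap/pixel.html"):
        print(link, may_follow(ROBOTS_TXT, "MJ12bot", link))
```

A bot that runs this check on every extracted link, including invisible ones, never requests the trap URL in the first place.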
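One common way to do that verification is a forward-confirmed reverse DNS lookup: resolve the visiting IP to a hostname, check that the hostname belongs to the crawler's domain, then resolve the hostname back and confirm it returns the same IP. A rough sketch follows; the function name and the `.crawl.yahoo.net` suffix in the usage line are my assumptions, so check the crawler operator's own documentation for the right domain.

```python
import socket

def is_verified_crawler(ip, allowed_suffixes,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check.

    allowed_suffixes: domain suffixes with a leading dot, e.g. (".crawl.yahoo.net",).
    reverse/forward are injectable so the check can be tested without network access.
    """
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record: cannot be verified
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False  # hostname is not in the crawler's domain
    try:
        addresses = forward(host)[2]
    except OSError:
        return False  # hostname does not resolve forward
    return ip in addresses  # must map back to the same IP

if __name__ == "__main__":
    # Assumed suffix for Slurp; the result depends on live DNS.
    print(is_verified_crawler("66.108.41.54", (".crawl.yahoo.net",)))
```

A user agent string alone proves nothing, since anyone can send it. A visitor from a dynamic range fails this check no matter what it claims to be.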
UAs
MJ12bot/v1.0.4 (http://majestic12.co.uk/bot.php?+)
MJ12bot/v1.0.7 (http://majestic12.co.uk/bot.php?+)
MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)
MJ12bot/v1.2.0 (http://majestic12.co.uk/bot.php?+)
IPs
129.241.111.168
193.64.31.23
205.209.170.162
205.209.170.172
205.209.170.177
205.209.170.201
205.209.183.161
212.191.65.241
213.115.58.34
213.216.247.249
213.84.192.184
216.105.213.176
66.108.41.54
67.22.9.58
67.68.226.158
67.71.157.176
68.185.24.2
68.48.242.29
69.159.10.232
69.159.37.151
69.243.48.152
71.168.107.138
74.135.126.189
81.167.8.147
82.52.23.150
82.99.36.100
84.184.147.193
84.248.180.84
84.48.34.91
You might also want to program it to stop crawling the site once it gets a 403.
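The 403 rule could look roughly like this. It is only a sketch of the idea, not how MJ12bot is actually built; `crawl` and `fetch_status` are hypothetical names, and `fetch_status` stands in for whatever HTTP client the bot uses.

```python
from urllib.parse import urlsplit

def crawl(urls, fetch_status):
    """Fetch urls in order, skipping every host that has answered 403.

    fetch_status: callable taking a URL and returning the HTTP status code.
    Returns a list of (url, status) pairs for the URLs actually requested.
    """
    blocked_hosts = set()
    fetched = []
    for url in urls:
        host = urlsplit(url).hostname
        if host in blocked_hosts:
            continue  # the site already said no: leave it alone
        status = fetch_status(url)
        fetched.append((url, status))
        if status == 403:
            blocked_hosts.add(host)  # stop crawling this site
    return fetched
```

The block list is per host, so one 403 ends the crawl of that site without affecting any other site in the queue.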
Check whois on:
205.209.183.161 (Managed Solutions Group, Inc.) Hosting
205.209.170.201 (Managed Solutions Group, Inc.) Hosting
68.185.24.2 (Charter Communications MDFRD-OR-68-185-0) Dynamic
What do you expect?