Author: C.Dijkgraaf (11 Mar 05 4:13am)
Yes, which file types bots request would be interesting information to capture.
I think bots that do download images shouldn't lose points unless
1) you put your images in a folder and banned bots from that folder,
or 2) they don't send an If-Modified-Since header (HTTP_IF_MODIFIED_SINCE in PHP) when they next load the image. If they did send it, a particular bot would only use up bandwidth once per image.
#1) I think is already covered by "obeys robots.txt", but
#2) would probably be worth scoring as a separate item for images.
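A rough sketch of how item #2 could be scored from access logs. It assumes a hypothetical log record holding the user agent, the path, and whether an If-Modified-Since header was sent. Most default log formats don't record that header (Apache's combined format, for example, doesn't), so it would have to be logged explicitly. The record type, function name and extension list are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical: which paths count as "images" for this check.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico")


@dataclass
class Request:
    # Hypothetical log record; real logs would need a custom format.
    user_agent: str
    path: str
    sent_if_modified_since: bool


def unconditional_image_refetches(requests):
    """Count, per user agent, repeat image fetches sent without If-Modified-Since."""
    seen = set()                  # (user_agent, path) pairs already fetched once
    penalties = defaultdict(int)  # user_agent -> number of wasteful re-fetches
    for req in requests:
        if not req.path.lower().endswith(IMAGE_EXTENSIONS):
            continue  # only images are scored here
        key = (req.user_agent, req.path)
        # The first fetch is free; later fetches should be conditional.
        if key in seen and not req.sent_if_modified_since:
            penalties[req.user_agent] += 1
        seen.add(key)
    return dict(penalties)
```

Each count could then be turned into a point deduction in the scoring system, alongside the existing robots.txt check.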
I'm currently working on adapting some of my PHP pages to return a 304 (Not Modified) status if
1) the main PHP file hasn't changed since that date, and
2) the maximum modified date of the database data shown in the page is less than or equal to HTTP_IF_MODIFIED_SINCE.
This is to reduce the bandwidth that bots use, because I have a lot of pages that are generated from a database.
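A minimal sketch of the 304 decision described above. It is written in Python rather than PHP for illustration, and it assumes both timestamps are already available as UTC datetimes: the script's file modification time and the newest modification date among the database rows the page shows. The function names are hypothetical. The page counts as unchanged only if the newer of the two timestamps is no later than the If-Modified-Since date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_modified(script_mtime: datetime, data_mtime: datetime) -> datetime:
    # The page is only as fresh as its newest input: the script itself or the
    # newest database row it displays. HTTP dates have one-second resolution,
    # so drop microseconds before comparing or formatting.
    return max(script_mtime, data_mtime).replace(microsecond=0)


def last_modified_header(script_mtime: datetime, data_mtime: datetime) -> str:
    # Value for the Last-Modified response header (usegmt requires a UTC datetime).
    return format_datetime(last_modified(script_mtime, data_mtime), usegmt=True)


def should_send_304(script_mtime: datetime, data_mtime: datetime,
                    if_modified_since: Optional[str]) -> bool:
    if not if_modified_since:
        return False  # unconditional request: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: safer to send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are always GMT
    # Both conditions at once: script unchanged AND newest data unchanged.
    return last_modified(script_mtime, data_mtime) <= since
```

When `should_send_304` returns True, the page would send only the 304 status with no body. Otherwise it sends the full page plus a Last-Modified header, so the bot has a date to send back next time. In PHP the incoming header is available as `$_SERVER['HTTP_IF_MODIFIED_SINCE']`.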
I've seen one bot that even sends HTTP_IF_MODIFIED_SINCE when fetching robots.txt (well done to that programmer; that's why I've got a bonus point for it in my scoring system).
There are some specialist bots out there now: some specifically catalog images, and one even catalogs just the favicons (favourite icons) of websites.
Other bots load JavaScript files and scan them for links, because some sites that use JavaScript menus don't have links in the page that bots can follow.
Yes, I've seen both of the user-agent pages; very useful, thanks. I've already used that information to ban certain bots from my guestbook pages.
Yes, I agree on the rapidly changing user agents, and on those showing a wrong or misleading name (I already had a -1 for a wrong name).
I would also take points off for bots that use one user agent to request robots.txt and another to request the pages. I've seen several bots do that, and I think it is bad practice.
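One way this check might be done from a log is sketched below, assuming each entry is a hypothetical `(ip, user_agent, path)` tuple and that the IP address is what ties a robots.txt fetch to the later page fetches. Shared IPs such as proxies could cause false positives, so this is only a rough starting point.

```python
from collections import defaultdict


def ua_mismatch_ips(log):
    """Return IPs that fetched pages under a user agent they never used for robots.txt."""
    robots_uas = defaultdict(set)  # ip -> user agents seen on /robots.txt
    page_uas = defaultdict(set)    # ip -> user agents seen on everything else
    for ip, user_agent, path in log:
        if path == "/robots.txt":
            robots_uas[ip].add(user_agent)
        else:
            page_uas[ip].add(user_agent)
    flagged = set()
    for ip, uas in robots_uas.items():
        # Flag the IP if any page request used a user agent not seen on robots.txt.
        if page_uas.get(ip, set()) - uas:
            flagged.add(ip)
    return flagged
```

IPs that never fetched robots.txt aren't flagged here; that case is already handled by the "obeys robots.txt" item.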
Colin