Author: M.Prince (8 Aug 06 12:24pm)
Sorry for taking some time to reply.
We've been working on something we call HTTP:BL for quite some time. The system works much the same as what you've described, but it is distributed over a wide net of honey pots rather than just your own. In general, it will work like other DNSBLs: you'll be able to query our DNS servers to see whether a particular IP address visiting your site belongs to a harvester or other bad robot.
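To make the DNSBL comparison concrete, here is a rough sketch of how a conventional DNSBL lookup is done: the IPv4 octets are reversed, the blacklist zone is appended, and an A record for that name means "listed." HTTP:BL is still being designed, so the zone name (dnsbl.example.org) and the function names here are placeholders, not the actual HTTP:BL interface.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name for a conventional IPv4 DNSBL lookup."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone answers for this IP.

    `zone` is a placeholder; the real HTTP:BL zone and any response
    encoding are not defined in this post.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record conventionally means "not listed". A resolver
        # failure lands here too, so this sketch fails open.
        return False
```

For example, dnsbl_query_name("192.0.2.1", "dnsbl.example.org") gives "1.2.0.192.dnsbl.example.org". The resolve argument is only there so the lookup can be swapped out for testing.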
Identifying known harvesters is easy, and we should be able to provide that functionality in the next couple of months. The harder part is other "bad robots." Many of the techniques you described above are things we're thinking about. For example, we could add a "nofollow" meta tag to the honey pot page, include a specially formed link on the page, and see whether the robot follows it. If it does, we know there's a problem and the robot gets labeled a "bad robot."
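As a rough illustration of the nofollow idea, something like the following could serve a page with a per-visitor trap link and later check whether that link was requested. This is only a sketch of the general technique, not how the Project's honey pots work; the /trap/ path, the in-memory token store, and the function names are all made up for the example.

```python
import secrets

# Hypothetical in-memory store: token -> IP that was served the trap page.
issued_tokens = {}


def render_trap_page(visitor_ip):
    """Return HTML that tells robots not to follow links, plus a trap link."""
    token = secrets.token_hex(8)  # unique per page view, so hits are traceable
    issued_tokens[token] = visitor_ip
    return (
        "<html><head>"
        '<meta name="robots" content="nofollow">'
        "</head><body>"
        f'<a href="/trap/{token}">archive</a>'
        "</body></html>"
    )


def is_bad_robot(request_path):
    """True if the request is for a trap link this site actually issued."""
    prefix = "/trap/"
    if not request_path.startswith(prefix):
        return False
    return request_path[len(prefix):] in issued_tokens
```

A well-behaved robot that honors the meta tag never requests the /trap/ link, so any hit on an issued token points at a robot that ignored it. A real version would also need to keep human visitors from clicking the link, which this sketch does not handle.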
I'm concerned about allowing modification of the honey pot scripts themselves on a case-by-case basis. While I am 100% sure that you have good intentions, it becomes a logistical problem for us to manage a bunch of users with their own ideas on modifying the scripts (some of whom may not have good intentions).
I think what makes the most sense is for you to create your own trap gateway page. From that page you could do a lot of the things you describe above to generate data for your own system. You could then also link to the Project's honey pots and any other kinds of traps that you want. I think that gets you most of the benefits without requiring you to modify the honey pot scripts themselves.
Finally, if you have more ideas on how we can use the collective data coming off all the honey pots in order to do some of the things you propose, please do let us know. We're very much in the alpha-design stage of HTTP:BL, so we can make a lot of adjustments and accommodations to good ideas if you and other folks send them to us.
Thanks for your input, and sorry again for taking a while to respond.