Author: S.Goodman (20 Feb 05 6:00pm)
As a follow-up to the comment that I made on a previous post in this thread, my general concern is how to make it harder for harvesters to distinguish honeypot script-generated addresses from real ones. The ratio of harvested addresses to spam received at those addresses is very high, which makes me suspect that something is not working as it should. I know there is often a delay between harvesting and spamming, but my sense is that this is too slow, and I am afraid that many harvesters have learned to identify this script.
I'm sure that the legal contract is necessary for you to eventually go after spammers who use harvesters, or you wouldn't have put it there. My question is whether it is necessary to have the generated addresses on the same page as the contract. In other words, is it sufficient for the legal contract to exist anywhere on the site in order to protect the whole site?
If posting the contract on a site does protect the entire site, it would be possible to have calls to honeypot address scripts on regular pages, mixed in with regular content. I wouldn't mind putting a visible link to the legal contract page on the site home page, perhaps labelled "Terms and Conditions". The contract page could then be static, non-obfuscated and human-readable. Having curious visitors read the contract would advance the anti-spamming cause in general and Project Honey Pot in particular. For that matter, we could even add some standard language to the contract that site owners would generally like and could modify to suit their purposes. For example: the site owner owns the site contents; copying, distribution, or incorporation into derivative works is only permitted with written permission from the site owner; the site may be linked to in a manner specified by the site owner; etc. Having boilerplate protective language that was gone over by a lawyer would be a good inducement to get people to post this contract on their sites.
The script that generates honeypot addresses could then go anywhere on the site. We might wish to put robot exclusion commands around the script call to prevent indexing of generated addresses, but that could be part of the code that people install. You would still get your Googlebot heartbeat, but the generated address would not be indexed. For that matter, is it really important to avoid the address being indexed by a search engine? IANAL, but an addition to the contract might be able to cover the case of addresses harvested from a search engine that got them from a site covered by the contract. The site content is our property, so it seems reasonable that the search engine index would be considered a derivative work on which we could still assert restrictions as to permissible use. This is just a layman's guess, so maybe one of the lawyers could comment.
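To make the robot exclusion idea a bit more concrete, here is a rough Python sketch. One caveat: as far as I know, standard robots.txt rules apply per URL path rather than around a fragment of a page, so this sketch assumes the generated addresses are served from their own path. The path `/honeypot/`, the example.com URLs, and the helper name `may_index` are all made up for illustration and are not part of the actual honeypot script.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep compliant crawlers away from the path
# that serves generated addresses, leave the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /honeypot/
"""

def may_index(url, agent="Googlebot"):
    """Return True if a rule-abiding crawler may fetch this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

# Regular content stays crawlable; the assumed honeypot path does not.
print(may_index("http://example.com/index.html"))
print(may_index("http://example.com/honeypot/addresses.html"))
```

Of course this only affects crawlers that honor the rules. Harvesters presumably ignore them, which is the point.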
If feasible, this might accomplish two desirable things. First, if the visible, non-obfuscated legal contract becomes a sufficient deterrent to get harvesters to avoid a site, that's a win. If that happens, it should be fairly easy to publicize this and get people to post the contract (and sometimes the honeypot script) on many sites. Second, if harvesters aren't scared away by the contract, they have no easy way of discerning real addresses from generated ones. Well, they can, for example by repeatedly visiting the site and discarding addresses that change too frequently, but that would just be the next step in the arms race and we can deal with it.
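For what it's worth, the "repeated visit" trick is easy to picture. The Python sketch below is only my guess at how a harvester might do it, not something observed in the wild: it keeps just the addresses that show up on every visit to the same page. The function name `stable_addresses` and the sample addresses are invented for the example.

```python
def stable_addresses(snapshots):
    """Return the addresses present in every snapshot of a page.

    Each snapshot is the collection of addresses seen on one visit.
    Addresses that change between visits (as generated ones would)
    drop out of the intersection.
    """
    if not snapshots:
        return set()
    stable = set(snapshots[0])
    for seen in snapshots[1:]:
        stable &= set(seen)
    return stable

# Three visits to the same page: one fixed address, one that rotates.
visits = [
    {"webmaster@example.com", "gen-a1@example.com"},
    {"webmaster@example.com", "gen-b2@example.com"},
    {"webmaster@example.com", "gen-c3@example.com"},
]
print(stable_addresses(visits))  # {'webmaster@example.com'}
```

It costs the harvester several visits per page to do this, and a generated address that stayed fixed per visitor for a while would pass the check.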