Author: M.Prince (10 Feb 05 7:49pm)
There is some method to our madness. Let me explain...
First, it's possible for us to pass an instruction to the page from our servers that will hide or replace the logo and just about any other element of the page. This means that if spiders begin to filter out pages that link to Project Honey Pot, we can simply turn that element of the honey pot pages off, or replace it with something different.
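A minimal sketch of that kind of server-driven toggle, assuming the honey pot page asks the server whether to render each element (the flag name and markup here are hypothetical, not the Project's actual implementation):

```python
# Hypothetical flag; in the real system this value would be fetched from
# Project Honey Pot's servers at render time, not hard-coded locally.
SHOW_HONEYPOT_LINK = True

def render_footer(show_link):
    """Render the page footer, optionally omitting the honey pot link
    (or swapping in a replacement element) based on a server-side flag."""
    if show_link:
        # Illustrative markup only, not the Project's canonical logo HTML.
        return '<a href="http://www.projecthoneypot.org/">Project Honey Pot</a>'
    return ''  # element turned off from the server side

print(render_footer(SHOW_HONEYPOT_LINK))
```

If spiders start filtering on the link, flipping the flag server-side changes every participating page at once, with no webmaster action needed.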
Second, we're actually trying to bait spiders somewhat. We wanted to create something that would be easy for them to filter on and easy for legitimate websites to include on their pages. We do that not only with the logo/links at the bottom, but also with the no-email-collection meta tag at the top of the page. If we can get spider authors to build filtering on these markers into their code, then any website owner can include them on their pages and be safe from spiders. And, again, we can alter the honey pot pages to randomize even that element.
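For illustration, here is roughly what that marker looks like in a page head and what a spider filtering on it would check; the exact attribute values below are an assumption for the sketch, not the Project's canonical markup:

```python
# Illustrative page head carrying a no-email-collection meta tag; the
# attribute values are assumed for this example.
page_head = """<head>
  <meta name="no-email-collection" content="noemailcollection" />
</head>"""

def has_no_collection_marker(html):
    """The kind of check a spider might run before harvesting a page."""
    return 'name="no-email-collection"' in html

print(has_no_collection_marker(page_head))
```

The point of making the marker this easy to detect is exactly the bait: once spiders reliably skip pages carrying it, any legitimate site can add the one-line tag and benefit.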
There is no doubt that this will become an arms race, but I like that the spiders are on the defensive here. As soon as they start building in code to avoid certain pages I think we've begun to win this battle. Making spiders selective -- creating "false positives" for them -- would be a huge victory for the Project.
We've already begun to randomly turn off the logo at the bottom of the page and track whether that has any effect on whether an address is harvested. As soon as we begin to notice a significant statistical difference, we'll make the information known so webmasters can exploit it.
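One standard way to decide when that difference is statistically significant is a two-proportion z-test comparing harvest rates between pages shown with and without the logo. The counts below are purely hypothetical:

```python
import math

def two_proportion_z(harvested_a, total_a, harvested_b, total_b):
    """Two-proportion z-statistic: does the harvest rate with the logo
    shown (group A) differ from the rate with it hidden (group B)?"""
    p_a = harvested_a / total_a
    p_b = harvested_b / total_b
    p_pool = (harvested_a + harvested_b) / (total_a + total_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Hypothetical counts, for illustration only.
z = two_proportion_z(120, 1000, 80, 1000)
print(abs(z) > 1.96)  # |z| > 1.96 is roughly significant at the 5% level
```

Once |z| clears the threshold, the observed effect of showing or hiding the logo is unlikely to be chance, which is the point at which publishing the result to webmasters makes sense.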
Hopefully this makes some sense. Keep the feedback coming! And thanks for your help with the Project.