Author: J.Johnson (9 Jan 05 4:35pm)
J.Cridland (28 Dec 04 5:00pm) wrote:
(Notwithstanding that you don't mention SpamCop.net in your 'how to report spam', but I've said quite enough!)
Thank you very much for pointing out some possible misunderstandings I may entertain, but I have a few questions regarding your post as well. Before I ask them, there are a few points that I think I can address with a reasonable degree of confidence.
I could not agree more with your praise for projecthoneypot.org. They provide a great service to us all, and I like the fact that I do not need to hide their trap behind a robots.txt. A number of spambots, however, are outside the reach of law enforcement and will continuously traverse a site. I think it is useful to be able to block them rather than spend the time needed to look them up on a regular basis. The same goes for bots like baidu.com's Baidu spider, the one used by cyveillance.com, and others of their ilk, which are just badly behaved spiders. They do at least appear not to be harvesting email addresses.
As for badly behaved individuals, I will not regret being spared the time needed to investigate their unauthorized access, should they decide to misbehave more than once. I do not use the .htaccess trap on every site that I maintain, however, and I will not disagree that it is a bit much for anyone who just wants to record and track spambots. I will not judge those who do so either. I do have several other reasons for using this trap, though.
I maintain a site that provides free access to a large database of information for which several other sites charge a service fee. I also pride myself on doing a more thorough and comprehensive job in creating the database, and I actually began this pursuit as a consequence of someone downloading the database. The .htaccess trap therefore serves the purpose of preventing anyone from capitalizing on something I spent months developing and still spend hours each week maintaining.
The same domain also hosts literary critique group forums, which require a significant level of security to prevent the manuscripts from being indexed by search engines that do not honor the robots.txt. The .htaccess trap also provides an added level of protection against certain shortcomings in the security of IM identities that members may post in their profiles against my advice.
My first question is this: wouldn't you advise using a robots.txt file to disallow access to a trap file that uses .htaccess to block an IP, and thereby avoid blocking spiders that one would normally want to index a site? If you would recommend against using the robots.txt for this purpose, why would you do so? I'm puzzled.
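To make the robots.txt point concrete, here is a minimal sketch of how a well-behaved spider would decide to stay out of the trap. It uses Python's standard urllib.robotparser; the /trap/ path, the example.com URLs, and the may_fetch helper are assumptions made up for illustration, not the actual layout of any site discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /trap/ directory is an assumed location
# for the trap file, chosen only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if a spider that honors robots.txt may request url."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite spider skips the trap but still indexes normal pages.
    print(may_fetch("Googlebot", "http://example.com/trap/index.php"))
    print(may_fetch("Googlebot", "http://example.com/index.html"))
```

A spider that honors the file never requests the trap and so is never blocked; only a bot that ignores or deliberately mines the Disallow line walks into it.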
Perhaps I need to clarify that the pages that load in the iframes are not intended to be parsed by search engines - they are after all trap files that are disallowed by the robots.txt. I have not heard that spambots avoid iframes, and would appreciate your definitive statement regarding the viability of a trap that uses them to trap such a creeper. I have only trapped a few dozen spambots and other unwelcome intruders, and wonder how many I am missing. Is there a ratio I can apply to estimate how many I missed?
I do admit to erring when I paired the PHP trap with my .htaccess trap. This caused several bots to be blocked before they reached the PHP trap, and my recommendation that a delay be used is intended to allow time for the bot to hit the PHP trap prior to being blocked. A simple HTML redirect is all that's needed, and doing this works quite well.
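As a rough sketch of the delayed redirect idea, the snippet below builds a page that waits a few seconds and then sends the visitor on with an HTML meta refresh. The function name, the /trap/block.php target, and the ten-second delay are all hypothetical, and it assumes the redirect leads from the first trap page to the blocking trap; adjust the order and paths to suit your own setup.

```python
from html import escape


def delayed_redirect_page(target_url: str, delay_seconds: int = 5) -> str:
    """Build a minimal HTML page that redirects after delay_seconds.

    The delay gives the first trap time to record the visit before
    the visitor is sent on to the (assumed) blocking trap.
    """
    if delay_seconds < 0:
        raise ValueError("delay_seconds must not be negative")
    # Escape the URL so it cannot break out of the attribute value.
    safe_url = escape(target_url, quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={safe_url}">'
        "</head><body></body></html>"
    )


if __name__ == "__main__":
    # Hypothetical target path, used only for illustration.
    print(delayed_redirect_page("/trap/block.php", delay_seconds=10))
```

Whether a given bot actually follows a meta refresh is a separate question; this only shows the shape of the page.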
At your suggestion, I think I will point to SpamCop.net. Perhaps those who visit ih8Spammers, but lack the wherewithal to make an independent effort, will use it. I had intended to do a little more research into the procedures that each of the major RBL sites employs for sending spammers to the nether reaches of the space-time continuum before doing so, but throwing a quick comment about SpamCop.net's quickie reporting procedure on the page won't hurt.
Post Edited (10 Jan 05 10:42pm)