Author: S.Byrne (25 Jun 14 2:06am)
Many forums and blogs list their login URLs in the robots.txt file to keep out legitimate bots such as Google and Bing. Spam bots generally check those links anyway, since they are not going to avoid spamming a forum just because the robots.txt file says bots are not allowed to access the login link.
So in theory, the same holds true where the honeypot URL is in the robots.txt file.
I'm sure many spammers are aware that forums place login and comment links in the robots.txt file, so putting the honeypot URL there could actually catch more bots: those that test every URL listed in the robots.txt file would land in the honeypot even if they would not otherwise follow empty 'a href' links or links they determine are not visible.
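To make the idea concrete, here is a rough Python sketch using the standard library's urllib.robotparser. The paths /login and /trap/ are made-up placeholders, not anything from a real forum. It shows both sides: a well-behaved crawler checks the rules and stays out, while a spam bot can simply read the Disallow lines as a list of URLs to try.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; /login and /trap/ are placeholder paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /trap/
"""


def is_allowed(path, user_agent="*"):
    """What a compliant crawler does: ask whether it may fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def disallowed_paths(robots_txt):
    """What a spam bot might do: harvest every Disallow path as a target."""
    paths = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    print(is_allowed("/trap/"))          # compliant bots skip the honeypot
    print(disallowed_paths(ROBOTS_TXT))  # abusive bots get a ready-made list
```

Anything that requests the honeypot path after it has been disallowed this way is, almost by definition, not a bot that respects robots.txt.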
If you do this, you will likely need to add rel="nofollow" to every link pointing to the honeypot. Although Google does not crawl URLs blocked by the robots.txt file, it does tend to index blocked pages that are linked to. It could therefore end up indexing the honeypot link with some sort of title, since it would not be able to crawl that page to see the 'noindex' header. Should this happen, the rel="nofollow" should stop it getting any ranking value.
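As a small illustration, the hidden link and the 'noindex' header could be generated along these lines. This is only a sketch: HONEYPOT_PATH and both function names are invented for the example, and I am assuming the 'noindex' header is sent as X-Robots-Tag, which is the response header Google documents for this purpose.

```python
from html import escape

# Placeholder honeypot location, not a real path.
HONEYPOT_PATH = "/trap/"


def honeypot_link(href=HONEYPOT_PATH, text=""):
    """Build the (normally empty) anchor with rel="nofollow" attached."""
    return '<a href="{}" rel="nofollow">{}</a>'.format(
        escape(href, quote=True), escape(text)
    )


def honeypot_response_headers():
    """Headers for the honeypot page itself (assumed X-Robots-Tag form)."""
    return {"X-Robots-Tag": "noindex"}


if __name__ == "__main__":
    print(honeypot_link())
    print(honeypot_response_headers())
```

Keep in mind the header only helps if the crawler is able to fetch the page, which is exactly what the robots.txt block prevents, hence the need for the nofollow on the links.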