Message Board

Tracking Harvesters/Spammers

 Google a harvester?
Author: A.Blanchard   (8 May 05 3:19pm)
is number 2 on the Global Spam Harvester List. It belongs to Google, with forward and reverse DNS entries, and ARIN assigns it to Google. It's really Google. How did it become a Spam Harvester? Has anyone contacted Google, or done a Google search on the email addresses Google supposedly harvested?
It's hard to imagine that Google is willingly helping spammers, and if Google is on the list by mistake, it kind of throws a lot of doubt on the usefulness of the whole project.
 Re: Google a harvester?
Author: M.Janssen   (8 May 05 5:46pm)
Spammers can also Google for email address patterns, or they might just have their harvester go through a Google result page which mentions the email address as part of the honeypot page 'preview text'.
 Re: Google a harvester?
Author: A.Blanchard   (8 May 05 7:21pm)

If Google follows the directives found at the top of the honeypot page:

<meta name="robots" content="noindex,follow">
<meta name="robots" content="noarchive">

Then any email address in the honeypot page should not be accessible to anyone using Google. There was even talk of not serving up any honey email addresses to known search engines such as Google. So the question still remains: if a particular email address is served up only once to Google, then how did any email get sent to that address without the complicity of Google, or a bug in the Project Honey Pot code?
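
The idea of not handing honey addresses to known search engines depends on telling a real crawler from one that only claims to be Google. A minimal sketch of a forward-confirmed reverse DNS check follows. It is an illustration only, not Project Honey Pot's code: the function names and the hostname suffix list are assumptions, and the resolver functions are passed in as parameters so the logic can be tried without network access.

```python
import socket

# Assumption: hostname suffixes treated as belonging to Google's crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    # Reverse (PTR) lookup: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # Forward lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                        reverse=_reverse, forward=_forward):
    """Return True if ip passes a forward-confirmed reverse DNS check."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        # Step 1: the reverse name must sit under an expected domain.
        if not hostname.endswith(suffixes):
            return False
        # Step 2: the name must resolve back to the same IP address.
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses.
        return False
```

A honeypot could call `is_verified_crawler(visitor_ip)` before deciding whether to include an address in the page. Both steps matter: anyone can set an arbitrary reverse name on their own IP range, so the forward lookup is what confirms the claim.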

I think this deserves investigation by someone with access to the Project Honey Pot logs.
 Re: Google a harvester?
Author: M.Prince   (9 May 05 1:24am)
In at least one case, it appears that Google did not respect the noarchive/noindex metatag. Here's a search that will pull up the honey pot that handed out the address that resulted in the listing:

While the page appears in Google's index (which it should not), the "Cached" link no longer appears to be valid. That's good. However, the fact that it still appears in the index at all is bad.

We're investigating why this happened, but the initial evidence appears to point to a potential (albeit rare) bug in the Google system for respecting certain robots meta tags. This is the only instance we've seen. Over time the IP address should drop off the top harvesters list. However, for now, because they appear to sometimes not respect the noarchive tag, Google may essentially be acting as a tool for spam harvesters who can mine the company's cached pages even on sites that take the proper measures to prevent that from happening.
 Re: Google a harvester?
Author: C.Dijkgraaf   (26 Jun 05 6:38pm)
is another Google IP address that got itself listed.
This one has had two spam e-mails within the last week; the other, a month and 2 weeks.
Has Google been informed of their robot not obeying the meta tags?
 Re: Google a harvester?
Author: R.Bard   (3 Jul 05 7:29pm)
I think you may want to look at the robot tags.

Should it not be
<meta name="robots" content="noindex,nofollow">

The way it is above would tell the bot to not index but follow links...?

The noarchive tag should have kept the page from being in the cache though... unless the page had been cached before the tag was added, which would explain why it is no longer cached.
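
To make the difference concrete, here is a small sketch that reads the two tags quoted earlier in the thread and reports what a compliant crawler would be allowed to do. The class and function names are hypothetical, and this only models the commonly documented meaning of the directives, not how Google's crawler actually behaves.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # Directives are comma separated and case-insensitive.
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


HONEYPOT_HEAD = '''<meta name="robots" content="noindex,follow">
<meta name="robots" content="noarchive">'''

directives = robots_directives(HONEYPOT_HEAD)
# "none" is shorthand for "noindex, nofollow".
may_index = "noindex" not in directives and "none" not in directives
may_follow = "nofollow" not in directives and "none" not in directives
may_cache = "noarchive" not in directives
```

With the tags as posted, `may_index` and `may_cache` come out false while `may_follow` stays true, so a compliant crawler would keep walking the honeypot's links without listing or caching the page itself.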

Post Edited (3 Jul 05 6:29pm)
 Re: Google a harvester?
Author: R.Mas2   (18 Aug 05 1:29pm)
Google has many indexer robots, some of which are definitely broken or are some sort of testing machines. Some of my normal HTML pages that no longer exist on the web are still in the index with an indexing date of 1-1-1970. Resubmitting them to the indexer didn't help to remove them from the index. The guys at Google will find out for themselves at some point that those entries aren't valid anymore and remove them from their database.

Post Edited (19 Aug 05 6:18am)
 Re: Google a harvester?
Author: J.Schippers   (26 May 07 10:09am)
is a Google harvester as well. Is it perhaps wise to add rel="nofollow" to the hidden link tags? Perhaps Google is less buggy with handling that.
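
As a rough sketch of that suggestion, a helper that emits a hidden link carrying rel="nofollow" might look like this. The function name, the link text, and the use of `display:none` to hide the link are all assumptions for illustration; the actual honeypot markup is not shown in this thread.

```python
from html import escape


def hidden_link(href, text="hidden link"):
    """Build a hidden anchor tag that carries rel="nofollow"."""
    return '<a href="{}" rel="nofollow" style="display:none">{}</a>'.format(
        escape(href, quote=True),  # escape &, <, > and quotes in the URL
        escape(text),
    )
```

Whether a crawler honours the attribute is up to the crawler, so this would complement the robots meta tags rather than replace them.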


Copyright © 2004–18, Unspam Technologies, Inc. All rights reserved.
