Message Board

Tracking Harvesters/Spammers

 New Features Launched Today
Author: M.Prince   (25 Aug 06 11:30pm)
We just turned on a number of new features that allow you to better search and understand harvesters. These features are all accessible through the "Top Harvesters" web page:

Where that page was fairly static in the past, it now lets you customize the views extensively. You can now:

- Show harvesters sorted by a number of categories (most recent, oldest, most damaging, most threatening, most anonymous, etc...)
- Limit the harvesters displayed within these categories to certain countries (try clicking on a flag), regions, and even cities (e.g., "show me the most damaging harvesters operating out of my home town").
- Search by the useragent used by the harvester
- And many more combinations of the above

Whatever search is important to you ("show me the most damaging harvesters targeting my sites and operating out of the USA"), we now provide a link so you can quickly jump right back to it. Moreover, there's an RSS feed for every conceivable search, which lets you syndicate the list of harvesters you care about and display it on your own site (e.g., you can display the list of harvesters targeting a particular site, so long as you have a honey pot living somewhere on it).
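Syndication aside, the simplest way to consume such a feed is to reduce it to bare IP addresses. Here is a minimal sketch with standard tools; the helper name and the feed URL are placeholders, not anything the site prescribes:

```shell
#!/bin/sh
# extract_ips is a hypothetical helper: it pulls anything shaped like an
# IPv4 address out of whatever is piped in, deduplicated and sorted.
extract_ips() {
    grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -u
}

# typical use (placeholder URL; substitute your own saved-search feed link):
#   wget -q -O - "http://www.projecthoneypot.org/your_feed.rss" | extract_ips
```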

There's more to come, but this gives you a glimpse of some of the cool things we have planned. In the meantime, please let us know if you have any comments or suggestions.
 Re: New Features Launched Today
Author: S.Enbom   (4 Sep 06 12:09pm)
Thanks for these features.

I started using the RSS feeds: I made a script that downloads them, greps out the IP addresses, removes duplicates, and adds them to my .htaccess file (deny from) once every 24 hours. In addition I use a bot trap, which bans bots that access a folder they shouldn't.
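For readers unfamiliar with the deny-from approach, the generated section of such an .htaccess file would look roughly like this (the IPs below are documentation examples, not real harvesters):

```apache
# mod_access (Apache 1.3/2.0) style: allow everyone except the listed IPs
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.7
```

One caveat: Apache re-reads .htaccess on every request, so a very long deny list carries a small per-request cost.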

I got an email from Project Honey Pot a while ago about new features, but I don't have it anymore and can't find info about them here on the message board. Will there be a server you can query for the data, like one can from ?
 Re: New Features Launched Today
Author: M.Nordhoff   (4 Sep 06 8:53pm)
Well, here's a copy of the email you can read:
 Re: New Features Launched Today
Author: S.Enbom   (8 Sep 06 1:48am)
Thanks M. Nordhoff,

So, I implemented the data from Project Honey Pot to deny the IPs of known harvesters on my site. I'm using a script so human beings can unban themselves, though.

Didn't take long before I got a mail from someone in the United Arab Emirates about an open proxy there that he, and many others, are using:

Doesn't look like a nice open proxy, judging from the Google results and the vandalism around the net coming from it.

I don't get so much traffic to my site, but I really don't like email harvesters pummeling it mindlessly, so I'm doing something about it.

I'll try to make the 403 Forbidden + unbanning process as user-friendly as possible.
 Re: New Features Launched Today
Author: S.Enbom   (13 Sep 06 4:30am)
I noticed quite quickly that I've had to whitelist some IPs. I actually got an email from a human being in the UAE who was accessing my site through an open proxy his ISP forces him to use.

Here's what I've had to unban so far:
 Re: New Features Launched Today
Author: M.Prince   (13 Sep 06 4:44pm)
Interesting. Thanks for the heads up.

We're building an Apache module to interact with the http:BL. The engineer who is working on it is actually working on the whitelisting functionality right now. We're trying to make it general so that you can use it however you want. For example, if you want people to have to email you in order to get on to your site, it'll support that. If you want to automatically whitelist them if they pass a CAPTCHA, it'll support that. And the code will be open source so you can tweak away at it as you want.

Interesting that so many IPs are being used by confirmed humans, and that you'd hear from them so quickly. It will definitely be interesting to see what happens as we increase the ability of webmasters to control who comes onto their sites.
 Re: New Features Launched Today
Author: S.Enbom   (14 Sep 06 1:28pm)
I only got an email about one IP/proxy, but I have whitelisted the others after learning they are widely used by people too.

As it seems most harvesters on my site send the user agent "Java", I've simply banned it.
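For what it's worth, a user-agent ban like that can be done in .htaccess with mod_setenvif. This is only a sketch, and the "^Java" pattern is an assumption about how those bots identify themselves:

```apache
# mark requests whose User-Agent starts with "Java", then deny them
SetEnvIfNoCase User-Agent "^Java" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```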

Does anyone by the way know anything about these particular harvesters? Pwned computers?

I've put some PHP on my 403 page so I get an email with the agent/IP/referrer whenever someone ends up on it. I then Google the IPs and try to see if it might have been a real human being.

By the way, will people who can't install Apache modules (because it's not their own server) be able to use http:BL somehow?
 Re: New Features Launched Today
Author: M.Prince   (14 Sep 06 6:06pm)
Hope you'll also add a link to a honey pot from your 403, so we can continue to track the harvesters you've already banned.

We are planning to set up a DNS server that the Apache module, or anyone who is a member of Project Honey Pot, can query against. As a result, if you can do a DNS query from PHP (or wherever), you'll be able to incorporate http:BL into your blocking.
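If http:BL follows the usual DNSBL convention (a per-user access key prepended to the reversed IP octets, under a zone like dnsbl.httpbl.org — an assumption here, since the service isn't finalized in this thread), building the query name is simple string work:

```shell
#!/bin/sh
# httpbl_query_name is a hypothetical helper: it assembles a DNSBL-style
# lookup name from an access key and a dotted-quad IP.
httpbl_query_name() {
    key=$1
    ip=$2
    # reverse the octet order, as DNS blacklists conventionally expect
    reversed=$(echo "$ip" | awk -F. '{print $4"."$3"."$2"."$1}')
    echo "$key.$reversed.dnsbl.httpbl.org"
}

# e.g.: dig +short "$(httpbl_query_name yourkey 1.2.3.4)"
```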

Ultimately, though, we think an Apache module (or IIS module, or something living at the web server level) is really the way to do it right. We want to make sure we've got at least a skeleton of that built and running for people who do have their own servers and want the protection. Maybe eventually you can convince your host to install it to protect everyone running off the server! But in the meantime, yes... you will be able to take advantage of the http:BL data.

 Re: New Features Launched Today
Author: S.Enbom   (16 Sep 06 3:51am)
Yes, I have my honey pot on the 403, along with some poison email addresses for the harvesters.

I still get a lot of spam to my pot:

My host would probably benefit greatly from the http:BL data, because they have stretched their budget to the point where their servers give "internal server error" messages every once in a while. Email harvesters do take up lots of resources.
 Re: New Features Launched Today
Author: S.Enbom   (16 Sep 06 5:53am)
For anyone interested, here is the script I use to download the RSS data and ban the bad bots from my site. As I'm not very skilled at coding, it's pretty naive code, but it works. I wish I knew how to implement smarter whitelisting.


#!/bin/sh
DIR=/home/cybe   # not shown in the original post; inferred from the absolute paths below
cp $DIR/blacklist.dat $DIR/blacklist.old
grep HTTP $DIR/blacklist.dat >>$DIR/blacklist.log
echo downloading project honey pot data
# /usr/local/bin/wget -q -i $DIR/scripts/wget.list -O - >$DIR/scripts/harvesters.rss
echo grepping ips from data
cat $DIR/scripts/harvesters.rss |grep -ho "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*" >$DIR/scripts/blacklist.tmp
echo grepping ips from old blacklist
cat $DIR/blacklist.dat |grep -ho "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*" >>$DIR/scripts/blacklist.tmp
echo removing whitelisted ips
# most of the whitelisted IP patterns were lost when this post was archived;
# 213.42.2.11 is the only one that survives legibly
sed -e 's/213.42.2.11//g' $DIR/scripts/blacklist.tmp >$DIR/scripts/blacklist.tmp2
echo sorting + removing duplicates + removing blank lines
sort $DIR/scripts/blacklist.tmp2 |uniq |sed '/^ *$/d' > $DIR/blacklist.dat
# the temporary filename was lost when this post was archived;
# .htaccess.tmp is substituted here so the steps hang together
echo copying .htaccess.fresh to .htaccess.tmp
cp /home/cybe/.htaccess.fresh /home/cybe/.htaccess.tmp
echo adding deny from + ips to .htaccess
cat /home/cybe/blacklist.dat | while read line; do echo deny from "${line}"; done >>/home/cybe/.htaccess.tmp
echo moving to .htaccess
mv $DIR/.htaccess.tmp $DIR/.htaccess
echo added ips
diff $DIR/blacklist.old $DIR/blacklist.dat |grep -ho "[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*"
echo old amount of ips:
wc -l $DIR/blacklist.old

echo new amount of ips:
wc -l $DIR/blacklist.dat


Copyright © 2004–18, Unspam Technologies, Inc. All rights reserved.
