
Message Board

Tracking Harvesters/Spammers


Working with SpamCop etc ?

Author: J.Cath (26 Jan 05 3:02am)

Are there any plans to work with SpamCop / IronPort to push the spammer data into their blacklists, to help spread the impact on spammers quickly?
I'm not sure how viable it is, but with a number of other blacklist and blocking schemes out there, it would be good to see the blacklisting and legal activities consolidated so that effort isn't duplicated when it could be better spent elsewhere.

Re: Working with SpamCop etc ?

Author: M.Prince (26 Jan 05 8:41am)

We're willing to talk to anyone interested in our data. We're currently working on getting data to the SURBL, which is a blocklist based on the URLs that appear in spam messages. I really like their approach because they are extremely cautious and maintain an extensive white list in order to ensure that legitimate companies do not suffer as a result of their system.

I'm somewhat hesitant to share our data on spam servers with pure IP RBLs because it can be so easily manipulated. For example, imagine a spammer downloads and installs a honey pot. If that spammer then collects some email addresses handed out by that honey pot and signs them up for legitimate newsletters, those newsletters' mail servers' IPs will be listed by our system as potential spam servers. I wouldn't want that data to be fed automatically to an RBL. There may be a lot of ways to minimize this risk; we're looking into the ones that have been suggested already, and we're open to other suggestions.

In the meantime, it's worth noting that the data on harvesters is much more difficult to manipulate. If, for example, the same situation as above occurs, while we get a "false positive" on the spam server, the spammer's IP still gets correctly listed as a harvester. This is why we do publish a list of the top harvester IP addresses, but make it more difficult to see a list of the top suspected spam server IPs -- we're more confident about the data on the former than the latter.
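To make the asymmetry concrete, here is a minimal sketch of the attribution described above. It is not Project Honey Pot's actual code; the class, field, and address names are all made up for illustration. The point it shows is that the harvester IP comes from the honey pot's own visitor log, while the "spam server" IP is whatever host happened to deliver mail to the trap address.

```python
from dataclasses import dataclass, field


@dataclass
class HoneyPotLog:
    """Hypothetical bookkeeping for one honey pot (illustrative only)."""
    # trap address -> IP of the visitor that was handed that address
    handed_out: dict = field(default_factory=dict)
    harvesters: set = field(default_factory=set)
    suspected_spam_servers: set = field(default_factory=set)

    def hand_out(self, trap_address: str, visitor_ip: str) -> None:
        # Each trap address is given to exactly one visitor.
        self.handed_out[trap_address] = visitor_ip

    def record_mail(self, trap_address: str, sending_ip: str) -> None:
        visitor_ip = self.handed_out.get(trap_address)
        if visitor_ip is None:
            return  # not one of our trap addresses
        # The visitor that collected the address is the harvester,
        # regardless of who ends up sending the mail.
        self.harvesters.add(visitor_ip)
        # The sender is only *suspected*: it may be a legitimate
        # newsletter server that the harvester signed the address up to.
        self.suspected_spam_servers.add(sending_ip)


log = HoneyPotLog()
log.hand_out("trap1@example.org", "203.0.113.7")        # spammer visits
log.record_mail("trap1@example.org", "198.51.100.25")   # newsletter arrives
```

In the poisoning scenario, 198.51.100.25 would be the innocent newsletter server, yet 203.0.113.7 is still correctly recorded as the harvester.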

Still, if anyone from IronPort (or any other RBL maintainer) would like to contact us, we'd be happy to talk with them about how we can work together.

Re: Working with SpamCop etc ?

Author: A.Daviel (20 Feb 05 3:15pm)

I'd be interested in feeding an RBL with the harvested data; I run an in-house RBL at a research organization where one could plausibly argue that non-work-related email and lists are a privilege, not a right, and are not guaranteed.

Currently our RBL is populated by in-house spamtraps and also some dynamic statistics from SpamAssassin (the last 5 messages from this IP had an average score of more than 5). Addresses are removed after a few days, and can be removed via a Web form given in the rejection message. As far as I am aware, we have had no, or very few, problems, though some broken mail systems may rewrite the error message so a legitimate user may not see the URL. The RBL is occasionally useful in rejecting a spike of spam sent from one IP address, though it is otherwise much less effective than CBL, which we also use for rejection.
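The listing rule described above (average SpamAssassin score over the last 5 messages above 5, with removal after a few days or on request) could be sketched roughly as follows. This is only an illustration, not the actual in-house system: the class and method names are invented, "a few days" is assumed to mean three, and timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque

WINDOW = 5                    # number of recent messages considered
THRESHOLD = 5.0               # average score above which an IP is listed
LISTING_TTL = 3 * 24 * 3600   # assumption: "a few days" taken as 3 days


class InHouseRBL:
    """Hypothetical rolling-average blocklist (illustrative only)."""

    def __init__(self):
        # ip -> scores of the most recent WINDOW messages
        self.scores = defaultdict(lambda: deque(maxlen=WINDOW))
        # ip -> time (seconds) at which it was last listed
        self.listed_at = {}

    def record_score(self, ip, score, now):
        window = self.scores[ip]
        window.append(score)
        # Only list once a full window of messages has been seen.
        if len(window) == WINDOW and sum(window) / WINDOW > THRESHOLD:
            self.listed_at[ip] = now

    def is_listed(self, ip, now):
        listed = self.listed_at.get(ip)
        if listed is None:
            return False
        if now - listed > LISTING_TTL:
            del self.listed_at[ip]  # automatic expiry
            return False
        return True

    def delist(self, ip):
        # What a removal Web form might call.
        self.listed_at.pop(ip, None)
        self.scores.pop(ip, None)
```

Expiring entries and offering self-service removal is what keeps the cost of a false positive low in this kind of scheme.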

One possibility is that you acknowledge the chance of false positives and recommend that users not reject permanently, but only reject temporarily (greylist) or use the data in a scoring system such as SpamAssassin.

Andrew Daviel aka advax@triumf.ca

Re: Working with SpamCop etc ?

Author: R.Kay (8 Aug 06 4:32am)

I'm also setting up some of my own trap addresses to feed data into my in-house automated IP RBL. This is balanced by a whitelisting system, so that if a source of ham also spams it doesn't get blacklisted (and the blacklisting wouldn't be effective anyway).

As it is expected to take some months before much data arrives through the spamtrap addresses that are set up to feed directly into my own system, it would also be very useful for me to regularly receive reports of IP addresses that are spamming addresses within domains I have donated to Project Honeypot, preferably in real time if there is any likelihood of spam senders using sorted address lists. Currently, because of the domain donation, the mail to these addresses is intentionally redirected, so I can't scan it. I could still usefully use the IP origin data to feed my own in-house blacklisting system.

Providing domain donors with the spam-origin IP data directly relating to their donated domains, for use in in-house RBLs, would be of no use to a spammer who set up a honeypot for the purpose of discrediting publicly provided RBLs.

Re: Working with SpamCop etc ?

Author: M.Prince (8 Aug 06 12:52pm)

An interesting idea, and something we've thought about a little. Let me tell you two concerns, one specific, the other general.

The specific concern is simply technical. We've got a lot of users who have donated MX records. Already managing them is a bit of a challenge based on the distributed manner in which we've architected the system. For example, our mail servers are separate, and sometimes quite remote from, our database and web servers. The mail servers never query the database. Instead, the database periodically accesses the mail servers and retrieves any incoming messages that are in its queue. Because there's no way for the mail servers to know what domains were donated by what users, it would be hard to relay the information in anything very close to real-time.

We could do it in slightly-delayed time via RSYNC or something. The database query would, however, be fairly expensive. While not a big deal if it's only running a few times a day, the more real-time we make it, the more we have to hammer the DB. I'll talk to our engineers about whether there's a way we can do it more efficiently, but it would introduce a lot of complexity into an already complex process.
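A rough sketch of the pull-based arrangement described above may help show why real-time relay is awkward. It is not our actual architecture in code form; the classes, method names, and the domain-to-donor mapping are hypothetical. The key property is that only the database side knows which user donated which domain, so per-donor reports can only be produced when the database pulls.

```python
from collections import defaultdict


class MailServer:
    """Hypothetical remote mail server: it queues mail and knows nothing
    about which user donated which domain."""

    def __init__(self):
        self.queue = []

    def receive(self, recipient_domain, sending_ip):
        self.queue.append((recipient_domain, sending_ip))

    def drain(self):
        # Hand over everything queued so far and start a fresh queue.
        messages, self.queue = self.queue, []
        return messages


class Database:
    """Hypothetical central database: the only place where the
    domain -> donor mapping lives."""

    def __init__(self, donor_by_domain):
        self.donor_by_domain = donor_by_domain
        self.reports = defaultdict(list)  # donor -> sending IPs seen

    def poll(self, mail_servers):
        # Called periodically. The more often it runs, the closer to
        # real time the reports are, and the heavier the database load.
        for server in mail_servers:
            for domain, sending_ip in server.drain():
                donor = self.donor_by_domain.get(domain)
                if donor is not None:
                    self.reports[donor].append(sending_ip)
```

Nothing reaches a donor's report until `poll` runs, so the reporting delay is bounded below by the polling interval.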

The more general concern is the same concern I've expressed over traditional RBL systems elsewhere on this board. Because of the nature of the Project Honey Pot system, it would be possible for a spammer to poison the incoming data feed rather easily. We have systems in place to catch it and mitigate its damage, and we may even catch it pretty quickly, but there are all sorts of problems (legal, social, technical) associated with even temporarily blocking a legitimate mail server like Amazon.com's.

The harvester data is MUCH less fragile than the spam server data, which is why we're working on creating the HTTP:BL. I think this will be a unique service and will provide real benefit to the entire Project Honey Pot community. We're also going to be sharing -- assuming after their evaluation they want it -- the URLs of spam messages we receive with the SURBL. I like the SURBL because they hand-check the URLs they include in their system in order to ensure they aren't "www.amazon.com" or whatever. This helps eliminate the problem I described above of false positives.

Creating some sort of DNSBL is something that, if you look through these boards, a NUMBER of people have requested since the inception of the Project. It is something we continuously reevaluate. However, while I'm confident if we decided to we could find ways to overcome the technical problems described above, I have yet to hear a convincing argument which allows me to believe we could overcome the more general problems. We're open to such arguments, we just haven't heard them yet.

Thanks for your input and help with the Project!

Matthew.

Re: Working with SpamCop etc ?

Author: R.Kay (11 Aug 06 4:13pm)

Given the problems of providing this data in real time, rsync a few times a day would be better than no IP data.

On the wider question of reputation data: if the Honeypot project is successful at catching harvesters, you must have quite a useful volume of spam messages coming into your traps. Using reputable whitelists (e.g. bondedsender) might help avoid the pitfalls. The more sources of DNSBL data are available, the more accurate SpamAssassin-type use of them becomes, where a number of spam indicators are used and this, like SpamCop, is just one of them.

Anyone using a DNSBL should be aware that false positives are possible. Even though an automated source is never going to be totally reliable, this one, together with other sources, allows better automated decisions to be made than would otherwise be possible.
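As a rough illustration of what I mean by using a list as one indicator among several, here is a small sketch. The list names, weights, whitelist bonus, and threshold are made-up numbers for the example, not SpamAssassin's real rule scores.

```python
REJECT_THRESHOLD = 5.0  # assumed cut-off, for illustration only


def spam_score(content_score, dnsbl_hits, weights,
               whitelisted=False, whitelist_bonus=-4.0):
    """Add a weight for every DNSBL that lists the sender, then apply
    a negative bonus if the sender is on a trusted whitelist."""
    score = content_score + sum(weights.get(name, 0.0) for name in dnsbl_hits)
    if whitelisted:
        score += whitelist_bonus
    return score


# Hypothetical per-list weights: no single list can reject on its own.
weights = {"honeypot": 1.5, "spamcop": 2.0}

one_hit = spam_score(2.0, {"honeypot"}, weights)
two_hits = spam_score(2.0, {"honeypot", "spamcop"}, weights)
poisoned = spam_score(2.0, {"honeypot", "spamcop"}, weights, whitelisted=True)
```

With these numbers, one listing alone stays under the threshold, two listings together cross it, and a whitelisted sender stays well under it even if a poisoned feed lists it.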

Thanks for all your efforts.


Copyright © 2004–26, Unspam Technologies, Inc. All rights reserved.