Message Board

Tracking Harvesters/Spammers

I Got Spam

Author: R.Watson2 (22 Jul 06 10:07am)

I received a notification that my honeypot had caught a spam harvester. Being curious, I went to the link on the honey pot website... and saw what the spammer's domain name was.

I received junk mail from the same domain at an email address that I have on my website. I don't like the idea of having to munge my email address; to me it is just annoying to visit someone's website and not be able to click on the email link and send them a message.

So, in light of that, are there any MX restrictions or spam lists I can subscribe to that utilize the honeypot project, so that any domain/IP or whatever caught harvesting from my website will be blocked from sending email to my regular address? (spews, spamhaus, spamcop, sorbs, blacklists, etc?)

Post Edited (22 Jul 06 10:08am)

Re: I Got Spam

Author: J.NIVARD (23 Jul 06 5:33am)

This project is a farce. It doesn't help you to stop spam; it helps the Unspam LLC company to stop spam for its clients... The harvester you caught is probably not blacklisted anywhere... except at Unspam LLC.

Re: I Got Spam

Author: R.Watson2 (24 Jul 06 9:47pm)

Not exactly what I was expecting to hear on this forum.

I have been reading further that they have been waiting on a couple of milestones to occur, which now have. Apparently, they were waiting for a "critical mass" of over 1 million hits and a beefier/better connected data center.

For now, I'll stick with the program and see what develops over the next couple of months. If there is no help forthcoming soon to stop me from receiving spam I don't know how long I'll participate waiting in angst.

Re: I Got Spam

Author: M.Prince (25 Jul 06 4:21pm)

J.NIVARD:
Not sure what you're talking about. We don't have any private clients for whom we provide anti-spam services. Unspam does some work with governments, but even there we're not using the PHPot data in any sort of blacklist or filter. The sum total of money we've made off Project Honey Pot has been from the Google Ads we serve on a couple of pages.... and that barely pays for the coffee we give our engineers to help get them through long nights of coding. I'm happy to talk with you further about where your misconceptions have come from.

R.WATSON2:
You're right that we've been waiting for some milestones. To be honest, the Project underwent a lot of neglect while we were off doing work that actually pays the bills. Now that those things are (from a technical perspective) wrapped up, we're back focusing on PHPot. Starting in the next few days we'll be turning on new features. For example, the ability to look up IP addresses.

We're VERY hesitant to create some sort of traditional RBL/DNSBL. Not only are these services extremely resource intensive -- having to build the resources to be able to respond quickly to each email message that is received -- but the nature of Project Honey Pot is that it could quickly be abused by someone who wanted to corrupt the RBL data. Let me give you an example.

If we set up an RBL that published the IPs of spam senders, a spammer could set up their own honey pot and scrape the email addresses it contains. They could then take these known spam trap addresses and use them to sign up for Amazon. When Amazon emailed out a confirmation message, we would potentially list them as a "spammer" and all the problems of existing RBLs would ensue. You can make RBLs work, but they have to be very carefully monitored by human editors. Since this isn't our primary line of business, we just don't have the resources to do that.
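To make the Amazon scenario concrete, here is a minimal Python sketch of a listing step that holds whitelisted senders for human review instead of listing them automatically. The class name, the whitelist contents, and the review queue are all assumptions made for illustration; nothing here describes how Project Honey Pot or any real RBL is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ListingPolicy:
    """Hypothetical policy: trap hits from vetted senders go to a human, not the list."""

    whitelist: set = field(default_factory=set)       # domains vetted by an editor (assumed)
    listed: set = field(default_factory=set)          # domains published as spam sources
    review_queue: list = field(default_factory=list)  # hits awaiting human review

    def report_trap_hit(self, sender_domain: str) -> str:
        """Record that sender_domain mailed a spam trap and return the action taken."""
        domain = sender_domain.lower()
        if domain in self.whitelist:
            # A known-legitimate sender hitting a trap suggests the trap address
            # was deliberately signed up somewhere (the poisoning case).
            self.review_queue.append(domain)
            return "review"
        self.listed.add(domain)
        return "listed"


if __name__ == "__main__":
    policy = ListingPolicy(whitelist={"amazon.com"})
    print(policy.report_trap_hit("amazon.com"))           # held for review
    print(policy.report_trap_hit("bulk-mailer.example"))  # listed automatically
```

The whitelist only moves the problem: someone still has to maintain it and work through the review queue, which is the human-editor cost described above.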

That said, there are some other things we can do which are not nearly as subject to abuse. For example, why provide a filter to stop the spam you receive if, instead, we can provide you an RBL/DNSBL that allows you to block known harvesters from accessing your webpage in the first place? Again, that sounds easy, but it's tricky. As you can see from the stats page, the average harvester takes 30 days to send a message to a spam trap it's harvested. By that time, adding it to the HTTP:BL (our name for such an RBL service) wouldn't do people using it much good. We can do some other tricks in order to make educated guesses as to what IPs belong to harvesters. We've been experimenting with this. If we can get it to work in a way that actually does some good, we'll publish information about it on the site.
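For readers unfamiliar with how a DNSBL-style lookup works, the Python sketch below shows the usual convention: reverse the octets of the IPv4 address, append the list's zone, and treat any A-record answer as "listed". The zone used in the usage note is a placeholder rather than a real service, and the function names are hypothetical; the actual HTTP:BL design is not described in this thread.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL client looks up for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # raises ValueError if not IPv4
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the list answers for this IP.

    `resolve` is injectable so the logic can be exercised without network access.
    Any lookup failure (including a timeout) is treated as "not listed", so this
    sketch fails open.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except (socket.gaierror, socket.herror):
        return False
```

A web front end could call something like `is_listed(visitor_ip, "httpbl.example.org")` before serving a page and then decide what to do with a suspected harvester. Note that every page view now costs a DNS round trip, which is part of why such services are resource intensive to run.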

We also can potentially share data with other RBL-like services that do have the resources in order to monitor who they list. One of my favorites is the SURBL, which is run by Jeff Chan. We've owed Jeff a feed of URLs that appear in our messages for some time, and now that PHPot has our focus again, it's coming soon. If you want to control your inbound spam, I would suggest subscribing to SURBL.

Finally, to both R.WATSON2 and J.NIVARD, if you have ideas on how to use the data then please let us know. We're open to anything. However, understand that we need to think through all the consequences of how we use the data to make sure we don't end up polluting the stream. But, if you have a good idea on how we can be a real benefit, please let us know.

Re: I Got Spam

Author: R.Watson2 (25 Jul 06 5:28pm)

You guys are like way smarter than me...

EveryDNS/OpenDNS turned me on to you guys. So, here I am. I just want to help with what I can. (And it's got to be pretty dumbed down for me to understand.) But, I just want to make sure that good things will come of it.

I'll add some more domains and some mx records. I don't know what else I can do.

Re: I Got Spam

Author: J.NIVARD (4 Aug 06 2:07am)

TO PRINCE:

Your memory is low, see your post here: http://www.projecthoneypot.org/board/read.php?f=4&i=213&t=204#reply_213

[quote]
At the same time, and somewhat quietly, we've begun to share our data with anti-spam companies. These companies are using the spam feed in order to make filters better. We like to stay quiet about who we share the data with in part because we want the data to stay as clean as possible. Filter and sender reputation companies tell us that the feed coming off the Project is one of the cleanest they see. It's also wicked fast. Being able to turn around a message within seconds of it being sent to a spam trap means that a spammer's campaign can often be shut down shortly after it has started.
[end of quote]

1-So don't make me believe your money comes from the Google Ads..."to pay coffee"...

2-This page http://www.projecthoneypot.org/about_us.php shows a "Business Inquiries" link to contact you.

3-The "tm" on the side of the Project Honey Pot logo makes me believe that you have/will register it as a trademark...

4-A blocklist to stop harvesters??... It's like stuffing your ears to hear better...

5-Since you found the weak points of the DNSBL, I will tell you the weak points of your system: blacklisting the email servers (the MX @ mxmailer.com) you are using for collecting the SPAM. If the spammers do that, their SMTP deliveries will not occur and you won't be able to collect spam at your spamtraps.
And the trick about using a spamtrap @ AMAZON applies to your project too, Sir! Do you want me to give it a try??

Give me access to the collected data, I will set up the DNSBL myself.

Post Edited (22 Aug 06 2:45pm)

Re: I Got Spam

Author: M.Prince (4 Aug 06 12:16pm)

J.Nivard:
Wow. I'm going to have to change your username on the Board to Dr. Grumpy! ;-)

1. You are absolutely correct: we do share the data with some companies, we just don't charge anyone anything for it. For example, the Internet Law Group, which has been a partner from day one, gets a feed off Project Honey Pot. They represent companies like AOL and use it to help track down spammers who are violating the law. (Jon Praed, the firm's senior partner, has been referred to as the "Spamhunter General" by the UK press.)

There are some IP reputation services that get our data in exchange for data they share back with us. At times, we've provided collected corpuses of our spam to researchers for them to study (we released a copy of about 100,000 messages at CEAS a year ago which anyone could download). And there are some other folks who we provide feeds to in exchange for using software or services (such as John Graham-Cumming's excellent software Polymail which we're going to begin using to classify spam messages we receive into different categories; as soon as we get it running we'll be able to say: "This harvester is primarily responsible for Viagra spam, this other harvester is primarily responsible for phishing messages, etc..." It's going to be cool!).

Finally, I really like the SURBL and think it's run very responsibly by Jeff Chan and his crew. We've owed them a feed for a LONG time and are finally getting that setup. If you want to benefit directly from the spam sent to Project Honey Pot, I'd suggest signing up with the SURBL service (which I believe is free to most, if not all, users). (PS - one of the reasons I like the SURBL is that they actively check the domains before they include them on their list, whitelisting legitimate companies. This eliminates the "Amazon" problem I described above.)

At some point in the future we might charge commercial entities to access the data, but we don't currently do so. And, even if we do provide the data to commercial entities, we would shy away from giving it to traditional RBLs because of all the documented challenges such systems face, and the particular challenges I've discussed above with the Project Honey Pot system.

2. Yes, there is a way to write to us with "business inquiries". Occasionally people do. Up to this point, we've typically said no. However, at some point we may charge someone for access to the data. We just don't currently.

3. Two of the three people who started Unspam, myself included, are attorneys. Given that, it's a wonder we didn't put a "TM" after every sentence on the website. I doubt we'll file for a registered trademark at any point. But, even if we did, so what? Groups like the Red Cross have trademarks -- and enforce them vigorously -- does that make them any less public minded?

4. I'm not sure what you mean by your analogy. Maybe you mean that if we stop harvesters then we won't be able to track them anymore. That actually is a concern, but there are some pretty obvious solutions. For example, we could route known harvesters to a honey pot page immediately upon visiting an HTTP:BL protected site. But the more salient point is this: our goal is to stop harvesting, not just to track it.... or in the language of your analogy, we'd be thrilled if we could effectively plug our ears! But maybe I'm just not understanding what you're getting at.

5. I agree this was a weakness in our system. We've recently started doing some things that help minimize the ability to sniff out honey pots in the way you describe. We constantly monitor the ratio of spam received at traditional spamtrap addresses (like the ones you describe) versus those we specially construct. What has surprised me is that we can see no statistical evidence spammers are doing what you describe. That doesn't mean it won't happen in the future. If it does, we have some tricks in our back pocket to adjust. As with many parts of the spam problem, this is an arms race. The nice thing in this instance is that the spammers have to adapt to us -- putting them on the weaker side of the race -- rather than us adapting to them.

Please don't hesitate to write to us if you have ideas on how to improve the Project or use its data in interesting, creative, thoughtful, or productive ways. We're always willing to consider them.

Re: I Got Spam

Author: J.NIVARD (5 Aug 06 2:16am)

Dear Mr Prince,

I really appreciated your reply. I will reply and comment in detail later.
I added SURBL to my DNSBL list and will see how it goes ;-).

Regards.

Post Edited (5 Aug 06 3:34am)

Re: I Got Spam

Author: D.Grumpy (5 Aug 06 11:28am)

Hi Again Prince,

---> nickname changed to Dr Grumpy ;-).

Before going into a more detailed reply, can you answer this question:
Does a mailbox protected by the data collected via Project Honey Pot get no spam?... I am sure you are trying it, aren't you?

Regards.

Re: I Got Spam

Author: M.Prince (5 Aug 06 1:59pm)

Dr. Grumpy --
Actually, we don't internally use the data for any sort of sender RBL. (We have begun experiments with the HTTP:BL, as described above, and that has been very effective at blocking harvesting on our sites.) I don't believe that anyone with whom we share the data directly does so either. The SURBL, with whom we hope we'll be sharing spammers' URLs within the next couple of weeks, is the only RBL-like service that we've authorized to use our data: internally or externally. As a team, we have been skeptical of traditional, IP-based RBLs from the beginning. So I'm not sure how to answer your question.

Our volumes of spam only recently began to pick up significantly. I still think we're only capturing a fraction of the spam campaigns being sent on a daily basis. We spent much of last week reviewing -- by hand -- the messages we've received. One thing that's interesting is how international the spam we get is. Personally, most of the spam I receive is either in English or a little Chinese/Japanese. The Project Honey Pot system gets tons of Russian, German, Israeli, etc.

In order for us to capture more spammers we need to get honey pots on more high-volume websites. That's not exactly precise. What I think is actually the case is that we need to get more in-bound links from high-volume websites to our existing honey pots. We've got a plan to do exactly that which we'll be sending out a notice about early next week to anyone with an active honey pot installed. There's a way we believe we can take advantage of our existing, installed resources to track even more.

Another thing that I think we need to do in order to see more of the spam picture is start allowing our honey pots to appear in Google's cache. We play with harvesting software occasionally and a disturbing thing we've noticed is that much of it allows you to harvest directly from the Google cache. We take substantial efforts to ensure our honey pots don't end up stored by Google. However, that means we aren't tracking all the harvesting that is happening there. We have some plans to begin tracking the harvesting through Google. However, we want to be careful how we do it.

Anyway, I don't have a good answer to your question of how effective the data we currently have would be. My guess is that it would be helpful but not a complete solution on its own. And, again, I worry that if we start actually helping the blocking of messages the response of spammers will cause more harm to the Project than it is worth. But, as I said, we're always open to discuss ideas.

PS - One of the things we would be more excited about would be using our spam feed in order to auto-train some Bayesian filter. That would create far fewer potential problems and be a lot more interesting, IMHO.
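As a rough idea of what auto-training could look like, here is a minimal naive Bayes sketch in Python. It assumes spam-trap messages arrive as plain text and are labeled "spam", and that the mailbox owner supplies their own known-good mail as "ham"; the class name, tokenizer, and smoothing choices are illustrative assumptions, not anything Project Honey Pot has described.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        """Add one message to the 'spam' or 'ham' counts."""
        self.tokens[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability (0..1) that text is spam."""
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        total_msgs = sum(self.messages.values())
        words = self.tokenize(text)
        scores = {}
        for label in ("spam", "ham"):
            total_tokens = sum(self.tokens[label].values())
            # Smoothed class prior, then smoothed per-token likelihoods.
            # The extra +1 in the denominator reserves mass for unseen tokens.
            score = math.log((self.messages[label] + 1) / (total_msgs + 2))
            for tok in words:
                score += math.log(
                    (self.tokens[label][tok] + 1) / (total_tokens + vocab + 1)
                )
            scores[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(scores["ham"] - scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

In this picture, each message landing in a trap would be passed to `train(message_text, "spam")` with no human in the loop. A poisoned trap address would still skew the counts, but a single bad message shifts token statistics slightly rather than blocking a sender outright.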

Post Edited (5 Aug 06 3:56pm)

Re: I Got Spam

Author: D.Grumpy (19 Aug 06 10:14pm)

Hi Again Mr Prince,

a- Ok, thinking again & again about the Project HoneyPot gets me to the same conclusion: you are collecting data from the public; now it's time to share the collected data back to the public! Why do you care so much about how the public will be using the data, since THEY collected them?...

b- Until then, the argument you gave against it is "HoneyPot poisoning"... Well, poisoning can be done right now, WITHOUT handing out the data!! Do you want me to try ??
So are you done putting up this argument in each post ?

c- HTTP:BL is, again, stuffing your own ears when you want to hear better. And: who cares about stopping the harvesters?? Who is interested in this?? Look at your own ad: http://www.projecthoneypot.org/images/no_flash_banner.gif it says "stop spammers" not "stop harvesters" !!!!!!

d- Will the data be given to the SURBL or not? If yes, when?

e- I am an "IP reputation services" supporter:
*big companies' mail servers can be whitelisted (it was your point with the SURBL).
*if an IP is in the Project HoneyPot database: it has nothing to do here, whether the email is spam or not! The sender has been identified as sending email to spamtraps: it is the IP owner's job not to get listed, period~.

f- Now if you want to make money (and I am sure that this was your idea from the start, considering your replies), just do it like this: one IP whitelisting is US$500/year (or set your own price). Take the money where it is: from the marketers/spammers, not from the email users who contributed to building your business (by putting Project Honey Pot scripts on their websites!!!).

If you don't hand over the data to the members: they won't collect data anymore if they don't find any reward to it... and I will convince every single member here not to continue contributing to this project.

Dr Grumpy.

Post Edited (19 Aug 06 11:00pm)

Re: I Got Spam

Author: S.Stern (21 Aug 06 5:57pm)

Two more messages in this thread, and I think Nazis will be invoked. In any case, I'm using SURBL through SpamAssassin and it helps. So, if I can help SURBL, it's well worth the 20 minutes I invested in setting up PHPot on our servers. I was impressed when Matt gave a presentation a couple of years ago at a Chicago Internet Society meeting. Unspam is trying to do something about spam. If they can make a couple of bucks doing it, so much the better.

As far as http:bl is concerned, I've given up. I'm in the process of pulling email addresses completely off our websites and using a secure form to send (and log) mail sent through us. (There is a PHPot link embedded in the footer of that form.)

Re: I Got Spam

Author: M.Prince (22 Aug 06 3:20am)

No Nazis; I respect D.Grumpy's opinion, I just don't share it. If someone comes up with a good solution to the problems I'm concerned with then I'd love to create a solution our members can use. D.Grumpy, to be clear, here's the problem I am concerned with:

- We both agree it is possible to abuse the system by creating false positives. As you point out, you could do it today if you wanted to.
- Today, if we get false listings, little harm is done. We notice it and correct it as soon as we can, but we don't have the resources like SURBL to actively
- If we implemented an RBL the consequences of a false positive would be significant. We'd potentially stop legitimate mail from high-profile mailers from being delivered. This would either cause harm or cause people to not use the RBL. In either case, it's not a very useful service.
- The legal liability of maintaining a vulnerable RBL is not something we're willing to bear. I don't know if you're an attorney; I am. Even if we turned the data over to you and you implemented the RBL the liability would likely flow back to us for any false positives if those false positives could foreseeably cause harm. I think this whole thread makes it pretty clear that the harm is foreseeable.

Again, if you come up with a solution to these concerns -- not just repeating what you've said before in an even grumpier tone -- then I'm happy to consider them.

In terms of the SURBL, I'm happy to announce (quietly) that we're now sharing data with SURBL. (Quietly because the more the bad guys know where data comes from, the more they can avoid the traps.) I don't know if they've fully integrated it into their system. However, if you want to get the benefit of Project Honey Pot's in-coming spam data the SURBL is the way to go. Jeff Chan and his team do a great job. In our experience, with nothing but the SURBL, your spam volume will go down about 80%. We hope with the new data we're providing them we can bump that up a percentage point or two.

HTTP:BL is coming together. I still don't understand your analogy to the cotton and the ears. In the process of building it, we're learning a lot about DNS -- which is not our area of expertise by any means. D.Grumpy, while you may only want to stop spammers, I want to stop anyone in the spam chain: spammers, harvesters, spam hosts, dictionary attackers, whoever. If I can keep a harvester from getting my email address to begin with then I'll ultimately get less spam. Our stats show that this is especially powerful with phishing and other especially dangerous fraud-based spam. In general, these phishers tend to use an address once and then never reuse it. Stop them from harvesting and we'll stop a large percentage of phishing. We're excited about it, and it's coming soon.

Thanks for the comments S.Stern! That presentation to the Chicago Internet Society meeting feels like a LONG time ago. Thanks for the kudos!

Re: I Got Spam

Author: D.Grumpy (22 Aug 06 12:57pm)

Hello again Mr Prince :-),

*!*!* Let me reply then: *!*!*

1- we both agree that the ProjectHoneyPot spamtraps can be abused/poisoned, ok we're progressing!
This "poisoning ability" comes from the ProjectHoneyPot characteristic. Unfortunately, the ProjectHoneyPot is public: so anyone can subscribe now and start poisoning. BUT if all the spamtraps go "private" now, no spamtrap-poisoning can be done (I bet all the subscribed members until now are not spammers). So the only solution for you I can think about is: close new memberships! And send an email to all members to explain your decision & the project progress ;).
Also: if a false positive is coming from a member (let's say, an email incoming from Amazon or Ebay), just close his/her account..."and you're done" (n.b.: this last quoted sentence is a registered trademark of Amazon --- ah, let's be careful, some lawyers are wandering around here ;).

2- if you talk about "material resources", that can always be overcome (by donation or cooperation). Now regarding "monitoring false positives", if you use the solution suggested at 1, you won't have to monitor anything. Again: if an IP is caught in a spamtrap (and if it's not spamtrap-poisoning -- solution 1 may be sufficient), well you've done your job!!

3- OK I have an idea for this, but I won't tell it because it's not patented....Hahaha... Sorry Sir... Or maybe if I hire you as my attorney you will patent it for me? ;-)
Anyway, I think solution 1 is sufficient: if someone is using an IP that also sends emails to spamtraps, it is the responsibility of:
** the IP owner !
or
** the sender of the email (who is using this blacklisted server) !
Am I missing something here?...

4- I am not an attorney (even though my father is, and I'm always interested in laws). I am outside the US (=in Japan), and with all the respect I have for US laws: considering the principle of territoriality, if "you" put your DNSBL outside the US, you won't have to show up in court. (Note that "you" is between quotes, of course it should be another entity, because you could be responsible in the US if you're controlling this foreign entity... Anyway I'm sure you got my point here). And, as far as I know, there is no international treaty about the legal liability of running/using a DNSBL, am I wrong Mr Attorney? ;-)

*!*!* Some comments now *!*!*

a) yes, the "cotton in the ears" analogy refers to the idea of stopping harvesters at the website level. Let's just continue to monitor them! (see below).

b) you said: "I want to stop anyone in the spam chain: spammers, harvesters, spam hosts, dictionary attackers, whoever". Hum, sorry to crush your dreams, but it's utopian. I will give you another analogy. Let's imagine that you are a Doctor/Physician and you're doing research on the flu virus. What you're telling me now is: you're a doctor who wants to:
level 1- stop the flu virus in the air (harvesters)
level 2- stop the flu virus from getting into the body (spammers using spamhosts)
level 3- stop the disease in the body (spam emails)
level 4- observe the virus effects/origins (monitoring only -- ProjectHoneyPot).

When you go to see the doctor, you JUST want level 3 (try asking your doctor "but why can't you just stop the virus in the air?? I'm tired of coming to see you" -- get back to me with his/her reply).
Level 2 can be achieved with a vaccine (DNSBL?...).
=> So are you still going for level 1 now??...
===> Anyway, for now you're only a Doctor at level 4...
====> Let's make an extra level 5 for you: developing a new microscope that stops doing level 4... (= HTTP:BL). It's downgrading!!!!!!!!!!!!!!!!!

*!*!*
PS: ah, I made a hard effort not to be grumpy... :-P Even though I feel that you're only browsing this message board to get solutions/suggestions to your problems... :-| Did I feed you well today Sir?? :-D
Now make the data public...

Post Edited (22 Aug 06 2:48pm)

Re: I Got Spam

Author: M.Prince (22 Aug 06 6:00pm)

Hi D.Less-Grumpy:
Five quick things:

1) There are lots of "doctors" on the Internet running RBLs, if you're looking for a quick fix I'd suggest you visit one of those. I particularly like SURBL. That Jeff Chan has quite a bedside manner and can cure a good portion of "flu" cases right quick.

2) How cool would it be if someone were really working on stopping the flu while it was still in the air. I guess I've always kinda wanted to be that guy, not just another doctor.

3) Shutting down new memberships just so we can set up yet-another plain vanilla RBL? You're really a cut-off-your-nose-to-spite-your-face kinda person, aren't you?

4) Would that it were the case that we could simply set up shop in Nauru and absolve ourselves of legal liability. Unfortunately, while I've heard Nauru is a beautiful country, I don't think I can convince our developers to pack up their bags and head out. And, so long as we're located in the US, legal liability can attach to us. Check with your dad, he'll confirm.

Incidentally, the Nauruans are BIG Project Honey Pot supporters... seriously.... probably more honey pots installed per capita than any other country. At one point they donated the country's entire TLD (.nu, I think) to the Project. We convinced them that wasn't such a good idea, although I kinda wish we hadn't.

5) You're right, you caught me! I am only browsing the message boards for suggestions on how to make Project Honey Pot better. Damn, gotta stop doing that....

Matthew.

Re: I Got Spam

Author: D.Grumpy (22 Aug 06 7:37pm)

Hi Again Mr Prince,

1) I'm already using the SURBL, thanks for the tip.

2) Ok, I guess that this settles the debate. We've ended up in a utopia, guys... What do we say in such circumstances? "Good luck~". I guess your favorite author is Thomas More?

3) At least I imagined a solution for YOUR concern... And like the King ruling over his land, I can only accept your decision (to reject it). Please forgive me your Highness for such bad advice.
But after having a dozen of spammers poisoning your project, you won't have any other alternative~.

4) Hum... I also thought about a real foreign entity buying the data from you (let's remain legal here -- & I'm not responsible for what you do~). BUT since you want to keep control of everything (the English word for that is "totalitarianism"?), in my mind: the only foreign entity able to contract with you will be an entity MaDe by you (because you don't want to share your data if the way they are used is not the way YOU want~). Again, the data are collected from/by the public (but not FOR the public, see below).

5) Finally, I can see that this project is not for the sake of its members. I've been warned, thank you, now I can choose to leave or not.

Post Edited (22 Aug 06 8:08pm)

Re: I Got Spam

Author: D.Burkhalter (27 Apr 07 7:49pm)

I don't think you understand the problem. Most spam is sent by botnets. One infected computer in a botnet can send one MILLION emails in a day. Of course the RBLs blacklist the computer within one day. But the botnets always find more computers to infect so it's a vicious never ending cycle.

I ran a mail server for two years for my little company. I wrote software that attempted to track all that spam. 95% of the spam was one-shot spam. Realizing that it was a pointless exercise, I let someone else manage my mail for me.

A really good use of the data would be to share this info with the offending ISP. The worst ISPs were the cable companies (Comcast, RR, and Shaw). They should suspend the account until the owner secures his computer.

Mr. Prince, I'd suggest that you look into that with Jeff Chan. Maybe you guys can work together to get the crappy ISPs to do their part in making the internet a safer place and take a tougher stance on spambots. If the ISP isn't being a good netizen, the whole domain needs to be blocked.

D Burkhalter

Re: I Got Spam

Author: M.Prince (28 Apr 07 5:40pm)

D. Burkhalter: I don't know if you saw, but on Friday we announced a Monitor service where ISPs and other owners of IP space can watch for malicious behavior. Here's info:

http://www.projecthoneypot.org/5days_friday.php

Matthew.

Re: I Got Spam

Author: D.Burkhalter (1 May 07 7:38pm)

Thanks, Matthew.

D Burkhalter