Author: M.Prince (11 Apr 05 1:54pm)
Yeah, I'm surprised too at how slowly the spam volume has increased. We built the infrastructure to handle thousands and thousands of messages from day one. We've received only a trickle. And, while it's trending upward, it's doing so much more slowly than I would have guessed.
Here are some things we've tested and believe we can say with statistical confidence. First, the construction of the spamtrap addresses themselves does not make a difference in whether they receive spam messages. In other words:
bob.smith@example.com
is as likely to get spam as:
orangeelephant34@imahpot.example.com
We've looked at the data broken down by the username, the domain name, and whether the domains are two-level, three-level, or more... regardless, it appears that the same percentage of addresses receive spam in each subgroup as in the whole.
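For anyone who wants to run this kind of subgroup comparison on their own spamtrap data, here is a rough sketch of one way to do it: a two-proportion z-test in Python. This is not necessarily the test we ran. The function name and the counts below are made up purely for illustration and are not Project Honey Pot numbers.

```python
import math


def two_proportion_z_test(hits_a, total_a, hits_b, total_b):
    """Compare the share of addresses that received spam in two subgroups.

    Returns (z, p_value) for the null hypothesis that both subgroups
    have the same underlying spam-receipt rate (two-sided test).
    """
    rate_a = hits_a / total_a
    rate_b = hits_b / total_b
    # Pooled rate under the null hypothesis that the groups do not differ.
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: "realistic" usernames vs. random-looking ones.
z, p = two_proportion_z_test(hits_a=120, total_a=4000, hits_b=131, total_b=4000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

A large p-value (conventionally above 0.05) means the data give no reason to think the two subgroups receive spam at different rates, which is the pattern described above. With more than two subgroups, a chi-square test on the full table would be the more usual choice.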
Initially we had a grey box at the bottom of every honey pot page with some text and links to the Project Honey Pot site. We tested a number of harvesters and found that some, when set in the "avoid honey pots" mode, would avoid pages with that box. That's great news if you're running a legitimate site as it gives you a way to avoid receiving spam at your current addresses, but it potentially compromised the value of the hpots. As a result, we turned off the grey box most of the time and got about a 15% increase in the number of harvesters identified per period of time. Not a huge bump up in spam, but some.
We've tried turning off the legal text to see if harvesters are scanning on that somehow, and that has made no statistically significant difference in what addresses are picked up. We've also tracked individual addresses. While many appear to receive one message and never receive a second (typically the fraud-based messages like phishing or Nigerian 419 scams), if a spamtrap address receives, say, 5 messages then it appears to continue to receive messages at a regular clip. In other words, it doesn't look like spammers are, after the fact, culling spamtraps from their lists.
We do notice that harvesters target bigger sites much more than smaller ones. Therefore, we are working to get more hpots installed on high-traffic sites. Soon we'll be launching a free service for ISPs and businesses to monitor their IP space for spammers and harvesters in exchange for their installing a honey pot somewhere. We're also going to be launching the http:BL service, which will allow you to block access to known harvesters if you're an active member of the Project. Hopefully these services will increase the number of installed honey pots, which, in turn, may increase the amount of spam and harvester traffic.
I'm encouraged that our top spam-sending countries line up with the top spam-sending countries reported by major anti-spam vendors. That's some indication that our sample is somewhat representative. I'm also stunned that almost 5.75% of traffic to honey pots turns out to ultimately be caused by harvesters. That's a much higher percentage than I ever would have guessed, and makes me believe that it's not likely that harvesters are avoiding our pages.
My working hypothesis is that real spam volume to an address only occurs once an email address makes its way onto some of these "100M Email Addresses for $19.95" CDs. Right now it doesn't appear many (if any) of our addresses have made it there. It could be that we just haven't been online long enough. It could be that somehow the CD makers are removing our hpots before adding them.
The process of how spam volume to an address builds is hidden from the typical user, since it's hard to distinguish individual spammers from each other if you're receiving all your mail at one address. For the most part, it appears that most of our addresses only receive one or two messages per day. In other words, most of the spammers we're seeing aren't completely bombarding all the addresses they have with zillions of messages, but are instead trying to limit the messages they send to one or two per day. This makes business sense, if you think about it. It could be that spammers only start trading lists with one another after they feel like they've totally used up their value. Maybe we haven't hit that point yet. I can't imagine it's too far off, and that's when I'd expect the real volume will begin.
....then again, I've been saying this for a while now. So if anyone else has an idea, let us know....