Author: M.Prince (18 Sep 06 10:35pm)
Harvesters are associated with mail servers through the spam trap email addresses they collect. It's probably easiest to explain through the chain of events. Imagine a harvester visits a particular honey pot and is handed a spam trap email address. For example:
Harvester IP: 192.168.0.123
Spam Trap: john.smith@xyz-internet.com
Honey Pot: #123543
Timestamp: June 23, 2006 @ 4:55pm
Sometime later the same harvester could visit another honey pot:
Harvester IP: 192.168.0.123
Spam Trap: tjohnson@pillars.ragtag.com
Honey Pot: #543153
Timestamp: July 2, 2006 @ 3:22am
Since each spam trap handed out is unique and only handed out once, we know that if the particular address receives a message then it can reliably be tied back to the harvester to which it was handed. In this case, imagine our mail servers receive a message:
Connecting Server's IP: 192.168.100.5
Sending to: tjohnson@pillars.ragtag.com
Timestamp: August 20, 2006 @ 1:09pm
Now the sender's IP (192.168.100.5) can be associated with the harvester's IP (192.168.0.123). There are two more scenarios to consider. First, if another server sends a message to tjohnson@pillars.ragtag.com, then we can also associate that sending IP with the original harvester IP. Second, if another sending IP sends to the john.smith@xyz-internet.com address, then it too can be included in the profile of the spam servers associated with the harvester.
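To make the chain of events above concrete, here is a minimal sketch of the bookkeeping in Python. It is only an illustration, not the project's actual code: the names trap_to_harvester, harvester_to_servers and record_message are made up for this example, and the data is just the sample values from this post.

```python
from collections import defaultdict

# Spam trap address -> IP of the harvester it was handed to.
# Each address is handed out exactly once, so this mapping is one-to-one
# from address to harvester. (Hypothetical structure; sample data from the post.)
trap_to_harvester = {
    "john.smith@xyz-internet.com": "192.168.0.123",
    "tjohnson@pillars.ragtag.com": "192.168.0.123",
}

# Harvester IP -> set of sending server IPs seen delivering to its traps.
harvester_to_servers = defaultdict(set)


def record_message(sender_ip, recipient):
    """Tie a sending server to the harvester that was handed `recipient`.

    Returns the harvester IP, or None if the address was never handed out.
    """
    harvester_ip = trap_to_harvester.get(recipient)
    if harvester_ip is None:
        return None
    harvester_to_servers[harvester_ip].add(sender_ip)
    return harvester_ip


# The message from the example: 192.168.100.5 sends to tjohnson@pillars.ragtag.com
record_message("192.168.100.5", "tjohnson@pillars.ragtag.com")
```

Any later sender that hits either of the two addresses lands in the same set, which is how a harvester's profile of spam servers grows over time.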
One thing to remember is that a significant percentage of the spam sent today is relayed through "zombie" machines. Since more than one spammer may share the same zombie network, you'll often see multiple harvesters associated with a single spam server. In fact, the current ratio is about 50 spam servers for each harvester. The real number is probably SIGNIFICANTLY higher, and we're making some changes to the project so that we see a broader swath of the spam servers.
One thing you didn't ask about is dictionary attackers. They're the easiest of all to describe. Since we have a TON of domains pointing at our mail servers, we watch which email addresses those servers receive mail for. If the address sent to is not one that we have affirmatively handed out, then it is likely someone is just attacking the domain space with random usernames. If we notice a pattern over a certain period of time, then the IP gets listed as exhibiting dictionary-attack-like behavior.
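A rough sketch of that check in Python, in case it helps. The post above doesn't say what counts as "a pattern" or how long the "certain period of time" is, so WINDOW_SECONDS and THRESHOLD here are made-up numbers, and the function and variable names are hypothetical too.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # assumed observation window; the real value isn't stated
THRESHOLD = 5          # assumed hit count that counts as "a pattern"

# Addresses we have affirmatively handed out (sample values from this thread).
issued_addresses = {
    "john.smith@xyz-internet.com",
    "tjohnson@pillars.ragtag.com",
}

# Sender IP -> timestamps (in seconds) of mail sent to never-issued addresses.
unknown_hits = defaultdict(deque)


def observe(sender_ip, recipient, now):
    """Record one delivery attempt and report whether the sender currently
    shows dictionary-attack-like behavior."""
    hits = unknown_hits[sender_ip]
    if recipient not in issued_addresses:
        hits.append(now)
    # Forget hits that have fallen out of the observation window.
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    return len(hits) >= THRESHOLD
```

Mail to an address that was actually handed out never counts as a hit here; that traffic belongs to the harvester association described earlier.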
Hopefully that all makes sense and answers all your questions.