Author: M.Prince (8 Aug 06 12:52pm)
An interesting idea, and something we've thought about a little. Let me share two concerns: one specific, the other general.
The specific concern is simply technical. A lot of our users have donated MX records, and managing them is already a bit of a challenge because of the distributed way we've architected the system. For example, our mail servers are separate from, and sometimes quite remote from, our database and web servers. The mail servers never query the database. Instead, the database periodically contacts the mail servers and retrieves any incoming messages waiting in their queues. Because the mail servers have no way of knowing which domains were donated by which users, it would be hard to relay that information in anything close to real time.
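For anyone curious what that pull arrangement looks like, here is a rough sketch in Python. It is not our actual code: the `MailServer` and `Database` classes, their fields, and the `drain`/`poll` methods are all hypothetical names made up for illustration. The point it shows is that only the database holds the domain-to-donor mapping, so routing can only happen after the database has pulled the messages in.

```python
from dataclasses import dataclass, field


@dataclass
class MailServer:
    """Hypothetical remote mail server: it only queues incoming messages
    and knows nothing about which user donated which domain."""
    name: str
    queue: list = field(default_factory=list)  # (recipient, body) tuples

    def drain(self):
        # Hand over everything queued so far and start a fresh queue.
        msgs, self.queue = self.queue, []
        return msgs


@dataclass
class Database:
    """Hypothetical central database that owns the domain -> donor mapping."""
    donated_domains: dict                       # domain -> user id
    inbox: dict = field(default_factory=dict)   # user id -> list of bodies

    def poll(self, servers):
        # The database initiates every transfer; mail servers never call it.
        for server in servers:
            for rcpt, body in server.drain():
                domain = rcpt.rsplit("@", 1)[-1].lower()
                user = self.donated_domains.get(domain)
                # Messages for domains with no known donor are dropped here.
                if user is not None:
                    self.inbox.setdefault(user, []).append(body)
```

Because the lookup lives inside `poll`, a mail server can't attribute a message to a donor on its own; it would first need a copy of the mapping pushed out to it.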
We could do it with a slight delay via rsync or something similar. The database query would, however, be fairly expensive. That's not a big deal if it runs only a few times a day, but the closer to real time we make it, the harder we have to hammer the DB. I'll talk to our engineers about whether there's a more efficient way to do it, but it would add a lot of complexity to an already complex process.
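To put some rough numbers on that tradeoff, here is a small Python sketch. It assumes a naive approach where every sync re-reads the full list of donated domains and writes it out as a flat file that rsync could then copy to each mail server. The file format, the function names, and the 10,000-domain figure are all hypothetical, chosen only for illustration.

```python
def export_domain_map(rows):
    """Render (domain, user_id) rows as a sorted text file that a tool
    such as rsync could copy to each mail server (hypothetical format)."""
    lines = sorted(f"{domain.lower()} {user}" for domain, user in rows)
    return "\n".join(lines) + "\n"


def full_exports_per_day(interval_seconds):
    # How many times per day the export query would run.
    return 86400 // interval_seconds


def rows_scanned_per_day(n_domains, interval_seconds):
    # Each full export re-reads every donated domain, so DB load grows
    # as the sync interval shrinks.
    return n_domains * full_exports_per_day(interval_seconds)


# Hypothetical 10,000 donated domains:
print(rows_scanned_per_day(10_000, 6 * 3600))  # every 6 hours: 40000
print(rows_scanned_per_day(10_000, 60))        # every minute: 14400000
```

Going from a sync every six hours to one every minute multiplies the rows scanned by 360, which is the "hammering" described above. An incremental export of only changed rows would be one way to cut that down, at the cost of exactly the extra complexity mentioned.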
The more general concern is the same one I've expressed about traditional RBL systems elsewhere on this board. Because of the nature of the Project Honey Pot system, a spammer could poison the incoming data feed fairly easily. We have systems in place to catch this and mitigate the damage, and we may even catch it quite quickly, but there are all sorts of problems (legal, social, technical) associated with even temporarily blocking a legitimate mail server like Amazon.com's.
The harvester data is MUCH less fragile than the spam server data, which is why we're working on creating the HTTP:BL. I think this will be a unique service and will provide real benefit to the entire Project Honey Pot community. We're also going to share the URLs found in the spam messages we receive with the SURBL, assuming they want them after their evaluation. I like the SURBL because they hand-check the URLs they include in their system to ensure they aren't "www.amazon.com" or the like. This helps eliminate the false-positive problem I described above.
If you look through these boards, you'll see that a NUMBER of people have requested some sort of DNSBL since the inception of the Project. It is something we continually reevaluate. However, while I'm confident that, if we decided to, we could find ways to overcome the technical problems described above, I have yet to hear a convincing argument that we could overcome the more general ones. We're open to such arguments; we just haven't heard them yet.
Thanks for your input and help with the Project!