Author: M.Prince (23 Jan 05 6:23pm)
I think that's a great idea. We've talked with the author of DSpam, a well-regarded Bayesian filtering system, about potentially doing something like this. We're happy to work with other developers, and we will soon release our first corpus of the spam messages we've received so far to help anyone studying the problem.
My only concern is that it may invite abuse. For example, if it became common knowledge that we were doing this, you could imagine a spammer installing a honey pot, harvesting some of the trap addresses it publishes, and intentionally sending legitimate mail to them, effectively poisoning the well. If this concern could be overcome, then absolutely, I think it could be a great way to pre-seed Bayesian filters with a corpus of known spam.
The other potential problem is that we're receiving a ton of phishing messages at our honey pots. These phishing messages are generally very similar to legitimate bank messages, with just one or two links changed. I'd want to make sure the Bayesian filter was smart enough to pick up that subtle difference, and not just start blocking every message from PayPal.
But we're definitely open to exploring ways to work with filtering companies. If our data could benefit you in some way, please contact us and we'll see if we can help!
Thanks for the suggestion.