Message Board

Bugs & Development

Comment spam

Author: L.Veltkamp (30 Aug 06 11:54am)

I'm wondering what your plans for dealing with comment spam are.

I've started blocking the IPs of comment spammers. I plan on cleaning out the list periodically, but some of them are going to stay there. At the moment, I'm keeping track of IPs that hit me a lot and unscientifically searching Google for them to see if I can turn them up on other sites that publicly display IPs with comments. That's enough information when an IP has hit my site a lot, such as one that appeared on my site 82 times in two months. However, I'd like a better way of checking.
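
A less manual way to spot the IPs that hit a site repeatedly is to count them straight from the server's access log. Below is a minimal sketch, assuming Apache's Common or Combined Log Format (client IP as the first field) and assuming that comment submissions show up as POST requests; the function names and the threshold value are hypothetical.

```python
from collections import Counter

def count_posts_by_ip(log_lines):
    """Count POST requests per client IP in Common/Combined Log Format lines."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        # parts[0] is the client IP; parts[5] is the quoted method, e.g. '"POST'
        if len(parts) > 5 and parts[5] == '"POST':
            counts[parts[0]] += 1
    return counts

def frequent_posters(log_lines, threshold=50):
    """Return (ip, count) pairs at or above a hypothetical threshold, busiest first."""
    return [(ip, n) for ip, n in count_posts_by_ip(log_lines).most_common()
            if n >= threshold]
```

The resulting list is only a shortlist for the kind of manual checking described above; a high count alone does not prove an IP is a spammer.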

At the moment, I've got a honeypot link in my 403 error page just to see what would happen, and it looks like even the worst comment spammers don't do any harvesting. When I look up the IP that spammed 82 times, I only get the message saying it might be a harmless web spider, which I know it isn't, so I'm wondering how that's going to work.

Re: Comment spam

Author: M.Prince (30 Aug 06 2:49pm)

Our plan, which is still in the formation stages, is to start putting forms on honey pots and watch who submits to them. Well-behaved web robots don't submit forms. If we see submissions, we can record not only who is doing the submitting (by their IP) but also what they submit (e.g., what domain they're trying to promote). Just as sending to a spam trap address now labels you a harvester, submitting to a comment spam trap form will soon label you a comment spammer.
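
To make "record who submits and what they promote" concrete, here is a sketch of what a trap-form handler might store. It is not Project Honey Pot's implementation: the record layout, the function names, and the simple URL-matching regex are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlparse

# Deliberately simple http(s) URL matcher for free-text form fields.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def promoted_domains(fields):
    """Collect the (lowercased) host names linked from every submitted field."""
    domains = set()
    for value in fields.values():
        for url in URL_RE.findall(value):
            host = urlparse(url).hostname  # .hostname is already lowercased
            if host:
                domains.add(host)
    return domains

def record_trap_submission(ip, fields):
    """Build a record for a submission to a trap form (only robots should submit it)."""
    return {
        "ip": ip,
        "time": datetime.now(timezone.utc).isoformat(),
        "domains": sorted(promoted_domains(fields)),
        "fields": dict(fields),
    }
```

Keeping both the IP and the promoted domains means a spammer who rotates IPs can still be tied together by the sites they advertise.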

This information will get fed back into the HTTP:BL, which we hope to have up and running in the next few months. We're in the process of writing an Apache module that will allow you to query a DNS server and block known-bad robots, be they comment spammers, harvesters, or whatever. (Dave's in the other room right now typing away on the C code for it.) As you very correctly note, we have a lot of "suspect" IPs that are stumbling across honey pots and doing things like comment spamming, but not harvesting, so we cannot verify they are "bad" yet. The plan is to try to provide as much information on these IPs as possible and allow you, the website admin, to make the choice about who you want to let onto your site.
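
DNS-based blocklists generally work by reversing the octets of the visitor's IPv4 address and looking the result up as a name under the list's zone; an answer in 127.x.y.z form carries the verdict, and "no such name" means the IP is not listed. The sketch below shows the client side of such a lookup. The zone name, the access-key prefix, and the meaning of each answer octet (days, score, and a bitmask of visitor types) are assumptions for illustration, not details given in this post.

```python
import socket

# Hypothetical values for illustration only.
ZONE = "dnsbl.example.org"
ACCESS_KEY = "abcdefghijkl"

# Assumed meaning of the bits in the last octet of the answer.
TYPE_BITS = {1: "suspicious", 2: "harvester", 4: "comment spammer"}

def query_name(ip, key=ACCESS_KEY, zone=ZONE):
    """Build the DNS name to look up: key, reversed IPv4 octets, zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join([key] + octets[::-1] + [zone])

def decode_answer(answer):
    """Turn an assumed 127.days.score.type answer into a dict."""
    first, days, score, kind = (int(o) for o in answer.split("."))
    if first != 127:
        raise ValueError("unexpected answer: " + answer)
    types = [name for bit, name in TYPE_BITS.items() if kind & bit]
    return {"days": days, "score": score, "types": types}

def lookup(ip):
    """Return the decoded verdict, or None if the name does not resolve (not listed)."""
    try:
        return decode_answer(socket.gethostbyname(query_name(ip)))
    except socket.gaierror:
        return None
```

Returning the decoded details rather than a plain yes/no leaves the blocking decision to the site admin, which matches the intent described above.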

I'm not sure whether the module/DNS infrastructure will be done before we start tracking comment spammers. However, we hope that at some point this fall we will have a fairly robust tool for blocking bad guys, be they harvesters, comment spammers, or other misbehaving robots. As more people install honey pots, we hope the tool will only grow more robust.

The current plan is to release the module under the GPL, so hopefully people will improve on it and port it to other platforms (like IIS). Again, all of this is still in the formation stages. If you have suggestions, please let us know.

Re: Comment spam

Author: L.Veltkamp (31 Aug 06 12:43am)

Sounds good. Information is mainly what I'm after. A lot of scripts have functions that will mark whether a comment is spam, but the comments still sit around waiting to be approved or denied. I understand that it's to keep legitimate comments from winding up deleted, but if I'm flooded with stuff (flood prevention doesn't seem to do much when the spam comes from multiple IPs), I actually wind up more likely to delete legitimate comments along with the spam. No fix is going to clear up the problem 100%, but that's why I like having options and actually knowing what they are.

One of the things I'd like to see is sample comments in addition to just IPs. The usual fake pharmacy and casino ads you see in e-mail spam still show up, but some things seem to be exclusive to comment spam. For example, there's a lot of comment spam related to World of Warcraft that mostly originates in China. It winds up not only in blogs but in forum threads as well. But have I seen it in e-mail spam? Never.

I'm certainly not trying to rush anything. I've just been thinking.

Post Edited (1 Sep 06 11:45pm)

Re: Comment spam

Author: H.Nienhuys (13 Oct 06 12:49pm)

My experience is that comment spammers only target known weblog and forum software rather than using artificial intelligence to work out what data to enter into the form fields. You would have to call your URL something like viewtopic.php, or put the string 'powered by WordPress/phpBB/etcetera' on the page and make sure it can be found with a search engine.
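
If that observation holds, filtering logged requests by well-known script names shows which hits are aimed at specific packages. Here is a sketch. The table of names is an illustrative assumption and far from exhaustive (viewtopic.php and posting.php are phpBB script names; wp-comments-post.php is WordPress's comment handler), and the function name is hypothetical.

```python
from urllib.parse import urlparse

# Illustrative, not exhaustive: script names used by well-known packages.
KNOWN_TARGETS = {
    "viewtopic.php": "phpBB",
    "posting.php": "phpBB",
    "wp-comments-post.php": "WordPress",
}

def targeted_software(url):
    """Guess which package a request is aimed at from its last path segment."""
    path = urlparse(url).path
    script = path.rstrip("/").rsplit("/", 1)[-1].lower()
    return KNOWN_TARGETS.get(script)
```

A site that runs none of these packages but still sees such requests is probably being probed by software that works from a list of known targets.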

I have used custom-made comment pages without any spam protection and never received any comment spam through them. But once I started using one of the lesser known PHP forum packages, I started receiving loads of comment spam within a few months.

Re: Comment spam

Author: A.Timmer (17 Oct 06 2:40am)

I'm having a small argument with a comment spammer who is spamming my blog at the moment. He's hard to track because he's working through proxies, leaving all kinds of different IPs from many countries in my log. This makes it hard to block him through .htaccess, but I know it's the same one because he always follows the same route through my site.
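
The "same route" observation can be turned into a rough fingerprint: group IPs by the exact sequence of pages they request. Below is a sketch, assuming the log has already been reduced to time-ordered (ip, path) pairs. Requiring an exact match is a simplifying assumption, since a bot may vary its path slightly, and the function names are hypothetical.

```python
from collections import defaultdict

def route_fingerprints(requests):
    """Map each IP to the ordered tuple of paths it requested.

    `requests` is an iterable of (ip, path) pairs in time order.
    """
    routes = defaultdict(list)
    for ip, path in requests:
        routes[ip].append(path)
    return {ip: tuple(paths) for ip, paths in routes.items()}

def ips_sharing_a_route(requests, min_ips=2):
    """Group IPs whose request sequences are identical; keep groups of min_ips or more."""
    groups = defaultdict(list)
    for ip, route in route_fingerprints(requests).items():
        groups[route].append(ip)
    return {route: ips for route, ips in groups.items() if len(ips) >= min_ips}
```

Each resulting group is a candidate for "one bot behind many IPs", which IP-based .htaccess rules alone cannot see.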

Now this is a very dumb (and so very funny) bot.

At first he tries to spam an example form on one of my pages. Unfortunately for him, this form has a "blank" submission: the submit button doesn't submit anything (hehehe). The form is only there to explain some things about forms to my visitors.

After trying this, he heads for my blog, where he started spamming the comments a couple of weeks ago. It wasn't that much, and only on the older posts, so I let him go for a few days to see what he was doing.
Then I removed the spam and moved the submission of the comment form into external JavaScript. He doesn't seem to understand that, because he's still visiting a couple of times a day, but the spam has stopped.

In the meantime, I made a nice bot trap and a new comment form for my blog, leaving the old one in place (in a hidden div) and replacing its JavaScript submission with the original one. This time, when he tries to submit, he will be redirected to the bot trap and block his own IP.
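
The hidden-form trap comes down to one rule: a human never sees the old form, so anything submitted to it is treated as a bot. Here is a sketch of the server side, assuming the block list is a file of Apache 2.2-style `Deny from` lines included from an existing Order/Allow/Deny setup. The function name, the file layout, and the /bot-trap.html path are hypothetical.

```python
import ipaddress

def handle_trap_post(ip, blocklist_path):
    """Called when the hidden (trap) comment form is submitted.

    Humans never see the hidden form, so any submission is treated as a bot:
    append a deny rule for the sender's IP and return the page to redirect to.
    """
    # Validate first so a malformed value can never be written into server config.
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    with open(blocklist_path, "a", encoding="ascii") as f:
        f.write(f"Deny from {addr}\n")
    return "/bot-trap.html"  # hypothetical trap page
```

As the post notes, this does little against a bot that rotates IPs on every visit, but it costs nothing for the submissions it does catch.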

I'll just wait and see...

Re: Comment spam

Author: L.Veltkamp (30 Apr 07 7:59pm)

Thanks for getting something going! Even after reading the updates, though, I'm a bit curious as to how it works. As it is, I average at least 10 comment spam messages a day (typically more), scattered across various scripts. Here's hoping some of them take the bait. It seems all I get are comment spammers. My stats show 482 traps issued but just one harvester. *blinks*

Re: Comment spam

Author: M.Prince (30 Apr 07 9:14pm)

We see some comment spammers, but not all of them. We see the least discriminating ones. So, for example, if a comment spammer is looking only for certain pages created by certain blogging software, they are unlikely to stumble across our traps. If they are looking for any possible form, then they will eventually submit to us.

The system generally works by putting trap forms on existing honey pots. If these forms are filled out and submitted, we record all the information you'd suspect. Again, this works great if 1) the comment spammers aren't looking for a particular blog page and format, and 2) there are sufficient links from your website that the comment spammers run across the honey pots.

We're looking for additional sources of data to beef up our comment spammer resources. While I think we see a good portion of the email harvester universe, we need help to get more of a sense of the comment spammers.

Re: Comment spam

Author: L.Veltkamp (1 May 07 12:48pm)

In my case, I think they're just looking for any sort of a form especially since they mess up the formatting most of the time.

What I'm trying now is redirecting from a form they like to attack. I'd been using it as my own sort of "honeypot", since only spammers seemed interested in commenting, and I'd send the addresses they tried to advertise to URIBL. But I've got enough to sift through without that script. The redirect should keep them busy for a short while.

Post Edited (1 May 07 9:27pm)

Re: Comment spam

Author: M.Nordhoff (2 May 07 8:46am)

Some (most? all?) web servers support requests to /somescript.php/whatever/you/want, so you could try putting hidden links to something like /honeypot.php/forum/viewtopic.php and putting something phpBB-like on that page. The VeriSpider could check whether each website supports it.
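
This trick relies on the server passing everything after the script name to the script as "path info" (in PHP that arrives as `$_SERVER['PATH_INFO']`; on Apache, whether such URLs are accepted also depends on the AcceptPathInfo setting). Here is a sketch of the same idea as a Python WSGI app, assuming it is mounted at /honeypot.php so that /honeypot.php/forum/viewtopic.php arrives with PATH_INFO set to /forum/viewtopic.php. The fake page content and the app name are hypothetical, and handling of the form's POST is left out.

```python
# Fake page that looks vaguely like a phpBB topic view, with a trap form.
FAKE_TOPIC = b"""<html><body>
<p>Powered by phpBB</p>
<form method="post">
<textarea name="message"></textarea><input type="submit" value="Submit">
</form>
</body></html>"""

def honeypot_app(environ, start_response):
    """WSGI app assumed to be mounted at /honeypot.php.

    The server puts whatever follows the script name in PATH_INFO, so a
    request for /honeypot.php/forum/viewtopic.php arrives with PATH_INFO
    '/forum/viewtopic.php'.
    """
    extra = environ.get("PATH_INFO", "")
    if extra.endswith("viewtopic.php"):
        start_response("200 OK", [("Content-Type", "text/html")])
        return [FAKE_TOPIC]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

Because one script answers for any path under it, a honey pot could present several bait URLs that spammers look for without creating a real file for each.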

Re: Comment spam

Author: L.Veltkamp (2 May 07 10:57am)

Hm...that gave me another idea. On most sites, I use randomly generated link text. I wonder if the spiders would favor "Post Comment." It sounds like a stupid idea but...well...we're dealing with some stupid software.

Re: Comment spam

Author: L.Veltkamp (14 May 07 9:58am)

So far, not much. 521 traps issued, one harvester, no comment spammers.

Mind boggling!

Edit: I finally caught one! PARTY!

Edit Again: Now I'm sad again. About 600 traps. One comment spammer, one harvester. x.x

Post Edited (25 May 07 11:33pm)