Open Discussion: Fighting Spam in the New World

Posted by Jeremy Voorhis Fri, 09 Jun 2006 04:39:00 GMT

Today, Joshua Harvey and I removed the Globalize project’s Trac. Why? Because we had no filtering mechanism in place, and human intervention is damn near worthless in the fight against spam.

Even before it became a problem for us, I had marveled at how little spam the official Ruby on Rails Trac received, considering it is a public forum that requires no identification to post. I had the chance to chat with Dan Peterson, the Ruby on Rails systems administrator, and he disclosed that he uses mod_security to filter the bulk of the illicit links. Kudos, Dan!

I have been surveying the current technology for good spam filtering. Rick Olson recently posted about his experiences with Akismet, which sounds very promising. The interesting part of the post is the comments: shortly after he announced Akismet integration for Mephisto (his attractive and streamlined bloggish CMS, or CMS-ish blog application), Akismet went kaput. Anecdotal evidence aside, let’s suppose Akismet is really great. Akismet requires that you pay a licensing fee for commercial use, and building my application to depend on a third-party service just for comment filtering goes against my sense of aesthetics. Any kind of application with a public form could use spam filtering, so it would make sense to make this a service that can be maintained system-wide, like a firewall.

Let’s open a discussion about what can be done to improve the situation at large. Feel free to share any links or advice about spam-fighting techniques in the comments. I have included some of mine below.

Links

Techniques

Web server

Of course, these techniques could all be implemented at the application level if you really wanted to.

  • Filtering the contents of POST requests with mod_security
  • Throttling POST requests from a particular IP
  • Denying POST requests from IPs listed on dsbl.org
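
As an illustration of the throttling idea done at the application level, here is a minimal sketch of a sliding-window limiter keyed by client IP. The class name `PostThrottle` and the default limit and window are assumptions made for the example, not taken from any particular server or framework.

```ruby
# Sliding-window throttle for POST requests, keyed by client IP.
# PostThrottle, its limit, and its window are hypothetical names and values.
class PostThrottle
  def initialize(limit: 5, window: 60)
    @limit = limit    # maximum POSTs allowed per window
    @window = window  # window length in seconds
    @hits = Hash.new { |hash, ip| hash[ip] = [] }
  end

  # Returns true if the request may proceed, false if it should be refused.
  # `now` is injectable so the behaviour can be checked without sleeping.
  def allow?(ip, now = Time.now.to_f)
    recent = @hits[ip]
    recent.reject! { |t| t <= now - @window } # forget hits outside the window
    return false if recent.size >= @limit

    recent << now
    true
  end
end
```

A controller could call `allow?` with the request's remote address before accepting a comment. The sketch keeps its state in one process's memory and is not thread-safe, so a real deployment would probably want shared storage and some cleanup of idle IPs.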

Application-level

  • Integration with Akismet or other comment-filtering services
  • Requiring comments be previewed
  • Captchas
  • Authentication required (no public comments)
    • could be less heavy-handed with OpenID, but that hasn’t been widely adopted yet
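
One possible way to enforce the "preview required" item is to hand out a token when the preview is rendered and demand it back on the final submit. The sketch below assumes an HMAC over the comment body; `PreviewGate` and `SECRET` are hypothetical names, and this is only one of several ways to do it.

```ruby
require "openssl"

# PreviewGate and SECRET are hypothetical; a real application would load the
# key from configuration rather than hard-coding it.
module PreviewGate
  SECRET = "replace-with-a-private-key"

  # Called when the preview page is rendered; the result goes in a hidden field.
  def self.token_for(body)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), SECRET, body)
  end

  # Called on final submission: accepts only text that was previewed unchanged.
  def self.previewed?(body, token)
    expected = token_for(body)
    return false unless token.is_a?(String) && token.bytesize == expected.bytesize

    # Compare every byte so the check does not stop at the first mismatch.
    expected.bytes.zip(token.bytes).reduce(0) { |diff, (a, b)| diff | (a ^ b) }.zero?
  end
end
```

This only forces a two-step submission. A bot that requests the preview first gets through, and the token never expires in this sketch, so it is a speed bump rather than a filter.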

Comments

  1. Sebastian Gräßl said about 9 hours later:

    The Opinion forum engine (hom.leetsoft.com/opinion) uses a very simple but powerful technique: the comment form is loaded only through Ajax. This helps because spambots only scan the current DOM and do not notice when a link they follow inserts a form.

  2. JV said 2 days later:

    Security through obscurity!

  3. Sven said 3 days later:

    I wrote down some thoughts on this a few weeks ago when I switched to Typo:

    http://www.artweb-design.de/articles/2006/05/20/a-strategy-against-blog-spam

    Basically: I think some kind of Bayesian filter/knowledge combined with (rather) small networks could make a significant difference.

  4. Carlos said 6 days later:

    I’m rolling my own blog in Ruby on Rails (as a learning experience) and I had it in my head to somehow use SpamAssassin to classify comments according to an SCL (Spam Confidence Level).

    Here’s my thought: when a comment is received (after checks for blacklists and so on), run it through Ruby’s interface to SpamAssassin (spamc – http://raa.ruby-lang.org/project/spamc/) to get an SCL number.

    You could then define actions according to the SCL number. For example:

    Anything with an SCL lower than 5 automatically gets posted, anything with an SCL of 5 or higher but below 8 gets put in a moderation queue, and anything with an SCL of 8 or higher automatically gets rejected.

    SpamAssassin is fairly robust when dealing with spam in email but I don’t know how it would react to spam in comments.

    Anyway, just my two cents.

  5. Sebastian Gräßl said 12 days later:

    Akismet (http://akismet.com/) looks great. Rick Olson integrated the service into Mephisto.
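
The score-to-action mapping proposed in comment 4 can be sketched as a small pure function. This is a minimal sketch that assumes a numeric score and boundaries at 5 and 8 (post below 5, moderate from 5 up to 8, reject at 8 or above). The constant and method names are hypothetical, and obtaining the score from SpamAssassin is deliberately left out.

```ruby
# Thresholds follow comment 4; the constant and method names are hypothetical.
MODERATE_AT = 5.0
REJECT_AT = 8.0

# Maps a spam score to :post, :moderate, or :reject.
def action_for(scl)
  if scl >= REJECT_AT
    :reject
  elsif scl >= MODERATE_AT
    :moderate
  else
    :post
  end
end
```

Keeping the thresholds as named constants makes them easy to tune once there is real data on how comment spam actually scores.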
