The War on Spam

Crafty spammers wage a high-tech game of cat-and-mouse, and about once every year I spend some time tweaking and updating filters to respond to new threats. While mortgage, pornography, and virus e-mails are now effectively caught, there’s been a recent surge of pharmaceutical junk mail that my filters haven’t quite been able to cope with. Until now.

SpamAssassin 3.0 includes new static rules, among them a rule set to detect common pill spam. It’s also a major upgrade from my previous 2.6.3 installation, and comes with a host of other features. Here’s an overview of the techniques I use to fight, and hopefully win, the war against spam. As a bonus, all of these tools integrate seamlessly with SpamAssassin:

  • Pyzor is a collaborative, networked system to detect and block spam using identifying digests of messages. It is written in Python.
  • Vipul’s Razor is another very popular distributed spam catalog and filter system. Detection is done with statistical and randomized signatures that efficiently spot mutating spam content.
  • Distributed Checksum Clearinghouse is a system of millions of users, tens of thousands of clients and more than 250 servers collecting and counting checksums related to more than 150 million mail messages on weekdays. The counts can be used by SMTP servers and mail user agents to detect and reject or filter spam or unsolicited bulk mail.
  • The Sender Policy Framework is rapidly gaining momentum with large e-mail providers such as America Online and Google. It should help minimize joe-jobs that forge addresses at large providers.
  • Spamhaus provides the SBL, a realtime database of IP addresses of verified spam sources (including spammers, spam gangs and spam support services), maintained by the Spamhaus Project team and supplied as a free service to help email administrators better manage incoming email streams.
  • SpamAssassin 3.0 also includes the URIDNSBL plugin. While e-mail headers can be forged, spammers can’t hide the URLs of the web sites that they are promoting. The URIDNSBL plugin checks URLs in message bodies against online blacklists of spammer-operated web sites. It’s very effective, and it’s nailing just about any spam that comes my way.
  • Hashcash detects ham, not spam. A hashcash stamp constitutes a proof-of-work which takes a parameterizable amount of work to compute for the sender. The recipient can verify received hashcash stamps efficiently. It takes about two to three seconds of processing time per e-mail to compute a valid hashcash stamp.
  • And finally, one of my personal favorites, though it’s been less and less effective against spam each year: the Bayesian classifier. It tries to identify spam by looking at words or short character sequences that are commonly found in spam or ham. The classifier must be personally trained for maximum effectiveness.
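
To make the last item a little more concrete, here is a rough sketch of how a Bayesian filter learns from hand-sorted mail. It is a plain naive Bayes classifier with add-one smoothing, not SpamAssassin’s actual implementation, and the class name, method names, and sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Toy spam/ham classifier (hypothetical; not SpamAssassin's algorithm)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the tokens of one hand-labelled message."""
        self.token_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability (0.0 to 1.0) that text is spam."""
        vocabulary = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Prior: smoothed fraction of training messages with this label.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            for token in tokenize(text):
                # Add-one smoothing keeps unseen tokens from zeroing the product.
                score += math.log(
                    (counts[token] + 1) / (total_tokens + len(vocabulary) + 1)
                )
            log_score[label] = score
        # Turn the two log scores into a normalized spam probability.
        diff = log_score["ham"] - log_score["spam"]
        if diff > 700:  # avoid overflow in math.exp for very ham-like text
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))


# Train on a few hand-sorted sample messages, then score new ones.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills online, no prescription needed", "spam")
spam_filter.train("discount pharmacy: buy pills now", "spam")
spam_filter.train("meeting notes attached, see you on monday", "ham")
spam_filter.train("can you review my patch before lunch", "ham")

print(spam_filter.spam_probability("buy cheap pills"))         # well above 0.5
print(spam_filter.spam_probability("patch review on monday"))  # well below 0.5
```

The scores depend entirely on the messages the filter was trained with, which is why a classifier trained on your own mail tends to do better than a generic one.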

While I doubt that spam will ever go away, as long as it doesn’t land in my inbox, that’s good enough for me. Bring it on.

3 thoughts on “The War on Spam”

  1. Funny touch with the bring it on link. I saw it and thought, ‘What could that link to?’

  2. I briefly thought about making it link to my e-mail address, but that’s already plastered all over my site. Do you have a weblog?
