DSPAM Notes

I've been using SpamAssassin for the last three years, and about six months ago I decided to switch to DSPAM. There were several reasons for the switch. First, SpamAssassin has a huge memory footprint, by my standards. My email is processed on a Linode host, and the "machine" only has 128MB to be shared among Apache, OpenSSH, Postfix, and all other services. SpamAssassin hogged memory and forced the machine into swap thrashing. Second, SpamAssassin's traditional detection methods are outdated, and it seems to have incorporated statistical learning methods as an afterthought. DSPAM focuses on the more successful techniques. Third, SpamAssassin has no simple retraining mechanism for incorrect classification, while DSPAM has a web interface and also accepts bounced messages.

Unfortunately, DSPAM is a beast to install. I estimate that it took me about 10 hours to get things basically working. The problem is that DSPAM's configurability is far more advanced than its documentation. Potential users of DSPAM are already using Sendmail, Postfix, Exim, qmail, or some other mail server. They may wish spam filtering to occur at any of a number of stages in the mail delivery process. They may or may not wish to use the web management interface, and they may or may not choose to allow retraining by bouncing or retraining with command-line tools. Each combination of these configuration choices produces a complex system with a variety of challenging subtleties, particularly involving file ownership and permissions.

My Setup

I installed DSPAM on a CentOS box (almost identical to Red Hat Enterprise Linux 4). The MTA is Postfix. Procmail is used for the last stage of delivery. DSPAM's web interface is available, and users can send a message to "spam@yourdomain.com" to retrain it as spam (as opposed to DSPAM's default of spam-username@yourdomain.com, which is impractical).

The following notes describe what I had to do to get this working. Most of the details I've included are important. Usually it should be clear why (at least to experienced system administrators). I've tried to be more descriptive when discussing the more esoteric details (like anything involving suexec). If the "why" isn't clear, please send me an email, so I can fix it.

I gathered a lot of information from the DSPAM Wiki. Go there for more information on setting things up if your mail server configuration and requirements are considerably different from mine.

Installation of DSPAM

Compile DSPAM
Download and unpack the DSPAM distribution (I got dspam-3.6.2, for the record). I couldn't find an RPM, and it was easy to compile anyway. I ran "./configure --sysconfdir=/etc --with-dspam-home=/var/dspam". Some other potentially useful options include "--enable-large-scale" and "--enable-clamav". After that, "make" and "make install" work as expected.
Create a "dspam" user and a "dspam" group.
If you're going to be setting up the web management system (which I am pleased with), the dspam user must have a UID of at least 500, and the dspam group must have a GID of at least 100. These minimums are compiled into suexec, which refuses to run CGI scripts as any user or group below them, and I don't think you want to recompile suexec. If you aren't using RHEL, and you think your particular suexec might have different requirements, just run "suexec -V" and look at what AP_UID_MIN and AP_GID_MIN are set to.
Create /var/dspam.
The /var/dspam directory holds user quarantines, learning files, log files, etc. It should be owned by dspam:dspam (user:group) and should be chmodded to 771, 775, 2771, or similar. You might as well create /var/dspam/data with the same ownership and permissions.
Set ownership and permissions for dspam.
/usr/local/bin/dspam (or wherever you put it) needs to be owned by root:dspam, and its permissions should be 2755 (setgid dspam).
Make /etc/dspam.conf.
Unless you have a good reason for another storage driver, set "StorageDriver /usr/local/lib/libhash_drv.so". You'll want to set "HashAutoExtend on" for sure.
Make /var/dspam/txt.
Edit firstrun.txt, firstspam.txt, and quarantinefull.txt to suit your needs (found in txt/ in the DSPAM distribution), and copy them to /var/dspam/txt/.
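
The steps above can be condensed into a short shell sketch. It assumes the default /usr/local prefix and that it is run from the unpacked DSPAM source directory after "make install". The "run" helper and the APPLY switch are my own illustration, not part of DSPAM: by default each command is only printed, so the list can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch of the post-build installation steps. By default every command is
# only printed; run as root with APPLY=1 to execute them for real.
# The "run" helper and the APPLY switch are illustrative, not part of DSPAM.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

install_dspam_layout() {
    # Dedicated user and group; compare their IDs with suexec's minimums.
    run groupadd dspam
    run useradd -g dspam dspam
    run id dspam
    run suexec -V

    # Data directories, owned by dspam:dspam, setgid and group-writable.
    run mkdir -p /var/dspam/data /var/dspam/txt
    run chown -R dspam:dspam /var/dspam
    run chmod 2771 /var/dspam /var/dspam/data

    # The dspam binary is owned by root and runs setgid dspam.
    run chown root:dspam /usr/local/bin/dspam
    run chmod 2755 /usr/local/bin/dspam

    # Message templates (edit them first) from the source tree's txt/ directory.
    run cp txt/firstrun.txt txt/firstspam.txt txt/quarantinefull.txt /var/dspam/txt/
}

install_dspam_layout
```

Run as written, the script only echoes the commands. If the ID reported by "id dspam" falls below the AP_UID_MIN or AP_GID_MIN values reported by "suexec -V", recreate the user or group with a higher ID before going on to the web interface.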

Configuring Delivery (Postfix and DSPAM)

Make Procmail setuid root.
DSPAM needs to call Procmail for delivery. If Procmail is not setuid root, it can't switch to the recipient's user when DSPAM invokes it. The ownership for procmail should be root:mail. Permissions should be 4755.
Set Procmail as DSPAM's delivery agent.
Put 'TrustedDeliveryAgent "/usr/bin/procmail"' and 'UntrustedDeliveryAgent "/usr/bin/procmail -d %u"' into dspam.conf. While you're there, set 'Preference "spamAction=quarantine"' and any other defaults you want.
Set DSPAM as Postfix's delivery agent.
Put "mailbox_command = /usr/local/bin/dspam --deliver=innocent --user $USER -- -d %u" into Postfix's main.cf.
Set up the retraining addresses.
Add 'transport_maps = hash:/etc/postfix/transport' and 'local_recipient_maps = proxy:unix:passwd.byname $alias_maps $transport_maps' to Postfix's main.cf. Create /etc/postfix/transport (with lines 'spam@yourdomain.com dspam-retrain:spam' and 'ham@yourdomain.com dspam-retrain:innocent'), and run 'postmap /etc/postfix/transport' so that Postfix can read it as a hash map. Create the dspam-retrain transport method by adding the following to Postfix's master.cf:
dspam-retrain   unix    -       n       n       -       10      pipe
  flags=Ru user=dspam argv=/usr/local/bin/dspam-retrain $nexthop $sender $recipient
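
As a rough sketch of the retraining setup, the transport map could be prepared like this. The script only writes a scratch file (the TRANSPORT variable and its default file name are placeholders of mine); the commands that touch the live Postfix configuration are left as comments because they need root and should be checked against your existing main.cf first.

```shell
#!/bin/sh
# Write the retraining transport map to a scratch file for review.
# TRANSPORT is a placeholder path; the live map is /etc/postfix/transport.
TRANSPORT="${TRANSPORT:-./transport.example}"

cat > "$TRANSPORT" <<'EOF'
spam@yourdomain.com    dspam-retrain:spam
ham@yourdomain.com     dspam-retrain:innocent
EOF

# Then, as root (merge by hand if you already have a transport map):
#   cp ./transport.example /etc/postfix/transport
#   postmap /etc/postfix/transport   # builds the .db file that hash: maps read
#   postconf -e 'transport_maps = hash:/etc/postfix/transport'
#   postconf -e 'local_recipient_maps = proxy:unix:passwd.byname $alias_maps $transport_maps'
#   postfix reload
```

Remember to rerun postmap every time the transport file changes; Postfix reads the compiled .db file, not the text file.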

Setting up the Web Interface

Put the CGI files into /var/www/dspam.
The CGI files must go somewhere under /var/www, and /var/www/dspam works well for most people. If you try to put it somewhere else, suexec will refuse to run. If you aren't on RHEL, run "suexec -V" and look for AP_DOC_ROOT to find the directory under which your dspam directory will have to be. Note that suexec looks for the full path (symlinks can't be used to avoid the problem).
Install mod_auth_something_that_works_for_you.
Install mod_auth_shadow, mod_auth_imap, or some other Apache authentication module that will allow your users to authenticate to DSPAM's CGI interface.
Add /var/www/dspam to httpd.conf.
Make sure to set "SuExecUserGroup dspam dspam". This directive is only accepted at the server-config or VirtualHost level, not inside a Directory block, so be careful about what else it affects. Under "<Directory /var/www/dspam>", you'll want to have "DirectoryIndex dspam.cgi", "Options ExecCGI", "AddHandler cgi-script .cgi", and the particular authentication options that your mod_auth_whatever requires (including "Require valid-user").
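
Put together, the Apache side might look roughly like the fragment below. This is a sketch, not a tested configuration: the ServerName and the AuthName text are made up, the VHOST_CONF file name is a placeholder, and the directives that select your authentication backend are deliberately left out because they depend on the module you installed. The script only writes the fragment to a scratch file so you can review it before merging it into httpd.conf.

```shell
#!/bin/sh
# Write an example VirtualHost fragment for the DSPAM web interface.
# VHOST_CONF is a placeholder path; copy the result into httpd.conf by hand.
VHOST_CONF="${VHOST_CONF:-./dspam-vhost.conf.example}"

cat > "$VHOST_CONF" <<'EOF'
<VirtualHost *:80>
    # Hypothetical host name; use your own.
    ServerName dspam.yourdomain.com
    DocumentRoot /var/www/dspam

    # CGI scripts in this virtual host run as dspam:dspam via suexec.
    # suexec also expects the directory and the CGI files to be owned
    # by that user and group.
    SuexecUserGroup dspam dspam

    <Directory /var/www/dspam>
        Options ExecCGI
        AddHandler cgi-script .cgi
        DirectoryIndex dspam.cgi

        AuthType Basic
        AuthName "DSPAM"
        # Backend directives for mod_auth_shadow, mod_auth_imap, etc. go here.
        Require valid-user
    </Directory>
</VirtualHost>
EOF
```

Basic authentication sends passwords almost in the clear, so you may prefer to put the same block inside an SSL virtual host instead.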

Personal Procmail Rules for DSPAM

While I think the web quarantine is great, I personally prefer to use Mutt for it. I use Procmail rules to separate the wheat from the tares (the ham from the spam). For this to work, DSPAM needs to be told to mark spam messages without quarantining them. Just add "spamAction=deliver" to your personal dspam prefs file.

The following is a simple but incomplete rule for filtering spam. It will put all spam into a mailbox called "spam" (in maildir format, in this case, because of the trailing slash).

:0
* ^X-DSPAM-Result: Spam
spam/

Unfortunately, not all spam is created equal. I personally prefer to dump obvious spam to a separate mailbox or to /dev/null. The following rule delivers spam with a DSPAM confidence level of .85 or above to the "superspam" box and sends all other spam to the "spam" box (both in maildir format).

:0
* ^X-DSPAM-Result: Spam
{
        :0
        * ^X-DSPAM-Confidence: 0\.(9|8[5-9])
        superspam/

        :0
        spam/
}

If you want the cutoff for superspam to be .8 instead of .85, the end of the regex would become "0\.[89]". For .7, it would be "0\.[7-9]". To further customize these rules, I recommend consulting the "procmailrc" and "egrep" man pages.
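
Before trusting a modified cutoff, it can help to try the pattern against a few sample headers. The sketch below uses "grep -E", whose extended regex syntax is close enough to Procmail's for these patterns; the "matches" helper and the confidence values are made up for illustration.

```shell
#!/bin/sh
# Check which confidence values each "superspam" pattern would catch.
# grep -E stands in for procmail's egrep-style matching; -i mirrors
# procmail's default case-insensitive comparison.
matches() {
    # $1 = regex, $2 = header line; succeeds if the header matches.
    printf '%s\n' "$2" | grep -Eiq "$1"
}

RE_85='^X-DSPAM-Confidence: 0\.(9|8[5-9])'   # cutoff .85
RE_80='^X-DSPAM-Confidence: 0\.[89]'         # cutoff .8
RE_70='^X-DSPAM-Confidence: 0\.[7-9]'        # cutoff .7

# Made-up sample values on either side of each cutoff.
for conf in 0.9731 0.8500 0.8499 0.7200 0.6012; do
    header="X-DSPAM-Confidence: $conf"
    for re in "$RE_85" "$RE_80" "$RE_70"; do
        if matches "$re" "$header"; then
            verdict=superspam
        else
            verdict=spam
        fi
        printf '%-8s %-45s %s\n' "$conf" "$re" "$verdict"
    done
done
```

Note that all three patterns require the value to start with "0.", so a header reporting a confidence of 1.0000 would fall through to the plain "spam" box. If your DSPAM produces such values, extending the alternation (for example "(1\.|0\.(9|8[5-9]))") should cover them.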