
The basics

Email is a ubiquitous fixture of computing. Unlike other widely deployed digital systems (the web, for instance), it is extremely difficult to self-host. This is largely due to two facts:

  1. it has a very long and complex history
  2. it has a few entrenched stakeholders with little incentive to welcome newcomers

SMTP (the main protocol by which email servers exchange mail with one another) was designed for a simpler time and has been extended piecemeal as new needs emerged. It handles several encodings so that different kinds of computer systems can interoperate, it authenticates remote clients using a range of schemes, and it validates messages in an attempt to reduce the amount of spam (unsolicited messages) that gets received or forwarded. A minor misconfiguration can leave a mail server unable to send or receive messages, and getting everything right takes considerable expertise.

To further complicate matters, most of the world's email is sent and received by a small number of companies, most of which are very happy with their privileged positions. Even when an email administrator does everything correctly, these providers may refuse to interoperate with independent servers, often for unclear reasons. Even when messages are delivered, those from unfamiliar servers are often routed into their recipients' spam folders. Many admins simply give up and use one of the big email providers, even if they self-host every other technology they rely on.

Haraka to the rescue

Fortunately, it is much easier to receive email than to send it, and there are still many practical applications for an SMTP server that only listens. I use Haraka, an SMTP server written for Node.js (server-side JavaScript), for exactly that purpose.

Haraka is relatively easy to set up. Its plugin system is simple to extend, and because it's written in JavaScript, a very large number of programmers can inspect and modify it. I adapted some examples from its documentation and wrote the following plugin, which receives mail and writes each message directly to the filesystem in the Maildir format, where every message is stored in its own file.

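var fs     = require('fs');
var crypto = require('crypto');

// Random hex string giving each stored message a unique filename
var rand = function () {
    return crypto.randomBytes(16).toString('hex');
};

// OK and DENYSOFT are return codes that Haraka provides to plugins
exports.hook_queue = function (next, connection) {
    var dest = '/home/ansuz/Mail/new/mail-' + rand() + '.eml';
    // 'wx' fails instead of appending if the file already exists
    var ws = fs.createWriteStream(dest, { flags: 'wx' });
    var done = false;
    ws.once('error', function () {
        done = true;
        next(DENYSOFT); // temporary failure: the sender should retry later
    });
    ws.once('close', function () {
        if (done) return; // 'close' also fires after an error
        done = true;
        next(OK); // the message is on disk: accept it
    });
    connection.transaction.message_stream.pipe(ws);
};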

Since the Maildir format is just files and folders in a particular structure, it's very simple to check mail with a variety of tools. I use Mutt to read and delete mail from a shell on the server. A separate program could authenticate remote users and serve the Maildir over the internet via IMAP or POP3, the protocols that email clients like Thunderbird use, but I haven't bothered with this (yet).

Figure: Checking mail with Mutt (screenshot of the Mutt command-line email client)
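Because a Maildir is just directories, a few lines of code are enough to summarise one. This sketch counts messages, treating everything in new/ as unseen and a message in cur/ as seen once its flags (the characters after ":2," in the filename) include "S", which is how Maildir-aware clients such as Mutt typically record that a message has been read. `countMail` and `maildirFlags` are hypothetical helpers.

```javascript
const fs = require('fs');
const path = require('path');

// Flags are the characters after ":2," in a cur/ filename, e.g. "...:2,RS" -> "RS".
function maildirFlags(name) {
  const i = name.lastIndexOf(':2,');
  return i === -1 ? '' : name.slice(i + 3);
}

// Count messages in a Maildir. Everything in new/ is unseen; a message in
// cur/ is unseen unless its flags contain "S". Dotfiles are ignored.
function countMail(maildir) {
  const list = (sub) =>
    fs.readdirSync(path.join(maildir, sub)).filter((n) => !n.startsWith('.'));
  const fresh = list('new');
  const cur = list('cur');
  const seen = cur.filter((n) => maildirFlags(n).includes('S')).length;
  const total = fresh.length + cur.length;
  return { total, unseen: total - seen };
}
```

Pointed at a mailbox such as /home/ansuz/Mail, `countMail` returns an object of the form { total, unseen }, which is easy to wire into a shell prompt or status bar.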

Tracking which services leak emails

Like most people, I do have email addresses with the big providers, but I still use this server for a number of useful tricks. The first is that it's quite easy to accept mail for any domain and address. Gmail and a number of other providers offer a limited version of this: they also deliver mail sent to variations of your address that carry an extra tag after a "+". For example, if you normally use bobby@gmail.com, you can sign up for an online service with bobby+service@gmail.com. If you start receiving a lot of mail at that address, you'll be able to tell who shared, sold, or otherwise leaked it. Smart (and disrespectful) service providers may realize that you're using this feature and strip the tag to target your primary address, as shown below.

> "bobby+service@gmail.com".replace(/\+[^@]*@/, '@')
'bobby@gmail.com'
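Going the other way, it takes only a few lines to pull the tag back out of the recipient addresses of incoming mail and count which tags keep receiving messages. `parseSubaddress` and `tallyTags` are hypothetical names, and the sketch assumes the "+" separator that Gmail uses; some mail systems use a different separator.

```javascript
// Split "bobby+service@gmail.com" into { user, tag, domain }.
// Assumes "+" separates the tag; returns null for strings without "@".
function parseSubaddress(address) {
  const at = address.lastIndexOf('@');
  if (at === -1) return null;
  const local = address.slice(0, at);
  const domain = address.slice(at + 1).toLowerCase();
  const plus = local.indexOf('+');
  if (plus === -1) return { user: local, tag: null, domain };
  return { user: local.slice(0, plus), tag: local.slice(plus + 1), domain };
}

// Count how often each tag appears among recipient addresses,
// most frequent first. Untagged addresses are skipped.
function tallyTags(addresses) {
  const counts = new Map();
  for (const addr of addresses) {
    const parsed = parseSubaddress(addr);
    if (!parsed || parsed.tag === null) continue;
    counts.set(parsed.tag, (counts.get(parsed.tag) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(tallyTags([
  'bobby+shop@gmail.com',
  'bobby+forum@gmail.com',
  'bobby+shop@gmail.com',
  'bobby@gmail.com',
])); // → [ [ 'shop', 2 ], [ 'forum', 1 ] ]
```

A tag that suddenly tops this list is a good hint about which service leaked the address.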

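Services might know to expect common tricks like the +service hack, but they are unlikely to have an automated way to identify and bypass a label that makes up the entire local part of the address, as in service@my.website on a domain whose server accepts mail for every address. Addresses like that have their own problem (spam, since any made-up address at the domain is accepted), but they serve their purpose.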

Summary

Haraka's simplicity, combined with my experience with JavaScript, means that I can write arbitrarily complex policies for receiving mail with little effort. I'd expect it to be more of a challenge for a novice programmer, but even then I think it could be a fun and rewarding personal project.

Haraka strikes a great balance between being full-featured enough for commercial production use and being approachable for amateurs.

If you have questions about my use of Haraka or ideas for more fun tricks, contact me. Depending on when you read this, I may have written more articles for this series on my favourite open-source software.