Transferring my digital life to Google

I should actually be ashamed. A long, long time ago I designed and created my own e-mail system called DBmail. The general idea was that e-mail and e-mail provisioning are structured data, and structured data should be stored in a database. That way maintenance, searching, scalability, backup and data consistency should be better covered than with a filesystem for storage (all depending on the database of course, but DBmail supports many). This, combined with the fact that I have easy access to my own online servers, should have resulted in me using DBmail for all my e-mail needs. Well, as of a couple of days ago, this is no longer the case.

A few months ago I talked my dad into transferring his e-mail (and domain) to Google by subscribing to Google Apps. He has seemed very happy since then (I’m measuring by the number of help requests my brothers and I are receiving). Since I was doing my own e-mail I was also doing my own spamfighting, which started to take a lot of my time. FYI, my domain eelco.com receives about 20,000 unwanted messages per day. My wife’s domain marloes.info does about the same, so I need to block about 40,000 messages per day just for my wife and me. Then I’m also hosting e-mail for a couple of domains for friends, so add another 20,000 to that. If I’m not blocking those messages they (the friends and family) start telling me I’m letting too much spam through.

An effective remedy is called greylisting. Greylisting uses a feature of the SMTP protocol, the temporary failure reply, to request a resend the first time a message is delivered from a previously unseen sender. If the sender is using a regular, fully SMTP-compliant mailserver, the mail will be resent. If the sender is a spammer he probably uses a bulk sender which does not fully comply with SMTP and therefore will not send the message again. So the spam message is blocked. The problem with this approach is that a lot of messages that are sent for the first time will have an arbitrary delay, which kind of sucks when you’re waiting for a certain subscription message to come through.
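To make the greylisting idea a bit more concrete, here is a minimal Python sketch. It is not how my own server does it: keying on the (client IP, sender, recipient) triplet, the 300 second waiting period and the `Greylist` class name are all assumptions chosen for illustration. The only SMTP facts it relies on are that a 4xx reply such as 451 means "try again later" and 250 means "accepted".

```python
import time


class Greylist:
    """Toy greylist keyed on (client IP, sender, recipient) triplets."""

    def __init__(self, min_delay=300, clock=time.time):
        # min_delay: seconds a new triplet has to wait (assumed value).
        self.min_delay = min_delay
        self.clock = clock
        self.first_seen = {}

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply code: 451 to defer, 250 to accept."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triplet showed up for the first time.
        seen = self.first_seen.setdefault(key, now)
        if now - seen < self.min_delay:
            # Temporary failure: a compliant mailserver will retry later.
            return 451
        return 250
```

A real mailserver retries after the 451 and gets through once the waiting period has passed; a bulk sender that never retries never gets a 250. This is also where the arbitrary delay for first-time senders comes from.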

The spamfighting is actually the biggest reason I switched to Google Apps for my e-mail. Google uses its huge Gmail userbase to identify spam; every time you click the report spam button the Gmail system learns about your spam message and prevents it from being delivered in the future, also for other Gmail users. This works great, I’m getting even less spam than on my own server.

Another really nice feature of the Google mail platform is the search feature. It’s extremely quick and uses Google search technology to index your e-mail messages. So with this feature in mind I wanted to import my complete digital life into the Google mail database. Google has a nice option which uses IMAP to import from other mail accounts. The problem is that my e-mail dates back to 1991 (still got all of it in backup) and I really wanted to be able to import that as well. First I tried a dirty hack by ‘resending’ my old e-mail to an intermediate account which I could then import using Google’s IMAP import. That didn’t work and quite a few people received really strange bounces, sorry about that folks :-)

Then I discovered the Google E-mail API method for importing e-mail. This is an XML-based system which accepts a standard RFC 822 message in an XML envelope, accompanied by one or more labels, for inclusion into the Google e-mail database. Since a lot of my old e-mail is in the Cyrus DB or Maildir format I needed to create a little script that recurses through those storage formats, identifies e-mail messages and sends them through the API into the Google db. I wanted to freshen up (actually re-learn) my Python skills so I did the whole thing in a Python script.
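The recursing part can be sketched roughly like this. This is not the actual GoogleMailPy code: the function names `looks_like_email` and `find_messages` are made up for this example, and the "has a From header plus a Date or Message-ID header" check is just one assumed heuristic for telling messages apart from index and cache files. It uses only the Python standard library and leaves out the API call itself.

```python
import os
from email.parser import BytesHeaderParser


def looks_like_email(raw):
    """Heuristic (an assumption): From header plus Date or Message-ID."""
    headers = BytesHeaderParser().parsebytes(raw)
    return headers["From"] is not None and (
        headers["Date"] is not None or headers["Message-ID"] is not None
    )


def find_messages(root):
    """Yield (folder_path, raw_bytes) for every file that looks like e-mail."""
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                raw = handle.read()
            if looks_like_email(raw):
                # The folder path relative to the root can later serve as a label.
                yield os.path.relpath(folder, root), raw
```

Each `raw` message would then be wrapped in the XML envelope and pushed through the API, with the folder path available for labeling.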

You can download the script (GoogleMailPy) here. It’s quite dirty but it works and it supports the Google response messages (Google doesn’t like it when a lot of e-mail is being pushed within a certain timeframe. The script takes this into account and uses the Google Double Back strategy to Play Nice™). By default the script adds two labels: GoogleMailPy (to identify all mail processed by the script) and the directory path in which the original e-mail message was found (with Cyrus and Maildir this is also the folder path in your e-mail, so it kind of makes sense to use this as a temporary label until you find a better one).
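For the curious, the Play Nice part boils down to something like the sketch below. It assumes that "Double Back" means doubling the waiting time after every "slow down" response, which is how I read it here; the `RateLimited` exception, the `send` callable and all the delay values are hypothetical stand-ins and not part of any Google API. The `default_labels` helper just mirrors the two default labels described above.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the server asked the client to slow down."""


def send_with_backoff(send, message, initial_delay=1.0, max_delay=300.0,
                      max_attempts=8, sleep=time.sleep):
    """Call send(message), doubling the wait after every RateLimited error."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message)
        except RateLimited:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap


def default_labels(folder_path):
    """The two default labels: a marker label plus the original folder path."""
    return ["GoogleMailPy", folder_path]
```

Capping the delay keeps a long import from stalling for hours after a bad streak, while the doubling gives the server room to breathe.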

GoogleMailPy uses no external Python libs so it should work out of the box. I’ve put some comments in the source in case you’d like to add a feature of your own. I’ve just imported about 32,000 messages and everything seems to be working OK. Using the script is of course totally at your own risk and it’s probably stuffed with bugs :-)

Usage is easy:

1. First set the right user credentials in the script. Check for the SETTING comments (there are three: one for turning certain labeling features on or off, one for the user credentials and one for the right userpath).
2. Then just call the script with a Maildir or Cyrus DB directory path.

Please let me know about your experiences in the comments of this post.
I’m now checking out the other Google Apps (especially the word processor and spreadsheet). We might connect them to our new client portal at the office to generate and share reports. More on that later.