DEV Community

SamxtnZ

I spent three weeks building a free Python script to migrate IMAP email — here's what I learned

I needed to migrate a full email account from one server to another.

Simple task, right?

Three days later I had tried four different tools, corrupted two MBOX files, and accidentally uploaded 8,000 emails twice to the destination server with wrong timestamps.

So I built my own. It took three weeks to get right. Here is everything I had to figure out — and the mistakes that almost nobody talks about.


The problem with existing tools

Most email migration tools fall into one of three categories:

  1. Cloud-based services that route your emails through their servers. You pay monthly, and your data touches infrastructure you don't control.
  2. Desktop apps that cost $50–$200 for a one-time task and lock you to Windows.
  3. Generic Python IMAP scripts on Stack Overflow that handle the happy path and break the moment your provider does something non-standard.

I wanted something that ran locally, cost nothing, handled the edge cases correctly, and worked on any operating system.


The five things I had to get right

1. Aruba uses two different IMAP hostnames

If you are in Italy and use Aruba email, you have probably hit this. Aruba has two different IMAP endpoints depending on your account type:

  • imaps.aruba.it
  • imap.aruba.it

Most tools hardcode one. If yours is the other, you get a connection error with no useful explanation. The fix is simple — try both in sequence:

```python
import imaplib

IMAP_SERVERS = {
    'aruba.it': ['imaps.aruba.it', 'imap.aruba.it'],
}

def connect_imap(domain, email_address, password):
    for hostname in IMAP_SERVERS.get(domain, []):
        try:
            imap = imaplib.IMAP4_SSL(hostname, 993)
            imap.login(email_address, password)
            return imap  # success — stop trying
        except Exception:
            continue     # try next hostname
    raise ConnectionError(f"no working IMAP endpoint for {domain}")
```

This pattern works for any provider that might have multiple endpoints.


2. MBOX files silently corrupt if you skip one line of code

The MBOX format uses a line starting with From (with a space) as a message separator. The problem: email bodies can also contain lines that start with From. When a reader parses the MBOX file, it treats those lines as message boundaries and splits your email into phantom messages.

The fix is to escape those lines by prepending >:

```python
email_bytes = email_message.as_bytes()

escaped = b'\n'.join(
    b'>' + line if line.startswith(b'From ') else line
    for line in email_bytes.split(b'\n')
)
```

Every line inside the message body that starts with From becomes >From. Compliant readers know to strip the > when displaying. Non-compliant readers at least don't corrupt the file structure.

Most MBOX implementations I tested — including popular Python scripts — skip this. You only notice the corruption when you try to import the file somewhere and get double the message count.
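You can watch the escaping happen with the standard library's mailbox module, which applies From-mangling automatically when it writes an mbox file. A quick demo (the file path here is just for illustration):

```python
import mailbox
from email.message import EmailMessage

msg = EmailMessage()
msg['From'] = 'alice@example.com'
msg['Subject'] = 'escape demo'
msg.set_content('line one\nFrom here on things get tricky\nline three')

mb = mailbox.mbox('demo.mbox')   # demo path in the current directory
mb.add(msg)                      # the stdlib applies >From escaping for us
mb.flush()
mb.close()

with open('demo.mbox', 'rb') as f:
    print(b'>From here on' in f.read())  # True — the body line was escaped
```

If you are writing the file by hand instead of through mailbox, the escaping in the snippet above is on you.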


3. Email timestamps break without timezone preservation

When you upload a message to a new IMAP server using the APPEND command, you can pass an internaldate — the timestamp that tells the server when the email was received.

The naive approach:

```python
# WRONG — loses timezone, shifts timestamps
date_tuple = email.utils.parsedate(msg['Date'])
internaldate = imaplib.Time2Internaldate(time.mktime(date_tuple))
```

time.mktime() assumes local timezone. If your server is in UTC and your emails have a +0200 offset, every message lands two hours off. For 12,000 emails, that is 12,000 wrong timestamps in the destination's sort order.

The correct approach:

```python
from email.utils import parsedate_to_datetime

date_header = msg.get('Date', '')
internaldate = None

if date_header:
    try:
        dt = parsedate_to_datetime(date_header)
        # Only use if timezone-aware
        # Naive datetimes get discarded — server assigns current time
        # which is wrong, but less wrong than a shifted timestamp
        if dt.tzinfo is not None:
            internaldate = dt
    except Exception:
        pass

imap.append(folder, None, internaldate, raw_message)
```

parsedate_to_datetime() returns a timezone-aware datetime when the header includes offset information. Passing that directly to imaplib.append() preserves the original timestamp correctly.
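You can verify this in a REPL without touching a server (the date string here is arbitrary):

```python
from email.utils import parsedate_to_datetime

dt = parsedate_to_datetime('Mon, 1 Jul 2024 14:30:00 +0200')
print(dt.isoformat())         # 2024-07-01T14:30:00+02:00
print(dt.tzinfo is not None)  # True — safe to pass to append()
```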


4. Duplicate messages accumulate when you re-run

If a migration gets interrupted — connection drops, rate limit hit, power cut — and you restart, a naive script uploads everything again. You end up with every email twice on the destination.

The solution is to fingerprint every existing message on the destination before starting:

```python
import hashlib

def _message_fingerprint(msg):
    # Message-ID is globally unique per RFC 2822
    mid = (msg.get('Message-ID') or '').strip()
    if mid:
        return mid

    # Fallback for messages without Message-ID
    date    = (msg.get('Date')    or '').strip()
    sender  = (msg.get('From')    or '').strip()
    subject = (msg.get('Subject') or '').strip()
    size    = str(len(msg.as_bytes()))
    raw     = f"{date}|{sender}|{subject}|{size}"
    return "fp:" + hashlib.sha1(raw.encode()).hexdigest()
```

Before uploading anything, fetch headers-only from the destination and build a set of fingerprints:

```python
import email

status, data = imap.fetch(id_range, '(BODY.PEEK[HEADER])')
existing_fingerprints = {
    _message_fingerprint(email.message_from_bytes(part[1]))
    for part in data
    if isinstance(part, tuple)  # skip the trailing b')' entries
}
```

Then skip any message whose fingerprint is already in the set. The migration becomes safe to re-run as many times as needed.
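The re-run safety is easy to demonstrate with a toy in-memory "destination" (everything here is illustrative; the real code talks to an IMAP server):

```python
import email
import hashlib

def _message_fingerprint(msg):
    mid = (msg.get('Message-ID') or '').strip()
    if mid:
        return mid
    parts = [(msg.get(h) or '').strip() for h in ('Date', 'From', 'Subject')]
    parts.append(str(len(msg.as_bytes())))
    return 'fp:' + hashlib.sha1('|'.join(parts).encode()).hexdigest()

destination = {}  # stands in for the remote mailbox

def migrate(raw_messages):
    existing = set(destination)   # fingerprints already uploaded
    uploaded = 0
    for raw in raw_messages:
        fp = _message_fingerprint(email.message_from_bytes(raw))
        if fp in existing:
            continue              # skip: already on the destination
        destination[fp] = raw
        uploaded += 1
    return uploaded

batch = [b'Message-ID: <1@example>\n\nhello', b'Message-ID: <2@example>\n\nworld']
print(migrate(batch))  # 2 on the first run
print(migrate(batch))  # 0 on the second run, nothing duplicated
```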


5. Memory usage blows up if you load the mailbox into RAM

The obvious approach — fetch all messages, store them in a list, iterate — works fine for small mailboxes. For a 50,000-email account it uses gigabytes of RAM and often crashes.

The fix is generators. Instead of:

```python
# BAD — loads everything into memory
messages = []
for eid in all_ids:
    status, data = imap.fetch(eid, '(RFC822)')
    messages.append(data)

for raw in messages:
    process(raw)
```

Use:

```python
# GOOD — one message in memory at a time
def collect_messages(imap, all_ids):
    for eid in all_ids:
        status, data = imap.fetch(eid, '(RFC822)')
        if status == 'OK' and data and data[0]:
            yield data[0][1]

for raw in collect_messages(imap, all_ids):
    process(raw)
```

A 200,000-email mailbox uses the same RAM as a 200-email one. The generator yields one message, it gets processed and written to disk, memory is released, next message comes in.
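The same pattern carries over to the backup path: each message hits the disk the moment it is yielded. A minimal sketch (the function and directory names are made up for this example):

```python
import os

def save_as_eml(messages, out_dir):
    # consume any iterable of raw message bytes, one .eml file per email
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    for raw in messages:
        count += 1
        with open(os.path.join(out_dir, f'{count:06d}.eml'), 'wb') as f:
            f.write(raw)
    return count

# a generator keeps memory flat no matter how many messages flow through
fake_stream = (f'Subject: msg {i}\r\n\r\nbody {i}'.encode() for i in range(3))
print(save_as_eml(fake_stream, 'backup_demo'))  # 3
```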


The result

After three weeks of building and testing across Gmail, Outlook, Aruba, Yahoo, and iCloud, I packaged everything into a single Python file called Please Backup.

What it does:

  • Backup any IMAP mailbox to EML (one file per email) or MBOX format
  • Migrate between any two IMAP servers with deduplication
  • Full migration mode — backup and upload in one run, credentials entered once
  • Zero external dependencies — pure Python standard library
  • Runs on Windows, macOS, Linux — anywhere Python 3.7+ is installed

```
$ python please_backup.py

╔══════════════════════════════════════════════════╗
║  Please Backup · Created By SamxtnZ · 2026      ║
╚══════════════════════════════════════════════════╝

What would you like to do?
  1. 📥 Download / Backup emails from a server
  2. 📤 Upload / Migrate emails to a new server
  3. 🔄 Full migration (backup then upload)
```

MIT licensed. Free forever. No cloud involved — your emails go directly from the source IMAP server to your disk or destination server.

GitHub: github.com/SamxtnZ/please-backup


What I would do differently

If I started over, two things:

Add OAuth2 support from day one. Gmail and Outlook are actively deprecating App Passwords for IMAP. Right now the tool works with App Passwords, but that window is closing. OAuth2/XOAUTH2 is the right long-term solution and it is significantly more complex to implement.

Add a resume/checkpoint file. If a 50,000-email migration is interrupted at message 49,000, the deduplication prevents re-uploading what already made it — but the backup phase restarts from zero. Writing a checkpoint file every 500 messages would solve this cleanly.
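The checkpoint itself is cheap to build. A rough sketch of what I have in mind (the file name and JSON shape are hypothetical, not the tool's current behavior):

```python
import json
import os

CHECKPOINT = 'checkpoint.json'  # hypothetical file name

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f).get('last_index', 0)
    return 0

def save_checkpoint(last_index):
    tmp = CHECKPOINT + '.tmp'
    with open(tmp, 'w') as f:
        json.dump({'last_index': last_index}, f)
    os.replace(tmp, CHECKPOINT)  # atomic swap so a crash never corrupts it

# in the backup loop, roughly: resume from load_checkpoint(),
# then call save_checkpoint(i) every 500 messages
```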

Both are on the roadmap. If you want to contribute, pull requests are open.


Key takeaways

If you are building anything with IMAP:

  • Try multiple hostnames per domain — providers like Aruba use more than one
  • Escape From lines in MBOX output — this is not optional, it is spec
  • Use parsedate_to_datetime() and check tzinfo before passing to append()
  • Fingerprint before uploading — Message-ID first, SHA1 fallback
  • Stream with generators — never load a full mailbox into a list

These five things are the difference between a script that works in testing and a tool that works on real email accounts in production.


Please Backup is open source and MIT licensed.
GitHub: github.com/SamxtnZ/please-backup
Issues, pull requests, and provider compatibility reports welcome.
