Alex Spinov

I Replaced a 3-Person Data Entry Team With 200 Lines of Python

Before you get angry — they all got promoted

Let me explain.

A logistics company was paying 3 people to manually copy shipping data from emails into spreadsheets. 8 hours a day. Every day.

The error rate was ~5%. Each error cost them $200-$500 in misrouted shipments.

I wrote a Python script that:

  1. Reads incoming emails via IMAP
  2. Extracts shipping details with regex patterns
  3. Validates against their database
  4. Exports to their ERP system via API
  5. Flags anything it's not confident about for human review

The 3 data entry people became the "quality assurance team" — they now review the 2% of entries the bot flags, handle exceptions, and train the system on new email formats.

Result: the error rate dropped from 5% to 0.3%, and processing time went from 8 hours to 12 minutes.


The actual code pattern

I can't share the client's code, but here's the stripped-down pattern:

import imaplib
import email
import re
import csv
from datetime import datetime

def connect_mail(server, user, password):
    mail = imaplib.IMAP4_SSL(server)
    mail.login(user, password)
    mail.select('inbox')
    return mail

def extract_shipping_data(body):
    patterns = {
        'tracking': r'(?:tracking|track)[:#\s]*(\w{10,30})',
        'weight': r'(\d+\.?\d*)\s*(?:kg|lbs)',
        'destination': r'(?:ship to|deliver to)[:\s]*(.+?)(?:\n|$)',
        'date': r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})',
    }
    data = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, body, re.IGNORECASE)
        data[field] = match.group(1).strip() if match else None

    # Confidence score: how many fields did we extract?
    data['confidence'] = sum(1 for v in data.values() if v) / len(patterns)
    return data

def process_emails(mail):
    _, nums = mail.search(None, 'UNSEEN', 'SUBJECT "shipping"')
    results = []

    for num in nums[0].split():
        _, data = mail.fetch(num, '(RFC822)')
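        msg = email.message_from_bytes(data[0][1])

        # get_payload(decode=True) returns None for multipart messages,
        # so use the first text/plain part when the email has several parts.
        part = msg
        if msg.is_multipart():
            part = next((p for p in msg.walk()
                         if p.get_content_type() == 'text/plain'), None)
        payload = part.get_payload(decode=True) if part is not None else None
        body = payload.decode('utf-8', errors='ignore') if payload else ''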

        shipping = extract_shipping_data(body)
        shipping['from'] = msg['From']
        shipping['received'] = msg['Date']

        if shipping['confidence'] >= 0.75:
            results.append(('auto', shipping))
        else:
            results.append(('review', shipping))

    return results

FIELDNAMES = ['tracking', 'weight', 'destination', 'date', 'from', 'received', 'confidence']

def write_csv(path, rows):
    # Explicit utf-8 so non-ASCII destinations survive on any platform
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)

def save_results(results):
    timestamp = datetime.now().strftime('%Y%m%d_%H%M')
    auto = [d for s, d in results if s == 'auto']
    review = [d for s, d in results if s == 'review']

    # High-confidence entries go straight to the auto file
    write_csv(f'auto_{timestamp}.csv', auto)

    # Everything else goes to a separate file for human review
    if review:
        write_csv(f'review_{timestamp}.csv', review)
        print(f'⚠️  {len(review)} entries need human review')

    print(f'{len(auto)} entries processed automatically')

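# Run
if __name__ == '__main__':
    import os  # imported here so this snippet stands on its own

    # Read the app password from the environment instead of hard-coding it.
    # IMAP_PASSWORD is a placeholder variable name.
    mail = connect_mail('imap.gmail.com', 'shipping@company.com', os.environ['IMAP_PASSWORD'])
    try:
        results = process_emails(mail)
    finally:
        mail.logout()  # always close the IMAP session
    save_results(results)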
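
Step 3 from the list (validation) isn't part of the stripped-down pattern. Here is a minimal sketch of what it could look like, not the client's actual checks. The `KNOWN_DESTINATIONS` set is a hypothetical stand-in for a database lookup, the day-first date formats are an assumption, and `validate_shipping` is an illustrative name. The digit check on the tracking number is there because the tracking regex will happily capture a word like "information" from "tracking information".

```python
from datetime import datetime

# Hypothetical stand-in for a lookup against the real database.
KNOWN_DESTINATIONS = {"rotterdam", "hamburg", "antwerp"}

# Assumption: dates are day-first. Adjust if the senders use month-first.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d/%m/%y", "%d-%m-%y")


def parse_date(text):
    """Return a datetime for the first format that fits, or None."""
    if not text:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def validate_shipping(data):
    """Return a list of problems found in one extracted record."""
    problems = []

    tracking = data.get("tracking")
    if not tracking or not any(ch.isdigit() for ch in tracking):
        problems.append("tracking: missing or contains no digits")

    try:
        if float(data.get("weight")) <= 0:
            problems.append("weight: must be positive")
    except (TypeError, ValueError):
        problems.append("weight: missing or not a number")

    destination = (data.get("destination") or "").strip().lower()
    if destination not in KNOWN_DESTINATIONS:
        problems.append("destination: not found in known destinations")

    if parse_date(data.get("date")) is None:
        problems.append("date: missing or not a real calendar date")

    return problems
```

An empty list means the record passed. Anything else can be routed to the review file along with the reasons, which gives the reviewers something concrete to check.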

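The pattern above writes CSV files, while step 4 in the list talks about an ERP API. As a rough sketch of that step using only the standard library: the endpoint URL, the bearer-token auth, and the payload shape are all assumptions, since the real ERP API isn't described here. `build_erp_request` and `send_to_erp` are illustrative names.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the real ERP URL.
ERP_URL = "https://erp.example.com/api/shipments"


def build_erp_request(shipping, api_token, url=ERP_URL):
    """Build (but do not send) a JSON POST request for one record."""
    # Only send the shipment fields, not internal ones like 'confidence'.
    payload = {
        "tracking": shipping["tracking"],
        "weight": shipping["weight"],
        "destination": shipping["destination"],
        "date": shipping["date"],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
        },
        method="POST",
    )


def send_to_erp(request, timeout=10):
    """Send the request and return True on a 2xx response.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    so callers should catch that and send the record to review.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return 200 <= response.status < 300
```

Splitting "build" from "send" keeps the payload logic testable without touching the network.
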
The uncomfortable truth about automation

Automation doesn't eliminate jobs. It eliminates tasks.

The 3 people who used to do data entry are now:

  • Person 1: Manages the exception queue + trains the system
  • Person 2: Moved to customer support (knows the data inside out)
  • Person 3: Became the junior data analyst (was already noticing patterns in the data)

All three got raises.


What I learned

  1. Confidence scores are everything. Don't try to automate 100%. Automate the 90% that's easy and flag the rest.
  2. Start with regex, not ML. For predictably formatted text like these shipping emails, regex + validation is usually enough, and it's far easier to debug than an ML model.
  3. The humans become auditors. They're now more valuable because they handle the hard cases.
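
On point 1: the confidence score in the pattern above counts every field equally, so an email with no tracking number but the other three fields still scores 0.75 and goes through automatically. One possible refinement, sketched here with made-up weights and a hypothetical `REQUIRED_FIELDS` set, is to weight the fields and send anything missing a required field straight to review.

```python
# Hypothetical weights in percentage points; they sum to 100.
FIELD_WEIGHTS = {"tracking": 40, "destination": 30, "weight": 15, "date": 15}

# Assumption: a shipment without these can never be processed automatically.
REQUIRED_FIELDS = {"tracking", "destination"}


def score_confidence(data):
    """Weighted share of extracted fields, or 0.0 if a required one is missing."""
    if any(not data.get(field) for field in REQUIRED_FIELDS):
        return 0.0
    points = sum(w for field, w in FIELD_WEIGHTS.items() if data.get(field))
    return points / 100


def route(data, threshold=0.75):
    """Return 'auto' or 'review', using the same 0.75 cutoff as the pattern."""
    return "auto" if score_confidence(data) >= threshold else "review"
```

With these weights, a record missing only the date still goes through automatically, while one missing the tracking number always lands in the review queue.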

I build automation tools for a living — 77 tools on Apify, hundreds of custom scripts.

What's the most impactful automation you've built or seen? Not the most technically complex — the one that had the biggest real-world impact.


If your team is drowning in manual data processing, reach out. I specialize in exactly this.


More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs
