Before you get angry — they all got promoted
Let me explain.
A logistics company was paying 3 people to manually copy shipping data from emails into spreadsheets. 8 hours a day. Every day.
The error rate was ~5%. Each error cost them $200-$500 in misrouted shipments.
I wrote a Python script that:
- Reads incoming emails via IMAP
- Extracts shipping details with regex patterns
- Validates against their database
- Exports to their ERP system via API
- Flags anything it's not confident about for human review
The 3 data entry people became the "quality assurance team" — they now review the 2% of entries the bot flags, handle exceptions, and train the system on new email formats.
Result: Error rate dropped from 5% to 0.3%. Processing time from 8 hours to 12 minutes.
The actual code pattern
I can't share the client's code, but here's the stripped-down pattern:
```python
import imaplib
import email
import re
import csv
from datetime import datetime


def connect_mail(server, user, password):
    mail = imaplib.IMAP4_SSL(server)
    mail.login(user, password)
    mail.select('inbox')
    return mail


def get_body(msg):
    # Handle multipart messages: get_payload(decode=True) returns None on them,
    # so walk the parts and take the first text/plain payload.
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == 'text/plain':
                return part.get_payload(decode=True).decode('utf-8', errors='ignore')
        return ''
    return msg.get_payload(decode=True).decode('utf-8', errors='ignore')


def extract_shipping_data(body):
    patterns = {
        'tracking': r'(?:tracking|track)[:#\s]*(\w{10,30})',
        'weight': r'(\d+\.?\d*)\s*(?:kg|lbs)',
        'destination': r'(?:ship to|deliver to)[:\s]*(.+?)(?:\n|$)',
        'date': r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})',
    }
    data = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, body, re.IGNORECASE)
        data[field] = match.group(1).strip() if match else None
    # Confidence score: fraction of fields successfully extracted
    data['confidence'] = sum(1 for v in data.values() if v) / len(patterns)
    return data


def process_emails(mail):
    _, nums = mail.search(None, 'UNSEEN', 'SUBJECT "shipping"')
    results = []
    for num in nums[0].split():
        _, data = mail.fetch(num, '(RFC822)')
        msg = email.message_from_bytes(data[0][1])
        body = get_body(msg)
        shipping = extract_shipping_data(body)
        shipping['from'] = msg['From']
        shipping['received'] = msg['Date']
        if shipping['confidence'] >= 0.75:
            results.append(('auto', shipping))
        else:
            results.append(('review', shipping))
    return results


def save_results(results):
    fieldnames = ['tracking', 'weight', 'destination', 'date',
                  'from', 'received', 'confidence']
    timestamp = datetime.now().strftime('%Y%m%d_%H%M')
    auto = [d for s, d in results if s == 'auto']
    review = [d for s, d in results if s == 'review']
    with open(f'auto_{timestamp}.csv', 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(auto)
    if review:
        with open(f'review_{timestamp}.csv', 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(review)
        print(f'⚠️ {len(review)} entries need human review')
    print(f'✅ {len(auto)} entries processed automatically')


# Run
mail = connect_mail('imap.gmail.com', 'shipping@company.com', 'app-password')
results = process_emails(mail)
save_results(results)
```
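One step from the feature list above — validating extracted data against the client's database — is stripped out of the pattern. A minimal sketch of what that step could look like, assuming a hypothetical `known_destinations` lookup and illustrative plausibility bounds (all names and limits here are made up, not the client's actual rules):

```python
def validate_entry(data, known_destinations):
    """Cross-check extracted fields before they reach the ERP.

    `known_destinations` is a hypothetical set of valid destinations
    (e.g. loaded from the client database). Any failed check should
    demote the entry to the human review queue.
    """
    errors = []
    # Tracking numbers shorter than 10 chars are suspicious per the regex above
    if data.get('tracking') is None or len(data['tracking']) < 10:
        errors.append('missing or short tracking number')
    # Weight must parse and fall in a sane range (bounds are illustrative)
    try:
        weight = float(data['weight'])
        if not 0 < weight < 20000:
            errors.append('implausible weight')
    except (TypeError, ValueError):
        errors.append('unparseable weight')
    # Destination must match a known record
    dest = (data.get('destination') or '').lower()
    if dest not in known_destinations:
        errors.append('unknown destination')
    return errors  # empty list means the entry passed


entry = {'tracking': 'ABC1234567890', 'weight': '12.5', 'destination': 'Berlin'}
print(validate_entry(entry, {'berlin', 'hamburg'}))  # → []
```

Anything that fails a check gets routed to the review CSV regardless of its confidence score, so a regex false positive can't silently reach the ERP.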
The uncomfortable truth about automation
Automation doesn't eliminate jobs. It eliminates tasks.
The 3 people who used to do data entry are now:
- Person 1: Manages the exception queue + trains the system
- Person 2: Moved to customer support (knows the data inside out)
- Person 3: Became the junior data analyst (was already noticing patterns in the data)
All three got raises.
What I learned
- Confidence scores are everything. Don't try to automate 100%. Automate the 90% that's easy and flag the rest.
- Start with regex, not ML. For semi-structured data like shipping emails, regex plus validation is cheaper to build, easier to debug, and more predictable than an ML model.
- The humans become auditors. They're now more valuable because they handle the hard cases.
I build automation tools for a living — 77 tools on Apify, hundreds of custom scripts.
What's the most impactful automation you've built or seen? Not the most technically complex — the one that had the biggest real-world impact.
If your team is drowning in manual data processing, reach out. I specialize in exactly this.
More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs