This article contains affiliate links. I may earn a commission at no extra cost to you.
Build a Smart Email Classifier with Python and Hugging Face in 15 Minutes
If you're drowning in emails like most business owners, you've probably wondered: "Can AI actually help me organize this chaos?" The answer is yes, and it's easier than you think.
Today, we'll build a practical email classifier that automatically sorts incoming messages into categories like urgent, spam, support, and sales. No machine learning PhD required – just Python and some clever use of pre-trained models.
What We're Building
Our email classifier will:
- Use Hugging Face's pre-trained models for text classification
- Connect to Gmail via API to fetch real emails
- Categorize messages into business-relevant buckets
- Run as a lightweight background service
- Cost under $20/month for most small businesses (see the cost analysis below)
Prerequisites
You'll need:
- Python 3.8+
- A Gmail account with API access enabled
- Basic familiarity with Python and APIs
Step 1: Setting Up Hugging Face Transformers
First, let's install our dependencies:
pip install transformers torch google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client
Now, let's create our email classifier using a pre-trained model:
from transformers import pipeline
import torch


class EmailClassifier:
    def __init__(self):
        # Zero-shot classification needs an NLI model. bart-large-mnli is
        # accurate but not small (roughly 1.6 GB of weights), so the first
        # run downloads it and loading takes a moment.
        self.classifier = pipeline(
            "zero-shot-classification",
            model="facebook/bart-large-mnli",
            device=0 if torch.cuda.is_available() else -1
        )

        # Define our business categories
        self.categories = [
            "urgent customer issue",
            "spam or promotional",
            "customer support request",
            "sales inquiry",
            "internal communication",
            "newsletter or update"
        ]

    def classify_email(self, subject, body):
        # Combine subject and first 500 chars of body for classification
        text = f"{subject} {body[:500]}"
        result = self.classifier(text, self.categories)

        # Labels and scores come back sorted, highest score first
        return {
            'category': result['labels'][0],
            'confidence': result['scores'][0],
            'all_scores': dict(zip(result['labels'], result['scores']))
        }


# Test our classifier
classifier = EmailClassifier()

# Example email
test_subject = "URGENT: Website is down, customers can't checkout"
test_body = "Hi team, we're getting reports that our e-commerce site is completely inaccessible. This is affecting sales immediately."

result = classifier.classify_email(test_subject, test_body)
print(f"Category: {result['category']}")
print(f"Confidence: {result['confidence']:.2f}")
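The classifier always returns a top label, even when the scores are nearly tied. One way to handle that is to fall back to a manual-review bucket when the prediction is weak. This is a minimal sketch, not part of the classifier above: the function name pick_category, both thresholds, and the "needs review" label are assumptions chosen for illustration, and the example scores are hand-written rather than real model output.

```python
def pick_category(classification, min_confidence=0.5, min_margin=0.1,
                  fallback="needs review"):
    """Return the top category, or `fallback` when the prediction is weak.

    `classification` is the dict returned by EmailClassifier.classify_email.
    The thresholds and the fallback label are illustrative assumptions.
    """
    scores = sorted(classification["all_scores"].values(), reverse=True)
    top = scores[0]
    runner_up = scores[1] if len(scores) > 1 else 0.0
    # Reject predictions that are weak or barely ahead of the runner-up
    if top < min_confidence or (top - runner_up) < min_margin:
        return fallback
    return classification["category"]


# Hand-written scores (not real model output)
example = {
    "category": "sales inquiry",
    "confidence": 0.41,
    "all_scores": {
        "sales inquiry": 0.41,
        "customer support request": 0.38,
        "newsletter or update": 0.21,
    },
}
print(pick_category(example))
```

Tune both thresholds against a sample of your own mail; the defaults here are only starting points.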
Step 2: Connecting to Gmail API
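To work with real emails, we need Gmail API access. First, enable the Gmail API for a project in the Google Cloud Console, create an OAuth client ID of type "Desktop app", and download its credentials file as credentials.json next to your script. The first run opens a browser window to ask for consent, then caches the resulting token in token.pickle.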
import base64
import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build


class GmailConnector:
    SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

    def __init__(self, credentials_file='credentials.json'):
        self.service = self._authenticate(credentials_file)

    def _authenticate(self, credentials_file):
        creds = None
        # Reuse the cached token from a previous run, if there is one
        if os.path.exists('token.pickle'):
            with open('token.pickle', 'rb') as token:
                creds = pickle.load(token)

        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                # Opens a browser window for the OAuth consent screen
                flow = InstalledAppFlow.from_client_secrets_file(
                    credentials_file, self.SCOPES)
                creds = flow.run_local_server(port=0)

            with open('token.pickle', 'wb') as token:
                pickle.dump(creds, token)

        return build('gmail', 'v1', credentials=creds)

    def get_recent_emails(self, max_results=10):
        """Fetch recent unread emails"""
        try:
            results = self.service.users().messages().list(
                userId='me',
                q='is:unread',
                maxResults=max_results
            ).execute()

            messages = results.get('messages', [])
            emails = []

            # list() returns only IDs, so fetch each message separately
            for message in messages:
                msg = self.service.users().messages().get(
                    userId='me',
                    id=message['id']
                ).execute()

                email_data = self._parse_email(msg)
                emails.append(email_data)

            return emails

        except Exception as error:
            print(f'An error occurred: {error}')
            return []

    def _parse_email(self, message):
        """Extract subject, sender, and body from Gmail message"""
        headers = message['payload'].get('headers', [])

        # Header names are case-insensitive, so compare in lowercase
        subject = next((h['value'] for h in headers if h['name'].lower() == 'subject'), 'No Subject')
        sender = next((h['value'] for h in headers if h['name'].lower() == 'from'), 'Unknown Sender')

        # Extract body text
        body = self._get_email_body(message['payload'])

        return {
            'id': message['id'],
            'subject': subject,
            'sender': sender,
            'body': body
        }

    def _get_email_body(self, payload):
        """Recursively extract email body text (text/plain parts only)"""
        body = ""

        if 'parts' in payload:
            for part in payload['parts']:
                body += self._get_email_body(part)
        else:
            if payload.get('mimeType') == 'text/plain':
                data = payload.get('body', {}).get('data')
                if data:
                    # Gmail returns the body base64url-encoded; replace
                    # undecodable bytes instead of raising on odd charsets
                    body = base64.urlsafe_b64decode(data).decode('utf-8', errors='replace')

        return body
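Because _get_email_body only reads text/plain parts, an HTML-only message comes back with an empty body and is classified on its subject alone. A possible fallback is sketched below using only the standard library: prefer text/plain, otherwise strip the tags from text/html. The names extract_body, _collect, _decode_part, and _TextExtractor are hypothetical helpers, not part of the Gmail client library, and the payload in the example is a hand-built dict in the same shape as a Gmail message payload.

```python
import base64
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def _decode_part(part):
    """Decode one base64url-encoded body part to text."""
    data = part.get("body", {}).get("data")
    if not data:
        return ""
    return base64.urlsafe_b64decode(data).decode("utf-8", errors="replace")


def _collect(payload, mime_type):
    """Concatenate the decoded text of every part with the given MIME type."""
    if "parts" in payload:
        return "".join(_collect(p, mime_type) for p in payload["parts"])
    if payload.get("mimeType") == mime_type:
        return _decode_part(payload)
    return ""


def extract_body(payload):
    """Prefer text/plain; otherwise strip tags from text/html."""
    plain = _collect(payload, "text/plain")
    if plain.strip():
        return plain
    parser = _TextExtractor()
    parser.feed(_collect(payload, "text/html"))
    parser.close()
    return " ".join(parser.chunks)


# Hand-built payload shaped like a Gmail API message payload
html = "<html><body><p>Order <b>#123</b> is late</p></body></html>"
fake_payload = {
    "mimeType": "multipart/alternative",
    "parts": [{
        "mimeType": "text/html",
        "body": {"data": base64.urlsafe_b64encode(html.encode()).decode()},
    }],
}
print(extract_body(fake_payload))
```

If this approach suits your mail, _parse_email could call extract_body(message['payload']) in place of self._get_email_body.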
Step 3: Putting It All Together
Now let's create our main email processing script:
import time
from datetime import datetime

# Assumes EmailClassifier (Step 1) and GmailConnector (Step 2) are defined
# in this same file, without the Step 1 test snippet


class SmartEmailProcessor:
    def __init__(self):
        self.classifier = EmailClassifier()
        self.gmail = GmailConnector()
        # The read-only scope cannot mark mail as read, so track the IDs we
        # have already handled (in memory only; lost on restart)
        self.processed_emails = set()

    def process_new_emails(self):
        """Process unread emails and classify them"""
        emails = self.gmail.get_recent_emails(max_results=20)
        results = []

        for email_data in emails:
            if email_data['id'] not in self.processed_emails:
                classification = self.classifier.classify_email(
                    email_data['subject'],
                    email_data['body']
                )

                result = {
                    'timestamp': datetime.now().isoformat(),
                    'email_id': email_data['id'],
                    'sender': email_data['sender'],
                    'subject': email_data['subject'],
                    'category': classification['category'],
                    'confidence': classification['confidence']
                }

                results.append(result)
                self.processed_emails.add(email_data['id'])

                # Log high-confidence urgent emails
                if ('urgent' in classification['category'].lower() and
                        classification['confidence'] > 0.8):
                    print(f"🚨 URGENT EMAIL DETECTED: {email_data['subject']}")

        return results

    def run_continuous(self, check_interval=300):  # 5 minutes
        """Run as a background service"""
        print("Starting email classifier service...")
        print(f"Checking for new emails every {check_interval} seconds")

        while True:
            try:
                results = self.process_new_emails()
                if results:
                    print(f"Processed {len(results)} new emails")
                    # Here you could save to database, send notifications, etc.

                time.sleep(check_interval)

            except KeyboardInterrupt:
                print("Service stopped by user")
                break
            except Exception as e:
                print(f"Error processing emails: {e}")
                time.sleep(60)  # Wait a minute before retrying


# Run the processor
if __name__ == "__main__":
    processor = SmartEmailProcessor()

    # Test with current emails
    results = processor.process_new_emails()
    for result in results:
        print(f"📧 {result['subject'][:50]}... -> {result['category']} ({result['confidence']:.2f})")

    # Uncomment to run as service
    # processor.run_continuous()
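The processed_emails set lives in memory, so every restart reclassifies whatever is still unread. One option for the "save to database" step is a small SQLite table keyed by message ID, sketched below. The ResultStore class, its method names, and the classified_emails.db file name are assumptions for illustration; the columns mirror the keys of the result dict built in process_new_emails.

```python
import sqlite3


class ResultStore:
    """Persists classification results keyed by Gmail message ID."""

    def __init__(self, path="classified_emails.db"):  # hypothetical file name
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS results (
                   email_id TEXT PRIMARY KEY,
                   timestamp TEXT,
                   sender TEXT,
                   subject TEXT,
                   category TEXT,
                   confidence REAL)"""
        )
        self.conn.commit()

    def already_processed(self, email_id):
        """True if a result for this message ID was saved earlier."""
        row = self.conn.execute(
            "SELECT 1 FROM results WHERE email_id = ?", (email_id,)
        ).fetchone()
        return row is not None

    def save(self, result):
        """Store one result dict; a repeated ID keeps the first row."""
        self.conn.execute(
            "INSERT OR IGNORE INTO results VALUES "
            "(:email_id, :timestamp, :sender, :subject, :category, :confidence)",
            result,
        )
        self.conn.commit()


# Usage with an in-memory database and a made-up result
store = ResultStore(":memory:")
sample = {
    "timestamp": "2024-01-01T09:00:00",
    "email_id": "abc123",
    "sender": "Jane Doe <jane@example.com>",
    "subject": "Pricing question",
    "category": "sales inquiry",
    "confidence": 0.87,
}
store.save(sample)
print(store.already_processed("abc123"))
```

In SmartEmailProcessor, already_processed could replace the set lookup and save could run right after results.append(result).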
Step 4: Deployment as a Background Service
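For production use, first uncomment the processor.run_continuous() call at the end of the script; otherwise the script exits after one pass and systemd restarts it every 10 seconds, reloading the model each time. Also run the script once interactively so the OAuth flow can create token.pickle in the working directory, since a service has no browser to complete the consent screen. Then create a simple systemd service file (/etc/systemd/system/email-classifier.service):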
[Unit]
Description=Smart Email Classifier
After=network.target
[Service]
Type=simple
User=your-username
WorkingDirectory=/path/to/your/script
ExecStart=/usr/bin/python3 /path/to/your/script/email_processor.py
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Then enable and start:
sudo systemctl enable email-classifier
sudo systemctl start email-classifier
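When systemd stops a service it sends SIGTERM by default, while run_continuous only catches KeyboardInterrupt (SIGINT). A possible pattern for a clean shutdown is sketched below: a signal handler sets an event, and the loop waits on that event instead of sleeping. The names stop_requested, install_signal_handlers, and run_until_stopped are hypothetical and not part of the script above.

```python
import signal
import threading

# Set by the signal handler; checked by the polling loop
stop_requested = threading.Event()


def _handle_stop(signum, frame):
    stop_requested.set()


def install_signal_handlers():
    """Register handlers for SIGTERM and SIGINT (main thread only)."""
    signal.signal(signal.SIGTERM, _handle_stop)
    signal.signal(signal.SIGINT, _handle_stop)


def run_until_stopped(process_once, check_interval=300):
    """Call process_once() every check_interval seconds until a stop is requested.

    Returns the number of completed passes.
    """
    runs = 0
    while not stop_requested.is_set():
        try:
            process_once()
        except Exception as exc:
            print(f"Error processing emails: {exc}")
        runs += 1
        # wait() returns early as soon as the event is set
        stop_requested.wait(check_interval)
    return runs


# Possible wiring in the main script (processor is a SmartEmailProcessor):
# install_signal_handlers()
# run_until_stopped(processor.process_new_emails)
```

Unlike the original loop, this sketch retries after the normal interval on errors rather than after 60 seconds; adjust that if you want the shorter retry.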
Cost Analysis and Scaling
For Small Businesses (< 1000 emails/day):
- Compute costs: ~$5-15/month (small VPS)
- Gmail API: Free (up to 1 billion quota units/day)
- Total: Under $20/month
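Performance Expectations (rough estimates; measure on your own hardware and mail):
- Processing speed: ~2-3 emails per second, depending heavily on hardware. Zero-shot classification runs one forward pass per candidate label, so expect lower throughput on a CPU-only VPS
- Accuracy: 85-92% for clear categories
- Memory usage: ~500MB-1GB for smaller models; facebook/bart-large-mnli needs more, since its weights alone are roughly 1.6 GB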
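Numbers like these are easiest to trust once you have measured them on your own mail. A small harness along the following lines could report accuracy and throughput for any classify function. The evaluate function and the keyword_stub stand-in are hypothetical; the stub exists only so the sketch runs without downloading a model, and the samples are invented.

```python
import time


def evaluate(classify_fn, labeled_samples):
    """Measure accuracy and throughput on hand-labeled emails.

    classify_fn(subject, body) must return a dict with a 'category' key,
    like EmailClassifier.classify_email. labeled_samples is a list of
    (subject, body, expected_category) tuples.
    """
    if not labeled_samples:
        raise ValueError("labeled_samples must not be empty")
    correct = 0
    start = time.perf_counter()
    for subject, body, expected in labeled_samples:
        if classify_fn(subject, body)["category"] == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    total = len(labeled_samples)
    return {
        "accuracy": correct / total,
        "emails_per_second": total / elapsed if elapsed > 0 else float("inf"),
    }


def keyword_stub(subject, body):
    # Stand-in for EmailClassifier.classify_email so the sketch runs offline
    text = f"{subject} {body}".lower()
    if "urgent" in text:
        return {"category": "urgent customer issue"}
    return {"category": "newsletter or update"}


samples = [
    ("URGENT: site down", "Checkout is failing", "urgent customer issue"),
    ("Weekly digest", "Here is what happened", "newsletter or update"),
    ("Pricing question", "Do you offer volume discounts?", "sales inquiry"),
    ("Need help now", "This is urgent, please call", "urgent customer issue"),
]
report = evaluate(keyword_stub, samples)
print(f"accuracy={report['accuracy']:.2f}")
```

To test the real model, pass EmailClassifier().classify_email as classify_fn and use a few dozen messages you have labeled by hand.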
Scaling Tips:
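- Use smaller NLI models like valhalla/distilbart-mnli-12-1 for faster processing (distilbert-base-uncased-finetuned-sst-2-english is a sentiment model, not an NLI model, so it will not give meaningful results in the zero-shot pipeline)
- Implement email batching for high-volume scenarios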
- Add caching for repeated sender patterns
- Consider fine-tuning on your specific email patterns
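The batching and caching tips can be sketched with the standard library alone. Below, chunked splits a list of emails into fixed-size batches, and SenderCache reuses a category once the same sender has received it several times in a row with high confidence. Both names and both thresholds (min_hits, min_confidence) are assumptions for illustration, not tuned values.

```python
from email.utils import parseaddr


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class SenderCache:
    """Reuses a category for senders that were classified consistently."""

    def __init__(self, min_hits=3, min_confidence=0.9):  # assumed thresholds
        self.min_hits = min_hits
        self.min_confidence = min_confidence
        self._history = {}  # address -> (category, consecutive hits)

    @staticmethod
    def _address(sender):
        # "Jane Doe <jane@example.com>" -> "jane@example.com"
        return parseaddr(sender)[1].lower()

    def lookup(self, sender):
        """Return the cached category, or None if the model should decide."""
        category, hits = self._history.get(self._address(sender), (None, 0))
        return category if hits >= self.min_hits else None

    def record(self, sender, category, confidence):
        """Update the streak for this sender after a model classification."""
        address = self._address(sender)
        if confidence < self.min_confidence:
            # A low-confidence result breaks the streak entirely
            self._history.pop(address, None)
            return
        previous, hits = self._history.get(address, (None, 0))
        if category != previous:
            hits = 0  # a different category restarts the streak
        self._history[address] = (category, hits + 1)


# Usage: skip the model when the cache already knows the sender
cache = SenderCache(min_hits=2)
for confidence in (0.95, 0.97):
    cache.record("News <news@example.com>", "newsletter or update", confidence)
print(cache.lookup("news@example.com"))
```

For batching, the zero-shot pipeline should also accept a list of texts in place of a single string, so each batch from chunked could go to the model in one call; check the behavior against your installed transformers version.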
Real-World Improvements
Once you have the basics working, consider these enhancements:
- Custom categories: Train on your specific business emails
- Integration: Connect to Slack, Teams, or your CRM
- Smart routing: Automatically forward urgent emails
- Analytics: Track email patterns and response times
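As a starting point for the analytics idea, the result dicts from process_new_emails can be summarized per category. This is a minimal sketch: the summarize function is a hypothetical helper, and the sample results are invented.

```python
from collections import defaultdict


def summarize(results):
    """Return {category: {'count': n, 'avg_confidence': x}} for a result list."""
    totals = defaultdict(lambda: {"count": 0, "confidence_sum": 0.0})
    for result in results:
        bucket = totals[result["category"]]
        bucket["count"] += 1
        bucket["confidence_sum"] += result["confidence"]
    return {
        category: {
            "count": bucket["count"],
            "avg_confidence": bucket["confidence_sum"] / bucket["count"],
        }
        for category, bucket in totals.items()
    }


# Invented results in the shape produced by process_new_emails
sample_results = [
    {"category": "sales inquiry", "confidence": 0.5},
    {"category": "sales inquiry", "confidence": 0.75},
    {"category": "urgent customer issue", "confidence": 1.0},
]
for category, stats in summarize(sample_results).items():
    print(f"{category}: {stats['count']} emails, "
          f"avg confidence {stats['avg_confidence']:.2f}")
```

A consistently low average confidence for one category is a hint that its label text may need rewording.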
Conclusion
In just 15 minutes, we've built a practical AI-powered email classifier that can genuinely improve your email workflow. The beauty of using pre-trained models is that you get sophisticated text understanding without the complexity of training your own models.
This isn't just a tech demo – it's a real solution that small businesses are using today to stay on top of customer communications. The key is starting simple and iterating based on your actual email patterns.
Try it out with your own emails and see what patterns emerge. You might be surprised at how well AI can understand the nuances of business communication.
What email automation challenges are you facing? Share your experiences in the comments below!