Centralizing Email Infrastructure on AWS: Lessons from Building SESMailEngine
How we solved the distributed email chaos problem with a serverless, event-driven architecture that scales to zero
If you've ever built SaaS applications on AWS, you know the story. Email sending starts simple—a Lambda here, an SES call there. Then it becomes a tangled mess of scattered services, inconsistent bounce handling, and a sender reputation you can't quite trust.
I lived this firsthand while operating several workloads that sent email via SES. Each service had its own Lambda, each called SES directly, and each handled bounces independently. Initially, this worked. But over time, maintaining reputation and monitoring failures became a nightmare.
I needed something different: a centralized, resilient, and serverless solution that could scale to zero when idle, handle bounces automatically, manage suppression lists, and integrate seamlessly with multiple services.
This is the story of how we built SESMailEngine.
The Problem: Why Distributed Email Sending Fails
Most AWS architectures for email evolve organically:
- Service A sends password resets
- Service B sends invoices
- Service C sends notifications
Each function calls SES directly. The consequences compound:
❌ No centralized suppression list — bounced addresses get hit repeatedly
❌ No consistent bounce handling — each service reinvents the wheel
❌ No shared reputation protection — one bad actor hurts everyone
❌ Tracking scattered everywhere — debugging becomes archaeology
Even one misbehaving Lambda or forgotten check can silently hurt your SES sending reputation. Once your bounce rate reaches 5%, AWS places the account under review; at 10%, it may pause your sending entirely.
The Pattern That Works: One Sender, Many Producers
Instead of letting each service send emails independently, we implemented a different pattern:
Every service requests an email to be sent, but a single system handles the sending.
This enables:
✅ Centralized suppression — one source of truth for blocked addresses
✅ Automatic bounce handling — hard bounces suppress, soft bounces retry
✅ Cross-service protection — bad addresses blocked everywhere
✅ Full audit trail — every email tracked with retry history
✅ Scale to zero — costs pennies when idle (~$0.05/month)
Architecture Overview
Here's the complete system architecture:
```
                      PRODUCER SERVICES
  ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
  │  Service A  │      │  Service B  │      │  Service C  │
  │  (Lambda)   │      │    (ECS)    │      │    (EC2)    │
  └──────┬──────┘      └──────┬──────┘      └──────┬──────┘
         └────────────────────┼────────────────────┘
                              ▼
                     ┌─────────────────┐
                     │   EventBridge   │  ← Central event bus
                     │   Custom Bus    │
                     └────────┬────────┘
                              ▼
┌───────────────────────────────────────────────────────────┐
│                    EMAIL SENDER LAMBDA                    │
│  1. Parse event                                           │
│  2. Check suppression list (DynamoDB)                     │
│  3. Check bounce rate quota                               │
│  4. Load & render template (S3 + Jinja2)                  │
│  5. Send via SES                                          │
│  6. Track in DynamoDB                                     │
└─────────────────────────────┬─────────────────────────────┘
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│    DynamoDB     │  │       S3        │  │       SES       │
│ • EmailTracking │  │ • Templates     │  │ • SendEmail     │
│ • Suppression   │  │ • Versioning    │  │ • Config Set    │
│ • GSI indexes   │  │ • Encryption    │  │ • Event Tracing │
└─────────────────┘  └─────────────────┘  └────────┬────────┘
                                                   │ SES events
                                                   ▼
                                          ┌─────────────────┐
                                          │       SNS       │
                                          │ Feedback Topic  │
                                          └────────┬────────┘
                                                   ▼
┌───────────────────────────────────────────────────────────┐
│                 FEEDBACK PROCESSOR LAMBDA                 │
│ • Process bounces → Update tracking, add to suppression   │
│ • Process complaints → Add to suppression                 │
│ • Process deliveries → Update tracking status             │
│ • Process opens → Track engagement                        │
│ • Schedule retries → SQS for soft bounces                 │
└─────────────────────────────┬─────────────────────────────┘
                              ▼
                     ┌─────────────────┐
                     │    SQS Retry    │
                     │      Queue      │
                     │  15-min delay   │
                     │  Single retry   │
                     └────────┬────────┘
                              ▼
                     ┌─────────────────┐
                     │  Email Sender   │  ← Processes retries
                     │     Lambda      │
                     └─────────────────┘
```
Key Design Decisions
1. EventBridge as the Entry Point
We chose EventBridge over API Gateway or direct Lambda invocation for several reasons:
- Decoupling — Producers don't need to know about the email system's internals
- Built-in retries — EventBridge retries failed Lambda invocations automatically
- Dead letter queues — Failed events are preserved for investigation
- Batch support — Up to 10 events per API call
```python
import boto3
import json

events = boto3.client('events')

response = events.put_events(
    Entries=[{
        'Source': 'my.application',
        'DetailType': 'Email Request',
        'EventBusName': 'sesmailengine-EmailBus',
        'Detail': json.dumps({
            'to': 'recipient@example.com',
            'templateName': 'welcome',
            'templateData': {
                'userName': 'John',
                'companyName': 'My Company'
            }
        })
    }]
)
```
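Producers that send in bulk can batch requests, since PutEvents accepts at most 10 entries per call. One catch: a call can succeed as a whole while individual entries fail, so it is worth checking `FailedEntryCount`. A minimal sketch, assuming the same `Source` and `DetailType` as above; `send_email_requests` is a hypothetical helper, and `events_client` would be a `boto3.client('events')`:
```python
import json
from typing import Any, Iterable

MAX_ENTRIES_PER_CALL = 10  # PutEvents accepts at most 10 entries per request


def send_email_requests(events_client: Any, bus_name: str,
                        requests: Iterable[dict]) -> list:
    """Publish email requests in batches of 10; return the entries EventBridge rejected."""
    entries = [{
        "Source": "my.application",
        "DetailType": "Email Request",
        "EventBusName": bus_name,
        "Detail": json.dumps(request),
    } for request in requests]

    rejected = []
    for start in range(0, len(entries), MAX_ENTRIES_PER_CALL):
        batch = entries[start:start + MAX_ENTRIES_PER_CALL]
        response = events_client.put_events(Entries=batch)
        # Result entries line up with request entries; failed ones carry an ErrorCode.
        if response.get("FailedEntryCount", 0):
            for entry, result in zip(batch, response["Entries"]):
                if "ErrorCode" in result:
                    rejected.append(entry)
    return rejected
```
EventBridge does not resubmit rejected entries on the producer's behalf, so the returned list is what a caller would retry or log.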
2. Jinja2 Templates in S3
Templates live in S3 with versioning enabled. Each template has three files:
```
templates/
└── welcome/
    ├── template.html     # HTML body (Jinja2)
    ├── template.txt      # Plain text fallback
    └── metadata.json     # Subject, sender, variables
```
The metadata.json supports Jinja2 variables too:
```json
{
  "subject": "Welcome to {{ companyName }}!",
  "senderName": "{{ companyName }} Team",
  "requiredVariables": ["userName", "companyName"]
}
```
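Because `requiredVariables` is declared up front, the sender can reject a request before rendering anything. A stdlib-only sketch of that check, assuming metadata shaped like the example above; `missing_variables` and its placeholder regex are hypothetical and only recognize simple `{{ name }}` expressions, not full Jinja2 syntax:
```python
import json
import re

# Matches simple {{ name }} placeholders only; filters, attributes, etc. are ignored.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def missing_variables(metadata_json: str, template_data: dict) -> list:
    """Return variables the template needs but template_data lacks, in declared order."""
    metadata = json.loads(metadata_json)
    required = list(metadata.get("requiredVariables", []))
    # Also require anything referenced directly in the subject or sender name.
    for field in ("subject", "senderName"):
        for name in PLACEHOLDER.findall(metadata.get(field, "")):
            if name not in required:
                required.append(name)
    return [name for name in required if name not in template_data]
```
Rendering itself would then go through Jinja2 (for example `jinja2.Template(source).render(**template_data)`); creating the environment with `undefined=jinja2.StrictUndefined` turns any variable this check missed into an error instead of an empty string.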
3. Smart Bounce Handling
Not all bounces are equal. We handle them differently:
| Bounce Type | Action | Rationale |
|---|---|---|
| Hard bounce (NoEmail) | Immediate suppression | Address doesn't exist |
| Soft bounce (MailboxFull) | Retry once after 15 min | Temporary issue |
| 15+ soft bounces in 30 days | Permanent suppression | Problematic address |
This balanced approach provides fast feedback while protecting against truly bad addresses.
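The table translates into a small decision function in the feedback processor. A sketch, assuming SES events arrive through SNS (so the SES JSON sits in `Records[i].Sns.Message` of the Lambda event) and that prior soft-bounce counts are looked up elsewhere; the `soft_bounces_last_30_days` mapping and the handling of `Undetermined` bounces are assumptions:
```python
import json

SOFT_BOUNCE_SUPPRESSION_THRESHOLD = 15  # soft bounces within 30 days, per the table


def bounce_actions(sns_record: dict, soft_bounces_last_30_days: dict) -> list:
    """Map one SNS-delivered SES bounce event to (email, action) pairs."""
    message = json.loads(sns_record["Sns"]["Message"])
    # Configuration-set event publishing uses "eventType"; identity notifications use "notificationType".
    event_type = message.get("eventType") or message.get("notificationType")
    if event_type != "Bounce":
        return []

    bounce = message["bounce"]
    actions = []
    for recipient in bounce["bouncedRecipients"]:
        email = recipient["emailAddress"]
        if bounce["bounceType"] == "Permanent":  # e.g. subtype NoEmail
            actions.append((email, "suppress"))
        elif bounce["bounceType"] == "Transient":  # e.g. subtype MailboxFull
            # Include the current bounce when comparing against the threshold.
            count = soft_bounces_last_30_days.get(email, 0) + 1
            if count >= SOFT_BOUNCE_SUPPRESSION_THRESHOLD:
                actions.append((email, "suppress"))
            else:
                actions.append((email, "retry"))
        else:
            # Assumption: "Undetermined" bounces are recorded but neither retried nor suppressed.
            actions.append((email, "track_only"))
    return actions
```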
4. Bounce Rate Protection
SES places an account under review once its bounce rate reaches 5%, and may pause sending at 10%. We proactively check the rate before each send:
```
┌─────────────────┐     ┌──────────────────┐
│  Email Request  │────▶│   Bounce Rate    │
│                 │     │  Check (cached)  │
└─────────────────┘     └────────┬─────────┘
                                 │
                    ┌────────────┴────────────┐
                    │                         │
                 Rate OK                Rate Exceeded
                    │                         │
                    ▼                         ▼
            ┌───────────────┐         ┌───────────────┐
            │ Proceed with  │         │  Block Email  │
            │  Email Send   │         │   Track as    │
            └───────────────┘         │   "failed"    │
                                      └───────────────┘
```
Results are cached for 5 minutes to optimize DynamoDB costs—saving 97%+ on queries for high-volume senders.
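A Lambda execution environment keeps module-level state between warm invocations, which is what makes a short-lived cache practical. A sketch of the 5-minute cache, assuming a hypothetical `fetch_counts` callable that runs the DynamoDB queries and an assumed blocking threshold `max_rate` (the article does not state the value the engine uses):
```python
import time
from typing import Callable, Optional, Tuple

CACHE_TTL_SECONDS = 300  # 5 minutes


class BounceRateGuard:
    """Caches the bounce rate so most sends skip the DynamoDB queries.

    fetch_counts: hypothetical callable returning (sent, bounced) for the lookback window.
    max_rate: assumed blocking threshold, kept below the 5% SES review level.
    """

    def __init__(self, fetch_counts: Callable[[], Tuple[int, int]],
                 max_rate: float = 0.04,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_counts = fetch_counts
        self._max_rate = max_rate
        self._clock = clock
        self._cached_rate: Optional[float] = None
        self._cached_at = 0.0

    def current_rate(self) -> float:
        now = self._clock()
        # Refresh only when the cache is empty or older than the TTL.
        if self._cached_rate is None or now - self._cached_at >= CACHE_TTL_SECONDS:
            sent, bounced = self._fetch_counts()
            self._cached_rate = bounced / sent if sent else 0.0
            self._cached_at = now
        return self._cached_rate

    def allow_send(self) -> bool:
        return self.current_rate() < self._max_rate
```
Created at module level, one guard serves every warm invocation in that execution environment. Each concurrent environment holds its own copy, so query volume grows with concurrency rather than with email count.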
The Data Model
Email Tracking Table
Every email gets a tracking record covering its full lifecycle:
```json
{
  "emailId": "email-123456789",
  "toEmail": "recipient@example.com",
  "templateName": "welcome",
  "status": "delivered",
  "sesMessageId": "0000014a-f896-...",
  "timestamp": "2024-12-14T10:30:00Z",
  "deliveredAt": "2024-12-14T10:31:00Z",
  "openedAt": "2024-12-14T11:15:00Z",
  "openCount": 3,
  "retryAttempt": 0,
  "originalEmailId": "email-123456789",
  "ttl": 1710432000
}
```
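DynamoDB's TTL feature reads the `ttl` attribute as a Number holding Unix epoch seconds, and items whose value lies in the past become eligible for deletion. A sketch of building a record at send time, assuming a hypothetical 90-day retention and ID scheme (neither is specified in the article):
```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_DAYS = 90  # hypothetical retention period


def new_tracking_record(to_email: str, template_name: str, ses_message_id: str,
                        now: Optional[datetime] = None,
                        original_email_id: Optional[str] = None,
                        retry_attempt: int = 0) -> dict:
    """Build a tracking item in the shape shown above, with a TTL in the future."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    email_id = f"email-{uuid.uuid4().hex}"  # hypothetical ID scheme
    return {
        "emailId": email_id,
        "toEmail": to_email,
        "templateName": template_name,
        "status": "sent",
        "sesMessageId": ses_message_id,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "retryAttempt": retry_attempt,
        # A first attempt points at itself; a retry points at the email it retries.
        "originalEmailId": original_email_id or email_id,
        # DynamoDB TTL expects epoch seconds as a Number.
        "ttl": int((now + timedelta(days=RETENTION_DAYS)).timestamp()),
    }
```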
Status Lifecycle
```
┌──────┐     ┌───────────┐     ┌──────────┐
│ sent │────▶│ delivered │────▶│  opened  │
└──┬───┘     └───────────┘     └──────────┘
   │
   ├───────▶ soft_bounced ───▶ delivered (retry succeeded)
   │              │
   │              └───────────▶ failed (retry exhausted)
   │
   └───────▶ bounced (permanent - suppressed)
```
Suppression Table
Simple and effective: if an address exists here, the engine doesn't send to it:
```json
{
  "email": "bounced@example.com",
  "suppressionType": "bounce",
  "reason": "hard-bounce",
  "addedToSuppressionDate": "2024-12-14T15:45:00Z"
}
```
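The check before each send is a single key lookup. A sketch, assuming the table's partition key is `email` (as in the item above) and that addresses are trimmed and lowercased before storage and lookup; the normalization rule is an assumption, and `is_suppressed`/`suppress` are hypothetical helpers that take a boto3 DynamoDB `Table` resource:
```python
from datetime import datetime, timezone
from typing import Any


def normalize(email: str) -> str:
    # Assumption: addresses are compared case-insensitively after trimming whitespace.
    return email.strip().lower()


def is_suppressed(table: Any, email: str) -> bool:
    """True if the address has a suppression entry (get_item omits "Item" when absent)."""
    response = table.get_item(Key={"email": normalize(email)})
    return "Item" in response


def suppress(table: Any, email: str, suppression_type: str, reason: str) -> None:
    """Write (or overwrite) a suppression entry for the address."""
    table.put_item(Item={
        "email": normalize(email),
        "suppressionType": suppression_type,
        "reason": reason,
        "addedToSuppressionDate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
```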
Integration Examples
Python
```python
import json
from typing import Optional

import boto3


class EmailClient:
    def __init__(self, event_bus_name: str):
        self.events = boto3.client('events')
        self.event_bus_name = event_bus_name

    def send_email(
        self,
        to: str,
        template_name: str,
        template_data: dict,
        metadata: Optional[dict] = None,
    ) -> dict:
        detail = {
            'to': to,
            'templateName': template_name,
            'templateData': template_data,
        }
        # Optional caller metadata (e.g. a campaign ID) travels with the request
        if metadata:
            detail['metadata'] = metadata
        return self.events.put_events(
            Entries=[{
                'Source': 'my.application',
                'DetailType': 'Email Request',
                'EventBusName': self.event_bus_name,
                'Detail': json.dumps(detail),
            }]
        )


# Usage: the welcome template requires both userName and companyName
client = EmailClient('sesmailengine-EmailBus')
client.send_email(
    to='user@example.com',
    template_name='welcome',
    template_data={'userName': 'John', 'companyName': 'My Company'},
    metadata={'campaignId': 'onboarding-2024'}
)
```
Node.js / TypeScript
```typescript
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

const client = new EventBridgeClient({});

await client.send(new PutEventsCommand({
  Entries: [{
    Source: 'my.application',
    DetailType: 'Email Request',
    EventBusName: 'sesmailengine-EmailBus',
    Detail: JSON.stringify({
      to: 'recipient@example.com',
      templateName: 'welcome',
      templateData: { userName: 'John', companyName: 'My Company' }
    })
  }]
}));
```
Performance Characteristics
We've load-tested this architecture extensively:
| Scenario | Throughput | Success Rate |
|---|---|---|
| Light load (100 emails) | 50-100/sec | 100% |
| Standard load (1,000 emails) | 100-200/sec | 100% |
| High load (10,000 emails) | 150-300/sec | 100% |
| Extreme load (50,000 emails) | 150-300/sec | >98% |
Latency Breakdown
| Stage | P50 | P95 |
|---|---|---|
| EventBridge Publish | 20-50ms | 80-150ms |
| Email Sender Lambda | 200-500ms | 800-1500ms |
| Delivery Notification | 5-15s | 20-40s |
Cost Comparison
The serverless architecture dramatically reduces costs:
| Monthly Volume | SESMailEngine | SendGrid | Mailgun |
|---|---|---|---|
| 10,000 emails | ~$0.01 | $20+ | $35+ |
| 62,000 emails | ~$0.06 | $20+ | $35+ |
| 100,000 emails | ~$3.90 | $20+ | $35+ |
For typical usage, most of the AWS services involved stay within the AWS Free Tier.
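The SESMailEngine column looks consistent with the legacy SES free tier of 62,000 emails per month: at the SES list price of $0.10 per 1,000 emails, the 38,000 emails above that allowance at 100,000/month come to $3.80, close to the ~$3.90 shown. A sketch of that arithmetic under those assumptions; it covers the SES sending charge only, and SES free-tier terms have changed over time, so check current pricing for your account:
```python
SES_PRICE_PER_1000 = 0.10  # USD per 1,000 outbound emails (list price; excludes attachment data)


def ses_send_cost(monthly_emails: int, free_emails: int = 0) -> float:
    """SES sending charge only; Lambda, DynamoDB, S3, SNS and SQS costs are not included."""
    billable = max(0, monthly_emails - free_emails)
    return round(billable / 1000 * SES_PRICE_PER_1000, 2)
```
With no free allowance, `ses_send_cost(100_000)` comes to $10.00, so the free-tier assumption matters as much as the serverless design for the numbers in the table.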
Monitoring & Observability
CloudWatch Alarms
The system creates several alarms automatically:
| Alarm | Trigger | Why It Matters |
|---|---|---|
| EventBridge DLQ Depth | Messages > 0 | Failed email requests |
| SES Bounce Rate | > 3% | Account suspension risk |
| SES Complaint Rate | > 0.05% | Spam reputation damage |
| Lambda Errors | > 50 sustained | System health issue |
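SES publishes account reputation metrics to CloudWatch in the `AWS/SES` namespace, and `Reputation.BounceRate` is reported as a ratio rather than a percentage, so the 3% threshold above becomes `0.03`. A sketch of the alarm parameters; the period, statistic, and missing-data treatment are assumptions, and `bounce_rate_alarm` is a hypothetical helper:
```python
def bounce_rate_alarm(alarm_name: str, threshold_percent: float,
                      sns_topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm on the SES bounce rate."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/SES",
        "MetricName": "Reputation.BounceRate",
        "Statistic": "Maximum",
        "Period": 3600,  # assumed evaluation window
        "EvaluationPeriods": 1,
        # The metric is a ratio, so convert the percentage (3 -> 0.03).
        "Threshold": threshold_percent / 100,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }
```
Applying it is one call: `boto3.client('cloudwatch').put_metric_alarm(**bounce_rate_alarm('ses-bounce-rate', 3, topic_arn))`. The complaint alarm follows the same pattern with `Reputation.ComplaintRate`.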
Dead Letter Queues
Every failure path has a DLQ:
```
EventBridge → Email Sender fails       → EventBridge DLQ
SQS Retry   → Retry fails              → Retry DLQ
SNS         → Feedback Processor fails → Feedback DLQ
```
No email is silently lost. Ever.
Lessons Learned
1. Status Priority Matters
SES events can arrive out of order: an "open" event might arrive before the "delivery" event. We assign each status a numeric priority and only let an incoming event overwrite the stored status when its priority is higher:
bounced(6) > opened(4) > delivered(3) > soft_bounced(2) > sent(1)
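The priority comparison is easy to express, but two events for the same email can be processed concurrently, so a read-then-write check can race. A sketch that pushes the comparison into a DynamoDB conditional update, using the numeric priorities above (higher number wins) and assuming a hypothetical `statusPriority` attribute stored next to `status`; `status` is a DynamoDB reserved word, hence the `#s` placeholder:
```python
STATUS_PRIORITY = {
    "sent": 1,
    "soft_bounced": 2,
    "delivered": 3,
    "opened": 4,
    "bounced": 6,
}


def should_apply(current_status: str, incoming_status: str) -> bool:
    """True when an incoming event may overwrite the stored status."""
    return STATUS_PRIORITY[incoming_status] > STATUS_PRIORITY.get(current_status, 0)


def conditional_status_update(email_id: str, incoming_status: str) -> dict:
    """Keyword arguments for Table.update_item that make the priority check atomic."""
    priority = STATUS_PRIORITY[incoming_status]
    return {
        "Key": {"emailId": email_id},
        "UpdateExpression": "SET #s = :s, statusPriority = :p",
        "ConditionExpression": "attribute_not_exists(statusPriority) OR statusPriority < :p",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {":s": incoming_status, ":p": priority},
    }
```
When the condition fails, boto3 raises a `ClientError` with code `ConditionalCheckFailedException`, which here simply means a higher-priority status is already stored and the event can be dropped.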
2. Cache Aggressively, But Wisely
Bounce rate checks hit DynamoDB twice per email. Caching for 5 minutes saves 97%+ on queries while still catching rate spikes within acceptable windows.
3. Single Retry is Enough
We initially implemented 3 retries over 45 minutes. That proved too aggressive: a single failed email chain could permanently suppress an address. A single retry after 15 minutes provides faster feedback while letting customers decide on further action.
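SQS supports per-message delays of up to 900 seconds, which is exactly the 15-minute window. A sketch of scheduling the single retry, assuming the retry message carries hypothetical `originalEmailId` and `retryAttempt` fields mirroring the tracking record; a delay configured on the queue itself would work equally well:
```python
import json
from typing import Any

RETRY_DELAY_SECONDS = 900  # 15 minutes, also the maximum DelaySeconds SQS allows
MAX_RETRIES = 1


def schedule_retry(sqs_client: Any, queue_url: str, request: dict,
                   original_email_id: str, retry_attempt: int) -> bool:
    """Queue one delayed retry for a soft bounce; return False once retries are exhausted."""
    if retry_attempt >= MAX_RETRIES:
        return False  # caller marks the email "failed"
    body = dict(request, originalEmailId=original_email_id,
                retryAttempt=retry_attempt + 1)
    sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(body),
        DelaySeconds=RETRY_DELAY_SECONDS,
    )
    return True
```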
4. Track Everything
Every consumed event creates a tracking record. Retryable errors don't create records because the event will be processed again. This principle—no silent email loss—is fundamental.
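The rule boils down to this: either the event leaves a tracking record, or the handler raises so the event is delivered again. A sketch, assuming a hypothetical `RetryableError` that the send function raises for transient failures such as throttling, plus injected `send` and `track` callables:
```python
from typing import Callable


class RetryableError(Exception):
    """Hypothetical marker for transient failures (e.g. throttling)."""


def handle_email_request(detail: dict, send: Callable[[dict], str],
                         track: Callable[..., None]) -> None:
    try:
        message_id = send(detail)
    except RetryableError:
        # No record: re-raising lets the invocation be retried, and the retry records the outcome.
        raise
    except Exception as exc:
        # Non-retryable: the event is consumed here, so it must leave a trace.
        track(detail, status="failed", error=str(exc))
        return
    track(detail, status="sent", ses_message_id=message_id)
```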
Why Centralization Matters
By centralizing email sending:
✅ Reduce operational complexity — One system to monitor, not dozens
✅ Protect SES reputation account-wide — Bad addresses blocked everywhere
✅ Ensure consistent tracking — Full audit trail for compliance
✅ Add new producers safely — No changes to core logic needed
Key Takeaways
- Serverless architecture scales to zero — Near-zero cost when idle
- Centralized sender protects reputation — One suppression list for all services
- Event-driven design enables decoupling — Producers don't know about email internals
- Retry queues ensure zero data loss — Every email tracked or preserved in DLQ
- Fully AWS-native — EventBridge, Lambda, DynamoDB, S3, SES, SNS, SQS, CloudWatch
Get Started
SESMailEngine is available as a complete solution you can deploy to your own AWS account.
The documentation includes:
- Complete setup guide with Python installer
- Integration examples for Python, Node.js, Java
- Template customization guide
- Troubleshooting common issues
- Security architecture details
Have questions about building email infrastructure on AWS? Drop a comment below—I'd love to hear about your experiences and challenges.