Uros M.

Centralizing Email Infrastructure on AWS with SESMailEngine

Centralizing Email Infrastructure on AWS: Lessons from Building SESMailEngine

How we solved the distributed email chaos problem with a serverless, event-driven architecture that scales to zero


If you've ever built SaaS applications on AWS, you know the story. Email sending starts simple—a Lambda here, an SES call there. Then it becomes a tangled mess of scattered services, inconsistent bounce handling, and a sender reputation you can't quite trust.

I lived this nightmare while operating several workloads that sent email via SES. Each service had its own Lambda, each called SES directly, and each handled bounces independently. Initially, this worked. But over time, protecting sender reputation and monitoring failures became unmanageable.

I needed something different: a centralized, resilient, and serverless solution that could scale to zero when idle, handle bounces automatically, manage suppression lists, and integrate seamlessly with multiple services.

This is the story of how we built SESMailEngine.


The Problem: Why Distributed Email Sending Fails

Most AWS architectures for email evolve organically:

  • Service A sends password resets
  • Service B sends invoices
  • Service C sends notifications

Each function calls SES directly. The consequences compound:

  • No centralized suppression list: bounced addresses get hit repeatedly
  • No consistent bounce handling: each service reinvents the wheel
  • No shared reputation protection: one bad actor hurts everyone
  • Tracking scattered everywhere: debugging becomes archaeology

Even one misbehaving Lambda or forgotten check can silently hurt your SES sending reputation. Once your bounce rate reaches about 5%, AWS places the account under review; at around 10% it may pause sending entirely.


The Pattern That Works: One Sender, Many Producers

Instead of letting each service send emails independently, we implemented a different pattern:

Every service requests an email to be sent, but a single system handles the sending.

This enables:

  • Centralized suppression: one source of truth for blocked addresses
  • Automatic bounce handling: hard bounces suppress, soft bounces retry
  • Cross-service protection: bad addresses blocked everywhere
  • Full audit trail: every email tracked with retry history
  • Scale to zero: costs almost nothing when idle (~$0.05/month)


Architecture Overview

Here's the complete system architecture:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           PRODUCER SERVICES                                  │
│                                                                             │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐                   │
│   │  Service A   │   │  Service B   │   │  Service C   │                   │
│   │  (Lambda)    │   │  (ECS)       │   │  (EC2)       │                   │
│   └──────┬───────┘   └──────┬───────┘   └──────┬───────┘                   │
│          │                  │                  │                            │
│          └──────────────────┼──────────────────┘                            │
│                             │                                               │
│                             ▼                                               │
│                    ┌─────────────────┐                                      │
│                    │   EventBridge   │  ← Central event bus                 │
│                    │   Custom Bus    │                                      │
│                    └────────┬────────┘                                      │
│                             │                                               │
└─────────────────────────────┼───────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        EMAIL SENDER LAMBDA                                   │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐  │
│   │  1. Parse event                                                      │  │
│   │  2. Check suppression list (DynamoDB)                               │  │
│   │  3. Check bounce rate quota                                         │  │
│   │  4. Load & render template (S3 + Jinja2)                           │  │
│   │  5. Send via SES                                                    │  │
│   │  6. Track in DynamoDB                                               │  │
│   └─────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────┬───────────────────────────────────────────────┘
                              │
           ┌──────────────────┼──────────────────┐
           │                  │                  │
           ▼                  ▼                  ▼
┌──────────────────┐  ┌──────────────┐  ┌──────────────────┐
│    DynamoDB      │  │      S3      │  │       SES        │
│                  │  │              │  │                  │
│  • EmailTracking │  │  • Templates │  │  • SendEmail     │
│  • Suppression   │  │  • Versioning│  │  • Config Set    │
│  • GSI indexes   │  │  • Encryption│  │  • Event Tracing │
└──────────────────┘  └──────────────┘  └────────┬─────────┘
                                                  │
                                                  │ SES Events
                                                  ▼
                                         ┌──────────────────┐
                                         │       SNS        │
                                         │  Feedback Topic  │
                                         └────────┬─────────┘
                                                  │
                                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                     FEEDBACK PROCESSOR LAMBDA                                │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐  │
│   │  • Process bounces → Update tracking, add to suppression            │  │
│   │  • Process complaints → Add to suppression                          │  │
│   │  • Process deliveries → Update tracking status                      │  │
│   │  • Process opens → Track engagement                                 │  │
│   │  • Schedule retries → SQS for soft bounces                         │  │
│   └─────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
└─────────────────────────────┬───────────────────────────────────────────────┘
                              │
                              ▼
                     ┌──────────────────┐
                     │   SQS Retry      │
                     │   Queue          │
                     │                  │
                     │  15-min delay    │
                     │  Single retry    │
                     └────────┬─────────┘
                              │
                              ▼
                     ┌──────────────────┐
                     │  Email Sender    │  ← Processes retries
                     │  Lambda          │
                     └──────────────────┘
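
The six steps in the Email Sender box can be sketched as one function. This is an illustration under assumptions, not the SESMailEngine source: `process_request` and the injected callables (`is_suppressed`, `bounce_rate_ok`, `render`, `send`, `track`) are hypothetical stand-ins for the DynamoDB, S3/Jinja2, and SES calls, and the `suppressed` status label is assumed as well.

```python
import json
from typing import Callable


def process_request(
    event: dict,
    is_suppressed: Callable[[str], bool],
    bounce_rate_ok: Callable[[], bool],
    render: Callable[[str, dict], dict],
    send: Callable[[str, dict], str],
    track: Callable[[dict], None],
) -> dict:
    """Run one email request through the six steps from the diagram."""
    # 1. Parse event. EventBridge hands Lambda "detail" as a dict;
    #    a JSON string is accepted too for convenience.
    detail = event["detail"]
    if isinstance(detail, str):
        detail = json.loads(detail)
    to = detail["to"]
    record = {"toEmail": to, "templateName": detail["templateName"]}

    if is_suppressed(to):
        # 2. Suppression list hit: never reaches SES
        record["status"] = "suppressed"  # assumed label
    elif not bounce_rate_ok():
        # 3. Bounce rate quota exceeded: block and track as "failed"
        record["status"] = "failed"
    else:
        # 4. Load and render the template
        message = render(detail["templateName"], detail.get("templateData", {}))
        # 5. Send via SES (the callable returns the SES message id)
        record["sesMessageId"] = send(to, message)
        record["status"] = "sent"

    # 6. Track the outcome, whatever it was
    track(record)
    return record
```

Passing the dependencies in as callables keeps the ordering of the checks visible and makes the flow easy to exercise without AWS access.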

Key Design Decisions

1. EventBridge as the Entry Point

We chose EventBridge over API Gateway or direct Lambda invocation for several reasons:

  • Decoupling — Producers don't need to know about the email system's internals
  • Built-in retries — EventBridge retries failed Lambda invocations automatically
  • Dead letter queues — Failed events are preserved for investigation
  • Batch support — Up to 10 events per API call
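
A producer publishes a request like this:

import boto3
import json

events = boto3.client('events')

# Publish one email request to the custom event bus
response = events.put_events(
    Entries=[{
        'Source': 'my.application',
        'DetailType': 'Email Request',
        'EventBusName': 'sesmailengine-EmailBus',
        'Detail': json.dumps({
            'to': 'recipient@example.com',
            'templateName': 'welcome',
            'templateData': {
                'userName': 'John',
                'companyName': 'My Company'
            }
        })
    }]
)

# put_events reports rejected entries in the response instead of raising
if response['FailedEntryCount'] > 0:
    raise RuntimeError(f"EventBridge rejected entries: {response['Entries']}")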
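
Because `PutEvents` accepts at most 10 entries per call, bulk producers have to chunk their requests. A minimal sketch, assuming a hypothetical helper `publish_email_requests` and reusing the bus name and event fields from the example above; `events_client` is any object exposing boto3's `put_events`:

```python
import json
from typing import Iterable

MAX_ENTRIES_PER_CALL = 10  # PutEvents limit per request


def publish_email_requests(
    events_client,
    requests: Iterable[dict],
    bus_name: str = "sesmailengine-EmailBus",
) -> int:
    """Publish email requests in batches of 10; return how many entries failed."""
    entries = [
        {
            "Source": "my.application",
            "DetailType": "Email Request",
            "EventBusName": bus_name,
            "Detail": json.dumps(request),
        }
        for request in requests
    ]
    failed = 0
    for start in range(0, len(entries), MAX_ENTRIES_PER_CALL):
        batch = entries[start:start + MAX_ENTRIES_PER_CALL]
        response = events_client.put_events(Entries=batch)
        # Partial failures are counted in the response, not raised
        failed += response.get("FailedEntryCount", 0)
    return failed
```

A caller would pass `boto3.client("events")` as `events_client` and treat a non-zero return value as a signal to inspect or resubmit the rejected entries.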

2. Jinja2 Templates in S3

Templates live in S3 with versioning enabled. Each template has three files:

templates/
└── welcome/
    ├── template.html      # HTML body (Jinja2)
    ├── template.txt       # Plain text fallback
    └── metadata.json      # Subject, sender, variables

The metadata.json supports Jinja2 variables too:

{
  "subject": "Welcome to {{ companyName }}!",
  "senderName": "{{ companyName }} Team",
  "requiredVariables": ["userName", "companyName"]
}
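
One use for `requiredVariables` is to reject a request before rendering when the producer forgot a field. A minimal sketch, assuming a hypothetical `missing_variables` helper; how SESMailEngine actually enforces the list is not shown in this article, and the Jinja2 rendering step itself is left out:

```python
import json


def missing_variables(metadata_json: str, template_data: dict) -> list:
    """Return the required variable names that template_data does not supply."""
    metadata = json.loads(metadata_json)
    required = metadata.get("requiredVariables", [])
    return [name for name in required if name not in template_data]


metadata_json = """
{
  "subject": "Welcome to {{ companyName }}!",
  "senderName": "{{ companyName }} Team",
  "requiredVariables": ["userName", "companyName"]
}
"""

# companyName is missing, so the request should be rejected before rendering
print(missing_variables(metadata_json, {"userName": "John"}))  # ['companyName']
```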

3. Smart Bounce Handling

Not all bounces are equal. We handle them differently:

| Bounce Type | Action | Rationale |
| --- | --- | --- |
| Hard bounce (NoEmail) | Immediate suppression | Address doesn't exist |
| Soft bounce (MailboxFull) | Retry once after 15 min | Temporary issue |
| 15+ soft bounces in 30 days | Permanent suppression | Problematic address |

This balanced approach provides fast feedback while protecting against truly bad addresses.
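
The rules above reduce to a small decision function. A sketch under assumptions: `decide_bounce_action` and its action labels are hypothetical, `bounce_type` follows the `Permanent` / `Transient` values SES reports in bounce events, and the caller is assumed to supply the address's soft-bounce count for the last 30 days, including the current bounce.

```python
SOFT_BOUNCE_LIMIT = 15  # soft bounces in 30 days before permanent suppression
MAX_RETRIES = 1         # single retry, scheduled 15 minutes later


def decide_bounce_action(bounce_type: str, soft_bounces_30d: int, retry_attempt: int) -> str:
    """Map a bounce to 'suppress', 'retry', or 'fail'."""
    if bounce_type == "Permanent":
        return "suppress"  # hard bounce: the address doesn't exist
    # Anything else ("Transient", "Undetermined") is treated as soft here,
    # which is an assumption of this sketch.
    if soft_bounces_30d >= SOFT_BOUNCE_LIMIT:
        return "suppress"  # chronically failing address
    if retry_attempt < MAX_RETRIES:
        return "retry"     # temporary issue: try once more
    return "fail"          # retry already used


print(decide_bounce_action("Transient", soft_bounces_30d=1, retry_attempt=0))  # retry
```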

4. Bounce Rate Protection

SES puts accounts under review at a bounce rate of around 5% and may pause sending if the rate keeps climbing. We proactively check before each send:

┌─────────────────┐     ┌──────────────────┐
│  Email Request  │────▶│  Bounce Rate     │
│                 │     │  Check (cached)  │
└─────────────────┘     └────────┬─────────┘
                                 │
                    ┌────────────┴────────────┐
                    │                         │
              Rate OK                   Rate Exceeded
                    │                         │
                    ▼                         ▼
           ┌───────────────┐         ┌───────────────┐
           │ Proceed with  │         │ Block Email   │
           │ Email Send    │         │ Track as      │
           └───────────────┘         │ "failed"      │
                                     └───────────────┘

Results are cached for 5 minutes to optimize DynamoDB costs—saving 97%+ on queries for high-volume senders.
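
The caching idea can be sketched as a small time-based cache around the expensive lookup. This is illustrative only: `CachedBounceRate`, the injected `fetch_rate` callable (standing in for the DynamoDB queries), and the 5% `max_rate` default are assumptions; only the 5-minute TTL comes from the text above.

```python
import time
from typing import Callable, Optional


class CachedBounceRate:
    """Cache a bounce-rate lookup for ttl_seconds to avoid repeated queries."""

    def __init__(
        self,
        fetch_rate: Callable[[], float],
        ttl_seconds: float = 300.0,   # 5 minutes
        max_rate: float = 0.05,       # assumed blocking threshold
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_rate = fetch_rate
        self._ttl = ttl_seconds
        self._max_rate = max_rate
        self._clock = clock
        self._rate: Optional[float] = None
        self._fetched_at = 0.0

    def rate_ok(self) -> bool:
        """Return True while the (possibly cached) bounce rate is below max_rate."""
        now = self._clock()
        if self._rate is None or now - self._fetched_at >= self._ttl:
            # Cache miss or expired entry: pay for one real lookup
            self._rate = self._fetch_rate()
            self._fetched_at = now
        return self._rate < self._max_rate
```

In Lambda, an instance created at module level survives across warm invocations of the same execution environment, so each environment keeps its own cache. The trade-off is that a sudden spike can go unnoticed for up to one TTL.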


The Data Model

Email Tracking Table

Every email gets a tracking record with full lifecycle:

{
  "emailId": "email-123456789",
  "toEmail": "recipient@example.com",
  "templateName": "welcome",
  "status": "delivered",
  "sesMessageId": "0000014a-f896-...",
  "timestamp": "2024-12-14T10:30:00Z",
  "deliveredAt": "2024-12-14T10:31:00Z",
  "openedAt": "2024-12-14T11:15:00Z",
  "openCount": 3,
  "retryAttempt": 0,
  "originalEmailId": "email-123456789",
  "ttl": 1710432000
}

Status Lifecycle

┌──────┐     ┌───────────┐     ┌──────────┐
│ sent │────▶│ delivered │────▶│  opened  │
└──────┘     └───────────┘     └──────────┘
    │
    ├───────▶ soft_bounced ───▶ delivered (retry succeeded)
    │              │
    │              └───────────▶ failed (retry exhausted)
    │
    └───────▶ bounced (permanent - suppressed)

Suppression Table

Simple and effective—if an email exists here, don't send:

{
  "email": "bounced@example.com",
  "suppressionType": "bounce",
  "reason": "hard-bounce",
  "addedToSuppressionDate": "2024-12-14T15:45:00Z"
}
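
A record like this can be derived from the SES event the Feedback Processor receives. A sketch, assuming a hypothetical `suppression_records` helper: the `eventType`, `bounce.bouncedRecipients`, and `complaint.complainedRecipients` fields follow SES's published event format, while the `complaint` type and reason labels are guesses modeled on the record above. Lowercasing the address is also an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def suppression_records(ses_event_json: str, now: Optional[datetime] = None) -> list:
    """Build suppression records for hard bounces and complaints; else return []."""
    event = json.loads(ses_event_json)
    # Event publishing uses "eventType"; identity notifications use "notificationType"
    kind = event.get("eventType") or event.get("notificationType")
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    added = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")

    if kind == "Bounce" and event["bounce"]["bounceType"] == "Permanent":
        recipients = event["bounce"]["bouncedRecipients"]
        suppression_type, reason = "bounce", "hard-bounce"
    elif kind == "Complaint":
        recipients = event["complaint"]["complainedRecipients"]
        suppression_type, reason = "complaint", "complaint"  # assumed labels
    else:
        return []  # deliveries, opens, and soft bounces don't suppress here

    return [
        {
            "email": recipient["emailAddress"].lower(),
            "suppressionType": suppression_type,
            "reason": reason,
            "addedToSuppressionDate": added,
        }
        for recipient in recipients
    ]
```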

Integration Examples

Python

import json
from typing import Optional

import boto3


class EmailClient:
    """Thin wrapper that publishes email requests to the SESMailEngine bus."""

    def __init__(self, event_bus_name: str):
        self.events = boto3.client('events')
        self.event_bus_name = event_bus_name

    def send_email(
        self,
        to: str,
        template_name: str,
        template_data: dict,
        metadata: Optional[dict] = None,
    ) -> dict:
        # Build the event detail; metadata is only included when provided
        detail = {
            'to': to,
            'templateName': template_name,
            'templateData': template_data,
        }
        if metadata:
            detail['metadata'] = metadata

        # Returns the raw PutEvents response (check FailedEntryCount)
        return self.events.put_events(
            Entries=[{
                'Source': 'my.application',
                'DetailType': 'Email Request',
                'EventBusName': self.event_bus_name,
                'Detail': json.dumps(detail),
            }]
        )

# Usage
client = EmailClient('sesmailengine-EmailBus')
client.send_email(
    to='user@example.com',
    template_name='welcome',
    template_data={'userName': 'John'},
    metadata={'campaignId': 'onboarding-2024'}
)

Node.js / TypeScript

import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';

const client = new EventBridgeClient({});

await client.send(new PutEventsCommand({
  Entries: [{
    Source: 'my.application',
    DetailType: 'Email Request',
    EventBusName: 'sesmailengine-EmailBus',
    Detail: JSON.stringify({
      to: 'recipient@example.com',
      templateName: 'welcome',
      templateData: { userName: 'John', companyName: 'My Company' }
    })
  }]
}));

Performance Characteristics

We've load-tested this architecture extensively:

| Scenario | Throughput | Success Rate |
| --- | --- | --- |
| Light load (100 emails) | 50-100/sec | 100% |
| Standard load (1,000 emails) | 100-200/sec | 100% |
| High load (10,000 emails) | 150-300/sec | 100% |
| Extreme load (50,000 emails) | 150-300/sec | >98% |

Latency Breakdown

| Stage | P50 | P95 |
| --- | --- | --- |
| EventBridge Publish | 20-50ms | 80-150ms |
| Email Sender Lambda | 200-500ms | 800-1500ms |
| Delivery Notification | 5-15s | 20-40s |

Cost Comparison

The serverless architecture dramatically reduces costs:

| Monthly Volume | SESMailEngine | SendGrid | Mailgun |
| --- | --- | --- | --- |
| 10,000 emails | ~$0.01 | $20+ | $35+ |
| 62,000 emails | ~$0.06 | $20+ | $35+ |
| 100,000 emails | ~$3.90 | $20+ | $35+ |

Most services fall within AWS Free Tier for typical usage.


Monitoring & Observability

CloudWatch Alarms

The system creates several alarms automatically:

| Alarm | Trigger | Why It Matters |
| --- | --- | --- |
| EventBridge DLQ Depth | Messages > 0 | Failed email requests |
| SES Bounce Rate | > 3% | Account suspension risk |
| SES Complaint Rate | > 0.05% | Spam reputation damage |
| Lambda Errors | > 50 sustained | System health issue |
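
As an illustration of the bounce-rate row, the parameters below describe a CloudWatch alarm on SES's account-level `Reputation.BounceRate` metric, which is reported as a fraction (0.03 for 3%). This is a sketch rather than the alarm definition SESMailEngine deploys: the helper name, alarm name, period, and missing-data handling are all assumptions.

```python
def bounce_rate_alarm_params(threshold: float = 0.03) -> dict:
    """Build keyword arguments for CloudWatch's put_metric_alarm."""
    return {
        "AlarmName": "ses-bounce-rate-high",    # hypothetical name
        "Namespace": "AWS/SES",
        "MetricName": "Reputation.BounceRate",  # fraction, not percent
        "Statistic": "Average",
        "Period": 3600,                         # assumed: evaluate hourly
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",     # assumed: idle accounts stay quiet
    }


# Usage (requires AWS credentials):
#   boto3.client("cloudwatch").put_metric_alarm(**bounce_rate_alarm_params())
```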

Dead Letter Queues

Every failure path has a DLQ:

EventBridge → Email Sender fails → EventBridge DLQ
SQS Retry → Retry fails → Retry DLQ  
SNS → Feedback Processor fails → Feedback DLQ

No email is silently lost. Ever.


Lessons Learned

1. Status Priority Matters

SES events can arrive out of order. An "open" event might arrive before "delivery". We implemented status priority to prevent overwrites:

bounced(6) > opened(4) > delivered(3) > soft_bounced(2) > sent(1)
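
One way to apply these priorities is to accept an incoming status only when it outranks the stored one. A sketch, with the numeric ranks copied from the line above and a hypothetical `should_update` helper. In DynamoDB the same comparison would normally be expressed as a condition on the write so that concurrent events can't race, which this in-memory version does not show.

```python
# Ranks taken from the priority line above; statuses without a published
# rank (such as "failed") are deliberately left out of this sketch.
STATUS_PRIORITY = {
    "sent": 1,
    "soft_bounced": 2,
    "delivered": 3,
    "opened": 4,
    "bounced": 6,
}


def should_update(current: str, incoming: str) -> bool:
    """Return True when the incoming status outranks the stored status."""
    return STATUS_PRIORITY[incoming] > STATUS_PRIORITY[current]


# A late "delivered" event must not overwrite an "opened" record
print(should_update("opened", "delivered"))  # False
print(should_update("sent", "delivered"))    # True
```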

2. Cache Aggressively, But Wisely

Bounce rate checks hit DynamoDB twice per email. Caching for 5 minutes saves 97%+ on queries while still catching rate spikes within acceptable windows.

3. Single Retry is Enough

We initially implemented 3 retries over 45 minutes. Too aggressive—a single failed email chain could permanently suppress an address. Single retry after 15 minutes provides faster feedback while letting customers decide on further action.
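
A 15-minute delay maps directly onto SQS, whose per-message `DelaySeconds` tops out at 900. A sketch of how a retry could be enqueued, assuming a hypothetical `build_retry_message` helper and queue URL; the `retryAttempt` and `originalEmailId` fields mirror the tracking record shown earlier, but the message shape SESMailEngine actually uses is not documented in this article.

```python
import json
from typing import Optional

RETRY_DELAY_SECONDS = 900  # 15 minutes, which is also the SQS maximum
MAX_RETRIES = 1            # single retry


def build_retry_message(
    queue_url: str,
    request: dict,
    original_email_id: str,
    retry_attempt: int,
) -> Optional[dict]:
    """Return kwargs for sqs.send_message, or None when the retry is already used."""
    if retry_attempt >= MAX_RETRIES:
        return None
    body = dict(
        request,
        retryAttempt=retry_attempt + 1,
        originalEmailId=original_email_id,
    )
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(body),
        "DelaySeconds": RETRY_DELAY_SECONDS,
    }


# Usage (requires AWS credentials):
#   params = build_retry_message(queue_url, request, "email-123456789", 0)
#   if params:
#       boto3.client("sqs").send_message(**params)
```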

4. Track Everything

Every consumed event creates a tracking record. Retryable errors don't create records because the event will be processed again. This principle—no silent email loss—is fundamental.
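
That rule can be sketched as an error boundary around the handler. Everything here is hypothetical naming (`RetryableError`, `handle_event`, the injected `process` and `track` callables); the point is only that transient failures are re-raised so the event is delivered again, while permanent failures are consumed and recorded.

```python
class RetryableError(Exception):
    """Transient failure (for example throttling): let the event be redelivered."""


def handle_event(event: dict, process, track) -> dict:
    """Process one event; write a tracking record unless the error is retryable."""
    try:
        record = process(event)
    except RetryableError:
        # No record is written: the event is not consumed and will come back
        raise
    except Exception as exc:
        # Permanent failure: the event is consumed, so it must leave a trace
        record = {"status": "failed", "error": str(exc)}
    track(record)
    return record
```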


Why Centralization Matters

By centralizing email sending, you:

  • Reduce operational complexity: one system to monitor, not dozens
  • Protect SES reputation account-wide: bad addresses are blocked everywhere
  • Ensure consistent tracking: a full audit trail for compliance
  • Add new producers safely: no changes to core logic needed


Key Takeaways

  1. Serverless architecture scales to zero: costs almost nothing when idle
  2. Centralized sender protects reputation: one suppression list for all services
  3. Event-driven design enables decoupling: producers don't know about email internals
  4. Retry queues and DLQs guard against data loss: every email is tracked or preserved in a DLQ
  5. Fully AWS-native: EventBridge, Lambda, DynamoDB, S3, SES, SNS, SQS, CloudWatch

Get Started

SESMailEngine is available as a complete solution you can deploy to your own AWS account.

📚 Full Documentation

The documentation includes:

  • Complete setup guide with Python installer
  • Integration examples for Python, Node.js, Java
  • Template customization guide
  • Troubleshooting common issues
  • Security architecture details

Have questions about building email infrastructure on AWS? Drop a comment below—I'd love to hear about your experiences and challenges.
