
Cyril Sebastian

Posted on • Originally published at tech.cyrilsebastian.com

Keep It Running: Email Infrastructure Operations Guide

Run your email infrastructure like a pro.


Daily Operations (5 Minutes)

The Morning Health Check

Create this script and run it daily:

cat > ~/email-health.sh <<'EOF'
#!/bin/bash
YESTERDAY=$(date -d "yesterday" +"%b %d")

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Email Health ($YESTERDAY)"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# Service status
systemctl is-active --quiet postfix && echo "Postfix OK" || echo "Postfix DOWN"
systemctl is-active --quiet ses-logger && echo "Logger OK" || echo "Logger DOWN"

# Email stats
SENT=$(grep "$YESTERDAY" /var/log/postfix/postfix.log 2>/dev/null | grep -c "status=sent")
DELIVERED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=delivered")
BOUNCED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=bounced")

echo ""
echo "📊 Volume"
echo "   Sent: $SENT"
echo "   Delivered: $DELIVERED"
echo "   Bounced: $BOUNCED"

if [ "$SENT" -gt 0 ]; then
  DELIVERY_RATE=$((DELIVERED * 100 / SENT))
  BOUNCE_RATE=$((BOUNCED * 100 / SENT))
  echo ""
  echo "📈 Rates"
  echo "   Delivery: ${DELIVERY_RATE}%"
  echo "   Bounce: ${BOUNCE_RATE}%"

  [ $BOUNCE_RATE -gt 5 ] && echo "   ⚠️  High bounce rate!"
fi

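# Queue status: the last line of mailq is either "Mail queue is empty"
# or "-- N Kbytes in M Requests."
QUEUE_SUMMARY=$(mailq | tail -1)
echo ""
if echo "$QUEUE_SUMMARY" | grep -q "is empty"; then
  echo "Queue empty"
else
  QUEUE=$(echo "$QUEUE_SUMMARY" | awk '{print $5}')
  echo "⚠️  Queue: $QUEUE messages"
fi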

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
EOF

chmod +x ~/email-health.sh

Run it:

./email-health.sh

Automate it (runs at 9 AM, emails you results):

(crontab -l 2>/dev/null; echo "0 9 * * * ~/email-health.sh | mail -s 'Email Health Report' admin@yourdomain.com") | crontab -

Essential Monitoring

Real-Time Log Watching

Monitor live email flow:

# Watch everything
sudo tail -f /var/log/postfix/*.log

# Watch only delivered emails
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=delivered"

# Watch bounces
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=bounced"

Key Metrics to Track

Metric          Target   Alert If
Delivery Rate   >95%     <90%
Bounce Rate     <5%      >10%
Queue Size      0        >100
Service Uptime  99.9%    Any downtime
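
The alert thresholds in the table can be turned into a small check. This is a minimal sketch, assuming the three values have already been computed as whole numbers (for example by the health script); the function name check_thresholds is hypothetical and not part of the original setup.

```shell
# check_thresholds DELIVERY_RATE BOUNCE_RATE QUEUE_SIZE
# Prints one ALERT line per breached threshold from the table and
# returns non-zero if anything was breached.
check_thresholds() {
  local delivery=$1 bounce=$2 queue=$3 status=0

  if [ "$delivery" -lt 90 ]; then
    echo "ALERT: delivery rate ${delivery}% is below 90%"
    status=1
  fi
  if [ "$bounce" -gt 10 ]; then
    echo "ALERT: bounce rate ${bounce}% is above 10%"
    status=1
  fi
  if [ "$queue" -gt 100 ]; then
    echo "ALERT: queue size ${queue} is above 100"
    status=1
  fi
  return "$status"
}
```

Because the function returns non-zero on any breach, it can gate a notification, for example: check_thresholds "$DELIVERY_RATE" "$BOUNCE_RATE" "$QUEUE" || echo "needs attention".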

Quick metric checks:

# Today's delivery rate
SENT=$(grep "$(date +%b\ %d)" /var/log/postfix/postfix.log | grep -c "status=sent")
DELIVERED=$(grep "$(date +%b\ %d)" /var/log/postfix/mail.log | grep -c "status=delivered")
if [ "$SENT" -gt 0 ]; then
  echo "Delivery rate: $((DELIVERED * 100 / SENT))%"
else
  echo "Delivery rate: n/a (nothing sent yet today)"
fi

# Average delivery time (milliseconds)
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'delay=\K\d+' | \
  awk '{sum+=$1; n++} END {if (n) print "Avg delay: " sum/n "ms"; else print "No delivered messages found"}'

# Top recipient domains
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'to=<[^@]+@\K[^>]+' | \
  sort | uniq -c | sort -rn | head -5

Common Operations

Adding New Senders

# 1. Edit whitelist
sudo vim /etc/postfix/allowed_senders

# Add line:
# newsender@yourdomain.com    OK

# 2. Rebuild database
sudo postmap /etc/postfix/allowed_senders

# 3. Reload (no restart needed!)
sudo systemctl reload postfix

# 4. Test
echo "Test" | mail -s "Test" -r newsender@yourdomain.com test@example.com

No downtime! Reload picks up changes instantly.
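
A typo in the whitelist is easy to miss, so it can help to check the file before running postmap. The sketch below assumes every entry is a full address followed by a single-word action, as in the example line; the function name validate_senders is hypothetical, and files that use domain entries or multi-word actions would need a looser check.

```shell
# validate_senders FILE
# Reports lines that are not "address ACTION" pairs. Comments and blank
# lines are skipped. Returns non-zero if any bad line was found.
validate_senders() {
  awk '
    /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
    NF != 2 || $1 !~ /^[^@]+@[^@]+$/ {       # expect: user@domain ACTION
      printf "line %d: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

A possible use is to chain it in front of the rebuild step: validate_senders /etc/postfix/allowed_senders && sudo postmap /etc/postfix/allowed_senders.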


Removing Senders

# 1. Comment out or remove from whitelist
sudo vim /etc/postfix/allowed_senders
# #oldsender@yourdomain.com    OK

# 2. Rebuild and reload
sudo postmap /etc/postfix/allowed_senders
sudo systemctl reload postfix

# 3. Verify rejection
echo "Test" | mail -s "Test" -r oldsender@yourdomain.com test@example.com
# Should see: "Sender address rejected"

Managing Mail Queue

View queue:

mailq

Flush queue (retry all deferred emails):

sudo postqueue -f

Delete specific email:

# Get queue ID from mailq
sudo postsuper -d QUEUE_ID

Delete all queued emails:

sudo postsuper -d ALL

Delete only deferred emails:

sudo postsuper -d ALL deferred
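
Before a bulk delete it is worth knowing how many messages are about to go. A minimal sketch that reads mailq output on stdin and prints the message count from its summary line; the function name queue_size is hypothetical, and the parsing assumes the usual "Mail queue is empty" or "-- N Kbytes in M Requests." final line.

```shell
# queue_size: read mailq output on stdin, print the number of queued messages.
queue_size() {
  tail -n 1 | awk '
    /is empty/ { print 0; next }     # "Mail queue is empty"
    $1 == "--" { print $5; next }    # "-- 12 Kbytes in 3 Requests."
  '
}
```

Typical use would be mailq | queue_size, checked before running either postsuper -d command.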

Searching Email History

Find specific email:

grep "user@example.com" /var/log/postfix/*.log

Find by sender:

grep "from=<sender@yourdomain.com>" /var/log/postfix/postfix.log

Find bounces to specific domain:

grep "gmail.com" /var/log/postfix/mail.log | grep "bounced"

Get complete email journey:

# Get message ID from sent log
MSG_ID=$(grep "user@example.com" /var/log/postfix/postfix.log | grep -oP 'status=sent \(250 Ok \K[^)]+' | head -1)

# Find all events for that message
grep "$MSG_ID" /var/log/postfix/*.log
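
The two-step journey lookup can be wrapped in one function. This is a sketch under the same assumption as the commands above, namely that the sent log records the SES message ID as "status=sent (250 Ok ID)"; the function name trace_recipient is hypothetical. It uses sed instead of grep -P so it also works where PCRE support is missing.

```shell
# trace_recipient RECIPIENT SENT_LOG [MORE_LOGS...]
# Finds the first message ID sent to RECIPIENT in SENT_LOG, then prints
# every line mentioning that ID across all the given logs.
trace_recipient() {
  local recipient=$1 msg_id
  shift   # "$@" now holds the log files, sent log first

  msg_id=$(grep -F "to=<$recipient>" "$1" |
           sed -n 's/.*status=sent (250 Ok \([^)]*\)).*/\1/p' |
           head -n 1)

  if [ -z "$msg_id" ]; then
    echo "No sent entry found for $recipient" >&2
    return 1
  fi
  grep -hF "$msg_id" "$@"
}
```

For example: trace_recipient user@example.com /var/log/postfix/postfix.log /var/log/postfix/mail.log.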

Troubleshooting Guide

Problem 1: Postfix Won't Start

Symptoms:

sudo systemctl start postfix
# Job for postfix.service failed

Fix:

# 1. Check config syntax
sudo postfix check

# 2. View detailed error
sudo journalctl -u postfix -n 20 --no-pager

# 3. Common issues:

# Port in use?
sudo lsof -i :25
# Kill conflicting process: sudo systemctl stop sendmail

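# Permission issue?
sudo chown -R postfix:postfix /var/log/postfix
# Let Postfix restore its own spool ownership; a blanket chown of
# /var/spool/postfix breaks the root- and postdrop-owned directories
sudo postfix set-permissions

# Config error? Open main.cf at the line number reported by 'postfix check'
sudo vim /etc/postfix/main.cf +LINE_NUMBER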

Problem 2: Emails Stuck in Queue

Diagnosis:

mailq  # Shows queued emails
sudo tail -100 /var/log/postfix/postfix.log | grep "status=deferred"

Common causes and fixes:

Wrong SES credentials:

# Verify credentials
sudo postmap -q "[email-smtp.ap-south-1.amazonaws.com]:587" /etc/postfix/sasl_passwd

# Update if needed
sudo vim /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
sudo systemctl restart postfix

Network blocked:

# Test SES connectivity
telnet email-smtp.ap-south-1.amazonaws.com 587

# Check security group allows outbound 587
# Check route table has internet gateway

SES quota exceeded:

aws ses get-send-quota --region ap-south-1
# If near limit, wait or request increase

After fixing, flush the queue:

sudo postqueue -f

Problem 3: Logger Service Keeps Crashing

Check logs:

sudo journalctl -u ses-logger -n 50 --no-pager
sudo tail -50 /var/log/ses-logger-error.log

Common fixes:

boto3 missing:

python3 -c "import boto3" || sudo yum install -y python3-boto3
sudo systemctl restart ses-logger

Wrong queue URL:

# Get correct URL
QUEUE_URL=$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)

# Update service
sudo sed -i "s|Environment=\"SQS_QUEUE_URL=.*\"|Environment=\"SQS_QUEUE_URL=$QUEUE_URL\"|" /etc/systemd/system/ses-logger.service

sudo systemctl daemon-reload
sudo systemctl restart ses-logger

IAM permissions:

# Verify role attached
aws sts get-caller-identity

# Should show: PostfixSESLogger role
# If not, reattach IAM instance profile

Problem 4: No Delivery Events in Logs

Diagnosis:

# 1. Check SQS queue has messages
aws sqs get-queue-attributes \
  --queue-url "$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)" \
  --attribute-names ApproximateNumberOfMessages \
  --region ap-south-1

If messages are accumulating:

  • Logger not processing → Check sudo journalctl -u ses-logger

  • Restart logger → sudo systemctl restart ses-logger

If no messages in queue:

# 2. Verify SES publishing to SNS
aws ses get-identity-notification-attributes \
  --identities yourdomain.com \
  --region ap-south-1

# Should show all three topics configured

# 3. Reconfigure if needed
SNS_ARN=$(aws sns list-topics --region ap-south-1 --query "Topics[?contains(TopicArn, 'ses-events-topic')].TopicArn | [0]" --output text)

for EVENT in Delivery Bounce Complaint; do
  aws ses set-identity-notification-topic \
    --identity yourdomain.com \
    --notification-type $EVENT \
    --sns-topic "$SNS_ARN" \
    --region ap-south-1
done

Problem 5: High Bounce Rate (>10%)

Analyze bounce reasons:

grep "status=bounced" /var/log/postfix/mail.log | \
  grep -oP 'reason=\(\K[^\)]+' | \
  sort | uniq -c | sort -rn | head -10

Common reasons:

"User unknown" (invalid addresses):

# Extract bounced addresses
grep "status=bounced" /var/log/postfix/mail.log | \
  grep "bounce_type=Permanent" | \
  grep -oP 'to=<\K[^>]+' | \
  sort -u > bounced_addresses.txt

# Remove from your mailing list

"Mailbox full":

  • Temporary issue, will resolve

  • Retry after 24 hours

"550 Spam":

  • Review email content

  • Check SPF/DKIM/DMARC setup

  • Verify sender reputation
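
The three bounce categories above map naturally onto a small triage filter. A minimal sketch, assuming one bounce reason per line on stdin (for example from the reason-extraction pipeline earlier in this problem); the function name classify_bounce and the action labels are hypothetical, and the patterns cover only the three cases named here.

```shell
# classify_bounce: read one bounce reason per line on stdin and print
# "ACTION<TAB>reason" for each line.
classify_bounce() {
  local reason action
  while IFS= read -r reason; do
    # Match case-insensitively by lowercasing the reason first
    case $(printf '%s' "$reason" | tr '[:upper:]' '[:lower:]') in
      *"user unknown"*) action="remove-address" ;;   # invalid address
      *"mailbox full"*) action="retry-later" ;;      # temporary, retry after 24h
      *spam*)           action="review-content" ;;   # content, SPF/DKIM/DMARC, reputation
      *)                action="investigate" ;;
    esac
    printf '%s\t%s\n' "$action" "$reason"
  done
}
```

Piping the output through cut -f1 | sort | uniq -c gives a per-action count, which shows at a glance whether a bounce spike is a list-hygiene problem or a reputation problem.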


Problem 6: Emails Going to Spam

Verification checklist:

# 1. Check SPF
dig +short TXT yourdomain.com | grep spf
# Should include: include:amazonses.com

# 2. Check DKIM
aws ses get-identity-dkim-attributes \
  --identities yourdomain.com \
  --region ap-south-1
# Should show: DkimEnabled=true, DkimVerificationStatus=Success

# 3. Check DMARC
dig +short TXT _dmarc.yourdomain.com
# Should return DMARC policy

# 4. Check that account-level sending is enabled
aws ses get-account-sending-enabled --region ap-south-1
# Should show: Enabled=true
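
The SPF check in step 1 only greps for "spf", which still passes if the record lacks the SES include or if two SPF records were published by mistake (receivers treat multiple SPF records as an error). A slightly stricter sketch, assuming TXT records arrive on stdin one per line as printed by dig +short TXT; the function name spf_includes_ses is hypothetical.

```shell
# spf_includes_ses: read TXT records on stdin, succeed only if exactly
# one SPF record exists and it includes amazonses.com.
spf_includes_ses() {
  local spf count
  spf=$(grep -i '^"*v=spf1' || true)
  count=$(printf '%s\n' "$spf" | grep -ci 'v=spf1' || true)

  if [ "$count" -ne 1 ]; then
    echo "expected exactly one SPF record, found $count"
    return 1
  fi
  case $spf in
    *include:amazonses.com*) echo "SPF includes amazonses.com" ;;
    *) echo "SPF record does not include amazonses.com"; return 1 ;;
  esac
}
```

Usage would look like: dig +short TXT yourdomain.com | spf_includes_ses.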

Content checklist:

  • Avoid spam trigger words (FREE!, ACT NOW!)

  • Include unsubscribe link

  • Balance text/image ratio (60% text minimum)

  • Use a consistent "From" name and address

  • Authenticate with SPF/DKIM/DMARC


Performance Optimization

Postfix Tuning

For higher throughput:

sudo vim /etc/postfix/main.cf

Add/update:

# Increase concurrent deliveries
default_destination_concurrency_limit = 50
default_destination_recipient_limit = 50

# Reduce queue lifetime
maximal_queue_lifetime = 1d
bounce_queue_lifetime = 1d

# Connection caching
smtp_connection_cache_on_demand = yes
smtp_connection_cache_destinations = email-smtp.ap-south-1.amazonaws.com

Reload:

sudo systemctl reload postfix

Logger Optimization

For high volume (>1000 events/min):

Edit /usr/local/bin/ses_logger.py:

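# Request the largest batch SQS allows (1 to 10 messages per call).
# For more throughput, run several polling loops in parallel rather
# than raising this value, which the API rejects.
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # API maximum
    WaitTimeSeconds=20       # Long polling
)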

Restart:

sudo systemctl restart ses-logger

Scaling Strategies

When to Scale

Metric      Scale Trigger
CPU Usage   Sustained >70%
Emails/day  >40,000 (80% of quota)
Queue Size  Sustained >100
Memory      >80% used
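
The emails/day trigger can be checked against the live SES quota instead of a fixed number. A minimal sketch, assuming the two inputs come from the SentLast24Hours and Max24HourSend fields of aws ses get-send-quota; the function name quota_usage is hypothetical, and the 80% cut-off is the one from the table.

```shell
# quota_usage SENT_LAST_24H MAX_24H
# Prints the percentage of the daily quota used (rounded down).
# Returns 1 when usage is at or above 80%, 2 if the quota is not usable.
quota_usage() {
  awk -v sent="$1" -v max="$2" 'BEGIN {
    if (max <= 0) { print "unknown"; exit 2 }
    pct = int(sent * 100 / max)
    print pct
    exit (pct >= 80 ? 1 : 0)
  }'
}

# Example wiring (not run here):
#   set -- $(aws ses get-send-quota --region ap-south-1 \
#     --query '[SentLast24Hours,Max24HourSend]' --output text)
#   quota_usage "$1" "$2" || echo "Time to request a quota increase or scale"
```

awk is used rather than shell arithmetic because the CLI reports these values as decimals such as 50000.0.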

Vertical Scaling (Bigger Instance)

Current performance by instance:

Instance     vCPU  RAM   Emails/day
t3a.small    2     2GB   10,000
t3a.medium   2     4GB   50,000
t3a.large    2     8GB   100,000
c6a.xlarge   4     8GB   500,000

Security Hardening

Restrict Relay Access

Tighten network access:

sudo vim /etc/postfix/main.cf

Add/update (use one mynetworks line, not both):

# Only specific IPs
mynetworks = 127.0.0.1, 10.10.3.125

# Or a specific subnet
# mynetworks = 10.10.0.0/21

Rate Limiting

Prevent abuse:

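sudo vim /etc/postfix/main.cf

Add/update:

# Max 100 connections/min per client
smtpd_client_connection_rate_limit = 100

# Max 100 emails/min per client
smtpd_client_message_rate_limit = 100

# Clients listed in mynetworks are exempt from these limits by default;
# clear the exception list if the limits should apply to them as well
smtpd_client_event_limit_exceptions =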

Monitor IAM Usage

Enable CloudTrail for audit:

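aws cloudtrail create-trail \
  --name email-infrastructure-audit \
  --s3-bucket-name my-audit-logs

# A new trail does not record events until logging is started
aws cloudtrail start-logging --name email-infrastructure-audit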


Series Complete! 🎉

🔗 If this helped or resonated with you, connect with me on LinkedIn. Let’s learn and grow together.

👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.

