Emmanuel Ulu for AWS Community Builders

Posted on Jul 1

From SAM to Terraform: Rebuilding My Nigeria Power Outage Tracker with Modules

#aws #terraform #serverless #devops

INTRODUCTION

A few months ago I built a serverless power outage reporting system for Nigeria using AWS SAM. Citizens could submit outage reports via API, the system stored them in DynamoDB, and SNS sent email alerts when a threshold was hit.
It worked. But the infrastructure lived in a single template.yaml file with no modularity, no CloudWatch dashboard, no Dead Letter Queue, and a hardcoded alert threshold.
So I rebuilt it from scratch using Terraform modules. Same problem. Same architecture. Better infrastructure.
This is Part 2. If you missed Part 1, read it here first: How I Built a Serverless Power Outage Tracker for Nigeria on AWS.

The Architecture

The event-driven pipeline looks like this:

User
  ↓
API Gateway (HTTP API)
  ↓
Lambda Validator    → SQS Dead Letter Queue (failed reports)
  ↓
SQS Queue
  ↓
Lambda Enricher     → DynamoDB + SNS alert (threshold exceeded)
  ↓
Lambda Query        ← GET /reports
  ↓
Lambda Aggregator   ← daily summary (EventBridge scheduled)

Every component is serverless. No servers to manage, no idle compute costs, pay only when reports come in.

The Architecture Drawing

The Terraform Module Structure

Instead of one giant template, I split everything into 5 independent modules:

terraform/
├── modules/
│   ├── api/           # API Gateway HTTP API + routes + Lambda permissions
│   ├── compute/       # IAM role, Lambda functions, SQS event source mapping
│   ├── messaging/     # SQS queue, Dead Letter Queue, SNS topic + subscription
│   ├── observability/ # CloudWatch log groups, alarms, dashboard
│   └── storage/       # DynamoDB table with GSI and TTL
└── environments/
    └── dev/           # Root module wiring everything together

Each module has one job. The messaging module doesn't know about Lambda. The compute module doesn't know about API Gateway. They communicate through outputs and inputs only.
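
To make the "outputs and inputs only" idea concrete, here is a minimal sketch of what the root module in environments/dev might look like. The output names (queue_arn, dlq_arn, topic_arn, table_name) and the alert_email variable are assumptions for illustration, not taken from the actual repository.

```hcl
# environments/dev/main.tf (sketch; output and variable names are assumptions)

module "storage" {
  source       = "../../modules/storage"
  project_name = var.project_name
}

module "messaging" {
  source       = "../../modules/messaging"
  project_name = var.project_name
  alert_email  = var.alert_email # hypothetical input for the SNS subscription
}

module "compute" {
  source          = "../../modules/compute"
  project_name    = var.project_name
  alert_threshold = var.alert_threshold

  # Outputs of one module become inputs of another.
  # The compute module never references messaging resources directly.
  queue_arn  = module.messaging.queue_arn
  dlq_arn    = module.messaging.dlq_arn
  topic_arn  = module.messaging.topic_arn  # hypothetical output
  table_name = module.storage.table_name   # hypothetical output
}
```

Because each dependency is an explicit argument, Terraform can work out the creation order from the references alone.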

Module 1 — Storage

The DynamoDB table stores every outage report. Three design decisions shape it:
Partition key + sort key:

hash_key  = "LGA"        # Local Government Area e.g. Ikeja
range_key = "timestamp"  # ISO 8601 timestamp

This lets you query all reports for a specific LGA efficiently.
Global Secondary Index for state-level queries:

global_secondary_index {
  name            = "StateIndex"
  hash_key        = "state"
  range_key       = "timestamp"
  projection_type = "ALL"
}

Query all outages in Lagos State without scanning the entire table.
TTL — auto-expire old records:

ttl {
  attribute_name = "expiry"
  enabled        = true
}

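Records expire 90 days after they are written, as long as the writer sets expiry to a Unix epoch timestamp (in seconds) 90 days ahead. DynamoDB removes expired items in the background, typically within a few days of the expiry time. No manual cleanup, no growing storage costs.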
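
Put together, the three fragments above might sit in a single table resource like the sketch below. The resource label, table name, and billing mode are assumptions; the key names, index, and TTL attribute come from the snippets above. Note that only key attributes are declared: expiry is a non-key Number attribute and needs no attribute block.

```hcl
resource "aws_dynamodb_table" "outage_reports" {
  name         = "${var.project_name}-outage-reports" # hypothetical table name
  billing_mode = "PAY_PER_REQUEST"                    # assumption: on-demand capacity
  hash_key     = "LGA"
  range_key    = "timestamp"

  # Every attribute used as a table or index key must be declared.
  attribute {
    name = "LGA"
    type = "S"
  }

  attribute {
    name = "timestamp"
    type = "S"
  }

  attribute {
    name = "state"
    type = "S"
  }

  global_secondary_index {
    name            = "StateIndex"
    hash_key        = "state"
    range_key       = "timestamp"
    projection_type = "ALL"
  }

  # The Lambda that writes items must set "expiry" to epoch seconds.
  ttl {
    attribute_name = "expiry"
    enabled        = true
  }
}
```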

Module 2 — Messaging

Three resources with one important design decision — the Dead Letter Queue:

resource "aws_sqs_queue" "dlq" {
  name                      = "${var.project_name}-outage-dlq"
  message_retention_seconds = 1209600  # 14 days
}

resource "aws_sqs_queue" "outage_queue" {
  name                       = "${var.project_name}-outage-queue"
  receive_wait_time_seconds  = 10      # long polling
  visibility_timeout_seconds = 30

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dlq.arn
    maxReceiveCount     = 3
  })
}

Why the DLQ matters:
Without a DLQ, a message the enricher Lambda cannot process returns to the queue and is retried until the queue's retention period runs out (4 days by default). Then SQS deletes it silently. You never know it failed.
With a DLQ, after 3 failed receives (maxReceiveCount = 3) the message moves to the DLQ, where it stays for up to 14 days. You can investigate, fix the bug, and redrive the message to the source queue. No silent data loss.
Long polling (receive_wait_time_seconds = 10) reduces empty responses from SQS: a ReceiveMessage call that does not set its own wait time waits up to 10 seconds for a message to arrive instead of returning empty immediately. Fewer empty receives, lower cost.
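
The third resource in this module is the SNS topic with its email subscription. A minimal sketch, assuming a hypothetical alert_email variable and illustrative resource labels, together with the outputs other modules would consume:

```hcl
variable "alert_email" {
  type        = string
  description = "Address that receives outage alerts (hypothetical variable)"
}

resource "aws_sns_topic" "outage_alerts" {
  name = "${var.project_name}-outage-alerts" # illustrative name
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.outage_alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

# Outputs are the only way other modules see these resources.
output "queue_arn" {
  value = aws_sqs_queue.outage_queue.arn
}

output "dlq_arn" {
  value = aws_sqs_queue.dlq.arn
}

output "topic_arn" {
  value = aws_sns_topic.outage_alerts.arn
}
```

Email subscriptions stay pending until the recipient clicks the confirmation link SNS sends, so the first apply will not deliver alerts until that is done.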

Module 3 — Compute

Four Lambda functions, one IAM role, least privilege policies:

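resource "aws_iam_role_policy" "lambda_sqs" {
  name = "${var.project_name}-lambda-sqs" # illustrative name
  role = aws_iam_role.lambda.id           # role resource label assumed

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "sqs:SendMessage",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
      ]
      Resource = [var.queue_arn, var.dlq_arn]
    }]
  })
}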

Each policy only grants what's needed. Lambda can send to SQS but cannot delete tables. Lambda can write to DynamoDB but cannot access S3. Least privilege at every layer.
The SQS event source mapping connects the queue to the enricher Lambda automatically:

resource "aws_lambda_event_source_mapping" "sqs_enricher" {
  event_source_arn = var.queue_arn
  function_name    = aws_lambda_function.enricher.arn
  batch_size       = 10
}

When messages arrive in SQS, Lambda polls and processes them in batches of up to 10. No manual polling code needed.
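
One optional refinement worth considering: by default, if any message in a batch fails, the whole batch becomes visible again and every message is retried. The variant below is a sketch, not what the project currently uses. It turns on partial batch responses so that only the failed messages are retried. It assumes the enricher handler is updated to return a batchItemFailures list.

```hcl
resource "aws_lambda_event_source_mapping" "sqs_enricher" {
  event_source_arn = var.queue_arn
  function_name    = aws_lambda_function.enricher.arn
  batch_size       = 10

  # Lets the handler report individual failed message IDs
  # instead of failing the entire batch.
  function_response_types = ["ReportBatchItemFailures"]
}
```

With this in place, a single bad report counts toward maxReceiveCount on its own and does not drag healthy messages into the DLQ with it.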
The alert threshold is a variable:

variable "alert_threshold" {
  type    = number
  default = 3
}

Change it in terraform.tfvars and redeploy. No code changes needed:

alert_threshold = 5  # require 5 reports before alerting
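
For the variable to reach the code, it has to be handed to the enricher somehow. A common pattern is an environment variable, sketched below. The ALERT_THRESHOLD name, the handler, the zip path, the runtime version, and the IAM role label are all assumptions. The validation block is an optional extra that rejects nonsensical values at plan time.

```hcl
variable "alert_threshold" {
  type        = number
  default     = 3
  description = "Reports per LGA required before an alert is sent"

  validation {
    condition     = var.alert_threshold >= 1 && floor(var.alert_threshold) == var.alert_threshold
    error_message = "The alert_threshold value must be a whole number of at least 1."
  }
}

resource "aws_lambda_function" "enricher" {
  function_name = "${var.project_name}-enricher"
  role          = aws_iam_role.lambda.arn               # role label assumed
  runtime       = "python3.12"                          # assumed runtime version
  handler       = "enricher.handler"                    # hypothetical handler
  filename      = "${path.module}/build/enricher.zip"   # hypothetical package path

  environment {
    variables = {
      # Environment variables are strings; the handler converts back to int.
      ALERT_THRESHOLD = tostring(var.alert_threshold)
    }
  }
}
```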

Module 4 — API

HTTP API Gateway with two routes:

resource "aws_apigatewayv2_route" "post_reports" {
  api_id    = aws_apigatewayv2_api.http.id # API resource label assumed
  route_key = "POST /reports"
  target    = "integrations/${aws_apigatewayv2_integration.validator.id}"
}

resource "aws_apigatewayv2_route" "get_reports" {
  api_id    = aws_apigatewayv2_api.http.id # API resource label assumed
  route_key = "GET /reports"
  target    = "integrations/${aws_apigatewayv2_integration.query.id}"
}

POST /reports routes to the validator Lambda. GET /reports routes to the query Lambda. CORS is enabled so a frontend can call it directly from the browser.
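
The routes depend on a few resources not shown above. Here is a rough sketch of how the API, CORS settings, one integration, and its Lambda permission could fit together. The resource labels, the validator_invoke_arn and validator_function_name inputs, and the wide-open CORS origin are assumptions for illustration.

```hcl
variable "validator_invoke_arn" {
  type = string # hypothetical input, passed in from the compute module
}

variable "validator_function_name" {
  type = string # hypothetical input, passed in from the compute module
}

resource "aws_apigatewayv2_api" "http" {
  name          = "${var.project_name}-api"
  protocol_type = "HTTP"

  cors_configuration {
    allow_origins = ["*"] # assumption: tighten to the real frontend origin
    allow_methods = ["GET", "POST", "OPTIONS"]
    allow_headers = ["content-type"]
  }
}

resource "aws_apigatewayv2_integration" "validator" {
  api_id                 = aws_apigatewayv2_api.http.id
  integration_type       = "AWS_PROXY"
  integration_uri        = var.validator_invoke_arn
  payload_format_version = "2.0"
}

# Without this permission API Gateway cannot invoke the function.
resource "aws_lambda_permission" "validator" {
  statement_id  = "AllowHttpApiInvoke"
  action        = "lambda:InvokeFunction"
  function_name = var.validator_function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.http.execution_arn}/*/*"
}

# The $default stage serves routes at the root path, with no stage prefix.
resource "aws_apigatewayv2_stage" "default" {
  api_id      = aws_apigatewayv2_api.http.id
  name        = "$default"
  auto_deploy = true
}
```

The query Lambda would need the same integration and permission pair.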

Module 5 — Observability

This was missing entirely from the SAM version. Four CloudWatch log groups, alarms on Lambda errors, and a dashboard:

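resource "aws_cloudwatch_metric_alarm" "dlq_messages" {
  alarm_name          = "${var.project_name}-dlq-messages"
  alarm_description   = "Messages are landing in the Dead Letter Queue"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  statistic           = "Maximum"              # illustrative value
  period              = 60                     # seconds; illustrative value
  evaluation_periods  = 1                      # required argument; illustrative value
  comparison_operator = "GreaterThanThreshold" # required argument
  threshold           = 0

  dimensions = {
    QueueName = "${var.project_name}-outage-dlq"
  }
}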

The DLQ alarm is the most important one. If even one message lands in the DLQ, something is wrong. A threshold of 0 (with a greater-than comparison) means a single visible message is enough to trigger the alarm at the next evaluation.
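
The log groups and Lambda error alarms mentioned above could follow the same pattern. This is a sketch under assumptions: function_names and alarm_topic_arn are hypothetical inputs, and the 14-day retention is an arbitrary choice.

```hcl
variable "function_names" {
  type        = map(string)
  description = "Short label => deployed Lambda function name (hypothetical input)"
}

variable "alarm_topic_arn" {
  type        = string
  description = "SNS topic notified when an alarm fires (hypothetical input)"
}

# One log group per function, with bounded retention.
resource "aws_cloudwatch_log_group" "lambda" {
  for_each          = var.function_names
  name              = "/aws/lambda/${each.value}"
  retention_in_days = 14 # arbitrary choice
}

# One error alarm per function.
resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  for_each = var.function_names

  alarm_name          = "${var.project_name}-${each.key}-errors"
  namespace           = "AWS/Lambda"
  metric_name         = "Errors"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  treat_missing_data  = "notBreaching" # no invocations is not an error
  alarm_actions       = [var.alarm_topic_arn]

  dimensions = {
    FunctionName = each.value
  }
}
```

Using for_each keeps the four functions in one block, so adding a fifth Lambda only means adding one map entry.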

The Proof

Submit an outage report:

curl -X POST https://your-api-id.execute-api.eu-west-1.amazonaws.com/reports \
  -H "Content-Type: application/json" \
  -d '{
    "lga": "Ikeja",
    "state": "Lagos",
    "reporter_name": "Emmanuel Ulu",
    "description": "Power outage on Allen Avenue since 6am"
  }'

{"message": "Outage report received", "report_id": "6765dd6f-cbbd-4900-a607-b542e5720487"}

Query reports for Ikeja:

curl "https://your-api-id.execute-api.eu-west-1.amazonaws.com/reports?lga=Ikeja"

{"count": 4, "reports": [...]}

Once reports from the same LGA reach the threshold (3 by default), an email alert fires. The sample below was sent after the 4th Ikeja report:

Power outage alert for Ikeja, Lagos.
4 reports received today.
Latest report: Generator running low on fuel, still no NEPA
Reported by: Bola
Time: 2026-06-27T21:04:45

SAM vs Terraform — What Actually Changed

Feature               SAM Version       Terraform Version
IaC tool              AWS SAM           Terraform
State management      CloudFormation    S3 backend with native locking
Module structure      Single template   5 independent modules
Dead Letter Queue     No                Yes
CloudWatch dashboard  No                Yes
Alert threshold       Hardcoded         Configurable variable
Observability         Basic logs        Log groups, alarms, dashboard
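
For reference, "S3 backend with native locking" refers to the use_lockfile option of the S3 backend, which removes the need for a separate DynamoDB lock table. A minimal sketch, assuming a hypothetical bucket name and state key:

```hcl
terraform {
  required_version = ">= 1.10" # use_lockfile needs Terraform 1.10 or newer

  backend "s3" {
    bucket       = "my-terraform-state-bucket"            # hypothetical bucket
    key          = "outage-tracker/dev/terraform.tfstate" # hypothetical key
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true # lock file stored next to the state object in S3
  }
}
```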

Key Lessons

The DLQ is not optional
Without it, failed messages are retried until they expire, and you never find out. Every production SQS queue needs a DLQ.

Long polling saves money
receive_wait_time_seconds = 10 makes receive calls wait for messages instead of returning empty, which cuts the number of billed SQS requests. Small change, real savings at scale.

TTL is free cleanup
DynamoDB TTL automatically removes expired items. No Lambda scheduled to delete old records, no growing storage costs.

Modules make variables powerful
The alert threshold is a number in terraform.tfvars. Change it, run terraform apply, done. In my SAM version the threshold was hardcoded in the Python code, so changing it meant editing code, redeploying, and hoping nothing broke. (SAM supports CloudFormation parameters too; I just had not used one.)

Observability is infrastructure
CloudWatch dashboards and alarms should be provisioned with the same code that provisions the app, not added later when something breaks.

What's Next

Add an EventBridge scheduled trigger for the daily aggregator summary
Add X-Ray tracing across the full pipeline
Build a simple frontend on S3 and CloudFront to visualize outages on a map
Add VPC endpoints so Lambda never touches the public internet
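
As a starting point for the first item, a scheduled trigger for the aggregator might look like the sketch below. The rule name and the 18:00 UTC schedule are arbitrary choices, and aws_lambda_function.aggregator is an assumed resource label.

```hcl
resource "aws_cloudwatch_event_rule" "daily_summary" {
  name                = "${var.project_name}-daily-summary"
  schedule_expression = "cron(0 18 * * ? *)" # 18:00 UTC every day; arbitrary time
}

resource "aws_cloudwatch_event_target" "aggregator" {
  rule = aws_cloudwatch_event_rule.daily_summary.name
  arn  = aws_lambda_function.aggregator.arn
}

# EventBridge needs explicit permission to invoke the function.
resource "aws_lambda_permission" "eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.aggregator.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.daily_summary.arn
}
```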

Screenshots

Resources

GitHub: nigeria-outage-tracker
Part 1: How I Built a Serverless Power Outage Tracker for Nigeria on AWS
Terraform AWS Provider docs: registry.terraform.io/providers/hashicorp/aws

Top comments (2)

Paul Marcelin • Jul 1

Excellent technical work accompanied by an easy-to-read article! Why am I not surprised, Emmanuel?

I really like it when we re-implement past work in a different way, be that replacing Lambda functions with a Step Function or comparing Serverless Application Model (SAM) plus CloudFormation or Cloud Development Kit (CDK) with Terraform. We end up learning a lot and improving both implementations.

With SAM, don't forget that you can define and reference CloudFormation parameters, in much the same way that Terraform supports input variables. CloudFormation's AWS-specific data types and its AWS::CloudFormation::Interface metadata option make CloudFormation variables easier for humans to use, in the AWS Console. CloudFormation supports advanced validation rules and rule-specific functions. Terraform supports elaborate type constraints and static validation, but dynamic checks require precondition or postcondition blocks. Each IaC system has pros and cons.

In Terraform, data.aws_iam_policy_document.NAME.... is a great way to construct and reference IAM policies. I do rely on jsonencode() or a heredoc in Terraform, and JSON embedded in CloudFormation YAML (no official reference, but here's an example from my own Lights Off AWS scheduling tool), when I want an IAM policy body to be fundamentally portable or when I allow the user to insert custom policy elements.

Don't worry too much about a VPC Lambda function until you have truly sensitive data or are calling a service that requires a user-managed VPC, like Amazon MSK (Managed Streaming for Kafka)! VPC Lambda functions require a lot of extra configuration and they are much slower to create and to delete.

Have you tried something like Kiro to help with SAM/CloudFormation to Terraform conversion? I know that I, and many other people, would love to read about your findings.

I'm looking forward to your next release!

Emmanuel Ulu AWS Community Builders • Jul 2

Thank you Paul! The rewrite honestly taught me more than the original build. I will go through your feedback carefully and make the necessary corrections.

Thank you