DEV Community

Nick Gojaman

Building a Production-Grade Serverless API on AWS: Architecture, Tradeoffs, and Lessons

Most tutorials focus on building features. This project focused on operating a real backend system in production.

What is IntellPulse?

IntellPulse is a backend-first, API-only service that generates quantitative trading signals (BUY / HOLD / SELL) with explainability, quota enforcement, and safe deployments.
There is intentionally no UI. The goal was to design and ship a system that behaves like internal fintech infrastructure — not a demo app.
This project emphasizes:

End-to-end system design (request flow + deployment flow)
Security and access control (authentication, rate limiting)
Operational discipline (staging environments, digest-pinned deployments)
Production readiness (quota tracking, safe rollbacks)

This article walks through the architecture, key design decisions, and tradeoffs involved in building a production-grade serverless API on AWS — without overengineering.

Architecture Overview

IntellPulse is implemented as a serverless, container-based API with clear separation between runtime and deployment flows.

Request Flow (Runtime)

Clients access the API over HTTPS using an AWS Lambda Function URL
Requests are handled by a FastAPI application running inside a containerized Lambda function
Authentication and quota enforcement are applied before any signal logic executes
Usage and rate-limit state is stored in Amazon DynamoDB
The API returns structured JSON responses containing signals and explainability metadata
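
The ordering in the request flow above can be sketched in framework-free Python. This is an illustration under assumptions, not the project's code: `VALID_KEYS`, `DAILY_LIMIT`, the in-memory `QuotaStore` (standing in for the DynamoDB table), and the placeholder rule in `compute_signal` are all hypothetical.

```python
import json
from dataclasses import dataclass, field

DAILY_LIMIT = 3              # hypothetical per-key daily quota
VALID_KEYS = {"demo-key"}    # hypothetical set of issued API keys


@dataclass
class QuotaStore:
    """In-memory stand-in for the DynamoDB usage table."""
    usage: dict = field(default_factory=dict)

    def try_consume(self, api_key: str) -> bool:
        """Count one request; return False if the key is already at its limit."""
        used = self.usage.get(api_key, 0)
        if used >= DAILY_LIMIT:
            return False
        self.usage[api_key] = used + 1
        return True


def compute_signal(momentum: float) -> dict:
    """Placeholder rule; the real signal strategy is not described in this article."""
    if momentum > 0.02:
        signal = "BUY"
    elif momentum < -0.02:
        signal = "SELL"
    else:
        signal = "HOLD"
    return {"signal": signal, "explain": {"momentum": momentum, "threshold": 0.02}}


def handle_request(api_key, momentum, store):
    """Return (status_code, json_body). Auth and quota run before signal logic."""
    if api_key not in VALID_KEYS:
        return 401, json.dumps({"error": "invalid API key"})
    if not store.try_consume(api_key):
        return 429, json.dumps({"error": "daily quota exceeded"})
    return 200, json.dumps(compute_signal(momentum))
```

Rejected requests return before `compute_signal` is called, so an unauthenticated or over-quota caller never reaches the signal logic.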

Deployment Flow (CI/CD)

Code changes trigger a CI/CD pipeline
A container image is built and pushed to Amazon ECR
The Lambda function is updated using a digest-pinned image reference (@sha256)
Deployments are promoted through staging → production environments

This design keeps the runtime path minimal and predictable, while allowing deployments to be performed safely without configuration drift or unintended version changes.

[Figure: IntellPulse serverless architecture, showing the Lambda Function URL, the containerized FastAPI app, DynamoDB for quotas, and ECR for container storage with the CI/CD deployment flow]

🧠 Key Design Decisions

1️⃣ Why Lambda Function URLs instead of API Gateway?

For this project, I intentionally used AWS Lambda Function URLs instead of Amazon API Gateway.
Rationale:

The goal was to expose a small, controlled API surface without introducing additional infrastructure overhead
Lambda Function URLs provide native HTTPS access and integrate cleanly with Lambda-based authentication logic
Advanced API Gateway features (custom domains, request mapping templates, usage plans) were not required for this use case
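
For illustration, here is roughly what a bare handler behind a Function URL looks like. Function URLs deliver events in payload format version 2.0, with lowercase header names. In this project FastAPI sits between the event and the application code, so this is only a sketch of the underlying shape; the `x-api-key` header name is an assumption.

```python
import json


def _response(status, payload):
    """Build the response dict that Lambda translates into an HTTP response."""
    return {
        "statusCode": status,
        "headers": {"content-type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    """Minimal handler for a Function URL event (payload format 2.0)."""
    http = event.get("requestContext", {}).get("http", {})
    headers = event.get("headers") or {}
    api_key = headers.get("x-api-key")  # hypothetical header name

    if not api_key:
        return _response(401, {"error": "missing API key"})

    # Echo the routing information; real logic would dispatch on it.
    return _response(200, {"method": http.get("method"), "path": event.get("rawPath")})
```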

Tradeoff accepted:

Fewer built-in features (no native throttling UI, no API key management)
But: Simpler architecture, faster iteration, lower cost

What this shows: Understanding when to use lighter-weight AWS services vs. defaulting to heavier managed solutions.

2️⃣ Why container-based Lambda for FastAPI?

Rather than adapting FastAPI to a zip-based Lambda deployment, the service runs as a containerized Lambda function stored in Amazon ECR.
Why containers?

Full control over Python dependencies (container images can be up to 10 GB, versus 250 MB unzipped for a zip package and its layers combined)
Consistent execution environments (local dev = cloud production)
Easier iteration without Lambda packaging constraints

What this enables:

FastAPI runs naturally without framework-specific workarounds
Still benefits from Lambda's serverless execution model (auto-scaling, pay-per-request)

Alternative considered: Lambda Layers
Why rejected: Dependency conflicts and size limits made iteration slower

3️⃣ Why DynamoDB for rate limiting and quotas?

Rate limiting and daily usage quotas are enforced using Amazon DynamoDB rather than in-memory or middleware-based solutions.
Why DynamoDB?

Scales automatically with request volume
Predictable performance under load (single-digit millisecond latency)
Per-key usage tracking with TTL-based expiry (automatic cleanup)

Implementation approach:

Each API key has a DynamoDB item with requests_today and last_reset_timestamp
Quota checks happen before signal logic executes
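TTL deletes expired quota records in the background; deletion is not immediate, so the quota check relies on last_reset_timestamp rather than on the item being absent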
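
One way to make the quota check atomic is a conditional update, so the increment and the limit check happen in a single DynamoDB write. The sketch below only builds the arguments for such a call. It is a variation on the layout described above, not the project's schema: it assumes one item per key per UTC day (instead of a `last_reset_timestamp`), and the `pk` and `expires_at` attribute names are hypothetical.

```python
import time
from datetime import datetime, timezone
from typing import Optional

SECONDS_PER_DAY = 86400


def build_quota_update(api_key: str, daily_limit: int, now: Optional[float] = None) -> dict:
    """Build keyword arguments for a DynamoDB update_item call on a Table resource."""
    now = time.time() if now is None else now
    day = datetime.fromtimestamp(now, tz=timezone.utc).strftime("%Y-%m-%d")
    return {
        # Hypothetical key schema: one counter item per API key per UTC day.
        "Key": {"pk": f"{api_key}#{day}"},
        # Increment the counter and set the TTL attribute once, in one write.
        "UpdateExpression": (
            "SET expires_at = if_not_exists(expires_at, :ttl) "
            "ADD requests_today :one"
        ),
        # The write is rejected once the counter has reached the limit.
        "ConditionExpression": (
            "attribute_not_exists(requests_today) OR requests_today < :limit"
        ),
        "ExpressionAttributeValues": {
            ":one": 1,
            ":limit": daily_limit,
            ":ttl": int(now) + 2 * SECONDS_PER_DAY,  # epoch seconds for TTL
        },
    }
```

With boto3 these arguments would be passed as `table.update_item(**kwargs)`. A rejected condition surfaces as a `ClientError` with the error code `ConditionalCheckFailedException`, which the API could map to HTTP 429.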

What this shows: Stateful, durable, horizontally scalable quota enforcement that does not depend on the memory of any single Lambda instance.

4️⃣ Why digest-pinned deployments (@sha256)?

CI/CD deployments update Lambda functions using digest-pinned container images (@sha256:abc123...) rather than mutable tags like latest.
Why this matters:

Each deployment is deterministic (exact image version is known)
Rollbacks reference known artifacts (not "whatever latest points to now")
Production never pulls unintended versions (no tag drift)

Example:
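# FUNCTION_NAME is a placeholder for the target Lambda function's name

# ❌ Bad: mutable tag
aws lambda update-function-code --function-name "$FUNCTION_NAME" --image-uri 123456789.dkr.ecr.us-east-1.amazonaws.com/intellpulse:latest

# ✅ Good: digest-pinned
aws lambda update-function-code --function-name "$FUNCTION_NAME" --image-uri 123456789.dkr.ecr.us-east-1.amazonaws.com/intellpulse@sha256:abc123...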
Tradeoff: Adds CI/CD complexity (pipeline must resolve digests)
Benefit: Significantly improves deployment safety and traceability
What this shows: Senior-level deployment discipline and understanding of immutable infrastructure.
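
The digest-resolution step that the pipeline takes on could look something like the following. This is a sketch, not the project's pipeline: the clients are passed in so the functions can be exercised without AWS access, and in practice they would be `boto3.client("ecr")` and `boto3.client("lambda")`. The function and parameter names are illustrative.

```python
import re

# A full image digest is "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def resolve_digest(ecr_client, repository: str, tag: str) -> str:
    """Look up the immutable digest that a tag points to right now."""
    resp = ecr_client.describe_images(
        repositoryName=repository,
        imageIds=[{"imageTag": tag}],
    )
    digest = resp["imageDetails"][0]["imageDigest"]
    if not DIGEST_RE.match(digest):
        raise ValueError(f"unexpected digest format: {digest!r}")
    return digest


def deploy_pinned(lambda_client, function_name: str, repository_uri: str, digest: str) -> str:
    """Point the function at repo@digest and return the URI that was deployed."""
    image_uri = f"{repository_uri}@{digest}"
    lambda_client.update_function_code(FunctionName=function_name, ImageUri=image_uri)
    return image_uri
```

Recording the returned URI for each deployment gives a rollback target that cannot change after the fact.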

5️⃣ Why separate staging and production environments?

Even for a relatively small backend service, separate staging and production environments were maintained.
Why staging matters:

Validate CI/CD changes before they reach production
Test deployment logic safely (digest resolution, rollback procedures)
Build confidence in changes before promotion

Implementation:

Staging uses its own Lambda function, DynamoDB table, and ECR repository
CI/CD pipeline deploys to staging first, then requires manual approval for production
Both environments use the same codebase but different AWS accounts (or separate regions)
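
The promotion rule described above can be expressed as a small gate: production only receives a digest that staging has already run, and only after approval. This is a minimal sketch under assumptions; the resource names and the `plan_promotion` helper are hypothetical, and a real pipeline would usually implement the approval step in the CI/CD system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    function_name: str  # hypothetical Lambda function name
    table_name: str     # hypothetical DynamoDB table name


STAGING = Environment("staging", "intellpulse-staging", "intellpulse-usage-staging")
PRODUCTION = Environment("production", "intellpulse-prod", "intellpulse-usage-prod")


def plan_promotion(staging_digest: str, staging_healthy: bool, approved: bool) -> dict:
    """Return the production deployment plan, or raise if a gate is not met."""
    if not staging_healthy:
        raise RuntimeError("staging checks failed; refusing to promote")
    if not approved:
        raise PermissionError("manual approval required for production")
    # Production gets exactly the digest that was validated in staging.
    return {"function_name": PRODUCTION.function_name, "digest": staging_digest}
```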

What this shows: Operational discipline that mirrors patterns used in larger systems and scales beyond single projects.

⚖️ Tradeoffs and Lessons Learned

Building IntellPulse reinforced the importance of intentional tradeoffs when designing production systems.

Backend-Only Approach

Tradeoff: No visual demo for non-technical users
Benefit: Full focus on correctness, security, and operational discipline
Lesson: An acceptable tradeoff when the target audience is API consumers, not end users

Lambda Function URLs vs API Gateway

Tradeoff: Fewer built-in features (no native throttling, no usage plans)
Benefit: Simpler architecture, faster iteration
Lesson: Understand service boundaries and avoid defaulting to heavier components when they're not required

Digest-Pinned Deployments

Tradeoff: Added CI/CD complexity upfront
Benefit: Removed the risk of a mutable tag pulling an unintended image into production
Lesson: Safety mechanisms pay off even at small scale

🚀 What's Next?

If I were extending this system further, the next improvements would focus primarily on operational maturity and developer ergonomics rather than on new features.

Planned enhancements:

  1. Infrastructure as Code (Terraform or AWS CDK) — eliminate manual AWS Console changes
  2. Structured observability (CloudWatch metrics, X-Ray tracing) — understand system behavior in production
  3. Lightweight UI — internal dashboard for monitoring usage and quotas
  4. Additional signal strategies — expand beyond simple BUY/HOLD/SELL logic
  5. Historical query support — allow clients to retrieve past signals

Importantly: These enhancements build on a stable foundation rather than compensating for architectural gaps.

📦 Source Code

The full implementation, including the FastAPI service, Lambda container setup, DynamoDB quota logic, and CI/CD pipeline, is available here:
GitHub: https://github.com/Gojaman/intellpulse

💡 Key Takeaways for Builders

Production-grade doesn't mean overengineered — choose the simplest solution that meets requirements
Deployment safety scales down — digest-pinned deploys and staging environments aren't just for "big systems"
Understand your constraints — Lambda Function URLs vs API Gateway isn't a "one is better" decision; it's context-dependent
Build for operators, not just users — quota enforcement, rollback procedures, and observability matter from day one

Want to discuss serverless architecture or trading system design? Connect with me:
📘 LinkedIn | 💻 GitHub | 🐦 X
