
ANKUSH CHOUDHARY JOHAL

Posted on • Originally published at johal.in

Retrospective: 3 Years of Using AWS Lambda for Serverless APIs at 1M Request Scale


Three years ago, our team migrated our customer-facing API from a containerized ECS cluster to AWS Lambda, aiming to reduce operational overhead and align costs with traffic. Today, that API handles a steady 1 million requests per month (peaking at 5x that during seasonal events) with 99.99% uptime. This retrospective breaks down what worked, what didn’t, and the hard-won lessons we’d share with any team adopting Lambda for high-traffic APIs.

Why We Chose Lambda in the First Place

Our legacy ECS setup required constant capacity planning: we over-provisioned for peak traffic 80% of the time, and scaling events took 5+ minutes to spin up new tasks, leading to throttled requests during traffic spikes. Lambda promised three key wins:

  • Zero capacity planning: Automatic scaling to match traffic, with no idle resource costs.
  • Reduced ops burden: No patching, no load balancer config, no task health checks to manage.
  • Cost alignment: Pay only for execution time, which we estimated would cut our API hosting costs by 40% for our variable traffic patterns.

Initial benchmarks validated these promises: our test Lambda functions responded 20% faster than ECS tasks (thanks to API Gateway's edge-optimized endpoints), and we completed the migration in 2 weeks with zero downtime.

Scaling to 1M Requests: What Exceeded Expectations

Lambda delivered on most of its early promises as we scaled from 10k to 1M monthly requests:

  • Automatic scaling worked flawlessly: We never hit concurrency limits after setting up reserved concurrency for critical endpoints, and burst traffic (like our Black Friday 2022 spike to 5M requests) was handled without manual intervention.
  • Cost savings were real: Our monthly API hosting costs dropped 42% compared to ECS, even as traffic grew 100x. We attribute this to Lambda’s per-request pricing and eliminating idle ECS cluster costs.
  • Observability improved: Native integration with CloudWatch Logs, X-Ray, and Lambda Insights gave us granular visibility into cold starts, execution time, and error rates per endpoint, which was harder to configure with ECS.
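The reserved concurrency mentioned above can be declared in Terraform (which we use for all infrastructure). This is an illustrative sketch only; the function name, limit, and supporting resource names are assumptions, not our actual config:

```hcl
# Illustrative only. Reserving concurrency guarantees this function can always
# scale to 100 concurrent executions, while also capping it there so a runaway
# endpoint can't starve the rest of the account's concurrency pool.
resource "aws_lambda_function" "checkout" {
  function_name                  = "checkout-api"
  handler                        = "index.handler"
  runtime                        = "nodejs18.x"
  role                           = aws_iam_role.lambda_exec.arn
  filename                       = "checkout.zip"
  reserved_concurrent_executions = 100
}
```

Note that reserved concurrency is both a floor and a ceiling: the function always has that capacity available, and can never exceed it.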

Lessons Learned: The Pitfalls We Hit (So You Don’t Have To)

Not everything was smooth. We hit three major pain points in our first 18 months:

Cold Starts Are Real (and Fixable)

Early on, 15% of our requests to infrequently used endpoints had cold start times of 800ms+, which violated our 200ms p95 latency SLA. We fixed this by:

  • Using provisioned concurrency for high-priority endpoints.
  • Switching to lighter runtimes (Node.js 18 instead of Java 11), which cut cold starts by 60%.
  • Trimming deployment package sizes from 50MB to 8MB by removing unused dependencies.
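For reference, provisioned concurrency can be attached to a published version or alias in Terraform. A hedged sketch, assuming a `checkout` function with a `live` alias (both illustrative names, not from our stack):

```hcl
# Illustrative only. Keeps 10 execution environments initialized and warm for
# the aliased version, so requests routed to it never pay a cold-start penalty.
resource "aws_lambda_provisioned_concurrency_config" "checkout_warm" {
  function_name                     = aws_lambda_function.checkout.function_name
  qualifier                         = aws_lambda_alias.live.name
  provisioned_concurrent_executions = 10
}
```

Unlike reserved concurrency, provisioned concurrency bills for the warm environments whether or not they serve traffic, so we only apply it to latency-sensitive endpoints.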

API Gateway Throttling Is a Hidden Bottleneck

We assumed Lambda scaling would be our only limit, but API Gateway’s default account-level throttle (10k requests per second) caught us off guard during a traffic spike. We had to request a quota increase and implement request-level throttling per endpoint to avoid global throttling.
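Per-endpoint throttling can be expressed as API Gateway stage method settings in Terraform. A minimal sketch, assuming a `search` resource on a `prod` stage (names and limits are illustrative, not our real values):

```hcl
# Illustrative only. Caps one hot endpoint so a spike on it can't consume the
# shared account-level throttle and starve every other API in the region.
resource "aws_api_gateway_method_settings" "search_throttle" {
  rest_api_id = aws_api_gateway_rest_api.api.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  method_path = "search/GET"

  settings {
    throttling_rate_limit  = 500  # steady-state requests per second
    throttling_burst_limit = 1000 # short burst capacity
  }
}
```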

Vendor Lock-In Is Manageable (But Real)

Lambda’s tight integration with AWS services (DynamoDB, S3, SNS) made development fast, but migrating a single function to another platform would require rewriting IAM roles, event source mappings, and API Gateway integrations. We mitigated this by keeping business logic decoupled from AWS SDK calls where possible.
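The decoupling pattern looks roughly like this: business logic lives in a pure function with no AWS imports, and the Lambda handler is a thin adapter that only translates the API Gateway event shape. A minimal sketch with illustrative names (the `calculateOrderTotal` domain logic is a stand-in, not from our codebase):

```typescript
// Pure business logic: no AWS SDK, trivially unit-testable, portable to any
// platform (container, another FaaS, a plain HTTP server).
export function calculateOrderTotal(
  items: { price: number; qty: number }[]
): number {
  return items.reduce((sum, i) => sum + i.price * i.qty, 0);
}

// Thin adapter: the only code that knows about API Gateway's event format.
// Migrating off Lambda means rewriting this wrapper, not the logic above.
export function handler(event: { body: string }): {
  statusCode: number;
  body: string;
} {
  const items = JSON.parse(event.body).items;
  return {
    statusCode: 200,
    body: JSON.stringify({ total: calculateOrderTotal(items) }),
  };
}
```

The same separation applies to event sources: SNS and DynamoDB Streams handlers parse the AWS-specific envelope at the edge, then call the same pure functions.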

Best Practices We Adopted by Year 3

After three years, we’ve standardized on these practices for all Lambda-based APIs:

  • Keep functions single-purpose: Each function handles one API endpoint or event type, with deployment packages under 10MB.
  • Use infrastructure as code (IaC): All Lambda functions, API Gateway config, and IAM roles are defined in Terraform, with automated CI/CD pipelines for deployment.
  • Monitor cold starts and concurrency: We alert on cold start rates above 5% and concurrency usage above 80% of reserved limits.
  • Implement idempotency for mutations: All POST/PUT/DELETE endpoints use idempotency keys to avoid duplicate processing during retries, which Lambda’s automatic retries can trigger.
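The idempotency-key flow in the last bullet can be sketched as follows. In production we back the key store with conditional writes to a database (e.g. DynamoDB); the in-memory `Map` here only illustrates the control flow, and all names are assumptions:

```typescript
type Result = { status: "created" | "duplicate"; value: string };

// Stand-in for a durable key store; a real implementation needs a conditional
// write (put-if-absent) so concurrent retries can't both win.
const seen = new Map<string, string>();

// Run a mutation at most once per idempotency key. A retried request with the
// same key returns the cached result instead of repeating side effects.
export function processOnce(
  idempotencyKey: string,
  run: () => string
): Result {
  const cached = seen.get(idempotencyKey);
  if (cached !== undefined) {
    return { status: "duplicate", value: cached };
  }
  const value = run();
  seen.set(idempotencyKey, value);
  return { status: "created", value };
}
```

Clients send the key in a header (e.g. `Idempotency-Key`), so a Lambda retry or a client-side network retry resolves to the original response rather than a second charge or duplicate record.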

Is Lambda Still the Right Choice for 1M+ Request Scale?

For our use case: yes. We evaluated migrating back to ECS or EKS in 2023, but found that Lambda’s operational simplicity and cost efficiency still outweighed the tradeoffs. For teams with steady, predictable traffic, containerized solutions may be cheaper, but for variable traffic (like our seasonal spikes), Lambda’s pay-per-use model is unbeatable.

Conclusion

Three years in, AWS Lambda has delivered on its core promises for our serverless API, even as we hit and solved scaling challenges. The key to success? Treat Lambda as a managed service, not a magic bullet: invest in observability, plan for cold starts, and decouple business logic from AWS-specific code. For teams scaling to 1M+ requests, Lambda is a viable, low-ops choice—if you learn from the pitfalls we hit first.
