We shipped 40K lines of Lambda code in production last year. Here's what we learned about building real systems on serverless.
The Setup
BuildFlags and Buildr HQ both run on Lambda + DynamoDB. No EC2. No Kubernetes. Just functions and databases that scale on demand.
People always ask: "Doesn't that get expensive?"
Not if you design for it.
Why Lambda + DynamoDB Works
Speed
- Deploy in seconds, not minutes
- Iterate fast because you're not managing infrastructure
- Scaling is automatic — you don't think about capacity planning
Cost
- Pay for what you use: compute seconds, database capacity
- Zero idle cost if nobody's using your app
- Stops the "We have 100 servers sitting idle" problem
Reliability
- AWS manages availability zones for you
- DynamoDB replicates automatically
- Built-in monitoring and logging (CloudWatch)
The Gotchas (And How We Fixed Them)
1. Cold Starts (The Biggest Complaint)
The problem: Lambda takes 1-3 seconds to start if the function hasn't run recently.
Why it happens: Lambda freezes your container when not in use. First request has to thaw it.
Our solution:
- Use Node.js 20 (faster than older runtimes)
- Keep bundle size under 50MB (we're at 15MB)
- Use native AWS SDK (faster than third-party clients)
- Don't import unused libraries (tree-shaking in esbuild)
- Provisioned concurrency for critical functions — it keeps instances warm, unlike reserved concurrency, which only caps how many can run ($5-10/month for peace of mind)
The real talk: Cold starts suck. But they happen <5% of the time in production. We cache responses and use CloudFront to hide latency.
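Bundle size and tree-shaking come down to the build step. Here's a minimal esbuild config sketch for a Lambda bundle — the entry path and output names are illustrative, not our actual layout:

```javascript
// build.mjs — bundle a Lambda handler with esbuild (paths are hypothetical)
import { build } from "esbuild";

await build({
  entryPoints: ["src/handler.ts"],
  bundle: true,          // inline dependencies into one file
  treeShaking: true,     // drop unused exports
  minify: true,
  platform: "node",
  target: "node20",      // match the Lambda runtime
  format: "esm",
  outfile: "dist/handler.mjs",
  // The Node.js 20 runtime ships AWS SDK v3, so it can stay out of the bundle
  external: ["@aws-sdk/*"],
});
```

Marking the SDK as external is what keeps our bundle at 15MB instead of dragging every client in.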
2. DynamoDB Query Patterns
The problem: Write your first queries wrong and you get throttled or pay 10x.
Why it happens: Single-table design looks simple until you realize you need 5 different access patterns.
Our solution — Composite Sort Keys:
Instead of separate tables, use one table with smart keys
PK pattern: ENTITY_TYPE#id
SK pattern: ATTRIBUTE#timestamp or ATTRIBUTE#reference#sort
Example: flags table
PK: WORKSPACE#workspace-123
SK: FLAG#flag-key | FLAG#flag-key#created-at | ENV#dev
This lets you query:
- All flags in a workspace (PK + SK begins_with "FLAG#")
- All flags created after timestamp (PK + SK begins_with "FLAG#" + sort)
- All envs for a workspace (PK + SK begins_with "ENV#")
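The key patterns above can be sketched as small helpers plus plain Query parameters. Table and ID names here are illustrative; the SDK client call itself is omitted:

```javascript
// Composite-key builders for the single-table design described above
const pk = (workspaceId) => `WORKSPACE#${workspaceId}`;
const flagSk = (flagKey, createdAt) =>
  createdAt ? `FLAG#${flagKey}#${createdAt}` : `FLAG#${flagKey}`;
const envSk = (env) => `ENV#${env}`;

// Parameters for "all flags in a workspace" — a plain object, as passed
// to DynamoDB's Query operation via the AWS SDK
function allFlagsQuery(workspaceId, tableName) {
  return {
    TableName: tableName,
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)",
    ExpressionAttributeValues: {
      ":pk": { S: pk(workspaceId) },
      ":prefix": { S: "FLAG#" },
    },
  };
}
```

Swap the `:prefix` value to `"ENV#"` and the same shape answers the "all envs for a workspace" pattern — one table, one query shape, several access patterns.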
Cost impact: separate tables each carry their own capacity and storage overhead. One table with composite keys cut our DynamoDB bill roughly in half.
3. Transaction Limits
The problem: DynamoDB transactions have limits you'll hit.
Why it matters: TransactWriteItems tops out at 100 items (raised from 25 in 2022) and 4 MB per transaction.
Our solution:
- Keep transactions small (2-5 items max)
- Use conditional writes instead of transactions when possible
- For complex operations, use Step Functions to orchestrate multi-step workflows
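Here's what "conditional write instead of transaction" looks like in practice — create a flag only if it doesn't already exist. These are plain PutItem parameters (SDK client omitted); table and attribute names are illustrative:

```javascript
// Uniqueness guarantee via a conditional write — no transaction needed
function createFlagParams(tableName, workspaceId, flagKey) {
  return {
    TableName: tableName,
    Item: {
      PK: { S: `WORKSPACE#${workspaceId}` },
      SK: { S: `FLAG#${flagKey}` },
      enabled: { BOOL: false },
    },
    // Fails with ConditionalCheckFailedException instead of silently
    // overwriting an existing flag
    ConditionExpression:
      "attribute_not_exists(PK) AND attribute_not_exists(SK)",
  };
}
```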
4. Exponential Backoff
The problem: When DynamoDB throttles you, hammering it harder makes it worse.
Our solution: AWS SDK v3 already retries with exponential backoff and jitter (standard retry mode) — tune `maxAttempts` on the client rather than rolling your own retry loop.
Real talk: If you're hitting throttling regularly, your design is wrong. Throttling usually means:
- Partition key isn't distributed (you're hammering one shard)
- You need On-Demand pricing instead of Provisioned
- Your query pattern doesn't fit the table design
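For the rare case where you do need your own retry loop (e.g. around a batch job the SDK doesn't cover), this is the shape: exponential delay, capped, with full jitter. The error-name check is illustrative of DynamoDB's throttling exceptions:

```javascript
// Exponential backoff with full jitter: delay doubles per attempt,
// is capped, then randomized so retries don't arrive in lockstep
function backoffDelayMs(attempt, baseMs = 100, capMs = 5000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exp); // uniform in [0, exp)
}

// Retry wrapper sketch: retries only throttling errors, rethrows the rest
async function withRetries(fn, maxAttempts = 5) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const throttled =
        err.name === "ProvisionedThroughputExceededException" ||
        err.name === "ThrottlingException";
      if (!throttled || attempt >= maxAttempts - 1) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

The jitter matters as much as the exponent: without it, every throttled client retries at the same instant and re-creates the spike that caused the throttling.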
5. Lambda Payload Limits
The problem: Lambda caps payloads at 6MB for synchronous invocations (256KB for async ones).
Why it matters: Bulk operations can exceed this.
Our solution:
- Stream large files instead of loading them whole
- Paginate results (DynamoDB LastEvaluatedKey)
- Use S3 for temporary storage if you need to pass large data between Lambda functions
- Chunk bulk operations into 1K item batches
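The pagination point is the one people trip on, so here's the loop: DynamoDB returns LastEvaluatedKey when a page was truncated, and you feed it back as ExclusiveStartKey. The query function is injected here so the sketch stays SDK-free:

```javascript
// Drain a DynamoDB Query one page at a time via LastEvaluatedKey.
// `runQuery` stands in for the SDK's Query call.
async function queryAllPages(runQuery, baseParams) {
  const items = [];
  let ExclusiveStartKey; // undefined on the first request
  do {
    const page = await runQuery({ ...baseParams, ExclusiveStartKey });
    items.push(...page.Items);
    ExclusiveStartKey = page.LastEvaluatedKey; // undefined on the last page
  } while (ExclusiveStartKey);
  return items;
}
```

In a Lambda handler you'd usually return one page per request and hand LastEvaluatedKey to the client as a cursor, rather than accumulating everything like this helper does.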
Our Production Patterns
API Gateway → Lambda → DynamoDB
HTTP requests flow through:
- CloudFront caches responses at the edge; cache misses hit API Gateway
- The Lambda handler (Node.js) parses, validates, and queries DynamoDB
- JSON comes back in 50-200ms warm, 1-3s on a cold start
DynamoDB queries themselves run in 20-50ms, and we cache aggressively so most requests never reach Lambda at all.
Bulk Operations (Batch Writes)
Large datasets get uploaded to S3, Lambda reads and batch writes to DynamoDB in 25-item chunks, logs success.
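The chunking step is mechanical — BatchWriteItem accepts at most 25 items per call, so the dataset gets sliced first:

```javascript
// Slice a bulk dataset into BatchWriteItem-sized chunks (max 25 items)
function chunk(items, size = 25) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Each chunk then becomes one BatchWriteItem request (SDK call omitted):
// { RequestItems: { [tableName]: batch.map((item) => ({ PutRequest: { Item: item } })) } }
```

One wrinkle worth noting: BatchWriteItem can return UnprocessedItems even on a 200 response, and those need to be retried (with the backoff pattern from earlier).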
Scheduled Jobs (Cron)
EventBridge triggers Lambda daily. Example: expire trial users by querying all users where trial_ends < today and updating their status.
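The trial-expiry query might look like the sketch below. The GSI name and the `status`/`trial_ends` attributes are assumptions for illustration — the point is that "all users whose trial ended" needs an index keyed for that access pattern, or it becomes a scan:

```javascript
// Query parameters for the daily trial-expiry job. Assumes a hypothetical
// GSI with PK = status and SK = trial_ends.
function expiredTrialsQuery(tableName, todayIso) {
  return {
    TableName: tableName,
    IndexName: "status-trial_ends-index", // hypothetical GSI
    KeyConditionExpression: "#s = :trialing AND trial_ends < :today",
    ExpressionAttributeNames: { "#s": "status" }, // "status" is a DynamoDB reserved word
    ExpressionAttributeValues: {
      ":trialing": { S: "TRIALING" },
      ":today": { S: todayIso },
    },
  };
}
```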
Cost Breakdown (Real Numbers)
BuildFlags production (100 monthly active users):
- Lambda: $0.50/month (1M requests averaging 200ms each)
- DynamoDB: $5–$10/month (200 reads, 50 writes per day, on-demand pricing)
- CloudFront: $2–$3/month (caching static assets)
- Data transfer: $0 (under 1GB)
Total: ~$8–$13/month for production infra.
Compare to traditional servers: EC2 + RDS = $50–$500/month minimum.
When NOT to Use Lambda + DynamoDB
- Long-running jobs (>15 min): Use Batch or ECS instead
- Complex SQL queries: DynamoDB is not relational. Use RDS if you need JOINs
- Real-time streaming: Better with Kafka or Kinesis
- ML training: Use SageMaker, not Lambda
- File processing (large files): Consider Step Functions + ECS
The Verdict
Lambda + DynamoDB is the fastest way to ship production code. It's not the cheapest at scale (that's your own data center), but it's the best for startups, SaaS teams, and rapid iteration.
We've shipped millions in ARR on this stack. It works.