Moon Robert

Posted on • Originally published at blog.rebalai.com

Serverless vs Containers in 2026: Why I Stopped Treating It as a Binary Choice

About 14 months ago, my team of four migrated our entire backend — a fairly standard Node.js/Python mix serving a B2B SaaS product — fully onto AWS Lambda. We'd read the same blog posts you probably have. Pay for what you use, infinite scale, no servers to babysit. We were sold.

Six months later, two of our services were back in containers on ECS Fargate.

Not because Lambda failed us. It's more complicated than that. This post is my attempt to be honest about what actually happened, what the tradeoffs look like in practice in early 2026, and what I'd tell a team starting fresh today.

Why We Went All-In on Serverless (And What Actually Worked)

Before I get into the friction: serverless genuinely delivers for certain workloads, and I want to say that clearly before this turns into another "we went back to containers" post that implies the whole thing is overhyped. It isn't.

Our webhook processing pipeline — where we ingest events from third-party integrations and fan them out to customer-specific handlers — is still on Lambda and I have zero plans to change that. It's processing about 2-3 million invocations a day now, and the cost is roughly $40/month. The same workload on containers would require careful autoscaling configuration, and we'd almost certainly be over-provisioned most of the time because the traffic pattern is genuinely spiky: bursts of thousands of events followed by minutes of nothing.

The other thing serverless got right for us: the team doesn't have to think about it. Lambda functions deploy in under two minutes, they scale, they recover from errors automatically. For a four-person team where nobody has "DevOps" in their title, that operational simplicity is worth real money.

The tooling also got genuinely better between 2024 and now. AWS SAM plus GitHub Actions is a clean deployment story. The old pain of local Lambda testing has mostly been solved — sam local invoke is workable, not perfect, but I stopped complaining about it months ago.

The sweet spot for Lambda: irregular or unpredictable traffic, cold invocations that are tolerable for the use case, discrete bounded work, and a team small enough that operational overhead eats into actual product time.

Where the Serverless Story Started to Break

We ran our main user-facing API on Lambda for about four months. Authentication, data fetching, the synchronous endpoints our SaaS customers hit directly. And it worked — until 2am on a Tuesday, when our largest customer kicked off a bulk export job that slammed our database connection pool.

Lambda execution environments don't share connections: each concurrent environment opens its own, and every cold start means a brand-new one. When 200 environments spun up simultaneously and each tried to grab a database handle from our RDS instance, we had a very fun morning.

RDS Proxy helped. We implemented it and it did solve the immediate connection storm. But it added 3-5ms of latency per query in our benchmarks, and it added another managed service to reason about. Our connection pooling logic — which had been invisible when we ran a containerized API server — was now something we actively had to debug and configure.
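To make the connection behavior concrete, here's a minimal sketch of the module-scope connection pattern Lambda pushes you toward. The `connect()` function is a hypothetical stand-in for something like `psycopg2.connect(...)`; the point is that one environment reuses one connection across warm invocations, but 200 concurrent environments still mean 200 connections, which is the storm RDS Proxy absorbs.

```python
# Sketch of module-scope connection reuse on Lambda (connect() is a
# placeholder for a real driver call like psycopg2.connect).

connection_count = 0  # stands in for the load hitting the database

def connect():
    global connection_count
    connection_count += 1
    return {"id": connection_count}

_conn = None  # module scope survives warm invocations within ONE environment

def handler(event, context):
    global _conn
    if _conn is None:  # only a cold start pays the connection cost
        _conn = connect()
    return {"conn_id": _conn["id"]}

# Three warm invocations in the same environment reuse a single connection;
# the problem is that every CONCURRENT environment repeats this independently.
for _ in range(3):
    handler({}, None)
print(connection_count)  # 1
```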

The deeper issue was architectural. Lambda encourages stateless, short-lived compute, which is correct and good engineering, but our API had accumulated a few stateful patterns we hadn't noticed until serverless made them painful. In-memory caching with a warm LRU cache we'd been relying on without realizing it. Some SDK client initialization done lazily that assumed a long-lived process. You could argue we should have caught these earlier — fair — but the migration surfaced a whole class of assumptions we'd made about "the server" that didn't hold anymore. These weren't Lambda problems exactly. They were invisible debt that Lambda forced us to pay.

The Cold Start Problem in 2026 Is Better, But Not Gone

I genuinely thought cold starts were a solved problem when we made our migration decision. That was wrong.

Lambda SnapStart — originally Java-only — extended to Node.js 22 and Python 3.13 runtimes in late 2025. The basic idea: AWS snapshots your initialized function and restores from that snapshot instead of initializing from scratch. In practice, this brought our cold starts from 600-900ms down to 80-150ms for most functions. Real improvement.

But there are edge cases. SnapStart doesn't play nicely with certain SDK initialization patterns. We hit a weird issue where the AWS SDK v3 client caching behavior caused stale credential state in restored snapshots — silent auth failures for about 0.1% of cold-start invocations. Took us two days to track down. It's documented in a GitHub issue thread (aws/aws-lambda-snapstart-java #89, though the Node behavior lives in a comment thread rather than its own issue, which is... typical).

For Python-heavy ML inference, cold starts are still brutal. A Lambda function loading a scikit-learn model plus its dependencies is going to take 3-8 seconds on a cold start depending on model size. Lambda container image support helps — you can package up to 10GB now — but you're still paying the initialization cost every time a new instance spins up. I moved our ML inference endpoints to containers for exactly this reason: a persistent ECS service that keeps the model warm is just better for that use case, full stop.

Here's what the contrast looks like in actual code:

```python
# Lambda: webhook handler — stateless, spiky, exactly the right use case
import json
import boto3
from aws_lambda_powertools import Logger, Tracer

logger = Logger()
tracer = Tracer()

# Initialized once per lifecycle — SnapStart snapshots this state
sqs = boto3.client('sqs')

def process_webhook(record):
    # Fan the event out to the customer-specific handler (details omitted)
    return json.loads(record['body'])

@tracer.capture_lambda_handler
@logger.inject_lambda_context
def handler(event, context):
    records = event.get('Records', [])
    results = [process_webhook(r) for r in records]
    return {"processed": len(results)}
```
```python
# ECS: ML inference service — persistent process, model stays loaded in memory
from fastapi import FastAPI
import joblib
import numpy as np

app = FastAPI()

# This is the whole point: loads ONCE at container start, not on every invocation
# On Lambda, a cold start would re-load this 3-8 seconds every time
model = joblib.load('/app/models/churn_predictor_v3.pkl')

@app.post("/predict")
async def predict(features: dict):
    X = np.array([[features[k] for k in sorted(features)]])
    probability = model.predict_proba(X)[0][1]
    return {"churn_probability": float(probability)}
```

The difference is obvious when you lay it out this way. Lambda's cold start cost only becomes a real problem when you have heavy initialization — but "heavy initialization" turns out to describe a lot of production workloads.

Container Economics: When the Math Actually Flips

I spent too long assuming serverless was inherently cheaper. "Pay per invocation" sounds obviously better than always-running containers. The math gets interesting.

Our API sits at roughly 8 million requests per day. With 512MB functions averaging 50ms execution time:

  • Request charges: 240M/month at $0.20 per 1M requests → ~$48/month
  • Compute: 240M × 0.05s × 0.5GB = 6M GB-seconds → ~$100/month
  • Supporting services (RDS Proxy, NAT Gateway egress, X-Ray): ~$65/month

Total: roughly $215/month for Lambda.

Two Fargate tasks (1 vCPU, 2GB RAM each) running 24/7: about $140/month. With a simple autoscaling policy that steps to four tasks during business hours, you land at ~$170/month.

Same ballpark, with containers already a bit ahead. At 2x our current scale the gap widens: warm instances mean consistent p95 latency, persistent connection pools eliminate the RDS Proxy overhead, and we can be more precise about autoscaling than Lambda's concurrency model allows.

This math assumes someone actively tuning Fargate task sizes and autoscaling thresholds, though. That's real work. If your team doesn't have the bandwidth for it, the serverless model's operational simplicity is itself worth something — I wouldn't optimize purely for raw infrastructure cost if the alternative is your engineers spending their afternoons staring at CloudWatch dashboards.
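If you want to run the crossover math for your own numbers, the model is simple enough to fit in a few lines. This sketch uses us-east-1 list prices for compute only (no ALB, NAT, or logging charges, which is most of the gap between the raw Fargate compute number it prints and a real bill), so treat its output as a floor, not a forecast.

```python
# Back-of-envelope Lambda vs Fargate compute cost model.
# Prices are us-east-1 list prices; check your own bill before trusting this.

LAMBDA_PER_REQ = 0.20 / 1_000_000   # $ per request
LAMBDA_PER_GBS = 0.0000166667       # $ per GB-second
FARGATE_VCPU_HR = 0.04048           # $ per vCPU-hour
FARGATE_GB_HR = 0.004445            # $ per GB-hour

def lambda_monthly(req_per_day, mem_gb=0.5, avg_sec=0.05, overhead=65):
    """Monthly Lambda cost; overhead covers RDS Proxy / NAT / X-Ray etc."""
    reqs = req_per_day * 30
    compute = reqs * avg_sec * mem_gb * LAMBDA_PER_GBS
    return reqs * LAMBDA_PER_REQ + compute + overhead

def fargate_monthly(tasks, vcpu=1, mem_gb=2):
    """Compute-only Fargate cost for always-on tasks (730 hrs/month)."""
    return tasks * 730 * (vcpu * FARGATE_VCPU_HR + mem_gb * FARGATE_GB_HR)

print(f"Lambda  @ 8M req/day:  ${lambda_monthly(8_000_000):,.0f}/month")
print(f"Lambda  @16M req/day:  ${lambda_monthly(16_000_000):,.0f}/month")
print(f"Fargate, 2 tasks 24/7: ${fargate_monthly(2):,.0f}/month (compute only)")
```

The useful part isn't the exact dollar figures, it's seeing that Lambda's cost scales linearly with traffic while the always-on container cost is flat until you add tasks.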

What I'd Actually Recommend

I pushed our team to go all-serverless partly because I was excited and partly because I'd read too many AWS blog posts written by people whose job is to sell you AWS services. That's not a knock on the technology — just useful context for how those blog posts are framed.

For event-driven async workloads — webhooks, queue consumers, scheduled jobs, file processing pipelines — Lambda is genuinely the right default. Traffic is irregular, the work is discrete, and the operational overhead is low enough that a small team can mostly forget it exists. That's a real win.

For user-facing synchronous APIs, it depends. Latency requirements under 100ms p99, heavy in-memory state, ML model serving, or traffic steady enough to keep instances warm — containers are probably the right call. ECS Fargate is my default recommendation there. You don't need to manage EC2 instances unless your infra team is actually sized for that work; Fargate hits the sweet spot.

The hybrid architecture isn't a cop-out. It's how most mature backend systems actually end up, because different workloads have genuinely different characteristics. My mistake was treating this as either/or — that's a framing problem, not a technical one.

The "no servers to manage" promise of serverless is real, but it trades server management for function management: cold start tuning, concurrency limits, timeout edge cases, VPC routing. Lower stakes, but not zero stakes. A Lambda function silently timing out at 29 seconds on an edge case is harder to notice than a container dropping out of your load balancer's health check rotation. I've experienced both, and neither is fun at 2am.

Our current setup: Lambda for async pipelines, Fargate for synchronous APIs, shared infrastructure (VPC, RDS, ElastiCache) that both can reach. Four engineers, two environments, one YAML-heavy afternoon to wire up the networking. That's the architecture I'd start with if I were doing it again.
