Michael Masterson

Originally published at m2s2.io

The Truth About Serverless APIs Nobody Mentions

I've noticed serverless conversations tend to split into two camps:

People who think it solves everything, and people who think it's a mistake.

The reality is much more useful than either extreme.

Everyone who sells you on serverless leads with the same three points: no servers to manage, automatic scaling, and you only pay for what you use.

All of that is technically true.

None of it is the whole story.

I've shipped serverless APIs in production — Go Lambdas behind API Gateway, the kind that power real user-facing products. I'm not here to talk you out of it. For the right workload, it's genuinely excellent.

But the gaps between the pitch and the reality are specific, and knowing them in advance changes the decisions you make early, when they're still cheap to change.


Cold starts are real — just not how you think

Lambda cold starts happen when AWS needs to provision a new execution environment for your function. The runtime boots, your init code runs, and then the handler executes.

For a Go function on arm64, that's typically 50–150ms total. For Node.js or Python with heavy dependencies, it can be 500ms–2 seconds.

The part nobody leads with: cold starts are probabilistic, not guaranteed.

Under steady traffic, most invocations hit warm instances. You'll rarely see them in your P50 or even P95. Where they show up is P99 and beyond — the tail latency your best users experience right after a traffic spike, or the first request after a quiet weekend.

The practical advice: profile in production with real traffic before you add Provisioned Concurrency. It costs money and adds complexity.

For most APIs, the smarter move is optimizing your init code (a sketch follows the list):

  • lazy-load dependencies
  • keep your binary small
  • initialize only what the first request actually needs
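
Concretely, here's a minimal sketch of that last point using sync.Once, assuming the aws-lambda-go and aws-sdk-go-v2 libraries, with a DynamoDB client standing in for whatever your heaviest dependency is:

```go
package main

import (
	"context"
	"sync"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
)

var (
	ddbOnce sync.Once
	ddb     *dynamodb.Client
)

// dynamoClient builds the client on first use instead of in init code,
// so a cold start only pays for what the first request actually needs.
func dynamoClient(ctx context.Context) *dynamodb.Client {
	ddbOnce.Do(func() {
		cfg, err := config.LoadDefaultConfig(ctx)
		if err != nil {
			panic(err) // fail fast; the next invocation gets a fresh environment
		}
		ddb = dynamodb.NewFromConfig(cfg)
	})
	return ddb
}

func handler(ctx context.Context, req events.APIGatewayV2HTTPRequest) (events.APIGatewayV2HTTPResponse, error) {
	db := dynamoClient(ctx) // initialized once per environment, on demand
	_ = db                  // ...use db for the actual work
	return events.APIGatewayV2HTTPResponse{StatusCode: 200, Body: "ok"}, nil
}

func main() {
	lambda.Start(handler)
}
```

The client is still reused across warm invocations; the only change is that a cold start no longer pays for dependencies the first request never touches.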

Fix the actual problem before reaching for the expensive solution.


The billing math will surprise you

Lambda pricing is simple on paper: you pay per request and per GB-second of execution time.

At low traffic, it's nearly free.

The surprise comes at scale — or in specific usage patterns.

The scenario that bites teams most often: a Lambda that does meaningful work per invocation. Parsing a large payload, doing in-memory transformation, making multiple downstream calls.

These functions run longer, use more memory, and the cost per invocation adds up faster than the pricing page suggests.

A few numbers from real workloads:

  • A lightweight REST Lambda (Go, 128MB, 20ms avg): ~$0.40 per million requests
  • A heavier processing Lambda (Node, 512MB, 800ms avg): ~$8.00 per million requests

That 20× difference in cost per million requests adds up quickly when you have real traffic.
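
If you want to sanity-check your own functions, the formula is short enough to script. A back-of-envelope sketch, assuming the published x86 us-east-1 rates at the time of writing (verify against the current pricing page), and covering only the raw Lambda charge; CloudWatch Logs ingestion and other per-service costs come on top:

```go
package main

import "fmt"

// Published us-east-1 x86 rates at the time of writing; treat these as
// assumptions and check the current AWS pricing page before relying on them.
const (
	requestUSD  = 0.20 / 1_000_000 // per request
	gbSecondUSD = 0.0000166667     // per GB-second of execution
)

// costPerMillion estimates the raw Lambda charge for one million
// invocations at a given memory size and average billed duration.
func costPerMillion(memoryGB, avgSeconds float64) float64 {
	compute := 1_000_000 * memoryGB * avgSeconds * gbSecondUSD
	requests := 1_000_000 * requestUSD
	return compute + requests
}

func main() {
	// Hypothetical mid-weight function: 256MB, 100ms average.
	fmt.Printf("~$%.2f per million requests\n", costPerMillion(0.25, 0.100))
}
```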

It's still cheap compared to running servers — but the assumption that serverless is always the low-cost option needs to be verified, not assumed.

API Gateway pricing is also frequently missed. At $3.50 per million requests for REST APIs (HTTP APIs run closer to $1.00 per million), a high-traffic public endpoint can make Gateway cost more than the Lambda it fronts.

Know the full cost model before you commit.


Debugging is a different skill

Local development for Lambda is awkward.

You can simulate the execution environment with tools like AWS SAM or custom test harnesses, but you're never truly running the same thing that runs in production.

The gap shows up in exactly the moments you can least afford it.

Observability requires deliberate setup.

A Lambda that exits silently leaves no trace unless you've instrumented it. CloudWatch Logs are the floor, not the ceiling — you need structured logging, correlation IDs, and ideally X-Ray tracing before you're in production, not after something breaks.
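
A minimal sketch of that baseline in Go, assuming the aws-lambda-go library and the standard library's log/slog:

```go
package main

import (
	"context"
	"log/slog"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-lambda-go/lambdacontext"
)

// JSON logs land in CloudWatch as structured fields that
// Logs Insights can filter and aggregate on directly.
var logger = slog.New(slog.NewJSONHandler(os.Stdout, nil))

func handler(ctx context.Context, req events.APIGatewayV2HTTPRequest) (events.APIGatewayV2HTTPResponse, error) {
	// Attach the invocation's request ID to every line so a single
	// query can pull the full story of one failed request.
	var requestID string
	if lc, ok := lambdacontext.FromContext(ctx); ok {
		requestID = lc.AwsRequestID
	}
	log := logger.With("requestId", requestID, "route", req.RouteKey)

	log.Info("request received")
	// ...actual handler work, logging through `log` throughout
	return events.APIGatewayV2HTTPResponse{StatusCode: 200, Body: "ok"}, nil
}

func main() {
	lambda.Start(handler)
}
```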

The failure modes are also harder to reproduce.

A Cognito authorizer rejecting a token before your code runs looks different from your code returning a 401. A DynamoDB throttle on a hot partition looks different from a Lambda timeout.

Getting fast at debugging serverless systems means learning the AWS console deeply:

  • CloudWatch Insights queries (example below)
  • X-Ray service maps
  • reading IAM policy errors without a stack trace to guide you
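
The first of those earns its keep quickly. Lambda's REPORT lines expose @initDuration only on cold starts, so a Logs Insights query like this sketch (adapt the time bin to your traffic) tells you how often cold starts actually hit:

```
filter @type = "REPORT" and ispresent(@initDuration)
| stats count(*) as coldStarts, avg(@initDuration) as avgInitMs by bin(1h)
```
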

Statelessness is a constraint, not just a pattern

Every Lambda invocation has to be written as if it starts with a clean slate. Warm environments do carry state between requests, but nothing guarantees the next invocation lands on one.

That's the feature — it's what makes horizontal scaling effortless.

It's also the constraint that eliminates entire categories of architectural approaches.

No in-memory caches you can rely on across requests. No background jobs that run between invocations. No WebSocket connections that stay open. No files written to disk that the next request can count on reading.

If your application assumes any of those things, serverless isn't wrong — but it's not your foundation. It's a piece of a larger architecture that needs something else alongside it.
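
One nuance before you reach for that something else: package-level state does survive while an execution environment stays warm, so it can serve as an opportunistic cache, never a source of truth. A sketch, with a hypothetical fetchFromStore standing in for DynamoDB or S3:

```go
package main

import (
	"context"
	"fmt"
)

// Hypothetical backing store lookup; stands in for DynamoDB, S3, etc.
func fetchFromStore(ctx context.Context, key string) (string, error) {
	return "value-for-" + key, nil
}

// Package-level state survives only while an execution environment stays
// warm. Any invocation may land on a fresh environment with an empty map,
// so a miss here must always have a correct fallback.
var warmCache = map[string]string{}

func lookup(ctx context.Context, key string) (string, error) {
	if v, ok := warmCache[key]; ok {
		return v, nil // warm hit: this environment served an earlier request
	}
	v, err := fetchFromStore(ctx, key) // cold path must stay correct on its own
	if err != nil {
		return "", err
	}
	warmCache[key] = v
	return v, nil
}

func main() {
	v, _ := lookup(context.Background(), "feature-flags")
	fmt.Println(v)
}
```

Each environment handles one request at a time, so the map needs no locking; the discipline is that the cold path has to stay correct on its own.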

This comes up most often when a team discovers mid-project that a new feature needs persistent state.

Adding ElastiCache or SQS to a serverless architecture is straightforward — but it's a different architecture than what you started with, and the integration work is non-trivial.

The most dangerous part is that many of these problems don't show up until the system already matters.


Vendor lock-in is real, and that's okay

A Lambda function isn't a Docker container.

It doesn't run the same way everywhere else.

Your Go binary targeting provided.al2023 on arm64 can technically run in other environments, but your API Gateway routes, Cognito authorizers, DynamoDB single-table design, and SES integration are AWS-specific throughout.

I'm not arguing against it — I use all of those services in production.

But it's worth being honest:

If you go deep on serverless AWS, migrating later means a substantial rewrite.

Make that decision deliberately, not by default.


When not to use it

Serverless is the wrong tool when:

  • You have persistent connections — WebSocket servers, gRPC streams, database connection pools
  • You have long-running jobs — Lambda has a 15-minute max timeout
  • You have very high, sustained concurrency — account-level concurrency limits (1,000 concurrent executions per Region by default, shared by every function) can affect multiple services simultaneously
  • You need low and predictable latency at P99 — cold-start tail latency is difficult to eliminate completely


When it's the right call

Event-driven workloads.

APIs with bursty or unpredictable traffic.

Background processing triggered by S3 uploads, SQS messages, or DynamoDB streams.
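
That last category is where the managed event-source plumbing does the heavy lifting. A minimal sketch of an SQS-triggered consumer, assuming the aws-lambda-go library:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// Lambda polls the queue for you and invokes this handler with
// batches of messages; there is no worker fleet to run.
func handler(ctx context.Context, ev events.SQSEvent) error {
	for _, msg := range ev.Records {
		log.Printf("processing message %s: %s", msg.MessageId, msg.Body)
	}
	return nil // returning nil acknowledges the whole batch
}

func main() {
	lambda.Start(handler)
}
```

Lambda handles the polling, scaling, and retries of failed batches; the function body is all that's left to write.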

Any workload where you don't want to manage capacity and the per-request cost model fits your traffic profile.

The API powering this site is a set of Go Lambdas behind API Gateway.

Contact form submissions, resume requests, visitor analytics, admin dashboard — all of it.

It handles the traffic I need, costs less than $1/month to run, and I've never been paged for a server that went down.

For that workload, it's the right call.


The key is understanding what you're actually trading.

Serverless isn't simpler than traditional infrastructure.

It's differently complex.

The operational burden shifts from server management to observability, IAM, and distributed system debugging.

That trade is absolutely worth it in the right context.

Go in with your eyes open, and it's one of the most productive architectures available.

The right architecture should help your team maintain momentum as the system grows — not create friction six months later.

At M²S² Engineering Group, we help teams make these decisions before they become expensive to unwind.

