Joanne Skiles for AWS Community Builders

Posted on May 25

Serverless Mental Models: What They Don't Tell You Before You Build

#serverless #aws #cloudcomputing #tutorial

Most serverless tutorials teach you how to deploy a Lambda function.

This article teaches you how to think about serverless, so you can make good architecture decisions instead of just following steps and hoping for the best.

I put together a four-part YouTube mini-series called Serverless Mental Models. This article covers the same ground, with each section anchored to the video if you want to go deeper, see the code, or watch the concepts play out live.

The Four Serverless Mental Models Videos

Serverless has real limits, and you should know them before you commit.
Stateless doesn't mean simple. It means the complexity moves outward, to the edges.
Serverless is about failure domains. Being able to reason about the "blast radius" of a failure is the real value proposition.
Build it, then break it on purpose.

Mental model 1: Serverless has real limits

Video 1 of 4: "When Serverless Is the Wrong Choice"

I'm starting with the limits because I'd be doing you a disservice if I didn't.

Serverless is powerful. I've built real production systems with it, and I've given talks about it at AWS re:Invent and AWS Summit. But the honest picture is that it has four real limitations, and if you don't know what they are before you start building, you'll hit them when your system is live and your users are waiting.

Cold Starts

The first one, cold starts, is something you have probably heard of. When nobody has called your function in a while, the cloud provider reclaims its execution environment. The next request has to start a new one (boot the runtime, load your code, initialize dependencies) before it can run. This typically takes anywhere from about 100ms to a couple of seconds, depending on your runtime and bundle size.

For background jobs nobody notices: fine. For a user clicking a button and waiting... well, that's a bad experience.

The workaround is Provisioned Concurrency: you keep N copies of your function always warm. It works. But you're paying for always-on compute, which partially walks back the "pay for what you use" promise. If you need consistent sub-50ms p99 latency, serverless is going to frustrate you.
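To see how often your own users actually hit a cold start, one option is a module-level flag: code outside the handler runs once per execution environment, so the first invocation in each environment can tag itself. This is a minimal sketch; the `coldStart` and `environmentAgeMs` field names are assumptions for illustration, not part of any AWS API.

```javascript
// Module scope runs once per execution environment, i.e. once per cold start.
let isColdStart = true;
const initializedAt = Date.now();

async function handler(event) {
  const coldStart = isColdStart;
  isColdStart = false; // every later invocation in this environment is warm

  // Structured log line: filter on coldStart to count cold starts.
  console.log(JSON.stringify({
    coldStart,
    environmentAgeMs: Date.now() - initializedAt,
  }));

  return { statusCode: 200, body: JSON.stringify({ coldStart }) };
}

module.exports = { handler };
```

Counting these log lines against total invocations gives you a cold start rate, which is a better basis for deciding on Provisioned Concurrency than a guess.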

15-minute ceiling

Lambda functions have a hard maximum execution time of 15 minutes. This rules out video transcoding, large data processing jobs, ML training, and any workflow that needs to hold state across a long sequence of steps.

You can work around some of these with Step Functions, but that adds complexity. Sometimes the honest answer is a container.
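For work that is long only because there are many items, another option (a different technique from Step Functions itself) is to process until the time budget is nearly spent and hand back a cursor so the next invocation can resume. This is a sketch under assumptions: `processItem`, the `cursor` field, and the 10-second safety margin are hypothetical. `context.getRemainingTimeInMillis()` is the method the Lambda Node.js runtime provides on the context object.

```javascript
const SAFETY_MARGIN_MS = 10_000; // assumed buffer for returning before the timeout

// Hypothetical unit of work; replace with real processing.
async function processItem(item) {
  return item * 2;
}

async function handler(event, context) {
  const items = event.items;
  let cursor = event.cursor ?? 0;
  const results = [];

  while (cursor < items.length) {
    // Stop early once the remaining time falls inside the safety margin.
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      return { done: false, cursor, results };
    }
    results.push(await processItem(items[cursor]));
    cursor += 1;
  }
  return { done: true, cursor, results };
}
```

Whatever invokes the function (a Step Functions loop, or a message put back on a queue) passes the returned `cursor` into the next call. If a single item can take longer than 15 minutes, this does not help, and a container is still the honest answer.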

Cost at high volume

Serverless is cheap at low scale, but at high scale the economics can flip.

Real numbers from the AWS Pricing Calculator for 128MB / 100ms per request, no free tier:

Monthly requests	Lambda cost	EC2 t3.small cost
1M	$0.41	$15
10M	$4.08	$15
50M	$20.42	$15
100M	$40.83	$15

The crossover happens around 30-40 million requests per month. Model your costs with real numbers before you commit.
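The table can be reproduced with a few lines of arithmetic, which makes it easy to plug in your own memory size and duration. This sketch assumes the x86 on-demand Lambda rates of $0.20 per million requests and $0.0000166667 per GB-second, plus a flat $15 per month for the instance. Treat those constants as assumptions and check current pricing for your region.

```javascript
const PRICE_PER_MILLION_REQUESTS = 0.20;  // assumed USD rate
const PRICE_PER_GB_SECOND = 0.0000166667; // assumed USD rate
const EC2_MONTHLY = 15;                   // assumed flat monthly instance cost

// Monthly Lambda cost = request charge + compute charge (GB-seconds).
function lambdaMonthlyCost(requests, memoryMb = 128, durationMs = 100) {
  const gbSeconds = requests * (memoryMb / 1024) * (durationMs / 1000);
  const requestCost = (requests / 1_000_000) * PRICE_PER_MILLION_REQUESTS;
  return requestCost + gbSeconds * PRICE_PER_GB_SECOND;
}

// Request volume at which Lambda costs the same as a fixed monthly bill.
function breakEvenRequests(fixedMonthlyCost, memoryMb = 128, durationMs = 100) {
  const costPerRequest = lambdaMonthlyCost(1, memoryMb, durationMs);
  return Math.round(fixedMonthlyCost / costPerRequest);
}

for (const millions of [1, 10, 50, 100]) {
  const cost = lambdaMonthlyCost(millions * 1_000_000);
  console.log(`${millions}M requests: $${cost.toFixed(2)}`);
}
const breakEven = breakEvenRequests(EC2_MONTHLY);
console.log(`Break-even: ~${(breakEven / 1e6).toFixed(1)}M requests/month`);
```

With these assumptions the break-even lands inside the 30-40 million range above. Doubling memory or duration roughly doubles the compute part of the bill, so the crossover moves earlier for heavier functions.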

Stateful workloads

Each Lambda invocation is isolated. When it finishes, its memory is gone. The next invocation starts completely fresh.

This rules out persistent WebSocket connections, in-memory caching that survives across requests, and streaming pipelines that hold rolling buffers. If your workload needs persistent memory, serverless is going to fight you. And honestly, the fight isn't worth it; use the right tool for the job.

When serverless IS the right choice

Serverless is great for so many use cases:

Bursty or unpredictable traffic
Event-driven processing: think file uploads, DB changes, queue messages
Scheduled jobs. I actually use this myself: I have a scheduled Lambda that checks my Meetup groups hourly, combines them into a calendar, and drops them into a Discord channel. Costs basically nothing.
Glue code connecting services
And my favorite, side projects, where you want zero ops overhead.

If your workload fits these patterns, serverless is an excellent choice. And if it doesn't, know that it's going to be a fight.

Mental model 2: Stateless doesn't mean simple

Video 2 of 4: "Stateless Does Not Mean Simple"

The most common misconception I see from people new to serverless is that "stateless functions" means "simple system." Then they build something real, it feels surprisingly complicated (like a Charlie Day diagram), and they don't understand why.

The thing is, stateless doesn't mean the complexity goes away. It means the complexity moves out of your functions and into the infrastructure around them.

Where the complexity goes

Databases. Your primary source of truth. The complexity here is consistency. When two Lambda functions run simultaneously and both try to update the same record, who wins? You need to think about transactions, optimistic locking, and conflict resolution. DynamoDB's conditional writes handle single-item atomicity; multi-item atomicity requires DynamoDB Transactions (2x read/write cost, 100-item limit per call). (P.S. Don't try to simulate transactions with separate conditional writes. I have a story about that for another day.)
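The "who wins?" question is usually answered with optimistic locking: each writer states the version it read, and the write is rejected if the record has moved on since. The sketch below simulates that with an in-memory Map so it runs anywhere. `conditionalPut`, the `version` attribute, and `VersionConflictError` are hypothetical stand-ins for a DynamoDB write with a condition expression, not SDK calls.

```javascript
const table = new Map(); // stand-in for a table keyed by id

// Write only if the stored version still matches what the caller read.
function conditionalPut(id, newValue, expectedVersion) {
  const current = table.get(id) ?? { version: 0, value: undefined };
  if (current.version !== expectedVersion) {
    const err = new Error(`Version conflict on ${id}`);
    err.name = 'VersionConflictError'; // hypothetical name for this sketch
    throw err;
  }
  table.set(id, { version: expectedVersion + 1, value: newValue });
}

conditionalPut('item-1', { stock: 10 }, 0); // initial write, version becomes 1

// Two concurrent "functions" read the same version, then both try to write.
const readByA = table.get('item-1');
const readByB = table.get('item-1');

conditionalPut('item-1', { stock: readByA.value.stock - 1 }, readByA.version); // A wins

let bWasRejected = false;
try {
  conditionalPut('item-1', { stock: readByB.value.stock - 3 }, readByB.version);
} catch (err) {
  bWasRejected = true; // B must re-read and retry instead of clobbering A's write
}
```

The losing writer gets an error rather than silently overwriting the winner, which turns a hidden data bug into a retry you can see and handle.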

Caches. You add Redis or ElastiCache to speed up frequent reads. Now you have two copies of the same data. When the database changes, how does the cache know? Cache invalidation is famously one of the hardest problems in computer science. Not because it's algorithmically complex, but because getting it wrong has subtle, hard-to-detect consequences. Even a well-designed write-through strategy has a timing window between the DB write and the cache update. For prices, inventory counts, or financial data, that window is real.

Queues. Services like SQS let functions communicate asynchronously. The complexity is in delivery guarantees. SQS Standard queues offer at-least-once delivery, meaning a message may arrive more than once. Your function needs to handle duplicates gracefully.

Which brings us to idempotency...

Idempotency: the concept that ties it all together

Idempotent: running the same function twice with the same input produces the same result and causes no additional harm.

This matters because in a distributed system with queues, retries, and at-least-once delivery, your function will be called more than once for the same event. It's not a question of if.

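The example from the code we build in Video 4 is safe to retry in one specific sense: a retry can never overwrite an existing note.

// Retry-safe against overwrites: each invocation generates a new unique ID
// Two POSTs = two distinct notes, not one overwritten
// (A retried POST still creates a second note; strict idempotency needs a client-supplied key)
const noteId = randomUUID();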

The classic counterexample:

// NOT idempotent: if this runs twice, the customer is charged twice
await stripe.charges.create({ amount: 4900, currency: 'usd' });

And FYI, chargebacks are not free: they penalize the seller. Enough of them, and your payment processor cuts you off.

Design for idempotency before you ship. Use idempotency keys. Design every queue-triggered function so that a duplicate delivery causes no additional harm.
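One common shape for an idempotency key is "look up the key before doing the side effect, and store the result under it afterwards." The sketch below uses an in-memory Map so it runs locally. In a real system the lookup and claim would need to be one atomic conditional write to a shared store, because separate Lambda environments don't share memory. `chargeCustomer`, `handleMessage`, and the `idempotencyKey` field are hypothetical names.

```javascript
const processed = new Map(); // idempotencyKey -> stored result (stand-in for a shared store)
let chargesMade = 0;

// Hypothetical side effect that must not be repeated.
async function chargeCustomer(amount) {
  chargesMade += 1;
  return { chargeId: `ch_${chargesMade}`, amount };
}

async function handleMessage(message) {
  // The key comes from the producer and stays the same across retries.
  const key = message.idempotencyKey;

  if (processed.has(key)) {
    return processed.get(key); // duplicate delivery: return the earlier result
  }

  const result = await chargeCustomer(message.amount);
  processed.set(key, result);
  return result;
}
```

Delivering the same message twice now charges once and returns the same charge both times. The key has to be derived from the event (an order ID, a message ID), not generated inside the handler, or every retry would look like a new request.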

The warm container gotcha

Lambda reuses execution environments between invocations, keeping your function "warm" to avoid cold start overhead. Variables initialized outside your handler persist across invocations on the same container.

This is good for things like the DynamoDB client:

// Initialized once: reused across warm invocations
const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

It's a bug for request-specific mutable state:

// Don't do this: state leaks between invocations
let requestCount = 0; // Only resets on a cold start, not between requests!
exports.handler = async (event) => {
  requestCount++; // Keeps counting across warm invocations in the same environment
  // ...
};

Write your functions as if they always start cold. Put reusable stateless resources outside the handler. Put everything request-specific inside it.

FIFO queues still need idempotency

SQS FIFO queues offer exactly-once processing: a message sent again with the same deduplication ID within the 5-minute deduplication window is not enqueued twice. But if your Lambda times out mid-processing without deleting the message, SQS redelivers it once the visibility timeout expires. FIFO reduces the frequency of duplicates. It does not eliminate them, so you have to design for idempotency regardless of queue type.

Mental model 3: Serverless is about failure domains

Video 3 of 4: "Why Serverless Is About Failure Domains"

Just so you know, the real reason to use serverless isn't cost or auto-scaling. Those are really nice side effects.

The real reason is failure isolation.

What is a failure domain?

A failure domain is the set of components that will fail together when one thing goes wrong. It's the boundary of what breaks.

Think about circuit breakers in a house. When the kitchen circuit trips (if you're me, this happens when I plug in the food processor and the KitchenAid mixer at the same time), the kitchen goes dark. The bedroom lights stay on. The living room TV keeps running. The failure is contained in one domain.

Without circuit breakers, a single kitchen fault can take out the whole house.

Software works the same way.

Monolith vs serverless failure surface

In a traditional monolith, everything runs in one process. A bug in the email service can crash the entire application. That means the payment processor, the catalog, and the user service. One failure domain equals the whole app.

In a serverless architecture, each Lambda function runs in its own isolated execution environment. They don't share memory. A crash in one doesn't cascade to another.

If your email function crashes, emails fail. Payments keep processing, and the catalog keeps running. The blast radius is just an email function.

Designing for blast radius

There are three principles to design for blast radius:

1. Separate what can fail independently.

If two operations don't need to succeed or fail together, don't connect them synchronously.

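// Tightly coupled: analytics down = signup fails
// (createAccount, sendWelcomeEmail, addToAnalytics, and publish are placeholder helpers)
async function signupHandlerCoupled(user) {
  await createAccount(user);     // critical
  await sendWelcomeEmail(user);  // nice to have
  await addToAnalytics(user);    // background work
}

// Decoupled: analytics down = analytics retries, signup succeeds
async function signupHandlerDecoupled(user) {
  await createAccount(user);
  await publish("user.created", user); // → queue → async handlers
}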
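To make the difference concrete, here is a runnable sketch with an in-memory array standing in for the queue. All names (`publish`, `drainQueue`, `subscribers`, `failed`) are hypothetical. A real system would use something like SQS, SNS, or EventBridge, with retries and a dead-letter queue doing the job of the `failed` list.

```javascript
const queue = [];    // stand-in for a message queue
const failed = [];   // stand-in for a dead-letter queue
const accounts = []; // stand-in for the user table

const subscribers = {
  'user.created': [
    async (user) => { /* welcome email would be sent here */ },
    async (user) => { throw new Error('analytics is down'); },
  ],
};

function publish(type, payload) {
  queue.push({ type, payload });
}

// Critical path: the only work the user waits for.
async function signupHandler(email) {
  const user = { id: accounts.length + 1, email };
  accounts.push(user);
  publish('user.created', user);
  return { statusCode: 201, userId: user.id };
}

// Runs separately from the request; one failing subscriber does not stop the others.
async function drainQueue() {
  while (queue.length > 0) {
    const event = queue.shift();
    for (const subscriber of subscribers[event.type] ?? []) {
      try {
        await subscriber(event.payload);
      } catch (err) {
        failed.push({ event, reason: err.message });
      }
    }
  }
}
```

The signup returns 201 before any subscriber runs, and the analytics failure ends up in `failed` where it can be retried or alarmed on, rather than in the user's face.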

2. Define SLAs per function, not for the whole system.

Critical functions (payments, authentication) need dead-letter queues, alarms on every error, and idempotency keys. Best-effort functions (analytics, recommendation refresh) can tolerate high error rates. Treating them all the same means you're either over-engineering the unimportant or under-engineering the critical.

3. Know your blast radius before you ship.

Draw the dependency graph. Ask yourself, "If this function goes down, what else breaks?" If the answer is "everything," you've accidentally built a serverless monolith: a chain of synchronous Lambda calls with the same cascading failure profile as a monolith, just more expensive and harder to debug.
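The "what else breaks?" question can be answered mechanically once the synchronous calls are written down. In this sketch the graph and function names are made up. Edges point from a caller to the functions it calls synchronously, and the blast radius of a failure is the failed function plus every direct or indirect synchronous caller.

```javascript
// Hypothetical call graph: caller -> functions it calls synchronously.
const syncCalls = {
  checkout: ['payments', 'inventory'],
  payments: ['fraudCheck'],
  inventory: [],
  fraudCheck: [],
  emailer: [],
};

// Returns the failed function plus all transitive synchronous callers, sorted by name.
function blastRadius(graph, failedFn) {
  const affected = new Set([failedFn]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [caller, callees] of Object.entries(graph)) {
      if (!affected.has(caller) && callees.some((callee) => affected.has(callee))) {
        affected.add(caller);
        grew = true;
      }
    }
  }
  return [...affected].sort();
}

console.log(blastRadius(syncCalls, 'fraudCheck')); // checkout, fraudCheck, payments
console.log(blastRadius(syncCalls, 'emailer'));    // emailer only
```

If the result for one function is close to the full list of functions, that function is the trunk of a serverless monolith and a good candidate for moving behind a queue.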

The latency vs resilience tradeoff

Isolating failure domains with queues adds latency. End-to-end time for a full workflow is longer when there are queue hops.

So here's a pattern that works: keep the critical synchronous path as short as possible, return a meaningful response to the user, and push everything else async.

// Synchronous path (user waits): create order, charge card, return order ID
// Async (user doesn't wait): send confirmation email, update analytics, notify warehouse

Common production gotchas

The serverless monolith. You split the app into Lambda functions but connected them all synchronously. Function A calls B, B calls C, C calls D. If B fails, C and D never run and A gets an error. You rebuilt a monolith.

Shared Lambda layers as hidden coupling. A layer used by 12 functions means those 12 functions share a failure domain for anything in that layer. A bad deployment takes down all 12.

Treating dead-letter queues as optional. Without a DLQ, failed async invocations are silently dropped after retries. You find out when a user reports missing data, not from an alert.

Cross-region synchronous dependencies. If your Lambda in us-east-1 calls a service in eu-west-1 synchronously, a regional event becomes your outage. Cross-region calls need a fallback or should be moved off the synchronous path.
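For a dependency that has to stay on the synchronous path, a time-boxed call with a fallback keeps a slow or unreachable region from becoming your outage. This is a generic sketch: `withTimeout`, `callWithFallback`, and the 500ms default are assumptions, and the remote call is whatever function you pass in.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try the remote call; on error or timeout, return the fallback value instead.
async function callWithFallback(remoteCall, fallbackValue, timeoutMs = 500) {
  try {
    const value = await withTimeout(remoteCall(), timeoutMs);
    return { source: 'remote', value };
  } catch (err) {
    console.warn('Remote call failed, using fallback:', err.message);
    return { source: 'fallback', value: fallbackValue };
  }
}
```

Note that the timeout only stops waiting; it does not cancel the underlying request. The `source` field lets callers and dashboards see how often the fallback is being served, which is itself a useful alarm signal.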

Mental model 4: Build it, then break it on purpose

Video 4 of 4: "Build a Minimal Serverless API"

This is the hands-on video. We review a notes API (POST, GET, DELETE) using Lambda, API Gateway, and DynamoDB, deployed with AWS CDK. Then we break it four different ways and debug it "live".

Here's what the architecture looks like:

Browser → API Gateway → Lambda → DynamoDB

Three services, which means three failure domains. If DynamoDB has a bad day, API Gateway is still receiving requests and Lambda is still running. The blast radius is the DynamoDB calls: they fail, and the handler returns a clean error instead of the whole stack going dark.

The key decisions in the handler

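// Imports assumed: AWS SDK for JavaScript v3 and Node's crypto module
const { randomUUID } = require("node:crypto");
const { DynamoDBClient } = require("@aws-sdk/client-dynamodb");
const { DynamoDBDocumentClient, PutCommand } = require("@aws-sdk/lib-dynamodb");

// Outside the handler: warm containers reuse this
// Ep 2: warm container gotcha
const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);
const TABLE = process.env.TABLE_NAME || "notes";

// Assumed helper: builds an API Gateway proxy response
const respond = (statusCode, body) => ({
  statusCode,
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});

exports.handler = async (event) => {
  try {
    // Assumed event shape: REST API (httpMethod) or HTTP API (requestContext.http.method)
    const method = event.httpMethod ?? event.requestContext?.http?.method;

    if (method === "POST") {
      let body;
      try {
        body = JSON.parse(event.body || "{}");
      } catch {
        // Malformed JSON is a caller error too
        return respond(400, { error: "body must be valid JSON" });
      }

      // Validate at the boundary: 400 (caller error) not 500 (our error)
      if (typeof body?.content !== "string" || !body.content.trim()) {
        return respond(400, { error: "content is required" });
      }

      // randomUUID: no overwrites by design. Ep 2
      // Retried POST = new note, not an overwrite (dedupe needs a client-supplied key)
      const noteId = randomUUID();

      await docClient.send(new PutCommand({
        TableName: TABLE,
        Item: { noteId, content: body.content.trim(), createdAt: Date.now() },
      }));

      return respond(201, { noteId });
    }

    // ...GET and DELETE...

  } catch (err) {
    // Always log before returning 500
    // console.error → CloudWatch Logs
    // Never return err.message: may leak internals to callers
    console.error("Handler error:", err);
    return respond(500, { error: "Internal server error" });
  }
};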

Every decision connects back to a concept from the series.
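One way to keep the boundary check honest is to pull it into a small pure function, which can then be unit tested without AWS at all. This is a sketch, not the code from the video: `parseNoteBody` and its return shape are hypothetical.

```javascript
// Returns { ok: true, content } or { ok: false, status, error }.
function parseNoteBody(rawBody) {
  let body;
  try {
    body = JSON.parse(rawBody || '{}');
  } catch {
    // Malformed JSON is the caller's mistake, so it maps to 400, not 500.
    return { ok: false, status: 400, error: 'body must be valid JSON' };
  }

  // Optional chaining covers bodies like `null` or a bare number.
  const content = typeof body?.content === 'string' ? body.content.trim() : '';
  if (!content) {
    return { ok: false, status: 400, error: 'content is required' };
  }
  return { ok: true, content };
}

console.log(parseNoteBody('{"content": "  buy milk  "}'));
console.log(parseNoteBody('{not json'));
console.log(parseNoteBody('{"content": 42}'));
```

Every rejection path returns a 400 with a message the caller can act on, so anything that still reaches the catch block and becomes a 500 is a genuine server-side problem worth an alert.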

A note on the comments: the tutorial code is heavily commented because it is a tutorial. That is not how I write production code. In production, I only comment on things that actually need explanation: weird behavior, a non-obvious "why", a gotcha that will bite the next person. The code should explain what it does. Comments explain why it does it when that is not obvious. If anybody has ever seen any of my production code... sorry. Anyways.

The CDK stack

Infrastructure as code, not console clicking. This is what engineers use in production: version-controlled, reproducible, and reviewable in a pull request.

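// Imports assumed for CDK v2
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sqs from 'aws-cdk-lib/aws-sqs';

// Inside the stack's constructor:
const table = new dynamodb.Table(this, 'NotesTable', {
  partitionKey: { name: 'noteId', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST, // Ep 1: no idle cost
  removalPolicy: cdk.RemovalPolicy.DESTROY,          // Deletes the table (and its data) with the stack
});

const dlq = new sqs.Queue(this, 'NotesDLQ', {
  retentionPeriod: cdk.Duration.days(14),
});

const handler = new lambda.Function(this, 'NotesHandler', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',              // Assumed entry point
  code: lambda.Code.fromAsset('lambda'), // Assumed path to the handler source
  environment: { TABLE_NAME: table.tableName },
  tracing: lambda.Tracing.ACTIVE, // X-Ray: cold start visibility. Ep 1
  deadLetterQueue: dlq,           // Silent failure prevention. Ep 3
});

// Least privilege: limits blast radius of a compromised function. Ep 3
table.grantReadWriteData(handler);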

The four breaks

Break 1: IAM permission error. Comment out grantReadWriteData, redeploy, call the API. You get a 500 with no useful information. Navigate to CloudWatch Logs → your function's log group → most recent stream. You'll see:

Handler error: AccessDeniedException: User: arn:aws:sts::...
is not authorized to perform: dynamodb:PutItem on resource: ...table/notes

The log tells you exactly what failed: which principal, which action, which resource. The 500 told you nothing. This is why console.error before every return respond(500) is non-negotiable.

Break 2: Bad input. Send a POST with no content field. Without validation, the handler shown above would throw on body.content.trim() and return a 500 for what is really the caller's mistake. With it, you get a clean 400. Validate at the boundary. 400 = caller's fault. 500 = your fault. They're different problems for different people.

Break 3: Cold start in X-Ray. Leave the function idle for a few minutes, then invoke it. Navigate to X-Ray traces. You'll see two segments on the cold invocation: Initialization (~300-400ms) and Invocation (<10ms). Warm invocations have no Initialization segment. This is the cold start discussion from Video 1, made visible in real traces.

Break 4: The DLQ and silent failures. Our HTTP API is synchronous, so errors return 500s to the caller. But real production systems gain async triggers over time. Without a DLQ, failed async invocations are silently dropped. The DLQ is already wired in the CDK stack. Navigate to Lambda → Configuration → Asynchronous invocation to verify it's there. Add a CloudWatch alarm on queue depth so you get paged when something fails, not when a user reports it.

The full picture

Four videos. Four mental models. Here's what they give you together:

Mental model	The insight
Limits	Know when serverless is the wrong tool before you commit
Stateless ≠ simple	The complexity moves to the state layer. Design for it.
Failure domains	Serverless is fundamentally about blast radius control
Build + break	The happy path is easy. Knowing what the errors look like is the real skill

The developers who get really good at serverless aren't the ones who know the most API syntax. They're the ones who have a mental model of the whole distributed system (the failure modes, the state management, the blast radius) and who make design decisions from that model.

That's what this series is trying to give you.

Watch the series

📺 Serverless Mental Models: full playlist

💻 Code on GitHub
