When I build multi-tenant SaaS on AWS, one of my favorite combinations is API Gateway + Lambda + DynamoDB. It gives me a clean serverless control plane, fast iteration, and a lot of flexibility in how I model data and enforce tenancy.
In this post, I will walk through an end-to-end implementation pattern for a multi-tenant SaaS API using:
- Amazon API Gateway for API ingress
- AWS Lambda for business logic
- Amazon DynamoDB with a single-table design
- Amazon Cognito + JWT for authentication
- Per-tenant throttling and quotas using API Gateway usage plans (with a Lambda authorizer pattern)
I will focus on the architectural and data modeling decisions that matter in production:
- Tenant isolation patterns
- Partition key design
- Hot partition mitigation
- GSIs for access patterns
- Auth context propagation (Cognito/JWT)
- Per-tenant throttling and quotas
I will also include code and an implementation walkthrough so you can adapt the pattern to your own SaaS.
Why this stack works so well for multi-tenant SaaS
I like this stack because it lets me combine application-level tenancy controls with infrastructure-level scalability:
- API Gateway gives me request routing, auth integration, throttling, and observability.
- Lambda gives me a stateless execution layer where I can consistently apply tenant-aware logic.
- DynamoDB gives me low-latency reads/writes and a data model that can be shaped around access patterns, which is exactly what multi-tenant SaaS needs.
The biggest challenge is not deploying these services. It is designing them so that one tenant cannot interfere with another tenant, and so that one noisy tenant does not dominate throughput.
That is where the design details matter.
What I am building in this walkthrough
To make the examples concrete, I will use a simple B2B issue tracking SaaS:
- Each tenant is a customer organization
- Each tenant has:
  - users
  - projects
  - tickets
  - comments
- Users authenticate with Cognito
- API calls are authorized via JWT and mapped to a tenant context
- Data lives in one DynamoDB table (single-table design)
Example endpoints:
POST /projects/{projectId}/tickets
GET /projects/{projectId}/tickets?status=OPEN
GET /me/tickets
POST /tickets/{ticketId}/comments
Important design principle:
I do not trust the client to tell me the tenant ID in the path/body for authorization purposes. I derive tenancy from the validated JWT, then enforce it in my Lambda logic and DynamoDB key design.
Architecture Overview
This architecture uses:
- Cognito to issue JWTs
- API Gateway (REST API) as the front door
- Lambda Authorizer to validate JWT and inject normalized auth context
- Lambda handlers for business operations
- DynamoDB single table for all entities
- API Gateway usage plans for per-tenant throttling/quota (via authorizer metering key)
I am intentionally using a Lambda authorizer (instead of only a native Cognito authorizer) because it lets me:
- normalize claims (tenantId, roles, plan)
- enforce custom auth checks
- return a usageIdentifierKey to support per-tenant usage plans and quotas in an API Gateway REST API
End-to-end request flow (from login to DynamoDB write)
Let’s walk through a POST /projects/{projectId}/tickets call.
1. User signs in via Cognito
   - Cognito issues a JWT (ID token or access token, depending on your claim strategy)
   - The token contains user identity and tenant-related claims (for example tenant_id, plan, roles)
2. Client calls API Gateway
   - Sends Authorization: Bearer <JWT>
3. API Gateway invokes the Lambda authorizer
   - Validates the JWT signature and issuer
   - Extracts/normalizes claims
   - Returns an IAM policy (Allow), a context (tenant ID, user ID, roles, plan), and a usageIdentifierKey (tenant-specific metering key for usage plan enforcement)
4. API Gateway invokes the Lambda handler
   - The handler reads the authorizer context from event.requestContext.authorizer
5. Lambda writes to DynamoDB
   - Uses tenant-prefixed keys and tenant-aware access patterns
   - Applies conditional expressions as needed
   - Writes the base item plus GSI attributes
6. CloudWatch metrics/logs capture telemetry
   - API Gateway request count/throttles
   - Lambda duration/errors
   - DynamoDB consumed capacity/throttles
This pattern keeps tenancy and throttling context consistent across the request path.
Tenant isolation patterns (and which one I am using)
Before writing a single key schema, I decide what kind of tenancy model I need.
1) Silo model (strongest isolation)
Each tenant gets isolated infrastructure (separate table / account / stack).
Pros
- Strong isolation
- Easier compliance stories for some customers
- No noisy neighbor at data layer
Cons
- Higher operational overhead
- Harder fleet-wide schema changes
- More expensive for small tenants
2) Pool model (shared infrastructure, tenant-aware app logic)
All tenants share the same infrastructure (same API, same Lambda, same table), but every request and every item is tenant-scoped.
Pros
- Cost-efficient
- Operationally simpler for most SaaS products
- Scales well if keys are designed correctly
Cons
- You must be disciplined about isolation in code and data model
- Hot partitions/noisy neighbors need mitigation
3) Bridge model (hybrid)
Some tenants are pooled, some are siloed (for enterprise/compliance tiers).
Pros
- Flexible business model
- Lets you upsell isolation
Cons
- More deployment and routing complexity
What I am using here
For this post, I am using the pool model with strong tenant-aware design:
- Tenant ID is always derived from auth context
- DynamoDB keys are tenant-prefixed
- Access patterns are tenant-scoped
- API Gateway usage plans enforce per-tenant limits
- Lambda code refuses cross-tenant access even if a user manipulates IDs
This is a very practical pattern for SaaS teams that want speed and scale without overbuilding day one.
Single-table design mindset for multi-tenant SaaS
A lot of DynamoDB pain comes from designing tables like relational schemas.
I do the opposite: I start with access patterns, then design keys around them.
Access patterns I need
For this SaaS, I care about these reads/writes:
- Create a ticket in a project
- List tickets by project (optionally filtered by status)
- List tickets assigned to the current user
- Get a ticket by ID
- Add/list comments for a ticket
- List projects for a tenant
- Look up user membership / role in a tenant
These access patterns will drive:
- primary key design (PK, SK)
- GSI design
- where I shard to avoid hot partitions
Partition key design (tenant-aware and access-pattern-first)
This is the part that usually makes or breaks multi-tenant DynamoDB.
A common mistake
A very common first attempt is:
PK = TENANT#<tenantId>
SK = <everything else>
That can work for small tenants, but it becomes risky when a tenant is large or very active because all writes for that tenant concentrate on a small number of partitions.
My design approach
I still make keys tenant-aware, but I distribute writes across tenant sub-collections.
I use patterns like:
- PK = TENANT#<tenantId> for tenant metadata and low-volume tenant-level items
- PK = TENANT#<tenantId>#PROJECT#<projectId> for project and ticket collections
- PK = TENANT#<tenantId>#TICKET#<ticketId> for the ticket root + comments (if comment-heavy)
- Sharded GSIs (or write-sharded partitions) for high-volume list patterns
This gives me tenant isolation without forcing every high-volume write into one tenant partition.
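To keep these key patterns consistent, I like to centralize key construction in small helper functions rather than building strings inline in each handler. A minimal sketch (the function names are illustrative, not from any library):

```typescript
// Tenant-aware key builders. Centralizing these means a handler can
// never accidentally build a key that omits the tenant prefix.
function tenantRootKey(tenantId: string): string {
  return `TENANT#${tenantId}`;
}

function projectPartitionKey(tenantId: string, projectId: string): string {
  return `TENANT#${tenantId}#PROJECT#${projectId}`;
}

function ticketPartitionKey(tenantId: string, ticketId: string): string {
  return `TENANT#${tenantId}#TICKET#${ticketId}`;
}

function commentSortKey(createdAt: string, commentId: string): string {
  return `COMMENT#${createdAt}#${commentId}`;
}
```

Because every builder takes tenantId as its first argument, a code review only has to confirm that the argument always comes from the authorizer context.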
Example item model (single table)
I usually keep a few shared attributes on every item:
- PK, SK
- entityType
- tenantId
- timestamps (createdAt, updatedAt)
- optional GSI attributes (GSI1PK, GSI1SK, GSI2PK, GSI2SK)
Example items
Tenant metadata
{
"PK": "TENANT#t_123",
"SK": "META",
"entityType": "TENANT",
"tenantId": "t_123",
"name": "Acme Corp",
"plan": "pro",
"createdAt": "2026-02-25T00:00:00Z"
}
Project
{
"PK": "TENANT#t_123",
"SK": "PROJECT#p_001",
"entityType": "PROJECT",
"tenantId": "t_123",
"projectId": "p_001",
"name": "Platform",
"createdAt": "2026-02-25T00:00:00Z"
}
Ticket (project-scoped partition)
{
"PK": "TENANT#t_123#PROJECT#p_001",
"SK": "TICKET#tk_9001",
"entityType": "TICKET",
"tenantId": "t_123",
"projectId": "p_001",
"ticketId": "tk_9001",
"title": "API returns 500 on retry",
"status": "OPEN",
"priority": "HIGH",
"assigneeUserId": "u_77",
"createdByUserId": "u_12",
"createdAt": "2026-02-25T01:00:00Z",
"updatedAt": "2026-02-25T01:00:00Z",
"GSI1PK": "TENANT#t_123#PROJECT#p_001#STATUS#OPEN",
"GSI1SK": "UPDATED#2026-02-25T01:00:00Z#TICKET#tk_9001",
"GSI2PK": "TENANT#t_123#ASSIGNEE#u_77",
"GSI2SK": "UPDATED#2026-02-25T01:00:00Z#TICKET#tk_9001"
}
Comment (ticket-scoped partition)
{
"PK": "TENANT#t_123#TICKET#tk_9001",
"SK": "COMMENT#2026-02-25T01:05:00Z#c_001",
"entityType": "COMMENT",
"tenantId": "t_123",
"ticketId": "tk_9001",
"commentId": "c_001",
"authorUserId": "u_12",
"body": "Investigating now",
"createdAt": "2026-02-25T01:05:00Z"
}
Why this layout works
- Tenant is embedded in every key path
- Tickets are grouped by project for efficient project views
- Comments are grouped by ticket for efficient comment threads
- User-assignee lookup is handled by GSI
- Status-based listing is handled by GSI
This is a good default shape for many SaaS workloads.
GSIs for access patterns (and how I decide them)
I try to make every GSI exist for a specific query, not “just in case”.
GSI1: List tickets by project + status
Use case: GET /projects/{projectId}/tickets?status=OPEN
GSI1PK = TENANT#<tenantId>#PROJECT#<projectId>#STATUS#<status>
GSI1SK = UPDATED#<timestamp>#TICKET#<ticketId>
This lets me list tickets by status and sort by most recent updates.
GSI2: List tickets assigned to a user
Use case: GET /me/tickets
GSI2PK = TENANT#<tenantId>#ASSIGNEE#<userId>
GSI2SK = UPDATED#<timestamp>#TICKET#<ticketId>
This gives me a tenant-safe “my work” view.
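Since I don't show a full handler for GET /me/tickets, here is a sketch of how I would build the GSI2 query input from the auth context. The object matches the QueryCommand input shape used elsewhere in this post; the Limit value is an assumption to tune for your pagination model:

```typescript
// Query input for the GSI2 "my tickets" view. tenantId and userId must
// come from the validated authorizer context, never from the request.
function myTicketsQueryInput(tableName: string, tenantId: string, userId: string) {
  return {
    TableName: tableName,
    IndexName: "GSI2",
    KeyConditionExpression: "GSI2PK = :pk",
    ExpressionAttributeValues: {
      ":pk": `TENANT#${tenantId}#ASSIGNEE#${userId}`,
    },
    // GSI2SK begins with UPDATED#<timestamp>, so descending = newest first
    ScanIndexForward: false,
    Limit: 25,
  };
}
```

A handler would then run `ddb.send(new QueryCommand(myTicketsQueryInput(TABLE_NAME, tenantId, userId)))`, the same pattern as the project-ticket listing handler.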
Optional GSI3: Lookup by public ticket reference (if needed)
If I expose a user-friendly ticket reference like ACME-194, I may add a sparse GSI:
GSI3PK = TENANT#<tenantId>#REF#ACME-194
GSI3SK = TICKET#<ticketId>
I do not add this unless I truly need it.
Hot partition mitigation (the part many examples skip)
A single-table design can be “correct” and still fail under traffic if hot partitions are ignored.
Here are the patterns I use.
1) Avoid concentrating all writes into one tenant root partition
If all writes use PK = TENANT#<tenantId>, a large tenant can get hot fast.
Mitigation: partition by natural sub-aggregates (project, ticket thread, etc.)
2) Use write sharding for high-volume list/index patterns
Suppose one tenant has a huge number of ticket updates, and your status GSI gets hot.
I can shard the GSI partition key:
GSI1PK = TENANT#t_123#PROJECT#p_001#STATUS#OPEN#SHARD#03
Then query multiple shards and merge results in Lambda.
How do I pick the shard?
- Hash ticketId (or ticketId + status) into N buckets
- Start small (for example 4 or 8 shards)
- Increase if traffic grows
Example helper:
function shardFor(value: string, shardCount = 8): string {
let hash = 0;
for (let i = 0; i < value.length; i++) {
hash = (hash * 31 + value.charCodeAt(i)) >>> 0;
}
const shard = hash % shardCount;
return shard.toString().padStart(2, "0");
}
3) Separate write-heavy entities from read-heavy aggregations
If comments are very high volume, I often avoid mixing them in the same partition as ticket metadata.
That is why in the example above I used:
- ticket records under TENANT#...#PROJECT#...
- comments under TENANT#...#TICKET#...
4) Use on-demand capacity early, then tune if needed
For newer SaaS products, I often start with on-demand capacity because it reduces operational tuning.
As traffic patterns stabilize, I decide whether provisioned + autoscaling is worth it.
5) Watch metrics by access pattern, not just table-wide
I monitor:
- Throttled requests
- Consumed read/write capacity
- Latency by endpoint
- GSI-specific pressure
If one endpoint is slow, I inspect whether the underlying key pattern is too concentrated.
Auth context propagation (Cognito/JWT -> API Gateway -> Lambda)
This is the glue that keeps tenant isolation consistent.
What I want from auth context
By the time my business Lambda runs, I want a normalized auth context like this:
- tenantId
- userId
- roles
- plan
- (optional) scopes, email, orgId, isSupportUser
I do not want each Lambda handler parsing arbitrary JWT claims differently.
Cognito claim strategy
You have a few options:
- Use Cognito ID token claims directly (simpler if tenant custom attributes are on the user profile)
- Use Pre Token Generation trigger to inject normalized tenant claims into the token you use for APIs
- Use groups + custom claims for role/plan mapping
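If you take the Pre Token Generation route, the trigger can look roughly like this. This is a sketch: lookupTenantForUser is a hypothetical helper you would back with your own user-membership store, and the event type is trimmed to only the fields used here.

```typescript
// Cognito Pre Token Generation trigger (sketch).
// Event type trimmed to the fields this sketch touches.
type PreTokenEvent = {
  request: { userAttributes: Record<string, string> };
  response: {
    claimsOverrideDetails?: {
      claimsToAddOrOverride?: Record<string, string>;
    };
  };
};

// Hypothetical helper: replace with a real membership lookup
// (for example a DynamoDB query on a user->tenant mapping).
async function lookupTenantForUser(
  sub: string
): Promise<{ tenantId: string; plan: string }> {
  return { tenantId: "t_123", plan: "pro" }; // example values only
}

const handler = async (event: PreTokenEvent): Promise<PreTokenEvent> => {
  const sub = event.request.userAttributes.sub;
  const { tenantId, plan } = await lookupTenantForUser(sub);
  // Inject normalized tenant claims so downstream APIs see a stable shape
  event.response.claimsOverrideDetails = {
    claimsToAddOrOverride: {
      tenant_id: tenantId,
      plan,
    },
  };
  return event;
};
```

The payoff is that the authorizer (and everything behind it) can rely on tenant_id and plan being present regardless of how user attributes are stored.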
For this post, I will assume the token contains:
- sub
- custom:tenant_id (or tenant_id)
- custom:plan (or plan)
- cognito:groups
Lambda Authorizer example (TypeScript)
This authorizer validates the Cognito JWT, extracts tenant context, and returns:
- an Allow policy
- authorizer context
- usageIdentifierKey (for per-tenant API Gateway usage plan metering)
Note: In API Gateway REST APIs, usageIdentifierKey should be the API key value you want metered. In production, I usually map tenantId -> meteringKey and return the metering key, not the raw tenant ID.
// authorizer.ts
import type {
APIGatewayTokenAuthorizerEvent,
APIGatewayAuthorizerResult,
} from "aws-lambda";
import { CognitoJwtVerifier } from "aws-jwt-verify";
const userPoolId = process.env.USER_POOL_ID!;
const clientId = process.env.APP_CLIENT_ID!;
const verifier = CognitoJwtVerifier.create({
userPoolId,
tokenUse: "id", // or "access" if your claims are injected there
clientId,
});
function buildPolicy(principalId: string, effect: "Allow" | "Deny", resource: string) {
return {
principalId,
policyDocument: {
Version: "2012-10-17",
Statement: [
{
Action: "execute-api:Invoke",
Effect: effect,
Resource: resource,
},
],
},
};
}
// Replace with a real lookup (DynamoDB / Secrets Manager / config service)
async function lookupMeteringKey(tenantId: string): Promise<string> {
// Example only. In production, return the API Gateway API key *value* assigned to that tenant.
return `meter-${tenantId}`;
}
export const handler = async (
event: APIGatewayTokenAuthorizerEvent
): Promise<APIGatewayAuthorizerResult> => {
try {
const raw = event.authorizationToken || "";
const token = raw.replace(/^Bearer\s+/i, "");
const claims = await verifier.verify(token);
const tenantId =
(claims["custom:tenant_id"] as string | undefined) ??
(claims["tenant_id"] as string | undefined);
if (!tenantId) {
throw new Error("Missing tenant claim");
}
const userId = String(claims.sub);
const plan =
String(
(claims["custom:plan"] as string | undefined) ??
(claims["plan"] as string | undefined) ??
"free"
);
const groups = (claims["cognito:groups"] as string[] | undefined) ?? [];
const meteringKey = await lookupMeteringKey(tenantId);
const policy = buildPolicy(userId, "Allow", event.methodArn);
return {
...policy,
context: {
tenantId,
userId,
plan,
rolesJson: JSON.stringify(groups),
},
usageIdentifierKey: meteringKey,
};
} catch (err) {
// API Gateway treats "Unauthorized" specially
throw new Error("Unauthorized");
}
};
Why I normalize here instead of in every Lambda
This avoids duplicated auth parsing and reduces inconsistent authorization logic across handlers.
It also gives me one place to:
- map plans
- map roles
- support internal support-user impersonation rules (if needed)
- connect metering/throttling to tenancy
Business Lambda: reading auth context and enforcing tenancy
Now let’s create a ticket.
This handler:
- reads tenant context from the authorizer
- ignores any tenant ID in the request body
- writes a ticket item to DynamoDB using a tenant-prefixed key
- populates GSIs for list access patterns
// create-ticket.ts
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import { randomUUID } from "crypto";
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = process.env.TABLE_NAME!;
type AuthorizerCtx = {
tenantId: string;
userId: string;
plan: string;
rolesJson?: string;
};
function response(statusCode: number, body: unknown): APIGatewayProxyResult {
return {
statusCode,
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
};
}
function getAuthContext(event: APIGatewayProxyEvent): AuthorizerCtx {
const auth = (event.requestContext.authorizer ?? {}) as Record<string, unknown>;
const tenantId = String(auth.tenantId ?? "");
const userId = String(auth.userId ?? "");
const plan = String(auth.plan ?? "free");
const rolesJson = auth.rolesJson ? String(auth.rolesJson) : "[]";
if (!tenantId || !userId) {
throw new Error("Missing auth context");
}
return { tenantId, userId, plan, rolesJson };
}
function shardFor(value: string, shardCount = 8): string {
let hash = 0;
for (let i = 0; i < value.length; i++) {
hash = (hash * 31 + value.charCodeAt(i)) >>> 0;
}
return String(hash % shardCount).padStart(2, "0");
}
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
try {
const { tenantId, userId } = getAuthContext(event);
const projectId = event.pathParameters?.projectId;
if (!projectId) return response(400, { message: "projectId is required" });
const body = event.body ? JSON.parse(event.body) : {};
const title = String(body.title ?? "").trim();
const assigneeUserId = body.assigneeUserId ? String(body.assigneeUserId) : undefined;
const priority = body.priority ? String(body.priority) : "MEDIUM";
if (!title) return response(400, { message: "title is required" });
const ticketId = `tk_${randomUUID()}`;
const now = new Date().toISOString();
const status = "OPEN";
// Optional GSI sharding for high-volume status lists
const statusShard = shardFor(ticketId, 8);
const item = {
PK: `TENANT#${tenantId}#PROJECT#${projectId}`,
SK: `TICKET#${ticketId}`,
entityType: "TICKET",
tenantId,
projectId,
ticketId,
title,
status,
priority,
assigneeUserId,
createdByUserId: userId,
createdAt: now,
updatedAt: now,
// List tickets by project + status
GSI1PK: `TENANT#${tenantId}#PROJECT#${projectId}#STATUS#${status}#SHARD#${statusShard}`,
GSI1SK: `UPDATED#${now}#TICKET#${ticketId}`,
// List tickets by assignee
...(assigneeUserId && {
GSI2PK: `TENANT#${tenantId}#ASSIGNEE#${assigneeUserId}`,
GSI2SK: `UPDATED#${now}#TICKET#${ticketId}`,
}),
};
await ddb.send(
new PutCommand({
TableName: TABLE_NAME,
Item: item,
ConditionExpression: "attribute_not_exists(PK) AND attribute_not_exists(SK)",
})
);
return response(201, {
ticketId,
status,
createdAt: now,
});
} catch (err) {
console.error("create-ticket failed", err);
return response(500, { message: "Internal server error" });
}
};
Isolation note
The handler never accepts tenantId from the request body for authorization purposes.
Even if a client sends a different tenant ID, it does not matter because the handler uses tenantId from the validated authorizer context.
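For read/update handlers that take a client-supplied resource ID, I add one more layer of defense: after fetching the item, assert it belongs to the caller's tenant before acting on it. A minimal sketch:

```typescript
// Defense in depth: even though keys are tenant-prefixed, verify the
// fetched item's tenantId matches the caller's tenant from the authorizer.
function assertTenantOwnership(
  item: { tenantId?: string } | undefined,
  callerTenantId: string
): void {
  if (!item || item.tenantId !== callerTenantId) {
    // 404 rather than 403, so the existence of another tenant's
    // resource is not leaked to a guessing client.
    const err = new Error("Not found");
    (err as any).statusCode = 404;
    throw err;
  }
}
```

Returning 404 for cross-tenant IDs is a deliberate choice: a 403 would confirm the resource exists.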
Querying with GSIs (list tickets by project + status)
Now let’s implement GET /projects/{projectId}/tickets?status=OPEN.
Because I sharded GSI1PK, I may need to query multiple shards and merge the results.
// list-project-tickets.ts
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = process.env.TABLE_NAME!;
function response(statusCode: number, body: unknown): APIGatewayProxyResult {
return {
statusCode,
headers: { "Content-Type": "application/json" },
body: JSON.stringify(body),
};
}
function getTenantId(event: APIGatewayProxyEvent): string {
const auth = (event.requestContext.authorizer ?? {}) as Record<string, unknown>;
const tenantId = String(auth.tenantId ?? "");
if (!tenantId) throw new Error("Missing tenantId");
return tenantId;
}
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
try {
const tenantId = getTenantId(event);
const projectId = event.pathParameters?.projectId;
const status = String(event.queryStringParameters?.status ?? "OPEN");
if (!projectId) return response(400, { message: "projectId is required" });
const shardCount = 8;
const queries = Array.from({ length: shardCount }, (_, n) => {
const shard = String(n).padStart(2, "0");
return ddb.send(
new QueryCommand({
TableName: TABLE_NAME,
IndexName: "GSI1",
KeyConditionExpression: "GSI1PK = :pk",
ExpressionAttributeValues: {
":pk": `TENANT#${tenantId}#PROJECT#${projectId}#STATUS#${status}#SHARD#${shard}`,
},
Limit: 25, // tune based on your UX pagination model
ScanIndexForward: false, // newest first
})
);
});
const results = await Promise.all(queries);
const items = results.flatMap((r) => r.Items ?? []);
// Merge and sort in-memory because we queried multiple shards
items.sort((a, b) => String(b.updatedAt).localeCompare(String(a.updatedAt)));
return response(200, {
items,
count: items.length,
});
} catch (err) {
console.error("list-project-tickets failed", err);
return response(500, { message: "Internal server error" });
}
};
Tradeoff discussion
Sharding improves write scalability, but read logic is more complex because I need fan-out queries.
That is a normal DynamoDB tradeoff. I only add this complexity for access patterns that actually need it.
Per-tenant throttling and quotas (API Gateway + authorizer metering)
This is one of the most useful SaaS controls and one of the least discussed implementation details.
What I want
I want to enforce limits like:
- Free tier tenant: lower RPS and daily quota
- Pro tier tenant: higher RPS and quota
- Enterprise tenant: custom limits
Why API Gateway usage plans help
API Gateway REST APIs support:
- throttling
- quotas
- API keys
- usage plans
With a Lambda authorizer, I can return a usageIdentifierKey so API Gateway meters/throttles by a tenant-specific key, even when the client authenticates with JWT.
Important implementation note
For REST APIs, set apiKeySourceType to AUTHORIZER so the usage key comes from the authorizer rather than an x-api-key header.
CDK sketch (TypeScript)
Below is a simplified CDK sketch to show the pattern. You will need to adapt it to your stack structure.
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as apigw from "aws-cdk-lib/aws-apigateway";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
export class MultiTenantApiStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
super(scope, id, props);
const table = new dynamodb.Table(this, "AppTable", {
partitionKey: { name: "PK", type: dynamodb.AttributeType.STRING },
sortKey: { name: "SK", type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
pointInTimeRecovery: true,
removalPolicy: cdk.RemovalPolicy.RETAIN,
});
table.addGlobalSecondaryIndex({
indexName: "GSI1",
partitionKey: { name: "GSI1PK", type: dynamodb.AttributeType.STRING },
sortKey: { name: "GSI1SK", type: dynamodb.AttributeType.STRING },
projectionType: dynamodb.ProjectionType.ALL,
});
table.addGlobalSecondaryIndex({
indexName: "GSI2",
partitionKey: { name: "GSI2PK", type: dynamodb.AttributeType.STRING },
sortKey: { name: "GSI2SK", type: dynamodb.AttributeType.STRING },
projectionType: dynamodb.ProjectionType.ALL,
});
const authorizerFn = new lambda.Function(this, "AuthorizerFn", {
runtime: lambda.Runtime.NODEJS_20_X,
handler: "authorizer.handler",
code: lambda.Code.fromAsset("dist/authorizer"),
environment: {
USER_POOL_ID: "your-user-pool-id",
APP_CLIENT_ID: "your-app-client-id",
},
});
const createTicketFn = new lambda.Function(this, "CreateTicketFn", {
runtime: lambda.Runtime.NODEJS_20_X,
handler: "create-ticket.handler",
code: lambda.Code.fromAsset("dist/create-ticket"),
environment: {
TABLE_NAME: table.tableName,
},
});
table.grantReadWriteData(createTicketFn);
const api = new apigw.RestApi(this, "SaaSApi", {
restApiName: "multi-tenant-saas-api",
apiKeySourceType: apigw.ApiKeySourceType.AUTHORIZER, // key point for per-tenant metering
deployOptions: {
stageName: "prod",
metricsEnabled: true,
loggingLevel: apigw.MethodLoggingLevel.INFO,
dataTraceEnabled: false,
throttlingBurstLimit: 2000,
throttlingRateLimit: 1000,
},
});
const tokenAuthorizer = new apigw.TokenAuthorizer(this, "JwtAuthorizer", {
handler: authorizerFn,
resultsCacheTtl: cdk.Duration.minutes(5),
});
const projects = api.root.addResource("projects");
const projectId = projects.addResource("{projectId}");
const tickets = projectId.addResource("tickets");
tickets.addMethod(
"POST",
new apigw.LambdaIntegration(createTicketFn),
{
authorizer: tokenAuthorizer,
authorizationType: apigw.AuthorizationType.CUSTOM,
}
);
// Example usage plan (create multiple plans for free/pro/enterprise)
const freePlan = api.addUsagePlan("FreePlan", {
name: "free-tier-plan",
throttle: {
rateLimit: 50,
burstLimit: 100,
},
quota: {
limit: 100000,
period: apigw.Period.DAY,
},
});
freePlan.addApiStage({
api,
stage: api.deploymentStage,
});
// API keys that represent tenant metering identities.
// In production, create one per tenant and associate with the correct plan.
const tenantApiKey = api.addApiKey("TenantApiKey_t_123", {
apiKeyName: "tenant-t_123-metering-key",
enabled: true,
// value can be auto-generated or explicitly set
});
freePlan.addApiKey(tenantApiKey);
}
}
How I usually operationalize this
- Create usage plans by tier (free, pro, enterprise)
- Create an API key per tenant (metering identity)
- Store the key value (or a mapping) securely
- Have the authorizer return the correct usageIdentifierKey for the tenant
This gives me a clean SaaS throttle/quota model without making clients manage API keys directly.
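To avoid a metering-key lookup on every request, the lookupMeteringKey placeholder from the authorizer can be wrapped with an in-memory cache, which survives across warm Lambda invocations. A sketch, where fetchKeyFromStore is a hypothetical stand-in for your real DynamoDB or Secrets Manager lookup:

```typescript
// In-memory cache of tenant -> metering key; persists across warm invocations.
const meteringKeyCache = new Map<string, string>();

// fetchKeyFromStore is injected so the cache logic stays testable;
// in production it would query DynamoDB or Secrets Manager for the
// API key value assigned to the tenant.
async function lookupMeteringKey(
  tenantId: string,
  fetchKeyFromStore: (tenantId: string) => Promise<string>
): Promise<string> {
  const cached = meteringKeyCache.get(tenantId);
  if (cached) return cached;
  const key = await fetchKeyFromStore(tenantId);
  meteringKeyCache.set(tenantId, key);
  return key;
}
```

There is no expiry here for simplicity; if tenant keys can rotate, add a TTL so the authorizer eventually picks up the new key.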
Authorization checks beyond JWT validation (critical for tenant safety)
JWT validation is not enough. I also enforce authorization in Lambda based on tenant and role.
Examples:
- User can create tickets only in projects belonging to their tenant
- User can comment only on tickets in their tenant
- Support/admin actions require specific roles/scopes
- Cross-tenant resource IDs are rejected even if guessed
Example guard pattern
function requireRole(rolesJson: string | undefined, allowed: string[]) {
const roles = rolesJson ? JSON.parse(rolesJson) as string[] : [];
const ok = roles.some((r) => allowed.includes(r));
if (!ok) {
const err = new Error("Forbidden");
(err as any).statusCode = 403;
throw err;
}
}
I typically build a small internal library for:
- auth context parsing
- role checks
- tenant-scoped key helpers
- common error responses
That reduces drift across Lambda functions.
End-to-end implementation walkthrough
Here is a practical sequence I would follow to build this from scratch.
Step 1: Define your access patterns first
Before touching CloudFormation/CDK, write down:
- reads
- writes
- sorting/filtering requirements
- expected high-volume paths
This will drive your PK/SK and GSI decisions.
Step 2: Set up Cognito and define tenant claims
Create a user pool and app client. Decide how tenant context appears in JWTs.
I recommend normalizing these claims:
- tenant_id
- plan
- roles or groups
If your current user profile attributes are inconsistent, fix that first. It will save a lot of pain later.
Step 3: Build the Lambda authorizer
Implement JWT validation and claim normalization.
Outputs should include:
- tenantId
- userId
- rolesJson
- plan
- usageIdentifierKey (tenant metering key)
Step 4: Create DynamoDB single table + GSIs
Start with only the GSIs you need for your first release.
A good initial setup:
- base table (PK, SK)
- GSI1 for project + status ticket listing
- GSI2 for assignee ticket listing
Step 5: Implement tenant-safe handlers
In every handler:
- parse auth context from API Gateway authorizer context
- derive keys from auth context
- never trust client-supplied tenant IDs for authorization
- use conditional expressions when appropriate
Step 6: Add per-tenant usage plans
Create:
- usage plans by tier
- API keys per tenant
- tenant -> metering key mapping
Then return the metering key in the Lambda authorizer.
Step 7: Test with real tokens and tenant boundary tests
I always run tests for:
- valid tenant access
- invalid token
- wrong tenant resource ID
- quota exceeded / throttle behavior
- large-tenant list/query traffic
Example request/response flow (local testing mindset)
Create ticket request
curl -X POST \
"https://{apiId}.execute-api.{region}.amazonaws.com/prod/projects/p_001/tickets" \
-H "Authorization: Bearer <JWT>" \
-H "Content-Type: application/json" \
-d '{
"title": "API returns 500 on retry",
"priority": "HIGH",
"assigneeUserId": "u_77"
}'
Expected response
{
"ticketId": "tk_6b4d8e3e-....",
"status": "OPEN",
"createdAt": "2026-02-25T01:00:00Z"
}
List tickets by status
curl \
"https://{apiId}.execute-api.{region}.amazonaws.com/prod/projects/p_001/tickets?status=OPEN" \
-H "Authorization: Bearer <JWT>"
Common mistakes I see in this architecture
1) Trusting tenantId from the client
This is the fastest way to create cross-tenant data leaks.
Fix: derive tenant context from validated JWT and enforce server-side.
2) Designing the table like a relational schema
DynamoDB single-table design is access-pattern driven, not entity-table driven.
Fix: start with queries and write paths first.
3) Overusing one tenant root partition key
This can create hot partitions for large tenants.
Fix: use tenant-aware sub-collections and shard high-volume patterns.
4) Adding too many GSIs early
Every GSI adds write amplification and operational cost.
Fix: add GSIs only for clear access patterns.
5) Treating JWT validation as full authorization
A valid token does not mean a valid action.
Fix: layer role checks and tenant-resource checks in Lambda.
6) Ignoring quota/throttling strategy until later
By the time you need it, retrofitting it can be messy.
Fix: establish a per-tenant metering pattern early (usage plans or app-level rate limiting).
When I would extend this design
As the SaaS grows, I often add:
- DynamoDB Streams for async workflows (notifications, audit, search indexing)
- SQS/EventBridge for background processing
- Fine-grained authorization model (RBAC/ABAC) beyond simple groups
- Tenant-level feature flags
- Audit log table (append-only) for compliance
- Support-user safe impersonation with strict audit trails
- Silo/bridge routing for enterprise tenants needing dedicated infrastructure
The nice part is that the core pattern (API Gateway + Lambda + DynamoDB + normalized auth context) still holds.
Final thoughts
This stack is powerful because it lets me solve architecture and data modeling together.
The API layer, auth model, and DynamoDB key design are not separate decisions in a multi-tenant SaaS. They are one system:
- Auth defines the tenant context
- The API propagates and enforces it
- The data model encodes it
- Throttling/quotas protect the platform from noisy neighbors
If you get those four pieces aligned early, you can move fast without constantly revisiting foundational decisions.
If I were building a new SaaS MVP today on AWS, this would still be one of my first choices.
References
- Amazon API Gateway REST API documentation
- AWS Lambda developer guide
- Amazon DynamoDB developer guide
- DynamoDB single-table design patterns (AWS documentation and re:Invent talks)
- Amazon Cognito user pools and JWT token documentation
- API Gateway usage plans and API keys documentation
- aws-jwt-verify library documentation
- DynamoDB best practices for partition key design and adaptive capacity
- AWS Well-Architected Framework (Serverless Lens)
