Tyson Cung
How I Built a Production-Ready Serverless NestJS API on AWS (and Open-Sourced It)

I've built the same serverless API setup at least five times now.

I work with AWS and TypeScript daily. Every new service starts the same way: NestJS app, stick it on Lambda, wire up API Gateway, add auth, add a database, add a queue for background jobs, configure environments, set up monitoring.

After the third time copy-pasting infrastructure code between repos, I decided to build a proper starter kit. I open-sourced the core as serverless-nestjs-starter, and built a Pro version with everything I actually need in production.

Here's what I learned along the way.

The Problem With Existing Starters

Most NestJS + Lambda examples I found online had the same issues:

  • They use Express (slower cold starts than Fastify)
  • They use @vendia/serverless-express (a heavier adapter)
  • No real infrastructure code - just a handler file and "deploy it yourself"
  • A single auth mode hardcoded in
  • No database, no queues, no observability
  • No multi-environment support

I wanted something I could clone, change a JSON config file, run cdk deploy, and have a production-ready API running in minutes.

Architecture

Route 53 (custom domain)
    │
WAF WebACL (rate limiting, OWASP rules)
    │
API Gateway (IAM / Cognito / API Key / VPC-only)
    │
├── Server Lambda (NestJS + Fastify)
│       │
│       ├── DynamoDB (single-table design)
│       └── SQS ──► Worker Lambda
│
└── CloudWatch Dashboard

Two Lambdas. One handles HTTP requests through NestJS. The other consumes SQS messages for background work. Both deploy through CDK with a single command.

Why Fastify Over Express

NestJS supports both Express and Fastify as HTTP adapters. I went with Fastify for two reasons:

  1. Speed. Fastify handles roughly 2x the requests per second compared to Express in benchmarks. On Lambda, faster execution = lower cost.

  2. Lighter Lambda adapter. I use aws-lambda-fastify (since renamed to @fastify/aws-lambda) instead of @vendia/serverless-express. It's a thinner layer that translates API Gateway events directly to Fastify requests without the overhead of emulating a full HTTP server.

The adapter choice matters more than you'd think. @vendia/serverless-express creates an in-memory socket to pipe requests through Express. aws-lambda-fastify skips that and injects the request directly into Fastify's routing layer.

The Cached Server Pattern

Cold starts are the #1 complaint about Lambda. Here's how I minimize them:

import { NestFactory } from '@nestjs/core';
import { ValidationPipe } from '@nestjs/common';
import { FastifyAdapter, NestFastifyApplication } from '@nestjs/platform-fastify';
import awsLambdaFastify from 'aws-lambda-fastify';
import type { APIGatewayProxyEvent, APIGatewayProxyResult, Context } from 'aws-lambda';
import { AppModule } from './app.module';

let cachedProxy: ReturnType<typeof awsLambdaFastify> | undefined;

export const handler = async (
  event: APIGatewayProxyEvent,
  context: Context,
): Promise<APIGatewayProxyResult> => {
  if (!cachedProxy) {
    const fastifyAdapter = new FastifyAdapter();
    const nestApp = await NestFactory.create<NestFastifyApplication>(
      AppModule, fastifyAdapter, { logger: false }
    );

    nestApp.enableCors();
    nestApp.useGlobalPipes(new ValidationPipe({
      transform: true,
      whitelist: true,
      forbidNonWhitelisted: true,
    }));

    await nestApp.init();
    await nestApp.getHttpAdapter().getInstance().ready();

    cachedProxy = awsLambdaFastify(nestApp.getHttpAdapter().getInstance());
  }

  return cachedProxy(event, context);
};

The key line is let cachedProxy. Lambda keeps the execution environment alive between invocations (usually 5-15 minutes). On the first request, we bootstrap the full NestJS app. On subsequent requests, we skip straight to handling the event.

Cold start with this setup: around 1-2 seconds (depends on your module count and Lambda memory). Warm invocations: 5-20ms.

For production workloads where even 1-2 seconds is too much, the Pro version includes provisioned concurrency with auto-scaling:

if (config.lambda.provisionedConcurrency > 0) {
  const alias = new lambda.Alias(this, resourceName('alias'), {
    aliasName: 'live',
    version: serverLambda.currentVersion,
    provisionedConcurrentExecutions: config.lambda.provisionedConcurrency,
  });

  const scalingTarget = new autoscaling.ScalableTarget(this, resourceName('scaling-target'), {
    serviceNamespace: autoscaling.ServiceNamespace.LAMBDA,
    minCapacity: config.lambda.provisionedConcurrency,
    maxCapacity: config.lambda.provisionedConcurrency * 5,
    resourceId: `function:${serverLambda.functionName}:${alias.aliasName}`,
    scalableDimension: 'lambda:function:ProvisionedConcurrency',
  });

  scalingTarget.scaleToTrackMetric(resourceName('scaling-policy'), {
    targetValue: 0.7,
    predefinedMetric: autoscaling.PredefinedMetric.LAMBDA_PROVISIONED_CONCURRENCY_UTILIZATION,
  });
}

Set provisionedConcurrency: 2 in your config and cold starts disappear.
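For reference, a prod config fragment might look like this (a sketch based on the dev config shown later; the kit's exact schema may differ):

```json
{
  "lambda": { "memorySize": 2048, "provisionedConcurrency": 2 }
}
```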

The 4 Security Modes

This is probably the feature I'm most proud of. Instead of hardcoding one auth strategy, the CDK stack reads a JSON config and wires up the right one:

{
  "security": {
    "mode": "iam",
    "waf": true,
    "ipAllowlist": [],
    "sourceAccountIds": "111111111111,222222222222"
  }
}

Change mode to one of four options and redeploy. That's it.

IAM (default)

API Gateway uses AWS SigV4. Callers need IAM credentials. Perfect for service-to-service communication. Supports cross-account access by specifying sourceAccountIds.

Cognito

Automatically provisions a Cognito User Pool and wires up a JWT authorizer on API Gateway. On the NestJS side, I built a guard that verifies tokens and extracts user info:

@UseGuards(CognitoAuthGuard)
@Get('profile')
getProfile(@CurrentUser() user: CognitoUser) {
  return { userId: user.sub, email: user.email };
}

The CognitoAuthGuard handles JWT verification with JWKS caching (1-hour TTL). No external libraries needed - it uses Node's built-in crypto module.
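The core of that verification really does fit in a few lines of built-in crypto. Here's a simplified, hypothetical sketch of the idea - the real guard also validates the issuer, audience, and token_use claims, and caches the JWKS:

```typescript
import * as crypto from 'crypto';

// Hypothetical sketch: verify an RS256 JWT against a JWK fetched from the
// Cognito JWKS endpoint. Returns the claims on success, null on failure.
export function verifyRs256(token: string, jwk: crypto.JsonWebKey): Record<string, any> | null {
  const [headerB64, payloadB64, sigB64] = token.split('.');
  if (!headerB64 || !payloadB64 || !sigB64) return null;

  // Rebuild the public key from the JWK and check the signature over "header.payload"
  const publicKey = crypto.createPublicKey({ key: jwk, format: 'jwk' });
  const valid = crypto.verify(
    'sha256',
    Buffer.from(`${headerB64}.${payloadB64}`),
    publicKey,
    Buffer.from(sigB64, 'base64url'),
  );
  if (!valid) return null;

  // Reject expired tokens
  const payload = JSON.parse(Buffer.from(payloadB64, 'base64url').toString('utf8'));
  if (typeof payload.exp === 'number' && payload.exp < Date.now() / 1000) return null;
  return payload;
}
```

The JWKS caching is what makes this viable on Lambda: the keys are fetched once per container, not once per request.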

API Key

Creates an API key with a usage plan. Rate limit of 100 req/sec, burst limit of 200, daily quota of 10,000 requests. All configurable. Good for third-party integrations.
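In config terms, those limits might be expressed like this (the apiKey field names here are illustrative, not the kit's exact schema):

```json
{
  "security": { "mode": "api-key" },
  "apiKey": { "rateLimit": 100, "burstLimit": 200, "dailyQuota": 10000 }
}
```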

VPC-only

Private API Gateway endpoint. Zero internet access. Only reachable from within your VPC. For internal microservices that should never be public.

The CDK code that makes this work is a switch statement in app-stack.ts:

switch (config.security.mode) {
  case 'iam':
  case 'vpc-only':
    Object.assign(apiProps, {
      defaultMethodOptions: { authorizationType: apigw.AuthorizationType.IAM },
    });
    break;
  case 'api-key':
    Object.assign(apiProps, {
      defaultMethodOptions: { apiKeyRequired: true },
      apiKeySourceType: apigw.ApiKeySourceType.HEADER,
    });
    break;
  case 'cognito':
    // Authorizer added after API creation
    break;
}

Simple, readable, no magic.

DynamoDB Single-Table Design

The Pro version includes a DynamoDB setup with the single-table pattern. A generic repository handles the key structure:

export interface EntityKeyConfig {
  entityType: string;
  gsi1Key?: (entity: Record<string, unknown>) => { GSI1PK: string; GSI1SK: string };
}

// A method on the generic repository class (dynamoDbService is injected via Nest DI):
async create<T extends BaseEntity>(config: EntityKeyConfig, entity: T): Promise<T> {
  const item: Record<string, unknown> = {
    ...entity,
    PK: `${config.entityType}#${entity.id}`,
    SK: `${config.entityType}#${entity.id}`,
    EntityType: config.entityType,
  };

  if (config.gsi1Key) {
    const gsiKeys = config.gsi1Key(item);
    item.GSI1PK = gsiKeys.GSI1PK;
    item.GSI1SK = gsiKeys.GSI1SK;
  }

  await this.dynamoDbService.put(item);
  return entity;
}

PK and SK follow the {ENTITY_TYPE}#{id} convention. A GSI lets you query by entity type (list all items of a kind). You define your access patterns through EntityKeyConfig - no ORM, no abstractions you'll fight against later.
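For example, here's how a hypothetical User entity could define its keys, with GSI1 enabling lookups by email (a self-contained sketch - BaseEntity and the key-derivation logic mirror the shapes above):

```typescript
interface BaseEntity { id: string }

interface EntityKeyConfig {
  entityType: string;
  gsi1Key?: (entity: Record<string, unknown>) => { GSI1PK: string; GSI1SK: string };
}

// Hypothetical config for a User entity: GSI1 groups all users under one
// partition, sorted by email, so "find user by email" is a single Query.
const userKeys: EntityKeyConfig = {
  entityType: 'USER',
  gsi1Key: (user) => ({ GSI1PK: 'USER', GSI1SK: `EMAIL#${user.email as string}` }),
};

// The same key derivation create() performs, extracted for illustration
function buildKeys(config: EntityKeyConfig, entity: BaseEntity & Record<string, unknown>) {
  const item: Record<string, unknown> = {
    ...entity,
    PK: `${config.entityType}#${entity.id}`,
    SK: `${config.entityType}#${entity.id}`,
    EntityType: config.entityType,
  };
  if (config.gsi1Key) Object.assign(item, config.gsi1Key(item));
  return item;
}
```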

SQS Worker Pattern

Background jobs go through SQS. The worker Lambda uses batch item failure reporting, which is a detail a lot of tutorials skip:

import type { SQSEvent, SQSBatchResponse, SQSBatchItemFailure } from 'aws-lambda';

export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const batchItemFailures: SQSBatchItemFailure[] = [];

  for (const record of event.Records) {
    try {
      const message = JSON.parse(record.body) as QueueMessage;
      await processMessage(message);
    } catch (error) {
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures };
};

Why this matters: without reportBatchItemFailures, if one message in a batch of 10 fails, all 10 retry. With it, only the failed message retries. The CDK side enables this:

workerLambda.addEventSource(
  new SqsEventSource(sqsConstruct.queue, {
    batchSize: 10,
    maxBatchingWindow: cdk.Duration.seconds(5),
    reportBatchItemFailures: true,
  }),
);

Messages that fail 3 times land in a dead-letter queue with 14-day retention.
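To see the partial-failure behavior end to end, here's a self-contained simulation - the SQS types are minimal stand-ins for the aws-lambda ones, and processMessage is a stub that rejects when asked to:

```typescript
// Minimal stand-ins for the aws-lambda types, so this runs on its own
interface SQSRecord { messageId: string; body: string }
interface SQSEvent { Records: SQSRecord[] }
interface SQSBatchItemFailure { itemIdentifier: string }
interface SQSBatchResponse { batchItemFailures: SQSBatchItemFailure[] }

// Stub worker: rejects any message whose payload asks it to fail
async function processMessage(message: { fail?: boolean }): Promise<void> {
  if (message.fail) throw new Error('processing failed');
}

export const handler = async (event: SQSEvent): Promise<SQSBatchResponse> => {
  const batchItemFailures: SQSBatchItemFailure[] = [];
  for (const record of event.Records) {
    try {
      await processMessage(JSON.parse(record.body));
    } catch {
      // Only this message is retried; the rest of the batch is acknowledged
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
};
```

Note that a malformed body (the JSON.parse throw) is reported the same way as a processing error, so poison messages also end up in the DLQ instead of blocking the batch.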

Per-Environment Config

One thing I got tired of was managing environment variables across stages. The Pro version uses JSON config files:

infra/config/
  dev.json
  staging.json
  prod.json

Dev config might look like:

{
  "security": { "mode": "iam", "waf": false },
  "lambda": { "memorySize": 1024, "provisionedConcurrency": 0 },
  "dynamodb": { "enabled": true, "billingMode": "PAY_PER_REQUEST" },
  "sqs": { "enabled": true, "visibilityTimeout": 60, "maxReceiveCount": 3 },
  "dashboard": { "enabled": true }
}

Prod config enables WAF, bumps memory, adds provisioned concurrency. Staging mirrors prod but with smaller numbers. Deploy with npm run deploy:dev or npm run deploy:prod.
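Under the hood, stage selection is just reading the right file. A hypothetical loader (the kit's actual implementation may differ):

```typescript
import * as fs from 'fs';
import * as path from 'path';

// Hypothetical config loader: resolves infra/config/<stage>.json and fails
// fast on unknown stages instead of silently falling back to defaults.
export function loadConfig(stage: string, configDir = path.join('infra', 'config')): Record<string, unknown> {
  const file = path.join(configDir, `${stage}.json`);
  if (!fs.existsSync(file)) {
    throw new Error(`No config found for stage "${stage}" (expected ${file})`);
  }
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

Failing fast matters here: a typo like `npm run deploy -- --stage prdo` should abort, not deploy dev defaults to your prod account.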

The CDK Stack

Everything comes together in app-stack.ts. It's one file, around 250 lines, and reads like a checklist:

  1. Create log group
  2. Optionally look up VPC
  3. Optionally create DynamoDB table
  4. Optionally create SQS queue + DLQ
  5. Optionally create Cognito User Pool
  6. Create server Lambda with environment variables
  7. Optionally set up provisioned concurrency
  8. Optionally create worker Lambda
  9. Create API Gateway with the right auth mode
  10. Optionally attach WAF
  11. Optionally configure custom domain
  12. Optionally create CloudWatch dashboard

Each "optionally" is driven by the JSON config. The constructs are modular - each in its own file under infra/constructs/.

Try It

The free version is on GitHub: serverless-nestjs-starter

It gives you NestJS + Fastify on Lambda with IAM auth, CDK, Powertools, and Swagger docs. Good enough for prototypes and simple internal APIs.

If you need the production stuff - security modes, DynamoDB, SQS, WAF, custom domains, provisioned concurrency, CloudWatch dashboards - the Pro version is $49 on Gumroad. One-time purchase, all future updates, unlimited projects.

I also have a DDD Microservices Starter Kit if you're building something bigger with domain-driven design.


Questions? Drop a comment or open an issue on the GitHub repo. Happy to help.

Top comments (3)

Darryl Ruggles • Edited

Is the link to the free version missing?

Oh I see - the link near the top has it but the link after:

Try It The free version is on GitHub -> points to the paid version.

Tyson Cung

My bad, thanks for pointing that out. I have updated it. Please check again or click on this link: github.com/tysoncung/serverless-ne...

Tyson Cung

Does it work for you?