Dixit R Jain for AWS Community Builders

Posted on Jan 27

Persistence Made Easy: AWS DynamoDB Deep Dive for Serverless Applications

#dynamodb #aws #serverless #cdk

Introduction
Why DynamoDB for Serverless?
Table Design Fundamentals
Setting Up the Stack
CRUD Operations with AWS SDK v3
Global Secondary Indexes
Query vs Scan
Gotchas & Common Pitfalls
Best Practices
Cost Considerations
Conclusion
References

Introduction

Quick Links:

📂 Source Code: GitHub Repository

🚀 Live Demo: https://dmcechq7isaw7.cloudfront.net/

Remember that contact form API we built in Part 2? It worked great, but there was a sneaky problem – our data vanished every time Lambda recycled!

Picture this nightmare: A Fortune 500 company submits a partnership inquiry through your contact form. You celebrate. But when you check for the lead later... nothing. The Lambda function recycled, and that million-dollar opportunity evaporated into thin air.

Not ideal.

Today, we're fixing this by adding DynamoDB – a fully managed, serverless NoSQL database that scales from zero to millions of requests per second. Your leads will persist forever (or until you delete them).

In this third part of our AWS Serverless Web Mastery series, you'll learn:

How to design DynamoDB tables for serverless applications
CRUD operations using AWS SDK v3
Global Secondary Indexes (GSIs) for flexible queries
Production patterns for error handling and performance

Let's make your data bulletproof!

Why DynamoDB for Serverless?

Perfect Serverless Companion

Feature	Benefit
Serverless	No servers to manage, automatic scaling
Pay-per-request	$0 when idle, perfect for variable workloads
Single-digit millisecond latency	Fast responses regardless of scale
Built-in security	Encryption at rest and in transit
Event-driven	DynamoDB Streams trigger Lambda functions

The Competition

Database	Serverless?	Cold Start Issue?	Scaling
DynamoDB	✅ Yes	❌ None	Instant
RDS	❌ No	N/A	Minutes
Aurora Serverless v2	✅ Yes	Minutes	Seconds
MongoDB Atlas	⚠️ Partial	Minutes	Minutes

For serverless applications, DynamoDB is the clear winner.

Table Design Fundamentals

Think Access Patterns, Not Entities

Unlike relational databases where you normalize data, DynamoDB is designed around access patterns. Before creating your table, ask:

How will I retrieve data?
What queries do I need?
What filters will I apply?

Our Access Patterns

For our Contact Form, we need:

Access Pattern	Solution
Get lead by ID	Primary key lookup
List all leads	Scan (acceptable for admin)
Find leads by email	GSI on email
Filter by status	GSI on status

Table Design

┌─────────────────────────────────────────────────────────────┐
│                    ContactFormLeads Table                   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Primary Key: leadId (String)                               │
│                                                             │
│  Attributes:                                                │
│  ├── name (String)                                          │
│  ├── email (String)                                         │
│  ├── company (String, optional)                             │
│  ├── subject (String)                                       │
│  ├── message (String)                                       │
│  ├── status (String: new/contacted/qualified/converted)     │
│  ├── createdAt (String, ISO timestamp)                      │
│  └── updatedAt (String, ISO timestamp)                      │
│                                                             │
│  GSI: email-index                                           │
│  ├── Partition Key: email                                   │
│  └── Sort Key: createdAt                                    │
│                                                             │
│  GSI: status-index                                          │
│  ├── Partition Key: status                                  │
│  └── Sort Key: createdAt                                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Why This Design?

leadId as partition key: Unique IDs ensure even distribution and fast lookups
No sort key on main table: Simple key structure for CRUD operations
GSIs for query patterns: Enable efficient queries without scanning
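
The handlers later in this post refer to Lead, CreateLeadRequest, and UpdateLeadRequest without showing them. Here is a minimal sketch of what those types might look like, inferred from the table design above. The exact shapes (especially which fields are updatable) are assumptions, and the repository's real definitions may differ.

```typescript
// Hypothetical type definitions inferred from the table design;
// not copied from the tutorial repository.
type LeadStatus = "new" | "contacted" | "qualified" | "converted";

interface Lead {
  leadId: string; // partition key of the table
  name: string;
  email: string; // partition key of email-index
  company?: string; // optional attribute
  subject: string;
  message: string;
  status: LeadStatus; // partition key of status-index
  createdAt: string; // ISO timestamp, sort key of both GSIs
  updatedAt: string; // ISO timestamp
}

// Fields the contact form submits; everything else is set server-side.
type CreateLeadRequest = Pick<Lead, "name" | "email" | "subject" | "message"> &
  Partial<Pick<Lead, "company">>;

// Fields an admin may change later (an assumption for this sketch).
type UpdateLeadRequest = Partial<Pick<Lead, "name" | "company" | "status">>;

const exampleLead: Lead = {
  leadId: "lead_1706313600000_ab12cd",
  name: "Ada Lovelace",
  email: "ada@example.com",
  subject: "Partnership",
  message: "Hello!",
  status: "new",
  createdAt: "2025-01-27T00:00:00.000Z",
  updatedAt: "2025-01-27T00:00:00.000Z",
};
```

Storing createdAt as a UTC ISO string works as a sort key because strings in that fixed format sort lexicographically in the same order as the timestamps they represent.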

Setting Up the Stack

CDK Infrastructure

Create lib/dynamodb-stack.ts:

import * as path from "path";
import * as cdk from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as lambdaNodejs from "aws-cdk-lib/aws-lambda-nodejs";
import * as logs from "aws-cdk-lib/aws-logs";
import * as apigateway from "aws-cdk-lib/aws-apigateway";
import { Construct } from "constructs";

export class DynamoDbStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // ============================================
    // DynamoDB Table
    // ============================================
    const leadsTable = new dynamodb.Table(this, "LeadsTable", {
      tableName: "ContactFormLeads",

      // Partition key only - simple design for CRUD
      partitionKey: {
        name: "leadId",
        type: dynamodb.AttributeType.STRING,
      },

      // On-demand = auto-scaling, pay-per-request
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,

      // Point-in-time recovery for data protection
      pointInTimeRecovery: true,

      // Encryption at rest
      encryption: dynamodb.TableEncryption.AWS_MANAGED,

      // DESTROY for dev (RETAIN for production!)
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // GSI for querying by email
    leadsTable.addGlobalSecondaryIndex({
      indexName: "email-index",
      partitionKey: {
        name: "email",
        type: dynamodb.AttributeType.STRING,
      },
      sortKey: {
        name: "createdAt",
        type: dynamodb.AttributeType.STRING,
      },
      projectionType: dynamodb.ProjectionType.ALL,
    });

    // GSI for filtering by status
    leadsTable.addGlobalSecondaryIndex({
      indexName: "status-index",
      partitionKey: {
        name: "status",
        type: dynamodb.AttributeType.STRING,
      },
      sortKey: {
        name: "createdAt",
        type: dynamodb.AttributeType.STRING,
      },
      projectionType: dynamodb.ProjectionType.ALL,
    });

    // Lambda Function with DynamoDB access
    const leadsHandler = new lambdaNodejs.NodejsFunction(this, "LeadsHandler", {
      runtime: lambda.Runtime.NODEJS_22_X,
      entry: path.join(__dirname, "../lambda/handlers/leads.ts"),
      handler: "handler",
      description: "Leads CRUD operations with DynamoDB",
      timeout: cdk.Duration.seconds(30),
      memorySize: 256,
      logRetention: logs.RetentionDays.ONE_WEEK,
      environment: {
        TABLE_NAME: leadsTable.tableName,
        EMAIL_INDEX: "email-index",
        STATUS_INDEX: "status-index",
      },
      bundling: {
        minify: true,
        sourceMap: true,
      },
    });

    // Grant read/write permissions
    leadsTable.grantReadWriteData(leadsHandler);

    // ... API Gateway setup (same as Part 2)
  }
}

Key Points

billingMode: PAY_PER_REQUEST: Scales automatically, no capacity planning
grantReadWriteData(): CDK automatically creates the IAM policy
Environment variables: Pass table name to Lambda

CRUD Operations with AWS SDK v3

Setting Up the Client

import {
  DynamoDBClient,
  PutItemCommand,
  GetItemCommand,
  UpdateItemCommand,
  DeleteItemCommand,
  QueryCommand,
  ScanCommand,
} from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";

const dynamoClient = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME!;

Create (PutItem)

async function createLead(leadData: CreateLeadRequest): Promise<Lead> {
  const lead: Lead = {
    leadId: `lead_${Date.now()}_${Math.random().toString(36).slice(2)}`,
    name: leadData.name.trim(),
    email: leadData.email.toLowerCase().trim(),
    company: leadData.company?.trim(),
    subject: leadData.subject,
    message: leadData.message.trim(),
    status: "new",
    createdAt: new Date().toISOString(),
    updatedAt: new Date().toISOString(),
  };

  const command = new PutItemCommand({
    TableName: TABLE_NAME,
    Item: marshall(lead, { removeUndefinedValues: true }),
    // Prevent accidental overwrites
    ConditionExpression: "attribute_not_exists(leadId)",
  });

  await dynamoClient.send(command);
  return lead;
}

Key Points:

marshall() converts JS objects to DynamoDB format
removeUndefinedValues: true handles optional fields
ConditionExpression prevents duplicate IDs (though unlikely)
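
If "DynamoDB format" feels abstract, the sketch below shows roughly what the marshalled item looks like on the wire. It is an illustration only: marshallSketch is a hypothetical, simplified stand-in that covers three scalar types, while the real marshall() from @aws-sdk/util-dynamodb handles many more (lists, maps, sets, binary, null).

```typescript
// Illustration only: a simplified stand-in for the shape marshall() produces.
type AttributeValueSketch = { S: string } | { N: string } | { BOOL: boolean };

function toAttributeValue(value: string | number | boolean): AttributeValueSketch {
  if (typeof value === "string") return { S: value };
  if (typeof value === "number") return { N: String(value) }; // numbers are sent as strings
  return { BOOL: value };
}

function marshallSketch(
  obj: Record<string, string | number | boolean | undefined>,
): Record<string, AttributeValueSketch> {
  const out: Record<string, AttributeValueSketch> = {};
  for (const [key, value] of Object.entries(obj)) {
    if (value === undefined) continue; // same effect as removeUndefinedValues: true
    out[key] = toAttributeValue(value);
  }
  return out;
}

const wire = marshallSketch({ leadId: "lead_1", status: "new", company: undefined });
// wire is { leadId: { S: "lead_1" }, status: { S: "new" } }
```

Each attribute is wrapped in a one-key object naming its type, and the undefined company field is left out entirely.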

Read (GetItem)

async function getLeadById(leadId: string): Promise<Lead | null> {
  const command = new GetItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ leadId }),
  });

  const response = await dynamoClient.send(command);

  if (!response.Item) {
    return null;
  }

  return unmarshall(response.Item) as Lead;
}

Key Points:

GetItem is extremely fast (single-digit milliseconds)
unmarshall() converts DynamoDB format back to JS

Update (UpdateItem)

async function updateLead(
  leadId: string,
  updates: UpdateLeadRequest,
): Promise<Lead | null> {
  // Build update expression dynamically
  const updateExpressions: string[] = [];
  const expressionAttributeNames: Record<string, string> = {};
  const expressionAttributeValues: Record<string, any> = {};

  // Always update timestamp
  updateExpressions.push("#updatedAt = :updatedAt");
  expressionAttributeNames["#updatedAt"] = "updatedAt";
  expressionAttributeValues[":updatedAt"] = new Date().toISOString();

  // Add fields that have values
  if (updates.status !== undefined) {
    updateExpressions.push("#status = :status");
    expressionAttributeNames["#status"] = "status";
    expressionAttributeValues[":status"] = updates.status;
  }

  // ... add other fields similarly

  const command = new UpdateItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ leadId }),
    UpdateExpression: `SET ${updateExpressions.join(", ")}`,
    ExpressionAttributeNames: expressionAttributeNames,
    ExpressionAttributeValues: marshall(expressionAttributeValues),
    ConditionExpression: "attribute_exists(leadId)",
    ReturnValues: "ALL_NEW",
  });

  try {
    const response = await dynamoClient.send(command);
    return unmarshall(response.Attributes!) as Lead;
  } catch (error: any) {
    if (error.name === "ConditionalCheckFailedException") {
      return null; // Item doesn't exist
    }
    throw error;
  }
}

Key Points:

Use expression attribute names (#status) for reserved words
Dynamic expressions allow partial updates
ReturnValues: 'ALL_NEW' returns the updated item
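
The "add other fields similarly" step can be factored into a small pure function. This is a sketch rather than the repository's code: buildUpdateParts and the UPDATABLE_FIELDS allow-list are hypothetical names, and which fields are updatable is an assumption.

```typescript
// Hypothetical allow-list: only these attributes may be changed by an update.
const UPDATABLE_FIELDS = ["name", "company", "status"] as const;
type UpdatableField = (typeof UPDATABLE_FIELDS)[number];

interface UpdateParts {
  UpdateExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, string>;
}

// Builds a SET expression from whichever allowed fields are present.
// updatedAt is always written, matching the handler above.
function buildUpdateParts(
  updates: Partial<Record<UpdatableField, string>>,
  now: string = new Date().toISOString(),
): UpdateParts {
  const names: Record<string, string> = { "#updatedAt": "updatedAt" };
  const values: Record<string, string> = { ":updatedAt": now };
  const clauses: string[] = ["#updatedAt = :updatedAt"];

  for (const field of UPDATABLE_FIELDS) {
    const value = updates[field];
    if (value === undefined) continue; // field not supplied, leave it untouched
    names[`#${field}`] = field; // alias every name so reserved words are safe
    values[`:${field}`] = value;
    clauses.push(`#${field} = :${field}`);
  }

  return {
    UpdateExpression: `SET ${clauses.join(", ")}`,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

const parts = buildUpdateParts({ status: "contacted" }, "2025-01-27T00:00:00.000Z");
// parts.UpdateExpression is "SET #updatedAt = :updatedAt, #status = :status"
```

Iterating over an allow-list instead of the request body keeps unexpected keys (including leadId) out of the expression. The values still need to go through marshall() before being handed to UpdateItemCommand.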

Delete (DeleteItem)

async function deleteLead(leadId: string): Promise<boolean> {
  const command = new DeleteItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ leadId }),
    ConditionExpression: "attribute_exists(leadId)",
  });

  try {
    await dynamoClient.send(command);
    return true;
  } catch (error: any) {
    if (error.name === "ConditionalCheckFailedException") {
      return false; // Item doesn't exist
    }
    throw error;
  }
}

Global Secondary Indexes

Query by Email (GSI)

async function getLeadsByEmail(email: string): Promise<Lead[]> {
  const command = new QueryCommand({
    TableName: TABLE_NAME,
    IndexName: "email-index",
    KeyConditionExpression: "email = :email",
    ExpressionAttributeValues: marshall({
      ":email": email.toLowerCase(),
    }),
    // Newest first
    ScanIndexForward: false,
  });

  const response = await dynamoClient.send(command);
  return (response.Items || []).map((item) => unmarshall(item) as Lead);
}

Filter by Status (GSI)

async function getLeadsByStatus(status: string): Promise<Lead[]> {
  const command = new QueryCommand({
    TableName: TABLE_NAME,
    IndexName: "status-index",
    KeyConditionExpression: "#status = :status",
    ExpressionAttributeNames: {
      "#status": "status", // 'status' is a reserved word
    },
    ExpressionAttributeValues: marshall({
      ":status": status,
    }),
    ScanIndexForward: false,
  });

  const response = await dynamoClient.send(command);
  return (response.Items || []).map((item) => unmarshall(item) as Lead);
}
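
One thing the two query functions above skip: a single Query call returns at most 1 MB of data. When more results remain, the response includes LastEvaluatedKey, which you pass back as ExclusiveStartKey to fetch the next page. Below is a minimal sketch of that loop. collectAllPages and the fake page source are hypothetical names; with the real client, fetchPage would wrap dynamoClient.send(new QueryCommand({ ..., ExclusiveStartKey })).

```typescript
// The two response fields the loop relies on.
interface Page<Item, Key> {
  Items?: Item[];
  LastEvaluatedKey?: Key;
}

// Keeps requesting pages until no LastEvaluatedKey comes back.
async function collectAllPages<Item, Key>(
  fetchPage: (exclusiveStartKey?: Key) => Promise<Page<Item, Key>>,
): Promise<Item[]> {
  const items: Item[] = [];
  let startKey: Key | undefined = undefined;
  do {
    const page: Page<Item, Key> = await fetchPage(startKey);
    items.push(...(page.Items ?? []));
    startKey = page.LastEvaluatedKey;
  } while (startKey !== undefined);
  return items;
}

// Stand-in for DynamoDB: serves two fake pages so the loop can run locally.
const fakePages: Page<string, number>[] = [
  { Items: ["lead_a", "lead_b"], LastEvaluatedKey: 1 },
  { Items: ["lead_c"] },
];
const fetchFakePage = async (startKey?: number): Promise<Page<string, number>> =>
  fakePages[startKey ?? 0];
```

For an admin dashboard you may prefer to return one page plus the key to the caller instead of collecting everything, so a large result set never has to fit in Lambda memory.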

Query vs Scan

The Critical Difference

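Operation	Performance	Use Case
Query	O(n): reads only the n items stored under the requested key	Lookups by partition key (table or GSI)
Scan	O(N): reads all N items in the table, then applies any filter	Full table access (exports, small admin lists)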

Example: Finding "new" leads

❌ Slow: Scan with Filter

// This reads EVERY item in the table!
const command = new ScanCommand({
  TableName: TABLE_NAME,
  FilterExpression: "#status = :status",
  ExpressionAttributeNames: { "#status": "status" },
  ExpressionAttributeValues: marshall({ ":status": "new" }),
});

✅ Fast: Query with GSI

// This reads only matching items
const command = new QueryCommand({
  TableName: TABLE_NAME,
  IndexName: "status-index",
  KeyConditionExpression: "#status = :status",
  ExpressionAttributeNames: { "#status": "status" },
  ExpressionAttributeValues: marshall({ ":status": "new" }),
});

The difference at scale:

1 million leads, 100 are "new"
Scan: Reads 1,000,000 items → $$$, slow
Query: Reads 100 items → Pennies, fast
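
To put rough numbers on that gap, here is a back-of-the-envelope estimate of the read units each call consumes. It is a sketch under stated assumptions: 1 KB average item size, eventually consistent reads (half a unit per 4 KB), and no per-page rounding. estimateReadUnits is a hypothetical helper, not an SDK function.

```typescript
const READ_UNIT_BYTES = 4 * 1024; // one read unit covers up to 4 KB

// Rough read-unit estimate for one Query or Scan.
// DynamoDB bills for the data it reads, before any FilterExpression is applied.
function estimateReadUnits(
  itemsRead: number,
  avgItemBytes: number,
  stronglyConsistent: boolean = false,
): number {
  const blocks = Math.ceil((itemsRead * avgItemBytes) / READ_UNIT_BYTES);
  return stronglyConsistent ? blocks : blocks / 2; // eventual consistency costs half
}

const scanUnits = estimateReadUnits(1000000, 1024); // Scan touches the whole table
const queryUnits = estimateReadUnits(100, 1024); // Query touches matching items only
// scanUnits is 125000, queryUnits is 12.5
```

Under these assumptions the Scan consumes 10,000 times more read units than the Query for the same 100 results, and it pays that price on every call.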

Gotchas & Common Pitfalls

1. Reserved Words

Problem: status, name, and data are DynamoDB reserved words, so they cannot appear directly in an expression.

// ❌ This fails
UpdateExpression: 'SET status = :status'

// ✅ Use expression attribute names
UpdateExpression: 'SET #status = :status',
ExpressionAttributeNames: { '#status': 'status' }

2. Empty Strings

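Problem: DynamoDB rejects empty strings in key attributes, and that includes GSI keys such as email and status. Non-key attributes like company can hold an empty string, but storing one is rarely useful.

// ❌ Rejected: email is the partition key of email-index
Item: {
  email: "",
}

// ✅ Turn empty strings into undefined first...
company: leadData.company?.trim() || undefined,

// ...then let marshall drop the undefined fields
// (removeUndefinedValues alone does not remove "")
marshall(lead, { removeUndefinedValues: true });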
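
If several optional fields need the same treatment, a small normalising helper saves repeating the `|| undefined` trick. A minimal sketch, where dropEmptyStrings is a hypothetical name:

```typescript
// Trims every string field and drops the ones that end up empty,
// so marshall(..., { removeUndefinedValues: true }) never sees "".
function dropEmptyStrings<T extends Record<string, unknown>>(input: T): Partial<T> {
  const cleaned: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    if (typeof value === "string") {
      const trimmed = value.trim();
      if (trimmed !== "") cleaned[key] = trimmed;
    } else if (value !== undefined) {
      cleaned[key] = value; // non-string values pass through unchanged
    }
  }
  return cleaned as Partial<T>;
}

const cleaned = dropEmptyStrings({ name: " Ada ", company: "   ", subject: "Partnership" });
// cleaned is { name: "Ada", subject: "Partnership" }
```

Required fields such as email should still be validated separately: dropping an empty email silently would just move the failure somewhere less obvious.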

3. Conditional Check Failures

Problem: When a ConditionExpression evaluates to false, the SDK throws a ConditionalCheckFailedException instead of returning a result.

try {
  await dynamoClient.send(command);
} catch (error: any) {
  if (error.name === "ConditionalCheckFailedException") {
    // Handle gracefully (item doesn't exist or condition failed)
    return null;
  }
  throw error;
}
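
The `error: any` in that catch block can be avoided with a small type guard. This is a sketch, and isConditionalCheckFailed is a hypothetical helper name. (The SDK also exports a ConditionalCheckFailedException class from @aws-sdk/client-dynamodb, so an instanceof check is another option.)

```typescript
// Narrowing helper: true only for errors named ConditionalCheckFailedException.
function isConditionalCheckFailed(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    "name" in error &&
    (error as { name: unknown }).name === "ConditionalCheckFailedException"
  );
}

// Local stand-in for the error the SDK would throw.
const simulated = new Error("The conditional request failed");
simulated.name = "ConditionalCheckFailedException";
```

In a handler this reads as `catch (error) { if (isConditionalCheckFailed(error)) return null; throw error; }`, which keeps every other failure (throttling, permissions, network) loud.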

4. Large Items

Problem: DynamoDB has a 400KB item size limit.

Solution: Store large content (like file attachments) in S3 and keep only the S3 object key or URL in DynamoDB.
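
A cheap guard before writing is to estimate the item size yourself. The sketch below assumes a flat item whose values are all strings and counts the UTF-8 bytes of each attribute name plus its value. It is an approximation (other attribute types are sized differently, and 400 KB is taken here as 400 × 1024 bytes), and both function names are hypothetical.

```typescript
const MAX_ITEM_BYTES = 400 * 1024; // assumed interpretation of the 400KB limit
const utf8 = new TextEncoder();

// Approximate stored size of a flat, string-only item:
// UTF-8 bytes of every attribute name plus UTF-8 bytes of every value.
function approxItemBytes(item: Record<string, string | undefined>): number {
  let total = 0;
  for (const [name, value] of Object.entries(item)) {
    if (value === undefined) continue; // undefined fields are not stored
    total += utf8.encode(name).length + utf8.encode(value).length;
  }
  return total;
}

function fitsInOneItem(item: Record<string, string | undefined>): boolean {
  return approxItemBytes(item) <= MAX_ITEM_BYTES;
}
```

For a contact form, capping the message length at validation time is usually simpler, but a check like this helps when a field can carry pasted documents or base64 data.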

Best Practices

1. Use On-Demand for Variable Workloads

billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,

No capacity planning, automatic scaling, pay only for what you use.

2. Enable Point-in-Time Recovery

pointInTimeRecovery: true,

Protects against accidental deletes and bad writes. You can restore to any point in the last 35 days; the restore creates a new table rather than overwriting the existing one.

3. Use Batch Operations for Bulk Actions

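For multiple items, use BatchGetItem and BatchWriteItem. The lib-dynamodb commands take plain JS objects and must be sent through a DynamoDBDocumentClient:

import { BatchWriteCommand, DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";

// The document client wraps the base client and marshalls items for you
const docClient = DynamoDBDocumentClient.from(dynamoClient);

// Write up to 25 items per request
const command = new BatchWriteCommand({
  RequestItems: {
    [TABLE_NAME]: leads.map((lead) => ({
      PutRequest: { Item: lead },
    })),
  },
});

// Anything DynamoDB could not write comes back in UnprocessedItems; retry it
const { UnprocessedItems } = await docClient.send(command);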

Check out my DynamoDB Batch Operations blog for detailed patterns.
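
Because a batch write is capped at 25 items, a longer list has to be split first. A minimal sketch of that split, where chunk is a hypothetical helper:

```typescript
const MAX_BATCH_WRITE_ITEMS = 25; // per-request limit for batch writes

// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number = MAX_BATCH_WRITE_ITEMS): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 60 placeholder leads become batches of 25, 25, and 10.
const batches = chunk(Array.from({ length: 60 }, (_, i) => `lead_${i}`));
```

Each batch then goes into its own BatchWriteCommand. A batch can partially succeed, so it is worth checking UnprocessedItems in every response and retrying those entries, ideally with a short backoff.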

4. Design for Even Distribution

Avoid "hot" partitions by choosing well-distributed partition keys:

// ✅ Good - even distribution
partitionKey: { name: 'leadId', type: STRING }

// ❌ Bad - hot partitions (if most leads are 'new')
partitionKey: { name: 'status', type: STRING }

Cost Considerations

Pricing Model

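Operation	On-Demand Cost
Write (up to 1KB)	$0.625 per million write request units
Read (up to 4KB)	$0.125 per million read request units
Storage	$0.25 per GB/month
GSI Writes	Same rate as the table, billed in addition to the table write
GSI Reads	Same rate as the table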

Real-World Example

For a contact form with:

1,000 leads/month (writes)
10,000 reads/month (admin dashboard)
1GB storage

Monthly Cost:

Writes: 0.001M × $0.625 = $0.000625
Reads: 0.01M × $0.125 = $0.00125
Storage: 1GB × $0.25 = $0.25
Total: ~$0.25/month (essentially free!)
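
The same arithmetic as a small calculator, in case you want to plug in your own traffic. It is a sketch under assumptions: the prices are the ones quoted in the table above (check the AWS pricing page for your region), every write is 1 KB or less, every read is 4 KB or less, and each GSI adds one extra write per table write. estimateMonthlyCost and its parameter names are hypothetical.

```typescript
// Prices from the table above, in USD.
const PRICE_PER_MILLION_WRITES = 0.625;
const PRICE_PER_MILLION_READS = 0.125;
const PRICE_PER_GB_MONTH = 0.25;

interface MonthlyUsage {
  writes: number; // table writes per month
  reads: number; // reads per month
  storageGb: number;
  gsiCount: number; // simplification: each GSI adds one write per table write
}

function estimateMonthlyCost(usage: MonthlyUsage) {
  const writeUnits = usage.writes * (1 + usage.gsiCount);
  const writes = (writeUnits / 1e6) * PRICE_PER_MILLION_WRITES;
  const reads = (usage.reads / 1e6) * PRICE_PER_MILLION_READS;
  const storage = usage.storageGb * PRICE_PER_GB_MONTH;
  return { writes, reads, storage, total: writes + reads + storage };
}

// The example above (GSI writes ignored), then again with both GSIs counted.
const bill = estimateMonthlyCost({ writes: 1000, reads: 10000, storageGb: 1, gsiCount: 0 });
const billWithGsis = estimateMonthlyCost({ writes: 1000, reads: 10000, storageGb: 1, gsiCount: 2 });
```

Counting the two GSIs triples the write line, but at this volume the total still rounds to about $0.25 because storage dominates.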

Conclusion

Your contact form now has a rock-solid persistence layer! 🎉

In this part, you learned:

✅ DynamoDB table design for serverless
✅ Global Secondary Indexes for flexible queries
✅ CRUD operations with AWS SDK v3
✅ Query vs Scan performance implications
✅ Production-ready error handling

Your leads are now safely stored and can be queried efficiently. No more data disappearing when Lambda recycles!

In Part 4, we'll bring everything together – frontend, API, and database – into a complete, production-ready application with:

Full frontend-backend integration
Environment configuration
Monitoring and alerting
Production deployment checklist

Related Posts:

GitHub Repository: aws-serverless-website-tutorial

See you next time. Happy coding! 🚀
