Why Monitoring DynamoDB Matters
Hey cloud comrades!
Let's talk about DynamoDB, AWS's beloved "hold my beer" service that scales like a caffeinated squirrel on a treadmill. Sure, it's a beast... until you forget to monitor it, and suddenly your production database pulls a Houdini and vanishes into the void.
Think of DynamoDB like that one friend who's chill until they're hangry.
No metrics? No alarms? Bold move. It's like assuming your pet velociraptor won't eat your couch because it seemed fine yesterday.
Newsflash: tables throttle, latency spikes, and your servers melt into a puddle of regret faster than you can say, "Why is the billing alert screaming?"
Oh, the 3 AM call. The developer's version of a jump scare. No thanks! Let's yeet that chaos into oblivion with some CDK Monitoring Constructs, the unsung hero of "why didn't we set this up sooner" moments.
Imagine this library as your DynamoDB's pit crew, armed with dashboards, alarms, and enough graphs to make a CFO weep with joy.
No more guessing games like, "Is my table vibing or dying?" Let's get tactical.
The problem with manual CloudWatch setup
Oh, the manual CloudWatch alarm grind: it's like assembling IKEA furniture with missing screws. Let's acknowledge the pain points.
Setting up proper CloudWatch alarms manually is:
- Time-consuming
- Error-prone
- Hard to maintain consistently across environments
- A chore to keep in sync with your infrastructure changes
I've seen teams spend days crafting the "perfect" CloudWatch dashboards and alarms, only to have them become outdated within weeks as the application evolves. There has to be a better way.
Enter CDK Monitoring Constructs
The CDK Monitoring Constructs library is an absolute game-changer. It provides high-level constructs that let you define monitoring for your AWS resources using code, the same way you define the resources themselves with CDK.
What I love about this approach is that your monitoring configuration lives with your infrastructure code, evolves with it, and can be version-controlled, reviewed, and tested just like any other code.
Show me the code!
Let's get straight to a practical example. Here's how you can set up comprehensive monitoring for a DynamoDB table:
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as monitoring from 'cdk-monitoring-constructs';

export class DynamodbMonitoringStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create a DynamoDB table
    const table = new dynamodb.Table(this, 'MonitoredTable', {
      partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });

    // Create a monitoring facade
    const monitoringFacade = new monitoring.MonitoringFacade(this, 'MonitoringFacade', {
      alarmFactoryDefaults: {
        alarmNamePrefix: 'DynamoDBMonitoring-',
        actionsEnabled: true,
      },
    });

    // Add DynamoDB monitoring
    monitoringFacade.monitorDynamoTable({
      table,
      // Customize alarms with specific thresholds
      addSystemErrorsAlarm: {
        Warning: {
          maxErrorCount: 5,
          evaluationPeriods: 3,
        },
        Critical: {
          maxErrorCount: 20,
          evaluationPeriods: 2,
        },
      },
      addThrottledRequestsAlarm: {
        Warning: {
          maxThrottledRequestsCount: 10,
          evaluationPeriods: 3,
        },
      },
      addUserErrorsAlarm: {
        Warning: {
          maxErrorCount: 5,
          evaluationPeriods: 3,
        },
      },
      addLatencyAlarm: {
        Warning: {
          p90: { maxLatency: cdk.Duration.millis(500) },
          p99: { maxLatency: cdk.Duration.seconds(1) },
        },
        Critical: {
          p99: { maxLatency: cdk.Duration.seconds(2) },
        },
      },
    });
  }
}
That's it! With these few lines of code, you get:
- Dashboards that actually make sense
- Multi-level alarms (Warning/Critical) for key metrics
- Sensible starter thresholds that you can tune to your workload
- Consistency across all your environments
Breaking Down the Monitoring Configuration
Let's dig into what makes this implementation powerful:
1. System Errors Monitoring
addSystemErrorsAlarm: {
  Warning: {
    maxErrorCount: 5,
    evaluationPeriods: 3,
  },
  Critical: {
    maxErrorCount: 20,
    evaluationPeriods: 2,
  },
}
This catches internal DynamoDB service errors that might affect your application. I've found that having a low threshold for warnings helps catch issues early, while the critical alarm ensures you're alerted immediately for serious problems.
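To make the evaluationPeriods knob concrete, here's a simplified model of how an alarm like this evaluates (real CloudWatch also supports M-out-of-N datapoints and missing-data handling; this sketch assumes the alarm fires only after N consecutive breaching periods):

```typescript
// Simplified alarm evaluation: fire only after `evaluationPeriods`
// consecutive periods in which the metric exceeds the threshold.
function wouldAlarm(
  errorCounts: number[],
  maxErrorCount: number,
  evaluationPeriods: number,
): boolean {
  let consecutiveBreaches = 0;
  for (const count of errorCounts) {
    consecutiveBreaches = count > maxErrorCount ? consecutiveBreaches + 1 : 0;
    if (consecutiveBreaches >= evaluationPeriods) {
      return true;
    }
  }
  return false;
}

// A transient blip does not page anyone...
console.log(wouldAlarm([6, 7, 2, 6], 5, 3)); // false
// ...but a sustained breach does:
console.log(wouldAlarm([6, 7, 8], 5, 3)); // true
```

This is why the Critical tier above pairs a higher count with fewer periods: serious error volume should page you faster than a slow simmer.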
2. Throttled Requests Tracking
addThrottledRequestsAlarm: {
  Warning: {
    maxThrottledRequestsCount: 10,
    evaluationPeriods: 3,
  },
}
Throttling is often the first sign that your table configuration doesn't match your usage patterns. This alarm gives you early warning before it significantly impacts your users.
3. Performance Monitoring
addLatencyAlarm: {
  Warning: {
    p90: { maxLatency: cdk.Duration.millis(500) },
    p99: { maxLatency: cdk.Duration.seconds(1) },
  },
  Critical: {
    p99: { maxLatency: cdk.Duration.seconds(2) },
  },
}
I've learned the hard way that average latency can hide serious problems. That's why I monitor p90 and p99 latencies separately: they tell you how your worst-case scenarios are performing.
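Here's a quick back-of-the-envelope illustration of why the average hides tail pain, using a nearest-rank percentile and hypothetical numbers:

```typescript
// Nearest-rank percentile over an ascending-sorted sample.
function percentile(sortedSamples: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedSamples.length);
  return sortedSamples[Math.max(0, rank - 1)];
}

// 98 fast requests at 20 ms, plus two 5-second outliers:
const latenciesMs = [...Array(98).fill(20), 5000, 5000].sort((a, b) => a - b);

const avg = latenciesMs.reduce((sum, v) => sum + v, 0) / latenciesMs.length;
console.log(avg);                         // 119.6 -- looks healthy
console.log(percentile(latenciesMs, 50)); // 20    -- median looks healthy too
console.log(percentile(latenciesMs, 99)); // 5000  -- p99 exposes the outliers
```

Two percent of your users just waited five seconds, and neither the average nor the median noticed.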
Beyond the Basics
Once you have the foundation in place, consider these enhancements:
Add SNS Notifications
Connect your alarms to an SNS topic to get immediate notifications. With cdk-monitoring-constructs, a clean way to do this is to set a default alarm action on the facade itself (the sns import is assumed):

import * as sns from 'aws-cdk-lib/aws-sns';

const alertTopic = new sns.Topic(this, 'AlertTopic');

const monitoringFacade = new monitoring.MonitoringFacade(this, 'MonitoringFacade', {
  alarmFactoryDefaults: {
    alarmNamePrefix: 'DynamoDBMonitoring-',
    actionsEnabled: true,
    // Route every alarm created by this facade to the topic
    action: new monitoring.SnsAlarmActionStrategy({ onAlarmTopic: alertTopic }),
  },
});
Monitor Cost Metrics
For pay-per-request tables, you might want to track consumed capacity to avoid bill shock:
monitoringFacade.monitorDynamoTable({
  table,
  addConsumedCapacityAlarm: {
    Warning: {
      maxConsumedCapacity: 80, // consumed capacity units per period; tune to your workload
    },
  },
});
Real-world Lessons
In production environments, I've found these monitoring practices to be game-changers:
- Set progressive thresholds - Start with conservative values and adjust based on your application's normal behavior.
- Don't overlook user errors - A sudden spike in user errors often indicates a client-side code issue that was just deployed.
- Monitor read/write distribution - Uneven distribution across partitions can cause hot spots even when your total capacity seems adequate.
- Track GSI metrics separately - Global Secondary Indexes can have performance characteristics very different from your base table.
From Reactive to Proactive
Implementing proper monitoring isn't just about avoiding outages; it's about changing your team's mindset from reactive firefighting to proactive optimization.
With the monitoring setup we've explored today, you'll be able to:
- Catch issues before they impact users
- Make data-driven decisions about capacity planning
- Justify infrastructure investments with concrete metrics
- Sleep better at night (seriously!)
The beauty of using CDK Monitoring Constructs is that this robust setup takes minutes, not days, to implement. And since it's code, you can continually refine and improve it as your application evolves.
What monitoring strategies have worked for your DynamoDB workloads? I'd love to hear your experiences in the comments below!
Happy building xD