We live in a world where we want everything to be Event-Driven. We want a new user registration in our SQL database to immediately trigger a welcome email via SES, update a CRM via API, and start a Step Function workflow.
If you are building greenfield on DynamoDB, this is easy (DynamoDB Streams). But what if your data lives in a legacy MySQL monolith, an on-premise Oracle DB, or a standard PostgreSQL instance?
You need Change Data Capture (CDC). You need to stream these changes to the cloud.
Naturally, you look at AWS DMS (Database Migration Service). It’s perfect for moving data. But then you hit the wall:
The Problem: AWS DMS cannot target an AWS Lambda function directly.
You cannot simply configure a task to say "When a row is inserted in Table X, invoke Function Y".
So, how do we bridge the gap between the "Old World" (SQL) and the "New World" (Serverless)? We need a glue service. While many suggest Kinesis, often the most robust and cost-effective answer is Amazon S3.
Here is the architecture pattern I use to modernize legacy backends without rewriting them.
The Architecture: The "S3 Drop" Pattern
The flow works like this:
- Source: DMS connects to your Legacy Database and captures changes (INSERT/UPDATE/DELETE) via the transaction logs.
- Target: DMS writes these changes as JSON files into an S3 Bucket.
- Trigger: S3 detects the new file and fires an Event Notification (a minimal wiring sketch follows this list).
- Compute: Your Lambda Function receives the event, reads the file, and processes the business logic.
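Step 3 is the only piece of glue you have to wire up yourself. Here is a minimal sketch of doing it with boto3; the bucket name, function ARN, and the cdc/ prefix are placeholders for whatever your DMS target actually writes to.

import boto3

lambda_client = boto3.client('lambda')
s3_client = boto3.client('s3')

BUCKET = 'my-cdc-bucket'  # hypothetical bucket that DMS writes into
FUNCTION_ARN = 'arn:aws:lambda:eu-west-1:123456789012:function:process-cdc'  # hypothetical

# Allow S3 to invoke the function (required before the notification can be saved)
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId='AllowS3Invoke',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn=f'arn:aws:s3:::{BUCKET}',
)

# Fire the Lambda for every object DMS drops under the cdc/ prefix
s3_client.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': FUNCTION_ARN,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': 'cdc/'},  # match the DMS bucket folder
            ]}},
        }]
    },
)

In a real deployment you would more likely express this in CloudFormation, SAM, or Terraform, but the shape of the configuration is the same.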
Why S3 instead of Kinesis or Airbyte?
Two common alternatives usually come up when designing this:
1. Why not Kinesis Data Streams?
Kinesis is faster (sub-second latency). However, S3 is often superior for this specific use case because:
- Cost: S3 is incredibly cheap compared to a provisioned Kinesis stream (especially if the legacy DB is quiet).
- Observability: You can literally see the changes as files in your bucket. It makes debugging legacy data migration 10x easier.
- Batching: DMS writes to S3 in batches. This naturally prevents your Lambda from being overwhelmed if the database takes a massive write hit.
2. Why not Airbyte or Fivetran?
Tools like Airbyte are fantastic for ELT (Extract Load Transform) pipelines, typically moving data to a warehouse like Snowflake every 15 or 60 minutes.
However, our goal here is Event-Driven capability. We want to trigger a Lambda function as close to "real-time" as possible. AWS DMS offers continuous replication (CDC), giving us a granular stream of events that batch-based ELT tools often miss. Furthermore, staying 100% AWS native simplifies IAM governance in strict enterprise environments.
The Implementation Guide
Here are the specific settings you need to make this work smoothly.
1. The DMS Endpoint Settings
When creating your Target Endpoint in DMS (pointing to S3), don't just use the defaults. You want the data to be readable by your Lambda.
Use these Extra Connection Attributes:
dataFormat=json;
datePartitionEnabled=true;
dataFormat=json: By default, DMS writes CSV. JSON is much easier for your Lambda to parse using standard libraries.
datePartitionEnabled=true: This organizes your files in S3 by date (/2023/11/02/...), preventing a single folder from containing millions of files.
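It also helps to see where those attributes actually go. Below is a hedged boto3 sketch of creating the S3 target endpoint and the ongoing (CDC) replication task; the identifiers, ARNs, bucket, and table mappings are placeholders, and the source endpoint and replication instance are assumed to already exist.

import json
import boto3

dms = boto3.client('dms')

# Target endpoint: DMS -> S3, using the extra connection attributes above
target = dms.create_endpoint(
    EndpointIdentifier='legacy-cdc-s3-target',  # hypothetical name
    EndpointType='target',
    EngineName='s3',
    ExtraConnectionAttributes='dataFormat=json;datePartitionEnabled=true;',
    S3Settings={
        'ServiceAccessRoleArn': 'arn:aws:iam::123456789012:role/dms-s3-access',  # placeholder
        'BucketName': 'my-cdc-bucket',
        'BucketFolder': 'cdc',
    },
)

# Ongoing replication task: stream changes only (CDC), no full load
dms.create_replication_task(
    ReplicationTaskIdentifier='legacy-cdc-to-s3',
    SourceEndpointArn='arn:aws:dms:eu-west-1:123456789012:endpoint:SOURCE',  # placeholder
    TargetEndpointArn=target['Endpoint']['EndpointArn'],
    ReplicationInstanceArn='arn:aws:dms:eu-west-1:123456789012:rep:INSTANCE',  # placeholder
    MigrationType='cdc',
    TableMappings=json.dumps({
        'rules': [{
            'rule-type': 'selection',
            'rule-id': '1',
            'rule-name': 'users-only',
            'object-locator': {'schema-name': 'app', 'table-name': 'users'},
            'rule-action': 'include',
        }]
    }),
)

The mapping above only includes a hypothetical app.users table; widen it to whichever tables your Lambda actually cares about.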
2. Understanding the Event Structure
When DMS writes a file, it looks like this inside:
{
  "data": { "id": 101, "username": "jdoe", "status": "active" },
  "metadata": { "operation": "insert", "timestamp": "2023-11-02T10:00:00Z" }
}
{
  "data": { "id": 102, "username": "asmith", "status": "pending" },
  "metadata": { "operation": "update", "timestamp": "2023-11-02T10:05:00Z" }
}
You get the Operation (was it an insert or an update?) and the Data in one clean package.
3. The Lambda Logic
AWS DMS does not write a valid JSON array (e.g., [{...}, {...}]). It writes Line-Delimited JSON (NDJSON).
If you try to json.loads() the entire file content at once, your code will crash. You must iterate line-by-line.
Here is the Python boilerplate to handle this correctly:
import boto3
import json
import urllib.parse

s3 = boto3.client('s3')

def handler(event, context):
    # 1. Get the S3 key from the event trigger
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # S3 event keys are URL-encoded, so decode them before calling GetObject
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        print(f"Processing file: {key}")

        # 2. Fetch the actual file generated by DMS
        obj = s3.get_object(Bucket=bucket, Key=key)
        content = obj['Body'].read().decode('utf-8')

        # 3. Parse DMS JSON (NDJSON / Line-Delimited)
        # CRITICAL: Do not use json.loads(content) directly!
        for line in content.splitlines():
            if not line.strip(): continue  # Skip empty lines
            row = json.loads(line)

            # 4. Filter: Only process what you need
            operation = row.get('metadata', {}).get('operation')

            if operation == 'insert':
                user_data = row.get('data')
                print(f"New User Detected: {user_data['username']}")
                # trigger_welcome_email(user_data)
            elif operation == 'update':
                print(f"User Updated: {row['data']['id']}")
Summary
You don't need to refactor your entire legacy database to get the benefits of Serverless.
By using AWS DMS to unlock the data and S3 as a reliable buffer, you can trigger modern Lambda workflows from 20-year-old databases with minimal friction. It is a pattern that prioritizes stability and observability over raw speed—a trade-off that is usually worth it in enterprise migrations.
