Santanu Das

BootStrapping Aurora RDS Databases using Lambda and Terraform (Part 2)

In Part-1, we explored the inherent challenges of managing post-deployment database state and user privileges within a declarative IaC workflow. In this article, we shift from problem to implementation, detailing a solution that provides a robust, scalable alternative for platform teams.

The Solution: Decoupling via Purpose-Built Lambda

The core of this architecture is the establishment of a clear execution boundary. By offloading database mutation logic to an AWS Lambda function, we decouple the state of the infrastructure from the state of the database.

  • Terraform’s Responsibility: Operates as the orchestrator, provisioning the Aurora cluster and Lambda function while injecting necessary metadata (endpoints, ARNs, and engine types).

  • Lambda’s Responsibility: Acts as the execution engine, performing engine-specific logic (MySQL/PostgreSQL), enforcing role-based access control (RBAC), and managing IAM-based authentication.

Strategic Benefits

  • True Idempotency: Unlike complex Terraform providers that can drift, the Lambda logic is procedural and testable, allowing for safe re-runs without impacting infrastructure integrity.

  • Hardened Security: The design eliminates the need for static administrative passwords, leveraging AWS IAM Database Authentication for all bootstrap operations.

  • Architectural Purity: Terraform remains purely declarative, focused on infrastructure lifecycle, while the Lambda handles the "Day 1" platform bootstrapping as a discrete capability.

This deep-dive focuses on the structural internals of the Lambda—exploring the design decisions that ensure maintainability and engine parity while keeping the footprint minimal.

Core Design Concepts & Requirements

To ensure a seamless handoff between infrastructure and application, the bootstrap Lambda was built around the following core requirements:

  • Passwordless Database Operations: Eliminates the need for static credentials or secret rotation by utilizing native AWS authentication.

  • IAM/Token-Based Access: Leverages short-lived IAM database tokens for all administrative actions, ensuring a secure and auditable connection lifecycle.

  • Zero Developer Friction: The entire process is transparent to application teams; databases and roles are provisioned automatically without manual intervention.

  • Multi-Tenant Support: A single execution can manage multiple database schemas and user roles, mapping complex requirements to a unified configuration.

  • Day-Zero Readiness: Ensures that the database environment is fully initialized and accessible the moment the Aurora cluster enters the available state.

The Secret Sauce: Aurora Data API

One of the biggest hurdles in building a lightweight Lambda for database bootstrapping is the dependency on database drivers. Python’s standard library does not include native drivers for MySQL or PostgreSQL.

Typically, this forces platform engineers into a dependency trap, implementing either:

  • Lambda Layers: adding overhead for versioning and binary compatibility, or

  • Container Images: introducing complex build pipelines and increasing the security blast radius.

We solved this by enabling the Aurora Data API. This design choice is the cornerstone of the Lambda's simplicity:

  • Pure AWS SDK: The Lambda interacts with the database via standard boto3 calls. If the Lambda environment has the AWS SDK, it has everything it needs to execute SQL.

  • No Drivers Required: By sending SQL commands over HTTPS via the Data API, we eliminated the need for psycopg2, pymysql, or any other external binaries.

  • Simplified Networking: The Data API handles connection pooling and management overhead on the service side, allowing the Lambda to remain "stateless" and extremely fast to cold-start.

By treating the database as an API endpoint rather than a traditional socket connection, we achieved a zero-dependency deployment artifact. This makes the bootstrap process more auditable, easier to maintain, and significantly more resilient to environment drift.
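
To make the "pure SDK" claim concrete, here is a minimal sketch of a Data API call from Python. The ARNs are placeholders; the real values are injected by Terraform and wired up in the installer code later in this article.

import boto3

# Plain boto3 is all that is needed: no psycopg2, pymysql, or Lambda layer
rds_data = boto3.client("rds-data")

# Placeholder ARNs for illustration only
response = rds_data.execute_statement(
    resourceArn="arn:aws:rds:eu-west-1:123456789012:cluster:example-cluster",
    secretArn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:example",
    database="postgres",
    sql="SELECT 1",
)
print(response["records"])   # e.g. [[{'longValue': 1}]]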

Implementing this modern connectivity model required only a couple of configuration changes within the child module: exposing the cluster's HTTP endpoint (the Data API) and enabling native IAM authentication.

# Enabling the Data API (HTTP endpoint) and native IAM authentication on the Aurora cluster
enable_http_endpoint                 = true
iam_database_authentication_enabled = true

The Modular Code Structure

The Lambda’s internal structure is designed for clarity and extensibility. Rather than a monolithic script, the codebase is partitioned into functional layers. This ensures that adding support for a new engine or modifying authentication logic doesn't disrupt the core execution flow.

File Structure

The project follows a flat, purposeful directory layout:

./lambda
 ├── __init__.py
 ├── db_config.py       # Environment-to-Runtime mapping
 ├── db_helper.py       # Core SQL logic and engine-specific functions
 ├── mysql_installer.py # MySQL execution workflow
 └── pgsql_installer.py # PostgreSQL execution workflow

Component Responsibilities

  • db_config.py (The Configuration Layer): This module acts as the bridge between Terraform and Python. It ingests cluster-specific environment variables, such as cluster identifiers, engine types, and database names, and validates them before execution begins.

  • db_helper.py (The Logic Library): This is the "brain" of the operation. It contains the granular, engine-specific functions required to verify and mutate state. By centralizing functions like mysql_user_exists and pgsql_create_db here, we ensure that the logic is reusable and easy to unit test.

  • *_installer.py (The Workflow Orchestrators): These files represent the high-level workflows for each engine. An installer imports the necessary primitives from the helper and executes them in a logical sequence.
    e.g. check if the DB exists → create it if missing → assign privileges.

Although the bootstrap flow is shared, MySQL and PostgreSQL differ fundamentally in ownership, role semantics, and grant models — which is why installers are intentionally split.

The helper functions are designed to accept a generic execute_fn. This allows us to swap the underlying execution engine — such as switching from a standard driver to the Aurora Data API — without rewriting the core business logic.
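
This also makes the helpers trivially testable. Below is a hypothetical unit-test stub (not part of the shipped Lambda) that injects a fake execute_fn, so the logic can be exercised locally without touching AWS. The environment values are dummies, set only because db_config builds CONFIG at import time.

# test_db_helper.py: hypothetical local test stub, not packaged with the Lambda
import os

# db_config reads these at import time, so dummy values must exist before importing db_helper
os.environ.update({
    "CLUSTER_ARN": "arn:aws:rds:eu-west-1:123456789012:cluster:test",
    "DATABASES": '["appdb"]',
    "DB_ENGINE": "aurora-mysql",
    "DB_USER_HOST": "%",
    "DB_USER_NAME": "app_user",
    "ENGINE_TYPE": "mysql",
    "SECRET_ARN": "arn:aws:secretsmanager:eu-west-1:123456789012:secret:test",
})

from db_helper import mysql_user_exists

def fake_execute(sql: str, database: str):
    """Stand-in for the Data API: pretend one matching row came back."""
    print(f"[TEST] ({database}) {sql}")
    return {"records": [[{"longValue": 1}]]}

assert mysql_user_exists(fake_execute) is True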

The Blueprint: Implementation & Code

With the architectural boundaries defined, we can now turn to the implementation. This section provides copy-paste-ready code to stand up the bootstrap utility.

IaC: The Terraform Layer

1️⃣ Every AWS Lambda function needs permission to interact with other AWS resources in your account. These permissions are granted via an AWS IAM role, which here must allow calls to the Aurora Data API. The execution role and its IAM policies are created in the root module; the role ARN is then supplied to the Lambda function.

# ----------------------------------------------------------
# IAM role for the Lambda function
# ----------------------------------------------------------
resource "aws_iam_role" "air_lfn" {
  for_each = (
    anytrue(values(local.service_enabled))
    ? toset([var.service_name])
    : []
  )
  name = "${local.template_name}-lambda-Role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
      Action = "sts:AssumeRole"
    }]
  })
}

# ----------------------------------------------------------
# Lambda role-policy document and attachment
# ----------------------------------------------------------
data "aws_iam_policy_document" "aipd_lfn" {
  for_each = (
    anytrue(values(local.service_enabled))
    ? toset([var.service_name])
    : []
  )

  statement {
    sid    = "RDSDataAPI"
    effect = "Allow"

    actions = [
      "rds-data:ExecuteStatement",
      "rds-data:BatchExecuteStatement"
    ]

    resources = ["*"]
  }

  statement {
    sid    = "DescribeCluster"
    effect = "Allow"

    actions = [
      "rds:DescribeDBClusters"
    ]

    resources = ["*"]
  }

  statement {
    sid    = "ReadDBSecret"
    effect = "Allow"

    actions = [
      "secretsmanager:GetSecretValue"
    ]

    resources = [
      for DB in keys(local.db_profiles) :
      module.db_cluster[DB].master_user_secret_arn
    ]
  }

  statement {
    sid    = "LambdaLogging"
    effect = "Allow"

    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ]

    resources = ["arn:aws:logs:*:*:*"]
  }
}

resource "aws_iam_role_policy" "airp_lfn" {
  for_each = (
    anytrue(values(local.service_enabled))
    ? toset([var.service_name])
    : []
  )
  role   = aws_iam_role.air_lfn[each.value].id
  policy = data.aws_iam_policy_document.aipd_lfn[each.value].json
}

2️⃣ Next, we define the AWS infrastructure resources that live in the child module. The following Terraform configuration automatically packages the Python source into a ZIP archive and provisions the Lambda function, wiring in the execution role created earlier in the root module.

# ---------------------------------------------------------
# Lambda function to manage DBs
# ---------------------------------------------------------
data "archive_file" "daf_lambda" {
  for_each = (
    var.enable_db_managment
    ? { (var.db_profile.engine_type) = true }
    : {}
  )
  type        = "zip"
  output_path = "${path.module}/${each.key}-lambda.zip"

  source {
    content  = file("${path.module}/lambda/__init__.py")
    filename = "__init__.py"
  }
  source {
    content  = file("${path.module}/lambda/db_config.py")
    filename = "db_config.py"
  }
  source {
    content  = file("${path.module}/lambda/db_helper.py")
    filename = "db_helper.py"
  }
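  # The engine-specific installer is always packaged as db_installer.py,
  # so the handler path ("db_installer.handler") stays identical for both engines.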
  source {
    content  = file("${path.module}/lambda/${each.key}_installer.py")
    filename = "db_installer.py"
  }
}

resource "aws_lambda_function" "db_installer" {
  for_each = (
    var.enable_db_managment
    ? { (var.db_profile.engine_type) = true }
    : {}
  )
  function_name = "${var.name_prefix}-${each.key}-installer"
  role          = var.lambda_service_role  #<-- comes from root-module
  runtime       = "python3.12"
  handler       = "db_installer.handler"
  timeout       = 30
  memory_size   = 256

  filename         = data.archive_file.daf_lambda[each.key].output_path
  source_code_hash = data.archive_file.daf_lambda[each.key].output_base64sha256

  environment {    #<-- feeds db_config.py
    variables = {
      CLUSTER_ARN  = module.cluster.cluster_arn
      DATABASES    = jsonencode(var.db_profile.databases)
      DB_ENGINE    = var.db_profile.engine
      DB_USER_HOST = var.db_profile.dbs_user_host
      DB_USER_NAME = var.db_profile.database_user
      ENGINE_TYPE  = var.db_profile.engine_type
      SECRET_ARN   = local.secret_arn

      # default databases
      DEFAULT_DB = (
        var.db_profile.engine == "aurora-mysql" ? "mysql" : "postgres"
      )
    }
  }

  tags = merge(
    var.extra_tags,
    {
      Name        = "${var.name_prefix}-db-installer"
      SubResource = "aws_lambda_function.db_installer"
    }
  )
}

The Logic: Python Lambda Code

Here is how the Python components interact. The key file is db_helper.py: its functions receive an execute function backed by the Data API, so SQL runs without traditional drivers.

1️⃣ db_config.py - The Configuration Layer

# db_config.py

import os
import json
from dataclasses import dataclass

def _require_env(name: str) -> str:
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(f"Missing required environment variable: {name}")

@dataclass(frozen=True)
class Config:
    cluster_arn: str
    databases: list[str]
    db_engine: str
    db_host: str
    db_user: str
    default_db: str
    engine_type: str
    secret_arn: str

CONFIG = Config(
    cluster_arn = _require_env("CLUSTER_ARN"),
    databases   = json.loads(_require_env("DATABASES")),
    db_engine   = _require_env("DB_ENGINE"),
    db_host     = _require_env("DB_USER_HOST"),
    db_user     = _require_env("DB_USER_NAME"),
    default_db  = os.environ.get("DEFAULT_DB", "postgres"),
    engine_type = _require_env("ENGINE_TYPE"),
    secret_arn  = _require_env("SECRET_ARN"),
)
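
Because CONFIG is constructed at import time, a missing variable fails the invocation immediately instead of surfacing halfway through the bootstrap. A quick local sketch of that fail-fast behaviour, using a deliberately incomplete environment:

# Illustrative only: run locally with an incomplete environment
import os

os.environ["CLUSTER_ARN"] = "arn:aws:rds:eu-west-1:123456789012:cluster:demo"
# DATABASES, DB_ENGINE, etc. are deliberately left unset

try:
    import db_config          # raises as soon as a required variable is absent
except RuntimeError as err:
    print(err)                # Missing required environment variable: DATABASES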

2️⃣ db_helper.py - The Engine Logic

# db_helper.py

from db_config import CONFIG

db_user    = CONFIG.db_user
db_host    = CONFIG.db_host
default_db = CONFIG.default_db

# ==========================================================
# MySQL (IAM auth)
# ==========================================================
def mysql_user_exists(execute_fn) -> bool:
    resp = execute_fn(
        f"SELECT 1 FROM mysql.user WHERE user = '{db_user}'",
        default_db,
    )
    return len(resp.get("records", [])) > 0


def mysql_create_user(execute_fn):
    if mysql_user_exists(execute_fn):
        print(f"[SKIP] MySQL user exists: {db_user}@{db_host}")
        return

    print(f"[CREATE] MySQL user: {db_user}@{db_host}")

    execute_fn(
        f"""
        CREATE USER '{db_user}'@'{db_host}'
        IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS'
        """,
        default_db,
    )

def mysql_grant_privileges(execute_fn, db: str):
    print(f"[GRANT] MySQL privileges: {db_user}@{db_host} → {db}")

    execute_fn(
        f"GRANT ALL PRIVILEGES ON `{db}`.* TO '{db_user}'@'{db_host}'",
        default_db,
    )

# ==========================================================
# PostgreSQL (IAM auth)
# ==========================================================

def pgsql_db_exists(execute_fn, db: str, default_db: str) -> bool:
    resp = execute_fn(
        f"SELECT 1 FROM pg_database WHERE datname = '{db}'",
        default_db,
    )
    return len(resp.get("records", [])) > 0

def pgsql_create_app_role(execute_fn, app_role: str, default_db: str):
    resp = execute_fn(
        f"SELECT 1 FROM pg_roles WHERE rolname = '{app_role}'",
        default_db,
    )

    if len(resp.get("records", [])) == 0:
        print(f"[CREATE] PostgreSQL app role: {app_role}")
        execute_fn(
            f'CREATE ROLE "{app_role}" NOLOGIN',
            default_db,
        )
    else:
        print(f"[SKIP] PostgreSQL role exists: {app_role}")

    execute_fn(
        f'ALTER ROLE "{app_role}" LOGIN',
        default_db,
    )

    execute_fn(
        f'GRANT rds_iam TO "{app_role}"',
        default_db,
    )

def pgsql_create_db(execute_fn, db: str, default_db: str):
    execute_fn(
        f'CREATE DATABASE "{db}"',
        default_db,
    )

def pgsql_grant_privileges(
    execute_fn,
    db: str,
    app_role: str,
):
    print(f"[GRANT] PostgreSQL access: {app_role} → {db}")

    # DB-level access
    execute_fn(
        f'GRANT CONNECT, TEMP ON DATABASE "{db}" TO "{app_role}"',
        "postgres",
    )

    # Schema access
    execute_fn(
        f'GRANT USAGE ON SCHEMA public TO "{app_role}"',
        db,
    )

    # Explicit table grants (NO silent no-op)
    resp = execute_fn(
        """
        SELECT
            quote_ident(n.nspname) || '.' || quote_ident(c.relname)
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE n.nspname = 'public'
        AND c.relkind = 'r'
        """,
        db,
    )

    for r in resp.get("records", []):
        table = r[0]["stringValue"]
        execute_fn(
            f'GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE {table} TO "{app_role}"',
            db,
        )

    # Sequences
    execute_fn(
        f'''
        GRANT USAGE, SELECT, UPDATE
        ON ALL SEQUENCES IN SCHEMA public
        TO "{app_role}"
        ''',
        db,
    )

    # Future objects (mandatory)
    execute_fn(
        f'''
        ALTER DEFAULT PRIVILEGES IN SCHEMA public
        GRANT SELECT, INSERT, UPDATE, DELETE
        ON TABLES TO "{app_role}"
        ''',
        db,
    )

    execute_fn(
        f'''
        ALTER DEFAULT PRIVILEGES IN SCHEMA public
        GRANT USAGE, SELECT, UPDATE
        ON SEQUENCES TO "{app_role}"
        ''',
        db,
    )

3️⃣ mysql_installer.py - MySQL Orchestrator

# mysql_installer.py

import boto3
from botocore.exceptions import ClientError

from db_config import CONFIG
from db_helper import (
    mysql_create_user, 
    mysql_grant_privileges
)   #<-- loads only mySQL specific functions 

rds_data = boto3.client("rds-data")
rds_ctrl = boto3.client("rds")

def _assert_cluster_available():
    """
    Ensure the Aurora cluster exists and is in 'available' state.
    Step Function should normally guarantee this, but this keeps
    the Lambda safe when invoked manually.
    """
    cluster_id = CONFIG.cluster_arn.split(":")[-1]

    try:
        resp = rds_ctrl.describe_db_clusters(
            DBClusterIdentifier=cluster_id
        )
    except ClientError as e:
        raise RuntimeError(f"Failed to describe cluster: {e}")

    status = resp["DBClusters"][0]["Status"]
    if status != "available":
        raise RuntimeError(f"Cluster not ready (status={status})")

    print(f"Aurora cluster is available: {cluster_id}")

def _execute(sql: str, database: str):
    print(f"[SQL] ({database}) {sql}")
    return rds_data.execute_statement(
        resourceArn=CONFIG.cluster_arn,
        secretArn=CONFIG.secret_arn,
        database=database,
        sql=sql,
    )

# -------------------------
# MySQL helpers
# -------------------------
def _mysql_db_exists(db: str) -> bool:
    resp = _execute(
        f"SHOW DATABASES LIKE '{db}'",
        CONFIG.default_db,
    )
    return bool(resp.get("records"))

def _mysql_create_db(db: str):
    if _mysql_db_exists(db):
        print(f"[SKIP] MySQL database already exists: {db}")
        return

    _execute(
        f"CREATE DATABASE `{db}` CHARACTER SET utf8mb4",
        CONFIG.default_db,
    )
    print(f"[OK] Created MySQL database: {db}")

# -------------------------
# Lambda entrypoint
# -------------------------
def handler(event, context):
    print("Starting Aurora database bootstrap")
    print(f"Engine    : {CONFIG.db_engine}")
    print(f"Databases : {CONFIG.databases}")

    # Explicit no-op behavior
    if not CONFIG.databases:
        print("No databases defined — skipping bootstrap")
        return {
            "status": "skipped",
            "reason": "no_databases",
            "engine": CONFIG.db_engine,
        }

    _assert_cluster_available()

    if CONFIG.db_engine == "aurora-mysql":
        mysql_create_user(_execute)

        for db in CONFIG.databases:
            _mysql_create_db(db)
            mysql_grant_privileges(_execute, db)

    else:
        raise ValueError(f"Unsupported DB engine: {CONFIG.db_engine}")

    print("Database bootstrap completed successfully")

    return {
        "status": "ok",
        "engine": CONFIG.db_engine,
        "databases": CONFIG.databases,
    }

4️⃣ pgsql_installer.py - PostgreSQL Orchestrator

# pgsql_installer.py

import boto3
from botocore.exceptions import ClientError

from db_config import CONFIG
from db_helper import (
    pgsql_db_exists,
    pgsql_create_app_role,
    pgsql_create_db,
    pgsql_grant_privileges,
)   #<-- loads only pgSQL specific functions 

rds_data = boto3.client("rds-data")
rds_ctrl = boto3.client("rds")

# ==========================================================
# PgSQL helper functions
# ==========================================================
def _assert_cluster_available():
    try:
        resp = rds_ctrl.describe_db_clusters(
            DBClusterIdentifier=CONFIG.cluster_arn
        )
    except ClientError as e:
        raise RuntimeError(f"Failed to describe DB cluster: {e}")

    status = resp["DBClusters"][0]["Status"]
    if status != "available":
        raise RuntimeError(f"DB cluster not available (status={status})")

def _execute(sql: str, database: str):
    return rds_data.execute_statement(
        resourceArn=CONFIG.cluster_arn,
        secretArn=CONFIG.secret_arn,
        database=database,
        sql=sql,
    )
# ==========================================================
# Lambda handler entrypoint
# ==========================================================
def handler(event, context):
    print("Starting PostgreSQL database bootstrap")

    _assert_cluster_available()
    app_role   = CONFIG.db_user
    default_db = CONFIG.default_db

    # 1. App role (LOGIN + rds_iam)
    pgsql_create_app_role(_execute, app_role, default_db)

    # 2. Databases + grants
    for db in CONFIG.databases:
        if pgsql_db_exists(_execute, db, default_db):
            print(f"[SKIP] PostgreSQL DB already exists: {db}")
        else:
            print(f"[CREATE] PostgreSQL DB: {db}")
            pgsql_create_db(_execute, db, default_db)

        pgsql_grant_privileges(
            _execute,
            db,
            app_role,
        )

    print("PostgreSQL database bootstrap completed successfully")

    return {
        "status": "ok",
        "engine": "aurora-postgresql",
        "databases": CONFIG.databases,
    }

The Outcome: Day-Zero Readiness

With this implementation, we have successfully established a Day-Zero capability that operates as a native extension of the infrastructure pipeline. By offloading state mutation to a specialized Lambda function, we ensure that the database is not just available, but fully initialized and ready for application consumption the moment the Terraform run completes.

Deployment Verification

Following a successful terragrunt apply, the bootstrap functions are deployed as distinct platform primitives within your AWS environment. These functions remain idle until invoked by your orchestration layer, ensuring they consume no compute resources outside of active bootstrap windows.

(Screenshot: the bootstrap Lambda functions in the AWS console)

Validating the Execution Flow

To verify the integration between the Lambda, the Aurora Data API, and the RDS cluster, you can perform a manual execution test. By invoking the function with a standard event payload, the Lambda will execute its "check-before-create" logic across the cluster.

A successful execution (as shown in the console logs below) confirms that the IAM permissions, networking via the Data API, and the engine-specific logic are all functioning as a cohesive unit.

(Screenshot: Lambda test execution report)
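
If you prefer to drive this check from a script rather than the console, a minimal boto3 sketch is shown below. The function name is illustrative and follows the "${name_prefix}-${engine_type}-installer" pattern from the Terraform above.

import json
import boto3

lambda_client = boto3.client("lambda")

# Function name is illustrative; adjust to your name_prefix and engine type
resp = lambda_client.invoke(
    FunctionName="demo-pgsql-installer",
    Payload=json.dumps({}),   # the handler ignores the event payload
)

print(json.loads(resp["Payload"].read()))
# expected shape: {'status': 'ok', 'engine': 'aurora-postgresql', 'databases': [...]}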


In Part 3, we will explore how to integrate these functions into a controlled AWS Step Functions workflow. This move will provide us with platform-level visibility, auditable execution history, and the ability to run on-demand bootstraps—all while keeping our deployment pipeline entirely script-free.

In This Series

  • Part 1: Architectural foundation and engineering basics.
  • Part 2: Deep dive into Lambda implementation and IAM-based access.
  • Part 3: Orchestrating workflows and Execution using AWS Step Functions.
