Table of Contents
- The Solution
- Core Design Concepts
- Aurora Data API
- The Code Structure
- Implementation & Code
- Day-Zero Readiness
In Part 1, we explored the inherent challenges of managing post-deployment database state and user privileges within a declarative IaC workflow. In this article, we shift from problem to implementation, detailing a solution that provides platform teams with a robust, scalable alternative.
The Solution: Decoupling via Purpose-Built Lambda
The core of this architecture is the establishment of a clear execution boundary. By offloading database mutation logic to an AWS Lambda function, we decouple the state of the infrastructure from the state of the database.
Terraform’s Responsibility: Operates as the orchestrator, provisioning the Aurora cluster and Lambda function while injecting necessary metadata (endpoints, ARNs, and engine types).
Lambda’s Responsibility: Acts as the execution engine, performing engine-specific logic (MySQL/PostgreSQL), enforcing role-based access control (RBAC), and managing IAM-based authentication.
Strategic Benefits
True Idempotency: Unlike complex Terraform providers that can drift, the Lambda logic is procedural and testable, allowing for safe re-runs without impacting infrastructure integrity.
Hardened Security: The design eliminates the need for static administrative passwords, leveraging AWS IAM Database Authentication for all bootstrap operations.
Architectural Purity: Terraform remains purely declarative, focused on infrastructure lifecycle, while the Lambda handles the "Day 1" platform bootstrapping as a discrete capability.
This deep-dive focuses on the structural internals of the Lambda—exploring the design decisions that ensure maintainability and engine parity while keeping the footprint minimal.
Core Design Concepts & Requirements
To ensure a seamless handoff between infrastructure and application, the bootstrap Lambda was built around the following core requirements:
Passwordless Database Operations: Eliminates the need for static credentials or secret rotation by utilizing native AWS authentication (an application-side connection sketch follows this list).
IAM/Token-Based Access: Leverages short-lived IAM database tokens for all administrative actions, ensuring a secure and auditable connection lifecycle.
Zero Developer Friction: The entire process is transparent to application teams; databases and roles are provisioned automatically without manual intervention.
Multi-Tenant Support: A single execution can manage multiple database schemas and user roles, mapping complex requirements to a unified configuration.
Day-Zero Readiness: Ensures that the database environment is fully initialized and accessible the moment the Aurora cluster enters the available state.
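To make the passwordless model concrete: once the bootstrap Lambda has provisioned an IAM-enabled database user, an application never handles a static password; it requests a short-lived token from RDS and presents it over TLS. The sketch below is illustrative application-side code (region, endpoint, port, and user name are placeholders), not part of the bootstrap Lambda itself:
# Illustrative application-side connection setup for a user created by the bootstrap.
# The region, endpoint, port, and user name are placeholders.
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

token = rds.generate_db_auth_token(
    DBHostname="my-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com",
    Port=5432,
    DBUsername="app_user",  # role granted rds_iam (PostgreSQL) or AWSAuthenticationPlugin (MySQL)
)

# The token is valid for 15 minutes and is passed as the password to the
# application's regular driver over an SSL/TLS connection.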
The Secret Sauce: Aurora Data API
One of the biggest hurdles in building a lightweight Lambda for database bootstrapping is the dependency on database drivers. Python’s standard library does not include native drivers for MySQL or PostgreSQL.
Typically, this forces platform engineers into a dependency trap, implementing either:
- Lambda Layers: Adding overhead for versioning and binary compatibility, or
- Container Images: Introducing complex build pipelines and increasing the security blast radius.
We solved this by enabling the Aurora Data API. This design choice is the cornerstone of the Lambda's simplicity:
Pure AWS SDK: The Lambda interacts with the database via standard boto3 calls. If the Lambda environment has the AWS SDK, it has everything it needs to execute SQL.
No Drivers Required: By sending SQL commands over HTTPS via the Data API, we eliminated the need for psycopg2, pymysql, or any other external binaries.
Simplified Networking: The Data API handles the connection pooling and overhead, allowing the Lambda to remain "stateless" and extremely fast to cold-start.
By treating the database as an API endpoint rather than a traditional socket connection, we achieved a zero-dependency deployment artifact. This makes the bootstrap process more auditable, easier to maintain, and significantly more resilient to environment drift.
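To make this concrete, the snippet below is a minimal sketch of what "SQL over HTTPS" looks like from the Lambda's side, using nothing but the rds-data client in boto3 (the ARNs are placeholders; the full implementation appears in the installers later in this article):
# Minimal sketch: executing SQL through the Aurora Data API with plain boto3.
# The ARNs are placeholders; note that no database driver is imported.
import boto3

rds_data = boto3.client("rds-data")

response = rds_data.execute_statement(
    resourceArn="arn:aws:rds:eu-west-1:123456789012:cluster:example-cluster",
    secretArn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:example-secret",
    database="postgres",
    sql="SELECT 1",
)
print(response.get("records"))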
Implementing this modern connectivity model required only a single configuration change within the child module:
# Enabling native IAM authentication within the Aurora cluster
iam_database_authentication_enabled = true
The Modular Code Structure
The Lambda’s internal structure is designed for clarity and extensibility. Rather than a monolithic script, the codebase is partitioned into functional layers. This ensures that adding support for a new engine or modifying authentication logic doesn't disrupt the core execution flow.
File Structure
The project follows a flat, purposeful directory layout:
./lambda
├── __init__.py
├── db_config.py # Environment-to-Runtime mapping
├── db_helper.py # Core SQL logic and engine-specific functions
├── mysql_installer.py # MySQL execution workflow
└── pgsql_installer.py # PostgreSQL execution workflow
Component Responsibilities
- db_config.py (The Configuration Layer): This module acts as the bridge between Terraform and Python. It ingests cluster-specific environment variables — such as cluster identifiers, engine types, and database names — and validates them before execution begins.
- db_helper.py (The Logic Library): This is the "brain" of the operation. It contains the granular, engine-specific functions required to verify and mutate state. By centralizing functions like mysql_user_exists and pgsql_create_db here, we ensure that the logic is reusable and easy to unit test.
- *_installer.py (The Workflow Orchestrators): These files represent the high-level workflows for each engine. An installer imports the necessary primitives from the helper and executes them in a logical sequence, e.g. check if the DB exists → create it if missing → assign privileges.
Although the bootstrap flow is shared, MySQL and PostgreSQL differ fundamentally in ownership, role semantics, and grant models — which is why installers are intentionally split.
The helper functions are designed to accept a generic execute_fn. This allows us to swap the underlying execution engine — such as switching from a standard driver to the Aurora Data API — without rewriting the core business logic.
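A condensed sketch of that injection pattern, assuming the names used later in this article (CONFIG from db_config.py, helpers from db_helper.py):
# Sketch of the execute_fn injection used by the installers (simplified).
import boto3
from db_config import CONFIG

rds_data = boto3.client("rds-data")

def execute_via_data_api(sql: str, database: str):
    # Implements the execute_fn contract: (sql, database) -> Data API response dict
    return rds_data.execute_statement(
        resourceArn=CONFIG.cluster_arn,
        secretArn=CONFIG.secret_arn,
        database=database,
        sql=sql,
    )

# Any helper that accepts execute_fn runs unchanged against this implementation,
# or against a stub in unit tests:
#   pgsql_db_exists(execute_via_data_api, "orders", CONFIG.default_db)
Because the helper only sees a callable, replacing the Data API with a traditional driver (or a test double) is a one-line change in the installer, not a rewrite of the SQL logic.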
The Blueprint: Implementation & Code
With the architectural boundaries defined, we can now look at the implementation. This section provides copy-paste-ready code to stand up the bootstrap utility.
IaC: The Terraform Layer
1️⃣ Every AWS Lambda function needs permission to interact with other AWS resources in your account. These permissions are granted through an AWS IAM role, which in our case must allow access to the Aurora Data API. The execution role and its specific IAM policies are created in the root module; the role ARN is then supplied to the Lambda function.
# ----------------------------------------------------------
# IAM role for the Lambda function
# ----------------------------------------------------------
resource "aws_iam_role" "air_lfn" {
for_each = (
anytrue(values(local.service_enabled))
? toset([var.service_name])
: []
)
name = "${local.template_name}-lambda-Role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = {
Service = "lambda.amazonaws.com"
}
Action = "sts:AssumeRole"
}]
})
}
# ----------------------------------------------------------
# Lambda role-policy document and attachment
# ----------------------------------------------------------
data "aws_iam_policy_document" "aipd_lfn" {
for_each = (
anytrue(values(local.service_enabled))
? toset([var.service_name])
: []
)
statement {
sid = "RDSDataAPI"
effect = "Allow"
actions = [
"rds-data:ExecuteStatement",
"rds-data:BatchExecuteStatement"
]
resources = ["*"]
}
statement {
sid = "DescribeCluster"
effect = "Allow"
actions = [
"rds:DescribeDBClusters"
]
resources = ["*"]
}
statement {
sid = "ReadDBSecret"
effect = "Allow"
actions = [
"secretsmanager:GetSecretValue"
]
resources = [
for DB in keys(local.db_profiles) :
module.db_cluster[DB].master_user_secret_arn
]
}
statement {
sid = "LambdaLogging"
effect = "Allow"
actions = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
resources = ["arn:aws:logs:*:*:*"]
}
}
resource "aws_iam_role_policy" "airp_lfn" {
for_each = (
anytrue(values(local.service_enabled))
? toset([var.service_name])
: []
)
role = aws_iam_role.air_lfn[each.value].id
policy = data.aws_iam_policy_document.aipd_lfn[each.value].json
}
2️⃣ Next, we define the Lambda resources themselves as part of the child module. The following Terraform configuration automatically packages the Python source into a ZIP archive and provisions the Lambda function, wiring in the execution role and granular IAM policies created in the root module.
# ---------------------------------------------------------
# Lambda function to manage DBs
# ---------------------------------------------------------
data "archive_file" "daf_lambda" {
for_each = (
var.enable_db_managment
? { (var.db_profile.engine_type) = true }
: {}
)
type = "zip"
output_path = "${path.module}/${each.key}-lambda.zip"
source {
content = file("${path.module}/lambda/__init__.py")
filename = "__init__.py"
}
source {
content = file("${path.module}/lambda/db_config.py")
filename = "db_config.py"
}
source {
content = file("${path.module}/lambda/db_helper.py")
filename = "db_helper.py"
}
source {
content = file("${path.module}/lambda/${each.key}_installer.py")
filename = "db_installer.py"
}
}
resource "aws_lambda_function" "db_installer" {
for_each = (
var.enable_db_managment
? { (var.db_profile.engine_type) = true }
: {}
)
function_name = "${var.name_prefix}-${each.key}-installer"
role = var.lambda_service_role #<-- comes from root-module
runtime = "python3.12"
handler = "db_installer.handler"
timeout = 30
memory_size = 256
filename = data.archive_file.daf_lambda[each.key].output_path
source_code_hash = data.archive_file.daf_lambda[each.key].output_base64sha256
environment { #<-- feeds db_config.py
variables = {
CLUSTER_ARN = module.cluster.cluster_arn
DATABASES = jsonencode(var.db_profile.databases)
DB_ENGINE = var.db_profile.engine
DB_USER_HOST = var.db_profile.dbs_user_host
DB_USER_NAME = var.db_profile.database_user
ENGINE_TYPE = var.db_profile.engine_type
SECRET_ARN = local.secret_arn
# default databases
DEFAULT_DB = (
var.db_profile.engine == "aurora-mysql" ? "mysql" : "postgres"
)
}
}
tags = merge(
var.extra_tags,
{
Name = "${var.name_prefix}-db-installer"
SubResource = "aws_lambda_function.db_installer"
}
)
}
The Logic: Python Lambda Code
Here is how the Python components interact. The key is db_helper.py, which leverages the Data API to execute SQL without traditional drivers.
1️⃣ db_config.py - The Configuration Layer
# db_config.py
import os
import json
from dataclasses import dataclass
def _require_env(name: str) -> str:
try:
return os.environ[name]
except KeyError:
raise RuntimeError(f"Missing required environment variable: {name}")
@dataclass(frozen=True)
class Config:
cluster_arn: str
databases: list[str]
db_engine: str
db_host: str
db_user: str
default_db: str
engine_type: str
secret_arn: str
CONFIG = Config(
cluster_arn = _require_env("CLUSTER_ARN"),
databases = json.loads(_require_env("DATABASES")),
db_engine = _require_env("DB_ENGINE"),
db_host = _require_env("DB_USER_HOST"),
db_user = _require_env("DB_USER_NAME"),
default_db = os.environ.get("DEFAULT_DB", "postgres"),
engine_type = _require_env("ENGINE_TYPE"),
secret_arn = _require_env("SECRET_ARN"),
)
2️⃣ db_helper.py - The Engine Logic
# db_helper.py
from db_config import CONFIG
db_user = CONFIG.db_user
db_host = CONFIG.db_host
default_db = CONFIG.default_db
# ==========================================================
# MySQL (IAM auth)
# ==========================================================
def mysql_user_exists(execute_fn) -> bool:
resp = execute_fn(
f"SELECT 1 FROM mysql.user WHERE user = '{db_user}'",
default_db,
)
return len(resp.get("records", [])) > 0
def mysql_create_user(execute_fn):
if mysql_user_exists(execute_fn):
print(f"[SKIP] MySQL user exists: {db_user}@{db_host}")
return
print(f"[CREATE] MySQL user: {db_user}@{db_host}")
execute_fn(
f"""
CREATE USER '{db_user}'@'{db_host}'
IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS'
""",
default_db,
)
def mysql_grant_privileges(execute_fn, db: str):
print(f"[GRANT] MySQL privileges: {db_user}@{db_host} → {db}")
execute_fn(
f"GRANT ALL PRIVILEGES ON `{db}`.* TO '{db_user}'@'{db_host}'",
default_db,
)
# ==========================================================
# PostgreSQL (IAM auth)
# ==========================================================
def pgsql_db_exists(execute_fn, db: str, default_db: str) -> bool:
resp = execute_fn(
f"SELECT 1 FROM pg_database WHERE datname = '{db}'",
default_db,
)
return len(resp.get("records", [])) > 0
def pgsql_create_app_role(execute_fn, app_role: str, default_db: str):
resp = execute_fn(
f"SELECT 1 FROM pg_roles WHERE rolname = '{app_role}'",
default_db,
)
if len(resp.get("records", [])) == 0:
print(f"[CREATE] PostgreSQL app role: {app_role}")
execute_fn(
f'CREATE ROLE "{app_role}" NOLOGIN',
default_db,
)
else:
print(f"[SKIP] PostgreSQL role exists: {app_role}")
execute_fn(
f'ALTER ROLE "{app_role}" LOGIN',
default_db,
)
execute_fn(
f'GRANT rds_iam TO "{app_role}"',
default_db,
)
def pgsql_create_db(execute_fn, db: str, default_db: str):
execute_fn(
f'CREATE DATABASE "{db}"',
default_db,
)
def pgsql_grant_privileges(
execute_fn,
db: str,
app_role: str,
):
print(f"[GRANT] PostgreSQL access: {app_role} → {db}")
# DB-level access
execute_fn(
f'GRANT CONNECT, TEMP ON DATABASE "{db}" TO "{app_role}"',
"postgres",
)
# Schema access
execute_fn(
f'GRANT USAGE ON SCHEMA public TO "{app_role}"',
db,
)
# Explicit table grants (NO silent no-op)
resp = execute_fn(
"""
SELECT
quote_ident(n.nspname) || '.' || quote_ident(c.relname)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
AND c.relkind = 'r'
""",
db,
)
for r in resp.get("records", []):
table = r[0]["stringValue"]
execute_fn(
f'GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE {table} TO "{app_role}"',
db,
)
# Sequences
execute_fn(
f'''
GRANT USAGE, SELECT, UPDATE
ON ALL SEQUENCES IN SCHEMA public
TO "{app_role}"
''',
db,
)
# Future objects (mandatory)
execute_fn(
f'''
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE
ON TABLES TO "{app_role}"
''',
db,
)
execute_fn(
f'''
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT USAGE, SELECT, UPDATE
ON SEQUENCES TO "{app_role}"
''',
db,
)
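Because every helper receives its execute_fn, db_helper.py can be exercised without any AWS access. The following is an illustrative pytest-style test, not part of the shipped code; note that db_config.py reads its environment variables at import time, so a test run would need them stubbed (for example in a conftest.py) before importing db_helper:
# test_db_helper.py (illustrative): exercises a helper with a fake execute_fn.
from db_helper import pgsql_db_exists

def test_pgsql_db_exists_returns_true_when_record_found():
    calls = []

    def fake_execute(sql, database):
        # Record the call and mimic the shape of a Data API response.
        calls.append((sql, database))
        return {"records": [[{"longValue": 1}]]}

    assert pgsql_db_exists(fake_execute, "orders", "postgres") is True
    assert "pg_database" in calls[0][0]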
3️⃣ mysql_installer.py - MySQL Orchestrator
# mysql_installer.py
import boto3
from botocore.exceptions import ClientError
from db_config import CONFIG
from db_helper import (
mysql_create_user,
mysql_grant_privileges
) #<-- loads only MySQL-specific functions
rds_data = boto3.client("rds-data")
rds_ctrl = boto3.client("rds")
def _assert_cluster_available():
"""
Ensure the Aurora cluster exists and is in 'available' state.
Step Function should normally guarantee this, but this keeps
the Lambda safe when invoked manually.
"""
cluster_id = CONFIG.cluster_arn.split(":")[-1]
try:
resp = rds_ctrl.describe_db_clusters(
DBClusterIdentifier=cluster_id
)
except ClientError as e:
raise RuntimeError(f"Failed to describe cluster: {e}")
status = resp["DBClusters"][0]["Status"]
if status != "available":
raise RuntimeError(f"Cluster not ready (status={status})")
print(f"Aurora cluster is available: {cluster_id}")
def _execute(sql: str, database: str):
print(f"[SQL] ({database}) {sql}")
return rds_data.execute_statement(
resourceArn=CONFIG.cluster_arn,
secretArn=CONFIG.secret_arn,
database=database,
sql=sql,
)
# -------------------------
# MySQL helpers
# -------------------------
def _mysql_db_exists(db: str) -> bool:
resp = _execute(
f"SHOW DATABASES LIKE '{db}'",
CONFIG.default_db,
)
return bool(resp.get("records"))
def _mysql_create_db(db: str):
if _mysql_db_exists(db):
print(f"[SKIP] MySQL database already exists: {db}")
return
_execute(
f"CREATE DATABASE `{db}` CHARACTER SET utf8mb4",
CONFIG.default_db,
)
print(f"[OK] Created MySQL database: {db}")
# -------------------------
# Lambda entrypoint
# -------------------------
def handler(event, context):
print("Starting Aurora database bootstrap")
print(f"Engine : {CONFIG.db_engine}")
print(f"Databases : {CONFIG.databases}")
# Explicit no-op behavior
if not CONFIG.databases:
print("No databases defined — skipping bootstrap")
return {
"status": "skipped",
"reason": "no_databases",
"engine": CONFIG.db_engine,
}
_assert_cluster_available()
if CONFIG.db_engine == "aurora-mysql":
mysql_create_user(_execute)
for db in CONFIG.databases:
_mysql_create_db(db)
mysql_grant_privileges(_execute, db)
else:
raise ValueError(f"Unsupported DB engine: {CONFIG.db_engine}")
print("Database bootstrap completed successfully")
return {
"status": "ok",
"engine": CONFIG.db_engine,
"databases": CONFIG.databases,
}
4️⃣ pgsql_installer.py - PostgreSQL Orchestrator
# pgsql_installer.py
import boto3
from botocore.exceptions import ClientError
from db_config import CONFIG
from db_helper import (
pgsql_db_exists,
pgsql_create_app_role,
pgsql_create_db,
pgsql_grant_privileges,
) #<-- loads only PostgreSQL-specific functions
rds_data = boto3.client("rds-data")
rds_ctrl = boto3.client("rds")
# ==========================================================
# PgSQL helper functions
# ==========================================================
def _assert_cluster_available():
try:
resp = rds_ctrl.describe_db_clusters(
DBClusterIdentifier=CONFIG.cluster_arn
)
except ClientError as e:
raise RuntimeError(f"Failed to describe DB cluster: {e}")
status = resp["DBClusters"][0]["Status"]
if status != "available":
raise RuntimeError(f"DB cluster not available (status={status})")
def _execute(sql: str, database: str):
return rds_data.execute_statement(
resourceArn=CONFIG.cluster_arn,
secretArn=CONFIG.secret_arn,
database=database,
sql=sql,
)
# ==========================================================
# Lambda handler entrypoint
# ==========================================================
def handler(event, context):
print("Starting PostgreSQL database bootstrap")
_assert_cluster_available()
app_role = CONFIG.db_user
default_db = CONFIG.default_db
# 1. App role (LOGIN + rds_iam)
pgsql_create_app_role(_execute, app_role, default_db)
# 2. Databases + grants
for db in CONFIG.databases:
if pgsql_db_exists(_execute, db, default_db):
print(f"[SKIP] PostgreSQL DB already exists: {db}")
else:
print(f"[CREATE] PostgreSQL DB: {db}")
pgsql_create_db(_execute, db, default_db)
pgsql_grant_privileges(
_execute,
db,
app_role,
)
print("PostgreSQL database bootstrap completed successfully")
return {
"status": "ok",
"engine": "aurora-postgresql",
"databases": CONFIG.databases,
}
The Outcome: Day-Zero Readiness
With this implementation, we have successfully established a Day-Zero capability that operates as a native extension of the infrastructure pipeline. By offloading state mutation to a specialized Lambda function, we ensure that the database is not just available, but fully initialized and ready for application consumption the moment the Terraform run completes.
Deployment Verification
Following a successful terragrunt apply, the bootstrap functions are deployed as distinct platform primitives within your AWS environment. These functions remain idle until invoked by your orchestration layer, ensuring they consume no compute resources outside of active bootstrap windows.
Validating the Execution Flow
To verify the integration between the Lambda, the Aurora Data API, and the RDS cluster, you can perform a manual execution test. By invoking the function with a standard event payload, the Lambda will execute its "check-before-create" logic across the cluster.
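For example, a minimal invocation from a workstation with AWS credentials could look like the sketch below (the function name is a placeholder following the <name_prefix>-<engine_type>-installer pattern from the Terraform above):
# Illustrative manual invocation of the bootstrap Lambda; the function name is a placeholder.
import json
import boto3

lambda_client = boto3.client("lambda")

resp = lambda_client.invoke(
    FunctionName="myapp-pgsql-installer",   # <name_prefix>-<engine_type>-installer
    Payload=json.dumps({}).encode(),        # the handler ignores the event payload
)
print(json.loads(resp["Payload"].read()))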
A successful execution, visible in the function's CloudWatch logs, confirms that the IAM permissions, connectivity via the Data API, and the engine-specific logic are all functioning as a cohesive unit.
In Part 3, we will explore how to integrate these functions into a controlled AWS Step Functions workflow. This move will provide us with platform-level visibility, auditable execution history, and the ability to run on-demand bootstraps—all while keeping our deployment pipeline entirely script-free.
In This Series
- Part 1: Architectural foundation and engineering basics.
- Part 2: Deep dive into Lambda implementation and IAM-based access.
- Part 3: Orchestrating workflows and Execution using AWS Step Functions.

