I see Data API less as an infrastructure improvement and more as a tool that changes how the team works. Giving your AI tools direct database access makes an order-of-magnitude difference in debugging and data-verification speed.
We enabled one AWS feature — Aurora Data API — and our AI coding tools could suddenly query the database. No bastion host, no port forwarding, no copy-pasting query results. Here's what that actually looks like in practice, and what you need to enable it.
TL;DR
- Data API is an HTTPS-based SQL execution API built into Aurora. Enabling it costs virtually nothing
- It complements SSM bastion hosts — use both
- Once enabled, Claude Code / Cursor can query your DB directly via shell commands or MCP
- Reduces team learning curve, simplifies bastion operations, and accelerates automation
- If you're running Aurora, there's no reason not to enable it
AI Coding Tool Integration: How Data API Changes Team Development
Let me start with the payoff — this is where Data API had the biggest impact for us.
Shell-Based Tool Use: Claude Code Runs AWS CLI Directly
With the traditional bastion host setup — a dedicated EC2 instance you SSH into (or tunnel through) to reach your private database — AI tools accessing the database was effectively impossible. SSH port forwarding + MySQL client connection is a workflow designed for humans operating manually.
With Data API, a single shell command — aws rds-data execute-statement — executes SQL over HTTPS. That means Claude Code can run this command through its built-in Bash tool (which lets it execute shell commands on your behalf) to query and modify your database directly.
# Example: what Claude Code actually runs
aws rds-data execute-statement \
--resource-arn "arn:aws:rds:ap-northeast-1:xxx:cluster:dev-cluster" \
--secret-arn "arn:aws:secretsmanager:ap-northeast-1:xxx:secret:dbsecret" \
--database "my_database" \
--sql "SHOW TABLES" \
--profile dev
This is the AI agent using a shell tool — it constructs and executes CLI commands, then parses the text output. It works, and it unlocks workflows like:
Debugging — a multi-step agentic loop:
- "Check the staging `users` table for records where `status` is null"
- Claude Code queries via Data API → finds 12 orphaned records
- It reasons about the cause, issues a follow-up query on the `user_sessions` table
- Identifies a race condition in the cleanup job → suggests a fix with a migration SQL
Migration authoring:
"Look at the current schema in dev and write a migration SQL for this spec" → auto-fetches current table definitions → generates diff SQL.
Data checks:
"How many records are in the production offices table, and when was the last update?" → instant answer.
Previously, a developer had to manually query through the bastion, copy-paste the results back to the AI, wait for analysis... Data API eliminates the human as the bottleneck in this loop.
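For a feel of what the agent actually parses in that loop: the Data API returns each field as a single-key typed dict (`longValue`, `stringValue`, `isNull`, ...). Here's a minimal sketch, with a made-up helper name and sample data, of flattening that response into plain rows:

```python
def flatten_records(response):
    """Convert Data API typed records ([{'longValue': 1}, ...]) to plain Python rows."""
    rows = []
    for record in response.get("records", []):
        row = []
        for field in record:
            if field.get("isNull"):
                row.append(None)
            else:
                # Each field is a single-key dict like {'longValue': 1} or {'stringValue': 'x'}
                row.append(next(iter(field.values())))
        rows.append(row)
    return rows

# Shaped like the output of: SELECT id, email FROM users LIMIT 2
resp = {"records": [
    [{"longValue": 1}, {"stringValue": "a@example.com"}],
    [{"longValue": 2}, {"isNull": True}],
]}
print(flatten_records(resp))  # → [[1, 'a@example.com'], [2, None]]
```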
MCP-Based Structured Tool Use: Natural Language DB Access
The second integration pattern is more powerful. MCP (Model Context Protocol) lets AI tools like Cursor and Claude Code connect to external data sources through typed, schema-defined interfaces. Instead of parsing free-form CLI output, the agent receives structured data with column names and types — making its actions more reliable and predictable.
The official MySQL MCP Server from AWS Labs uses Data API internally. Here's how to wire it up:
{
  "mcpServers": {
    "awslabs.mysql-mcp-server": {
      "command": "uvx",
      "args": [
        "awslabs.mysql-mcp-server@latest",
        "--resource_arn", "<cluster-arn>",
        "--secret_arn", "<secret-arn>",
        "--database", "<db-name>",
        "--region", "ap-northeast-1",
        "--readonly", "True"
      ],
      "env": {
        "AWS_PROFILE": "dev",
        "AWS_REGION": "ap-northeast-1"
      }
    }
  }
}
Change `ap-northeast-1` to your AWS region.
Once configured, you can query from Cursor or Claude Code using natural language:
- "How many support tickets came in this month?"
- "Show me all active users created in the last 30 days"
- "Which records were updated in the last 7 days?"
Guardrails for Production Use
With AI tools querying your database, safety matters. Here's our approach:
- `--readonly True` in MCP config: Restricts the MCP server to SELECT-only queries. This is enforced at the MCP server level, but should not be your only line of defense.
- Read-only DB user: Create a dedicated database user with SELECT-only privileges and store those credentials in a separate Secrets Manager secret. Use this secret for AI tool access — not the application's read-write credentials.
- IAM policy scoping: Restrict the IAM role to `rds-data:ExecuteStatement` and `secretsmanager:GetSecretValue` on the specific read-only secret ARN.
- CloudTrail audit trail: Every Data API call is logged in CloudTrail, including the SQL statement. This gives you post-hoc observability of everything the agent executed.
- Human-in-the-loop for writes: Removing the human from read queries is a productivity win. For write operations, the human-as-bottleneck is actually the safety mechanism. Keep approval workflows for any non-read access.
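The IAM scoping above can be sketched as a policy like the following. The account ID, cluster name, and secret name are placeholders; substitute your own ARNs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["rds-data:ExecuteStatement"],
      "Resource": "arn:aws:rds:ap-northeast-1:111111111111:cluster:dev-cluster"
    },
    {
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:ap-northeast-1:111111111111:secret:readonly-db-secret-*"
    }
  ]
}
```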
Note on prompt injection: If your database contains user-generated content, be aware that query results can include text designed to manipulate the agent's next action; treat returned data as untrusted input. Separately, use the Data API's parameterized queries rather than string-concatenated SQL to mitigate classic SQL injection.
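As a concrete sketch of parameterized execution: the `build_params` helper below and its type mapping are my own simplification, while `execute_statement` and its `parameters` shape are the actual boto3 `rds-data` interface.

```python
def build_params(**kwargs):
    """Map plain Python values to Data API parameter entries.
    Simplified sketch: handles str, int, float, bool, and None only."""
    type_map = {bool: "booleanValue", int: "longValue",
                float: "doubleValue", str: "stringValue"}
    params = []
    for name, value in kwargs.items():
        if value is None:
            params.append({"name": name, "value": {"isNull": True}})
        else:
            params.append({"name": name, "value": {type_map[type(value)]: value}})
    return params

# Usage (requires real ARNs and AWS credentials):
# import boto3
# client = boto3.client("rds-data", region_name="ap-northeast-1")
# client.execute_statement(
#     resourceArn="<cluster-arn>",
#     secretArn="<secret-arn>",
#     database="my_database",
#     sql="SELECT id FROM users WHERE status = :status",
#     parameters=build_params(status="active"),
# )
print(build_params(status="active"))
# → [{'name': 'status', 'value': {'stringValue': 'active'}}]
```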
Team Learning Curve Drops Dramatically
This one matters if you're leading a team.
What team members previously had to learn for DB access:
- AWS SSO login workflow
- AWS Systems Manager (SSM) Session Manager installation and configuration
- Port forwarding concepts and execution
- MySQL client installation and connection setup
- Retrieving passwords from Secrets Manager
With Data API + AI tools:
- AWS CLI profile setup (most engineers already know this)
- "Show me the data for X" → ask the AI
Five steps become effectively one. For onboarding, it's now "Log in with AWS SSO, then just ask Claude" — and that's it.
The barrier for non-infrastructure engineers and frontend developers who just want to "quickly check some data" drops dramatically.
Reduced Operational Burden
For small teams, the potential to eliminate bastion host maintenance is a significant win.
Complete bastion elimination depends on your team's needs — bulk data exports and complex investigations still benefit from SSM. But the majority of day-to-day "let me quickly check this data" or "update a few records" use cases can be handled by Data API.
As bastion usage drops, you can justify switching from always-on to on-demand — further reducing costs and maintenance.
What Is RDS Data API?
RDS Data API lets you execute SQL against Aurora via HTTPS REST calls.
Traditional database connections rely on the MySQL wire protocol over TCP/IP. Data API replaces that with standard HTTP requests through the AWS SDK or CLI. No VPC required — which also means faster Lambda cold starts, simpler networking, and no ENI provisioning delays.
Supported Engines and Limitations
Data API is Aurora-only. Standard RDS is not supported.
| Engine | Data API Support |
|---|---|
| Aurora MySQL (v3.07+) | Supported |
| Aurora PostgreSQL (v13.12+, 14.9+, 15.4+) | Supported |
| Standard RDS MySQL / PostgreSQL | Not supported |
Key limitations:
| Constraint | Value |
|---|---|
| Response size | 1 MB max per request |
| Row size | 64 KB max per row |
| Timeout | 45 seconds max per request |
| Multi-statement | Not supported on MySQL |
| Target | Writer instance only |
The 1 MB response limit means Data API isn't suited for bulk SELECTs or data exports. That's where the SSM bastion still earns its keep.
Note on multi-statement: If your CI/CD migration tool (Flyway, Liquibase, etc.) uses multi-statement SQL files, you'll need to split statements or use a wrapper for MySQL.
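A deliberately naive sketch of that splitting approach (not production-ready: it ignores semicolons inside string literals and comments, which a real migration wrapper must handle):

```python
def split_statements(sql_script: str) -> list[str]:
    """Break a multi-statement migration file into single statements
    for one-at-a-time execution via the Data API.
    Naive: does not handle semicolons inside strings or comments."""
    return [stmt.strip() for stmt in sql_script.split(";") if stmt.strip()]

migration = """
CREATE TABLE offices (id INT PRIMARY KEY);
INSERT INTO offices VALUES (1);
"""
for stmt in split_statements(migration):
    print(stmt)
# → CREATE TABLE offices (id INT PRIMARY KEY)
# → INSERT INTO offices VALUES (1)
```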
Pricing
Enabling is free. Usage costs $0.35 per million requests.
1,000 SQL executions per month = $0.00035. Not exactly a budget consideration.
Data API vs SSM Session Manager
This was the decision point I wrestled with most. The answer: it's not either/or — it's both.
Use Case Breakdown
| Use Case | Data API | SSM Bastion |
|---|---|---|
| Lambda → DB operations | Ideal (no VPC needed) | Not possible |
| CI/CD migrations | Good fit | Possible but complex |
| Scheduled batch scripts | Good fit (IAM auth) | Requires session management |
| Application integration | Ideal (pure HTTP) | Poor fit |
| Manual data investigation | Limited | Ideal (GUI tools) |
| Bulk data export | Poor fit (1 MB limit) | Ideal |
| Emergency manual UPDATEs | Possible but clunky | Ideal |
Operational Cost Comparison
| Factor | Data API | SSM Bastion |
|---|---|---|
| Initial setup | Near-zero (one console click) | EC2 + SSM + security group config |
| Monthly cost | ~$0 | $3-8/mo (t4g.nano always-on) |
| Maintenance | None (fully managed) | OS patches, SSM Agent updates |
| Audit logging | CloudTrail auto-records all queries | Dual management: CloudTrail + DB audit logs |
| Connection management | None (no persistent connections) | max_connections tuning required |
The annual cost of a bastion host ($36-96) looks small on paper, but factor in the human cost of OS patching, SSM Agent updates, and recovery when the instance goes down — it adds up fast, especially for small teams.
Security Comparison
| Factor | Data API | SSM Bastion |
|---|---|---|
| Attack surface | HTTPS (IAM-protected) | SSM only (no inbound ports) |
| Credential management | Secrets Manager (auto-rotation) | DB password stored locally |
| Access control | IAM policies control API calls | Dual management: SSM permissions + DB user permissions |
Data API delegates credentials entirely to Secrets Manager, meaning developers never need to know the DB password. That's a significant security win.
How to Enable It
Prerequisites
| Requirement | Details |
|---|---|
| DB engine | Aurora MySQL v3.07+ or Aurora PostgreSQL v13.12+ |
| Instance class | Any class except T instances (for provisioned; Serverless v2 is unaffected) |
| Secrets Manager | DB credentials must be stored |
If you're on standard RDS, Data API isn't available. Whether to migrate to Aurora for Data API alone depends on the other benefits (Serverless v2 autoscaling, etc.).
CLI
# For Serverless v2 / Provisioned clusters
aws rds enable-http-endpoint \
--resource-arn <cluster-arn> \
--profile <profile>
Important: For Serverless v2 / Provisioned clusters, use `enable-http-endpoint`. The `modify-db-cluster --enable-http-endpoint` command is for Serverless v1 only (now end-of-life). The docs are confusing on this — watch out.
Verification
aws rds-data execute-statement \
--resource-arn "<cluster-arn>" \
--secret-arn "<secret-arn>" \
--database "<db-name>" \
--sql "SELECT 1 AS test" \
--profile <profile>
# → {"records": [[{"longValue": 1}]], "numberOfRecordsUpdated": 0}
Preventing CDK Drift
If you enable via CLI without updating your CDK code, the next deploy might reset it to false.
const cluster = new rds.DatabaseCluster(this, 'Cluster', {
engine: rds.DatabaseClusterEngine.auroraMysql({
version: rds.AuroraMysqlEngineVersion.VER_3_08_0,
}),
// ... existing config ...
enableDataApi: true, // ← Add this
});
Always update your CDK code alongside the CLI enablement. If CDK drift reverts the setting, any MCP servers or automation scripts that depend on Data API will break simultaneously.
Risk Assessment
Enabling Data API carries virtually zero risk.
- No impact on existing applications or database connections
- Even when enabled, access requires IAM permissions
- Can be disabled instantly with a single command
- Cost is effectively zero
The only caveat: prevent CDK drift. If you forget to add `enableDataApi: true` to your code, the next deploy reverts the setting.
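If you do need to roll back, disabling mirrors enabling. A sketch with placeholder ARN and profile:

```shell
# Disable the Data API HTTP endpoint on a Serverless v2 / Provisioned cluster
aws rds disable-http-endpoint \
  --resource-arn <cluster-arn> \
  --profile <profile>
```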
Should You Enable It?
If you're running Aurora, enabling Data API is one of those "no reason not to" improvements.
- Near-zero risk and cost to enable
- Complements SSM bastion for full use-case coverage
- AI coding tool integration accelerates the entire team
- Directly reduces learning curve and operational overhead
"You don't need to know the DB password. You don't need to set up port forwarding. You just ask Claude." — That's the new onboarding experience.
If you have Aurora clusters that haven't enabled it yet, start with your dev environment. One command, five seconds.
Are you already using Data API with AI tools? Or still running bastion hosts for everything? I'd love to hear what your team's setup looks like — drop a comment below.