AI agents are getting access to production data, and we’re doing it wrong.
Most teams are connecting agents directly to databases.
This works in demos.
It breaks in production.
Because AI agents are not deterministic systems.
They:
- explore instead of follow rules
- generate queries instead of executing predefined logic
- optimize for answers, not safety
Database access controls assume a human, or code written by one, is issuing the queries.
Agents don’t understand the consequences of the queries they generate.
What actually goes wrong
When you connect an agent directly to a database, you introduce a new class of failures:
- Unpredictable queries
- Full table scans
- Schema exposure
- Cross-tenant data leaks
- Destructive operations on production
A simple prompt like:
"Show me recent orders"
can turn into:
SELECT * FROM orders
JOIN customers ON ...
JOIN payments ON ...
Now every column of orders, customers, and payments is in the agent’s context.
Including data the agent should never see.
Why existing solutions don’t work
Teams try to patch this. None of the current approaches solve the core issue.
Read-only roles
Still expose the entire schema. The agent can see everything. It just can’t write.
Semantic layers
Built for humans using BI tools. Not for autonomous agents generating queries dynamically.
Sandboxes
Drift from production quickly. An agent that behaves well in a sandbox can behave differently against real data.
Human approval
Kills autonomy. Does not scale.
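The read-only point above is easy to check. The sketch below uses SQLite's read-only open mode as a stand-in for a read-only database role (an assumption: a real Postgres or MySQL role works differently in detail, and the table names here are made up for the demo). The write is rejected, but the connection can still list every table.

```python
import os
import sqlite3
import tempfile


def make_demo_file():
    """Create a throwaway SQLite file with two hypothetical tables."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute("CREATE TABLE payments (id INTEGER PRIMARY KEY, card_last4 TEXT)")
    conn.execute("INSERT INTO payments (card_last4) VALUES ('4242')")
    conn.commit()
    conn.close()
    return path


def schema_visible_to_read_only(path):
    """Open the file read-only, list its tables, and attempt a write."""
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        try:
            conn.execute("DELETE FROM payments")
            write_blocked = False
        except sqlite3.OperationalError:
            # SQLite refuses writes on a read-only connection.
            write_blocked = True
        return tables, write_blocked
    finally:
        conn.close()


if __name__ == "__main__":
    demo_path = make_demo_file()
    print(schema_visible_to_read_only(demo_path))
    os.remove(demo_path)
```

Blocking writes does nothing to narrow what the agent can discover and read.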
The missing piece: The Agent Data Layer
We are missing a layer.
A control layer between AI agents and production data.
The Agent Data Layer (ADL)
Definition
The Agent Data Layer is a controlled interface between AI agents and production data systems, where all access is mediated through predefined, parameterized datasets.
The agent never touches the database.
It calls named endpoints.
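A minimal sketch of what "predefined, parameterized datasets" could look like, assuming a small in-process registry backed by SQLite. The names `DATASETS`, `run_dataset`, and `make_demo_db`, and the table layout, are hypothetical and not taken from any particular implementation.

```python
import sqlite3

# Hypothetical registry: each dataset name maps to a fixed SQL template and
# the parameters it accepts. The agent only ever sees the name and parameters.
DATASETS = {
    "recent_orders": {
        "sql": (
            "SELECT id, total, created_at FROM orders "
            "WHERE customer_id = :customerId "
            "ORDER BY created_at DESC LIMIT 10"
        ),
        "params": {"customerId": int},
    },
}


def run_dataset(conn, name, params):
    """Run a named dataset with validated, bound parameters."""
    spec = DATASETS.get(name)
    if spec is None:
        raise KeyError(f"unknown dataset: {name}")
    if set(params) != set(spec["params"]):
        raise ValueError(f"expected parameters: {sorted(spec['params'])}")
    # Cast each value to its declared type; int("1 OR 1=1") raises ValueError.
    bound = {key: cast(params[key]) for key, cast in spec["params"].items()}
    cursor = conn.execute(spec["sql"], bound)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def make_demo_db():
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "total REAL, created_at TEXT, card_last4 TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total, created_at, card_last4) "
        "VALUES (?, ?, ?, ?)",
        [
            (123, 40.0, "2024-01-02", "4242"),
            (123, 15.5, "2024-01-05", "4242"),
            (999, 99.0, "2024-01-03", "1111"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(run_dataset(make_demo_db(), "recent_orders", {"customerId": "123"}))
```

The agent can ask for `recent_orders` with a `customerId`, and nothing else. A dataset that is not in the registry cannot be queried, and a column that is not in the template (here `card_last4`) never leaves the database.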
Core principles
An Agent Data Layer enforces:
- Datasets as endpoints
- Parameterized access only
- No schema exposure
- Field-level control
- Tenant isolation
- Auditable execution
- Deterministic interface
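Three of these principles (field-level control, tenant isolation, auditable execution) can be sketched in a few lines. This is an illustration under assumptions: the key-to-tenant mapping, the field allow-list, and the in-memory audit log are all hypothetical, and a real system would back them with proper storage.

```python
import time

# Hypothetical configuration, defined before any agent runs.
API_KEYS = {"sk_test_acme": "acme", "sk_test_globex": "globex"}
ALLOWED_FIELDS = {"recent_orders": ["id", "total"]}
AUDIT_LOG = []


def read_dataset(api_key, dataset, rows):
    """Return only the caller's rows, restricted to allow-listed fields."""
    # The tenant comes from the credential, never from an agent-supplied value.
    tenant = API_KEYS.get(api_key)
    if tenant is None:
        raise PermissionError("unknown API key")
    fields = ALLOWED_FIELDS[dataset]
    visible = [
        {field: row[field] for field in fields}
        for row in rows
        if row["tenant_id"] == tenant
    ]
    # Every call leaves a record of who asked for what and how much came back.
    AUDIT_LOG.append(
        {
            "tenant": tenant,
            "dataset": dataset,
            "rowCount": len(visible),
            "timestamp": time.time(),
        }
    )
    return visible


if __name__ == "__main__":
    demo_rows = [
        {"id": 1, "tenant_id": "acme", "total": 40.0, "email": "a@acme.test"},
        {"id": 2, "tenant_id": "globex", "total": 99.0, "email": "b@globex.test"},
    ]
    print(read_dataset("sk_test_acme", "recent_orders", demo_rows))
    print(AUDIT_LOG)
```

Because the tenant filter is applied by the layer, a prompt cannot talk the agent into reading another tenant's rows.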
What this looks like in practice
Without ADL
Agent gets:
host: prod.db.company.com
user: admin
password: ****
Then generates queries freely.
With ADL
Agent gets:
GET /datasets/recent_orders?customerId=123
x-api-key: sk_live_...
Response:
{
"data": [...],
"rowCount": 8,
"executionTimeMs": 42
}
No SQL.
No credentials.
No schema.
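From the agent's side, the whole integration can be a thin tool wrapper. The sketch below builds the request shown above and unpacks the response envelope. The base URL is a placeholder and the function names are hypothetical. Only the path, the `x-api-key` header, and the `data` / `rowCount` fields come from the example.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://adl.example.com"  # placeholder host, not a real service


def build_dataset_request(dataset, params, api_key):
    """Build the GET request for a named dataset without sending it."""
    query = urllib.parse.urlencode(params)
    name = urllib.parse.quote(dataset, safe="")
    url = f"{BASE_URL}/datasets/{name}?{query}"
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


def parse_dataset_response(body):
    """Extract rows from the response envelope after a basic sanity check."""
    payload = json.loads(body)
    if payload["rowCount"] != len(payload["data"]):
        raise ValueError("rowCount does not match the number of rows returned")
    return payload["data"]


if __name__ == "__main__":
    request = build_dataset_request("recent_orders", {"customerId": 123}, "sk_test_example")
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request).read()
    sample = '{"data": [{"id": 1}], "rowCount": 1, "executionTimeMs": 42}'
    print(parse_dataset_response(sample))
```

The tool surface the agent sees is a dataset name and a set of parameters, which is what makes the interface deterministic.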
Why this matters
AI agents are moving into:
- multi-tenant SaaS
- customer-facing copilots
- production systems
Without a control layer:
You don’t have an AI system.
You have a data breach waiting to happen.
The shift
Old thinking:
Give the agent access and add guardrails later.
New thinking:
Define what the agent can access before it runs.
Final thought
AI should not explore your database.
It should operate within rules you define.
The Agent Data Layer is that interface.
I’ve implemented this pattern in a real system. If you’re exploring this space, I’d be interested in how you’re approaching agent data access.