Your Data Stack Is Working Exactly As Designed — Here's Why Your AI Agents Keep Failing
The data warehouse is not broken.
The transformation layer is doing its job.
The dashboards are accurate.
And your AI agents are still making decisions that would make any experienced analyst wince.
Here's the uncomfortable truth: everything your data stack was built to do, it does well. The problem is that it was built for a different reader — a human one. And when you swap that reader out for an autonomous agent, the gaps that humans quietly filled through judgment, institutional memory, and context become catastrophic failure points.
What the Stack Was Actually Built For
For the past decade, the data architecture workflow looked like this:
Raw Sources → Ingestion → Warehouse → dbt Transform → Metrics Layer → Dashboard → Human Analyst
That last arrow is load-bearing in ways we never had to make explicit.
The human analyst brought what no pipeline could deliver: business context. They knew why a metric moved — because of a pricing experiment, a large customer churn, or a one-time bulk order that skewed the averages. They knew which dashboard was canonical and which one the finance team secretly maintained with different definitions. They knew which policy could bend for a strategic account and which exception had been pre-approved by legal.
None of that lived in the warehouse. It lived in people, Slack threads, one-on-one conversations, and the kind of organizational muscle memory that gets transferred through onboarding and osmosis.
AI agents don't get onboarded.
The Seams Agents Can't Navigate
Modern enterprise data stacks are actually a collection of specialized tools stitched together:
- Ingestion: Fivetran, Airbyte
- Storage: Snowflake, BigQuery, Databricks
- Transformation: dbt
- Metrics layer: MetricFlow, Cube
- Data catalog: Alation, Collibra, Atlan
- Governance: varied
- Observability: Monte Carlo, Great Expectations
- BI: Tableau, Looker, Power BI
Each tool is good at its job. Together, they create seams. Humans learned to navigate those seams invisibly, knowing which field in which system was "the real one," or that the definition of "active customer" changed in Q3 2023 and the old reports haven't been backfilled.
When an AI agent queries a metric, it gets the number. It doesn't get:
- The lineage of how that number was computed
- The definition revision history
- The exception carved out for Enterprise tier customers
- The operational judgment about when to trust it versus when to dig deeper
The agent sees data. It doesn't see the map that makes the data safe to act on.
Agent Query → Data Returned → [GAP: missing context, policy, memory] → Decision Made
That gap is where the bad recommendations come from.
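To make one piece of that gap concrete, here is a minimal sketch of a definition revision history that an agent could consult before trusting a number. Everything in it is an assumption for illustration: the `DefinitionVersion` type, the `definition_as_of` helper, and the dates and wording of the "active customer" revisions are all hypothetical, not taken from any real catalog or metrics tool.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DefinitionVersion:
    effective_from: date   # first day this wording applies (inclusive)
    text: str              # human-readable definition
    changed_because: str   # the reasoning a bare query result never carries


# Hypothetical revision history, sorted by effective_from.
# Dates and wording are invented for illustration only.
ACTIVE_CUSTOMER_HISTORY = [
    DefinitionVersion(
        date(2020, 1, 1),
        "logged in within the last 90 days",
        "initial definition",
    ),
    DefinitionVersion(
        date(2023, 7, 1),
        "completed a billable action within the last 30 days",
        "hypothetical Q3 2023 revision",
    ),
]


def definition_as_of(history, as_of):
    """Return the definition version in force on `as_of`.

    Assumes `history` is sorted by `effective_from` in ascending order.
    """
    starts = [version.effective_from for version in history]
    idx = bisect_right(starts, as_of)
    if idx == 0:
        raise LookupError(f"no definition in force on {as_of}")
    return history[idx - 1]


# A report dated before the revision and one dated after it resolve to
# different definitions, so their numbers are not directly comparable.
old = definition_as_of(ACTIVE_CUSTOMER_HISTORY, date(2023, 3, 31))
new = definition_as_of(ACTIVE_CUSTOMER_HISTORY, date(2023, 9, 30))
print(old.text)
print(new.text)
```

An agent that compares two periods could call a helper like this first and refuse to compute a trend when the two dates resolve to different versions.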
Governance Without Context Is Not Governance
This is the part that catches most teams off guard.
Traditional governance stacks treat the problem as a layering issue: put permissions here, lineage there, business glossary somewhere else, and tie them together through integrations. The assumption is that combining these artifacts produces governed data.
It doesn't.
Governance isn't a rule. It's the act of deciding how a rule applies in context.
A pricing policy applies until it doesn't — for a strategic customer, for a partner relationship, during a specific campaign window. An approval chain is correct unless the person who can override it has already done so verbally and the exception hasn't been captured anywhere structured.
Those exceptions are not edge cases. They are where much of the organization's real judgment, authority, and institutional knowledge actually live. If that judgment exists only in a Slack thread or an email chain, an AI agent following the documented rule can be technically compliant and operationally wrong at the same time.
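One way to picture the difference is a policy object that carries its own exceptions, so the answer to "what is allowed?" depends on who is asking and when. The sketch below is only an illustration under stated assumptions: `DiscountPolicy`, `PolicyException`, the account IDs, and the discount figures are hypothetical and do not come from any real governance tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class PolicyException:
    account_id: str
    max_discount: float
    approved_by: str
    valid_from: date
    valid_to: date       # inclusive end of the exception window
    reason: str


@dataclass
class DiscountPolicy:
    default_max_discount: float
    exceptions: list[PolicyException] = field(default_factory=list)

    def max_discount_for(self, account_id: str, on: date) -> tuple[float, str]:
        """Return the applicable ceiling and the reason it applies."""
        for exc in self.exceptions:
            in_window = exc.valid_from <= on <= exc.valid_to
            if exc.account_id == account_id and in_window:
                why = f"exception approved by {exc.approved_by}: {exc.reason}"
                return exc.max_discount, why
        return self.default_max_discount, "documented default rule"


# Hypothetical data: one pre-approved exception for a strategic account.
policy = DiscountPolicy(
    default_max_discount=0.10,
    exceptions=[
        PolicyException(
            account_id="acct-strategic-1",
            max_discount=0.25,
            approved_by="legal",
            valid_from=date(2024, 1, 1),
            valid_to=date(2024, 3, 31),
            reason="strategic renewal",
        )
    ],
)

ceiling, why = policy.max_discount_for("acct-strategic-1", date(2024, 2, 15))
print(ceiling, why)
```

The point of returning the reason alongside the ceiling is that the agent, and anyone auditing it later, can see which rule or exception actually drove the decision.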
The Enterprise Memory Gap
Here's a dimension the traditional stack almost entirely ignores: reasoning memory.
The modern data stack is excellent at storing artifacts:
- Tables ✓
- Dashboards ✓
- Logs ✓
- Metrics ✓
It is weak at preserving the reasoning that produced them:
- Why was this definition changed?
- Who approved this exception and why?
- What happened the last time we made this type of decision?
- Which tradeoff was accepted and under what constraints?
For human teams, this doesn't matter much: people ask their colleagues, pull up old meeting notes, or rely on tribal knowledge. The cost is friction and occasional mistakes. For AI agents operating at scale, this is a structural failure. An agent without access to reasoning history will re-derive decisions that were previously made and rejected, repeat past mistakes, and miss constraints that were hard-won through prior experience.
Human Decision Process:
[Data] + [Policy] + [Memory of Past Decisions] + [Contextual Judgment] → Action
AI Agent (current):
[Data] + [Policy] → Action (missing memory + contextual judgment)
Retrieval-augmented approaches can pull fragments from unstructured sources, but they are brittle. A semantic search over Slack messages and Confluence docs is not the same as a structured memory layer that an agent can query with confidence.
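As a rough illustration of what "structured" could mean here, the sketch below stores decisions as typed records and answers questions by exact match rather than by similarity. It is a toy under stated assumptions: `DecisionRecord`, `DecisionLog`, the field names, and the sample entries are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DecisionRecord:
    decided_on: date
    decision_type: str            # e.g. "metric_definition_change"
    subject: str                  # what the decision was about
    outcome: str                  # "approved" or "rejected"
    approved_by: str
    rationale: str
    constraints: tuple[str, ...] = ()


class DecisionLog:
    """In-memory stand-in for a reasoning-memory store."""

    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def record(self, rec: DecisionRecord) -> None:
        self._records.append(rec)

    def history(self, decision_type: str, subject: str) -> list[DecisionRecord]:
        """Exact-match lookup, newest decision first."""
        matches = [
            r for r in self._records
            if r.decision_type == decision_type and r.subject == subject
        ]
        return sorted(matches, key=lambda r: r.decided_on, reverse=True)

    def previously_rejected(self, decision_type: str, subject: str) -> list[DecisionRecord]:
        """Decisions of this kind that were already considered and turned down."""
        return [r for r in self.history(decision_type, subject) if r.outcome == "rejected"]


# Hypothetical entries, invented for illustration.
log = DecisionLog()
log.record(DecisionRecord(
    date(2023, 7, 1), "metric_definition_change", "active_customer",
    "approved", "analytics lead", "hypothetical rationale",
    ("old reports not backfilled",),
))
log.record(DecisionRecord(
    date(2022, 11, 15), "metric_definition_change", "active_customer",
    "rejected", "finance", "hypothetical: would break quarterly comparisons",
))

rejected = log.previously_rejected("metric_definition_change", "active_customer")
print(len(rejected), rejected[0].rationale)
```

The design choice worth noticing is that the lookup is deterministic: the same question always returns the same records, which is what lets an agent check for a prior rejection before proposing the same change again.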
The Architecture Shift: Adding a Context Plane
The framework that emerges from this analysis is a three-plane architecture:
┌─────────────────────────────────────────────────┐
│                  CONTEXT PLANE                  │
│ Semantic models · Policies · Memory · Exceptions│
│ Lineage · Definitions · Governance rules        │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│                  CONTROL PLANE                  │
│ Permissions · Auth · Orchestration · Audit logs │
└─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────┐
│                   DATA PLANE                    │
│ Ingestion · Storage · Transformation · Metrics  │
└─────────────────────────────────────────────────┘
The data plane is what you have today. The control plane handles auth and orchestration. The context plane is what's missing.
The context plane isn't a search index bolted onto your existing stack. It needs to be:
- Produced as work happens: context captured at the point of decision, not reconstructed retroactively
- Governed alongside data: with the same rigor as the data itself
- Queryable by agents: structured enough that an agent can retrieve it with confidence, not just probabilistically
- Updated continuously: because the business context changes as fast as the business does
Building this from scratch is hard. The practical path for most enterprises will be evolutionary: identify the workflows where agents are expected to act, map the context and governance those workflows require, and layer in an AI-ready context layer that can unify structured data, unstructured knowledge, policies, memory, and orchestration incrementally.
What "Context-Driven" Actually Means in Practice
The shift from "data-driven" to "context-driven" isn't marketing language — it's an architectural requirement that shows up concretely in how you design agent systems.
A data-driven agent gets a number and acts on it.
A context-driven agent gets a number, plus:
- The semantic definition of that metric
- The lineage showing how it was computed
- The policies governing what actions are permitted
- The exceptions that apply in the current situation
- The memory of how similar situations were handled before
- The confidence signal about data freshness and reliability
The interface changes from `SELECT metric FROM table` to something much closer to a business context API:
# get_business_context is an illustrative name, not an existing library call
context = get_business_context(
    metric="revenue",
    customer_tier="enterprise",
    exception_scope="current_campaign_window",
    include_reasoning_history=True,
)
# Returns: value + definition + lineage + applicable policies + past decisions
That's a different architectural bet than most teams are making today.
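For readers who want something to run, here is one possible shape for a `get_business_context` call, backed by a plain dictionary instead of a real context store. Treat it as a sketch under loud assumptions: the `BusinessContext` type, the `_STORE` layout, the table names in the lineage, and every value are placeholders invented for illustration; only the parameter names follow the call shown in this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessContext:
    metric: str
    value: float
    definition: str
    lineage: tuple[str, ...]
    policies: tuple[str, ...]
    past_decisions: tuple[str, ...] = ()


# Hypothetical in-memory stand-in for a context store.
# Every value below is a placeholder, not real data.
_STORE = {
    "revenue": {
        "value": 0.0,
        "definition": "placeholder definition of revenue",
        "lineage": ("raw.orders", "stg_orders", "fct_revenue"),
        "policies": {
            ("enterprise", "current_campaign_window"): ("placeholder campaign policy",),
        },
        "history": ("placeholder note about an earlier decision",),
    }
}


def get_business_context(
    metric: str,
    customer_tier: str,
    exception_scope: str,
    include_reasoning_history: bool = False,
) -> BusinessContext:
    """Return the metric value together with the context needed to act on it."""
    entry = _STORE.get(metric)
    if entry is None:
        raise KeyError(f"no context registered for metric {metric!r}")
    # Policies are keyed by (tier, scope); no match means no special policy.
    policies = entry["policies"].get((customer_tier, exception_scope), ())
    history = entry["history"] if include_reasoning_history else ()
    return BusinessContext(
        metric=metric,
        value=entry["value"],
        definition=entry["definition"],
        lineage=entry["lineage"],
        policies=policies,
        past_decisions=history,
    )


context = get_business_context(
    metric="revenue",
    customer_tier="enterprise",
    exception_scope="current_campaign_window",
    include_reasoning_history=True,
)
print(context.definition, context.policies, context.past_decisions)
```

The interesting part is the return type rather than the lookup: the number never travels without its definition, lineage, and applicable policies attached.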
Key Takeaways
- The data stack was optimized for human interpretation — it's working as designed, but the reader changed. AI agents don't inherit the tacit map humans use to navigate data safely.
- The seams between tools are where agents fail — permissions in one place, definitions in another, exceptions in a Slack thread. Humans navigate these invisibly; agents hit them hard.
- Governance is contextual, not rule-based — a rule that can't express its own exceptions is not enough for an agent operating at scale.
- Reasoning memory is the missing layer — storing artifacts (tables, dashboards) is not the same as preserving the reasoning that produced decisions.
- The next data architecture adds a context plane — alongside the data plane and control plane, a first-class layer for semantic models, policies, memory, and governed reasoning.
The Question Worth Debating
If context needs to be "produced as work happens" — captured at the point of decision — who owns that? Data engineering? The domain teams making the decisions? Platform teams building agent infrastructure?
Most organizations aren't close to having a clear answer. And until they do, the agents will keep making technically correct but operationally wrong decisions.
Where is your team with this? Are you building a context layer, patching it with RAG, or still hoping the warehouse is enough?