
Blaine Elliott

Posted on • Originally published at blog.anomalyarmor.ai

Why Do Data Teams Use AI to Write Code but Not to Monitor Pipelines?

#ai

The AI gap in analytics engineering is a 48-percentage-point difference between how many data teams use AI to write code (72%) and how many use AI to monitor, test, or observe their pipelines (24%). It is the single most important structural finding in dbt's 2026 State of Analytics Engineering report, and it describes a reliability problem that will get worse before it gets better.

The short version: teams are building data pipelines faster than ever because AI writes the code, but nobody is paying proportional attention to whether those pipelines produce correct data. AI has been invited into the creation step. It has not been invited into the quality step. This post explains why that gap exists, what it costs, and what closing it looks like in practice.

What does the AI gap in data engineering mean?

The gap comes from a single dbt survey of thousands of analytics engineers. The relevant numbers:

| AI use case | 2026 prioritization |
| --- | --- |
| AI-assisted coding (writing SQL, dbt models, scripts) | 72% |
| AI-assisted pipeline management (testing, observability, quality controls) | 24% |
| Delta | 48 percentage points |

The same survey also reported that 71% of teams are concerned about "hallucinated or incorrect data reaching stakeholders." So the industry is simultaneously: (a) accelerating pipeline creation with AI, (b) afraid of AI-caused data errors reaching business users, and (c) not using AI to catch those errors. That combination is not sustainable.

Why are data teams adopting AI-assisted coding first?

The creation side of the pipeline is where AI adoption is easiest and the benefit is most visible. Four reasons this happened first:

The loop is tight. A data engineer writes a dbt model, asks Copilot or Cursor to improve it, reads the result, commits. The feedback cycle is seconds. The developer sees the value immediately.

The failure mode is visible. If AI writes bad SQL, the query errors out or returns obviously wrong results at build time. Code failures are noisy, which makes them safe to accept AI help on.

The tools already exist. GitHub Copilot, Cursor, Claude Code, Codex CLI, and ChatGPT all slot directly into the developer's existing workflow. Writing code is a solved interface problem. Every AI coding tool competes on the same surface.

The productivity story is quantifiable. "I wrote this dbt model in 5 minutes instead of 30" is easy to measure and celebrate. Managers greenlight the tool because the demo is obvious.

None of those conditions hold for pipeline management.

Why is pipeline monitoring stuck at 24% AI adoption?

Pipeline management has none of the conditions that accelerated AI-assisted coding. It has the opposite of all of them.

The loop is slow. A monitoring system runs continuously and only fires alerts when something deviates. The value of "I caught this bad load" shows up hours or days after setup, not seconds. That makes the ROI story harder to sell internally.

The failure mode is silent. Unlike a broken query that throws an error, a bad data pipeline runs green, produces plausible-looking numbers, and nobody notices until a stakeholder asks why the dashboard is wrong. Data downtime is the metric that quantifies this invisibility. Teams often don't know they need AI help until they measure how much downtime they've accumulated.

The tools don't exist yet at scale. The data observability category has been dominated by dashboard-first products like Monte Carlo, Metaplane, and Bigeye. These tools use AI for isolated features (anomaly sensitivity tuning, alert summarization) but not as the primary interface. The AI-native equivalent of Cursor for data reliability has not reached critical mass.

The integration surface is bigger. AI-assisted coding needs to understand your file. AI-assisted pipeline management needs to understand your warehouse, your lineage, your historical baselines, your team's alert fatigue tolerance, and your on-call runbook. That is an order of magnitude more context.

The productivity story is inverted. "I prevented a bad load from reaching the dashboard" has a counterfactual quality that is harder to celebrate than "I wrote a new model in 5 minutes." The value shows up as incidents that didn't happen.

What are the consequences of the 72/24 asymmetry?

Three specific failure modes emerge when creation outpaces reliability.

Faster pipeline growth without proportional monitoring coverage. A team that ships 3x more models with AI assistance but does not scale its monitoring practice will have 3x more pipelines that can silently break. The blast radius per unaddressed failure grows because more downstream consumers depend on the pipeline.

Higher baseline rate of AI-introduced quality bugs. AI-generated SQL, especially under time pressure, produces plausible-looking queries that can silently miscount, drop edge cases, or misuse joins. A human reviewer catches some; automated monitoring would catch much of the rest. But if monitoring remains a manual bottleneck while coding is AI-accelerated, the balance tilts toward bugs reaching production.

Decay of institutional knowledge about data correctness. When a human writes a pipeline, they usually have a rough model of what "correct" looks like for its outputs. When AI writes the pipeline, that model lives in the prompt, not in the engineer's head. The check against "did the data come out right?" needs to be externalized into automated monitoring, or it doesn't happen.

The dbt survey directly measures the anxiety this creates: 71% of respondents are concerned about "hallucinated or incorrect data reaching stakeholders." The fear is justified. The tooling hasn't caught up.

The AI Reliability Lag: a framework for thinking about the gap

The pattern in the dbt numbers repeats across other engineering disciplines. AI adoption in any creation step (writing code, designing, drafting) typically precedes AI adoption in the corresponding reliability step (testing, reviewing, monitoring) by 12-36 months. We can call this the AI Reliability Lag.

| Stage | Creation-side AI | Reliability-side AI | Typical lag |
| --- | --- | --- | --- |
| Software engineering | Copilot, Cursor (mature) | AI test generation, AI code review (emerging) | 18-24 months |
| Data engineering | Copilot for SQL, dbt copilots (mature) | AI observability, AI data quality (early) | 24-36 months |
| Technical writing | ChatGPT for drafts (mature) | AI content fact-checking (rare) | 24+ months |
| Design | Midjourney, Figma AI (mature) | AI design QA, accessibility checks (rare) | 30+ months |

The lag has the same root cause everywhere. Creation is a bounded, visible, immediate-reward task that any team can adopt unilaterally. Reliability is continuous, invisible, deferred-reward, and requires integration with infrastructure the team often doesn't own. The former spreads through pull; the latter requires push.

For data engineering specifically, the lag has an asymmetric cost. In code, a bug that slips past is caught by users, tests, or the next deploy. In data, a bug that slips past corrupts analytics, trains bad ML features, misinforms business decisions, and erodes trust in the platform. The data reliability step is more expensive to skip than the code reliability step.

What would AI-assisted pipeline management actually look like?

Four concrete capabilities, roughly in order of current maturity:

1. Automated anomaly detection with learned baselines

Instead of writing explicit assertions, you point the system at your warehouse and it learns what "normal" looks like for every table: update cadence, row count distribution, null rate, value ranges. When production data deviates from the baseline, it alerts. No threshold configuration required. This is the pattern covered in Data Anomaly Detection: The Complete Guide.
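To make the learned-baseline idea concrete, here is a minimal sketch (not AnomalyArmor's actual implementation) that learns a row-count baseline from recent load history and flags a new load that deviates by more than three standard deviations. The table and the numbers are hypothetical.

```python
import statistics

def row_count_anomaly(history: list[int], todays_count: int, z_threshold: float = 3.0) -> bool:
    """Return True if today's row count deviates from the learned baseline.

    `history` is the daily row counts observed over, say, the last 30 loads;
    the "learned baseline" here is simply their mean and standard deviation.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return todays_count != mean  # flat history: any change is a deviation
    z_score = abs(todays_count - mean) / stdev
    return z_score > z_threshold

# Hypothetical history for an orders table: ~50k rows/day, then a silent 60% drop.
history = [49_800, 50_200, 50_050, 49_900, 50_400, 50_100, 49_950]
print(row_count_anomaly(history, todays_count=20_000))  # True -> alert fires
```

A production system would do the same thing per table across freshness, null rates, and value distributions, with seasonality-aware baselines rather than a single mean and standard deviation.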

2. AI agents that set up monitoring from natural language

The user says: "watch the orders table for schema changes and alert the team on Slack." The agent translates intent into concrete monitors (schema snapshot cadence, diff rules, alert routing, severity), configures them, and reports back. No YAML. No rules engine. No documentation deep-dive. This is the paradigm AnomalyArmor ships today and what "AI-assisted pipeline management" should mean by default.
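For concreteness, the structured output such an agent might produce from that sentence could look like the sketch below. The field names are illustrative assumptions, not AnomalyArmor's actual configuration format.

```python
# Hypothetical monitor spec an agent might emit for:
# "watch the orders table for schema changes and alert the team on Slack"
monitor_spec = {
    "table": "analytics.orders",
    "monitors": [
        {
            "type": "schema_change",           # snapshot columns + types each run
            "cadence": "hourly",               # inferred from the table's load frequency
            "diff_rules": ["column_added", "column_removed", "type_changed"],
        }
    ],
    "alerting": {
        "channel": "slack",
        "target": "#data-alerts",              # assumed default team channel
        "severity": "high",
    },
}
```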

3. Natural-language data quality queries

A data engineer in an incident asks: "when did revenue_daily last update, and what changed in the schema in the past week?" The agent queries lineage, metadata, and audit logs and returns a structured answer. This replaces a 20-minute manual dig through INFORMATION_SCHEMA, dbt logs, and Slack channels.
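The manual version of that dig looks roughly like the sketch below, assuming a Snowflake-style INFORMATION_SCHEMA; the schema and table names are hypothetical. The agent's job is to run and synthesize queries like these for you.

```python
# Rough sketch of the manual incident dig an agent would automate.
# Assumes a Snowflake-style INFORMATION_SCHEMA; names are hypothetical.

FRESHNESS_SQL = """
    SELECT table_name, last_altered           -- when did revenue_daily last update?
    FROM information_schema.tables
    WHERE table_schema = 'ANALYTICS' AND table_name = 'REVENUE_DAILY'
"""

CURRENT_SCHEMA_SQL = """
    SELECT column_name, data_type             -- what does the schema look like right now?
    FROM information_schema.columns
    WHERE table_schema = 'ANALYTICS' AND table_name = 'REVENUE_DAILY'
    ORDER BY ordinal_position
"""

def manual_dig(cursor):
    """Run both queries with any DB-API cursor and print the raw answers."""
    for sql in (FRESHNESS_SQL, CURRENT_SCHEMA_SQL):
        cursor.execute(sql)
        print(cursor.fetchall())

# "What changed in the past week" still means diffing CURRENT_SCHEMA_SQL's output
# against an older snapshot (dbt artifacts, an audit table, query history) by hand.
```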

4. AI-generated alert context and runbooks

When an alert fires, the system automatically summarizes: which table broke, what changed upstream, what downstream consumers are affected, what the fix usually looks like based on the team's incident history. The on-call engineer reads a two-paragraph brief instead of starting from a blank page at 3am. This is the difference between a 2-hour and a 20-minute TTR.
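One way to picture that brief: a small function that assembles the pieces listed above (what broke, what changed upstream, who is affected, what fixed it last time) into a single message. Everything below is an illustrative sketch; the `lineage` and `incidents` helper APIs are hypothetical.

```python
def build_alert_brief(table: str, lineage, incidents) -> str:
    """Assemble a short on-call brief for a table that failed its checks. Illustrative only."""
    upstream_changes = lineage.recent_changes(upstream_of=table, days=2)   # hypothetical API
    downstream = lineage.consumers_of(table)                               # dashboards, models
    past_fixes = incidents.similar_to(table, limit=3)                      # team incident history

    return "\n".join([
        f"Table {table} failed its freshness/volume checks.",
        f"Upstream changes in the last 48h: {upstream_changes or 'none detected'}.",
        f"Affected downstream consumers: {', '.join(downstream) or 'none mapped'}.",
        f"Similar past incidents and their fixes: {past_fixes or 'no history yet'}.",
    ])
```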

All four capabilities exist in nascent form somewhere in the market today. Only the first (automated anomaly detection) is close to mainstream adoption. The other three are where the next 24 months of category competition will happen.

How to close the AI gap on your team today

Three practical moves that a data team can make in a week:

1. Measure your team's AI Reliability Lag. Calculate what percentage of your pipelines have any automated monitoring beyond orchestration success/failure checks. Most teams discover the number is shockingly low, often under 20%. That number is the size of the gap AI-assisted monitoring would close; a quick way to compute it is sketched after this list.

2. Pilot AI-assisted monitoring on one critical pipeline. Pick the pipeline that would hurt most if it silently broke (revenue, payments, top-of-funnel). Connect an AI-native monitoring tool (AnomalyArmor, Monte Carlo, Metaplane, Bigeye) and let it learn baselines for 7-14 days. Compare the alerts it generates against the manual checks you already have. The delta is the gap closing.

3. Measure data downtime before and after. The real metric is data downtime. Track TTD (time to detection) and TTR (time to resolution) for a month before and after introducing AI-assisted monitoring. Teams usually see TTD drop by 90%+ once detection is automated. That reduction compounds because fewer issues escape to stakeholders, which reduces the trust cost per incident.
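For the coverage number in step 1, a back-of-the-envelope sketch, assuming you can export a list of production tables from your warehouse and a list of monitored tables from your monitoring tool (the table names here are hypothetical):

```python
# Hypothetical inputs: all production tables vs. tables with any monitor
# beyond orchestration success/failure checks.
all_tables = {"orders", "revenue_daily", "users", "events", "sessions"}
monitored_tables = {"revenue_daily"}

coverage = len(monitored_tables & all_tables) / len(all_tables)
print(f"Monitoring coverage: {coverage:.0%}")  # 20% -> the gap is the other 80%
```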
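For step 3, a minimal way to compute TTD and TTR from an incident log, assuming you record three timestamps per incident (when the bad data landed, when someone noticed, when it was fixed); the incidents below are made up for illustration:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log: (data_broke_at, detected_at, resolved_at)
incidents = [
    (datetime(2026, 3, 1, 2, 0), datetime(2026, 3, 1, 14, 0), datetime(2026, 3, 1, 16, 30)),
    (datetime(2026, 3, 9, 23, 0), datetime(2026, 3, 10, 9, 0), datetime(2026, 3, 10, 10, 0)),
]

ttd_hours = [(detected - broke).total_seconds() / 3600 for broke, detected, _ in incidents]
ttr_hours = [(resolved - detected).total_seconds() / 3600 for _, detected, resolved in incidents]

print(f"Mean TTD: {mean(ttd_hours):.1f}h, mean TTR: {mean(ttr_hours):.1f}h")
```

Run the same calculation on a month of incidents before and after adopting AI-assisted monitoring, and the TTD delta is your before/after evidence.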

The broader picture: 71% fear hallucinated data

The same dbt survey reported that 71% of data teams are concerned about hallucinated or incorrect data reaching stakeholders. This number sits uncomfortably next to the 72% AI-assisted coding adoption, because it implies that teams are already nervous about AI contributing to data bugs while simultaneously not using AI to catch those bugs.

Two forces are likely to close this gap over the next 12-24 months:

First, boards and executive teams will start expecting data reliability to keep pace with AI-accelerated creation. "We accelerated our pipeline delivery with AI, why haven't we scaled our reliability investment?" will become a standard quarterly question. The answer "we haven't because the tools are new" has a short shelf life.

Second, as more AI-generated data errors reach stakeholders, the reputational cost of "the dashboard is wrong" will spike beyond what ambiguous ownership (41% cite this as a challenge) or data literacy gaps (36% cite this) cost today. When a board report is wrong because of AI, the board asks who is responsible. That escalation reshapes how much budget the reliability stack commands.

For context, dbt's report also showed that trust in data rose from 66% to 83% in importance year-over-year, and speed rose from 50% to 71%. Teams are asking their platforms for both, and the two usually fight each other. AI-assisted pipeline management is the only way to get both at once.

What this means for how data platforms should evolve

Three predictions based on the gap:

  1. Observability will merge with assistants. The typical data observability product will shift from "open our dashboard" to "ask the AI in your IDE or Slack." The dashboard becomes secondary. Tools that cannot be operated from Claude Code, Cursor, ChatGPT, or Slack will get disintermediated.

  2. Monitoring setup will become a prompt, not a process. The hours of click-through configuration that current data observability tools require will collapse to "watch my warehouse" and the agent handles the rest. Sub-10-minute time-to-value will be the minimum bar.

  3. The $5/table price point will pull the category down. Enterprise-priced data observability (Monte Carlo, Bigeye at $50-150K/year) will lose share to AI-native tools that pass the savings of automation through to the customer. Monte Carlo's 30% layoff in April 2026 is probably the first public signal of this shift.

Data Engineering AI Gap FAQ

What is the AI gap in analytics engineering?

The AI gap in analytics engineering is the 48-percentage-point difference between how many data teams prioritize AI-assisted coding (72%) versus AI-assisted pipeline management (24%), per dbt's 2026 State of Analytics Engineering report. The gap describes an industry-wide pattern where AI accelerates pipeline creation but not pipeline reliability, leaving teams with faster-growing infrastructure that is no better monitored than it was before.

Why are data teams slower to adopt AI for pipeline management than for coding?

AI-assisted coding has immediate feedback (seconds), visible failure modes (code errors), mature tools (Copilot, Cursor, Claude Code), and a quantifiable productivity story ("I wrote this in 5 minutes"). Pipeline management has delayed feedback (hours to days), silent failure modes (pipelines run green while producing wrong data), immature AI-native tools, and an inverted productivity story (value shows up as incidents that didn't happen).

What is the AI Reliability Lag?

The AI Reliability Lag is the 12-36 month delay between AI adoption in a creation step (writing code, drafting content, designing) and AI adoption in the corresponding reliability step (testing, reviewing, monitoring). In data engineering specifically, the lag is 24-36 months and has a high cost because bugs that slip past creation silently corrupt downstream analytics.

How much does the AI gap cost in data downtime?

Teams without automated monitoring typically experience 100+ hours of data downtime per month. Teams with basic monitoring see 40-80 hours. Teams with full AI-assisted data observability target less than 4 hours. At a conservative $100/hour engineering cost and $1,000/incident business impact, a team with 100 hours of downtime and 10 incidents per month can spend $20,000+/month on preventable downtime (100 hours × $100 in engineering time plus 10 incidents × $1,000 in business impact).

What does AI-assisted pipeline management actually do?

AI-assisted pipeline management does four things: (1) learns statistical baselines for freshness, volume, and distribution so monitors do not require manual thresholds; (2) accepts natural-language intent to set up new monitors ("watch the orders table for schema changes"); (3) answers natural-language questions about data state during incidents; (4) generates context and runbooks when alerts fire so on-call engineers can resolve faster.

Which tools count as AI-assisted pipeline management?

Data observability platforms with meaningful AI integration include AnomalyArmor, Monte Carlo, Metaplane, and Bigeye. Among these, AnomalyArmor is the most aggressively AI-native: an agent sets up monitoring from a prompt, and natural-language Q&A is a first-class interface. Monte Carlo, Metaplane, and Bigeye use AI for isolated features (anomaly sensitivity tuning, alert summarization) but retain a dashboard-first workflow. Compare the category in our data observability tools 2026 roundup.

Does AI-assisted pipeline management replace dbt tests?

No. dbt tests catch what you anticipate (known constraints, specific business rules). AI-assisted pipeline management catches what you don't anticipate (unexpected schema changes, silent volume drops, distribution shifts). The two are complementary. Teams that maintain dbt tests for critical invariants and use AI-assisted monitoring for baseline coverage get both rule-based and statistical protection, which is a stronger posture than either approach alone. For more on this, see You Don't Need to Write Data Tests.

What is the difference between AI-assisted coding and AI-assisted pipeline management?

AI-assisted coding helps you write new queries, dbt models, and scripts faster. It operates at build time, before data flows through the pipeline. AI-assisted pipeline management operates at run time, after data flows through the pipeline. It watches for freshness, volume, schema, and distribution anomalies and alerts when production data deviates from expected patterns. Creation-side AI makes pipelines. Reliability-side AI keeps them alive.

Will AI replace data engineers?

No, but it will shift what data engineers spend time on. Teams that adopt AI-assisted pipeline management typically reallocate time from manual monitoring configuration and incident triage to higher-leverage work: data contracts with upstream teams, lineage hygiene, and domain modeling. The role shifts from firefighter to architect. This matches the broader industry pattern where 60% of data engineering time currently goes to firefighting, per multiple surveys.

How long does it take to close the AI gap on one team?

The minimal pilot is 7-14 days: connect an AI-native monitoring tool to one critical pipeline, let it learn baselines, and measure alert accuracy against your existing manual checks. Full rollout across a data platform usually takes 4-8 weeks, gated by the number of warehouses, tables, and integration points. Most teams see TTD drop 90%+ within the first month of serious adoption.

What is the simplest way to start?

Start by measuring your team's current TTD and TTR. Without baseline numbers, you cannot prove improvement. Then pick one pipeline (usually the one that hurts most if it breaks silently) and connect an AI-native monitoring tool. Compare the alerts it generates against manual checks for two weeks. If the automated alerts catch something the manual checks missed, the gap is real and closing it is worth scaling. If not, the gap is not your problem yet.

Is this about analytics engineering specifically or all of data?

The dbt survey focused on analytics engineering, but the pattern generalizes to any data-producing discipline: data engineering, ML engineering, data platform teams. Wherever AI is accelerating creation without proportional investment in reliability, the AI Reliability Lag applies. Analytics engineering is just where the numbers happen to be published and measurable.

Where can I read the original dbt survey?

The full report is at getdbt.com/resources/state-of-analytics-engineering-2026. The specific AI adoption numbers (72% creation, 24% monitoring) and the 71% hallucination concern are in the AI adoption section.


The AI gap closes when your data platform adopts AI for reliability, not just for creation. See how AnomalyArmor's AI agent sets up freshness, schema, and anomaly monitoring from a single prompt.
