Spend five minutes in any data engineering forum and you'll find the same confession repeated in different words: "We just eyeball row counts and pray." It shows up on Reddit, Hacker News, the dbt Community Forum, Stack Overflow. The phrasing changes but the story doesn't.
Data engineers know they should be testing. They're not skipping tests because they're lazy or because they don't understand the value. They're skipping tests because everything else in their environment conspires against it.
Why data engineers don't test
If you talk to enough practitioners (or read enough forum threads), the same reasons surface over and over:
Nobody gives them time. Organizations reward fast delivery, not reliable delivery. If decision makers don't prioritize testing, it never becomes a standard. The incentive structure actively punishes thoroughness. You get more credit for shipping a pipeline in two days than for spending a week making it bulletproof.
Data changes faster than tests can keep up. This is what separates data testing from software testing. Your code doesn't change overnight. Your data does. A source team renames a column. A third-party API changes its response format. A bulk operation shifts row counts by 40%. Tests written last month don't account for changes that happened last night.
Data quality is invisible until it breaks. The fundamental problem in data engineering is that a bad query still returns results. Results, but not necessarily correct ones. If nobody can see when things are broken, nobody builds the political will to prevent breakage.
Data is inherently hard to test. You can test code. Data is another story. Unit tests verify that your transformation logic works. They don't verify that the data you received is what you expected. These are fundamentally different problems, and the second one causes far more real-world failures.
Code testing vs data testing
This is the distinction the industry has been dancing around for years. Unit tests and data quality checks are different things, and conflating them is why most testing advice falls flat for data teams.
Unit tests verify your code does what you intended. They answer: "Does my transformation produce the right output given known input?"
Data quality checks verify the data you received is what you expected. They answer: "Did 50,000 rows actually arrive? Is the schema the same as yesterday? Are null rates within normal bounds? Did the distribution shift?"
In data engineering, the second category catches far more production failures than the first. Your dbt model can be perfectly correct and still produce garbage if the source data changed underneath it.
Most testing advice aimed at data engineers focuses on the first category. Write unit tests for your transformations. Test your SQL with fixtures. Use dbt tests. This is useful, but it misses the failures that actually page people at 3am.
"Make testing easier" is the wrong frame
The conventional wisdom is: testing is too hard, so let's make it easier. Better frameworks. Better test runners. Better dbt test macros. AI-assisted test generation.
That's genuinely helpful for teams that have the bandwidth to maintain a test suite. But it doesn't address the actual constraint. The problem isn't that testing is too hard. The problem is that testing is another thing to maintain in an environment where there's already not enough time.
Making tests 50% easier to write doesn't help when nobody has time to write them at all. And even if you find time to write them, data changes faster than tests can keep up.
The better frame: don't make testing easier. Make it unnecessary.
Automated data testing: tests you never write
Automated data testing flips the model. Instead of engineers defining what "correct" looks like for every table, the system learns what normal looks like and alerts when something deviates.
This covers the checks that catch the majority of real incidents:
Schema change detection. A column gets renamed, removed, or changes type. This breaks downstream models, joins, and dashboards. You don't need a handwritten test for this. You need a system that tracks schema state and alerts on any change.
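The core of such a system is just a snapshot-and-diff loop. Here is a minimal sketch: the column-to-type mappings would come from something like information_schema in a real warehouse, but they are plain dicts here, and all table and column names are illustrative.

```python
def diff_schemas(old: dict, new: dict) -> list[str]:
    """Compare two {column: type} snapshots and describe every change."""
    changes = []
    for col in old:
        if col not in new:
            changes.append(f"removed column: {col}")
        elif old[col] != new[col]:
            changes.append(f"type change: {col} {old[col]} -> {new[col]}")
    for col in new:
        if col not in old:
            changes.append(f"added column: {col}")
    return changes

# Yesterday's snapshot vs. today's: a type change plus a rename,
# which shows up as one removal and one addition.
yesterday = {"order_id": "BIGINT", "amount": "NUMERIC", "created_at": "TIMESTAMP"}
today     = {"order_id": "BIGINT", "amount": "VARCHAR", "created_ts": "TIMESTAMP"}

for change in diff_schemas(yesterday, today):
    print(change)
```

Any non-empty diff is alert-worthy: a renamed column looks identical to a removal plus an addition, and either one can break downstream joins.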
Freshness monitoring. A table that updates every hour hasn't been touched in six hours. The pipeline didn't error. It just silently stopped. A system that learns update patterns and flags deviations catches this without any configuration.
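"Learns update patterns" can be as simple as measuring the typical gap between updates and flagging silence that exceeds a multiple of it. A minimal sketch, with an assumed tolerance of three times the typical gap (real systems tune this per table):

```python
from datetime import datetime, timedelta
from statistics import median

def is_stale(update_times: list[datetime], now: datetime,
             tolerance: float = 3.0) -> bool:
    """Flag a table whose silence exceeds `tolerance` times its typical gap."""
    gaps = [(b - a).total_seconds() for a, b in zip(update_times, update_times[1:])]
    typical = median(gaps)
    silence = (now - update_times[-1]).total_seconds()
    return silence > tolerance * typical

# A table that has updated hourly for a day...
history = [datetime(2024, 1, 1) + timedelta(hours=h) for h in range(24)]
# ...and has now been silent for six hours: stale.
print(is_stale(history, datetime(2024, 1, 1, 23) + timedelta(hours=6)))
```

No one configured "alert after six hours" anywhere; the threshold falls out of the table's own history.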
Volume anomalies. A table that normally loads 100,000 rows per day suddenly loads 1,000. Or zero. Or 500,000. Anomaly detection against historical baselines catches this without anyone defining thresholds.
Distribution shifts. A column's null rate jumps from 2% to 35%. A numeric field's average drops by half. These are the subtle failures that pass a "did it run?" check but corrupt downstream analytics.
None of these require writing tests. They require connecting to your data warehouse and letting the system build baselines.
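The volume and distribution checks above reduce to the same statistical move: compare today's value to a historical baseline and alert when the deviation is extreme. A minimal z-score sketch with made-up numbers (the 3-sigma threshold is a common default, not a universal rule):

```python
from statistics import mean, stdev

def z_score(history: list[float], today: float) -> float:
    """How many standard deviations today's value sits from the historical mean."""
    return (today - mean(history)) / stdev(history)

# Daily row counts hovering around 100,000...
daily_rows = [98_000, 101_500, 99_800, 102_300, 100_400, 97_900, 101_100]
print(abs(z_score(daily_rows, 1_000)) > 3)     # volume collapse: anomalous

# ...and a null rate that normally sits around 2%.
null_rates = [0.021, 0.019, 0.020, 0.022, 0.018]
print(abs(z_score(null_rates, 0.35)) > 3)      # null-rate jump: anomalous
```

The same function catches a volume drop and a distribution shift; only the metric being fed in changes.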
What this looks like in practice
You connect your Snowflake, Databricks, BigQuery, PostgreSQL, or Redshift warehouse. The system runs discovery: what tables exist, what schemas they have, when they typically update, what their normal row counts and distributions look like.
From that point, monitoring is automatic. Schema changes trigger alerts. Stale tables trigger alerts. Volume and distribution anomalies trigger alerts. All of this happens without writing a single line of test code.
When something fires, you get context: which table, what changed, when it changed, and which downstream assets are affected. The alert isn't "test failed." The alert is "the orders_fact table hasn't updated in 4 hours, and 12 downstream models depend on it."
This is what AnomalyArmor does. Five-minute setup, no test authoring, no test maintenance. It watches your warehouse and tells you when something looks wrong. The coverage scales with your warehouse, not with your team's bandwidth to write tests. See the quickstart guide to connect your first data source.
This doesn't replace all testing
To be clear: automated data testing doesn't eliminate the need for all handwritten tests. If you have specific business rules (revenue must be positive, email must contain @, every order must have a customer), those still need explicit validation.
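Those explicit rules follow the same contract dbt tests use: a query that must return zero rows. A sketch using an in-memory SQLite database as a stand-in for a warehouse connection; the table and column names (orders, customers, revenue) are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, revenue REAL);
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO orders VALUES (10, 1, 49.99), (11, 2, -5.00), (12, 99, 20.00);
""")

# Each rule is a query that must return zero rows to pass.
rules = {
    "revenue must be positive":
        "SELECT id FROM orders WHERE revenue <= 0",
    "every order must have a known customer":
        "SELECT id FROM orders WHERE customer_id NOT IN (SELECT id FROM customers)",
}

failures = {name: db.execute(sql).fetchall() for name, sql in rules.items()}
for name, rows in failures.items():
    if rows:
        print(f"FAILED: {name} ({len(rows)} offending rows)")
```

No baseline can learn these rules; "revenue must be positive" is a business decision, not a statistical pattern, which is exactly why it still needs to be written down.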
But most data teams don't have any testing at all. They're eyeballing row counts and praying. For those teams, automated data testing provides 80% of the coverage with 0% of the authoring effort.
Start with automated monitoring. Add handwritten tests for your most critical business rules. That's the order that matches reality for time-constrained data teams.
The real question
The real question isn't whether every possible scenario has been tested. It's how much uncertainty your organization is willing to tolerate before it starts verifying the numbers it depends on.
For most data teams, the answer has been: a lot of uncertainty. Because the alternative was writing and maintaining tests they didn't have time for.
Automated data testing changes that tradeoff. The cost of coverage drops to near zero. The question stops being "can we afford to test?" and becomes "why aren't we?"
Sources
- Joe Reis, State of Data Engineering Survey (2026). 1,101 respondents. Found data teams spend 34% of their time on data quality and 26% on firefighting.

- AnomalyArmor, Quickstart Guide. Connect your first data source and set up automated monitoring.
- AnomalyArmor, Schema Monitoring Docs. How automated schema change detection works.
- AnomalyArmor, Data Quality Monitoring Docs. Volume, distribution, and anomaly monitoring reference.
Automated Data Testing FAQ
What is automated data testing?
Automated data testing is software that continuously validates data without requiring engineers to write explicit test cases. It learns patterns from historical data (volume, schema, distributions, freshness) and alerts when new data deviates from those patterns. It's the opposite of manual approaches like dbt tests or custom SQL assertions.
How is automated data testing different from dbt tests?
dbt tests are deterministic rules you write manually: "this column is unique", "this foreign key exists". Automated data testing learns baselines from historical data and flags statistical deviations. dbt tests catch known problems. Automated testing catches unknown problems. Most production teams use both.
Do I still need to write data tests if I use automated testing?
Yes, for business-critical invariants. Some rules must be enforced explicitly: "revenue must never be negative", "user_id in orders must exist in users". Write these as dbt tests or validation rules. Use automated testing for everything else (statistical anomalies, freshness, schema changes, volume drops).
What can automated data testing detect that manual tests can't?
Automated testing catches things you didn't know to look for: a column's null rate drifting from 2% to 15% over two weeks, row count dropping by 30% on Tuesdays only, a new category appearing in an enum column, a schema change that silently returns NULL for one in a million rows. These are invisible to explicit rules unless you already anticipated them.
Why don't data engineers write more tests?
Three reasons. First, writing tests requires knowing what to test, and data changes faster than test coverage. Second, test maintenance scales linearly with the number of tables, so a team with 500 tables drowns in test code. Third, the payoff of manual tests is invisible until something breaks, so writing them feels like spending scarce time guarding against risks nobody can see.
How do automated data tests learn what's normal?
They compute baselines from historical data using statistical methods: running mean and standard deviation (often via Welford's algorithm), distribution fingerprints, seasonality models like Prophet, and moving averages. The baselines update incrementally as new data arrives. Most systems require 7-14 days of history before alerts start firing.
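Welford's algorithm is worth a concrete look because it explains how baselines can update incrementally: it maintains a running mean and variance from a stream of values without storing the history. A minimal sketch (generic, not any particular vendor's implementation), with illustrative row counts:

```python
import math

class RunningBaseline:
    """Running mean/stddev via Welford's algorithm: the baseline updates
    as each new day's metric arrives, with no stored history."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x: float, threshold: float = 3.0) -> bool:
        return self.stddev > 0 and abs(x - self.mean) > threshold * self.stddev

baseline = RunningBaseline()
for rows in [100_000, 99_500, 101_200, 100_800, 98_900, 100_300, 99_700]:
    baseline.update(rows)
print(baseline.is_anomalous(1_000))    # volume collapse: anomalous
print(baseline.is_anomalous(100_100))  # within normal range
```

Because each update is O(1) and numerically stable, the same baseline can run for years without recomputing anything from scratch.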
What's the false positive rate of automated data testing?
Well-tuned systems run at 5-15% false positive rates using z-scores with sensitivity thresholds of 2-3 standard deviations. Poorly tuned systems can exceed 50%. The key factors are: enough historical data to establish stable baselines, seasonality-aware models for data with weekly or daily patterns, and sensitivity tuning per table based on business criticality.
Can AI replace data engineers writing tests?
AI can configure and maintain monitoring based on patterns it learns from your data. It can't replace business logic validation. A data engineer still needs to specify what matters to the business. But AI removes the grunt work of writing 500 tests for 500 tables, which is where most test-writing effort is wasted.
What tools provide automated data testing?
Leaders in this space include AnomalyArmor, Monte Carlo, Metaplane, Bigeye, and Datafold. Each uses statistical methods to learn baselines and detect anomalies. Open-source options include re_data and Elementary. Traditional tools like Great Expectations require manual test writing but can be combined with profiling to semi-automate.
How much historical data do I need before automated testing works?
Minimum 7 days for basic z-score detection on daily data, 14 days for weekly seasonality detection, and 365 days for yearly seasonality. During the initial learning period, alerts should be suppressed or downgraded to warnings. Most tools have a "learning phase" flag that prevents false alerts until the baseline is stable.
Stop writing and maintaining data tests. See how AnomalyArmor's AI agent configures monitoring from a single sentence.