If you use dbt, you probably have some tests. A few not_null checks, maybe unique on your primary keys, possibly some accepted_values on status columns.
But be honest: how many of your columns actually have tests? 10%? 20%?
The rest are untested. Not because you don't care, but because writing test YAML for 200 columns across 40 models is tedious work that never makes it to the top of the sprint.
## The gap in dbt testing
dbt tests are rule-based. You write a rule, it checks that rule. If you didn't write a rule, nothing gets checked. This creates three blind spots:
1. Drift goes undetected.
Your email column had 0.1% nulls last month. Today it's 12%. No dbt test catches this because you never wrote one that says "null rate should stay below X%." You find out when a PM asks why the marketing numbers look off.
2. Structural changes slip through.
A column gets dropped upstream. A type changes from integer to text. dbt won't tell you unless you wrote a test for that specific column. By the time your Spark job fails, the damage is downstream.
3. Nobody tests what they don't know about.
Orphaned foreign keys, outlier values 10x beyond normal range, columns that are technically "not null" but 40% empty strings. These are real problems in real databases that nobody writes tests for because they don't know they exist until something breaks.
## What if dbt tests wrote themselves?
That's what I built. dbt-dqlens profiles your models and generates the test YAML for you.
After your normal dbt run:
```shell
pip install dbt-dqlens
dqlens-dbt profile          # profiles all models using your dbt connection
dqlens-dbt generate-tests   # outputs _dqlens_tests.yml
```
It reads your profiles.yml, connects to the same warehouse dbt uses, profiles every column (nulls, uniqueness, distributions, patterns, foreign keys, percentiles), and generates native dbt tests based on what it finds.
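To make the profiling step concrete, here is a minimal sketch of the kind of column statistics involved (null rate, uniqueness, percentiles). This is illustrative only — the actual dbt-dqlens internals run these computations in-warehouse via SQL, and `profile_column` is a hypothetical name:

```python
# Sketch of the per-column statistics a profiler like dbt-dqlens collects.
# Illustrative only -- the real tool computes these in-warehouse via SQL.
from statistics import quantiles

def profile_column(values):
    """Compute basic quality metrics for one column's values (None = NULL)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    stats = {
        "null_pct": 100 * (total - len(non_null)) / total,
        "distinct_pct": 100 * len(set(non_null)) / total,
    }
    # Percentiles only make sense for numeric columns
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        q1, _, q3 = quantiles(non_null, n=4)  # 25th / 50th / 75th percentiles
        stats["p25"], stats["p75"] = q1, q3
    return stats

profile_column([1, 2, 2, None, 5])
# null_pct = 20.0, distinct_pct = 60.0
```

Statistics like these become the baseline that the generated tests compare against on later runs.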
The output is a standard schema.yml file you commit to your repo:
```yaml
models:
  - name: orders
    tags: [dqlens]
    columns:
      - name: id
        tests:
          - unique
          - not_null
      - name: email
        tests:
          - dqlens_no_null_drift:
              baseline_pct: 0.1
              threshold_multiplier: 3
      - name: amount
        tests:
          - dqlens_no_outliers:
              lower_bound: -110.0
              upper_bound: 210.0
      - name: customer_id
        tests:
          - dqlens_no_orphans:
              target_model: ref('customers')
              target_column: id
```
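My reading of the generated `dqlens_no_null_drift` parameters (an assumption — the exact semantics live in the test macro's SQL, not here): the test fails when the current null rate exceeds the profiled baseline times the multiplier. In plain Python:

```python
# Illustrative only: how a baseline/multiplier null-drift test could evaluate.
# The real dqlens_no_null_drift macro runs as SQL inside dbt.
def null_drift_ok(current_null_pct, baseline_pct=0.1, threshold_multiplier=3):
    """Pass while the null rate stays within threshold_multiplier x baseline."""
    return current_null_pct <= baseline_pct * threshold_multiplier

null_drift_ok(0.2)    # 0.2% <= 0.3% -> passes
null_drift_ok(12.0)   # the 12% null spike from earlier -> fails
```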
Then `dbt test --select tag:dqlens` runs them as native dbt tests. They show up in dbt docs, dbt Cloud, and your CI pipeline. Nothing changes about your workflow, except that you now have tests you didn't write.
## What it catches that dbt tests don't
| Problem | Standard dbt test | dbt-dqlens |
|---|---|---|
| Null rate increased 10x from last week | No (unless you wrote a threshold) | Yes (baseline comparison) |
| Column dropped upstream | No | Yes (schema drift detection) |
| FK references non-existent rows | Only with relationships test (manual) | Yes (auto-detected from schema) |
| 40% empty strings masquerading as "not null" | No | Yes (empty string rate check) |
| Values 10x beyond normal range | No | Yes (IQR-based outlier detection) |
| Column type changed | No | Yes (type change detection) |
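The outlier bounds in the generated YAML above look like classic Tukey fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR). A sketch under that assumption — with Q1 = 10 and Q3 = 90 it reproduces the −110.0/210.0 bounds from the example, though the tool's actual derivation may differ:

```python
# Tukey-fence outlier bounds -- an assumption about how dbt-dqlens derives
# lower_bound/upper_bound, shown for the example values in this post.
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

iqr_bounds(10, 90)  # -> (-110.0, 210.0)
```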
## It's behavior-based, not rule-based
The key difference: dbt tests check static rules you defined. dbt-dqlens checks behavior. It learns what your data looks like (the baseline) and flags when something changes.
You don't define thresholds. It computes them from your data. If your email column is normally 0.1% null and jumps to 5%, that's a finding. If your orders table normally grows 2-5% daily and suddenly jumps 50%, that's a finding.
This is the kind of check nobody writes by hand because you'd need to know the baseline first. The tool knows it because it profiled your data.
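As an illustration of the baseline idea (my sketch, not the tool's code): learn a normal range from recent history, then flag anything outside it.

```python
# Minimal behavior-based check: learn a baseline from recent history, flag
# today's value if it falls outside. Illustrative, not dbt-dqlens source.
def daily_growth_pct(row_counts):
    """Day-over-day growth percentages from a series of daily row counts."""
    return [100 * (b - a) / a for a, b in zip(row_counts, row_counts[1:])]

def is_anomalous(history, today_pct, slack=1.0):
    """Flag today's growth if it exceeds the historical max (plus some slack)."""
    return today_pct > max(daily_growth_pct(history)) + slack

history = [1000, 1030, 1060, 1100, 1130]   # ~2-5% daily growth
is_anomalous(history, today_pct=50)        # sudden 50% jump -> flagged
is_anomalous(history, today_pct=3)         # within normal range -> fine
```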
## Try it
```shell
pip install dbt-dqlens
dqlens-dbt run                 # profiles + generates tests in one step
dbt test --select tag:dqlens
```
It reads your existing profiles.yml, so there are no new connections to configure. It works with PostgreSQL today, with more databases coming.
GitHub: github.com/vahid110/dbt-dqlens
The core engine (DQLens) also works standalone if you don't use dbt. Same profiling, same detection, just a CLI instead of dbt integration.
If you've been meaning to add data quality tests but never found the time, this is the shortcut. Three commands, zero YAML writing, and you get coverage you never had.