Why I Built AnomalyArmor

#dataengineering #dataquality

I've done data engineering over the years at CJ, Savings.com, MySpace, Chegg, LinkedIn, Microsoft, One Medical, and AbnormalAI. The thing that's always stuck with me is how the job gets harder in a way that sneaks up on you.

When you build a pipeline, you're not just creating one thing to maintain. You're creating a machine that generates new things to maintain. Every run, every interval, every partition of data that pipeline produces becomes another touch point you're responsible for. One pipeline running hourly for a year is 8,760 data points you now own. Scale that across dozens of pipelines feeding into each other, and you've got an exponential maintenance problem.

This is the part nobody warns you about when you start in data engineering. The pipelines themselves aren't that hard. It's everything they produce that buries you.

The problem without a solution

I spent years looking for elegant tooling to handle this. Something that could watch all those touch points without requiring me to manually define what "good" looks like for each one. The solutions I found were either too simple (just run some SQL tests), too complex (six-week implementations that needed a dedicated admin) or too expensive (out of reach for our budget or company size).

What I wanted was analysis at scale. Limited human interaction to set up, comprehensive coverage across all my data, and smart enough to distill thousands of potential issues into a small set of things I actually needed to look at. Signal, not noise.

The hackathon that started it

A few years back I built a hackathon project around this idea. The core concept was automated statistical profiling: connect to a database, analyze the distributions, detect when something changed meaningfully, and surface only the stuff worth investigating. And do all this at scale with a little I/O as possible to achieve the desired outcome: does my data have any land mines in it?

It worked better than I expected. Not because the statistics were novel, but because it removed the manual effort. I didn't have to write a test for every column. I didn't have to define thresholds for every metric. The system figured out what normal looked like and told me when things deviated.

That project sat in a repo for a while. But the idea kept nagging at me.

Building for myself

AnomalyArmor came from recognizing voids in the industry that nobody was filling. The expensive enterprise tools were overkill for most teams. The lightweight open source options required too much manual configuration. There was a middle ground that didn't exist: something that worked out of the box, scaled with your data, and didn't cost a fortune.

I also just wanted better tooling for myself. Every data engineering job I've had, I've ended up building some version of this internally. Schema change detection scripts. Freshness monitoring cron jobs. Anomaly alerts cobbled together from Airflow sensors. AnomalyArmor is what all of that should have been from the start.

What it does

The pitch is simple: connect your database, get alerts when something's wrong.

Schema drift detection tells you when columns change before your pipelines break. Freshness monitoring tells you when tables stop updating before anyone asks why the dashboard is stale. Data quality metrics catch null spikes, distribution shifts, and anomalies before they corrupt your analytics. Lineage extends these offerings to give you a blast radius of what should be monitored, then does that monitoring for you.

Why $5 per table

I priced it at roughly half what competitors charge because I know what data team budgets look like. At 100 tables, you're paying $475 a month. That's affordable for a real team, not just enterprises with unlimited spend.

If AnomalyArmor saves you one fire drill per month, one late-night debugging session, one embarrassing "why are these numbers wrong" conversation, it's paid for itself.

Try it yourself

If you're tired of the exponential maintenance problem and want tooling that actually helps, sign up and connect your first database in under 5 minutes.

No sales pitch. Just see if it solves a problem you have.

— Blaine