DEV Community

Your API Tests Are Lying to You, The Schema Drift Problem Nobody Talks About

tanvi Mittal on February 19, 2026

Last month, I watched a production incident unfold at a company I was consulting for. Their mobile app started crashing for roughly 30% of users. T...
Rockgecko

Why do you manually write the OpenAPI spec? If there's drift, how do you consume new endpoints in the client?

I just have the BE generate it automatically so it's never out of date.
My FE deployment pipelines fetch the latest spec, then generate the client code and compile. If there's a build error, they output a diff of the schema changes so it's immediately clear which field or endpoint changed (e.g. int to string would fail, int to long would likely pass fine).
This works for native apps, Flutter, TypeScript web projects, etc.

You can do the same for third-party services, whether they publish OpenAPI or an SDK.
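The "fail the build, print a diff" step described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual pipeline: it assumes the spec's types live under `components.schemas`, ignores `$ref`s and nesting, and the function names are made up for this example.

```python
import json

def schema_fields(spec: dict) -> dict:
    """Flatten components.schemas into {"Schema.field": type} pairs.
    (Top-level properties only; real specs also need $ref resolution.)"""
    fields = {}
    for name, schema in spec.get("components", {}).get("schemas", {}).items():
        for prop, defn in schema.get("properties", {}).items():
            fields[f"{name}.{prop}"] = defn.get("type", "unknown")
    return fields

def diff_specs(old: dict, new: dict) -> list:
    """Report added, removed, and re-typed fields between two spec versions."""
    old_f, new_f = schema_fields(old), schema_fields(new)
    changes = []
    for key in sorted(old_f.keys() | new_f.keys()):
        if key not in new_f:
            changes.append(f"REMOVED {key}")
        elif key not in old_f:
            changes.append(f"ADDED   {key}")
        elif old_f[key] != new_f[key]:
            changes.append(f"RETYPED {key}: {old_f[key]} -> {new_f[key]}")
    return changes
```

In a CI step you would fetch the previous and current spec JSON, print the diff, and exit non-zero if it's non-empty, so the schema change is visible right in the failed build log.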

tanvi Mittal

Great setup: auto-generated specs with client codegen and build-time validation make for one of the strongest workflows out there. If you have that running end-to-end, you're ahead of 90% of teams I've worked with.
That said, here's where the problem still lives even with that pipeline:

  1. Not everyone controls the backend. If you're consuming Stripe, Twilio, a partner API, or any third-party service that doesn't publish an OpenAPI spec (and many don't or publish one that lags behind reality), you have no spec to generate from. You're relying on their changelog, which... good luck.
  2. Auto-generated specs reflect code, not intent. If a developer adds null=True to a Django model field or changes a serializer, the auto-generated spec will update but nobody decided to make that a public contract change. The build might pass (nullable string is still a string) while every consumer that doesn't handle nulls breaks silently at runtime. The spec changed, the codegen passed, and you still have a production incident.
  3. Build errors only catch type incompatibilities, not structural drift. A new field appearing? Build passes. A field going from required to optional? Depends on the generator. An array that used to always have items now sometimes returns empty? Build passes. These are exactly the drifts that cause runtime bugs and they're invisible to compile-time checks.
  4. The "generate client + compile" approach assumes typed languages. It works beautifully for TypeScript, Flutter, and Swift, but it doesn't help Python, Ruby, or any dynamically typed consumer, which is still a huge chunk of the ecosystem, especially on the QA/automation side.

Your workflow is the right answer for the teams that can implement it. The problem I'm describing hits hardest at the teams that can't, which in my experience is the majority. Different layers of the same problem.

Would be curious though: in your pipeline, when the build does fail on a schema diff, how do you handle the decision of "is this an intentional API change or an accidental one"? That triage step is where I see even good pipelines struggle.
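For the dynamically typed consumers in point 4, the same structural checks can run at runtime instead of compile time. A minimal sketch (all names here are invented for illustration): record the observed type of every field path in a known-good response, then compare each live response against that baseline to catch exactly the drifts a compiler never sees, such as new fields, vanished fields, and nullability shifts.

```python
def infer_types(obj, prefix=""):
    """Map each field path in a JSON-like response to its observed Python
    type name. Dicts are recursed into; everything else (including lists)
    is treated as a leaf for simplicity."""
    types = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            types.update(infer_types(value, path))
    else:
        types[prefix] = type(obj).__name__
    return types

def detect_drift(baseline: dict, response: dict) -> list:
    """Compare a live response against a known-good baseline response."""
    base_t, resp_t = infer_types(baseline), infer_types(response)
    drift = []
    for path in sorted(resp_t.keys() - base_t.keys()):
        drift.append(f"new field: {path}")
    for path in sorted(base_t.keys() - resp_t.keys()):
        drift.append(f"missing field: {path}")
    for path in sorted(base_t.keys() & resp_t.keys()):
        if base_t[path] != resp_t[path]:
            drift.append(f"type change: {path} {base_t[path]} -> {resp_t[path]}")
    return drift
```

Run against every response in a test suite, this flags the "field went null" and "field appeared" cases even when every assertion about the fields you *do* check still passes.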
FlareCanary

This nails a problem I've been experiencing for years. The "green tests, broken API" scenario is especially brutal with third-party APIs you don't control: you can't even add the validation to your CI pipeline, because you're not the one deploying the changes.

One approach we've been exploring is continuous live monitoring: polling endpoints on a schedule and comparing the actual response structure against a learned baseline. Not just "are the expected fields present" but also "did any types change? Did nullability shift? Are there new fields that might indicate a structural migration?"

The severity classification piece is key too. Not every schema change is a fire. A new optional field is informational. A type change from string to string|null is a warning. A removed field is breaking. Without that classification, you drown in noise.
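A classifier along those lines can be tiny. This is a sketch of the severity rules described above, not a real library; the `change` dict shape and all names are hypothetical.

```python
from enum import Enum

class Severity(Enum):
    INFO = "informational"
    WARN = "warning"
    BREAKING = "breaking"

def classify(change: dict) -> Severity:
    """Map one drift event (a hypothetical dict like
    {"kind": "type_changed", "old": "string", "new": "string|null"})
    to a severity, per the rules in the comment above."""
    kind = change["kind"]
    if kind == "field_removed":
        # consumers reading this field break outright
        return Severity.BREAKING
    if kind == "type_changed":
        # widening to nullable is a warning; any other re-type is breaking
        if change["new"] == f"{change['old']}|null":
            return Severity.WARN
        return Severity.BREAKING
    if kind == "field_added":
        # new required fields trip strict deserializers; optional ones are noise
        return Severity.WARN if change.get("required") else Severity.INFO
    return Severity.WARN  # unknown drift kinds deserve a look
```

Routing INFO to a digest, WARN to a channel, and BREAKING to a page is what keeps the monitoring from drowning everyone in noise.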

Curious whether your team found that the 23/47 drift cases were mostly additive (new fields) or mostly breaking (removed/changed fields)? In the data I've seen, the split tends to be ~60% additive / ~40% breaking-or-warning. Most changes aren't tragic, but still worth knowing about.

Cyber Safety Zone

This article really hits an under‑the‑radar issue! Schema drift can quietly break your API tests without you noticing, and it’s one of those problems that feels obvious in hindsight. Love the practical examples and reminders to keep test suites aligned with real API changes — definitely something every API team should be watching for. Thanks for highlighting this!

Vignesh

This resonates — we hit this exact problem with data payloads, not just API responses. Fields going null, types changing, cardinality shifting. We built a screening layer that sits at the ingest boundary and compares every batch against a stored schema fingerprint (SHA-256 of sorted field names + types). First batch sets the baseline, every subsequent batch is compared. Drift events get classified: new field = WARN, type change = BLOCK, null spike = WARN/BLOCK depending on severity.
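The fingerprint idea described here (SHA-256 over sorted field names plus types) is simple enough to show inline. This is my own minimal reading of that description, not the linked project's code:

```python
import hashlib

def schema_fingerprint(record: dict) -> str:
    """SHA-256 over sorted (field name, observed type) pairs, so the
    fingerprint depends on structure, not on values or key order."""
    parts = [f"{name}:{type(value).__name__}"
             for name, value in sorted(record.items())]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

The first batch's fingerprint becomes the baseline; any later batch whose fingerprint differs triggers the classification step (new field vs. type change vs. null spike).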

The "compared to what?" problem you mention is key — we solve it by storing baselines per-source with EMA-smoothed null rate history, so gradual drift gets caught too.

If anyone's interested in the implementation: github.com/AppDevIQ/datascreeniq-p...

Denys Adamov

Okay, a type change from int to string may be missed, although it might be caught by schema validation.
But what do the tests do if they don't fail when a property name changes or a type is switched to an array? 🫨🤔

Tented

nice!