
Tanvi Mittal for AI and QA Leaders


Anatomy of a Schema Drift Incident: 5 Real Patterns That Break Production

Part 2 of 6 in The Silent API Killer series
Estimated read time: 12 minutes

In Part 1, I showed you a single API response that silently broke in five different ways after a "minor backend refactor" and how zero tests caught it. I showed you the three lies your API tests tell you every day. I shared numbers from a real audit where 23 out of 47 endpoints had undocumented structural changes while the test suite reported 100% passing for six months straight.
That post resonated with a lot of people. The most common response I got? "This literally happened to us last sprint."
So if Part 1 was the "why should I care," this post is the "what exactly am I looking for."
Over time, I've noticed that schema drift isn't random. It follows a small number of recurring patterns (five, specifically) that account for the vast majority of silent API breakages.
Each pattern has a distinct root cause, a distinct failure mode, and a distinct reason why conventional testing misses it. Once you learn to recognize them, you'll start seeing them everywhere. That's both the good news and the bad news.
Let's dissect them one by one.

Pattern #1: The Type Shift
Severity: Breaking
Detection difficulty: Moderate
How often I see it: In nearly every audit
This is the most common and most dangerous form of schema drift. A field that has always been one type silently becomes another. The value looks correct. The type is wrong.
Before (Monday):

{
  "order_id": 90871,
  "total": 49.99,
  "quantity": 3,
  "is_express": true
}

After (Thursday, post-deployment):

{
  "order_id": "90871",
  "total": "49.99",
  "quantity": 3,
  "is_express": "true"
}

Three fields changed type. order_id went from number to string. total went from number to string. is_express went from boolean to string. Only quantity survived.
Why it happens
This is almost always a serializer or ORM change. The most common triggers I've seen: database migrations where a column type changes (INT to BIGINT, which some JSON serializers render as a string to avoid precision loss), ORM upgrades that change default serialization behavior, or a switch between JSON libraries where the default handling of numeric types differs.
The treacherous part is that the values look identical to a human. 49.99 and "49.99" display the same way in a log file, in Postman, in your terminal. You'd never spot this in a manual review. You'd only notice when total + tax gives you "49.990.08" instead of 50.07.
If you remember from Part 1, this is exactly what happened with the user_id field: an integer became a string after a routine database migration, and the app's type-strict parsing layer rejected it. Same pattern. Same silence. Same blank screens.

Why tests miss it
Most API test assertions check values, not types. expect(response.total).toBe(49.99) will pass if the test framework does loose comparison. Even strict equality might pass depending on how the framework coerces types during comparison. JSON Schema validation would catch this if the schema is current. But as we established in Part 1, the schema is almost never current.
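The gap is easy to close once you compare types instead of values. Here is a minimal sketch of a type-shift check against the Monday and Thursday payloads above; the helper name `findTypeShifts` is my own, not from any library:

```javascript
// Compare the runtime type of every shared field across two payloads.
// Removed fields are ignored here; that's a different drift pattern.
function findTypeShifts(before, after) {
  const shifts = [];
  for (const key of Object.keys(before)) {
    if (!(key in after)) continue;
    const oldType = typeof before[key];
    const newType = typeof after[key];
    if (oldType !== newType) {
      shifts.push({ field: key, from: oldType, to: newType });
    }
  }
  return shifts;
}

const monday = { order_id: 90871, total: 49.99, quantity: 3, is_express: true };
const thursday = { order_id: "90871", total: "49.99", quantity: 3, is_express: "true" };

console.log(findTypeShifts(monday, thursday));
// three shifts: order_id, total, is_express; quantity survives
```

A value assertion sees 49.99 in both payloads; this check sees number versus string.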

The real-world damage
At one fintech company I worked with, a type shift on a balance field from number to string caused their reconciliation service to concatenate balances instead of summing them. A customer's balance of 1500 + 2300 became "15002300" in an internal ledger. The system didn't throw an error; it just produced silently wrong financial data. It took four days to detect and two weeks to clean up. In fintech. With real money.

Pattern #2: The Silent Disappearance
Severity: Breaking
Detection difficulty: Easy (if you're looking but nobody is)
How often I see it: Monthly at most organizations
A field that consumers depend on simply stops appearing in the response. No deprecation notice. No versioned endpoint change. It's just... gone.
Before:

{
  "user": {
    "id": 442,
    "name": "Marcus Rivera",
    "email": "marcus@example.com",
    "department": "Engineering",
    "manager_id": 118
  }
}

After:

{
  "user": {
    "id": 442,
    "name": "Marcus Rivera",
    "email": "marcus@example.com"
  }
}

department and manager_id: gone. No error. Just absent.

Why it happens
Three common root causes. First, the backend team refactored the data model and removed columns they believed were unused. They weren't; they just weren't used by the backend. The frontend and partner integrations were using them. Nobody asked.
Second, the ORM's eager loading configuration changed. Related data that used to be included automatically is no longer fetched. The serializer still tries to include those fields, but since the data isn't loaded, they're silently omitted or the serializer skips them.
Third, and this one is particularly insidious: a permissions or authorization change means the API still returns the field for some users but not others. Your test user has admin privileges and still sees department. Your production users with a regular role don't. Your tests pass. Their experience breaks.

Why tests miss it
This is embarrassing but true: most test suites don't assert the absence of unexpected omissions. They assert that specific fields have specific values. If the test says expect(response.user.name).toBe("Marcus Rivera"), it passes because name is still there. Nobody wrote expect(response.user).toHaveProperty("department") because why would you? It was always there. You don't write assertions for gravity.
This is the core problem I highlighted in Part 1 as Lie #1: "All assertions passed, so the API is fine." Your assertions test what you thought to check. They are structurally incapable of detecting removals you didn't anticipate.
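Catching removals means asserting against yesterday's key set, not today's values. A minimal sketch, using the employee payloads above (the helper name `findRemovedKeys` is mine):

```javascript
// Report every key that existed in the old payload but not the new one.
function findRemovedKeys(before, after) {
  return Object.keys(before).filter((key) => !(key in after));
}

const beforeUser = {
  id: 442,
  name: "Marcus Rivera",
  email: "marcus@example.com",
  department: "Engineering",
  manager_id: 118,
};
const afterUser = { id: 442, name: "Marcus Rivera", email: "marcus@example.com" };

console.log(findRemovedKeys(beforeUser, afterUser));
// ["department", "manager_id"]
```

The point is that the baseline comes from a recorded response, not from what a developer remembered to assert.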

The real-world damage
An HR SaaS company I consulted for had an internal API that returned employee data including manager_id. Their org chart visualization depended on it. After a data model refactor, manager_id was removed from the default serializer but kept in a separate endpoint nobody told the frontend team about.
The org chart started showing every employee as a top-level node with no reporting hierarchy. It shipped to production on a Friday afternoon. The CEO opened the org chart Monday morning and saw a completely flat organization: every single employee reporting to nobody.
The post-mortem was... uncomfortable.

Pattern #3: The Nullable Surprise
Severity: Warning, escalates to Breaking depending on consumer
Detection difficulty: Hard
How often I see it: Constantly, but rarely diagnosed correctly
A field that has historically never been null starts returning null for certain records or under certain conditions. The field is still present. Its type is still technically correct. But the nullability contract has changed.
Before (response for every user, always):

{
  "profile": {
    "id": 7821,
    "display_name": "Sarah Kim",
    "avatar_url": "https://cdn.example.com/avatars/7821.jpg",
    "bio": "Backend engineer at Acme Corp"
  }
}

After (response for a newly registered user):

{
  "profile": {
    "id": 7822,
    "display_name": "New User",
    "avatar_url": null,
    "bio": null
  }
}

avatar_url and bio are now nullable. They weren't before; or at least, they never were null in any response your system had previously received.

Why it happens
This is one of the most natural forms of drift because it's usually a data change, not a code change. When the API was first built, every user in the database happened to have an avatar and a bio; maybe they were required during onboarding. Then the product team made those fields optional in the registration flow. The API code didn't change. The serializer didn't change. The data changed, and with it, the de facto nullability contract.
Other triggers I've seen: a new data source starts feeding the API where some records have null values the old source never had. An admin creates test records with missing fields. A cleanup migration sets existing values to null. Or, my favorite: a cache expiry returns a partially hydrated object where optional fields haven't been populated yet.

Why tests miss it
This is the hardest pattern to catch because it's conditional. Your test fixtures have complete data. Your test user has an avatar. Your staging environment has well-formed records. The null only appears in production, for a specific subset of users, under specific conditions that your test environment never reproduces.
JSON Schema validation can technically catch this if you explicitly declared the field non-nullable, but most schemas don't specify nullability at all, because when the schema was written, every record was non-null. And auto-generated schemas from existing data won't flag it, because the data at the time of generation was never null.
This directly connects to what I said in Part 1 about the null=True parameter on a Django model field. The code change is tiny: one parameter. The schema impact is invisible until the right data condition triggers it. And that trigger might be weeks or months away.
The real-world damage
A mobile app I was testing had a user profile screen that rendered the avatar_url into an image component. When avatar_url was a string, it worked perfectly. When it was null, the component didn't crash; it made an HTTP request to, literally, the URL null, which returned a 404, which the error handler logged as a "network error."
The error logs for this app showed 50,000+ network errors per day. The team spent three weeks investigating CDN issues, load balancer configurations, and DNS resolution problems before someone realized the "network errors" were all GET null requests from a null avatar URL.
The fix was two lines of code. A null check. That's it.
The investigation cost three engineering-weeks because nobody thought "the field we've always received might suddenly be null" was even a possibility.
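For the record, the fix looks something like this. The function name and placeholder path are hypothetical; the shape of the fix is exactly what the article describes, a null check with a fallback:

```javascript
// Guard against a newly nullable avatar_url: fall back to a local
// placeholder instead of letting the literal string "null" become a URL.
function avatarSource(profile) {
  if (profile.avatar_url == null) {
    // Catches both null and undefined (field removed entirely).
    return "/assets/default-avatar.png";
  }
  return profile.avatar_url;
}

console.log(avatarSource({ avatar_url: null })); // "/assets/default-avatar.png"
console.log(avatarSource({ avatar_url: "https://cdn.example.com/avatars/7821.jpg" }));
```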

Pattern #4: The Structural Reshape
Severity: Breaking
Detection difficulty: Easy (the change is dramatic)
How often I see it: During major refactors and API version transitions
The overall structure of the response changes: fields move from flat to nested, nested objects get flattened, arrays become objects, or an entirely new wrapper appears around the data.

Before:

{
  "id": 331,
  "name": "Acme Widget",
  "price": 29.99,
  "category": {
    "id": 5,
    "name": "Electronics",
    "parent": {
      "id": 1,
      "name": "All Products"
    }
  },
  "tags": ["sale", "featured"]
}

After:

{
  "data": {
    "id": 331,
    "name": "Acme Widget",
    "price": 29.99,
    "category_id": 5,
    "category_name": "Electronics",
    "tags": "sale,featured"
  },
  "meta": {
    "api_version": "2.1",
    "deprecated_fields": []
  }
}

Count what happened. The entire response got wrapped in a data envelope. The nested category object was flattened into category_id and category_name (and parent is gone entirely). The tags array became a comma-separated string. A new meta wrapper appeared.
This is the API equivalent of someone rearranging all the furniture in your house while you were sleeping. Everything is still there. Nothing is where you expect it.
Why it happens
Structural reshapes usually come from intentional decisions; someone is "improving" the API. Common triggers: adopting a response envelope pattern (wrapping everything in { "data": ..., "meta": ... }), migrating from REST-style to JSON:API or GraphQL-style responses, normalizing the database and reflecting that in the API (denormalized nested objects become flat ID references), or a full framework migration where the new framework's default serialization produces a different shape.
The irony is that these changes are usually improvements. The new structure is often cleaner, more consistent, better designed. But "better" doesn't matter if every consumer expects the old shape and nobody told them.
Why tests miss it
Structural reshapes are actually the easiest drift to catch if you have structural validation. The shape change is dramatic enough that basic assertions should fail. The reason they still slip through is timing: these changes are often deployed behind feature flags or to new API versions, and the actual drift happens when the flag is accidentally enabled for all traffic, or when consumers are silently redirected to the new version without updating their parsing logic.
I've also seen cases where the reshape only affects certain endpoints, and the test suite only covers a subset of them: the ones that didn't change. Coverage gaps meet structural drift, and production loses.
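Structural validation doesn't need to be fancy. One approach is to flatten a response into "path to type" pairs, so a reshape shows up as a set difference rather than a value mismatch. A minimal sketch (the helper name `flattenPaths` is mine), run against trimmed versions of the product payloads above:

```javascript
// Flatten nested objects into dot-separated paths mapped to their types.
function flattenPaths(obj, prefix = "") {
  const paths = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(paths, flattenPaths(value, path));
    } else {
      paths[path] = Array.isArray(value) ? "array" : typeof value;
    }
  }
  return paths;
}

const before = { id: 331, category: { id: 5, name: "Electronics" }, tags: ["sale"] };
const after = { data: { id: 331, category_id: 5, category_name: "Electronics", tags: "sale,featured" } };

// Every old path the new response no longer has:
const lost = Object.keys(flattenPaths(before)).filter((p) => !(p in flattenPaths(after)));
console.log(lost); // ["id", "category.id", "category.name", "tags"]
```

Against a reshape like the envelope migration, every single old path disappears at once, which is exactly the kind of alarm you want.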
The real-world damage
An e-commerce platform migrated their product API from a flat structure to an envelope pattern ({ "data": ... }). They updated their web frontend. They updated their Android app. They forgot their iOS app, which was maintained by a different team in a different time zone.
The iOS app parsed response.name directly. After the reshape, name lived at response.data.name. The old path returned undefined. Every product page on the iOS app showed blank titles for 11 hours until the other team woke up, figured out what happened, and deployed a fix.
Eleven hours of an e-commerce app showing products with no names. During a sale event.

Pattern #5: The Phantom Addition
Severity: Info, escalates to Warning or Breaking
Detection difficulty: Easy (it's usually harmless... until it isn't)
How often I see it: Constantly; this is the most frequent drift type by far
New fields appear in the response that weren't there before. The existing fields are unchanged. Nothing is removed. Nothing changes type. There's just... more data than before.
Before:

{
  "invoice_id": "INV-2024-0891",
  "amount": 1250.00,
  "status": "paid",
  "customer_id": 445
}

After:

{
  "invoice_id": "INV-2024-0891",
  "amount": 1250.00,
  "status": "paid",
  "customer_id": 445,
  "payment_method": "credit_card",
  "stripe_payment_id": "pi_3Ox2...",
  "internal_notes": "Flagged for review - possible duplicate",
  "customer_ssn_last4": "7291"
}

Four new fields. Three of them are fine: normal feature additions. One of them (customer_ssn_last4) is a PII leak that should never appear in this response.
Why it happens
Field additions are the natural byproduct of feature development. Every time a developer adds a column to the database and the serializer isn't locked down (meaning it serializes all fields by default rather than through an explicit whitelist), the new field automatically shows up in the API response.
Most backend frameworks default to "serialize everything." Django REST Framework with model serializers, Rails Active Model Serializers with attributes :all, Express handlers that just do res.json(model) directly. This is convenient during development. It's a ticking time bomb in production.
This is why phantom additions are the most frequent drift type. Every feature sprint adds database columns. Most of them flow through to the API silently. Most are harmless. But occasionally, one of them is sensitive data that was never meant to be exposed through this endpoint.
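The antidote on the backend side is an explicit whitelist between the model and the response. A sketch of what that looks like in a plain Express-style handler, replacing a bare res.json(model); the `pick` helper and the field list are my own illustration:

```javascript
// The only fields this endpoint is ever allowed to expose.
const INVOICE_PUBLIC_FIELDS = ["invoice_id", "amount", "status", "customer_id"];

// Copy only whitelisted keys from a model object.
function pick(obj, allowed) {
  return Object.fromEntries(allowed.filter((k) => k in obj).map((k) => [k, obj[k]]));
}

// Even after the data model grows new columns, the response shape stays fixed.
const model = {
  invoice_id: "INV-2024-0891",
  amount: 1250.0,
  status: "paid",
  customer_id: 445,
  internal_notes: "Flagged for review - possible duplicate",
  customer_ssn_last4: "7291",
};

console.log(pick(model, INVOICE_PUBLIC_FIELDS));
// the notes and SSN fields never leave the server
```

In an actual handler this is the difference between `res.json(invoice)` and `res.json(pick(invoice, INVOICE_PUBLIC_FIELDS))`: one extra call, and the phantom addition becomes structurally impossible.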
Why tests miss it
Additions don't break anything. No assertion fails. No consumer crashes. The extra data is simply ignored by every consumer that doesn't know about it; the JSON spec doesn't require parsers to reject unknown fields. The drift is completely invisible to every test and every consumer.
But the security problem is real. Your API test suite asks: "Are the fields I expect present and correct?" It doesn't ask: "Are there any fields here that shouldn't be here?" Those are fundamentally different questions, and almost nobody is asking the second one.
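Asking the second question in a test takes one line of set logic. A sketch against the invoice payload above (the helper name `findUnexpectedKeys` is mine):

```javascript
// "Are there any fields here that shouldn't be?" as an executable check.
function findUnexpectedKeys(response, expectedKeys) {
  return Object.keys(response).filter((key) => !expectedKeys.includes(key));
}

const expected = ["invoice_id", "amount", "status", "customer_id"];
const response = {
  invoice_id: "INV-2024-0891",
  amount: 1250.0,
  status: "paid",
  customer_id: 445,
  customer_ssn_last4: "7291",
};

console.log(findUnexpectedKeys(response, expected));
// ["customer_ssn_last4"] - an assertion on this being empty would have failed
```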
The real-world damage
I can't share the specifics of the most severe incident I've seen with this pattern, but I can tell you the shape of it: a healthcare API that returned patient data started including an unmasked Social Security number field after a database model change. The field was present for six weeks before a security audit caught it. Six weeks of SSNs returned in every patient lookup, logged by every monitoring tool, cached by every CDN node, stored in every client's local log files.
The HIPAA compliance remediation took months.
Even without the security angle, phantom additions matter because they're a canary in the coal mine. A new field appearing tells you the data model changed. If the data model changed, other things might have changed too, including things that are breaking. Tracking additions is how you catch drift early, before the breaking patterns follow.

The Drift Severity Matrix
Now that we've seen all five patterns, let's formalize them into a classification system. This is the framework I use when assessing schema drift in any API audit:

| Pattern | Default severity | Detection difficulty | How often it appears |
| --- | --- | --- | --- |
| #1 Type Shift | Breaking | Moderate | Nearly every audit |
| #2 Silent Disappearance | Breaking | Easy (if you're looking) | Monthly at most organizations |
| #3 Nullable Surprise | Warning, escalates to Breaking | Hard | Constantly, rarely diagnosed |
| #4 Structural Reshape | Breaking | Easy (the change is dramatic) | Major refactors and version transitions |
| #5 Phantom Addition | Info, escalates to Warning/Breaking | Easy | Constantly, the most frequent type |

This isn't just academic taxonomy. If you were to build a tool that detects schema drift automatically, this severity matrix is exactly how you'd decide which changes block a deployment, which generate a warning in Slack, and which get logged quietly for awareness.
The key insight: severity isn't always fixed. A phantom addition is usually informational, but if the new field exposes sensitive data, it's a security incident. A nullable surprise is usually a warning, but if the consumer does response.avatar_url.replace(...) without a null check, it's a runtime crash. Defaults save time. Context determines reality.
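That decision logic is small enough to sketch directly: defaults per drift type, escalated by context. The type names and escalation rules below are my own illustration of the matrix, not a published spec:

```javascript
// Default severity per drift pattern, as described in the matrix above.
const DEFAULT_SEVERITY = {
  type_shift: "breaking",
  field_removed: "breaking",
  newly_nullable: "warning",
  structural_reshape: "breaking",
  field_added: "info",
};

// Context escalates the default: a sensitive added field is a security
// incident; a newly nullable field the consumer dereferences is a crash.
function classifyDrift(driftType, context = {}) {
  if (driftType === "field_added" && context.sensitive) return "breaking";
  if (driftType === "newly_nullable" && context.dereferencedByConsumer) return "breaking";
  return DEFAULT_SEVERITY[driftType] ?? "info";
}

console.log(classifyDrift("field_added")); // "info"
console.log(classifyDrift("field_added", { sensitive: true })); // "breaking"
console.log(classifyDrift("newly_nullable", { dereferencedByConsumer: true })); // "breaking"
```

A hypothetical CI gate would then block on "breaking", post to Slack on "warning", and log "info" quietly.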

The Compound Effect
Here's what makes real-world incidents so expensive: drift rarely shows up as a single isolated pattern. In practice, I see multiple patterns hitting simultaneously.
Remember the example from Part 1? The "minor backend refactor" that changed five things at once?

// Before (Monday)
{
  "id": 4521,
  "name": "Alice Chen",
  "role": "admin",
  "created_at": "2024-01-15T09:30:00Z",
  "team": { "id": 12, "name": "Platform" },
  "permissions": ["read", "write", "delete"]
}

// After (Friday)
{
  "id": "4521",
  "name": "Alice Chen",
  "roles": ["admin"],
  "created_at": 1705307400,
  "team_id": 12,
  "permissions": ["read", "write", "delete"],
  "metadata": {}
}

Now that we have the vocabulary, we can classify exactly what happened:

Type Shift: id changed from number to string, created_at changed from ISO string to Unix timestamp integer
Silent Disappearance: role removed, team nested object removed
Structural Reshape: team object flattened to team_id, role string became roles array
Phantom Addition: metadata appeared

Four of the five patterns in a single deployment. Each one breaking different consumers in different ways. Each one requiring a different investigation path to diagnose. Each one invisible to the test suite.
This is why schema drift incidents average 3.5 days to resolve, as I mentioned in Part 1. It's never one broken thing. It's a compound fracture.

How to Start Spotting Drift Today (No Tools Required)
We'll talk about tooling in depth starting in Part 3. But here are three things you can do this week, right now, to start catching drift in your own systems:

1. Save a response snapshot today. Pick your three most critical API endpoints. Call them. Save the raw JSON response to a file with today's date. Next week, call them again. Diff the two files: not the values, the keys and types. You might be surprised at what you find.

2. Add one structural assertion. Pick a single API test in your suite. Add an assertion that checks the set of top-level keys:

expect(Object.keys(response).sort())
  .toEqual(["created_at", "id", "name", "permissions", "role", "team"])

It's crude. It works. When a field gets added or removed, this will scream.

3. Audit your serializer. Open your backend's API serializer configuration. Is it using a whitelist of fields, or is it serializing all model attributes by default? If it's the latter, you're one database migration away from a Phantom Addition, possibly the sensitive kind.

These are Band-Aid measures. They won't scale. But they'll tell you immediately whether drift is happening in your system. In my experience, the answer is always yes.
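The snapshot-and-diff step fits in a few lines of Node. This is a sketch, not a tool; the helper names (`shapeOf`, `diffShapes`) are mine, and in practice you'd write `shapeOf(response)` to a dated file and diff against last week's:

```javascript
// Reduce a response to its structure: dot-paths mapped to types.
function shapeOf(obj, prefix = "") {
  const shape = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      Object.assign(shape, shapeOf(value, path));
    } else {
      shape[path] = value === null ? "null" : Array.isArray(value) ? "array" : typeof value;
    }
  }
  return shape;
}

// Compare two shapes and report removals, type changes, and additions.
function diffShapes(oldShape, newShape) {
  const report = [];
  for (const path of Object.keys(oldShape)) {
    if (!(path in newShape)) {
      report.push(`removed: ${path}`);
    } else if (oldShape[path] !== newShape[path]) {
      report.push(`type changed: ${path} (${oldShape[path]} -> ${newShape[path]})`);
    }
  }
  for (const path of Object.keys(newShape)) {
    if (!(path in oldShape)) report.push(`added: ${path}`);
  }
  return report;
}

// Trimmed versions of the Part 1 before/after payloads:
const lastWeek = { id: 4521, role: "admin", team: { id: 12 } };
const today = { id: "4521", roles: ["admin"], team_id: 12 };
console.log(diffShapes(shapeOf(lastWeek), shapeOf(today)));
```

Run against the compound example from Part 1, it names every pattern in one report: a type shift on id, the removals of role and team.id, and the additions of roles and team_id.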

What's Next
Now we have the vocabulary. We can name the five patterns. We can classify their severity. We can recognize them in the wild.
The natural next question is: why isn't anyone catching these automatically?
It's not that people don't care. It's that the current tooling landscape has a fundamental gap. Every existing API testing tool either requires something you don't have (an up-to-date OpenAPI spec), demands something you can't give (both API sides adopting a contract testing framework), or solves an adjacent problem (data pipeline drift, not HTTP API response drift).
In Part 3, I'm going to evaluate every major API testing tool through the lens of these five drift patterns: Pact, Schemathesis, Dredd, Postman, JSON Schema validators, OpenAPI validators, and more. For each tool, I'll walk through: which of the five patterns can it catch? Which ones does it miss? And why?
Fair warning: if you're a fan of any of these tools (and you should be, they're genuinely good tools), Part 3 isn't about bashing them. It's about being honest about what they were designed to solve versus the specific problem we've been dissecting in this series.
The gap is real. And once you see it, you can't unsee it.

If you missed Part 1, start there. It lays the foundation for everything in this series.
If this post helped you name a pattern you've experienced but couldn't articulate, share it with your team. Half the battle is having a shared vocabulary for the problem.
