Training–serving skew is still one of the most common failure modes in production ML.
Most teams already sense that feature stores haven't fully solved it. What's less clear is why.
The answer isn't poor implementation or missing features. It's that feature stores solve the wrong layer of the problem.
Skew is not caused by inconsistent definitions. Skew is caused by movement—every time a feature crosses a system boundary, execution context changes, and consistency becomes probabilistic rather than guaranteed.
If you've ever debugged a model that performed well in notebooks but degraded silently in production with no code changes, you've seen this failure mode. The code matched. The data didn't behave the same way.
The Promise Feature Stores Made
Feature stores promised consistent feature definitions, reusable transformations, and shared access between training and serving. On paper, this should eliminate skew.
In practice, most teams still see offline features that don't match online behavior, late or missing updates, and inference paths that quietly diverge from training logic. The issue is structural, not procedural.
Where Skew Actually Comes From
Consider a typical flow. Raw data lands in an application database. Features are computed offline and written to a feature store. Models train from one snapshot. Online serving reads from another. Inference runs in a separate service.
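To make that split concrete, here is a minimal sketch with purely illustrative names (the feature, the snapshot timestamp, and both functions are assumptions, not taken from any particular stack). The definition of "days since last order" is identical on paper; one copy runs in a nightly batch against a snapshot, the other runs in the serving path against the live clock.

```python
# Illustrative sketch only: one feature definition, two execution contexts.
from datetime import datetime, timezone

SNAPSHOT_TIME = datetime(2024, 1, 1, tzinfo=timezone.utc)  # nightly batch cut-off

def offline_days_since_last_order(last_order_at: datetime) -> int:
    # Batch job: evaluated against the snapshot timestamp, written to the
    # feature store, read later by training.
    return (SNAPSHOT_TIME - last_order_at).days

def online_days_since_last_order(last_order_at: datetime) -> int:
    # Serving path: evaluated against "now", often owned by a different team,
    # used when the stored value is missing or stale.
    return (datetime.now(timezone.utc) - last_order_at).days
```

Nothing in a feature catalog distinguishes these two functions. Only the execution context does.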
Even with a feature store in place, training and serving live in different execution contexts. Each context introduces different timing guarantees, different failure modes, different code paths, and often different owners.
Feature definitions match. Execution semantics do not. That gap is where skew lives.
An execution layer is where queries actually run—the query planner, the permissions model, the data access path. When training and serving share an execution layer, they share behavior, not just definitions. When they don't, consistency depends on coordination between systems that were never designed to coordinate.
Why Feature Stores Can't Close the Gap
Feature stores manage data artifacts. They do not control execution.
They cannot guarantee when a feature is computed, what version of logic ran, whether inference used the same transformation, or whether joins behaved the same way at training time versus serving time. As long as features move between systems, skew remains possible.
Most teams do not detect this. Accuracy degrades slowly. Nobody notices until business metrics slip, and by then the root cause is buried under weeks of commits and config changes.
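One way to at least surface the drift is a periodic parity check between the two paths. This is only a sketch: `fetch_offline_value` and `fetch_online_value` are hypothetical placeholders for however your stack reads the training-time and serving-time value of a feature, not any feature store's real API.

```python
def check_feature_parity(entity_ids, fetch_offline_value, fetch_online_value,
                         tolerance: float = 1e-6):
    """Return (entity_id, offline, online) triples where the two paths disagree."""
    mismatches = []
    for eid in entity_ids:
        offline = fetch_offline_value(eid)
        online = fetch_online_value(eid)
        if offline is None or online is None or abs(offline - online) > tolerance:
            mismatches.append((eid, offline, online))
    return mismatches

# Run this on a sample of recent entities: a non-empty result is skew with a
# stack trace instead of a quarterly metrics surprise.
```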
The Execution Layer Is the Missing Piece
Skew disappears when training and serving share the same execution layer. That means the same query planner, the same permissions, the same data, and the same logic.
Features stop being artifacts that sync between systems. They become expressions evaluated at query time. Inference stops being a service call to an external system. It becomes part of data access. Similarity search stops being a separate infrastructure dependency. It becomes a filter clause.
This isn't theoretical. In practice, it looks like this: instead of computing embeddings offline, storing them in a vector database, and hoping the serving path fetches the right version, you store raw data once and compute the embedding inline when the query runs. Training and inference both execute the same transformation on the same data through the same engine.
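As a rough sketch of what that looks like in code (illustrative only: `embed` is a hash-seeded stand-in for a real embedding model, and in-process lists and functions stand in for whatever engine actually executes the query), the point is that training, serving, and similarity search all evaluate the same expression over the same raw rows:

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model, evaluated inline at query time.
    seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "little")
    return np.random.default_rng(seed).standard_normal(8)

RAW = [  # the single copy of raw data that every path reads
    {"id": 1, "text": "package arrived late", "label": 1},
    {"id": 2, "text": "delivered on time", "label": 0},
]

# Training path: the feature is an expression evaluated when the query runs.
X = np.stack([embed(r["text"]) for r in RAW])
y = np.array([r["label"] for r in RAW])

# Serving path: the same expression, the same data, the same engine.
def serve(row: dict, weights: np.ndarray) -> float:
    return float(embed(row["text"]) @ weights)

# Similarity search as a filter clause rather than a separate system: score
# each row's inline embedding against the query embedding.
query = embed("late delivery")
nearest = sorted(RAW, key=lambda r: float(embed(r["text"]) @ query), reverse=True)[:1]
```

The embedding never exists as a stored artifact that has to be synced. If the transformation changes, it changes for training and serving at the same moment.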
A Concrete Contrast
Feature Store Pattern
Compute features offline. Store them separately. Recompute or fetch online. Hope consistency holds across systems and time.
Unified Execution Pattern
Store raw data once. Compute features inline at query time. Train and serve from the same source. Run inference where the data lives.
No synchronization jobs. No stale features. No silent divergence.
What This Changes for Teams
Debugging shifts from tracing requests across services to inspecting queries in one place. Experiments move to production without rewriting feature pipelines. Platform teams stop owning glue code that nobody wants to maintain. Training–serving skew becomes a visible failure with a stack trace, not a silent one that surfaces in quarterly metrics reviews.
This is not about removing tools. It is about removing unnecessary boundaries between systems that should never have been separate.
What This Means for ML Leaders
If your system has a feature store, a vector database, and a separate inference service, you still pay the coordination tax. Feature stores help with reuse and discovery. They do not fix architectural fragmentation.
Skew is an execution problem. Execution problems require execution-layer solutions.
This approach isn't free. It requires rethinking how you model features and where computation happens. Not every team is ready for that migration, and the transition cost is real. But for teams that have already felt the pain of debugging silent skew across five different systems, the tradeoff starts to look favorable.
I published concrete schemas and examples that show this approach in practice here:
https://synapcores.com/sqlv2
If you run ML in production, ask one question: do training and serving share execution, or just data definitions?
That answer explains most failures.