Most factor libraries look reliable — at least at first.
They are cleanly implemented, mathematically consistent, and easy to plug into research workflows.
Backtests run smoothly. Results look stable. Explanations make sense.
And that’s exactly why they are dangerous.
Not because they are wrong, but because they encourage a false sense of certainty long before real execution begins.
## The gap between research code and execution systems
In research environments, factor computation usually assumes:

- fully available, aligned data
- stateless computation
- deterministic execution paths
- backtests that approximate live behavior
In real trading systems, none of these assumptions hold consistently:

- Data arrives incrementally.
- State persists across runs.
- Execution order matters.
- Backtests can never fully reproduce production constraints.
Factor libraries designed without these realities in mind often behave well in isolation, then quietly drift once embedded in live systems.
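
The difference is easy to see in code. Below is a minimal sketch in Python; the names (`moving_average_batch`, `MovingAverageFactor`) are illustrative, not from any real library. The batch version assumes a complete, aligned series; the incremental version must carry state between calls and decide explicitly what happens before the window fills.

```python
import pandas as pd
from collections import deque

# Research world: stateless, batch, all data available up front.
def moving_average_batch(prices: pd.Series, window: int = 20) -> pd.Series:
    """Rolling-mean factor over a complete, aligned price series."""
    return prices.rolling(window).mean()

# Production world: data arrives one observation at a time, and the
# window state has to survive between calls (and across restarts).
class MovingAverageFactor:
    def __init__(self, window: int = 20):
        self.window = window
        self.buffer = deque(maxlen=window)  # state that persists across updates

    def update(self, price: float):
        """Ingest one observation; return the factor only once the window is full."""
        self.buffer.append(price)
        if len(self.buffer) < self.window:
            return None  # an explicit "not yet computable", not a silent NaN
        return sum(self.buffer) / self.window
```

On a complete series, the two produce identical numbers. Only the second forces decisions about restarts, late data, and partial windows, which are exactly the decisions research code never has to make.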
## Why reproducibility fails in practice
Reproducibility in production is not about rerunning code and getting identical numbers.
It’s about being able to answer questions like:

- Why did this factor change at this point?
- What execution path produced this value?
- Could the system have produced a different result under slightly different timing?
Many factor libraries cannot answer these questions because:

- time semantics are implicit rather than enforced
- rolling-window boundaries shift across executions
- missing-data handling leaks future information into past values (see the sketch below)
- execution paths reshape factor behavior
None of these issues are obvious during backtesting.
They only surface when systems operate continuously over time.
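
The missing-data point is worth one concrete, hypothetical pandas example. Backfilling gaps is a common research-side default, and it silently copies future observations into the past:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.Series([100.0, np.nan, np.nan, 103.0, 104.0, 105.0], index=idx)

# Backfill quietly copies the future value (103.0) into Jan 2 and Jan 3.
leaky = prices.bfill()

# Any factor built on this series "knew" the Jan 4 price two days early.
# The backtest improves; live trading cannot reproduce it.
leaky_signal = leaky.pct_change()

# A point-in-time alternative: only carry forward what was known at the time.
safe = prices.ffill()
safe_signal = safe.pct_change()
```

Both versions run without error in a backtest, and the leaky one often scores better, which is precisely why the problem stays invisible until the system runs live.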
## The most expensive failures are not performance failures
When factor behavior drifts, teams often respond by:

- adjusting parameters
- adding filters
- retraining models
- blaming market regime changes
These actions may temporarily stabilize performance, but they rarely address the underlying issue.
The real problem is often this:
> The system no longer trusts its own market description.
At that point, performance metrics become misleading, and responsibility becomes difficult to assign.
## This is not a math problem
Most factor libraries are mathematically correct.
Their failure is structural, not numerical.
They assume:

- research computation equals execution behavior
- calculability equals reliability
- performance equals responsibility
Production systems must operate under different priorities:

- auditability before elegance
- reproducibility before convenience
- execution responsibility before optimization
Libraries that ignore these constraints become latent risk sources.
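
What "auditability before elegance" can mean in practice: every factor value carries enough provenance to be replayed and questioned. The record below is a sketch of one possible shape; none of the field names come from an existing library.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class FactorRecord:
    name: str
    value: float
    as_of: str          # the timestamp the value claims to describe
    computed_at: str    # when the system actually produced it
    window_start: str   # explicit window boundaries, never implicit
    window_end: str
    input_hash: str     # fingerprint of the exact inputs used
    code_version: str   # e.g. a git commit hash

def fingerprint(inputs: list) -> str:
    """Hash the exact input values so a factor value can be traced to its data."""
    return hashlib.sha256(json.dumps(inputs).encode()).hexdigest()[:16]

# Hypothetical usage: the values here are placeholders, not real data.
record = FactorRecord(
    name="ma_20",
    value=101.7,
    as_of="2024-01-31",
    computed_at="2024-01-31T16:05:12Z",
    window_start="2024-01-04",
    window_end="2024-01-31",
    input_hash=fingerprint([100.0, 101.0, 102.0]),
    code_version="3f2a1bc",
)
```

With records like this, "Why did this factor change at this point?" becomes a diff between two records instead of an archaeology project.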
## Closing thought
Factor libraries don’t usually fail loudly.
They fail quietly, slowly training systems to trust results that can no longer be explained.
This is not a model problem.
It’s an execution responsibility problem.