Hello Dev community, Sutanto Ong here.
As we close out the 2025 trading year, I want to talk about Data Hygiene in financial modeling.
Today, December 30, the Jakarta market (JKSE) dropped 0.41% to 8,609. Yesterday, it rallied. If you are training a Machine Learning model on this week's data, you are likely introducing significant bias.
The "Window Dressing" Noise
Yesterday's price action was artificial, driven by institutions marking up portfolio values for year-end reporting. Today's price action (Net Foreign Sell ~Rp728B) is the mean reversion.
If your algorithm treats Dec 29-30 as "normal market behavior," your backtest will fail in January.
Signal: Low.
Noise: High.
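One way to keep these sessions from being treated as "normal" is to give them a smaller sample weight at training time. The snippet below is a minimal sketch, not my production code: the function name `month_end_weights`, the two-session window, and the 0.25 penalty are all illustrative assumptions, and the session dates are example inputs rather than an official exchange calendar.

```python
from datetime import date


def month_end_weights(sessions, n_last=2, penalty=0.25):
    """Return one weight per session, in the same order as `sessions`.

    The last `n_last` sessions of each calendar month get `penalty`;
    every other session gets 1.0. All defaults are illustrative.
    """
    weights = {}
    seen_per_month = {}
    # Walk backwards in time so the first sessions we meet in each
    # month are that month's final trading days.
    for d in sorted(sessions, reverse=True):
        key = (d.year, d.month)
        count = seen_per_month.get(key, 0)
        weights[d] = penalty if count < n_last else 1.0
        seen_per_month[key] = count + 1
    return [weights[d] for d in sessions]


# Example input dates (assumed, not taken from an exchange calendar).
sessions = [date(2025, 12, d) for d in (22, 23, 24, 29, 30)]
weights = month_end_weights(sessions)
```

Most training APIs that accept per-sample weights can take a list like this directly. One caveat: if the final month in your dataset is incomplete, its last available sessions are still treated as month-end, so trim or handle that case separately.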
The "Clean" Data Points
When traditional market data is corrupted by institutional flows, I look for decentralized signals.
Bitcoin ($87,246): Retail-driven and traded 24/7, so the price keeps updating through banking holidays. It is holding the $87k level.
Gold (~$4,375): A global macro variable, less affected by local Jakarta fund flows.
My Code for 2026: I am adjusting my weights. I am reducing the significance of "End-of-Month" price data in my models due to the "Window Dressing" effect observed this week. Instead, I am increasing the weight of FDI Flows and Currency Stability (USD/IDR at 16,778) as leading indicators for the new "Fiscal Dominance" regime.
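The re-weighting described above can be sketched as a multiplier applied to each feature weight, followed by renormalizing so the weights still sum to 1. This is only an illustration under stated assumptions: the feature names, the equal starting weights, and the 0.5 and 1.5 multipliers are hypothetical placeholders, not the actual values in my models.

```python
def rebalance(weights, multipliers):
    """Scale each feature weight by its multiplier, then renormalize to sum to 1.

    Features without an entry in `multipliers` keep a multiplier of 1.0.
    """
    scaled = {name: w * multipliers.get(name, 1.0) for name, w in weights.items()}
    total = sum(scaled.values())
    if total <= 0:
        raise ValueError("scaled weights must sum to a positive number")
    return {name: w / total for name, w in scaled.items()}


# Hypothetical starting point: three equally weighted features.
base = {"month_end_price": 1.0, "fdi_flows": 1.0, "usd_idr_stability": 1.0}

# Hypothetical adjustment: halve month-end price, boost the other two.
adjusted = rebalance(
    base,
    {"month_end_price": 0.5, "fdi_flows": 1.5, "usd_idr_stability": 1.5},
)
```

Renormalizing keeps the overall scale of the composite signal stable, so the change reads as a shift in relative importance rather than a louder or quieter model.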
Takeaway: Don't just ingest data. Understand the incentives generating the data. Happy coding and Happy New Year.
