I spent a semester analyzing OGDC — Pakistan's largest oil and gas company — on the Pakistan Stock Exchange. Financial data breaks every assumption you were taught to rely on. This is what that actually looks like in practice.
Classroom Time Series: Deceptively Plain
When you first learn time series analysis, you get clean examples. A seasonal temperature dataset. A sales series that trends upward. The examples are chosen because the methods work on them.
Real financial data doesn't do that.
The first thing I did was plot OGDC's daily returns and run a normality test. The Jarque-Bera statistic came back at 172,348. The p-value was effectively zero. Excess kurtosis was 41.7.
A normally distributed series has excess kurtosis of 0. An excess kurtosis of 41.7 means the tails are so fat that standard deviation becomes a nearly meaningless risk metric. Extreme daily moves — the kind that would be essentially impossible under a Gaussian model — were happening regularly.
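The same check is a few lines of scipy. A minimal sketch on synthetic data, with a Student-t series standing in for fat-tailed returns (the `df=3` and 2% scale are illustrative choices, not OGDC's actual parameters):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins (not OGDC data): Gaussian vs. fat-tailed daily returns
gaussian = rng.normal(0.0, 0.02, 2500)                         # ~10 years of daily returns
fat_tailed = stats.t.rvs(df=3, scale=0.02, size=2500, random_state=0)

jb_gauss, p_gauss = stats.jarque_bera(gaussian)
jb_fat, p_fat = stats.jarque_bera(fat_tailed)

# Fisher definition: a Normal series has excess kurtosis 0
kurt_gauss = stats.kurtosis(gaussian)
kurt_fat = stats.kurtosis(fat_tailed)

print(f"gaussian:   JB={jb_gauss:10.1f}  p={p_gauss:.3f}  excess kurtosis={kurt_gauss:6.2f}")
print(f"fat-tailed: JB={jb_fat:10.1f}  p={p_fat:.3g}  excess kurtosis={kurt_fat:6.2f}")
```

The Gaussian series passes; the fat-tailed one produces a huge Jarque-Bera statistic and a p-value of effectively zero, which is exactly the shape of the OGDC result.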
That number forced me to actually understand what kurtosis means instead of just knowing it's the fourth moment of a distribution. There's a difference.
Stationarity Is Not Academic
Every time series textbook starts with stationarity. Run the ADF test, check the p-value, and proceed. I did this robotically for two years before working with financial data made it concrete.
OGDC's price series has a unit root — it's non-stationary. The returns series is stationary. This distinction matters enormously because:
You cannot apply ARIMA to a non-stationary series without differencing it
If you use price levels as your ML target, you're teaching your model to predict a random walk with drift — it will learn "tomorrow's price ≈ today's price," score well on R², and be completely useless
Feature engineering on price levels creates look-ahead bias in ways that are subtle and easy to miss
Once I understood why we model returns instead of prices — not because a textbook said so, but because I watched what happened when I tried to model prices — stationarity became a tool rather than a checkbox.
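The price-levels trap is easy to reproduce on a simulated random walk: a naive "tomorrow equals today" model scores a near-perfect R² on price levels and a useless one on returns. All numbers below are synthetic, chosen only to mimic the shape of the problem:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic price series: a random walk with drift, the pathological ML target
returns = rng.normal(0.0005, 0.02, 2000)      # stationary by construction
prices = 100 * np.exp(np.cumsum(returns))     # non-stationary level series

def r2(actual, predicted):
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Naive "model": tomorrow's value = today's value
r2_prices = r2(prices[1:], prices[:-1])       # close to 1: looks impressive
r2_returns = r2(returns[1:], returns[:-1])    # negative: worse than predicting the mean

print(f"R² on price levels: {r2_prices:.4f}")
print(f"R² on returns:      {r2_returns:.4f}")
```

The high R² on levels carries no forecasting content at all; it is an artifact of the unit root.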
Autocorrelation Actually Tells You Something
I ran a Ljung-Box test on the return series and found significant autocorrelation at lags 1, 3, 12, 22, and 29. Then I ran a Runs Test and found that the signs of returns — whether each day is up or down — are completely random (Z = 0.043, p = 0.97).
These two results together are more interesting than either one alone.
There is short-horizon autocorrelation in the magnitude and sequence of returns, but no predictability in direction. The Runs Test is essentially a direct test of the Efficient Market Hypothesis at the binary level. OGDC passes it — you cannot predict whether tomorrow is an up day from knowing today was an up day.
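The runs test is simple enough to implement directly. A sketch of the Wald-Wolfowitz version, applied here to synthetic i.i.d. returns, which should look as random as OGDC's up/down sequence did:

```python
import numpy as np
from scipy.stats import norm

def runs_test(returns):
    """Wald-Wolfowitz runs test on the signs of a return series.
    Z near 0 means the up/down sequence is indistinguishable from random."""
    signs = returns > 0
    n1, n2 = signs.sum(), (~signs).sum()
    runs = 1 + np.sum(signs[1:] != signs[:-1])
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1)
    )
    z = (runs - expected) / np.sqrt(variance)
    return z, 2 * norm.sf(abs(z))             # two-sided p-value

rng = np.random.default_rng(7)
z, p = runs_test(rng.normal(0, 0.02, 2500))   # synthetic i.i.d. returns
print(f"Z = {z:.3f}, p = {p:.3f}")
```

A |Z| close to 0 with a large p-value, like the Z = 0.043 above, means you cannot reject randomness in the sequence of signs.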
The ARMA(0,3) model I fit finds genuine structure — three moving-average terms capturing autocorrelation at those specific lags, with residuals that pass all diagnostic tests. The model is adequate. It just can't forecast direction, only structure.
That distinction — structure vs predictability — is something I didn't appreciate until I saw it in data where it actually mattered.
Volatility Is A Time Series Too
The most important thing I built in this project was not a price prediction model. It was a volatility model.
The Ljung-Box test on squared returns came back with Q(10) = 226.96 (p ≈ 0). Volatility was clustering — large moves followed by large moves, regardless of direction. This is the ARCH effect. It means a constant-variance assumption in any model you build on this data is simply wrong.
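The Ljung-Box statistic itself is a short computation. This sketch applies it to a simulated GARCH-style series (illustrative parameters, not OGDC's) to show the signature described above: weak linear autocorrelation in raw returns, strong autocorrelation in squared returns.

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(x, lags=10):
    """Ljung-Box Q: joint test that the first `lags` autocorrelations are zero."""
    n = len(x)
    x = x - x.mean()
    denom = np.sum(x * x)
    acf = np.array([np.sum(x[k:] * x[:-k]) for k in range(1, lags + 1)]) / denom
    q = n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, lags + 1)))
    return q, chi2.sf(q, df=lags)

# Simulate volatility clustering with a GARCH(1,1)-style recursion
rng = np.random.default_rng(1)
n, omega, alpha, beta = 2500, 1e-6, 0.10, 0.85
r = np.zeros(n)
sigma2 = omega / (1 - alpha - beta)           # start at unconditional variance
for t in range(n):
    r[t] = np.sqrt(sigma2) * rng.normal()
    sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2

q_r, p_r = ljung_box(r, 10)                   # raw returns: little structure
q_r2, p_r2 = ljung_box(r ** 2, 10)            # squared returns: strong ARCH effect
print(f"Q(10) returns = {q_r:8.2f}  p = {p_r:.3f}")
print(f"Q(10) squared = {q_r2:8.2f}  p = {p_r2:.3g}")
```

The squared-return statistic dwarfs the raw-return one, which is the ARCH signature that justifies moving to a GARCH model.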
I implemented four GARCH family models from scratch using scipy.optimize for maximum likelihood estimation — ARCH(1), GARCH(1,1), EGARCH(1,1), and GJR-GARCH(1,1). No external library.
GJR-GARCH won by AIC. The leverage parameter γ = 0.33 means negative shocks amplify future volatility 33% more than positive shocks of equal magnitude. Seeing that emerge from your own MLE implementation — watching the optimizer converge to a parameter that confirms something the financial econometrics literature established decades ago — is a different kind of understanding than reading about it.
The persistence parameter came out at α+β = 0.78, implying a shock half-life of 2.8 trading days. Developed market equities typically show half-lives of 20–30 days. PSX stocks mean-revert faster, which is consistent with thinner liquidity and more retail-driven price discovery.
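A minimal version of that from-scratch approach, assuming Gaussian innovations and a plain GARCH(1,1). The series below is simulated with persistence α+β = 0.78 to match the figure above; nothing here is estimated from actual OGDC data:

```python
import numpy as np
from scipy.optimize import minimize

def garch_neg_loglik(params, r):
    """Negative Gaussian log-likelihood of a GARCH(1,1) model."""
    omega, alpha, beta = params
    n = len(r)
    sigma2 = np.empty(n)
    sigma2[0] = r.var()                        # initialize at sample variance
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + r ** 2 / sigma2)

# Simulate a series with known parameters, then try to recover them
rng = np.random.default_rng(3)
true_omega, true_alpha, true_beta = 1e-6, 0.08, 0.70   # persistence 0.78
n = 5000
r = np.empty(n)
s2 = true_omega / (1 - true_alpha - true_beta)
for t in range(n):
    r[t] = np.sqrt(s2) * rng.normal()
    s2 = true_omega + true_alpha * r[t] ** 2 + true_beta * s2

res = minimize(garch_neg_loglik, x0=(1e-6, 0.1, 0.5), args=(r,),
               bounds=[(1e-9, None), (0.0, 1.0), (0.0, 1.0)], method="L-BFGS-B")
omega, alpha, beta = res.x
half_life = np.log(0.5) / np.log(alpha + beta)   # days for a shock to halve
print(f"α+β = {alpha + beta:.3f}, shock half-life ≈ {half_life:.1f} trading days")
```

The half-life formula is the one behind the 2.8-day figure: with α+β = 0.78, ln(0.5)/ln(0.78) ≈ 2.8.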
The Machine Learning Part Humbled Me
I built a Random Forest classifier to predict daily price direction. It achieved 99.7% accuracy and ROC-AUC above 0.999.
I spent about an hour thinking I had done something impressive before I figured out what had actually happened.
The raw data from Investing.com includes a Change % column — the same-day percentage change. I had included it as a feature. It is numerically equivalent to the classification target I was trying to predict. The model had learned to read the answer from the question paper.
When I removed it and all other same-day OHLC data, accuracy dropped to 53–54%. That's the honest number. Still above the 51.3% majority-class baseline — a real but modest edge consistent with the short-horizon autocorrelation the ARMA model detected. But nowhere near 99.7%.
Data leakage is easy to understand abstractly and much harder to catch in practice, especially when it comes from a column that looks obviously useful and has a name that doesn't immediately suggest it contains the target variable. I caught it because the accuracy was too high. If it had been 72%, I might never have looked.
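The leakage mechanism is easy to reproduce with synthetic returns: a "classifier" that sees the same-day change scores perfectly by construction, while one restricted to yesterday's return collapses to roughly chance. Everything below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
returns = rng.normal(0.0005, 0.02, n)
target = (returns > 0).astype(int)            # 1 = up day, the thing to predict

# Leaky "feature": the same-day change itself (the Change % column in disguise)
leaky_pred = (returns > 0).astype(int)

# Honest feature: yesterday's return only
honest_pred = (np.roll(returns, 1) > 0).astype(int)
honest_pred[0] = 0                            # no predecessor for day 0

leaky_acc = np.mean(leaky_pred == target)                 # exactly 1.0 by construction
honest_acc = np.mean(honest_pred[1:] == target[1:])       # roughly 0.5 on i.i.d. data

print(f"accuracy with same-day change: {leaky_acc:.3f}")
print(f"accuracy with lagged return:   {honest_acc:.3f}")
```

The gap between those two numbers is the entire 99.7% illusion.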
What Financial Data Forces You To Learn
Working with market data has specific advantages over other domains as a learning environment.
Everything is measurable. You can immediately test whether a finding is statistically meaningful — run a Diebold-Mariano test, get a direct answer. The ground truth is public and continuous, producing a new observation every trading day. The feedback loop is tight in a way most applied ML projects aren't.
More importantly, the assumptions of standard methods are genuinely violated. Non-normality, heteroscedasticity, autocorrelation, structural breaks — financial data has all of them simultaneously. You cannot take shortcuts and get away with it the way you can with cleaner data. And the cost of ignoring violations is concrete: a risk model assuming Normal returns will underestimate the probability of a -10% day by an order of magnitude. That's not theoretical. It's the kind of mistake that has real consequences.
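That underestimation claim can be checked directly: compare the Normal probability of a -10% day against a fat-tailed Student-t alternative with the same variance. The 2% daily volatility and 3 degrees of freedom here are illustrative assumptions, not fitted values:

```python
import numpy as np
from scipy import stats

sigma = 0.02                                   # illustrative daily volatility
p_normal = stats.norm.cdf(-0.10, loc=0, scale=sigma)   # -10% is a -5σ move

# Student-t with 3 dof, rescaled to the same variance: Var = df/(df-2) * scale²
df = 3
scale = sigma / np.sqrt(df / (df - 2))
p_t = stats.t.cdf(-0.10 / scale, df=df)

print(f"P(return < -10%), Normal:     {p_normal:.2e}")
print(f"P(return < -10%), Student-t3: {p_t:.2e}")
print(f"understatement factor: {p_t / p_normal:,.0f}x")
```

Under these assumptions the Gaussian model understates the tail by several orders of magnitude, which is why the choice of distribution is a risk decision, not a formality.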
What I Would Tell Myself Before Starting
The statistical tests are not bureaucratic checkboxes before you get to the interesting modelling. They are the foundation that determines whether your model is even asking the right question.
The ADF test result determined the entire structure of the ML pipeline. The Jarque-Bera result determined which distribution to fit. The Ljung-Box result determined which lag features to include. The ARCH effect test justified the GARCH modelling. Each test was load-bearing.
If you skip them and go straight to building a model on price levels, you will get results that look plausible and are wrong in ways that are hard to diagnose after the fact. Financial data will make you do the statistics properly — not because a course requires it, but because it will break your model if you don't.
