Failed Machine Learning Experiment: Training XGBoost Classifier with 1.5m signals

Daniel Stepanian

In 2022 I started creating trading strategies in Python, and I had some powerful ML-based strategies in mind, but had neither the knowledge nor the skills to code and test them. Now, although I still have no experience with professional machine learning or its deeper mathematics, I figured I could use AI to write the code (Sonnet 4.5) and to suggest model parameters (Grok Thinking).

Looking at many market price charts, I was under the impression that there are patterns which, combined with the right trading strategy and position optimization, could be exploited for at least a couple of percent of automated return. It's now clear that this is not true - the market usually presents a distorted picture. Still, tempted by the chance to check it myself in a quick prototype project, I ran an experiment to verify whether a hypothesis based on those earlier impressions could hold. I used two Jupyter notebooks: one for XGBoost model training and one for the strategy backtest.

First, I downloaded five years of 15-minute price data for the top 30 crypto tokens into Parquet files. Then I wrote an algorithm to find every price point followed by a drop bigger than 3% within the next ten 15-minute bars, and extracted the ten preceding price points with technical-analysis indicators as training data for an XGBoost classifier - the goal being to identify moments that precede price drops. 500k drop signals were found, and I added another 1 million random non-drop samples, for 1.5m training samples in total, with 20% of these held out for testing.
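
The labeling step can be sketched in pandas roughly like this (a minimal illustration, not the post's actual code - the lookahead and threshold match the description, but the function and column handling are my assumptions):

```python
import pandas as pd

def label_drop_signals(close: pd.Series, lookahead: int = 10,
                       drop_pct: float = -0.03) -> pd.Series:
    """Label bars after which price falls more than |drop_pct|
    within the next `lookahead` bars (current bar excluded)."""
    # Minimum close over the next `lookahead` bars, via a reversed rolling window
    future_min = close[::-1].rolling(lookahead, min_periods=1).min()[::-1].shift(-1)
    future_return = future_min / close - 1.0
    return (future_return <= drop_pct).astype(int)

# Toy 15-minute closes: a >3% drop follows the first two bars
prices = pd.Series([100.0, 100.0, 96.0, 100.0, 100.0, 100.0])
labels = label_drop_signals(prices)  # [1, 1, 0, 0, 0, 0]
```

Bars at the very end of the series get no label signal, since their lookahead window is incomplete.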

I've also normalized the drops, since a 3% drop on Bitcoin has a different magnitude than the same drop on Dogecoin. So I chose a drop threshold of -2 expressed as a z-score: drop_zscore = drop_pct / volatility. In other words, a qualifying drop is twice the token's typical volatility (based on standard deviation).
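
A sketch of that normalization (the rolling window of 96 bars, i.e. one day of 15-minute candles, is my assumption - the post does not state which window was used):

```python
import numpy as np
import pandas as pd

def drop_zscore(close: pd.Series, vol_window: int = 96) -> pd.Series:
    """Scale per-bar returns by rolling volatility so one threshold
    is comparable across tokens with very different volatility."""
    returns = close.pct_change()
    volatility = returns.rolling(vol_window).std()
    return returns / volatility

# Synthetic price path for illustration
rng = np.random.default_rng(0)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 200)))
z = drop_zscore(close)
is_drop = z < -2.0  # a move 2x the token's typical volatility
```

The first `vol_window` values are NaN until the rolling window fills, so those bars never trigger a signal.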
Then came a feature engineering process based on indicator momentum, volatility, and price differences. After data preparation, I trained XGBoost with parameters from Grok's recommendation:

**Recommended hyperparameters:**
- `max_depth`: 3-7 (prevents memorizing noise)
- `learning_rate`: 0.01-0.1 (smaller = better with more trees)
- `n_estimators`: 200-500 (with early stopping)
- `subsample` / `colsample_bytree`: 0.6-0.9 (prevents overfitting)
- `scale_pos_weight`: 3-10 (handles class imbalance)


The model performed very similarly on the train and test sets:

```
============================================================
TRAIN SET PERFORMANCE
============================================================
ROC-AUC Score: 0.6899

Classification Report:
              precision    recall  f1-score   support

   No Signal       0.93      0.62      0.74   3149036
      Signal       0.19      0.66      0.29    426220

    accuracy                           0.62   3575256
   macro avg       0.56      0.64      0.52   3575256
weighted avg       0.84      0.62      0.69   3575256

Confusion Matrix:
[[1938267 1210769]
 [ 144995  281225]]

============================================================
TEST SET PERFORMANCE (Unseen Data)
============================================================
ROC-AUC Score: 0.6761

Classification Report:
...
Train AUC: 0.6899
Test AUC:  0.6761
Difference: 0.0138
✓ Good generalization - minimal overfitting
```


Basic returns turned out to be the most important features for drop prediction. Yet there are far too many false positives, which would hurt a real portfolio.

*Confusion Matrix by Feature Space*

So I thought: maybe the right set of position parameters could save this signal and make it usable? I proceeded with the backtesting notebook. I loaded the model, built a backtesting trading simulation environment, and defined a set of position parameters: take-profit (TP), stop-loss (SL), entry delay, and cooldown. I tried a grid-search optimization approach - testing 900 parameter combinations to find the best one algorithmically. It took three hours on my local machine, and yet… every single scenario resulted in a 100% loss. The process failed miserably.
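
The grid search can be sketched roughly as follows. This is a toy short-side simulation under my own assumptions (enter short on a predicted drop, exit on TP/SL), not the post's actual backtester, and the grid values are illustrative rather than the real 900-combination grid:

```python
import numpy as np
from itertools import product

def run_backtest(prices, signals, tp, sl, delay, cooldown):
    """Toy short simulation: enter `delay` bars after a signal, exit on
    take-profit/stop-loss, then skip `cooldown` bars. Returns total return."""
    equity, i, n = 1.0, 0, len(prices)
    while i < n:
        if signals[i]:
            entry_i = i + delay
            if entry_i >= n:
                break
            entry = prices[entry_i]
            j = entry_i + 1
            while j < n:
                pnl = (entry - prices[j]) / entry  # short-position P&L
                if pnl >= tp or pnl <= -sl:
                    break
                j += 1
            pnl = (entry - prices[min(j, n - 1)]) / entry
            equity *= 1.0 + pnl
            i = j + cooldown
        else:
            i += 1
    return equity - 1.0

# Hypothetical parameter grid (81 combos; the post's run covered ~900)
grid = list(product([0.01, 0.02, 0.03],   # take-profit
                    [0.01, 0.02, 0.03],   # stop-loss
                    [0, 1, 2],            # entry delay, in bars
                    [4, 8, 16]))          # cooldown, in bars

# One combination on toy data: a correct signal before a 5% drop
prices = np.array([100.0, 100.0, 95.0, 95.0, 95.0])
signals = np.array([1, 0, 0, 0, 0])
demo = run_backtest(prices, signals, tp=0.03, sl=0.02, delay=0, cooldown=4)
```

Ranking all combinations is then just `max` over `run_backtest` results for each tuple in `grid`.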

It was nice working on this step by step with Cursor + Sonnet 4.5. I read a lot about XGBoost while building this, so simply telling the assistant what needed to be done and why, and watching it create neat notebooks that worked out of the box or after one or two debug-fix iterations, felt almost seamless. Working with Jupyter notebooks in Cursor is not convenient, though - after changes are applied in Agent mode, the notebook has to be closed, reopened, and rerun manually. So I ended up using Ask mode and pasting the code blocks in manually.
