## The Problem with Raw Predictions
Imagine a model that's 55% accurate. That means 45% of its signals are wrong. If you follow every signal, you're taking a lot of bad trades alongside the good ones.
What if there were a way to know which predictions are likely to be correct — before the trade happens?
That's meta-labeling.
## What Is Meta-Labeling?
Meta-labeling is a two-stage prediction framework popularised by Marcos Lopez de Prado in *Advances in Financial Machine Learning*. The concept is simple:
- **Stage 1 — Primary model:** Predicts the direction (bullish or bearish). This is our XGBoost model.
- **Stage 2 — Meta model:** Takes the primary model's prediction and asks: "Is this specific prediction likely to be correct?"
The meta model doesn't predict direction — it predicts the quality of the primary prediction. It outputs a confidence score. If the meta-confidence is below our threshold, we withhold the signal.
## Think of It Like a Quality Filter
- Primary model: "I think BTC will go up"
- Meta model: "I'm 72% confident that prediction is correct" → Signal passes
vs.
- Primary model: "I think ETH will go down"
- Meta model: "I'm only 48% confident that prediction is correct" → Signal withheld
The result: fewer signals, but higher quality.
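As a sketch, the filter itself is just a threshold on the meta model's confidence. The function name below is illustrative (not from our codebase); 0.60 is the default threshold from our experiments.

```python
import numpy as np

def gate_signals(directions, meta_confidence, threshold=0.60):
    """Pass a primary signal only when the meta model's confidence
    that the prediction is correct clears the threshold.

    directions:      +1 (long) / -1 (short) primary predictions
    meta_confidence: meta model's P(primary prediction is correct)
    Returns 0 where a signal is withheld.
    """
    directions = np.asarray(directions)
    meta_confidence = np.asarray(meta_confidence)
    return np.where(meta_confidence >= threshold, directions, 0)

# The two examples above: 0.72 passes, 0.48 is withheld.
print(gate_signals([+1, -1], [0.72, 0.48]))  # → [1 0]
```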
## How We Train the Meta Model
The training process uses a 70/30 split within each walk-forward window:
- Split the training data: 70% for the primary model, 30% for the meta model
- Train the primary model on the 70% portion
- Get primary predictions on the held-out 30%
- Create meta-labels: For each prediction, label it 1 (correct) or 0 (incorrect)
- Train the meta model on these correctness labels, using the original features plus the primary model's confidence as inputs
The meta model is a separate XGBoost classifier, more heavily regularised than the primary (shallower trees, stronger penalty terms) to avoid overfitting.
Critically, the meta model sees features the primary model doesn't optimise for. The primary model optimises for direction prediction. The meta model optimises for when the primary is right. These are different problems.
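The meta-label construction (steps 3–5 above) can be sketched in a few lines. `make_meta_labels` is a hypothetical helper; in our pipeline both the primary and meta models are XGBoost classifiers, which this sketch omits.

```python
import numpy as np

def make_meta_labels(primary_pred, primary_prob, y_true, X_meta):
    """Build the meta model's training set from the held-out 30% slice.

    primary_pred: primary model's directional calls on the meta slice
    primary_prob: primary model's confidence in each call
    y_true:       realised directions
    X_meta:       original features for the meta slice
    Returns (features + primary confidence, correctness labels).
    """
    # Label 1 where the primary call matched reality, 0 where it didn't.
    meta_y = (np.asarray(primary_pred) == np.asarray(y_true)).astype(int)
    # Append the primary model's confidence as an extra feature column.
    meta_X = np.column_stack([X_meta, primary_prob])
    return meta_X, meta_y

# Toy example: two features, three held-out samples.
X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
meta_X, meta_y = make_meta_labels([1, 0, 1], [0.7, 0.55, 0.62], [1, 1, 1], X)
print(meta_y)        # → [1 0 1]
print(meta_X.shape)  # → (3, 3)
```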
## Our Experiment Results
We tested meta-labeling across 10 cryptocurrencies at three confidence thresholds:
### 1-Hour Timeframe
| Threshold | Accuracy | Coverage | Improvement |
|---|---|---|---|
| No filter (baseline) | 54.9% | 100% | — |
| Meta ≥ 0.55 | 56.0% | 53% | +1.1% |
| Meta ≥ 0.60 | 56.3% | 44% | +1.4% |
| Meta ≥ 0.65 | 56.1% | 38% | +1.2% |
### 4-Hour Timeframe
| Threshold | Accuracy | Coverage | Improvement |
|---|---|---|---|
| No filter | 52.3% | 100% | — |
| Meta ≥ 0.55 | 52.5% | 48% | +0.2% |
| Meta ≥ 0.60 | 52.2% | 41% | -0.1% |
| Meta ≥ 0.65 | 52.8% | 36% | +0.5% |
### Daily Timeframe
| Threshold | Accuracy | Coverage | Improvement |
|---|---|---|---|
| No filter | 51.2% | 100% | — |
| Meta ≥ 0.55 | 54.4% | 52% | +3.2% |
| Meta ≥ 0.60 | 54.7% | 43% | +3.5% |
| Meta ≥ 0.65 | 55.7% | 39% | +4.5% |
The daily timeframe showed the strongest meta-labeling effect — a +4.5% improvement is substantial.
## The Accuracy vs Coverage Tradeoff
This is the core tension in meta-labeling. Higher thresholds mean:
- Higher accuracy on signals that pass
- Lower coverage (fewer signals generated)
At threshold 0.65 on the daily timeframe, you only get signals ~39% of the time. The other 61% of periods, the meta model says "I'm not confident enough" and no signal is generated.
Is this a problem? It depends on your perspective:
- For active traders who want constant signals: Yes, reduced coverage is frustrating
- For quality-focused traders who prefer fewer, better trades: Meta-labeling is exactly what you want
- For automated systems (like our trading bot): Fewer but higher-quality signals actually improve risk-adjusted returns
We chose threshold 0.60 as the default — it gives the best accuracy-to-coverage balance on the hourly timeframe where most of our signals are generated.
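The tradeoff tables above come down to two quantities per threshold: accuracy on the signals that pass, and the fraction of periods that produce a signal at all. A minimal sketch (function name illustrative):

```python
import numpy as np

def accuracy_coverage(meta_confidence, correct, thresholds=(0.55, 0.60, 0.65)):
    """For each threshold, return (accuracy on passed signals, coverage).

    correct: 1 where the primary prediction was right, else 0.
    """
    meta_confidence = np.asarray(meta_confidence)
    correct = np.asarray(correct, dtype=float)
    results = {}
    for t in thresholds:
        passed = meta_confidence >= t          # which signals clear the bar
        coverage = passed.mean()               # fraction of periods with a signal
        accuracy = correct[passed].mean() if passed.any() else float("nan")
        results[t] = (accuracy, coverage)
    return results

# Toy data: in this example, raising the threshold trims coverage and lifts accuracy.
stats = accuracy_coverage([0.70, 0.50, 0.62, 0.58], [1, 0, 1, 0])
for t, (acc, cov) in stats.items():
    print(f"{t:.2f}: accuracy={acc:.2f}, coverage={cov:.2f}")
```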
## Per-Coin Results
Meta-labeling doesn't help every coin equally:
| Coin | Baseline | With Meta | Improvement |
|---|---|---|---|
| AUCTION | 54.8% | 63.9% | +9.1% |
| BTC (1d) | 50.6% | 66.1% | +15.5% |
| ETH (1d) | 55.0% | 62.7% | +7.7% |
| SOL | 53.7% | 54.9% | +1.2% |
| ETH (1h) | 55.6% | 55.2% | -0.4% |
| HIVE (1d) | 51.0% | 49.8% | -1.2% |
Some observations:
- BTC daily showed the largest improvement (+15.5%), though this is on a small test set (60 samples per fold)
- AUCTION was consistently the most improved by meta-labeling across timeframes
- ETH on 1h actually got slightly worse — the meta model occasionally filtered out predictions that would have been correct
- HIVE daily was slightly negative, suggesting the meta model doesn't generalise well for all low-cap altcoins
## What the Meta Model Learns
The meta model's feature importance reveals what it's actually learning:
- Primary model confidence (the probability the primary assigns to its prediction) is the single most important feature — unsurprisingly, more confident predictions are more likely to be correct
- Volatility indicators (ATR, Bollinger Width) rank high — the primary model is less accurate during high-volatility periods
- Trend indicators (EMA alignment, MACD) — predictions during clear trends are more reliable
- Volume — higher volume periods produce more reliable predictions
In other words, the meta model learns to trust the primary model more when:
- The primary is highly confident
- Volatility is moderate (not extreme)
- There's a clear trend (not choppy/ranging)
- Volume confirms the move
## Implementation Considerations
If you're building a similar system:
**Train/meta split matters.** We use 70/30 within each walk-forward window. Too little meta-training data (e.g., 90/10) makes the meta model unreliable. Too much (e.g., 50/50) starves the primary model.
**The meta model should be more regularised.** We use shallower trees (depth 5 vs 8) and higher regularisation. The meta model sees fewer samples and has an easier classification task.
**Include primary confidence as a meta feature.** This is the single most important feature for the meta model. Without it, meta-labeling performance drops significantly.
**Walk-forward prevents leakage.** The meta model must only be trained on data the primary model hasn't seen. Our 70/30 split within each walk-forward window ensures this.
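Putting the split and leakage points together, the window bookkeeping can be sketched as below. The function name and the window sizes in the example are illustrative, not our production values.

```python
def walk_forward_splits(n_samples, train_size, test_size, meta_frac=0.30):
    """Yield (primary_idx, meta_idx, test_idx) for each rolling window.

    Within each window the first 70% of training rows fit the primary
    model; the last 30% are held out to generate meta-labels, so the
    meta model is never trained on rows the primary model was fit on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        cut = int(round(train_size * (1 - meta_frac)))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train[:cut], train[cut:], test
        start += test_size  # roll the window forward by one test period

# 100 samples, 60-row training windows, 20-row test windows.
for primary, meta, test in walk_forward_splits(100, 60, 20):
    print(len(primary), len(meta), test[0], test[-1])
# → 42 18 60 79
# → 42 18 80 99
```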
## Key Takeaways
- Meta-labeling improves accuracy by up to ~4.5 percentage points, depending on timeframe and threshold (the 4-hour timeframe saw almost no benefit)
- Coverage drops to roughly 40-45% at the default 0.60 threshold — you trade less often but with higher quality
- Daily timeframe benefits most (+4.5% at threshold 0.65)
- Not all coins benefit equally — works best on BTC, ETH, and mid-cap tokens
- The accuracy-coverage tradeoff is real — you need to decide what matters more for your strategy
## Part of Our Research Series
- 13,500 Model Fits Later: What Actually Works — Overview
- Why We Chose XGBoost Over LSTM — Model comparison
- How Macro Indicators Predict Crypto Prices — Macro features
- This post — Meta-labeling and signal quality
Full methodology: How Our AI Works
AI trading signals are probabilistic predictions, not financial advice. Meta-labeling improves signal quality but does not eliminate risk. Past performance does not guarantee future results.
Originally published at Nydar. Nydar is a free trading platform with AI-powered signals and analysis.
