## The Problem with Raw Predictions
Imagine a model that's 55% accurate. That means 45% of its signals are wrong. If you follow every signal, you're taking a lot of bad trades alongside the good ones.
What if there were a way to know which predictions are likely to be correct — before the trade happens?
That's meta-labeling.
## What Is Meta-Labeling?
Meta-labeling is a two-stage prediction framework popularised by Marcos Lopez de Prado in *Advances in Financial Machine Learning*. The concept is simple:
- **Stage 1 — Primary model:** Predicts the direction (bullish or bearish). This is our XGBoost model.
- **Stage 2 — Meta model:** Takes the primary model's prediction and asks: "Is this specific prediction likely to be correct?"
The meta model doesn't predict direction — it predicts the quality of the primary prediction. It outputs a confidence score. If the meta-confidence is below our threshold, we withhold the signal.
## Think of It Like a Quality Filter
- Primary model: "I think BTC will go up"
- Meta model: "I'm 72% confident that prediction is correct" → Signal passes
vs.
- Primary model: "I think ETH will go down"
- Meta model: "I'm only 48% confident that prediction is correct" → Signal withheld
The result: fewer signals, but higher quality.
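As a sketch, the filter itself is just a threshold on the meta model's confidence. The function name below is illustrative (not from our codebase); 0.60 is the default threshold from our experiments.

```python
import numpy as np

def gate_signals(directions, meta_confidence, threshold=0.60):
    """Pass a primary signal only when the meta model's confidence
    that the prediction is correct clears the threshold.

    directions:      +1 (long) / -1 (short) primary predictions
    meta_confidence: meta model's P(primary prediction is correct)
    Returns 0 where a signal is withheld.
    """
    directions = np.asarray(directions)
    meta_confidence = np.asarray(meta_confidence)
    return np.where(meta_confidence >= threshold, directions, 0)

# The two examples above: 0.72 passes, 0.48 is withheld.
print(gate_signals([+1, -1], [0.72, 0.48]))  # → [1 0]
```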
## How We Train the Meta Model
The training process uses a 70/30 split within each walk-forward window:
- Split the training data: 70% for the primary model, 30% for the meta model
- Train the primary model on the 70% portion
- Get primary predictions on the held-out 30%
- Create meta-labels: For each prediction, label it 1 (correct) or 0 (incorrect)
- Train the meta model on these correctness labels, using the original features plus the primary model's confidence as inputs
The meta model is a separate XGBoost classifier, more heavily regularised than the primary (shallower trees, stronger penalty terms) to avoid overfitting.
Critically, the meta model sees features the primary model doesn't optimise for. The primary model optimises for direction prediction. The meta model optimises for when the primary is right. These are different problems.
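The meta-label construction (steps 3–5 above) can be sketched in a few lines. `make_meta_labels` is a hypothetical helper; in our pipeline both the primary and meta models are XGBoost classifiers, which this sketch omits.

```python
import numpy as np

def make_meta_labels(primary_pred, primary_prob, y_true, X_meta):
    """Build the meta model's training set from the held-out 30% slice.

    primary_pred: primary model's directional calls on the meta slice
    primary_prob: primary model's confidence in each call
    y_true:       realised directions
    X_meta:       original features for the meta slice
    Returns (features + primary confidence, correctness labels).
    """
    # Label 1 where the primary call matched reality, 0 where it didn't.
    meta_y = (np.asarray(primary_pred) == np.asarray(y_true)).astype(int)
    # Append the primary model's confidence as an extra feature column.
    meta_X = np.column_stack([X_meta, primary_prob])
    return meta_X, meta_y

# Toy example: two features, three held-out samples.
X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
meta_X, meta_y = make_meta_labels([1, 0, 1], [0.7, 0.55, 0.62], [1, 1, 1], X)
print(meta_y)        # → [1 0 1]
print(meta_X.shape)  # → (3, 3)
```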
## Our Experiment Results
We tested meta-labeling across 10 cryptocurrencies at three confidence thresholds:
### 1-Hour Timeframe
| Threshold | Accuracy | Coverage | Improvement |
|---|---|---|---|
| No filter (baseline) | 54.9% | 100% | — |
| Meta ≥ 0.55 | 56.0% | 53% | +1.1% |
| Meta ≥ 0.60 | 56.3% | 44% | +1.4% |
| Meta ≥ 0.65 | 56.1% | 38% | +1.2% |
### 4-Hour Timeframe
| Threshold | Accuracy | Coverage | Improvement |
|---|---|---|---|
| No filter | 52.3% | 100% | — |
| Meta ≥ 0.55 | 52.5% | 48% | +0.2% |
| Meta ≥ 0.60 | 52.2% | 41% | -0.1% |
| Meta ≥ 0.65 | 52.8% | 36% | +0.5% |
### Daily Timeframe
| Threshold | Accuracy | Coverage | Improvement |
|---|---|---|---|
| No filter | 51.2% | 100% | — |
| Meta ≥ 0.55 | 54.4% | 52% | +3.2% |
| Meta ≥ 0.60 | 54.7% | 43% | +3.5% |
| Meta ≥ 0.65 | 55.7% | 39% | +4.5% |
The daily timeframe showed the strongest meta-labeling effect — a +4.5% improvement is substantial.
## The Accuracy vs Coverage Tradeoff
This is the core tension in meta-labeling. Higher thresholds mean:
- Higher accuracy on signals that pass
- Lower coverage (fewer signals generated)
At threshold 0.65 on the daily timeframe, you only get signals ~39% of the time. The other 61% of periods, the meta model says "I'm not confident enough" and no signal is generated.
Is this a problem? It depends on your perspective:
- For active traders who want constant signals: Yes, reduced coverage is frustrating
- For quality-focused traders who prefer fewer, better trades: Meta-labeling is exactly what you want
- For automated systems (like our trading bot): Fewer but higher-quality signals actually improve risk-adjusted returns
We chose threshold 0.60 as the default — it gives the best accuracy-to-coverage balance on the hourly timeframe where most of our signals are generated.
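The tradeoff tables above come down to two quantities per threshold: accuracy on the signals that pass, and the fraction of periods that produce a signal at all. A minimal sketch (function name illustrative):

```python
import numpy as np

def accuracy_coverage(meta_confidence, correct, thresholds=(0.55, 0.60, 0.65)):
    """For each threshold, return (accuracy on passed signals, coverage).

    correct: 1 where the primary prediction was right, else 0.
    """
    meta_confidence = np.asarray(meta_confidence)
    correct = np.asarray(correct, dtype=float)
    results = {}
    for t in thresholds:
        passed = meta_confidence >= t          # which signals clear the bar
        coverage = passed.mean()               # fraction of periods with a signal
        accuracy = correct[passed].mean() if passed.any() else float("nan")
        results[t] = (accuracy, coverage)
    return results

# Toy data: in this example, raising the threshold trims coverage and lifts accuracy.
stats = accuracy_coverage([0.70, 0.50, 0.62, 0.58], [1, 0, 1, 0])
for t, (acc, cov) in stats.items():
    print(f"{t:.2f}: accuracy={acc:.2f}, coverage={cov:.2f}")
```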
## Per-Coin Results
Meta-labeling doesn't help every coin equally:
| Coin | Baseline | With Meta | Improvement |
|---|---|---|---|
| AUCTION | 54.8% | 63.9% | +9.1% |
| BTC (1d) | 50.6% | 66.1% | +15.5% |
| ETH (1d) | 55.0% | 62.7% | +7.7% |
| SOL | 53.7% | 54.9% | +1.2% |
| ETH (1h) | 55.6% | 55.2% | -0.4% |
| HIVE (1d) | 51.0% | 49.8% | -1.2% |
Some observations:
- BTC daily showed the largest improvement (+15.5%), though this is on a small test set (60 samples per fold)
- AUCTION was consistently the most improved by meta-labeling across timeframes
- ETH on 1h actually got slightly worse — the meta model occasionally filtered out predictions that would have been correct
- HIVE daily was slightly negative, suggesting the meta model doesn't generalise well for all low-cap altcoins
## What the Meta Model Learns
The meta model's feature importance reveals what it's actually learning:
- Primary model confidence (the probability the primary assigns to its prediction) is the single most important feature — unsurprisingly, more confident predictions are more likely to be correct
- Volatility indicators (ATR, Bollinger Width) rank high — the primary model is less accurate during high-volatility periods
- Trend indicators (EMA alignment, MACD) — predictions during clear trends are more reliable
- Volume — higher volume periods produce more reliable predictions
In other words, the meta model learns to trust the primary model more when:
- The primary is highly confident
- Volatility is moderate (not extreme)
- There's a clear trend (not choppy/ranging)
- Volume confirms the move
## Implementation Considerations
If you're building a similar system:
**Train/meta split matters.** We use 70/30 within each walk-forward window. Too little meta-training data (e.g., 90/10) makes the meta model unreliable. Too much (e.g., 50/50) starves the primary model.
**The meta model should be more regularised.** We use shallower trees (depth 5 vs 8) and higher regularisation. The meta model sees fewer samples and has an easier classification task.
**Include primary confidence as a meta feature.** This is the single most important feature for the meta model. Without it, meta-labeling performance drops significantly.
**Walk-forward prevents leakage.** The meta model must only be trained on data the primary model hasn't seen. Our 70/30 split within each walk-forward window ensures this.
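Putting the split and leakage points together, the window bookkeeping can be sketched as below. The function name and the window sizes in the example are illustrative, not our production values.

```python
def walk_forward_splits(n_samples, train_size, test_size, meta_frac=0.30):
    """Yield (primary_idx, meta_idx, test_idx) for each rolling window.

    Within each window the first 70% of training rows fit the primary
    model; the last 30% are held out to generate meta-labels, so the
    meta model is never trained on rows the primary model was fit on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        cut = int(round(train_size * (1 - meta_frac)))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train[:cut], train[cut:], test
        start += test_size  # roll the window forward by one test period

# 100 samples, 60-row training windows, 20-row test windows.
for primary, meta, test in walk_forward_splits(100, 60, 20):
    print(len(primary), len(meta), test[0], test[-1])
# → 42 18 60 79
# → 42 18 80 99
```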
## Key Takeaways
- Meta-labeling improves accuracy by up to ~4.5 percentage points, depending on timeframe and threshold (the 4-hour timeframe saw almost no benefit)
- Coverage drops to roughly 40-45% at the default 0.60 threshold — you trade less often but with higher quality
- Daily timeframe benefits most (+4.5% at threshold 0.65)
- Not all coins benefit equally — works best on BTC, ETH, and mid-cap tokens
- The accuracy-coverage tradeoff is real — you need to decide what matters more for your strategy
## Part of Our Research Series
- 13,500 Model Fits Later: What Actually Works — Overview
- Why We Chose XGBoost Over LSTM — Model comparison
- How Macro Indicators Predict Crypto Prices — Macro features
- This post — Meta-labeling and signal quality
Full methodology: How Our AI Works
AI trading signals are probabilistic predictions, not financial advice. Meta-labeling improves signal quality but does not eliminate risk. Past performance does not guarantee future results.
Originally published at Nydar. Nydar is a free trading platform with AI-powered signals and analysis.
