Disclaimer: This article is for educational purposes only. It is not investment advice. Crypto trading carries significant risk. Do your own research and comply with your local regulations.
I was about to launch a paid crypto signal service. Three strategies, Telegram delivery, the whole thing.
I killed it. Here's why.
The Plan
I'd been writing publicly about a "3-strategy consensus" bot for months. The idea was simple. Pick three strategies with solid individual Sharpe ratios, fire a signal only when at least two agree, push that to a paid Telegram channel. Clean, conservative, explainable.
The three strategies:
| Strategy | Sharpe | What it does |
|---|---|---|
| EMA Crossover (12/26) | 1.30 | Trend following |
| Parabolic SAR | 1.25 | Trend reversal |
| MACD (12/26/9) | 1.17 | Momentum |
All three numbers came from three years of BTC/USDT daily backtests. Paid launch was scheduled. Infrastructure was ready. I was maybe a week away from flipping a switch.
The First Red Flag
From April 1 to April 8, I watched the free channel to sanity-check the signals. Three pairs (BTC, ETH, SOL) times eight days equals 24 opportunities.
Buy: 0. Sell: 0. Hold: 24.
Eight straight days of nothing. Not a single signal across three pairs.
I assumed quiet markets. Checked the code. It was working as designed. Each strategy only fires on the exact day a crossover happens, and requiring two strategies to cross on the same day turns out to be statistically rare — roughly once or twice per quarter. Not a market issue. A design issue.
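The rarity is easy to reproduce in a toy simulation. Here's a minimal sketch on synthetic random-walk prices — not the production bot's code, and with a slower EMA pair standing in for Parabolic SAR (a full SAR implementation is out of scope), so the exact counts are illustrative only:

```python
import numpy as np
import pandas as pd

def ema(s: pd.Series, span: int) -> pd.Series:
    return s.ewm(span=span, adjust=False).mean()

def crossover_days(fast: pd.Series, slow: pd.Series) -> pd.Series:
    # True only on the exact bar where fast crosses slow, in either direction
    above = fast > slow
    return above != above.shift(fill_value=above.iloc[0])

# Synthetic daily closes: a geometric random walk, ~3 years of bars
rng = np.random.default_rng(42)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, 1100))))

ema_cross = crossover_days(ema(close, 12), ema(close, 26))
macd_line = ema(close, 12) - ema(close, 26)
macd_cross = crossover_days(macd_line, ema(macd_line, 9))
slow_cross = crossover_days(ema(close, 20), ema(close, 50))  # hypothetical SAR stand-in

crosses = pd.DataFrame({"ema": ema_cross, "macd": macd_cross, "slow": slow_cross})
print("crossover days per indicator:", crosses.sum().to_dict())
print("days with >= 2 crossing together:", int((crosses.sum(axis=1) >= 2).sum()))
```

Each indicator crosses dozens of times over ~1,100 bars, but requiring two of them to cross on the *same* bar collapses the count to a handful — which is exactly the eight-day silence I was staring at.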
That was the moment I stopped and asked a harder question.
The Question I Should Have Asked Earlier
I had individual backtests for each of the three strategies. I had never actually backtested the consensus rule itself. "Two strategies agreeing" is a new composite strategy, and I had no out-of-sample evidence that it worked.
So I ran Walk-Forward Optimization.
Walk-Forward Setup
Rolling windows, 365-day in-sample, 90-day out-of-sample, 90-day step. Three pairs (BTC/USDT, ETH/USDT, SOL/USDT). Three candidate strategies:
- Current consensus — fire when `buy_count >= 2`
- Variant A: EMA only — drop the consensus, use the best individual strategy
- Variant B: Sharpe-weighted — weight each strategy by its historical Sharpe, threshold ±1.0
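For concreteness, here's my reading of the Variant B rule as a sketch — the signal encoding (+1 buy, -1 sell, 0 hold) and function name are mine, and the real engine's implementation may differ:

```python
# Historical Sharpe ratios from the individual backtests above
SHARPE = {"ema": 1.30, "sar": 1.25, "macd": 1.17}

def sharpe_weighted_signal(signals: dict, sharpe: dict, threshold: float = 1.0) -> str:
    """signals maps strategy name -> +1 (buy), -1 (sell), or 0 (hold)."""
    score = sum(sharpe[k] * signals[k] for k in signals)
    if score >= threshold:
        return "BUY"
    if score <= -threshold:
        return "SELL"
    return "HOLD"

# EMA and MACD agree on buy: score = 1.30 + 1.17 = 2.47
print(sharpe_weighted_signal({"ema": +1, "sar": 0, "macd": +1}, SHARPE))
```

Note a side effect of the ±1.0 threshold: every individual Sharpe exceeds 1.0, so any *single* strategy's vote clears the bar on its own. That's consistent with Variant B trading far more often (37–42 trades per pair in the results table) than the two-of-three consensus.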
Data: Binance daily candles, January 2023 to March 2026 — roughly 1,100 bars per pair.
For significance I used the Deflated Sharpe Ratio (Bailey & López de Prado, 2014). DSR corrects for the selection bias that happens when you try multiple strategies and pick the best one. My threshold was DSR ≥ 0.95.
Here's the WFO config, in case it's useful:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WfoConfig:
    in_sample_days: int = 365
    out_of_sample_days: int = 90
    rolling_step_days: int = 90
    data_start: str = "2023-01-01"
    data_end: str = "2026-03-21"
```
Pretty standard. Nothing clever.
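Given that config, the rolling windows can be generated like this — a sketch, not my actual engine, with the dataclass repeated so the snippet runs standalone:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class WfoConfig:
    in_sample_days: int = 365
    out_of_sample_days: int = 90
    rolling_step_days: int = 90
    data_start: str = "2023-01-01"
    data_end: str = "2026-03-21"

def wfo_windows(cfg: WfoConfig):
    """Yield (is_start, is_end, oos_end) triples for each rolling window."""
    start = date.fromisoformat(cfg.data_start)
    end = date.fromisoformat(cfg.data_end)
    while True:
        is_end = start + timedelta(days=cfg.in_sample_days)
        oos_end = is_end + timedelta(days=cfg.out_of_sample_days)
        if oos_end > end:
            break  # next window would run past the available data
        yield start, is_end, oos_end
        start += timedelta(days=cfg.rolling_step_days)

for window in wfo_windows(WfoConfig()):
    print(window)
```

With these defaults the 2023-01-01 to 2026-03-21 span yields nine windows per pair, each fitted on 365 days and scored on the following 90.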
The Results
I'm going to show all nine cells. No cherry-picking.
| Strategy | Pair | OOS Sharpe | Trades | DSR | IS→OOS Decay | Verdict |
|---|---|---|---|---|---|---|
| Current consensus | BTC | -0.812 | 14 | 0.005 | 2.05x | FAIL |
| Current consensus | ETH | -1.736 | 11 | 0.000 | 1.89x | FAIL |
| Current consensus | SOL | -1.974 | 10 | 0.000 | 3.12x | FAIL |
| Variant A: EMA | BTC | -10.409 | 18 | 0.000 | 11.38x | FAIL-FAIL |
| Variant A: EMA | ETH | -0.029 | 15 | 0.064 | 0.40x | FAIL |
| Variant A: EMA | SOL | -2.157 | 16 | 0.000 | 2.91x | FAIL |
| Variant B: Sharpe-weighted | BTC | 0.066 | 42 | 0.082 | 0.69x | MARGINAL |
| Variant B: Sharpe-weighted | ETH | -1.599 | 38 | 0.000 | 1.92x | FAIL |
| Variant B: Sharpe-weighted | SOL | 0.612 | 37 | 0.279 | 0.57x | MARGINAL |
Seven out of nine cells have negative out-of-sample Sharpe. The two positive cells don't clear the DSR significance threshold. Zero cells pass.
The worst cell is Variant A EMA on BTC: OOS Sharpe of -10.409 with an IS→OOS decay of 11.38x. That's the textbook shape of "the model memorized the in-sample period and the out-of-sample period walked directly into the opposite wall."
The Gate Review
I have a tool called gate-reviewer that runs a 4-Pitfall / 3-Gate framework against strategy results. I ran this one through it. The 3-Gate summary:
Gate 1 (Explainability): FAIL
"Multiple indicators agreeing = signal" is not empirically justified.
There is no structural reason the edge should exist.
Gate 2 (Tail Safety): FAIL
Max drawdown threshold is 15%. Estimated DD across variants: -25% to -70%.
No strategy-level circuit breaker.
Gate 3 (OOS Reproducibility): FAIL
Sharpe min > 0: 7 out of 9 cells negative.
Trade count per window: 10-42. Sample too small.
TOTAL VERDICT: DISCARD
All three gates fail. The verdict is blunt: discard the strategy.
Why This Failed (Honest Post-Mortem)
Four things, in rough order of blame.
1. The data window was a bull market.
2023 through early 2026 was a structurally bullish period for BTC. All three strategies are trend-following. You'd expect them to look fine in-sample. The second the walk-forward window shifts to even a modestly choppier period, the model collapses. That's the signature of regime dependence.
2. The three strategies were not independent.
EMA, SAR, and MACD are all moving-average / momentum family. They see similar information. "Three strategies agreeing" sounded like three independent votes. It was closer to the same vote counted three times. There was no real ensemble diversity.
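You can see the lack of diversity directly by comparing the indicators' directional votes bar by bar. A sketch on a deliberately trivial trending series (again with an EMA pair as a hypothetical SAR stand-in):

```python
import numpy as np
import pandas as pd

def ema(s: pd.Series, span: int) -> pd.Series:
    return s.ewm(span=span, adjust=False).mean()

# A steadily rising price series: the trivial case, but it makes the point
close = pd.Series(np.linspace(100, 200, 300))

# Directional state of each indicator: +1 long, -1 short, 0 undecided
ema_state = np.sign(ema(close, 12) - ema(close, 26))
macd_line = ema(close, 12) - ema(close, 26)
macd_state = np.sign(macd_line - ema(macd_line, 9))
slow_state = np.sign(ema(close, 20) - ema(close, 50))  # hypothetical SAR stand-in

states = pd.DataFrame({"ema": ema_state, "macd": macd_state, "slow": slow_state})
agreement = (states.nunique(axis=1) == 1).mean()
print(f"bars where all three cast the same vote: {agreement:.0%}")
```

In a trend, all three vote identically on essentially every bar. Whatever diversity exists lives only in how they disagree during chop — and since all of them are built from the same moving averages of the same closes, there isn't much of that either.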
3. The design fires only on crossover instants.
Signals only trigger on the exact day a crossover happens, so the bot is constantly hunting for trend starting points. When the starting point turns out to be a reversal, the bot takes the wrong side. Inside a sustained trend, it fires nothing.
4. I ignored my own multiple-testing warning.
I wrote a whole article about the Deflated Sharpe Ratio and how trying N strategies and picking the best one inflates the apparent edge. Then I built a signal service that tries three strategies and picks the best-looking combination. I did not apply DSR correction to my own design. I wrote the warning label and then ate the poison.
That last one is the one that actually bothers me.
What I'm Doing Instead
Three options were on the table:
- A. Rebuild strategies, rerun Phase 0.
- B. Kill the signal service. Sell the verification process instead.
- C. Shut the whole thing down.
I ran this through an internal AI council I use for hard calls (three personas judging independently). Unanimous for B.
My own reasoning landed in the same place. Option A would be me rerunning the same multiple-testing loop I just failed, which is exactly the trap DSR exists to warn you about. Option C wastes a working backtest engine, a Telegram audience, and a 13-article public trail. Option B converts the Phase 0 failure from a dead end into the most honest piece of content I've ever written.
So from now on, I'm publishing:
- A weekly "verification journal" on the free Telegram channel — every new strategy I test, pass or fail, with the numbers attached.
- Technical deep-dives on Walk-Forward Optimization and DSR.
- A paid report ("Phase 0: The Discard Record") covering the full WFO logs, all gate-reviewer outputs, and every reason the strategies failed. Coming soon.
Selling process instead of signals. Process is the thing I actually have.
What You Can Take From This
If you're building a signal service or any paid quantitative product, three takeaways.
Walk-forward your composite rules, not just your components. A consensus of good strategies is not automatically a good strategy. The composition has its own degrees of freedom and needs its own out-of-sample test.
Use DSR even for small N. Three strategies is already enough for selection bias to inflate your apparent Sharpe. Apply the correction. It takes ten lines of Python.
Ship the failure. A public "here's the strategy I was going to sell and here's why I didn't" is worth more than another confident-sounding bot tutorial. There are a lot of the second kind. There are almost zero of the first.
I'll take "honest" over "impressive" every time now. Especially after this.
Links
- GitHub @maymay5692 — backtest engine and WFO analysis scripts
- Telegram Free Channel — weekly verification journal going forward
- MEXC Signup — affiliate link, supports my research if you use it
Reference
- Bailey, D. H., & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting and Non-Normality. Journal of Portfolio Management.