Mike Czerwinski

Posted on Jun 28

I Mined 2,505 Traders. The Only Edge Was What Not to Do.

#ai #crypto #datascience #trading

I pointed a pipeline at Binance copy traders to see whether the best published track records survived contact with an outside verifier.

The promise of copy-trading is simple. Someone posts a verified track record. 105% ROI. Sharpe 1.92. Max drawdown 6.6%. You click follow, their fills mirror into your account, and their edge becomes yours. The platform computes the numbers. They may be entirely real.

I pulled 2,505 lead traders and 11,390 round-trip fills, then ran the whole thing through a validation harness. I found nine candidate edges. Eight turned out to be beta wearing a costume. The ninth survived only as a negative signal: not what to buy, but what to avoid.

This is the same thing I keep writing about in this series, one level deeper. A track record is the actor auditing itself. The interesting question is never the number the actor reports. It is what happens when you hand that number to a verifier the actor does not control.

The filter: 2,505 down to 2

Ranking traders by Sharpe alone is garbage. The top of that list is dead micro-accounts: Sharpe 3.41 on 0.6% ROI, $781 of capital, one copier. Statistical noise dressed as skill.

So I built a quality gate. Sharpe at least 1.2, ROI at least 8%, AUM at least $50k, at least 20 copiers, max drawdown under 30%. Out of 2,505 traders, two survived:

x1Boost: Sharpe 1.92, ROI 105%, max drawdown 6.6%, $114k AUM, 373 days, 300 copiers. The best track record on the board.
A 49-day hot streak: Sharpe 6.65, ROI 25.5%, but only 49 days of history. Too small a sample to mean anything.
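The gate above can be sketched as a plain predicate. This is a minimal illustration, not the pipeline I actually ran: the Trader fields and the passes_gate name are made up for the example, and the micro-account's drawdown is a placeholder because that figure is not quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trader:
    name: str
    sharpe: float
    roi_pct: float
    aum_usd: float
    copiers: int
    max_dd_pct: float


def passes_gate(t: Trader) -> bool:
    """Thresholds as stated in the post; every condition must hold."""
    return (
        t.sharpe >= 1.2
        and t.roi_pct >= 8.0
        and t.aum_usd >= 50_000
        and t.copiers >= 20
        and t.max_dd_pct < 30.0
    )


traders = [
    # Figures quoted in the post.
    Trader("x1Boost", 1.92, 105.0, 114_000, 300, 6.6),
    # Sharpe, ROI, capital and copier count are from the post;
    # the 1.0% drawdown is a placeholder, not a reported value.
    Trader("micro-account", 3.41, 0.6, 781, 1, 1.0),
]

survivors = [t.name for t in traders if passes_gate(t)]
print(survivors)
```

The micro-account tops a Sharpe-only ranking and still fails three of the five conditions. A minimum-history threshold would be a sixth condition; the gate as described does not have one, which is why the 49-day account got through.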

So really, one survivor with a long enough record to trust. A single trader out of 2,505 whose published numbers cleared a sane quality bar.

The numbers are not fake. x1Boost actually returned 105%. The displayed numbers can be real and still fail as evidence of a copyable edge. That gap is the whole post.

Candidate one: the winners are dip-buyers. Until they aren't.

I reconstructed every round-trip with FIFO accounting, joined the fills against minute-level price data, and tagged each entry by regime: was the trader buying below the 1-day moving average (a dip) or chasing price above it (a pump)?
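A rough sketch of the regime tag, under assumptions I should name: the moving average here is a trailing mean over the bars before the entry bar (so the entry price does not leak into its own average), and a price exactly on the average is counted as a chase. On minute data a 1-day window would be 1,440 bars; the demo uses 4 so it fits on screen. Function names are illustrative.

```python
from collections import deque


def trailing_mean(prices, window):
    """Mean of the previous `window` bars for each bar; None until enough history."""
    out = []
    buf = deque()
    total = 0.0
    for p in prices:
        # Average is taken before the current bar is added.
        out.append(total / window if len(buf) == window else None)
        buf.append(p)
        total += p
        if len(buf) > window:
            total -= buf.popleft()
    return out


def tag_entry(price, ma):
    """'dip' below the moving average, 'chase' at or above it."""
    if ma is None:
        return "unknown"
    return "dip" if price < ma else "chase"


# Toy closes; a real run would use window=1440 on minute bars.
closes = [100, 100, 100, 100, 98, 103]
ma = trailing_mean(closes, window=4)
tags = [tag_entry(p, m) for p, m in zip(closes, ma)]
print(tags)
```

Whether the average should include the entry bar is a judgment call. Excluding it is the conservative choice, since a large move on the entry bar would otherwise drag the average toward the entry price.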

The pattern was clean and beautiful.

Winners buy dips. AI-cypto-Rebalance: 89% win rate, profit factor 3.40, trades only BTC/ETH/XRP/SOL, enters 517 of 637 times below the 1-day MA. Mean reversion. Another trader, 100% BTC, 99.7% maker orders, never chases, enters 306 of 390 times on the dip.

Losers chase pumps. CryptoArabiaUAE: 91 entries above the MA in a single bull push, 0% win rate, minus $4,977, which was the entire loss on the account. Another chaser entered 107 of 107 times in an uptrend, 15% win rate, bought tops, never sold.

There it is, I thought. Buy below the MA on majors, never chase strength. A real lesson, written in 11,390 fills of other people's money.

So I tested it.

The exogenous test, and the flip

I took the lesson off the lead-trader data and asked a different question. Does "buy the dip below the 1-day MA, fade the chase above it" work as a rule on my own 12 majors, on minute data the lead traders never touched?

That is what I mean by an exogenous test: data the trader did not create and could not curate.

In-sample window: February to mid-March. Out-of-sample: mid-March to end of April. Quintiles frozen on the in-sample window so the out-of-sample data cannot leak backward. Forward returns measured at 60, 240, and 1,440 minutes. Ten basis points round-trip cost.
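Freezing the quintiles is the step that is easiest to get subtly wrong, so here is a hedged sketch of the idea. It assumes the signal is each observation's distance from its 1-day MA and that one round-trip cost is charged per observation; the data is invented and the function names are mine.

```python
from bisect import bisect_right
from statistics import mean, quantiles

ROUND_TRIP_COST = 0.0010  # 10 basis points, as stated in the post


def freeze_edges(in_sample_signal):
    """Four cut points that split the in-sample signal into quintiles."""
    return quantiles(in_sample_signal, n=5)


def bucket(value, edges):
    """Quintile index 0 (lowest) to 4 (highest), using the frozen edges."""
    return bisect_right(edges, value)


def mean_net_return_by_bucket(signal, fwd_returns, edges):
    """Average forward return per quintile after subtracting the cost."""
    groups = {q: [] for q in range(5)}
    for s, r in zip(signal, fwd_returns):
        groups[bucket(s, edges)].append(r - ROUND_TRIP_COST)
    return {q: (mean(rs) if rs else None) for q, rs in groups.items()}


# Illustrative numbers only, not data from the study.
in_signal = [-0.04, -0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05]
oos_signal = [-0.06, 0.0, 0.07]
oos_fwd = [0.004, 0.001, -0.012]

edges = freeze_edges(in_signal)  # computed once, on in-sample data only
oos_table = mean_net_return_by_bucket(oos_signal, oos_fwd, edges)
print(edges)
print(oos_table)
```

The edges are computed once and never touched again. Out-of-sample values beyond the in-sample range simply fall into the outermost buckets, which is the point: the later window is not allowed to redefine what "extended" means.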

The thing I was hunting for was an illusion detector: does the sign of the edge stay stable when you cross from in-sample to out-of-sample?

It did not.

At the 240-minute horizon, the dip-minus-chase spread went from in-sample minus 0.082% to out-of-sample plus 0.043%. The sign flipped. The edge did not weaken. It reversed.
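The illusion detector itself is almost embarrassingly small. A sketch, with illustrative function names, using the two spread values reported above for the 240-minute horizon:

```python
from statistics import mean


def dip_minus_chase(fwd_returns_by_tag):
    """Mean forward return of dip entries minus that of chase entries."""
    return mean(fwd_returns_by_tag["dip"]) - mean(fwd_returns_by_tag["chase"])


def sign(x):
    return (x > 0) - (x < 0)


def sign_stable(in_sample, out_of_sample):
    """True only if both spreads are non-zero and point the same way."""
    s = sign(in_sample)
    return s != 0 and s == sign(out_of_sample)


# Spreads reported in the post, in percent.
in_sample_spread = -0.082
out_of_sample_spread = 0.043
stable = sign_stable(in_sample_spread, out_of_sample_spread)
print(stable)
```

The check deliberately ignores magnitude. A spread that shrinks is a weaker edge; a spread that changes sign is a different claim about the world.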

I did not trust that, so I built it a second way. I cloned the two archetypes as independent agents and ran them on my own shadow flow, not on the lead fills. A dip-buyer clone of AI-cypto-Rebalance. A trend-rider clone of x1Boost. Then I walk-forward tested them: train through March 1, evaluate on the bearish March to May window.

The sign reversed again. On the full bull-heavy data, the trend-rider made plus 13.9% and the dip-buyer lost 8.3%. Walk-forward into the bear window: dip-buyer now better at minus 4.9%, trend-rider now worse at minus 13.7%.

Two independent verifiers, neither of which the original track record controlled, both delivered the same verdict.

Which archetype wins depends entirely on the regime of the window you happened to look at. Bull rewards the trend-rider. Bear rewards the dip-buyer. There is no stable edge underneath. There is a coin, and the window decides which way it lands.

Why the win rate lies, mechanically

Here is the part that should bother you, because it is not about bad faith. It is about how the number is built.

The 100% BTC maker had a 77.2% win rate. That sounds elite. It is an artifact of FIFO accounting plus survivorship. When you account for round-trips first-in-first-out, every small sale taken from an old cheap lot books as a win, because the old lot was bought lower. A trader who buys and holds BTC and occasionally trims will show a gorgeous win rate by construction, regardless of whether the strategy has any edge. The losses sit in the open positions FIFO has not closed yet.
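To make the mechanism concrete, here is a toy long-only FIFO matcher run on hypothetical fills (not any real trader's data). The trader buys three lots at rising prices, the market falls, and they trim three times on the way down. The same trims are also scored against the average entry price for contrast.

```python
from collections import deque


def fifo_round_trips(fills):
    """fills: (side, qty, price) tuples. Returns (closed_pnls, open_lots).

    Long-only sketch: sells beyond current inventory are ignored.
    """
    lots = deque()  # each lot is [qty_remaining, entry_price]
    closed = []
    for side, qty, price in fills:
        if side == "buy":
            lots.append([qty, price])
            continue
        remaining = qty
        while remaining > 0 and lots:
            lot = lots[0]  # oldest lot is matched first
            matched = min(remaining, lot[0])
            closed.append((price - lot[1]) * matched)
            lot[0] -= matched
            remaining -= matched
            if lot[0] == 0:
                lots.popleft()
    return closed, [tuple(lot) for lot in lots]


# Hypothetical fills: accumulate on the way up, trim on the way down.
fills = [
    ("buy", 10, 100), ("buy", 10, 200), ("buy", 10, 220),
    ("sell", 1, 150), ("sell", 1, 140), ("sell", 1, 130),
]
closed, open_lots = fifo_round_trips(fills)

win_rate = sum(p > 0 for p in closed) / len(closed)
realized = sum(closed)
last_price = 130
unrealized = sum(q * (last_price - px) for q, px in open_lots)

# Same trims scored against the average entry price instead of the oldest lot.
buys = [(q, px) for side, q, px in fills if side == "buy"]
avg_cost = sum(q * px for q, px in buys) / sum(q for q, _ in buys)
avg_cost_pnls = [(px - avg_cost) * q for side, q, px in fills if side == "sell"]

print(win_rate, realized, unrealized, avg_cost_pnls)
```

Every trim is matched against the cheap first lot, so all three book as wins while the two expensive lots sit open and underwater. Score the same three sells against the average entry price and every one of them is a loss. Nothing about the trading changed, only the accounting question.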

The actor did not lie. The measurement method inflated the number on the actor's behalf.

This is exactly the failure I wrote about in the signal-funnel teardown. A published win rate is the actor auditing itself, and the audit method quietly does the actor a favor. You cannot fix that by demanding the actor be more honest. The number was honestly computed. It is the computation that flatters.

Nine candidates, one body

I did not stop at two archetypes. I chased every angle the data offered.

The nine candidate edges were:

dip-buying majors
trend-riding momentum
BTC maker trimming
asymmetric runner management
pre-surge accumulation
BTC flush-and-bounce baskets
slow open-interest squeezes
panic accumulation context
extended-major avoidance

Eight of them died the same way. They looked like alpha in the window where I found them. They turned into beta plus risk management once a verifier I did not control got hold of them.

x1Boost's 105% was beta on a bull market plus a tight stop. The dip-buyer's 89% win rate was beta on BTC plus patience plus FIFO flattery. The asymmetric runner, a fast cut on losers and a long leash on winners, was the closest thing to real, and even that was one no-stop-loss bag away from becoming the chaser.

Nine candidates, one body underneath, wearing different costumes for different windows.

The phrase that ended up repeated four times in my own research notes was "the same death." Every promising edge died the same way.

The only thing that survived was a warning

Out of 2,505 traders, 11,390 fills, and nine candidate edges, exactly one edge survived out-of-sample with a stable sign.

It was negative.

The top quintile of "extended" majors, price stretched well above the 1-day MA, had forward returns that stayed reliably negative across the window boundary: in-sample minus 0.29% to out-of-sample minus 1.20%, 30.6% win rate.

The one thing that transferred was: do not chase majors that are already extended.

That is the whole yield. Not a strategy. A warning. From all that data, the only durable knowledge was about what to avoid, never about what to pursue.

I think that asymmetry is the actual law here, and it is not specific to trading.

Exogenous verification is very good at killing false positives and almost useless at minting true ones. The edges that survive an honest, actor-independent test tend to be prohibitions, because a prohibition only has to be robust in one direction. A positive edge has to survive every regime you did not test.

That is why eight "buy this" candidates flipped and the single "do not buy this" held. The verifier was never going to hand me a strategy. The most it could ever do was take bad ones away.

The track record is not the signal. The track record is the actor's self-report, and a self-report cannot be its own verification no matter how real the numbers are. The only honest question is what happens when you hand the number to something the actor does not control.

When I did that, eight times, the answer was the same. The ninth only told me what to avoid.

A track record can prove that someone made money. It does not prove that their edge survives being copied.

Top comments (3)

Cartone • Jun 28

Your analysis is really interesting, and it effectively exposes some typical behaviors in crypto trading that are well established but misleading. Let me tell you this: at the beginning of my project, when I still understood almost nothing, my CEO claude.ai designed the bots with a FIFO system (I didn't know what that was and I trusted it, but why did it choose that?). At some point, when in paper mode it looked like I could become a millionaire, I asked a simple question: why are we calculating everything in FIFO, when Binance (our target for testnet and mainnet) uses avg_cost? All hell broke loose, complete restructuring of the entire system and magically I was no longer making millions😂
As of today, still in testnet, we disclose everything, including the total unrealized of open positions, and not just the positive sells (which we do have).
As for the strategy, we're building a system that looks for what to invest in but at the same time has a brake based on regime. In keeping with your idea that the audit must be external, I wouldn't mind having you do a direct audit on my project when I go live 😁
translated by claude

Mike Czerwinski • Jun 29

The FIFO-to-avg_cost moment is the whole post in one anecdote. Paper mode wasn't lying to you, the accounting was answering a different question than the one you needed, and it flattered you right up until you changed the question. Same death as the 77.2% win rate in the piece: a real number, built by a method that quietly decided what counted.

Disclosing total unrealized including open positions instead of only the closed winners is the honest version of that fix. You stopped letting the measurement pick the side it grades itself on.

On the live audit, happy to take a look. One catch to make it worth doing: a friendly second read of your numbers won't find anything, because if it agrees it's just your own model nodding back at you. The version that bites is running the strategy against a split you didn't choose, walk-forward into regimes the backtest never saw. Bring me the part you can't tune, and we'll see if the edge survives leaving your hands.

Cartone • Jun 29

thank you so much for your willingness, I'll definitely take you up on it. give me the time to sort out a few things and go live with the code frozen and I'll pass you the public repo directly 😬
translated by claude