Asaduzzaman Pavel

Originally published at iampavel.dev

When Exchanges Lie: Outlier Detection Across 150+ Crypto Data Sources

A few years ago I was working on a global market data platform. The job was straightforward on paper: integrate as many cryptocurrency exchanges as possible, aggregate their price and volume data, serve it reliably. We got to around 150 exchanges. That's when things got interesting.

I expected noise. APIs go down, timestamps drift, small exchanges have thin order books. That's just how it is. What I did not expect was how many exchanges were not just noisy, but actively wrong in ways that were hard to call accidental.

Here's what I ran into.

The Web Traffic Problem

The first signal that something was off had nothing to do with price data. It was Alexa rankings.

If you're not familiar, Alexa was a service that ranked websites by how much traffic they received. Imperfect, but external, independent, and hard to game without actual users. We started cross-referencing exchange-reported volume against Alexa rank as a basic sanity check.

The pattern was immediate. Exchanges claiming hundreds of millions in daily volume sometimes had Alexa ranks in the millions, putting them somewhere between a niche hobbyist blog and a small regional news site. Real exchanges with real users have real web traffic. Fake volume does not come with fake visitors.

How I handled it: Computed a volume-to-traffic ratio for every exchange and compared it against the median ratio across all exchanges. Anything beyond a few standard deviations from that median got a reduced trust weight. Not a hard ban, just enough drag that a lying exchange could not move the aggregate on its own.
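
A rough sketch of that weighting in Python. The input shape, the sigma cutoff, and the soft-penalty curve are illustrative assumptions, not the production logic:

```python
import statistics

def trust_weights(exchanges, sigma_cutoff=3.0):
    """Downweight exchanges whose volume-to-traffic ratio is an outlier.

    `exchanges` maps name -> (reported_daily_volume_usd, est_daily_visits).
    The data shape and cutoff are illustrative assumptions.
    """
    ratios = {
        name: volume / max(visits, 1)  # guard against zero-traffic estimates
        for name, (volume, visits) in exchanges.items()
    }
    center = statistics.median(ratios.values())
    spread = statistics.pstdev(ratios.values()) or 1.0  # avoid divide-by-zero
    weights = {}
    for name, ratio in ratios.items():
        z = abs(ratio - center) / spread
        # Soft penalty rather than a hard ban: far-out ratios lose influence,
        # but a lying exchange can't easily confirm it has been silenced.
        weights[name] = 1.0 if z <= sigma_cutoff else sigma_cutoff / z
    return weights
```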

It became one of our most reliable filters. Not because it was precise, but because it was orthogonal. An exchange can manipulate its own API. It cannot manufacture a credible web presence overnight.

Alexa shut down in May 2022. If I were building this today I'd use SimilarWeb for the same purpose. Same principle: use external signals the exchange doesn't control.

Stale Data Served Fresh

Some exchanges would return data with a current timestamp but the underlying numbers hadn't changed in minutes. Sometimes longer.

This one could be a bug. Caching misconfiguration, a stuck worker, a failed background job. But it happened too consistently on too many exchanges to feel entirely accidental. At the very least, it was a bug they had no interest in fixing.

How I handled it: Each time I polled an exchange, I hashed the price and volume payload. If the hash matched the previous several responses but the timestamp had changed, I marked it stale. After enough consecutive stale responses, I pulled that exchange out of the live feed entirely until fresh data came through. Simple, cheap, and it worked. Real markets move. If your data is not moving, it's not real.
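
A minimal sketch of that check, assuming a JSON ticker payload with a server-supplied timestamp field; the threshold of five consecutive repeats is illustrative:

```python
import hashlib
import json

class StalenessGuard:
    """Track per-exchange payload hashes; flag feeds whose data repeats
    while the timestamp keeps moving. Payload shape is an assumption."""

    def __init__(self, max_stale=5):
        self.max_stale = max_stale
        self.last_hash = {}
        self.stale_count = {}

    def is_stale(self, exchange, payload):
        # Hash only the market data, not the server-supplied timestamp.
        body = {k: v for k, v in payload.items() if k != "timestamp"}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if digest == self.last_hash.get(exchange):
            self.stale_count[exchange] = self.stale_count.get(exchange, 0) + 1
        else:
            self.stale_count[exchange] = 0
        self.last_hash[exchange] = digest
        # True => pull this exchange from the live feed until data moves again.
        return self.stale_count[exchange] >= self.max_stale
```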

Price Hallucination

This is different from a price that's slightly off because of liquidity differences. Every exchange has its own order book, so minor variance is expected and fine.

Price hallucination is when an exchange quotes a price that has no relationship to what's happening anywhere else. Not slightly off. Structurally wrong. BTC trading at $30,000 when every other exchange has it at $43,000, and it's been that way for hours.

These prices never corrected through arbitrage because there was no real liquidity behind them. Nobody was actually trading at those prices. The exchange was just publishing a number.

How I handled it: Median absolute deviation across all exchanges for the same trading pair. MAD is the right tool here because it is resistant to manipulation. To move the median you need to control the majority of your data sources, which is hard when you are pulling from 150 places. Anything more than three times the MAD from the median got excluded from that price calculation entirely. The threshold sounds arbitrary but in practice it was conservative enough to catch real hallucinations while leaving legitimate variance alone.
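
As a sketch, the MAD filter for a single trading pair looks something like this (the `k=3.0` default mirrors the three-MAD threshold above):

```python
import statistics

def filter_prices(prices, k=3.0):
    """Drop quotes more than k * MAD from the cross-exchange median.

    `prices` maps exchange name -> quoted price for one trading pair.
    """
    center = statistics.median(prices.values())
    mad = statistics.median(abs(p - center) for p in prices.values()) or 1e-9
    return {ex: p for ex, p in prices.items() if abs(p - center) <= k * mad}

# A hallucinated $30,000 BTC quote gets dropped; normal variance survives:
# filter_prices({"a": 43000, "b": 43120, "c": 42980, "d": 30000})
# -> {"a": 43000, "b": 43120, "c": 42980}
```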

Ghost Liquidity

Order books that look healthy until you try to use them.

On paper: $2M sitting on the bid, $2M on the ask, tight spread. Looks like a functioning market. In practice: the moment a real order touches that book, the liquidity vanishes. Bids and asks that were sitting there seconds ago are gone.

This one is particularly cynical because it's designed specifically to fool aggregators and ranking tools that look at order book depth as a quality signal. The order book is theater. It exists to look good in screenshots and API responses, not to fill trades.

How I handled it: Real order books fluctuate constantly. Prices shift, sizes change, levels appear and disappear. If an exchange's top order book levels had not changed in a meaningful window during active market hours, that was a flag. Combined this with a spread-to-volume ratio check. Deep books with suspiciously low spreads on low-volume pairs do not add up. Either signal alone could be a false positive. Together they are reliable.
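
A rough sketch of the combined check. The snapshot format, the sampling window, and every threshold here are illustrative assumptions:

```python
def looks_like_ghost_book(snapshots, daily_volume_usd,
                          max_distinct_states=3,
                          max_spread_bps=5,
                          min_volume_usd=100_000):
    """Combine the frozen-book and spread-to-volume heuristics.

    `snapshots` is a list of (best_bid, best_ask, bid_size, ask_size)
    tuples sampled over an active-hours window.
    """
    # 1. Real top-of-book levels churn; count distinct states in the window.
    frozen_book = len(set(snapshots)) <= max_distinct_states

    # 2. A tight spread backed by almost no traded volume doesn't add up.
    best_bid, best_ask, _, _ = snapshots[-1]
    mid = (best_bid + best_ask) / 2
    spread_bps = (best_ask - best_bid) / mid * 10_000
    implausible = spread_bps <= max_spread_bps and daily_volume_usd < min_volume_usd

    # Either signal alone is a weak flag; require both before acting.
    return frozen_book and implausible
```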

Crawler-Aware APIs

This was the one that genuinely impressed me, in a grim sort of way.

Some exchanges were serving different data depending on who was asking. Known data aggregator IP ranges got clean, reasonable-looking data. Other IPs got inflated numbers. The exchange had modeled the fact that aggregators exist, figured out how to identify them, and was gaming the system accordingly.

How I handled it: Rotating residential proxies for verification polling. Periodically I'd re-fetch the same data from a different IP and compare the responses. Persistent divergence above a threshold meant the exchange was blacklisted from aggregation entirely. Not downweighted. Gone. There's no good-faith explanation for an exchange that shows different prices to different clients. That's not a bug you fix, it's a policy you enforce.
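
A simplified sketch of the verification poll using the requests library. The proxy pool, the `last_price` field name, and the 1% tolerance are assumptions for illustration:

```python
import requests

def diverges_by_ip(ticker_url, proxy_pool, tolerance=0.01):
    """Re-fetch the same ticker through different exits and compare.

    `proxy_pool` holds proxy URLs (e.g. rotating residential exits).
    """
    prices = []
    for proxy in proxy_pool:
        resp = requests.get(ticker_url, proxies={"https": proxy}, timeout=10)
        resp.raise_for_status()
        prices.append(float(resp.json()["last_price"]))
    lo, hi = min(prices), max(prices)
    # Persistent divergence above tolerance => blacklist, not downweight.
    return (hi - lo) / hi > tolerance
```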

What This Actually Is

Outlier detection across many exchanges is not primarily a statistics problem. It is a trust problem. The standard approaches (Z-scores, IQR, median absolute deviation) are useful, but they assume your outliers are noise. Some of your outliers are lies, and lies require a different mindset.

Here's what actually worked:

  • Use external signals. Web traffic, app store rankings, social presence. Anything the exchange doesn't control and can't fake cheaply. SimilarWeb is the practical option today.

  • Weight by reputation over time. Exchanges that flag consistently get less weight. Build a scoring layer that updates as you collect data. Reputation should be earned and lost dynamically, not set once at integration time; a minimal sketch follows this list.

  • Consensus over mean. If 140 exchanges broadly agree and 10 do not, the 10 are your problem. The median is much harder to manipulate than the mean.

  • Watch for perfection. Real trading data is messy. If an exchange reports exactly $10,000,000 in volume, or produces price charts that look suspiciously smooth, that's a red flag. Markets do not round like that.

  • Treat divergence as intent. Noise is random. These patterns were consistent, directional, and self-serving. Once you start seeing that, you stop debugging and start filtering.
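
As promised above, a minimal sketch of a dynamic reputation layer. The exponential decay rate is an assumption; the point is just that recent behavior counts for more than old behavior:

```python
class ReputationTracker:
    """Exponentially decayed trust score per exchange.

    Each poll either passes the checks above or flags; the resulting
    weight feeds straight into the aggregation step.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.scores = {}

    def record(self, exchange, passed_checks):
        prev = self.scores.get(exchange, 1.0)
        evidence = 1.0 if passed_checks else 0.0
        # New evidence nudges the score; old behavior slowly fades out.
        self.scores[exchange] = self.decay * prev + (1 - self.decay) * evidence

    def weight(self, exchange):
        return self.scores.get(exchange, 1.0)
```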


Most of this was figured out the hard way. You integrate a hundred exchanges optimistically, you start noticing things that don't add up, you dig in, and eventually you build a picture of what the data ecosystem actually looks like versus what it claims to be.

You can't unsee it once you've seen it.
