FoxyyyBusiness

Posted on Apr 10

Seven crypto exchanges, one normalized schema, ~700 lines of Python

#python #api #crypto #opensource

This is a follow-up to my previous post on building a funding-rate arbitrage scanner. That post was about the product — what it does, the three non-obvious gotchas, and why I built it. This one is about the plumbing: how seven different exchange APIs handle the same data and what it took to unify them.

If you've ever thought "I'll just call the public APIs and join the data, how hard can it be" — this post is for you.

The dataset I wanted

For each USDT-margined perpetual on each major venue, I needed:

Current funding rate (per period, decimal — e.g. 0.0001 = 0.01%)
Funding interval in hours (8h, 4h, 1h depending on the venue and the symbol)
Mark price (for sizing calculations)
24h volume in USD (for liquidity filtering — without this, the scanner is useless)
Next funding time (UNIX seconds — for "this opportunity expires in X minutes" UI)

Sounds simple. It's not. Each exchange returns a subset of this in a different shape, and you usually need at least two API calls per exchange to assemble the full record. Here's how each one works.

Binance

Strategy: two bulk calls, plus a cached funding-interval lookup.

GET https://fapi.binance.com/fapi/v1/premiumIndex
# Returns ALL symbols with: lastFundingRate, markPrice, nextFundingTime
# One call, ~670 USDT-M symbols, instant.

GET https://fapi.binance.com/fapi/v1/ticker/24hr
# Returns ALL symbols with: quoteVolume (24h USDT volume)
# One call, same coverage, instant.

GET https://fapi.binance.com/fapi/v1/fundingInfo
# Returns the per-symbol fundingIntervalHours, but ONLY for symbols
# whose interval is non-default (4h, 1h). Default 8h symbols are omitted.
# Cache once, refresh weekly.

Binance is clean. The funding-interval endpoint returns only non-default symbols, which is mildly annoying (you have to assume 8h if a symbol is missing), but that's a minor wart.
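As a rough sketch, the join across the three Binance responses, including the 8h fallback, could look like this. The field names come from the endpoints listed above; the function name merge_binance, the sample payloads, and the FOOUSDT symbol are hypothetical, and the millisecond-to-second conversion assumes Binance's usual millisecond timestamps.

```python
DEFAULT_INTERVAL_HOURS = 8  # assumed whenever fundingInfo omits a symbol


def merge_binance(premium_index, ticker_24hr, funding_info):
    """Join the three decoded Binance responses into one row per symbol."""
    volume = {t["symbol"]: float(t["quoteVolume"]) for t in ticker_24hr}
    interval = {f["symbol"]: int(f["fundingIntervalHours"]) for f in funding_info}
    rows = []
    for p in premium_index:
        sym = p["symbol"]
        if not sym.endswith("USDT"):
            continue  # keep only symbols of the form <BASE>USDT
        rows.append({
            "exchange": "binance",
            "symbol": sym,
            "funding_rate": float(p["lastFundingRate"]),
            "funding_interval_hours": interval.get(sym, DEFAULT_INTERVAL_HOURS),
            "next_funding_time": int(p["nextFundingTime"]) // 1000,  # ms -> s
            "mark_price": float(p["markPrice"]),
            "volume_24h_usd": volume.get(sym, 0.0),
        })
    return rows


# Hypothetical, trimmed payloads for illustration only.
premium = [
    {"symbol": "BTCUSDT", "lastFundingRate": "0.0001",
     "markPrice": "65000.0", "nextFundingTime": 1775606400000},
    {"symbol": "FOOUSDT", "lastFundingRate": "-0.0005",
     "markPrice": "1.25", "nextFundingTime": 1775592000000},
]
tickers = [{"symbol": "BTCUSDT", "quoteVolume": "1234567.0"},
           {"symbol": "FOOUSDT", "quoteVolume": "8900.5"}]
info = [{"symbol": "FOOUSDT", "fundingIntervalHours": 4}]

rows = merge_binance(premium, tickers, info)
```

BTCUSDT is missing from the sample fundingInfo list, so it falls back to 8h, while FOOUSDT picks up its explicit 4h interval.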

Bybit

Strategy: bulk tickers + paginated instruments-info.

GET https://api.bybit.com/v5/market/tickers?category=linear
# Returns ALL linear perps with: fundingRate, markPrice, turnover24h (USD), nextFundingTime
# One call, ~544 symbols.

GET https://api.bybit.com/v5/market/instruments-info?category=linear&limit=1000&cursor=...
# Returns funding interval in MINUTES (not hours, not seconds). Convert.
# Paginated, but limit=1000 fits everything in 1 call usually.

Bybit is also clean once you know the units (minutes for fundingInterval, USD for turnover24h). The instruments-info endpoint is paginated by cursor, but in practice everything fits in one page with limit=1000.
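Even if one page is usually enough, following the cursor costs only a few lines. The sketch below assumes each decoded response carries a result object with a list of instruments and a nextPageCursor string that is empty on the last page; fetch_page is a hypothetical callable standing in for the HTTP request, so the loop can be exercised without a network.

```python
def iter_instruments(fetch_page):
    """Yield instruments across all pages.

    fetch_page(cursor) is a hypothetical callable that returns the decoded
    `result` object of one instruments-info response.
    """
    cursor = ""
    while True:
        result = fetch_page(cursor)
        yield from result["list"]
        cursor = result.get("nextPageCursor") or ""
        if not cursor:
            break


def interval_hours_by_symbol(instruments):
    """fundingInterval is in minutes; convert to hours."""
    return {i["symbol"]: int(i["fundingInterval"]) / 60 for i in instruments}


# Two fake pages standing in for real responses.
fake_pages = {
    "": {"list": [{"symbol": "BTCUSDT", "fundingInterval": 480}],
         "nextPageCursor": "page2"},
    "page2": {"list": [{"symbol": "ETHUSDT", "fundingInterval": 240}],
              "nextPageCursor": ""},
}
intervals = interval_hours_by_symbol(iter_instruments(fake_pages.__getitem__))
```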

OKX

Strategy: bulk tickers + per-instrument funding rate (parallelized).

GET https://www.okx.com/api/v5/public/instruments?instType=SWAP
# Returns the list of SWAP instruments. Filter by settleCcy=USDT.
# ~285 USDT-margined SWAPs.

GET https://www.okx.com/api/v5/market/tickers?instType=SWAP
# Returns volume + last price per instrument. One bulk call.

GET https://www.okx.com/api/v5/public/funding-rate?instId=BTC-USDT-SWAP
# Returns funding rate, fundingTime, nextFundingTime for ONE instrument.
# Yes, one. Per. Call. There is no bulk endpoint for current funding rate at OKX.
# Public rate limit: ~20 req/2s.

This is where it gets ugly. OKX has the most awkward API of the major venues. To get current funding rates for 285 symbols you have to make 285 calls, either sequentially or spread across parallel workers that respect the rate limit. 285 calls × ~0.3s each ≈ 85 seconds sequential; with 8 workers it takes about 8 seconds.

The funding interval is computed as nextFundingTime - fundingTime (both are millisecond timestamps), because OKX doesn't expose the interval as a field. That works as long as both fields are present, which they are 99.9% of the time.
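A minimal sketch of that derivation, assuming the two timestamps returned by the funding-rate endpoint above (fundingTime and nextFundingTime) arrive as millisecond strings. The function name okx_interval_hours is hypothetical, and returning None for the rare incomplete row is one possible policy, not necessarily the library's.

```python
MS_PER_HOUR = 3_600_000


def okx_interval_hours(row):
    """Derive the funding interval in hours from one funding-rate row.

    Returns None when either timestamp is missing, empty, or inconsistent.
    """
    try:
        start_ms = int(row["fundingTime"])
        end_ms = int(row["nextFundingTime"])
    except (KeyError, TypeError, ValueError):
        return None
    delta_ms = end_ms - start_ms
    if delta_ms <= 0:
        return None
    return round(delta_ms / MS_PER_HOUR)
```

Callers can then decide whether to skip a None row or fall back to a default interval.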

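import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

# One shared session so the workers reuse pooled connections.
session = requests.Session()

# _one(inst) fetches the funding rate for a single instrument;
# usdt_swaps is the filtered instrument list from the first call.
with ThreadPoolExecutor(max_workers=8) as ex:
    futures = [ex.submit(_one, inst) for inst in usdt_swaps]
    for f in as_completed(futures):
        ...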

In practice, 8 workers has stayed within the public rate limit (~20 req/2s, i.e. 10 req/s): 0 failures in production over the past day.
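If the worker count alone ever stops being enough, an explicit throttle is one option. The sketch below is not part of the library described here; it is a hypothetical MinIntervalLimiter that hands out evenly spaced time slots across threads, with the clock and sleep functions injectable so the spacing can be checked without waiting.

```python
import threading
import time


class MinIntervalLimiter:
    """Space calls at least 1/max_per_second apart across all threads."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._min_interval
        delay = slot - now
        if delay > 0:
            self._sleep(delay)
```

Each worker would call limiter.wait() immediately before its HTTP request; with max_per_second=10 the pool can never exceed 10 requests per second regardless of how many workers it has.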

Bitget

Strategy: two bulk calls. The cleanest of all.

GET https://api.bitget.com/api/v2/mix/market/tickers?productType=usdt-futures
# Returns ALL contracts: fundingRate, markPrice, usdtVolume, holdingAmount
# One call, ~537 symbols.

GET https://api.bitget.com/api/v2/mix/market/current-fund-rate?productType=usdt-futures
# Returns fundingRateInterval (hours, integer) per symbol.
# One call.

Bitget gets a gold star. Two calls, both bulk, fields named clearly, units obvious. If only every exchange were like this.

MEXC

Strategy: two bulk calls, parallel-friendly.

GET https://contract.mexc.com/api/v1/contract/funding_rate
# Returns ALL symbols with: fundingRate, collectCycle (hours), nextSettleTime
# One call, ~762 symbols.

GET https://contract.mexc.com/api/v1/contract/ticker
# Returns ALL symbols with: fairPrice, amount24 (USDT volume)
# One call.

MEXC's collectCycle is in hours (clean) and they expose every field on bulk endpoints. Symbol format is BTC_USDT instead of BTCUSDT, which requires a small normalization step. Otherwise, also a gold star.
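The normalization step is small enough to sketch for every venue in this post at once. The formats for MEXC (BTC_USDT), OKX (BTC-USDT-SWAP), and Hyperliquid (bare base) are the ones described here; treating Gate.io as underscore-separated and the remaining venues as plain BTCUSDT are assumptions, as are the function name and the exchange keys.

```python
def normalize_symbol(exchange, raw):
    """Map a venue-specific perp symbol to (<BASE>USDT, BASE)."""
    if exchange == "okx":                  # e.g. BTC-USDT-SWAP
        base = raw.split("-")[0]
    elif exchange in ("mexc", "gate"):     # e.g. BTC_USDT
        base = raw.split("_")[0]
    elif exchange == "hyperliquid":        # e.g. BTC (no quote suffix)
        base = raw
    else:                                  # e.g. BTCUSDT
        base = raw[:-4] if raw.endswith("USDT") else raw
    return base + "USDT", base
```

Returning the base alongside the normalized symbol keeps the cross-exchange join key and the display name in one place.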

Gate.io

Strategy: two bulk calls + an in-delisting filter.

GET https://api.gateio.ws/api/v4/futures/usdt/contracts
# Returns funding_interval (in SECONDS, divide by 3600), in_delisting flag, name.
# One call, ~642 symbols.

GET https://api.gateio.ws/api/v4/futures/usdt/tickers
# Returns funding_rate, mark_price, volume_24h_quote (USDT volume), contract.
# One call.

Gate.io is fine but mixes the most unit conventions across its two responses: funding_interval is in seconds, volume_24h_quote is in USDT, funding_rate is a decimal. The in_delisting flag is critical: without it you'll see violently spiking funding rates on coins that are about to disappear, which look like 5000% APY opportunities until you realize you can't trade them.
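Here is a hedged sketch of the Gate.io join with the delisting filter applied before anything else. The field names are the ones listed above; merge_gate and the sample payloads are hypothetical, and numeric fields are assumed to arrive as strings.

```python
def merge_gate(contracts, tickers):
    """Join contracts and tickers, dropping contracts flagged in_delisting."""
    live = {c["name"]: c for c in contracts if not c.get("in_delisting")}
    rows = []
    for t in tickers:
        contract = live.get(t["contract"])
        if contract is None:
            continue  # delisting (or unknown) contract: never publish it
        rows.append({
            "exchange": "gate",
            "symbol": t["contract"].replace("_", ""),
            "funding_rate": float(t["funding_rate"]),
            # funding_interval is in seconds
            "funding_interval_hours": int(contract["funding_interval"]) // 3600,
            "mark_price": float(t["mark_price"]),
            "volume_24h_usd": float(t["volume_24h_quote"]),
        })
    return rows


# Hypothetical, trimmed payloads for illustration only.
contracts = [
    {"name": "BTC_USDT", "funding_interval": 28800, "in_delisting": False},
    {"name": "DEAD_USDT", "funding_interval": 3600, "in_delisting": True},
]
tickers = [
    {"contract": "BTC_USDT", "funding_rate": "0.0001",
     "mark_price": "65000", "volume_24h_quote": "1000000"},
    {"contract": "DEAD_USDT", "funding_rate": "0.02",
     "mark_price": "0.001", "volume_24h_quote": "50"},
]
gate_rows = merge_gate(contracts, tickers)
```

The spiking DEAD_USDT row never reaches the output, so nothing downstream has to know the flag exists.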

Hyperliquid

Strategy: ONE bulk POST call. Yes, POST, not GET.

POST https://api.hyperliquid.xyz/info
Content-Type: application/json
{"type": "metaAndAssetCtxs"}

Returns a 2-element array:

[0].universe — list of perp assets with their name (just the base, e.g. BTC, no quote suffix)
[1] — parallel array of contexts with funding, markPx, dayNtlVlm (USD volume)

Single call, 229 symbols, instant. Funding interval is hardcoded 1h (Hyperliquid's entire venue is 1-hour funding).

The catch: it's a JSON-RPC-ish POST API with parallel arrays as the response shape. If you're used to REST GET endpoints, the first time you see this you'll waste 10 minutes wondering why your params={} doesn't work. POST + JSON body, that's the trick.
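A minimal sketch of both halves, the POST with a JSON body and the zip over the parallel arrays, using only the standard library. The function names are hypothetical, the numeric fields are assumed to arrive as strings, and the fixed 1h interval follows the description above.

```python
import json
import urllib.request

HL_URL = "https://api.hyperliquid.xyz/info"


def fetch_meta_and_ctxs(timeout=10):
    """POST the request type as a JSON body (query params will not work)."""
    body = json.dumps({"type": "metaAndAssetCtxs"}).encode("utf-8")
    req = urllib.request.Request(
        HL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def parse_meta_and_ctxs(payload):
    """Pair universe[i] with contexts[i]; the two arrays share an index."""
    meta, ctxs = payload
    rows = []
    for asset, ctx in zip(meta["universe"], ctxs):
        rows.append({
            "exchange": "hyperliquid",
            "symbol": asset["name"] + "USDT",
            "base": asset["name"],
            "funding_rate": float(ctx["funding"]),
            "funding_interval_hours": 1,  # venue-wide hourly funding
            "mark_price": float(ctx["markPx"]),
            "volume_24h_usd": float(ctx["dayNtlVlm"]),
        })
    return rows
```

Usage would be parse_meta_and_ctxs(fetch_meta_and_ctxs()); keeping the parser separate from the request makes it easy to test against a saved response.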

The unified output

After all that, every exchange's collector function returns the same dict shape:

{
    "exchange": "binance",
    "symbol": "BTCUSDT",            # normalized to <BASE>USDT
    "base": "BTC",
    "funding_rate": 0.00004,
    "funding_interval_hours": 8,
    "next_funding_time": 1775606400,
    "mark_price": 65000.0,
    "volume_24h_usd": 1234567.0,
    "fetched_at": 1775597628,
}

These get inserted via INSERT OR REPLACE into a single SQLite table. Because fetched_at is part of the composite primary key (exchange, symbol, fetched_at), every collector cycle is preserved as history; a row is only replaced if the same cycle is written twice.
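For illustration, a table and insert matching that description might look like the sketch below. The table name funding_rates, the column types, and the helper save_rows are assumptions; only the column names and the primary key come from the dict shape and description above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS funding_rates (
    exchange               TEXT    NOT NULL,
    symbol                 TEXT    NOT NULL,
    base                   TEXT    NOT NULL,
    funding_rate           REAL    NOT NULL,
    funding_interval_hours REAL    NOT NULL,
    next_funding_time      INTEGER,
    mark_price             REAL,
    volume_24h_usd         REAL,
    fetched_at             INTEGER NOT NULL,
    PRIMARY KEY (exchange, symbol, fetched_at)
)
"""

COLUMNS = (
    "exchange", "symbol", "base", "funding_rate", "funding_interval_hours",
    "next_funding_time", "mark_price", "volume_24h_usd", "fetched_at",
)


def save_rows(conn, rows):
    """Insert normalized dicts; re-writing the same cycle overwrites in place."""
    placeholders = ", ".join(":" + c for c in COLUMNS)
    sql = (
        f"INSERT OR REPLACE INTO funding_rates ({', '.join(COLUMNS)}) "
        f"VALUES ({placeholders})"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, rows)
```

Named placeholders let the collector dicts be passed straight through without reordering fields.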

7 exchanges, roughly 3,700 symbols, ~12 seconds total per cycle. Most of that is the OKX parallelized fetch (8s); the other six venues are sub-second each. They all run in series (parallelizing them too would be an easy win).
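Running the collectors concurrently could be as small as the sketch below. It assumes each collector is a zero-argument function returning a list of normalized dicts; run_collectors and the stub collectors are hypothetical, and one venue failing is recorded rather than allowed to abort the whole cycle.

```python
from concurrent.futures import ThreadPoolExecutor


def run_collectors(collectors):
    """Run each collector in its own thread; return (rows, errors)."""
    rows, errors = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(collectors))) as ex:
        futures = {name: ex.submit(fn) for name, fn in collectors.items()}
        for name, fut in futures.items():
            try:
                rows.extend(fut.result())
            except Exception as exc:  # keep the other venues' data
                errors[name] = exc
    return rows, errors


# Stub collectors standing in for the real fetch_* functions.
def fetch_ok():
    return [{"exchange": "stub", "symbol": "BTCUSDT"}]


def fetch_broken():
    raise RuntimeError("venue unavailable")


rows, errors = run_collectors({"stub": fetch_ok, "broken": fetch_broken})
```

With this shape the cycle time would be bounded by the slowest venue rather than the sum of all seven.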

What I learned

There is no industry standard for funding-rate data, even for the most-traded instrument class in crypto. Each exchange invents its own field names, units, endpoint shapes, and pagination conventions.
The "single bulk call returning everything" pattern is rare. Most exchanges require 2-3 calls to assemble the full record (rate + volume + interval). OKX requires N+2 calls.
Units matter more than you think. Hours, minutes, seconds, milliseconds, decimals, percentages — every exchange picks at least one unit that surprises you.
in_delisting flags exist on some venues and not others. Without them, you'll publish fake opportunities.
The 1000-multiplier coins (1000PEPE, 1000SHIB, 1000XEC) are a nightmare because their cross-exchange equivalence is non-trivial. I filter them out for v0.
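The exact filter used for the multiplier coins isn't shown here, so the following is only one plausible rule: treat any base that starts with 1000 (or a longer run of zeros) followed by a letter as a multiplier symbol. The function name and the regex are assumptions.

```python
import re

# "1" followed by three or more zeros, then a letter: 1000PEPE, 1000000MOG.
_MULTIPLIER_PREFIX = re.compile(r"^10{3,}(?=[A-Z])")


def is_multiplier_symbol(base):
    """True for bases like 1000PEPE; False for BTC or 1INCH."""
    return bool(_MULTIPLIER_PREFIX.match(base))


bases = ["BTC", "1000PEPE", "1000SHIB", "1INCH", "1000000MOG"]
kept = [b for b in bases if not is_multiplier_symbol(b)]
```

A prefix check like this will miss venue-specific spellings of the same idea, so it is a starting point rather than a complete equivalence map.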

The library is on GitHub at <TBD-after-push> under MIT. It's ~700 lines of Python, no async, no clever metaprogramming, no plugin system. Just seven fetch_* functions and one SQLite schema.

If you want to add an 8th exchange, the contract is documented in CONTRIBUTING.md. The first PR I'd love to merge is dYdX v4 — their REST API is cleaner than most CEXes and I just haven't written it yet.

Live scanner using this library: https://foxyyy.com/

API docs: https://foxyyy.com/docs