Building a Crypto Funding Rate Data Pipeline with Python

#python #crypto #webdev #data

I needed funding rate data across multiple exchanges in a single place. Nothing free gave me what I wanted: historical rates, cross-exchange comparisons, and annualised calculations. So I built it.

Here's how the data pipeline works, and what I learned along the way.

The Architecture

Exchange APIs (Bybit, Binance)
        |
    Scheduled collector (every 8h)
        |
    PostgreSQL + TimescaleDB
        |
    JSON snapshot generator (3x/day)
        |
    Static site (Astro on Cloudflare Pages)

The key design decision: don't serve live API queries to the frontend. Instead, generate JSON snapshots on a schedule and serve them as static files. This gives you:

Zero backend load from page views
Free hosting on Cloudflare Pages
Sub-50ms page loads globally
No API rate limits to worry about

Collecting the Data

Both Bybit and Binance expose funding rate history via their public REST APIs. No authentication required for reading rates.

The collector runs on a schedule, pulls the latest rates for each asset, and writes them to TimescaleDB (PostgreSQL with time-series extensions). Each row stores:

Asset (BTC, ETH, SOL, etc.)
Exchange
Funding rate (raw 8h)
Annualised rate
Timestamp

Annualising Rates

This is where most tools get it wrong. A raw funding rate of 0.01% per 8h doesn't mean 0.03% per day. Funding compounds:

annualised = ((1 + rate_8h) ** (3 * 365) - 1) * 100

The 3 * 365 comes from 3 settlements per day, 365 days per year. At 0.01% per 8h, that's ~11.6% annualised, not 10.95% (which is what simple multiplication gives you).

The difference matters when you're comparing a 0.03% rate against a 0.05% rate. Simple multiplication says the spread is 0.02%. Compound calculation says it's larger.

The Snapshot Generator

Every 8 hours, a scheduled job:

Queries the latest rates from the database
Calculates cross-exchange spreads
Generates JSON files per asset
Triggers a site rebuild on Cloudflare Pages

The static site reads these JSON files at build time using Astro's data layer. No runtime database queries.

What I'd Do Differently

Start with TimescaleDB from day one. I initially used plain PostgreSQL and migrated later. Time-series queries (rolling averages, period comparisons) are dramatically faster with hypertables.
Collect more frequently than you display. I collect every 8h but could go hourly. Having higher-resolution data lets you spot intraday patterns even if you only show 8h snapshots publicly.
Add WebSocket for liquidation data early. REST polling works for funding rates (they only change every 8h). Liquidation events are real-time — you need a persistent WebSocket connection.

The Result

FundingKai tracks 10 major assets across exchanges. The data pipeline runs autonomously — no manual intervention since launch.

If you're building something similar, the key insight is: separate collection from presentation. Collect into a proper database, serve via static snapshots. Your data pipeline and your frontend have completely different reliability and performance requirements.

The data is live at fundingkai.com. Built with Python, PostgreSQL/TimescaleDB, Astro, and Cloudflare Pages.