Introduction
In this project, I built a complete real-time cryptocurrency analytics system from the ground up, capable of ingesting, storing, and visualizing live crypto market data.
The system collects price and volume data from the Binance Exchange, streams it through Kafka (with Debezium CDC), stores it in Cassandra, and visualizes it live in Grafana.
This setup simulates a lightweight version of the kind of real-time infrastructure used by trading platforms, financial dashboards, and risk monitoring systems, emphasizing scalability, fault-tolerance, and live data analysis.
System Architecture Overview
Here’s the high-level flow of data through the pipeline:
Binance API → PostgreSQL → Debezium (CDC) → Kafka → Cassandra → Grafana
Components Breakdown
| Component | Technology | Function |
|---|---|---|
| Data Source | Binance API | Provides live cryptocurrency data |
| CDC Engine | Debezium | Captures real-time changes in PostgreSQL |
| Message Broker | Apache Kafka | Streams change events |
| Database | Apache Cassandra | Stores processed market data |
| Visualization | Grafana | Real-time dashboard and analytics |
Design Choice: No Sink Connector
In a typical Kafka-based architecture, a sink connector is used to automatically write streamed data into a destination like Cassandra.
However, in this project, I did not use a sink connector. Instead, I wrote a custom consumer script that listens to Kafka topics and manually inserts the data into Cassandra (a sketch of this consumer follows the list below).
Why I Chose This Approach
- Full Control Over Data Transformation: I could clean, transform, and enrich messages before writing them into Cassandra (for example, filtering noise, flattening JSON payloads, or converting timestamps).
- Better Error Handling and Debugging: I could log failed inserts, handle schema mismatches gracefully, and replay specific batches.
- Simpler to Debug During Development: For educational and experimental purposes, it was easier to see what was flowing through each stage of the pipeline.
- Future Extensibility: This approach lets me modify the consumer to perform real-time computations or raise alerts before persisting the data.
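To make this concrete, here is a minimal sketch of such a consumer. It assumes the kafka-python and cassandra-driver packages, a hypothetical Debezium topic named pg.public.latest_prices, and the latest_prices table defined later in this post; the topic and field names are illustrative rather than the project's exact ones.

```python
import json
from datetime import datetime, timezone

from kafka import KafkaConsumer          # kafka-python package
from cassandra.cluster import Cluster    # cassandra-driver package

consumer = KafkaConsumer(
    "pg.public.latest_prices",           # hypothetical Debezium CDC topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    # Tombstone records carry a None value, so guard before decoding.
    value_deserializer=lambda v: json.loads(v.decode("utf-8")) if v else None,
)

session = Cluster(["localhost"]).connect("binance_keyspace")
insert = session.prepare(
    "INSERT INTO latest_prices (symbol, price, updated_at) VALUES (?, ?, ?)"
)

for message in consumer:
    if message.value is None:
        continue
    # Debezium wraps each row change in an envelope; "after" holds the new row.
    payload = message.value.get("payload", message.value)
    row = payload.get("after")
    if row is None:                      # deletes carry no "after" image
        continue
    session.execute(insert, (
        row["symbol"],
        float(row["price"]),             # numerics may arrive as strings
        datetime.now(timezone.utc),
    ))
```

The transformation and error-handling logic described in the list above would slot in between unwrapping the envelope and executing the insert.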
How Data Flows Through the System
Binance WebSocket Feeds continuously push ticker, order book, and kline (candlestick) updates.
These updates are first written into PostgreSQL tables for structured storage.
*Producer inserting data into Postgres*
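As a rough illustration of this producer step, here is a minimal sketch assuming the websocket-client and psycopg2 packages and a latest_prices table already created in PostgreSQL; the stream, table, and credentials are placeholders, not the project's exact ones.

```python
import json

import psycopg2
import websocket  # websocket-client package

conn = psycopg2.connect(
    dbname="binance", user="postgres", password="postgres", host="localhost"
)
conn.autocommit = True

def on_message(ws, raw):
    tick = json.loads(raw)
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO latest_prices (symbol, price, updated_at) VALUES (%s, %s, now())",
            (tick["s"], float(tick["c"])),  # "s" = symbol, "c" = last price
        )

# Subscribe to Binance's miniTicker stream for BTCUSDT.
ws = websocket.WebSocketApp(
    "wss://stream.binance.com:9443/ws/btcusdt@miniTicker",
    on_message=on_message,
)
ws.run_forever()
```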
Debezium monitors PostgreSQL using Change Data Capture (CDC) and streams every row change into Kafka topics.
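The Debezium connector itself is typically registered through the Kafka Connect REST API (port 8083 by default). A hedged sketch of that one-time setup call follows; the hostnames, credentials, and table list are placeholders, and the topic.prefix key assumes Debezium 2.x.

```python
import requests

connector = {
    "name": "binance-postgres-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "binance",
        "plugin.name": "pgoutput",          # PostgreSQL logical decoding plugin
        "topic.prefix": "pg",               # topics become pg.<schema>.<table>
        "table.include.list": "public.latest_prices,public.klines",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
```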
The custom Kafka consumer (a Python script) reads from those topics and writes structured rows into Cassandra tables.
*Consumer streaming changes into Cassandra*
Finally, Grafana queries either the Flask API (via the Infinity plugin) or Cassandra directly to visualize the live data.
Database Design
Cassandra Keyspace: binance_keyspace
1. crypto_24h_stats
Stores 24-hour performance metrics for each asset.
```sql
CREATE TABLE crypto_24h_stats (
    symbol text PRIMARY KEY,
    price_change_percent double,
    last_price double,
    high_price double,
    low_price double,
    volume double,
    quote_volume double,
    updated_at timestamp
);
```
2. latest_prices
Holds the latest streaming prices for every cryptocurrency.
```sql
CREATE TABLE latest_prices (
    symbol text,
    price double,
    updated_at timestamp,
    PRIMARY KEY (symbol, updated_at)
) WITH CLUSTERING ORDER BY (updated_at DESC);
```
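The DESC clustering order is what makes "give me the newest price" cheap: the most recent row in each symbol's partition comes first. A quick read with the Python driver, assuming the session object from the consumer sketch above:

```python
# Newest price for one symbol; no sorting needed thanks to the clustering order.
row = session.execute(
    "SELECT price, updated_at FROM latest_prices WHERE symbol = %s LIMIT 1",
    ("BTCUSDT",),
).one()
print(row.price, row.updated_at)
```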
3. klines
Stores candlestick (OHLCV) data for financial charting.
```sql
CREATE TABLE klines (
    symbol text,
    open_time timestamp,
    open_price double,
    high_price double,
    low_price double,
    close_price double,
    volume double,
    number_of_trades int,
    PRIMARY KEY ((symbol), open_time)
);
```
4. order_book
Captures market depth snapshots, showing the distribution of buy (bids) and sell (asks) orders for each asset.
```sql
CREATE TABLE order_book (
    symbol text,
    bids text,
    asks text,
    updated_at timestamp,
    PRIMARY KEY ((symbol), updated_at)
) WITH CLUSTERING ORDER BY (updated_at DESC);
```
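Because bids and asks arrive as nested JSON arrays, which are hard to query natively in Cassandra (see the challenges section below), the consumer flattens them into JSON strings before insertion. A minimal sketch, assuming Binance's depth payload shape and the consumer's session object:

```python
import json
from datetime import datetime, timezone

def store_order_book(session, symbol, depth):
    """Flatten a depth snapshot ({"bids": [[price, qty], ...], "asks": [...]})."""
    session.execute(
        "INSERT INTO order_book (symbol, bids, asks, updated_at) VALUES (%s, %s, %s, %s)",
        (
            symbol,
            json.dumps(depth["bids"][:10]),  # keep the top 10 levels as a JSON string
            json.dumps(depth["asks"][:10]),
            datetime.now(timezone.utc),
        ),
    )
```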
5. recent_trades
Logs the most recent individual trades for each symbol, including their price, quantity, and timestamps.
```sql
CREATE TABLE recent_trades (
    symbol text,
    trade_id bigint,
    price double,
    quantity double,
    trade_time timestamp,
    is_buyer_maker boolean,
    PRIMARY KEY ((symbol), trade_id)
);
```
Grafana Integration
Grafana was connected in two ways:
- Cassandra Data Source Plugin – for direct database queries.
- Infinity Plugin (HTTP JSON) – for querying the Flask API endpoints I experimented with.
This dual approach allowed me to:
- Serve pre-aggregated data through the API as a backup connection.
- Query Cassandra directly for time-series or historical analysis.
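For the API path, here is a hedged sketch of what one such Flask endpoint can look like for the Infinity plugin; the route and response shape are illustrative, not the project's exact API.

```python
from flask import Flask, jsonify
from cassandra.cluster import Cluster

app = Flask(__name__)
session = Cluster(["localhost"]).connect("binance_keyspace")

@app.route("/api/top-volume")
def top_volume():
    # Cassandra cannot ORDER BY volume, so sort and limit in the API layer.
    rows = session.execute("SELECT symbol, volume FROM crypto_24h_stats")
    top = sorted(rows, key=lambda r: r.volume, reverse=True)[:10]
    return jsonify([{"symbol": r.symbol, "volume": r.volume} for r in top])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```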
Key Visualizations
1. Top 10 Most Traded Cryptocurrencies
Displays the highest trading volume across all assets over the last 24 hours.
Query:
```sql
SELECT symbol, volume FROM crypto_24h_stats;
```
Visualization: Bar Chart
Insight: Quickly identifies the most active markets. Cassandra cannot ORDER BY a non-clustering column such as volume, so the sorting and top-10 limit are applied on the Grafana side.
*Example output (Grafana screenshot).*
2. Trade Count Overview (ETHUSDT)
Shows the number of trades executed for ETHUSDT over time.
Query:
```sql
SELECT open_time, number_of_trades
FROM klines
WHERE symbol = 'ETHUSDT';
```
Visualization: Time Series / Line Chart
Insight: Useful for spotting spikes in market activity. (Since symbol is the partition key of klines, this query needs no ALLOW FILTERING.)
*Example output (Grafana screenshot).*
Advantages of This Architecture
| Advantage | Description |
|---|---|
| Modularity | Each component can scale independently. |
| Transparency | Every transformation stage (PostgreSQL → Kafka → Cassandra) is observable. |
| Custom Logic | The manual consumer allows you to filter or compute before persisting. |
| Grafana Flexibility | Supports both direct and API-based data sources. |
| Resilience | Cassandra ensures high availability and fast queries for time-series data. |
Challenges Faced & How I Solved Them
| Challenge | Solution |
|---|---|
| ORDER BY not supported for non-clustering columns in Cassandra | Redesigned tables with clustering keys (updated_at) |
| JSON array fields (order book) hard to query | Flattened the data during Kafka consumer insertion |
| Grafana timeouts using localhost | Switched to host.docker.internal for API access |
Results
- Real-time data ingestion from Binance API
- Fully automated CDC from PostgreSQL to Cassandra
- Dynamic dashboards for price, volume, and trade count analytics
- Live financial-style visualizations similar to exchange UIs
The final Grafana dashboard effectively shows market volatility, trading activity, and price trends, all in real time.
Future Enhancements
- Use Kafka Streams for real-time computations before Cassandra.
- Add alerting rules in Grafana (e.g., trigger if ETHUSDT price drops 5%).
- Introduce machine learning models for predictive analytics.
Conclusion
This project demonstrates how open-source tools can be combined to build a real-time financial data analytics system with high performance and flexibility.
By avoiding a sink connector, I gained greater transparency, data control, and adaptability, all while maintaining a clean and observable data pipeline.

