Introduction
Ever wondered how trading platforms display live crypto prices? In this article, I'll show you how I built a fully automated real-time data pipeline that streams cryptocurrency data from Binance and visualizes it like a Bloomberg Terminal - completely open source!
What you'll learn:
- Setting up Change Data Capture (CDC) with Debezium
- Building event-driven architectures with Kafka
- Handling time-series data at scale with Cassandra
- Creating real-time dashboards with Grafana
Tech Stack: Python | PostgreSQL | Debezium | Apache Kafka | Cassandra | Grafana | Docker
Why I Built This
I wanted to understand how major trading platforms handle real-time data at scale. Instead of just reading about it, I decided to build a production-grade pipeline that could:
- Handle thousands of price updates per minute
- Never lose data even if services crash
- Provide instant insights through dashboards
- Scale horizontally as data grows
This project taught me more about distributed systems in one month than a year of tutorials.
Challenges I Faced
Challenge 1: Database Polling Overhead
Initially, I was polling PostgreSQL every second. CPU usage was 80%+!
Solution: Implemented Debezium CDC using PostgreSQL's replication log. CPU dropped to 5%.
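You can confirm the CDC prerequisites yourself, since PostgreSQL exposes them directly. A minimal Python sketch (host and port are assumptions for a typical compose port mapping; the credentials match the `.env` from the Quick Start section below):

```python
# Verify logical replication is on and that Debezium holds a slot.
# Host/port are assumptions; credentials match the Quick Start .env.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,
    user="crypto_user", password="crypto_pass", dbname="crypto_db",
)
with conn, conn.cursor() as cur:
    # Debezium requires wal_level = logical
    cur.execute("SHOW wal_level;")
    print("wal_level:", cur.fetchone()[0])

    # Each Debezium connector holds one replication slot
    cur.execute("SELECT slot_name, active FROM pg_replication_slots;")
    for slot_name, active in cur.fetchall():
        print(f"slot {slot_name}: {'active' if active else 'inactive'}")
conn.close()
```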
Challenge 2: Data Loss During Failures
When Cassandra went down, data disappeared.
Solution: Kafka acts as a durable buffer - it stores events until consumers catch up.
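The pattern underneath that guarantee is simple: a consumer commits its Kafka offset only after the downstream write succeeds, so unprocessed events stay in the topic. A minimal sketch of that loop with kafka-python (topic name and broker address are assumptions; in the real pipeline the DataStax sink connector does this for us):

```python
# Kafka as a durable buffer: commit the offset only after the event
# has been written downstream, so nothing is lost if Cassandra is down.
# Topic name and broker address are illustrative assumptions.
import json
from kafka import KafkaConsumer

def write_to_cassandra(event):
    # Placeholder for real sink logic (handled by the DataStax
    # connector in the actual pipeline).
    print("writing:", event)

consumer = KafkaConsumer(
    "crypto.public.crypto_prices",        # hypothetical Debezium topic
    bootstrap_servers="localhost:9092",
    enable_auto_commit=False,             # commit manually, after the write
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v) if v is not None else None,
)

for message in consumer:
    write_to_cassandra(message.value)
    consumer.commit()                     # offset advances only on success
```

If the write throws, the offset is never committed, so the event is redelivered once the sink is healthy again.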
Challenge 3: Time-Series Query Performance
PostgreSQL struggled with millions of time-series records.
Solution: Moved analytics workload to Cassandra, optimized for time-series data.
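The trick is Cassandra's data model: partition by symbol (plus day, to bound partition size) and cluster by timestamp, so a chart query becomes one sequential read inside a single partition. The repo's actual schema lives in the sink connector config; this is only a hedged sketch of what such a table could look like, created through the Python driver:

```python
# Representative time-series layout: partition by (symbol, day),
# cluster by timestamp descending so "latest N points" is sequential.
# Keyspace name mirrors the one used later in this article; the table
# and columns are illustrative assumptions.
from cassandra.cluster import Cluster

cluster = Cluster(["localhost"])
session = cluster.connect("crypto_keyspace")

session.execute("""
    CREATE TABLE IF NOT EXISTS crypto_prices_by_day (
        symbol  text,
        day     date,
        ts      timestamp,
        price   decimal,
        PRIMARY KEY ((symbol, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")
cluster.shutdown()
```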
What We've Achieved
Real-Time Data Collection: Automatically fetches live crypto market data from Binance every 3600 seconds (hourly); a minimal collector sketch follows this list
Automated Data Pipeline: Data flows seamlessly from Binance → PostgreSQL → Debezium CDC → Kafka → Cassandra without manual intervention
Change Data Capture (CDC): Allows this system to detect new data in PostgreSQL in real time without polling. Instead of repeatedly querying the database, Debezium listens for changes directly through PostgreSQL's replication log, ensuring near-zero latency and minimal load.
Scalable Architecture: Built with enterprise-grade technologies (Debezium, Kafka, Cassandra) that can handle millions of records
Beautiful Visualizations: Ready-to-use Grafana dashboards for monitoring crypto markets
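To make the first point concrete, here's a minimal sketch of the collector loop. The Binance endpoint is real and public; the table name, columns, and connection details are assumptions for illustration (the actual script is `scripts/binance_ingestor.py`):

```python
# Minimal collector sketch: fetch the latest price for every trading
# pair from Binance's public REST API and insert rows into PostgreSQL.
# Table/columns and connection details are illustrative assumptions.
import time
import requests
import psycopg2

INTERVAL_SECONDS = 3600  # matches the hourly cadence described above

conn = psycopg2.connect(
    host="localhost",
    user="crypto_user", password="crypto_pass", dbname="crypto_db",
)

while True:
    # One call returns the latest price for every trading pair
    tickers = requests.get(
        "https://api.binance.com/api/v3/ticker/price", timeout=10
    ).json()
    with conn, conn.cursor() as cur:  # commits the batch on success
        for t in tickers:
            cur.execute(
                "INSERT INTO crypto_prices (symbol, price, fetched_at) "
                "VALUES (%s, %s, NOW())",
                (t["symbol"], t["price"]),
            )
    time.sleep(INTERVAL_SECONDS)
```

Every INSERT here is what Debezium later picks up from the replication log; the collector itself knows nothing about Kafka.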
Architecture Overview
Binance API → PostgreSQL → Debezium CDC → Kafka → Cassandra → Grafana

- Binance API – prices and 24hr stats
- PostgreSQL – primary storage (every INSERT is captured)
- Debezium CDC – change detection (streams changes to Kafka)
- Kafka – message queue
- Cassandra – fast storage
- Grafana – beautiful dashboards
End-to-end pipeline from data ingestion to visualization.
Components Breakdown

1. Binance Data Collector (Python)
- Fetches 5 types of market data: prices, 24hr stats, order books, recent trades, and candlestick data
- Writes data to PostgreSQL every 3600 seconds

2. PostgreSQL Database
- Primary storage for all crypto market data
- Stores historical data with timestamps
- Change Data Capture (CDC) enabled via logical replication

3. Debezium Change Data Capture
- Automatically detects and captures database changes in real time
- Monitors PostgreSQL for INSERT, UPDATE, DELETE operations
- Converts database changes into Kafka messages
- Minimal impact on database performance, since it reads the replication log instead of querying tables (see the registration sketch after this list)

4. Apache Kafka
- Acts as a real-time buffer between Debezium and Cassandra, ensuring data reliability: if Cassandra goes down, no data is lost, because Kafka stores all change events until Cassandra comes back online

5. Cassandra Sink Connector
- The DataStax Cassandra Sink Connector continuously listens to Kafka topics and mirrors every change into Cassandra tables that match the PostgreSQL schema

6. Apache Cassandra
- Fast, distributed database optimized for time-series data
- Powers our real-time dashboards
- Stores denormalized data for quick reads

7. Grafana Dashboards
- Visual interface for exploring crypto market data
- Live charts and analytics
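Both connectors are registered by POSTing JSON configs to Kafka Connect's REST API; in this repo those configs live under `connectors/`. Here's a hedged Python sketch of registering the Debezium source (the property values are illustrative rather than the repo's exact config, and some key names vary across Debezium versions):

```python
# Register the Debezium PostgreSQL source with Kafka Connect.
# The repo ships this as connectors/postgres-source-temp.json; the
# values below are illustrative assumptions, not the exact file.
import requests

CONNECT_URL = "http://localhost:8083/connectors"

postgres_source = {
    "name": "postgres-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",   # service name inside Docker
        "database.port": "5432",
        "database.user": "crypto_user",
        "database.password": "crypto_pass",
        "database.dbname": "crypto_db",
        "plugin.name": "pgoutput",         # PostgreSQL's built-in decoding plugin
        "topic.prefix": "crypto",          # topics become crypto.public.<table>
    },
}

resp = requests.post(CONNECT_URL, json=postgres_source, timeout=10)
resp.raise_for_status()
print("registered:", resp.json()["name"])
```

The `connectors/cassandra-sink.json` file is registered the same way, pointing the DataStax sink at the Debezium topics.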
Data We Collect
| Data Type | Description | Update Frequency |
|---|---|---|
| Prices | Latest price for all trading pairs | Every 3600 seconds |
| 24hr Stats | Price changes, volumes, and market movements | Every 3600 seconds |
| Order Books | Current buy/sell orders | Every 3600 seconds |
| Recent Trades | Latest market transactions | Every 3600 seconds |
| Candlesticks | Historical price patterns (OHLCV) | Every 6000 seconds |

Live dashboard displaying top-performing cryptocurrencies by 24h change.
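For reference, here's a minimal PostgreSQL schema that could back the prices row in the table above. Column names are assumptions consistent with the collector sketch earlier; note the primary key, which Debezium uses as the key of each change event:

```python
# Create a hypothetical crypto_prices table; the repo's real schema
# may differ. The primary key matters: Debezium uses it as the Kafka
# message key, which in turn drives topic partitioning.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS crypto_prices (
    id          BIGSERIAL PRIMARY KEY,
    symbol      TEXT        NOT NULL,
    price       NUMERIC     NOT NULL,
    fetched_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
"""

conn = psycopg2.connect(
    host="localhost",
    user="crypto_user", password="crypto_pass", dbname="crypto_db",
)
with conn, conn.cursor() as cur:  # commits on success
    cur.execute(DDL)
conn.close()
```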
Getting Started
Prerequisites
- Docker and Docker Compose installed on your computer
Quick Start (3 Steps)

1. Create environment file

```
# Create a .env file with these contents:
POSTGRES_USER=crypto_user
POSTGRES_PASSWORD=crypto_pass
POSTGRES_DB=crypto_db
```

2. Start everything

```
docker compose up --build -d
```

Shows container orchestration success.

3. View your dashboards

- Open http://localhost:3000 in your browser
- Login: admin / admin
- Explore the crypto market data!
Project Screenshots
Kafka UI - Monitoring Topics & Connectors

Kafka UI providing a real-time view of all Kafka topics, internal connector states, and message traffic.
The Kafka UI interface offers an intuitive dashboard for monitoring the Kafka ecosystem:
- Topics View – Displays all internal and user-created topics such as `crypto_prices`, `crypto_order_book`, and more.
- Consumers View – Shows active sink connectors and other consumers reading from Kafka topics (e.g., `cassandra-sink`).
- Cluster Health – Visualizes broker status, topic replication, and partition metrics.
This provides a richer, more interactive way to inspect data flow across Kafka.
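The same information is available programmatically. A quick sketch that lists the cluster's topics with kafka-python (the broker address is an assumption based on a typical compose port mapping):

```python
# List every topic visible on the broker, mirroring Kafka UI's Topics view.
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
for topic in sorted(consumer.topics()):
    print(topic)
consumer.close()
```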
Architecture & Data Flow

Image showing sample query in our PostgreSQL database
A snapshot showing a query in our analytics database, Cassandra.
Configuration Files
- `docker-compose.yml` – Orchestrates all services
- `connectors/cassandra-sink.json` – Cassandra data sink configuration
- `connectors/postgres-source-temp.json` – PostgreSQL change data capture configuration
- `scripts/binance_ingestor.py` – Main data collection script
Current Data Statistics
- Total Records Collected: Over 1.8 million rows
- Active Tables: 5 (prices, stats, order books, trades, candlesticks)
- Update Frequency: Every 3600 seconds (candlesticks every 6000 seconds)
- Data Sources: Binance REST API
- Storage: PostgreSQL (primary) + Cassandra (analytics)
Grafana Dashboards
Our dashboards provide:
- Real-time price monitoring across all trading pairs
- 24-hour market analysis with price changes and volumes
- Order book depth visualization
- Trade history with buy/sell indicators
- Candlestick charts for technical analysis
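Behind a panel like the price chart sits a plain CQL read against Cassandra. A representative query, using the hypothetical time-series table sketched in the Challenges section:

```python
# Fetch the latest 100 points for one symbol: a single-partition read,
# exactly the access pattern the table layout was chosen for.
# Table and columns are the hypothetical ones from the Challenges section.
from datetime import date
from cassandra.cluster import Cluster

cluster = Cluster(["localhost"])
session = cluster.connect("crypto_keyspace")

rows = session.execute(
    "SELECT ts, price FROM crypto_prices_by_day "
    "WHERE symbol = %s AND day = %s LIMIT 100",
    ("BTCUSDT", date.today()),
)
for row in rows:
    print(row.ts, row.price)
cluster.shutdown()
```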
Troubleshooting
Check if services are running

```
docker ps
```

View data in PostgreSQL

```
docker exec postgres psql -U crypto_user -d crypto_db -c "SELECT * FROM crypto_prices LIMIT 10;"
```

View data in Cassandra

```
docker exec cassandra cqlsh -e "SELECT * FROM crypto_keyspace.crypto_prices LIMIT 10;"
```

Check connector status

```
curl -sS http://localhost:8083/connectors | jq
```

REST response of CDC pipeline configuration
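The same check can be scripted. A small sketch that walks every registered connector and prints its state, using the Connect REST API's per-connector status endpoint:

```python
# Print the running state of each connector (same data as the curl
# above, plus the /status sub-resource for each connector).
import requests

BASE = "http://localhost:8083/connectors"

for name in requests.get(BASE, timeout=10).json():
    status = requests.get(f"{BASE}/{name}/status", timeout=10).json()
    print(f"{name}: {status['connector']['state']}")
```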
Key Features
- Fully Automated - Set it and forget it; data collects automatically
- Real-Time Pipeline - Each insert streams through CDC to the dashboards within seconds; fresh data is fetched every 3600 seconds
- Rich Visualizations - Beautiful Grafana dashboards out of the box
- Reliable - Built on proven enterprise technologies
- Scalable - Kafka and Cassandra scale horizontally to handle millions of records
Learn More
This project demonstrates:
- Change Data Capture (CDC) with Debezium - automatically captures database changes
- Real-time data streaming with Apache Kafka - reliable message queuing
- Time-series data storage with Cassandra - optimized for analytics
- Data visualization with Grafana - beautiful dashboards
- Microservices architecture with Docker - containerized services
How Change Data Capture Works
1. The Python script inserts data into PostgreSQL every 3600 seconds
2. The Debezium connector watches PostgreSQL for changes using logical replication
3. When new rows are inserted, Debezium captures them automatically
4. Changes are converted to JSON messages and sent to Kafka topics
5. The Cassandra sink connector consumes these messages and writes to Cassandra
6. Result: Zero manual intervention - data flows automatically!
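On the wire, each change is a Debezium envelope carrying the row's before/after images and an operation code. A sketch that tails one topic and prints that envelope (the topic name follows Debezium's `<prefix>.<schema>.<table>` convention and is an assumption here):

```python
# Tail a Debezium topic and print each change event's operation and row.
# With the default JSON converter, the value nests under "payload".
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "crypto.public.crypto_prices",        # assumed topic name
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v) if v is not None else None,
)

for message in consumer:
    if message.value is None:             # skip delete tombstones
        continue
    envelope = message.value.get("payload", message.value)
    # op: "c" = insert, "u" = update, "d" = delete, "r" = snapshot read
    print(envelope["op"], envelope["after"])
```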
Support
For questions or issues, please check the logs:
```
docker logs binance_ingestor
docker logs debezium-connect
```
Explore the Full Project
You can find the complete source code, Docker setup, and connector configurations on GitHub.