DEV Community

Mohamed Hussain S


⚡ Kafka → ClickHouse: Real-Time Data Pipeline for Beginners

Hey Devs 👋,

I'm Mohamed Hussain S, currently working as an Associate Data Engineer Intern.
After building a batch pipeline with Airflow and Postgres, I wanted to step into the real-time data world, so I created this lightweight Kafka → ClickHouse pipeline.

If you're curious how streaming data pipelines actually work (beyond just theory), this one's for you 🎯


🚀 What This Project Does

✅ Generates mock user data (name, email, age)
✅ Sends each message to a Kafka topic called `user-signups`
✅ A ClickHouse Kafka engine table listens for those messages
✅ A materialized view pushes clean data into a persistent table
✅ All of this runs in Docker for easy setup and teardown

It's super lightweight and totally beginner-friendly, perfect for learning how Kafka and ClickHouse can work together.


🧰 Tech Stack

  • Python — Kafka producer to simulate user signups
  • Kafka — distributed streaming platform
  • ClickHouse — OLAP database with native Kafka support
  • Docker — to spin up Kafka, Zookeeper, and ClickHouse
  • SQL — to define engine tables and views in ClickHouse

πŸ—‚οΈ Project Structure

```
kafka-clickhouse-pipeline/
├── producer/             # Python Kafka producer
├── clickhouse-setup.sql  # SQL to set up ClickHouse tables
├── docker-compose.yml    # All services defined here
├── screenshots/          # CLI outputs, topic messages, etc.
└── README.md             # Everything documented here
```

βš™οΈ How It Works

  1. Run `docker-compose up` — spins up Kafka, Zookeeper & ClickHouse
  2. Run the SQL file to create:
     • Kafka engine table
     • Materialized view
     • Target `users` table
  3. Start the Python producer — sends mock user data to Kafka
  4. ClickHouse listens to the topic and stores data via the materialized view
  5. Boom — your real-time pipeline is up and running!
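For reference, the three ClickHouse objects from step 2 could look roughly like this. This is a sketch, not the repo's exact `clickhouse-setup.sql`; table names, the consumer group, and the broker address are assumptions:

```sql
-- 1. Kafka engine table: a live feed over the user-signups topic
CREATE TABLE user_signups_queue
(
    name  String,
    email String,
    age   UInt8
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'user-signups',
         kafka_group_name  = 'clickhouse-consumer',
         kafka_format      = 'JSONEachRow';

-- 2. Target table: where the data actually persists
CREATE TABLE users
(
    name  String,
    email String,
    age   UInt8
)
ENGINE = MergeTree
ORDER BY email;

-- 3. Materialized view: moves each message from the queue into users
CREATE MATERIALIZED VIEW users_mv TO users AS
SELECT name, email, age
FROM user_signups_queue;
```

Note that the Kafka engine table doesn't store anything itself; reading from it consumes messages, which is exactly why the materialized view + target table pair is the standard pattern.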

🧪 Example Output

A single message sent to Kafka looks like this:

```json
{"name": "Alice", "email": "alice@example.com", "age": 24}
```
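The producer itself fits in a few lines. Here's a minimal sketch using the kafka-python client; the repo's actual producer may differ, and the broker address, name list, and age range below are made up for illustration:

```python
import json
import random

# Mock data pool. These names/domains are illustrative, not from the repo.
NAMES = ["Alice", "Bob", "Carol", "Dave"]

def generate_user() -> dict:
    """Build one fake signup record: name, email, age."""
    name = random.choice(NAMES)
    return {
        "name": name,
        "email": f"{name.lower()}@example.com",
        "age": random.randint(18, 65),  # assumed range
    }

def main() -> None:
    # kafka-python is one common choice; import here so the
    # generator above stays usable without a broker.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for _ in range(10):
        user = generate_user()
        producer.send("user-signups", user)  # topic name from this post
        print(f"sent: {user}")
    producer.flush()

if __name__ == "__main__":
    main()
```

The `value_serializer` is the important bit: it emits one JSON object per message, which lines up with ClickHouse's `JSONEachRow` format on the consuming side.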

And the users table in ClickHouse will store it like this:

| name  | email             | age |
|-------|-------------------|-----|
| Alice | alice@example.com | 24  |

Check the `screenshots/` folder in the repo to see the whole thing in action 📸


🧠 Key Learnings

✅ How Kafka producers work with Python
✅ Setting up Kafka topics and brokers in Docker
✅ How ClickHouse can natively consume Kafka messages
✅ How materialized views automate transformation and inserts
✅ Containerized orchestration made simple with Docker


💡 What's Next?

πŸ” Add a proper Kafka consumer (Python-based) as an alt to ClickHouse ingestion
πŸ” Add logging, retries, and dead-letter queue logic
πŸ“ˆ Simulate more complex streaming use cases like page visits
πŸ“Š Plug in Grafana for real-time metrics from ClickHouse
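For the Grafana idea, even a simple aggregate over the `users` table (schema assumed from the example above) is enough to drive a first panel:

```sql
-- Signups per age, e.g. for a Grafana bar chart
SELECT age, count() AS signups
FROM users
GROUP BY age
ORDER BY age;
```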


📌 Why You Should Try This

If you're exploring real-time data engineering:

  • Start with Kafka and Python — it's intuitive and powerful
  • ClickHouse's Kafka engine + materialized view combo = 💯
  • Docker lets you test and learn without messing up your local setup

This small project helped me understand the data flow in real-time systems, not just conceptually but hands-on.


🔗 Repo

👉 GitHub Repo:


πŸ™‹β€β™‚οΈ About Me

Mohamed Hussain S
Associate Data Engineer Intern
LinkedIn | GitHub


βš™οΈ Building in public β€” one stream at a time.

