Hey Devs,
If you're exploring modern data engineering stacks or curious about mixing languages in one pipeline - this post is for you!
I wanted to try out something lightweight but real:
Using Python for data prep and Go for high-speed ingestion into ClickHouse.
Here's what I built, how it works, and what I learned.
GitHub Repo: https://github.com/mohhddhassan/go-clickhouse-parquet
What This Project Does
This is a beginner-friendly, containerized mini-project that shows how a polyglot pipeline can work:
- Python: generates and prepares sample data
- Converts the data into a Parquet file
- Go: reads the Parquet file and inserts it into ClickHouse
- Everything runs locally using Docker Compose
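To make the Python half of that list concrete, here is a minimal sketch of what a script like generate_parquet.py could look like. It is not the code from the repo: the column names (`id`, `name`, `score`, `created_at`), the row count, and the choice of `pyarrow` are all assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta, timezone


def make_rows(n, seed=42):
    """Build n sample rows as plain dicts (hypothetical schema, not the repo's)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    return [
        {
            "id": i,
            "name": f"user_{i}",
            "score": round(rng.uniform(0, 100), 2),
            "created_at": start + timedelta(minutes=i),
        }
        for i in range(n)
    ]


def write_parquet(rows, path):
    """Write the rows to a Parquet file. Assumes `pip install pyarrow`."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist(rows)  # schema is inferred from the dicts
    pq.write_table(table, path)
```

Calling `write_parquet(make_rows(1000), "parquet-files/sample.parquet")` from the repo root would then produce the file the Go app reads. Keeping row generation separate from the write step makes it easy to swap in real data later.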
Tech Stack
- Python: flexible for data prep & Parquet generation
- Go: compiled and fast, a good fit for batch inserts into ClickHouse
- ClickHouse: column-oriented OLAP database built for fast analytical queries
- Docker Compose: to spin up ClickHouse locally
- Parquet: efficient columnar storage format
How To Run It Locally
Step 1. Clone the repo
git clone https://github.com/mohhddhassan/go-clickhouse-parquet.git
cd go-clickhouse-parquet
Step 2. Generate sample Parquet data with Python
cd python
python3 generate_parquet.py
Step 3. Go back to the repo root and start ClickHouse using Docker Compose
cd ..
docker compose up -d
Step 4. Run the Go app to ingest data
cd go
go run main.go
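Once the Go app finishes, a quick row count is an easy way to confirm the data landed. The sketch below queries ClickHouse over its HTTP interface using only the Python standard library. It assumes the Compose file publishes ClickHouse's default HTTP port 8123 and that the default user has no password; the table name `sample_data` is hypothetical, so replace it with whatever table main.go actually writes to.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(table, host="localhost", port=8123):
    """Build the HTTP URL for a row-count query (8123 is ClickHouse's default HTTP port)."""
    query = f"SELECT count() FROM {table}"
    return f"http://{host}:{port}/?{urlencode({'query': query})}"


def row_count(table):
    """Return the number of rows in `table`; needs a running ClickHouse."""
    with urlopen(count_url(table), timeout=5) as resp:
        # ClickHouse answers with the count as plain text followed by a newline
        return int(resp.read().decode().strip())
```

For example, `row_count("sample_data")` should return the number of rows the Go app inserted, which you can compare against the number of rows in the Parquet file.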
Project Structure
go-clickhouse-parquet/
├── docker-compose.yml          # ClickHouse setup
├── parquet-files/
│   └── sample.parquet          # Auto-generated test file
├── python/
│   └── generate_parquet.py     # Script to create data
└── go/
    ├── go.mod
    ├── go.sum
    └── main.go                 # Ingests Parquet into ClickHouse
What I Learned
- How to generate Parquet programmatically with Python
- Using Go to connect with ClickHouse and perform inserts
- Deploying ClickHouse quickly with Docker Compose
- The idea of polyglot pipelines: mixing languages for their strengths
Why You Should Try This
If you're learning data engineering or systems programming:
- Practice mixing Python + Go in real-world data movement
- Get hands-on with Parquet files (a must-have in analytics)
- See how ClickHouse handles fast inserts and queries
- Get used to wiring up multiple components into a pipeline
What's Next?
- Build a dashboard on top of ClickHouse
- Try streaming Parquet data into ClickHouse
- Experiment with more complex schemas
- Benchmark Python vs Go for performance in the pipeline
About Me
Mohamed Hussain S
Associate Data Engineer
LinkedIn | GitHub
Building one mini project at a time to become a better data engineer.