DEV Community

Mohamed Hussain S


From Python to ClickHouse: Parquet ETL with Go

Hey Devs 👋,

If you're exploring modern data engineering stacks or want to try out ClickHouse with Go and Python, this post is for you!

I wanted to experiment with something lightweight but real:
Generating a Parquet file using Python and loading it into ClickHouse using Go.

Here's what I built, how it works, and what I learned 👇

🔗 GitHub Repo: https://github.com/mohhddhassan/go-clickhouse-parquet


📦 What This Project Does

This is a beginner-friendly, containerized mini-project that:

🧪 Generates sample data using a Python script
📁 Converts it into a Parquet file
🔁 Loads the data into a ClickHouse table using a Go app
🐳 Runs locally using Docker Compose
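
As a rough sketch of what the first two steps could look like on the Python side: the repo's generate_parquet.py may well differ, and the column names, the `generate_rows` and `write_parquet` helpers, and the output path below are all made up for illustration. It assumes `pyarrow` (version 7 or later) is installed.

```python
import random
from datetime import datetime, timedelta


def generate_rows(n, seed=42):
    """Return n fake rows as plain dicts (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    start = datetime(2024, 1, 1)
    return [
        {
            "id": i,
            "name": f"user_{i}",
            "score": round(rng.uniform(0, 100), 2),
            "created_at": start + timedelta(minutes=i),
        }
        for i in range(1, n + 1)
    ]


def write_parquet(rows, path):
    """Write the rows to a Parquet file. Requires `pip install pyarrow`."""
    # Imported here so the row generator works without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist(rows)  # schema is inferred from the dicts
    pq.write_table(table, path)


# Example (assumed output path):
# write_parquet(generate_rows(1000), "sample.parquet")
```

Whatever schema the script produces, the ClickHouse table the Go app inserts into needs matching column names and compatible types.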


🛠️ Tech Stack

  • Python: generates the Parquet file
  • Go: reads the Parquet file and inserts the rows into ClickHouse
  • ClickHouse: a fast, column-oriented OLAP database
  • Docker Compose: simplifies the ClickHouse setup
  • Parquet: an efficient columnar storage format

⚙️ How To Run It Locally

Step 1. Clone the repo

   git clone https://github.com/mohhddhassan/go-clickhouse-parquet.git
   cd go-clickhouse-parquet

Step 2. Generate sample Parquet data

   cd python
   python3 generate_parquet.py

Step 3. Start ClickHouse using Docker Compose

   cd ..    # back to the repo root, where docker-compose.yml lives
   docker-compose up -d

Step 4. Run the Go app to ingest data

   cd go
   go run main.go
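
Once the Go app finishes, a quick way to check that rows actually landed is to ask ClickHouse for a count over its HTTP interface. This is a minimal sketch, not part of the repo. It assumes the Compose file publishes ClickHouse's default HTTP port 8123, that the default user is reachable without a password, and that the table is called `events` (a hypothetical name; use whatever main.go creates).

```python
import urllib.parse
import urllib.request


def build_query_url(sql, host="localhost", port=8123):
    """Build a ClickHouse HTTP-interface URL carrying the SQL in ?query=."""
    return f"http://{host}:{port}/?query=" + urllib.parse.quote(sql, safe="")


def count_rows(table, host="localhost", port=8123):
    """Run SELECT count() on the table and return the result as an int."""
    url = build_query_url(f"SELECT count() FROM {table}", host, port)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return int(resp.read().decode().strip())


# Example (needs a running ClickHouse; `events` is an assumed table name):
# print(count_rows("events"))
```

If the count matches the number of rows in the Parquet file, the pipeline worked end to end.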

🗂️ Project Structure

go-clickhouse-parquet/
├── docker-compose.yml         # ClickHouse setup
├── parquet-files/
│   └── sample.parquet         # Auto-generated test file
├── python/
│   └── generate_parquet.py    # Script to create data
└── go/
    ├── go.mod
    ├── go.sum
    └── main.go                # Ingests Parquet into ClickHouse

🀯 What I Learned

πŸ’‘ How to programmatically create Parquet files
πŸ’‘ Connecting Go with ClickHouse and executing inserts
πŸ’‘ Using Docker Compose to deploy ClickHouse quickly
πŸ’‘ Structuring a mini ETL workflow with multiple languages


🔍 Why You Should Try This

If you're learning data engineering or systems programming:

  • Try combining Python + Go for real-world data movement
  • Practice building and using Parquet files, which are everywhere in analytics
  • Explore ClickHouse and see how fast OLAP queries can be
  • Get used to wiring up different components in a real pipeline

📌 What's Next?

📈 Build a ClickHouse dashboard on top of this data
⚙️ Try streaming Parquet data into ClickHouse
📂 Expand schema complexity for more realistic ingestion
🛠️ Benchmark Go vs Python for loading speed into ClickHouse
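
For the benchmark idea, the Python side could be as small as the sketch below. It is not from the repo and takes a different route from the Go app: instead of reading the file and inserting rows, it posts the raw Parquet bytes to ClickHouse's HTTP interface with `INSERT ... FORMAT Parquet` and lets the server parse them. The table name, the port 8123, the password-free default user, and the `insert_url`, `load_parquet`, and `timed` helpers are all assumptions for illustration.

```python
import time
import urllib.parse
import urllib.request


def insert_url(table, host="localhost", port=8123):
    """URL for an HTTP insert whose request body is a Parquet file."""
    sql = f"INSERT INTO {table} FORMAT Parquet"
    return f"http://{host}:{port}/?query=" + urllib.parse.quote(sql, safe="")


def load_parquet(path, table):
    """POST the raw Parquet bytes; ClickHouse parses them server-side."""
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(insert_url(table), data=body, method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.status


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Example (needs a running ClickHouse and an existing `events` table):
# status, seconds = timed(load_parquet, "parquet-files/sample.parquet", "events")
# print(f"HTTP {status} in {seconds:.3f}s")
```

For a fair comparison with the Go loader, both sides should load the same file into an empty table, and each run should be repeated a few times.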


🙋‍♂️ About Me

Mohamed Hussain S
Associate Data Engineer
LinkedIn | GitHub

🧪 Building one mini project at a time to become a better data engineer.

