Mohamed Hussain S

Polyglot Data Engineering: Python + Go in the Same Pipeline

Hey Devs 👋,

If you're exploring modern data engineering stacks or curious about mixing languages in one pipeline - this post is for you!

I wanted to try out something lightweight but real:
Using Python for data prep and Go for high-speed ingestion into ClickHouse.

Here's what I built, how it works, and what I learned 👇

🔗 GitHub Repo


📦 What This Project Does

This is a beginner-friendly, containerized mini-project that shows how a polyglot pipeline can work:

🐍 Python β€” generates and prepares sample data
πŸ“ Converts the data into a Parquet file
⚑ Go β€” reads the Parquet file and inserts into ClickHouse
🐳 Everything runs locally using Docker Compose


🛠️ Tech Stack

  • Python - flexible for data prep & Parquet generation
  • Go - compiled and concurrency-friendly, a good fit for fast inserts into ClickHouse
  • ClickHouse - column-oriented OLAP database built for fast analytical queries
  • Docker Compose - to spin up ClickHouse locally
  • Parquet - efficient columnar storage format
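
To make "columnar" concrete, here is a tiny illustration in plain Python. It is not Parquet's actual on-disk layout (which adds row groups, encodings, and compression), just the core idea of storing each field's values together. The record fields (`id`, `name`, `score`) are made up for the example.

```python
def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys (hypothetical sample schema).
    """
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "user_1", "score": 10.0},
    {"id": 2, "name": "user_2", "score": 30.0},
]

# Column-oriented: each field's values sit side by side.
cols = to_columns(rows)

# An aggregate over one field only has to touch that one column.
average_score = sum(cols["score"]) / len(cols["score"])
```

That access pattern (scan a few columns, skip the rest) is what analytical queries tend to look like, which is why columnar formats pair well with an OLAP database like ClickHouse.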

⚙️ How To Run It Locally

Step 1. Clone the repo

git clone https://github.com/mohhddhassan/go-clickhouse-parquet.git
cd go-clickhouse-parquet

Step 2. Generate sample Parquet data with Python

cd python
python3 generate_parquet.py
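
The repo's script isn't reproduced in this post, so here is a minimal sketch of what a generator like this might look like. It assumes `pyarrow` is installed (`pip install pyarrow`); the schema (`id`, `name`, `score`, `signup_date`) and the function names are hypothetical, not taken from the actual `generate_parquet.py`.

```python
import random
from datetime import date, timedelta


def make_rows(n, seed=42):
    """Build n sample records as plain dicts (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so reruns produce the same data
    start = date(2024, 1, 1)
    return [
        {
            "id": i,
            "name": f"user_{i}",
            "score": round(rng.uniform(0, 100), 2),
            "signup_date": start + timedelta(days=i),
        }
        for i in range(1, n + 1)
    ]


def write_parquet(rows, path):
    """Write the records to a Parquet file using pyarrow."""
    # Imported lazily so make_rows stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist(rows)  # column types are inferred from the dicts
    pq.write_table(table, path)
```

With this sketch, calling `write_parquet(make_rows(1000), "sample.parquet")` would write a 1000-row file to the current directory. Whatever schema you pick here is the contract the Go side has to match when it reads the file.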

Step 3. Start ClickHouse using Docker Compose

cd ..    # back to the repo root, where docker-compose.yml lives
docker compose up -d

Step 4. Run the Go app to ingest data

cd go
go run main.go
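
Once the Go app finishes, one way to confirm the rows landed is to ask ClickHouse for a row count. Below is a minimal sketch that uses ClickHouse's HTTP interface. It assumes the Compose file publishes the default HTTP port 8123 and that the `default` user needs no password. The table name `sample_data` is hypothetical, so swap in whatever table `main.go` actually creates.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed: ClickHouse's default HTTP port, published to localhost by Compose.
CLICKHOUSE_HTTP = "http://localhost:8123"


def count_url(table, base=CLICKHOUSE_HTTP):
    """Build the URL for a row-count query against the given table."""
    # Only allow simple identifiers so the name can't smuggle in extra SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    sql = f"SELECT count() FROM {table}"
    return f"{base}/?query={quote(sql)}"


def count_rows(table):
    """Ask ClickHouse how many rows the table holds."""
    with urlopen(count_url(table), timeout=5) as resp:
        return int(resp.read().decode().strip())
```

With ClickHouse running, `count_rows("sample_data")` should return the number of rows the Go app inserted (again, `sample_data` is a placeholder name).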

🗂️ Project Structure

go-clickhouse-parquet/
├── docker-compose.yml         # ClickHouse setup
├── parquet-files/
│   └── sample.parquet         # Auto-generated test file
├── python/
│   └── generate_parquet.py    # Script to create data
└── go/
    ├── go.mod
    ├── go.sum
    └── main.go                # Ingests Parquet into ClickHouse

🀯 What I Learned

πŸ’‘ How to generate Parquet programmatically with Python
πŸ’‘ Using Go to connect with ClickHouse and perform inserts
πŸ’‘ Deploying ClickHouse quickly with Docker Compose
πŸ’‘ The idea of polyglot pipelines β€” mixing languages for their strengths


🔍 Why You Should Try This

If you're learning data engineering or systems programming:

  • Practice mixing Python + Go in real-world data movement
  • Get hands-on with Parquet files (a must-have in analytics)
  • See how ClickHouse handles fast inserts and queries
  • Get used to wiring up multiple components into a pipeline

📌 What's Next?

📈 Build a dashboard on top of ClickHouse
⚙️ Try streaming Parquet data into ClickHouse
📂 Experiment with more complex schemas
🚀 Benchmark Python vs Go for performance in the pipeline


🙋‍♂️ About Me

Mohamed Hussain S
Associate Data Engineer
LinkedIn | GitHub

🧪 Building one mini project at a time to become a better data engineer.

