I got tired of spinning up DuckDB or writing throwaway Python just to peek inside a Parquet file, so I built `pq`: a single-binary Rust CLI that handles the full Parquet workflow from your terminal.
Quick taste:
- `pq data.parquet`: metadata, schema, compression, row groups at a glance
- `pq head -n 5 -c id,name s3://bucket/data.parquet`: preview specific columns directly from S3
- `pq schema extract --ddl postgres data.parquet`: generate CREATE TABLE statements (supports Postgres, ClickHouse, DuckDB, Spark, BigQuery, Snowflake, Redshift, MySQL)
- `pq check --contract contract.toml data/`: validate file structure and data contracts in CI
- `pq schema diff a.parquet b.parquet`: catch schema drift between files
- `pq compact data/ -o s3://bucket/compacted/`: merge small files into optimal sizes
- `pq convert raw/*.csv -o parquet/`: batch-convert CSV/JSON to Parquet
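For a sense of what the cheapest layer of "validate file structure" can look like: per the Parquet format spec, every valid file both starts and ends with the 4-byte magic `PAR1`, so even a few lines of shell can reject truncated or mislabeled files before any deeper check. A minimal sketch (the function name is mine; pq's actual `check` command does far more than this):

```shell
# Cheapest structural sanity check: a Parquet file begins and ends with the
# 4-byte magic "PAR1". This only catches truncated or mislabeled files;
# it is nothing like a full contract validation.
looks_like_parquet() {
  file="$1"
  # A real file also needs a footer, so anything under 12 bytes is out.
  [ $(wc -c < "$file") -ge 12 ] || return 1
  [ "$(head -c 4 "$file")" = "PAR1" ] && [ "$(tail -c 4 "$file")" = "PAR1" ]
}
```

Usage: `looks_like_parquet data.parquet && echo "framing looks ok"`.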
It auto-detects output format (table on TTY, JSON when piped), supports glob patterns, and works with S3, GCS, Azure Blob, and Cloudflare R2.
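The TTY detection behind that is the standard `isatty` check on stdout. In shell terms the pattern looks like this (a generic illustration of the behavior described above, not pq's actual Rust code; the function name and JSON payload are made up):

```shell
# Generic output-format auto-detection: render for humans when stdout is a
# terminal, emit machine-readable JSON when it is a pipe or file.
# Illustrative only; this is not pq's source.
emit_result() {
  if [ -t 1 ]; then
    echo "pretty table for humans"
  else
    echo '{"rows": 42}'
  fi
}

# In a pipeline stdout is not a TTY, so the JSON branch runs:
emit_result | cat
```

Run interactively, the same call prints the table branch instead.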
Install: `brew install OrlovEvgeny/pq/pq` or `cargo install pq-parquet`
What I'd love feedback on: What's your current Parquet inspection workflow? What commands would make this indispensable for your day-to-day?