I got tired of spinning up DuckDB or writing throwaway Python just to peek inside a Parquet file, so I built `pq`: a single-binary Rust CLI that handles the full Parquet workflow from your terminal.
Quick taste:
- `pq data.parquet`: metadata, schema, compression, row groups at a glance
- `pq head -n 5 -c id,name s3://bucket/data.parquet`: preview specific columns directly from S3
- `pq schema extract --ddl postgres data.parquet`: generate CREATE TABLE statements (supports Postgres, ClickHouse, DuckDB, Spark, BigQuery, Snowflake, Redshift, MySQL)
- `pq check --contract contract.toml data/`: validate file structure and data contracts in CI
- `pq schema diff a.parquet b.parquet`: catch schema drift between files
- `pq compact data/ -o s3://bucket/compacted/`: merge small files into optimal sizes
- `pq convert raw/*.csv -o parquet/`: batch-convert CSV/JSON to Parquet
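For a sense of what the cheapest layer of "validate file structure" can look like: per the Parquet format spec, every valid file both starts and ends with the 4-byte magic `PAR1`, so even a few lines of shell can reject truncated or mislabeled files before any deeper check. A minimal sketch (the function name is mine; pq's actual `check` command does far more than this):

```shell
# Cheapest structural sanity check: a Parquet file begins and ends with the
# 4-byte magic "PAR1". This only catches truncated or mislabeled files;
# it is nothing like a full contract validation.
looks_like_parquet() {
  file="$1"
  # A real file also needs a footer, so anything under 12 bytes is out.
  [ $(wc -c < "$file") -ge 12 ] || return 1
  [ "$(head -c 4 "$file")" = "PAR1" ] && [ "$(tail -c 4 "$file")" = "PAR1" ]
}
```

Usage: `looks_like_parquet data.parquet && echo "framing looks ok"`.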
It auto-detects output format (table on TTY, JSON when piped), supports glob patterns, and works with S3, GCS, Azure Blob, and Cloudflare R2.
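The TTY detection behind that is the standard `isatty` check on stdout. In shell terms the pattern looks like this (a generic illustration of the behavior described above, not pq's actual Rust code; the function name and JSON payload are made up):

```shell
# Generic output-format auto-detection: render for humans when stdout is a
# terminal, emit machine-readable JSON when it is a pipe or file.
# Illustrative only; this is not pq's source.
emit_result() {
  if [ -t 1 ]; then
    echo "pretty table for humans"
  else
    echo '{"rows": 42}'
  fi
}

# In a pipeline stdout is not a TTY, so the JSON branch runs:
emit_result | cat
```

Run interactively, the same call prints the table branch instead.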
Install: `brew install OrlovEvgeny/pq/pq` or `cargo install pq-parquet`
What I'd love feedback on: What's your current Parquet inspection workflow? What commands would make this indispensable for your day-to-day?