Evgenii Orlov
I built pq - the jq of Parquet. Here's why data engineers need a better CLI

I got tired of spinning up DuckDB or writing throwaway Python just to peek inside a Parquet file. So I built pq - a single-binary CLI written in Rust that handles the full Parquet workflow from your terminal.

Quick taste:

  • pq data.parquet — metadata, schema, compression, row groups at a glance
  • pq head -n 5 -c id,name s3://bucket/data.parquet — preview specific columns directly from S3
  • pq schema extract --ddl postgres data.parquet — generate CREATE TABLE (supports Postgres, ClickHouse, DuckDB, Spark, BigQuery, Snowflake, Redshift, MySQL)
  • pq check --contract contract.toml data/ — validate file structure and data contracts in CI
  • pq schema diff a.parquet b.parquet — catch schema drift between files
  • pq compact data/ -o s3://bucket/compacted/ — merge small files into optimal sizes
  • pq convert raw/*.csv -o parquet/ — batch convert CSV/JSON to Parquet
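
The check and diff commands above are what I use as a CI gate. A minimal sketch of such a script, using only the commands listed (assumes pq is on PATH; file paths like baseline.parquet are illustrative):

```shell
#!/usr/bin/env bash
# Fail fast on any command error or unset variable.
set -euo pipefail

# Validate file structure and data contracts against contract.toml;
# a non-zero exit code fails the build.
pq check --contract contract.toml data/

# Catch schema drift between a pinned baseline and the latest export.
pq schema diff baseline.parquet data/export.parquet
```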

It auto-detects output format (table on TTY, JSON when piped), supports glob patterns, and works with S3, GCS, Azure Blob, and Cloudflare R2.
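
Because output switches to JSON when piped, pq composes with jq the way the title promises. A quick sketch (assumes pq is installed; the jq filter is the generic identity since I'm not assuming any particular JSON field names):

```shell
# On a TTY this renders a table; piped, it emits JSON,
# so it can feed jq or any other JSON-consuming tool.
pq data.parquet | jq '.'

# Same idea for row previews pulled straight from S3.
pq head -n 5 -c id,name s3://bucket/data.parquet | jq '.'
```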

Install: brew install OrlovEvgeny/pq/pq or cargo install pq-parquet

What I'd love feedback on: What's your current Parquet inspection workflow? What commands would make this indispensable for your day-to-day?

GitHub: https://github.com/OrlovEvgeny/pq
