<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Evgenii Orlov</title>
    <description>The latest articles on DEV Community by Evgenii Orlov (@eorlov).</description>
    <link>https://dev.to/eorlov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3843358%2Ffadf6c4f-487c-42ea-930c-f0fd56fd73d8.png</url>
      <title>DEV Community: Evgenii Orlov</title>
      <link>https://dev.to/eorlov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eorlov"/>
    <language>en</language>
    <item>
      <title>I built pq - the jq of Parquet. Here's why data engineers need a better CLI</title>
      <dc:creator>Evgenii Orlov</dc:creator>
      <pubDate>Wed, 25 Mar 2026 14:45:09 +0000</pubDate>
      <link>https://dev.to/eorlov/i-built-pq-the-jq-of-parquet-heres-why-data-engineers-need-a-better-cli-393e</link>
      <guid>https://dev.to/eorlov/i-built-pq-the-jq-of-parquet-heres-why-data-engineers-need-a-better-cli-393e</guid>
      <description>&lt;p&gt;I got tired of spinning up DuckDB or writing throwaway Python just to peek inside a Parquet file. So I built pq - a single binary CLI (Rust) that handles the full Parquet workflow from your terminal&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick taste:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pq data.parquet&lt;/code&gt; — metadata, schema, compression, row groups at a glance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pq head -n 5 -c id,name s3://bucket/data.parquet&lt;/code&gt; — preview specific columns directly from S3&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pq schema extract --ddl postgres data.parquet&lt;/code&gt; — generate CREATE TABLE (supports Postgres, ClickHouse, DuckDB, Spark, BigQuery, Snowflake, Redshift, MySQL)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pq check --contract contract.toml data/&lt;/code&gt; — validate file structure and data contracts in CI (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pq schema diff a.parquet b.parquet&lt;/code&gt; — catch schema drift between files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pq compact data/ -o s3://bucket/compacted/&lt;/code&gt; — merge small files into optimal sizes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pq convert raw/*.csv -o parquet/&lt;/code&gt; — batch convert CSV/JSON to Parquet&lt;/li&gt;
&lt;/ul&gt;
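
&lt;p&gt;Here's a rough sketch of how the contract check and schema diff could gate a CI pipeline. The non-zero-exit-on-failure behavior and the &lt;code&gt;baseline.parquet&lt;/code&gt; / &lt;code&gt;data/latest.parquet&lt;/code&gt; paths are illustrative assumptions, not confirmed behavior:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch of a CI gate; paths are placeholders for your own layout.
set -euo pipefail

# Fail the build on contract violations (assumes a non-zero exit on failure)
pq check --contract contract.toml data/

# Fail the build on schema drift against a checked-in baseline file
pq schema diff baseline.parquet data/latest.parquet
&lt;/code&gt;&lt;/pre&gt;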

&lt;p&gt;It auto-detects the output format (a table on a TTY, JSON when piped), supports glob patterns, and works with S3, GCS, Azure Blob, and Cloudflare R2.&lt;/p&gt;
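
&lt;p&gt;A quick illustration of the pipe behavior (the JSON shape depends on the command, so this just pipes through &lt;code&gt;jq&lt;/code&gt; as-is):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# On a TTY: pretty-printed table
pq data.parquet

# Piped: the same command emits JSON, so it composes with jq
pq data.parquet | jq .
&lt;/code&gt;&lt;/pre&gt;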

&lt;p&gt;Install: &lt;code&gt;brew install OrlovEvgeny/pq/pq&lt;/code&gt; or &lt;code&gt;cargo install pq-parquet&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'd love feedback on:&lt;/strong&gt; What's your current Parquet inspection workflow? What commands would make this indispensable for your day-to-day?&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/OrlovEvgeny/pq" rel="noopener noreferrer"&gt;https://github.com/OrlovEvgeny/pq&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cli</category>
      <category>dataengineering</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
