<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: adam2go</title>
    <description>The latest articles on DEV Community by adam2go (@adam2go).</description>
    <link>https://dev.to/adam2go</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3978797%2F86448d74-07d1-433e-814c-f06b03d11655.jpeg</url>
      <title>DEV Community: adam2go</title>
      <link>https://dev.to/adam2go</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/adam2go"/>
    <language>en</language>
    <item>
      <title>How a pure-Python jq ended up 40x faster than the C bindings</title>
      <dc:creator>adam2go</dc:creator>
      <pubDate>Thu, 11 Jun 2026 06:13:33 +0000</pubDate>
      <link>https://dev.to/adam2go/how-a-pure-python-jq-ended-up-40x-faster-than-the-c-bindings-cpb</link>
      <guid>https://dev.to/adam2go/how-a-pure-python-jq-ended-up-40x-faster-than-the-c-bindings-cpb</guid>
      <description>&lt;p&gt;I spent yesterday building &lt;a href="https://github.com/adam2go/purejq" rel="noopener noreferrer"&gt;purejq&lt;/a&gt;, a
pure-Python implementation of jq. I expected it to be the slow-but-portable
option. Then I benchmarked it against the &lt;code&gt;jq&lt;/code&gt; package on PyPI (the C
bindings everyone uses to run jq from Python) and got this, on a 100k-object
array, in-process:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;workload&lt;/th&gt;
&lt;th&gt;purejq&lt;/th&gt;
&lt;th&gt;jq PyPI (C bindings)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;field-access stream&lt;/td&gt;
&lt;td&gt;9 ms&lt;/td&gt;
&lt;td&gt;368 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;filter + count&lt;/td&gt;
&lt;td&gt;55 ms&lt;/td&gt;
&lt;td&gt;442 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;map + aggregate&lt;/td&gt;
&lt;td&gt;18 ms&lt;/td&gt;
&lt;td&gt;444 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;group_by&lt;/td&gt;
&lt;td&gt;112 ms&lt;/td&gt;
&lt;td&gt;704 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;transform + sort&lt;/td&gt;
&lt;td&gt;136 ms&lt;/td&gt;
&lt;td&gt;899 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pure Python, roughly 6-40x faster than the C extension. That number looked wrong to
me too, so before publishing anything I made the benchmark script verify
every output against the actual jq binary first (&lt;code&gt;tools/bench.py --verify&lt;/code&gt;),
re-ran everything as a median of 7 runs, and gave the bindings their best-case API.
The gap is real. Here's why.&lt;/p&gt;
&lt;h2&gt;The serialization tax&lt;/h2&gt;

&lt;p&gt;The C bindings wrap real jq, and real jq only speaks JSON. So every call
does this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your dicts -&amp;gt; JSON text -&amp;gt; C parser -&amp;gt; jq evaluates -&amp;gt; JSON text -&amp;gt; dicts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That round trip costs about 350-450 ms for 100k small objects on my
machine, before any actual filtering happens. You can see it in the numbers:
even a trivial field access (368 ms) pays the same ~400 ms floor that is baked into group_by's 704 ms.&lt;/p&gt;
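
&lt;p&gt;To get a feel for that floor without installing anything, here is a minimal sketch that times only the serialize-and-parse round trip with the standard library's &lt;code&gt;json&lt;/code&gt; module. It is a stand-in, not the bindings' real code path (they parse with jq's own C parser), and the data shape and function names are made up for illustration, so treat the result as an order-of-magnitude check rather than a reproduction of the table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import time


def json_round_trip(data):
    """Serialize to JSON text and parse it back, as a JSON-only engine requires."""
    return json.loads(json.dumps(data))


def time_round_trip(data, repeats=7):
    """Return the median wall-clock seconds over `repeats` round trips."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        json_round_trip(data)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


if __name__ == "__main__":
    # Hypothetical data shape; the benchmark's real objects may differ.
    data = [{"id": i, "team": "t%d" % (i % 10), "score": i * 0.5} for i in range(100_000)]
    print("round trip: %.0f ms" % (time_round_trip(data) * 1000))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Whatever this prints on your machine is time spent before a single filter runs, which is the cost an in-language interpreter never pays.&lt;/p&gt;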

&lt;p&gt;purejq skips the trip entirely. It compiles the jq program once into Python
closures and walks your dicts and lists directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;purejq&lt;/span&gt;

&lt;span class="n"&gt;prog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;purejq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;group_by(.team) | map({team: .[0].team, n: length})&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;prog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# operates on your objects, no serialization
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lesson generalizes beyond jq: when you embed a C library that has its
own data model, the marshaling boundary is often more expensive than the
work. An interpreter written &lt;em&gt;in&lt;/em&gt; your language gets to skip the boundary,
and that can buy back an order of magnitude.&lt;/p&gt;
&lt;h2&gt;Surprise number two: the CLI beats the jq binary on big files&lt;/h2&gt;

&lt;p&gt;This one I really didn't expect. End to end on a 93 MB file (1M objects),
parse + filter + output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;workload&lt;/th&gt;
&lt;th&gt;purejq CLI&lt;/th&gt;
&lt;th&gt;jq 1.8.1 binary&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;single lookup&lt;/td&gt;
&lt;td&gt;0.51 s&lt;/td&gt;
&lt;td&gt;1.68 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;filter + count&lt;/td&gt;
&lt;td&gt;1.08 s&lt;/td&gt;
&lt;td&gt;1.96 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;group_by&lt;/td&gt;
&lt;td&gt;2.32 s&lt;/td&gt;
&lt;td&gt;3.89 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No trick here either, just arithmetic: on large files, most of the wall-clock
time goes to parsing JSON, and CPython's C-backed &lt;code&gt;json&lt;/code&gt; module parses
at ~130 MB/s on my machine (orjson does ~220 MB/s, and purejq uses it when
installed). jq's built-in parser is slower than both. purejq's actual
filter evaluation is slower than jq's C engine, but it's sitting behind a
faster parser, and the parser dominates.&lt;/p&gt;
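
&lt;p&gt;A back-of-the-envelope version of that arithmetic, using only the file size and throughput figures quoted above. The helper name is made up for this sketch, and the result is a lower bound on parse time alone: real runs also pay for reading the file, evaluating the filter, and writing output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def parse_seconds(file_mb, parser_mb_per_s):
    """Lower-bound parse time: file size divided by parser throughput."""
    return file_mb / parser_mb_per_s


FILE_MB = 93  # size of the benchmark file quoted above

# Throughput figures are the ones measured on the author's machine.
for name, rate in (("json", 130), ("orjson", 220)):
    print("%s: ~%.2f s" % (name, parse_seconds(FILE_MB, rate)))
# json: ~0.72 s
# orjson: ~0.42 s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The 0.51 s single-lookup row sits between the two estimates, which would fit a run with orjson installed. That is an inference from the quoted figures, not something the benchmark output states.&lt;/p&gt;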

&lt;p&gt;To be fair to jq: on already-parsed streams in a shell pipeline, or small
inputs in a tight loop, the C binary still wins comfortably. If that's
your workload, keep using jq.&lt;/p&gt;
&lt;h2&gt;What keeps the Python side from being embarrassing&lt;/h2&gt;

&lt;p&gt;A few things mattered more than I expected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compile once, run many.&lt;/strong&gt; Programs become nested Python closures;
evaluation never touches the AST again.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static binding.&lt;/strong&gt; If a program never redefines &lt;code&gt;select&lt;/code&gt;, the call is
resolved at compile time instead of walking scopes at runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single-output fast paths.&lt;/strong&gt; Things like &lt;code&gt;.score * 2 + 1&lt;/code&gt; provably
yield exactly one value, so they compile to plain function calls instead
of generators. Object literals with constant keys skip the generator
product entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let C do the sorting.&lt;/strong&gt; When sort keys are uniformly strings or
numbers, &lt;code&gt;sort_by&lt;/code&gt;/&lt;code&gt;group_by&lt;/code&gt;/&lt;code&gt;unique&lt;/code&gt; fall through to Python's native
sort instead of a comparison callback. That one change was worth 5x on
sort-heavy workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPy for free.&lt;/strong&gt; Pure Python means PyPy just works: another 2-9x on
heavy workloads (map + aggregate drops from 18 ms to 2 ms).&lt;/li&gt;
&lt;/ul&gt;
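
&lt;p&gt;A toy illustration of the first, third, and fourth points, assuming a hand-built expression tree in place of a real jq parser. This is not purejq's code: the node names and &lt;code&gt;compile_expr&lt;/code&gt; are invented for the sketch, and it ignores jq details such as missing fields evaluating to null. It only shows the shape of the idea: compile once into nested closures, return plain values instead of generators for single-output expressions, and hand the compiled closure to Python's native sort as a key.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def compile_expr(node):
    """Turn a tiny expression tree into a plain closure (one item in, one value out)."""
    kind = node[0]
    if kind == "field":
        name = node[1]
        return lambda item: item[name]
    if kind == "const":
        value = node[1]
        return lambda item: value
    if kind in ("add", "mul"):
        # Children are compiled once, here; the returned closure never sees the tree.
        left = compile_expr(node[1])
        right = compile_expr(node[2])
        if kind == "add":
            return lambda item: left(item) + right(item)
        return lambda item: left(item) * right(item)
    raise ValueError("unknown node kind: %r" % (kind,))


# Hand-built tree for the jq expression `.score * 2 + 1` (no parser in this sketch).
TREE = ("add", ("mul", ("field", "score"), ("const", 2)), ("const", 1))
score_expr = compile_expr(TREE)

rows = [{"name": "a", "score": 3}, {"name": "b", "score": 1}]
print([score_expr(row) for row in rows])  # [7, 3]

# Uniform numeric keys: a key function lets the sort compare values natively.
print([row["name"] for row in sorted(rows, key=score_expr)])  # ['b', 'a']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The tree is walked exactly once, inside &lt;code&gt;compile_expr&lt;/code&gt;. After that, evaluating a row is just a chain of ordinary function calls.&lt;/p&gt;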
&lt;h2&gt;Trust, but verify&lt;/h2&gt;

&lt;p&gt;Claiming "it's jq" is easy, so the repo backs the claim up: it vendors jq's official test suite and
runs it in CI on CPython 3.9-3.14 and PyPy. 751 of 781 cases pass (96.2%),
and the 30 failures are listed in a file with reasons: no module system
yet, integers stay arbitrary-precision instead of rounding to doubles, and
a few differences in error-message wording.&lt;/p&gt;

&lt;p&gt;One more disclosure, since the commit history shows it anyway: I'm a
product manager, not a programmer, and I built this with Claude in a day.
My role was picking the target, insisting on jq's own test suite as the
acceptance bar, and being suspicious of every benchmark number until it
had a verification path. I can't review the code line by line; I can read
a conformance percentage and a &lt;code&gt;--verify&lt;/code&gt; output. Make of that what you
will.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;purejq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo: &lt;a href="https://github.com/adam2go/purejq" rel="noopener noreferrer"&gt;https://github.com/adam2go/purejq&lt;/a&gt;. Issues and PRs are welcome,
especially if you try it on Pyodide or PyPy.&lt;/p&gt;

</description>
      <category>python</category>
      <category>performance</category>
      <category>json</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
