<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tanay Kedia</title>
    <description>The latest articles on DEV Community by Tanay Kedia (@tanayk07).</description>
    <link>https://dev.to/tanayk07</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1358192%2Fa56439d7-f338-4a51-b229-a0715e4b7d99.jpeg</url>
      <title>DEV Community: Tanay Kedia</title>
      <link>https://dev.to/tanayk07</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tanayk07"/>
    <language>en</language>
    <item>
      <title>Your ORM is the bottleneck</title>
      <dc:creator>Tanay Kedia</dc:creator>
      <pubDate>Sun, 24 May 2026 08:14:07 +0000</pubDate>
      <link>https://dev.to/tanayk07/your-orm-is-the-bottleneck-11l2</link>
      <guid>https://dev.to/tanayk07/your-orm-is-the-bottleneck-11l2</guid>
      <description>&lt;p&gt;SQLite handles 88,000 writes per second on commodity hardware. My ORM caps at 3,800. PRAGMA tuning does not move that number.&lt;/p&gt;

&lt;p&gt;I learned this the hard way, over 9.4 hours of benchmarking. 11 SQLite configurations, 230 million rows, 37 measured datapoints. Across every config I threw at it, ORM throughput landed between 3,045 and 3,821 rows per second. A 26% spread on the thing every guide on the internet is busy tuning. Same workload, rewritten as raw &lt;code&gt;executemany&lt;/code&gt;, did 87,893.&lt;/p&gt;

&lt;p&gt;The database was fine. It was always fine.&lt;/p&gt;




&lt;h2&gt;
  
  
  The headline numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;10M rows&lt;/th&gt;
&lt;th&gt;50M rows&lt;/th&gt;
&lt;th&gt;p99 latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQLAlchemy ORM&lt;/td&gt;
&lt;td&gt;3,696 r/s (45 min)&lt;/td&gt;
&lt;td&gt;3,682 r/s (3.8 hrs)&lt;/td&gt;
&lt;td&gt;1,492 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw &lt;code&gt;executemany&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;87,893 r/s (1.9 min)&lt;/td&gt;
&lt;td&gt;65,742 r/s (12.7 min)&lt;/td&gt;
&lt;td&gt;478 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speedup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23.8×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;17.9×&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.1×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Zero errors across all runs. Every config. Every scale.&lt;/p&gt;

&lt;p&gt;What I actually ran:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;9.4 hours of continuous benchmarking&lt;/li&gt;
&lt;li&gt;230 million rows written across all runs&lt;/li&gt;
&lt;li&gt;11 configurations × 3 checkpoints + 2 paths × 2 scales = 37 datapoints&lt;/li&gt;
&lt;li&gt;Single host, NVMe storage, fresh database file per run&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Why I started running this
&lt;/h2&gt;

&lt;p&gt;SQLite is having a moment. Rails 8 &lt;a href="https://rubyonrails.org/2024/11/7/rails-8-no-paas-required" rel="noopener noreferrer"&gt;ships with it&lt;/a&gt; as a first-class production database. Kent C. Dodds &lt;a href="https://kentcdodds.com/blog/i-migrated-from-a-postgres-cluster-to-distributed-sqlite-with-litefs" rel="noopener noreferrer"&gt;moved off a Postgres cluster&lt;/a&gt; to run it. Expensify is doing &lt;a href="https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a-single-server" rel="noopener noreferrer"&gt;4 million queries per second&lt;/a&gt; on a single SQLite file. Cloudflare D1 runs it at the edge with &lt;a href="https://dev.to/whoffagents/cloudflare-d1-sqlite-at-the-edge-after-6-months-in-production-551j"&gt;8 ms P99 reads&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So I sat down to tune mine. WAL, mmap, cache size, sync mode, the usual PRAGMA folklore. I had a question I wanted answered, and it was not the one most blogs answer.&lt;/p&gt;

&lt;p&gt;Most blogs tune SQLite. I wanted to know whether any of that tuning matters when you write through an ORM.&lt;/p&gt;

&lt;p&gt;It does not. Not even a little.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I tested it on
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;8-core / 16-thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAM&lt;/td&gt;
&lt;td&gt;23.2 GB DDR&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;Samsung 990 EVO Plus 1TB NVMe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OS&lt;/td&gt;
&lt;td&gt;Linux 6.8.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;3.11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQLAlchemy&lt;/td&gt;
&lt;td&gt;2.0 (sync)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQLite&lt;/td&gt;
&lt;td&gt;WAL mode, QueuePool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema&lt;/td&gt;
&lt;td&gt;Single table, 11 columns, UUID primary key&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Everything ran on the same machine, same NVMe, same kernel. No Docker. No network. Local SQLite, period.&lt;/p&gt;




&lt;h2&gt;
  
  
  The two benchmarks
&lt;/h2&gt;

&lt;p&gt;Two harnesses, two questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Config sweep.&lt;/strong&gt; 11 SQLite configurations, 10M rows each, all through SQLAlchemy. Does PRAGMA tuning matter?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before / after.&lt;/strong&gt; Same workload, ORM path vs raw &lt;code&gt;executemany&lt;/code&gt;, at 10M and 50M. How much throughput does the ORM cost?&lt;/p&gt;

&lt;h3&gt;
  
  
  The ORM path
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Insert phase: bulk_save_objects
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bulk_save_objects&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;BenchRow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Upsert phase: insert().on_conflict_do_update()
&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BenchRow&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on_conflict_do_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;index_elements&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sub_entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bucket_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;set_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;update_cols&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The raw path
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# sqlite3.executemany via the dbapi connection. Skips SQLAlchemy entirely.
&lt;/span&gt;&lt;span class="n"&gt;raw_conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbapi_connection&lt;/span&gt;
&lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executemany&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO bench_rows (row_id, tenant_id, entity_id, ...) VALUES (?, ?, ?, ...)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rows_as_tuples&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;raw_conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both paths use the same generator, same schema, same target row count. The only difference is whether SQLAlchemy is in the picture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Schema
&lt;/h3&gt;

&lt;p&gt;A single 11-column table, designed to look like a real production table rather than a synthetic benchmark.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;row_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;String(36)&lt;/td&gt;
&lt;td&gt;UUID primary key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tenant_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;String(36), indexed&lt;/td&gt;
&lt;td&gt;Logical sharding key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;entity_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;td&gt;Natural-key component&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sub_entity_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;td&gt;Natural-key component&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bucket_index&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;td&gt;Natural-key component&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;measurement_x / y / z&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Float&lt;/td&gt;
&lt;td&gt;Numeric payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;category_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;td&gt;Categorical field&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;weight&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Float&lt;/td&gt;
&lt;td&gt;Probabilistic payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;td&gt;State machine field&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;created_at / updated_at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DateTime&lt;/td&gt;
&lt;td&gt;Tracking timestamps&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Unique constraint on &lt;code&gt;(tenant_id, entity_id, sub_entity_id, bucket_index)&lt;/code&gt;, which is the upsert conflict target. Row size lands around 500 bytes including the UUID.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data generation
&lt;/h3&gt;

&lt;p&gt;A streaming generator that yields chunks of unique rows with deterministic natural keys. Constant memory regardless of total scale. It never materialises the full row set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;streaming_chunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;row_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sub_entity_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bucket_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="c1"&gt;# ... numeric and categorical fields
&lt;/span&gt;            &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The determinism matters. Re-running the same generator produces identical natural keys, which forces the upsert path to actually exercise ON CONFLICT instead of trivially inserting fresh rows. Without this, you are benchmarking inserts and calling it an upsert benchmark.&lt;/p&gt;

&lt;h3&gt;
  
  
  Checkpoints and recovery
&lt;/h3&gt;

&lt;p&gt;Results saved at 3M, 5M, and 10M for the config sweep. 10M and 50M for the before / after. Intermediate JSON written after each segment, with &lt;code&gt;--resume&lt;/code&gt; to pick up where we left off. The 50M ORM run took 3.8 hours. The laptop suspended once. The resume capability paid for itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much compute went into this
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;Rows&lt;/th&gt;
&lt;th&gt;Datapoints&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Config sweep (11 configs × 10M)&lt;/td&gt;
&lt;td&gt;5.6 hours&lt;/td&gt;
&lt;td&gt;110M&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Before / after (10M + 50M, both modes)&lt;/td&gt;
&lt;td&gt;3.8 hours&lt;/td&gt;
&lt;td&gt;120M&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;9.4 hours&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;230M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;37&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Single host, single process at a time. Background &lt;code&gt;psutil&lt;/code&gt; sampler at 1Hz throughout. Each config got a freshly created database file via &lt;code&gt;tempfile.TemporaryDirectory()&lt;/code&gt;, so no run was polluted by any prior run.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark 1: the config sweep
&lt;/h2&gt;

&lt;p&gt;11 SQLite configurations, 10 million rows each, all written through SQLAlchemy.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;Chunk size&lt;/th&gt;
&lt;th&gt;Sync&lt;/th&gt;
&lt;th&gt;Cache&lt;/th&gt;
&lt;th&gt;Pool&lt;/th&gt;
&lt;th&gt;mmap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-4096&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_1000&lt;/td&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_5000&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_10000&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_25000&lt;/td&gt;
&lt;td&gt;25,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_50000&lt;/td&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;aggressive&lt;/td&gt;
&lt;td&gt;25,000&lt;/td&gt;
&lt;td&gt;OFF&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_3&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_5&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_8&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;NORMAL&lt;/td&gt;
&lt;td&gt;-64000&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;256MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Results at 10M rows
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;p99 latency&lt;/th&gt;
&lt;th&gt;Peak RSS&lt;/th&gt;
&lt;th&gt;Errors&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;chunk_1000&lt;/td&gt;
&lt;td&gt;3,821 r/s&lt;/td&gt;
&lt;td&gt;313 ms&lt;/td&gt;
&lt;td&gt;790 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;3,802 r/s&lt;/td&gt;
&lt;td&gt;1,418 ms&lt;/td&gt;
&lt;td&gt;195 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_5000&lt;/td&gt;
&lt;td&gt;3,673 r/s&lt;/td&gt;
&lt;td&gt;1,500 ms&lt;/td&gt;
&lt;td&gt;511 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_10000&lt;/td&gt;
&lt;td&gt;3,621 r/s&lt;/td&gt;
&lt;td&gt;2,887 ms&lt;/td&gt;
&lt;td&gt;610 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;3,576 r/s&lt;/td&gt;
&lt;td&gt;2,982 ms&lt;/td&gt;
&lt;td&gt;611 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_5&lt;/td&gt;
&lt;td&gt;3,574 r/s&lt;/td&gt;
&lt;td&gt;2,960 ms&lt;/td&gt;
&lt;td&gt;609 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_8&lt;/td&gt;
&lt;td&gt;3,573 r/s&lt;/td&gt;
&lt;td&gt;2,957 ms&lt;/td&gt;
&lt;td&gt;606 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_3&lt;/td&gt;
&lt;td&gt;3,445 r/s&lt;/td&gt;
&lt;td&gt;3,200 ms&lt;/td&gt;
&lt;td&gt;610 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_25000&lt;/td&gt;
&lt;td&gt;3,262 r/s&lt;/td&gt;
&lt;td&gt;8,490 ms&lt;/td&gt;
&lt;td&gt;629 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;aggressive&lt;/td&gt;
&lt;td&gt;3,243 r/s&lt;/td&gt;
&lt;td&gt;8,536 ms&lt;/td&gt;
&lt;td&gt;595 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_50000&lt;/td&gt;
&lt;td&gt;3,045 r/s&lt;/td&gt;
&lt;td&gt;20,508 ms&lt;/td&gt;
&lt;td&gt;633 MB&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Three things to notice.&lt;/p&gt;

&lt;p&gt;The first is the spread. Best to worst, 3,821 vs 3,045 rows per second. That is 26% across configs that range from &lt;code&gt;sync=OFF&lt;/code&gt; to default PRAGMAs, from 1K to 50K chunk sizes, from 3 to 8 pool connections. Whatever PRAGMA folklore you have read, on this workload, it is doing roughly nothing. The ORM is consuming the CPU budget before I/O becomes a factor.&lt;/p&gt;

&lt;p&gt;The second is &lt;code&gt;aggressive&lt;/code&gt;, which I configured with &lt;code&gt;sync=OFF&lt;/code&gt; and 25K chunks expecting it to win. It came second-from-last at 3,243 r/s. I sacrificed durability for nothing. Forward Email &lt;a href="https://forwardemail.net/en/blog/docs/sqlite-performance-optimization-pragma-chacha20-production-guide" rel="noopener noreferrer"&gt;found the same thing&lt;/a&gt;: &lt;code&gt;sync=OFF&lt;/code&gt; is sometimes slower than NORMAL because the WAL machinery is different, not just absent. Shivek Khurana &lt;a href="https://shivekkhurana.com/blog/sqlite-in-production/" rel="noopener noreferrer"&gt;calls the difference "not very significant"&lt;/a&gt;. It is not.&lt;/p&gt;

&lt;p&gt;The third is pool size. &lt;code&gt;pool_3&lt;/code&gt;, &lt;code&gt;pool_5&lt;/code&gt;, &lt;code&gt;pool_8&lt;/code&gt;: 3,445 / 3,574 / 3,573. Under 4% spread. SQLite is single-writer no matter how many connections you open. SQLAlchemy's QueuePool just serialises writes at the application layer, which is functionally identical to the &lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;single-writer architecture&lt;/a&gt; every production SQLite deployment converges on.&lt;/p&gt;

&lt;p&gt;Baseline wins on the throughput-per-resource ratio. 5K chunks, default PRAGMAs, pool of 5. 3,802 r/s using only 195 MB of RSS. &lt;code&gt;chunk_1000&lt;/code&gt; ekes out 0.5% more throughput but uses 4× the memory. Not a trade I am taking.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmark 2: ORM vs raw SQL
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;th&gt;p99&lt;/th&gt;
&lt;th&gt;Peak RSS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ORM&lt;/td&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;td&gt;3,696 r/s&lt;/td&gt;
&lt;td&gt;45.1 min&lt;/td&gt;
&lt;td&gt;1,492 ms&lt;/td&gt;
&lt;td&gt;177 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORM&lt;/td&gt;
&lt;td&gt;50M&lt;/td&gt;
&lt;td&gt;3,682 r/s&lt;/td&gt;
&lt;td&gt;226.3 min&lt;/td&gt;
&lt;td&gt;1,563 ms&lt;/td&gt;
&lt;td&gt;177 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw &lt;code&gt;executemany&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;td&gt;87,893 r/s&lt;/td&gt;
&lt;td&gt;1.9 min&lt;/td&gt;
&lt;td&gt;478 ms&lt;/td&gt;
&lt;td&gt;155 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Raw &lt;code&gt;executemany&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;50M&lt;/td&gt;
&lt;td&gt;65,742 r/s&lt;/td&gt;
&lt;td&gt;12.7 min&lt;/td&gt;
&lt;td&gt;853 ms&lt;/td&gt;
&lt;td&gt;188 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;23.8× at 10M. 17.9× at 50M. Three and a half hours of wall time turn into thirteen minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the ORM time goes
&lt;/h3&gt;

&lt;p&gt;Every row through SQLAlchemy goes through this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n6pgiii3livxn9q90qc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n6pgiii3livxn9q90qc.png" alt="ORM stack: 7 layers per row" width="269" height="1150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Raw &lt;code&gt;executemany&lt;/code&gt; does this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvjmijni1mpytjftgrzc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkvjmijni1mpytjftgrzc.png" alt="Raw executemany stack: 2 layers per row" width="274" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two layers, seven gone. None of the seven are free. Each one is a Python attribute access, a dict lookup, a function call, something the JIT cannot inline because of how SQLAlchemy is structured. On a one-shot insert it is invisible. On 10 million rows it costs 43 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scaling behaviour, both paths
&lt;/h3&gt;

&lt;p&gt;ORM throughput is flat. 3,696 r/s at 10M. 3,682 r/s at 50M. I/O doesn't get a vote because the ORM is busy.&lt;/p&gt;

&lt;p&gt;Raw throughput degrades at scale, which is interesting. 87,893 r/s at 10M, 65,742 r/s at 50M. That is a 25% drop. The database file grew from 3.8 GB to 19.5 GB, the B-tree got deeper, WAL checkpoints got more expensive. This is SQLite's actual I/O scaling curve. The ORM never gets close enough to it to see it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Chunk size controls latency, not throughput
&lt;/h2&gt;

&lt;p&gt;Of all the things I tuned, chunk size was the only one that mattered, and not for the reason I expected.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Chunk size&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;p99&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;3,821 r/s&lt;/td&gt;
&lt;td&gt;313 ms&lt;/td&gt;
&lt;td&gt;1.0×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;3,673 r/s&lt;/td&gt;
&lt;td&gt;1,500 ms&lt;/td&gt;
&lt;td&gt;4.8×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;3,621 r/s&lt;/td&gt;
&lt;td&gt;2,887 ms&lt;/td&gt;
&lt;td&gt;9.2×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;25,000&lt;/td&gt;
&lt;td&gt;3,262 r/s&lt;/td&gt;
&lt;td&gt;8,490 ms&lt;/td&gt;
&lt;td&gt;27.1×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;td&gt;3,045 r/s&lt;/td&gt;
&lt;td&gt;20,508 ms&lt;/td&gt;
&lt;td&gt;65.5×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;p99 scales linearly with chunk size. 50× larger chunks, 66× worse p99. Throughput moves 20%.&lt;/p&gt;

&lt;p&gt;Each chunk is one transaction. Larger transactions hold the write lock longer, block WAL checkpoints longer, allocate more memory. The SQLite write lock is exclusive. A 20-second transaction means every other writer waits 20 seconds.&lt;/p&gt;

&lt;p&gt;So: use the smallest chunks your throughput requirement allows. 1K to 5K is the sweet spot. You get ~3,800 r/s either way, and your p99 drops from 20 seconds to 300 milliseconds.&lt;/p&gt;




&lt;h2&gt;
  
  
  The full data nobody usually shows
&lt;/h2&gt;

&lt;p&gt;The 10M table is the endpoint. The story is the flatness, which you can only see if you look at how throughput evolved during the run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cumulative throughput at each checkpoint
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;@3M&lt;/th&gt;
&lt;th&gt;@5M&lt;/th&gt;
&lt;th&gt;@10M&lt;/th&gt;
&lt;th&gt;Δ (3M→10M)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;aggressive&lt;/td&gt;
&lt;td&gt;3,232&lt;/td&gt;
&lt;td&gt;3,219&lt;/td&gt;
&lt;td&gt;3,243&lt;/td&gt;
&lt;td&gt;+0.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;3,760&lt;/td&gt;
&lt;td&gt;3,761&lt;/td&gt;
&lt;td&gt;3,802&lt;/td&gt;
&lt;td&gt;+1.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_1000&lt;/td&gt;
&lt;td&gt;3,871&lt;/td&gt;
&lt;td&gt;3,854&lt;/td&gt;
&lt;td&gt;3,821&lt;/td&gt;
&lt;td&gt;-1.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_5000&lt;/td&gt;
&lt;td&gt;3,734&lt;/td&gt;
&lt;td&gt;3,715&lt;/td&gt;
&lt;td&gt;3,673&lt;/td&gt;
&lt;td&gt;-1.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_10000&lt;/td&gt;
&lt;td&gt;3,640&lt;/td&gt;
&lt;td&gt;3,625&lt;/td&gt;
&lt;td&gt;3,621&lt;/td&gt;
&lt;td&gt;-0.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_25000&lt;/td&gt;
&lt;td&gt;3,384&lt;/td&gt;
&lt;td&gt;3,349&lt;/td&gt;
&lt;td&gt;3,262&lt;/td&gt;
&lt;td&gt;-3.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_50000&lt;/td&gt;
&lt;td&gt;2,966&lt;/td&gt;
&lt;td&gt;3,009&lt;/td&gt;
&lt;td&gt;3,045&lt;/td&gt;
&lt;td&gt;+2.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;3,640&lt;/td&gt;
&lt;td&gt;3,605&lt;/td&gt;
&lt;td&gt;3,576&lt;/td&gt;
&lt;td&gt;-1.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_3&lt;/td&gt;
&lt;td&gt;3,472&lt;/td&gt;
&lt;td&gt;3,463&lt;/td&gt;
&lt;td&gt;3,445&lt;/td&gt;
&lt;td&gt;-0.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_5&lt;/td&gt;
&lt;td&gt;3,570&lt;/td&gt;
&lt;td&gt;3,577&lt;/td&gt;
&lt;td&gt;3,574&lt;/td&gt;
&lt;td&gt;+0.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_8&lt;/td&gt;
&lt;td&gt;3,598&lt;/td&gt;
&lt;td&gt;3,588&lt;/td&gt;
&lt;td&gt;3,573&lt;/td&gt;
&lt;td&gt;-0.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No config moved more than ±3.6% from 3M to 10M. Every single one is flat. Whatever overhead the ORM imposes at row one, it imposes at row ten million.&lt;/p&gt;

&lt;p&gt;This is not how databases normally behave. Most B-tree workloads degrade at scale because the tree gets deeper, fewer pages fit in cache, fsync takes longer. None of that mattered here. The ORM was dominating so completely that SQLite's actual scaling curve never showed up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Per-segment throughput
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;0→3M&lt;/th&gt;
&lt;th&gt;3M→5M&lt;/th&gt;
&lt;th&gt;5M→10M&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;3,760&lt;/td&gt;
&lt;td&gt;3,762&lt;/td&gt;
&lt;td&gt;3,844&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_1000&lt;/td&gt;
&lt;td&gt;3,871&lt;/td&gt;
&lt;td&gt;3,828&lt;/td&gt;
&lt;td&gt;3,789&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_50000&lt;/td&gt;
&lt;td&gt;2,966&lt;/td&gt;
&lt;td&gt;3,076&lt;/td&gt;
&lt;td&gt;3,081&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 5M→10M segment is sometimes faster than 0→3M. Filesystem cache is warm by then, no cold-start cost. The ORM is so far from being I/O-bound that even SQLite's I/O improvements show up positively.&lt;/p&gt;

&lt;h3&gt;
  
  
  p99 at each checkpoint
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;@3M&lt;/th&gt;
&lt;th&gt;@5M&lt;/th&gt;
&lt;th&gt;@10M&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;chunk_1000&lt;/td&gt;
&lt;td&gt;312 ms&lt;/td&gt;
&lt;td&gt;312 ms&lt;/td&gt;
&lt;td&gt;313 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;1,403 ms&lt;/td&gt;
&lt;td&gt;1,410 ms&lt;/td&gt;
&lt;td&gt;1,418 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_5000&lt;/td&gt;
&lt;td&gt;1,463 ms&lt;/td&gt;
&lt;td&gt;1,536 ms&lt;/td&gt;
&lt;td&gt;1,500 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_5&lt;/td&gt;
&lt;td&gt;3,005 ms&lt;/td&gt;
&lt;td&gt;2,847 ms&lt;/td&gt;
&lt;td&gt;2,960 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_10000&lt;/td&gt;
&lt;td&gt;2,887 ms&lt;/td&gt;
&lt;td&gt;2,887 ms&lt;/td&gt;
&lt;td&gt;2,887 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;3,039 ms&lt;/td&gt;
&lt;td&gt;2,903 ms&lt;/td&gt;
&lt;td&gt;2,982 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_25000&lt;/td&gt;
&lt;td&gt;7,529 ms&lt;/td&gt;
&lt;td&gt;7,710 ms&lt;/td&gt;
&lt;td&gt;8,490 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;aggressive&lt;/td&gt;
&lt;td&gt;8,764 ms&lt;/td&gt;
&lt;td&gt;8,536 ms&lt;/td&gt;
&lt;td&gt;8,536 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chunk_50000&lt;/td&gt;
&lt;td&gt;20,647 ms&lt;/td&gt;
&lt;td&gt;16,492 ms&lt;/td&gt;
&lt;td&gt;20,508 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;p99 is flat too. Whatever your p99 looks like at 3M, that is what it looks like at 10M. Your latency budget is set by your chunk size, full stop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Before / after with every field
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;ORM @ 10M&lt;/th&gt;
&lt;th&gt;ORM @ 50M&lt;/th&gt;
&lt;th&gt;Raw @ 10M&lt;/th&gt;
&lt;th&gt;Raw @ 50M&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;3,696 r/s&lt;/td&gt;
&lt;td&gt;3,682 r/s&lt;/td&gt;
&lt;td&gt;87,893 r/s&lt;/td&gt;
&lt;td&gt;65,742 r/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Duration&lt;/td&gt;
&lt;td&gt;2,705 s&lt;/td&gt;
&lt;td&gt;13,581 s&lt;/td&gt;
&lt;td&gt;114 s&lt;/td&gt;
&lt;td&gt;761 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch count&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;td&gt;8,000&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch p50&lt;/td&gt;
&lt;td&gt;1,320 ms&lt;/td&gt;
&lt;td&gt;1,324 ms&lt;/td&gt;
&lt;td&gt;424 ms&lt;/td&gt;
&lt;td&gt;652 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch p95&lt;/td&gt;
&lt;td&gt;1,424 ms&lt;/td&gt;
&lt;td&gt;1,434 ms&lt;/td&gt;
&lt;td&gt;470 ms&lt;/td&gt;
&lt;td&gt;830 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch p99&lt;/td&gt;
&lt;td&gt;1,492 ms&lt;/td&gt;
&lt;td&gt;1,563 ms&lt;/td&gt;
&lt;td&gt;478 ms&lt;/td&gt;
&lt;td&gt;853 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch avg&lt;/td&gt;
&lt;td&gt;1,324 ms&lt;/td&gt;
&lt;td&gt;1,330 ms&lt;/td&gt;
&lt;td&gt;424 ms&lt;/td&gt;
&lt;td&gt;659 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Peak RSS&lt;/td&gt;
&lt;td&gt;177 MB&lt;/td&gt;
&lt;td&gt;177 MB&lt;/td&gt;
&lt;td&gt;155 MB&lt;/td&gt;
&lt;td&gt;188 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DB size&lt;/td&gt;
&lt;td&gt;3.5 GB&lt;/td&gt;
&lt;td&gt;17.8 GB&lt;/td&gt;
&lt;td&gt;3.9 GB&lt;/td&gt;
&lt;td&gt;19.5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Segment throughput (10M→50M)&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;3,678 r/s&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;61,846 r/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things jump out.&lt;/p&gt;

&lt;p&gt;ORM memory is flat at 177 MB regardless of scale. We're streaming through chunks, no accumulation. The 5× DB size growth costs zero memory. The streaming approach is validated.&lt;/p&gt;

&lt;p&gt;Raw memory grows modestly, 155 to 188 MB. Larger chunks (50K vs 5K) hold larger tuple lists briefly. Trivial.&lt;/p&gt;

&lt;p&gt;The raw 10M→50M segment ran at 61,846 r/s versus 87,893 r/s for the first 10M. That 30% drop is SQLite's actual I/O cost showing through: B-tree depth, page splits, WAL checkpoint time. Completely invisible behind the ORM ceiling, which was at 3,700 r/s either way.&lt;/p&gt;

&lt;p&gt;ORM p99 is 3.1× worse than raw at 10M and gets worse with scale (1,492 → 1,563 ms versus 478 → 853 ms). Even the ORM's own baseline cannot keep p99 stable.&lt;/p&gt;

&lt;p&gt;DB sizes differ slightly between paths (3,518 MB ORM vs 3,851 MB raw at 10M). Both are correct. Page-fill heuristics and WAL checkpoint timing diverge slightly. Functionally identical data.&lt;/p&gt;




&lt;h2&gt;
  
  
  How this lines up with the rest of the internet
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Raw SQLite write throughput
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://marending.dev/notes/sqlite-benchmarks/" rel="noopener noreferrer"&gt;Marending (2024)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;M1 Mac&lt;/td&gt;
&lt;td&gt;113,684 w/s&lt;/td&gt;
&lt;td&gt;WAL + sync=NORMAL, mixed 80/20 hit 197,012 ops/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://andersmurphy.com/2025/12/02/100000-tps-over-a-billion-rows-the-unreasonable-effectiveness-of-sqlite.html" rel="noopener noreferrer"&gt;Anders Murphy (2025)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;M1 Pro 16GB&lt;/td&gt;
&lt;td&gt;121,922 TPS&lt;/td&gt;
&lt;td&gt;Dynamic batching, 1B rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://andersmurphy.com/2025/12/02/100000-tps-over-a-billion-rows-the-unreasonable-effectiveness-of-sqlite.html" rel="noopener noreferrer"&gt;Anders Murphy (2025)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;M1 Pro 16GB&lt;/td&gt;
&lt;td&gt;44,096 TPS&lt;/td&gt;
&lt;td&gt;No batching, 1B rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Our data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Linux x86 NVMe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87,893 r/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Python executemany @ 10M, UUID PK + 4-col upsert&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://marending.dev/notes/sqlite-benchmarks/" rel="noopener noreferrer"&gt;Marending&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Linux x86 Hetzner CPX31&lt;/td&gt;
&lt;td&gt;80,145 w/s&lt;/td&gt;
&lt;td&gt;WAL + sync=NORMAL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://tenthousandmeters.com/blog/sqlite-concurrent-writes-and-database-is-locked-errors/" rel="noopener noreferrer"&gt;tenthousandmeters&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Not specified&lt;/td&gt;
&lt;td&gt;72,769 ops/s&lt;/td&gt;
&lt;td&gt;1KB records, single-threaded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Our data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Linux x86 NVMe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;65,742 r/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Python executemany @ 50M&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;Evan Schwartz&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Not specified&lt;/td&gt;
&lt;td&gt;60,061 r/s&lt;/td&gt;
&lt;td&gt;Single-writer connection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://marending.dev/notes/sqlite-benchmarks/" rel="noopener noreferrer"&gt;Marending&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Linux ARM Hetzner CAX31&lt;/td&gt;
&lt;td&gt;46,512 w/s&lt;/td&gt;
&lt;td&gt;WAL + sync=NORMAL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://shivekkhurana.com/blog/sqlite-in-production/" rel="noopener noreferrer"&gt;Shivek Khurana&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;i9 MacBook 32GB&lt;/td&gt;
&lt;td&gt;15,576 w/s&lt;/td&gt;
&lt;td&gt;128 workers, WAL mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://forwardemail.net/en/blog/docs/sqlite-performance-optimization-pragma-chacha20-production-guide" rel="noopener noreferrer"&gt;Forward Email&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Node.js v20&lt;/td&gt;
&lt;td&gt;11,800 inserts/s&lt;/td&gt;
&lt;td&gt;wal_autocheckpoint=1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://forwardemail.net/en/blog/docs/sqlite-performance-optimization-pragma-chacha20-production-guide" rel="noopener noreferrer"&gt;Forward Email&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Node.js v20&lt;/td&gt;
&lt;td&gt;10,548 inserts/s&lt;/td&gt;
&lt;td&gt;Production baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;My raw numbers sit in the middle. 65K to 88K rows per second on commodity NVMe through Python is what the ecosystem reports for similar hardware. Nothing exotic going on.&lt;/p&gt;

&lt;h3&gt;
  
  
  ORM overhead across stacks
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Stack&lt;/th&gt;
&lt;th&gt;Raw&lt;/th&gt;
&lt;th&gt;ORM&lt;/th&gt;
&lt;th&gt;Overhead&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Our data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Python/SQLAlchemy 2.0/SQLite @ 10M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;87,893&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3,696&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23.8×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Our data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Python/SQLAlchemy 2.0/SQLite @ 50M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;65,742&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3,682&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;17.9×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://tutorials.technology/tutorials/Fast-bulk-insert-with-sqlalchemy.html" rel="noopener noreferrer"&gt;SQLAlchemy bulk benchmarks&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python/SQLAlchemy/PG&lt;/td&gt;
&lt;td&gt;Core insert&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;session.add()&lt;/code&gt; loop&lt;/td&gt;
&lt;td&gt;40×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://tutorials.technology/tutorials/Fast-bulk-insert-with-sqlalchemy.html" rel="noopener noreferrer"&gt;Same&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python/SQLAlchemy/PG&lt;/td&gt;
&lt;td&gt;Core insert&lt;/td&gt;
&lt;td&gt;&lt;code&gt;bulk_save_objects&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://tutorials.technology/tutorials/Fast-bulk-insert-with-sqlalchemy.html" rel="noopener noreferrer"&gt;Same&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python/SQLAlchemy/PG&lt;/td&gt;
&lt;td&gt;PG &lt;code&gt;COPY&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;session.add()&lt;/code&gt; loop&lt;/td&gt;
&lt;td&gt;240×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;Evan Schwartz&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Rust/sqlx&lt;/td&gt;
&lt;td&gt;60,061&lt;/td&gt;
&lt;td&gt;2,586 (50-conn pool)&lt;/td&gt;
&lt;td&gt;23×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://remusao.github.io/posts/few-tips-sqlite-perf.html" rel="noopener noreferrer"&gt;remusao&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Python/sqlite3&lt;/td&gt;
&lt;td&gt;625K r/s&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;execute&lt;/code&gt; loop&lt;/td&gt;
&lt;td&gt;1.7×&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 17–40× tax on bulk operations through an ORM is consistent across stacks. The 240× COPY gap shows what happens when you bypass both the ORM and the Python driver. Worth flagging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production deployments at scale
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company&lt;/th&gt;
&lt;th&gt;Scale&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Notable choice&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a-single-server" rel="noopener noreferrer"&gt;Expensify&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;4M QPS, 10B rows&lt;/td&gt;
&lt;td&gt;Custom Bedrock layer, bare metal 192 cores&lt;/td&gt;
&lt;td&gt;Modified SQLite (disabled POSIX locks)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://rubyonrails.org/2024/11/7/rails-8-no-paas-required" rel="noopener noreferrer"&gt;37signals (ONCE)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1000s of installs&lt;/td&gt;
&lt;td&gt;Per-customer SQLite, Rails 8&lt;/td&gt;
&lt;td&gt;Solid adapters (cache, queue, cable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://kentcdodds.com/blog/i-migrated-from-a-postgres-cluster-to-distributed-sqlite-with-litefs" rel="noopener noreferrer"&gt;Kent C. Dodds&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;6 regions&lt;/td&gt;
&lt;td&gt;LiteFS on Fly.io&lt;/td&gt;
&lt;td&gt;Postgres cluster → SQLite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/whoffagents/cloudflare-d1-sqlite-at-the-edge-after-6-months-in-production-551j"&gt;Cloudflare D1&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Edge, global&lt;/td&gt;
&lt;td&gt;SQLite + Workers&lt;/td&gt;
&lt;td&gt;8 ms P99 reads, 500–2K writes/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://turso.tech/blog/local-first-cloud-connected-sqlite-with-turso-embedded-replicas" rel="noopener noreferrer"&gt;Turso&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Embedded replicas&lt;/td&gt;
&lt;td&gt;SQLite + libSQL&lt;/td&gt;
&lt;td&gt;624 µs reads, 40 µs connections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://fly.io/blog/all-in-on-sqlite-litestream/" rel="noopener noreferrer"&gt;Litestream / Ben Johnson&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Sub-ms queries&lt;/td&gt;
&lt;td&gt;SQLite + WAL replication to S3&lt;/td&gt;
&lt;td&gt;10–20 µs per query, 50–100× faster than intra-region Postgres&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://news.ycombinator.com/item?id=39955288" rel="noopener noreferrer"&gt;extensionpay.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;~120M req/month&lt;/td&gt;
&lt;td&gt;$14 DigitalOcean droplet&lt;/td&gt;
&lt;td&gt;3+ years on SQLite + Litestream to B2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://news.ycombinator.com/item?id=39955288" rel="noopener noreferrer"&gt;Glench (HN)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Production SaaS&lt;/td&gt;
&lt;td&gt;Single $14 droplet&lt;/td&gt;
&lt;td&gt;Memory mapping was biggest perf gain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://news.ycombinator.com/item?id=39955288" rel="noopener noreferrer"&gt;tazu (HN)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Mid-six-figure SaaS&lt;/td&gt;
&lt;td&gt;95% reads / 5% writes&lt;/td&gt;
&lt;td&gt;Separate reader pool + single writer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://news.ycombinator.com/item?id=39955288" rel="noopener noreferrer"&gt;hruk (HN)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8-figure ARR&lt;/td&gt;
&lt;td&gt;SQLite + Litestream on EC2&lt;/td&gt;
&lt;td&gt;~250 µs insert latency on EBS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some of these are extreme. Expensify is doing 4 million QPS on a custom SQLite fork running on a single $30K box. Most are not. Glench is doing 120 million requests a month on a $14 droplet. The interesting thing is the architectural convergence, which I'll come back to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sync mode, cross-referenced
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;sync=OFF&lt;/th&gt;
&lt;th&gt;sync=NORMAL&lt;/th&gt;
&lt;th&gt;sync=FULL/EXTRA&lt;/th&gt;
&lt;th&gt;Conclusion&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Our data (aggressive vs baseline)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3,243 r/s&lt;/td&gt;
&lt;td&gt;3,802 r/s&lt;/td&gt;
&lt;td&gt;not measured&lt;/td&gt;
&lt;td&gt;NORMAL wins via ORM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://forwardemail.net/en/blog/docs/sqlite-performance-optimization-pragma-chacha20-production-guide" rel="noopener noreferrer"&gt;Forward Email (Node.js)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;10,017&lt;/td&gt;
&lt;td&gt;10,548&lt;/td&gt;
&lt;td&gt;3,495 (EXTRA)&lt;/td&gt;
&lt;td&gt;OFF slower than NORMAL; EXTRA 3× slower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://tenthousandmeters.com/blog/sqlite-concurrent-writes-and-database-is-locked-errors/" rel="noopener noreferrer"&gt;tenthousandmeters&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;not measured&lt;/td&gt;
&lt;td&gt;72,769&lt;/td&gt;
&lt;td&gt;29,000 (FULL)&lt;/td&gt;
&lt;td&gt;NORMAL 2.5× faster than FULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://shivekkhurana.com/blog/sqlite-in-production/" rel="noopener noreferrer"&gt;Shivek Khurana&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;not measured&lt;/td&gt;
&lt;td&gt;"not significant"&lt;/td&gt;
&lt;td&gt;"not significant"&lt;/td&gt;
&lt;td&gt;-6% to +10% variance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Consensus: NORMAL. OFF saves nothing meaningful. FULL and EXTRA can be 2 to 3× slower. The folklore that OFF is a free win for performance is just wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection pool architecture
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Note&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Our data (pool_3 / pool_5 / pool_8)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;QueuePool 3–8&lt;/td&gt;
&lt;td&gt;3,445 / 3,574 / 3,573 r/s&lt;/td&gt;
&lt;td&gt;&amp;lt;4% spread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;Evan Schwartz&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;50-conn pool&lt;/td&gt;
&lt;td&gt;2,586 r/s&lt;/td&gt;
&lt;td&gt;p99 = 182 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;Evan Schwartz&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Single writer&lt;/td&gt;
&lt;td&gt;60,061 r/s&lt;/td&gt;
&lt;td&gt;23× faster, p99 = 82 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://fractaledmind.com/2024/04/15/sqlite-on-rails-the-how-and-why-of-optimal-performance/" rel="noopener noreferrer"&gt;Stephen Margheim&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Reader pool + IMMEDIATE writer&lt;/td&gt;
&lt;td&gt;Zero errors up to 16 concurrent&lt;/td&gt;
&lt;td&gt;Default DEFERRED fails at 4+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://tenthousandmeters.com/blog/sqlite-concurrent-writes-and-database-is-locked-errors/" rel="noopener noreferrer"&gt;tenthousandmeters&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;App-level mutex&lt;/td&gt;
&lt;td&gt;56K–66K r/s stable&lt;/td&gt;
&lt;td&gt;Holds up to 256 threads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Everyone arrives at the same answer eventually: serialise writes at the application layer. Whether that's QueuePool, an app mutex, or a single-writer connection, the architecture is identical in spirit.&lt;/p&gt;




&lt;h2&gt;
  
  
  The production config everyone agrees on
&lt;/h2&gt;

&lt;p&gt;If you cross-reference &lt;a href="https://oneuptime.com/blog/post/2026-02-02-sqlite-production-setup/view" rel="noopener noreferrer"&gt;OneUptime&lt;/a&gt;, &lt;a href="https://forwardemail.net/en/blog/docs/sqlite-performance-optimization-pragma-chacha20-production-guide" rel="noopener noreferrer"&gt;Forward Email&lt;/a&gt;, &lt;a href="https://shivekkhurana.com/blog/sqlite-in-production/" rel="noopener noreferrer"&gt;Shivek Khurana&lt;/a&gt;, &lt;a href="https://fractaledmind.com/2024/04/15/sqlite-on-rails-the-how-and-why-of-optimal-performance/" rel="noopener noreferrer"&gt;Stephen Margheim&lt;/a&gt;, the &lt;a href="https://sqlite.org/wal.html" rel="noopener noreferrer"&gt;SQLite docs&lt;/a&gt;, and my own data, the config converges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;journal_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;          &lt;span class="c1"&gt;-- Concurrent reads + writes&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;synchronous&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NORMAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;-- Safe for WAL mode, 2-13% faster than FULL&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;busy_timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="c1"&gt;-- 5s minimum; 30s for high-contention apps&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;cache_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;64000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="c1"&gt;-- 64MB; diminishing returns beyond this&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;mmap_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;268435456&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;-- 256MB; helps read-heavy workloads&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;temp_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MEMORY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;         &lt;span class="c1"&gt;-- Faster temp tables (watch RSS on big VACUUMs)&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;foreign_keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;-- Data integrity&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;auto_vacuum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;INCREMENTAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;-- Reclaim space without full rebuild&lt;/span&gt;
&lt;span class="n"&gt;PRAGMA&lt;/span&gt; &lt;span class="n"&gt;wal_autocheckpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;-- Default, ~4MB WAL before checkpoint&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connection architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkoz9w7zvp9ub1m8ppvf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqkoz9w7zvp9ub1m8ppvf.png" alt="Single-writer plus multi-reader" width="612" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every production deployment I looked at converged on single-writer plus multi-reader. SQLAlchemy's QueuePool with &lt;code&gt;pool_size=5&lt;/code&gt; does the same thing in practice, which is why I saw zero errors across 110 million rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does not move the needle when you're using an ORM
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PRAGMA&lt;/th&gt;
&lt;th&gt;Effect on ORM throughput&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;synchronous = OFF vs NORMAL&lt;/td&gt;
&lt;td&gt;&amp;lt; 5%&lt;/td&gt;
&lt;td&gt;ORM overhead dominates I/O savings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cache_size 4MB vs 64MB&lt;/td&gt;
&lt;td&gt;&amp;lt; 3%&lt;/td&gt;
&lt;td&gt;B-tree lookups cheap vs Python object creation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mmap_size 0 vs 256MB&lt;/td&gt;
&lt;td&gt;&amp;lt; 2%&lt;/td&gt;
&lt;td&gt;Reads are fast, writes are ORM-bound&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool_size 3 vs 8&lt;/td&gt;
&lt;td&gt;&amp;lt; 4%&lt;/td&gt;
&lt;td&gt;Single-writer means pool size is irrelevant for writes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you skipped to this table, here is the summary: nothing on the left changes the right.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to throw the ORM out
&lt;/h2&gt;

&lt;p&gt;The ORM is fine for most things. CRUD, reads, validation, relationship traversal, normal application work. The fast path is for the hot bulk routes. Use this decision tree:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8x1qgr5bypgwothtoln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl8x1qgr5bypgwothtoln.png" alt="When to throw the ORM out" width="800" height="796"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;ORM&lt;/th&gt;
&lt;th&gt;Raw SQL&lt;/th&gt;
&lt;th&gt;What to do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRUD, &amp;lt; 1K rows&lt;/td&gt;
&lt;td&gt;Good enough&lt;/td&gt;
&lt;td&gt;Premature&lt;/td&gt;
&lt;td&gt;ORM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bulk load 10K–1M rows&lt;/td&gt;
&lt;td&gt;2–5 min per million&lt;/td&gt;
&lt;td&gt;7 sec per million&lt;/td&gt;
&lt;td&gt;Raw &lt;code&gt;executemany&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bulk load 1M–100M rows&lt;/td&gt;
&lt;td&gt;45 min per 10M&lt;/td&gt;
&lt;td&gt;1.9 min per 10M&lt;/td&gt;
&lt;td&gt;Raw &lt;code&gt;executemany&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sustained &amp;gt; 1K r/s ingest&lt;/td&gt;
&lt;td&gt;Ceiling at 3.8K&lt;/td&gt;
&lt;td&gt;Headroom to 88K&lt;/td&gt;
&lt;td&gt;Raw &lt;code&gt;executemany&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upsert-heavy workloads&lt;/td&gt;
&lt;td&gt;Works but slow&lt;/td&gt;
&lt;td&gt;18× faster&lt;/td&gt;
&lt;td&gt;Raw &lt;code&gt;executemany&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Hybrid repositories
&lt;/h3&gt;

&lt;p&gt;The pattern I now use in production: ORM for normal CRUD, a raw fast path for bulk.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BenchRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;BenchRow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Single-row CRUD. ORM is fine.
&lt;/span&gt;        &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BenchRow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;bulk_insert_turbo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Iterable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Bulk path. 23× faster than the ORM version.
&lt;/span&gt;        &lt;span class="n"&gt;raw_conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dbapi_connection&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;tuples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_dict_to_tuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executemany&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;INSERT_SQL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tuples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;raw_conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tuples&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ORM for correctness. Raw SQL for throughput. Both in the same repository, both committed to the same session, both touching the same tables.&lt;/p&gt;




&lt;h2&gt;
  
  
  Six things I did not expect
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;temp_store = MEMORY&lt;/code&gt; can be slower than disk
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://forwardemail.net/en/blog/docs/sqlite-performance-optimization-pragma-chacha20-production-guide" rel="noopener noreferrer"&gt;Forward Email's measurements&lt;/a&gt; showed disk-based temp storage beating memory. Large VACUUM operations can chew through 10+ GB with memory temp storage and either OOM or thrash swap. The folklore that "memory is faster" is not always true. Measure.&lt;/p&gt;

&lt;h3&gt;
  
  
  SQLite's default busy handler uses exponential backoff, and it is bad
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://fractaledmind.com/2024/04/15/sqlite-on-rails-the-how-and-why-of-optimal-performance/" rel="noopener noreferrer"&gt;Stephen Margheim showed&lt;/a&gt; that uniform 1ms retry intervals beat SQLite's built-in exponential backoff. P99.99 drops from seconds to ~500 ms. The default backoff penalises long-waiting queries, which is the wrong end of the distribution to penalise.&lt;/p&gt;

&lt;h3&gt;
  
  
  WAL is slower than DELETE at low concurrency
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://shivekkhurana.com/blog/sqlite-in-production/" rel="noopener noreferrer"&gt;Khurana's data&lt;/a&gt; shows WAL is 43% slower than DELETE at 1 worker. It only breaks even at 4 workers and starts winning at 8. If you have a single-threaded batch job, the rollback journal is faster. Counterintuitive, but the WAL machinery has overhead you only amortise across concurrent operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  A 50-connection pool is 23× slower than a single writer
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;Evan Schwartz measured 2,586 vs 60,061 r/s&lt;/a&gt;. The 50-connection pool creates pseudo-serialisation with scheduling overhead. The single writer just goes. Same throughput pattern as ORM, different layer: software making sure writers don't collide, but doing it badly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Node.js version matters more than SQLite config
&lt;/h3&gt;

&lt;p&gt;Forward Email reported Node v24 reads were 57% slower than v20 on the same database with the same PRAGMAs. The application layer dominates so completely that a runtime upgrade can dwarf years of database tuning. I cannot help feeling vindicated by this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Write degradation at 10M–100M rows is real
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=39955288" rel="noopener noreferrer"&gt;Multiple HN reports&lt;/a&gt; mention writes degrading once tables hit 8–9 figures. My raw numbers show it: 87,893 r/s at 10M, 65,742 r/s at 50M, a 25% drop driven by B-tree depth, page splits, and WAL checkpoint cost. Through the ORM, you never see it. The ORM is too slow to reach the curve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Reproducing this
&lt;/h2&gt;

&lt;p&gt;If you want to run the same benchmarks, the &lt;a href="https://github.com/TanayK07/sqlite-orm-bench" rel="noopener noreferrer"&gt;repo is open source&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/TanayK07/sqlite-orm-bench.git
&lt;span class="nb"&gt;cd &lt;/span&gt;sqlite-orm-bench
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Smoke test, ~5 seconds&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; sqlite_bench.before_after &lt;span class="nt"&gt;--scales&lt;/span&gt; 10K &lt;span class="nt"&gt;--mode&lt;/span&gt; both

&lt;span class="c"&gt;# ORM vs raw at 10M, ~50 minutes&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; sqlite_bench.before_after &lt;span class="nt"&gt;--scales&lt;/span&gt; 10M &lt;span class="nt"&gt;--mode&lt;/span&gt; both

&lt;span class="c"&gt;# Full 11-config sweep at 10M, ~5.5 hours&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; sqlite_bench.scale_benchmark &lt;span class="nt"&gt;--scales&lt;/span&gt; 3M,5M,10M &lt;span class="nt"&gt;--configs&lt;/span&gt; all

&lt;span class="c"&gt;# Pick up after a crash&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; sqlite_bench.scale_benchmark &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scales&lt;/span&gt; 3M,5M,10M &lt;span class="nt"&gt;--configs&lt;/span&gt; all &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resume&lt;/span&gt; results/scale/results.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Requirements: Python 3.11+, SQLAlchemy 2.0+, NVMe storage, 8+ GB RAM, Linux.&lt;/p&gt;

&lt;p&gt;The harness measures throughput cumulative and per-segment, p50/p95/p99 per batch, peak RSS, DB and WAL sizes, and error counts. Results go to JSON with &lt;code&gt;--resume&lt;/code&gt; support, because long runs crash and the laptop will close its lid.&lt;/p&gt;

&lt;p&gt;If you run it on different hardware, please open an issue. I'm collecting a cross-hardware table.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I would do differently if I were starting today
&lt;/h2&gt;

&lt;p&gt;Six conclusions, all from the data above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ORM is the bottleneck. Not SQLite.&lt;/strong&gt; Across 11 configurations, my ORM throughput never broke 3,821 r/s. Raw &lt;code&gt;executemany&lt;/code&gt; on the same hardware hit 87,893. SQLite can absorb 23× more writes than my ORM can produce. Hours spent tuning PRAGMAs against an ORM-bound workload are hours you do not get back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PRAGMA tuning is irrelevant for ORM-bound workloads.&lt;/strong&gt; &lt;code&gt;sync=OFF&lt;/code&gt;, 64 MB cache, 256 MB mmap, 8-connection pool, none of it moves the needle more than 26%. The ORM's per-row work (object creation, attribute instrumentation, identity map, SQL compilation) consumes the CPU budget before SQLite gets a chance to be slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chunk size is the only tunable that matters for latency.&lt;/strong&gt; p99 scales 66× across chunk sizes (313 ms at 1K, 20,508 ms at 50K), and throughput stays within 20%. One knob. Turn it down. 1K to 5K chunks gives you predictable latency without giving up throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The baseline config wins.&lt;/strong&gt; &lt;code&gt;sync=NORMAL&lt;/code&gt;, default cache, &lt;code&gt;pool=5&lt;/code&gt;, &lt;code&gt;chunk=5000&lt;/code&gt;. No exotic PRAGMAs needed. This matches the production consensus across OneUptime, Forward Email, Litestream, Rails 8, and SQLite's own documentation. It is also what I would have shipped if I had skipped the benchmarks entirely. Annoying, but useful to know.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Raw SQL degrades gracefully at scale.&lt;/strong&gt; 87,893 r/s at 10M, 65,742 r/s at 50M, a 25% drop. This is SQLite's actual I/O scaling curve: B-tree depth, page splits, WAL checkpoints. Real, predictable, manageable. The ORM masks it entirely because the ORM never gets close to the I/O boundary.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;QueuePool eliminates concurrency errors.&lt;/strong&gt; Zero errors across 110 million rows, 11 configurations, 2 paths, 2 scales. QueuePool serialises writes at the application layer, matching the &lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;single-writer pattern&lt;/a&gt; every production deployment converges on. The fact that I never saw a &lt;code&gt;database is locked&lt;/code&gt; is itself a result.&lt;/p&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Production case studies
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Expensify&lt;/strong&gt;, &lt;a href="https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a-single-server" rel="noopener noreferrer"&gt;Scaling SQLite to 4M QPS on a Single Server&lt;/a&gt;. 10B rows, 192 cores, custom Bedrock layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ben Johnson / Litestream&lt;/strong&gt;, &lt;a href="https://fly.io/blog/all-in-on-sqlite-litestream/" rel="noopener noreferrer"&gt;All-In on Server-Side SQLite&lt;/a&gt;. 10–20 µs per query, 50–100× faster than intra-region Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DHH / 37signals&lt;/strong&gt;, &lt;a href="https://rubyonrails.org/2024/11/7/rails-8-no-paas-required" rel="noopener noreferrer"&gt;Rails 8: No PaaS Required&lt;/a&gt;. SQLite as Rails 8 default, per-customer DB pattern.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kent C. Dodds&lt;/strong&gt;, &lt;a href="https://www.epicweb.dev/why-you-should-probably-be-using-sqlite" rel="noopener noreferrer"&gt;Why You Should Probably Be Using SQLite&lt;/a&gt;. Postgres cluster to distributed SQLite migration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare D1&lt;/strong&gt;, &lt;a href="https://dev.to/whoffagents/cloudflare-d1-sqlite-at-the-edge-after-6-months-in-production-551j"&gt;SQLite at the Edge After 6 Months&lt;/a&gt;. P99 8 ms reads, 40–60% latency reduction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OneUptime&lt;/strong&gt;, &lt;a href="https://oneuptime.com/blog/post/2026-02-02-sqlite-production-setup/view" rel="noopener noreferrer"&gt;SQLite Production Setup&lt;/a&gt;. Production configuration with monitoring thresholds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turso&lt;/strong&gt;, &lt;a href="https://turso.tech/blog/local-first-cloud-connected-sqlite-with-turso-embedded-replicas" rel="noopener noreferrer"&gt;Embedded Replicas&lt;/a&gt;. 624 µs read latency, 40 µs connection time.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Benchmarks and technical analysis
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Shivek Khurana&lt;/strong&gt;, &lt;a href="https://shivekkhurana.com/blog/sqlite-in-production/" rel="noopener noreferrer"&gt;SQLite in Production: A Real-World Benchmark&lt;/a&gt;. WAL vs DELETE, sync modes, concurrency scaling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anders Murphy&lt;/strong&gt;, &lt;a href="https://andersmurphy.com/2025/12/02/100000-tps-over-a-billion-rows-the-unreasonable-effectiveness-of-sqlite.html" rel="noopener noreferrer"&gt;100K TPS Over a Billion Rows&lt;/a&gt;. Dynamic batching, SQLite vs Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marending&lt;/strong&gt;, &lt;a href="https://marending.dev/notes/sqlite-benchmarks/" rel="noopener noreferrer"&gt;How Fast Is SQLite?&lt;/a&gt;. Cross-platform write throughput (46K–113K w/s).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tenthousandmeters.com&lt;/strong&gt;, &lt;a href="https://tenthousandmeters.com/blog/sqlite-concurrent-writes-and-database-is-locked-errors/" rel="noopener noreferrer"&gt;SQLite Concurrent Writes&lt;/a&gt;. Multi-threaded scaling, app-level mutex pattern.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evan Schwartz&lt;/strong&gt;, &lt;a href="https://emschwartz.me/psa-your-sqlite-connection-pool-might-be-ruining-your-write-performance/" rel="noopener noreferrer"&gt;Your SQLite Connection Pool Might Be Ruining Your Write Performance&lt;/a&gt;. Single-writer 23× faster than 50-connection pool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forward Email&lt;/strong&gt;, &lt;a href="https://forwardemail.net/en/blog/docs/sqlite-performance-optimization-pragma-chacha20-production-guide" rel="noopener noreferrer"&gt;SQLite Performance Optimization PRAGMA Guide&lt;/a&gt;. sync=OFF no better than NORMAL; temp_store=MEMORY can be slower.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stephen Margheim&lt;/strong&gt;, &lt;a href="https://fractaledmind.com/2024/04/15/sqlite-on-rails-the-how-and-why-of-optimal-performance/" rel="noopener noreferrer"&gt;SQLite on Rails: Optimal Performance&lt;/a&gt;. IMMEDIATE transactions, uniform busy handler retry.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  SQLite documentation
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;SQLite Official&lt;/strong&gt;, &lt;a href="https://sqlite.org/whentouse.html" rel="noopener noreferrer"&gt;Appropriate Uses for SQLite&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite Official&lt;/strong&gt;, &lt;a href="https://sqlite.org/wal.html" rel="noopener noreferrer"&gt;Write-Ahead Logging&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLite Official&lt;/strong&gt;, &lt;a href="https://sqlite.org/speed.html" rel="noopener noreferrer"&gt;Speed Comparison&lt;/a&gt;. Transaction batching is the 10–20× improvement.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ORM performance
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;SQLAlchemy&lt;/strong&gt;, &lt;a href="https://docs.sqlalchemy.org/en/20/faq/performance.html" rel="noopener noreferrer"&gt;Performance FAQ&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQLAlchemy Bulk Insert Benchmarks&lt;/strong&gt;, &lt;a href="https://tutorials.technology/tutorials/Fast-bulk-insert-with-sqlalchemy.html" rel="noopener noreferrer"&gt;Fast Bulk Insert&lt;/a&gt;. Core insert 40× faster than session.add().&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Community reports
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hacker News&lt;/strong&gt;, &lt;a href="https://news.ycombinator.com/item?id=39955288" rel="noopener noreferrer"&gt;SQLite in Production Discussion&lt;/a&gt;. Multiple production deployments, 8-figure ARR on SQLite.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  About this work
&lt;/h2&gt;

&lt;p&gt;I started running these benchmarks while building a high-throughput spatial data ingestion pipeline on SQLite, on edge hardware, with structured measurements arriving by the million. The ORM ceiling was the forcing function. I needed to know whether I could fix it with PRAGMA tuning or whether I had to bypass SQLAlchemy entirely.&lt;/p&gt;

&lt;p&gt;The answer turned out to be the latter, and the benchmark suite I built to get there became this repo.&lt;/p&gt;

&lt;p&gt;Hardware context matters, of course. My NVMe results will differ from EBS, spinning disk, or eMMC. The relative findings (the ORM ratio, the config irrelevance, the chunk size vs latency story) should hold across storage tiers. The absolute numbers will not. If you run this on different hardware, &lt;a href="https://github.com/TanayK07/sqlite-orm-bench/issues" rel="noopener noreferrer"&gt;open an issue&lt;/a&gt;, and I'll add your numbers to the comparison table.&lt;/p&gt;

</description>
      <category>database</category>
      <category>backend</category>
      <category>sqlite</category>
      <category>python</category>
    </item>
    <item>
      <title>What happens when you send an Ethernet frame? I wrote 289 lessons to find out.</title>
      <dc:creator>Tanay Kedia</dc:creator>
      <pubDate>Sat, 23 May 2026 15:29:01 +0000</pubDate>
      <link>https://dev.to/tanayk07/what-happens-when-you-send-an-ethernet-frame-i-wrote-289-lessons-to-find-out-2eoh</link>
      <guid>https://dev.to/tanayk07/what-happens-when-you-send-an-ethernet-frame-i-wrote-289-lessons-to-find-out-2eoh</guid>
      <description>&lt;p&gt;Every networking course I found had the same problem.&lt;/p&gt;

&lt;p&gt;Either it's pure theory — OSI model diagrams, "TCP does a three-way handshake," done. Or it skips straight to &lt;code&gt;socket()&lt;/code&gt; and &lt;code&gt;connect()&lt;/code&gt; and treats everything underneath as a black box.&lt;/p&gt;

&lt;p&gt;Nothing showed me what actually happens between the bits on a wire and &lt;code&gt;curl https://example.com&lt;/code&gt; returning a response.&lt;/p&gt;

&lt;p&gt;So I wrote 289 lessons that build the entire network stack from scratch in C.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "from scratch" means
&lt;/h2&gt;

&lt;p&gt;You don't call libraries. You construct the bytes yourself.&lt;/p&gt;

&lt;p&gt;Here's how you build an Ethernet frame in the course:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;nfs_frame_build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;ethertype&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;payload_len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;out_sz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;eff_payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eff_payload&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;NFS_ETH_MIN_DATA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;eff_payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NFS_ETH_MIN_DATA&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NFS_ETH_HLEN&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;eff_payload&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;memset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="cm"&gt;/* Destination MAC */&lt;/span&gt;
    &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NFS_ETH_ALEN&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="cm"&gt;/* Source MAC */&lt;/span&gt;
    &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;NFS_ETH_ALEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NFS_ETH_ALEN&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="cm"&gt;/* EtherType in network byte order */&lt;/span&gt;
    &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="n"&gt;et_net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;htons&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ethertype&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;et_net&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;payload_len&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;NFS_ETH_HLEN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload_len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;6 bytes destination MAC. 6 bytes source MAC. 2 bytes EtherType. Payload. That's an Ethernet frame. You built it. You understand it.&lt;/p&gt;

&lt;p&gt;Same approach for everything else — ARP, IP, ICMP, TCP, DNS, TLS.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 15 phases
&lt;/h2&gt;

&lt;p&gt;The course is organized bottom-up:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phases 1-3: Bits → Frames → Packets&lt;/strong&gt;&lt;br&gt;
You start with encoding schemes and bit manipulation, build Ethernet frames with raw sockets, then implement IP headers with proper checksums, write an ICMP ping, and build a router.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4: TCP (40 lessons)&lt;/strong&gt;&lt;br&gt;
This is the big one. You implement the full TCP state machine — three-way handshake, sliding window, retransmission, SACK, congestion control (Reno, CUBIC, BBR), TIME-WAIT. 40 lessons because TCP is that deep.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phases 5-7: Sockets, Protocols, TLS&lt;/strong&gt;&lt;br&gt;
You build an HTTP parser, a DNS resolver, implement the TLS 1.3 handshake including Diffie-Hellman key exchange, AES-GCM encryption, and X.509 certificate parsing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 8: The capstone&lt;/strong&gt;&lt;br&gt;
Everything comes together. You build a complete userspace TCP/IP stack in C that can make HTTPS requests. Your code, your stack, real traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phases 9-14: Go deeper&lt;/strong&gt;&lt;br&gt;
Linux kernel networking internals (sk_buff, NAPI, netfilter), eBPF/XDP programs, container networking, building a CNI plugin, service mesh. The stuff you debug at 3am in production but never learned properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 15: DDS &amp;amp; Robotics&lt;/strong&gt;&lt;br&gt;
Bonus phase covering DDS middleware used in ROS 2 and real-time systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  What a lesson looks like
&lt;/h2&gt;

&lt;p&gt;Each lesson has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A README explaining the concept&lt;/li&gt;
&lt;li&gt;C (or Python) source code&lt;/li&gt;
&lt;li&gt;A Makefile&lt;/li&gt;
&lt;li&gt;Tests&lt;/li&gt;
&lt;li&gt;Exercises&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You clone the repo, &lt;code&gt;cd&lt;/code&gt; into a lesson, run &lt;code&gt;make&lt;/code&gt;, and start building. No IDE required, no special tools. Just &lt;code&gt;gcc&lt;/code&gt; and a Linux machine.&lt;/p&gt;

&lt;p&gt;Here's DNS name encoding from the DNS lesson — turning &lt;code&gt;example.com&lt;/code&gt; into the wire format that RFC 1035 specifies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;nfs_dns_name_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;out_sz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;name_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;strlen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;name_len&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;strchr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;label_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                               &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name_len&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label_len&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label_len&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;63&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;label_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label_len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;pos&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;label_len&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;label_len&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="cm"&gt;/* root terminator */&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;example.com&lt;/code&gt; becomes &lt;code&gt;\x07example\x03com\x00&lt;/code&gt;. Length-prefixed labels, null-terminated. You read the RFC, you write the code, you understand the format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why build instead of read
&lt;/h2&gt;

&lt;p&gt;I used to think I understood TCP. I'd read the textbook diagrams, I could explain the three-way handshake on a whiteboard.&lt;/p&gt;

&lt;p&gt;Then I tried to implement a sliding window with SACK support and realized I understood almost nothing. What happens when an ACK arrives for a segment you already retransmitted? How do you handle window scaling across 32-bit sequence space? What does CUBIC actually do differently from Reno?&lt;/p&gt;

&lt;p&gt;Building forces you to answer questions that reading lets you skip.&lt;/p&gt;

&lt;h2&gt;
  
  
  The repo
&lt;/h2&gt;

&lt;p&gt;289 lessons. 15 phases. MIT licensed. No signup, no paywall.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/TanayK07/networking-from-scratch" rel="noopener noreferrer"&gt;github.com/TanayK07/networking-from-scratch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Site:&lt;/strong&gt; &lt;a href="https://networkingfromscratch.vercel.app" rel="noopener noreferrer"&gt;networkingfromscratch.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've ever wanted to understand what actually happens between your application and the wire, this is the course I wish existed when I started.&lt;/p&gt;

&lt;p&gt;Contributions welcome — especially reviews of the kernel internals and eBPF phases. Open an issue or PR on GitHub.&lt;/p&gt;

</description>
      <category>networking</category>
      <category>programming</category>
      <category>systems</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I Built a Pokédex for AI Coding Companions</title>
      <dc:creator>Tanay Kedia</dc:creator>
      <pubDate>Sat, 04 Apr 2026 12:15:15 +0000</pubDate>
      <link>https://dev.to/tanayk07/i-built-a-pokedex-for-ai-coding-companions-odk</link>
      <guid>https://dev.to/tanayk07/i-built-a-pokedex-for-ai-coding-companions-odk</guid>
      <description>&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;Claude Code has a &lt;code&gt;/buddy&lt;/code&gt; feature — it gives you a random AI companion with ASCII art, a name, a personality, and stats. It's cute. It sits in your config file. Nobody else ever sees it.&lt;/p&gt;

&lt;p&gt;I thought: what if we made it competitive?&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://buddyboard.xyz" rel="noopener noreferrer"&gt;Buddy Board&lt;/a&gt; — a competitive leaderboard and trading card system for Claude Code companions.&lt;/p&gt;

&lt;p&gt;One command to join:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx buddy-board
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
`&lt;/p&gt;

&lt;p&gt;&lt;a href="https://buddyboard.xyz/og-image.png" rel="noopener noreferrer"&gt;https://buddyboard.xyz/og-image.png&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Your buddy is deterministic. It's computed from a hash of your Claude Code account ID using a seeded Mulberry32 PRNG — the same algorithm Claude Code uses internally. Your species, rarity, stats, eyes, and hat are all derived from this hash.&lt;/p&gt;

&lt;p&gt;That means your buddy is truly yours. Same account, same buddy, every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stats
&lt;/h2&gt;

&lt;p&gt;Every buddy has 5 stats (0-100): Debugging, Patience, Chaos, Wisdom, Snark. These are summed for a total score that determines your leaderboard rank.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rarity
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
┌───────────┬────────┬──────────────────────────────────────────────┐&lt;br&gt;
│   Tier    │ Chance │               Visual Treatment               │&lt;br&gt;
├───────────┼────────┼──────────────────────────────────────────────┤&lt;br&gt;
│ Common    │ 60%    │ Clean border                                 │&lt;br&gt;
├───────────┼────────┼──────────────────────────────────────────────┤&lt;br&gt;
│ Uncommon  │ 25%    │ Green border                                 │&lt;br&gt;
├───────────┼────────┼──────────────────────────────────────────────┤&lt;br&gt;
│ Rare      │ 10%    │ Blue glow                                    │&lt;br&gt;
├───────────┼────────┼──────────────────────────────────────────────┤&lt;br&gt;
│ Epic      │ 4%     │ Purple glow                                  │&lt;br&gt;
├───────────┼────────┼──────────────────────────────────────────────┤&lt;br&gt;
│ Legendary │ 1%     │ Gold pulse + holographic shimmer + scanlines │&lt;br&gt;
└───────────┴────────┴──────────────────────────────────────────────┘&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;shell&lt;/p&gt;

&lt;p&gt;And there's a 1% chance of being Shiny on top of rarity. A Shiny Legendary is a 0.01% pull.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Species
&lt;/h2&gt;

&lt;p&gt;18 species, all rendered as ASCII art: duck, goose, blob, cat, dragon, octopus, owl, penguin, turtle, snail, ghost, axolotl, capybara, cactus, robot, rabbit, mushroom, chonk.&lt;/p&gt;

&lt;p&gt;Each species has swappable eyes (6 types) and hats (8 types), giving 1,728 possible visual combinations. The BuddyDex tracks which ones have been discovered globally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Global leaderboard — ranked by total stats, filterable by species, rarity, org&lt;/li&gt;
&lt;li&gt;Trading cards — rarity-specific CSS treatments, embeddable in GitHub READMEs&lt;/li&gt;
&lt;li&gt;BuddyDex — Pokédex-style gallery, undiscovered species show as silhouettes&lt;/li&gt;
&lt;li&gt;Organizations — register your team, compete on combined power&lt;/li&gt;
&lt;li&gt;Compare — head-to-head stat breakdowns between any two buddies&lt;/li&gt;
&lt;li&gt;OG cards — auto-generated 1200x675 share images&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Holographic CSS
&lt;/h2&gt;

&lt;p&gt;The part I'm most proud of is the legendary card treatment. It uses three layered CSS effects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scanlines — repeating-linear-gradient at 2px intervals, 3% opacity (::before)&lt;/li&gt;
&lt;li&gt;Holographic shimmer — rainbow gradient sweep via background-position animation on ::after&lt;/li&gt;
&lt;li&gt;Gold pulse — breathing box-shadow keyframes (3s ease-in-out infinite)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Simple CSS, big visual impact.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;bash&lt;br&gt;
npx buddy-board&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Then add your card to your GitHub README:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;markdown&lt;br&gt;
[![buddy](https://buddyboard.xyz/card/yourname)](https://buddyboard.xyz/u/yourname)&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/TanayK07/buddy-board" rel="noopener noreferrer"&gt;TanayK07/buddy-board&lt;/a&gt;&lt;br&gt;
Live: &lt;a href="//buddyboard.xyz"&gt;buddyboard.xyz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'd love to know what species you got. Mine is a Common duck. I'm still waiting for the reroll feature.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>showdev</category>
      <category>webdev</category>
      <category>cli</category>
    </item>
  </channel>
</rss>
