<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tyler Tan</title>
    <description>The latest articles on DEV Community by Tyler Tan (@tyler_tan_13b1f742020d35a).</description>
    <link>https://dev.to/tyler_tan_13b1f742020d35a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938418%2F11691764-c9e5-4c1c-9738-7fe2c8eb3d08.png</url>
      <title>DEV Community: Tyler Tan</title>
      <link>https://dev.to/tyler_tan_13b1f742020d35a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tyler_tan_13b1f742020d35a"/>
    <language>en</language>
    <item>
      <title>Cache Deep Dive III — Replacement Policies, Prefetch, and Single-Thread Memory Access</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Tue, 09 Jun 2026 05:01:34 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/cache-deep-dive-iii-replacement-policies-prefetch-and-single-thread-memory-access-1e1a</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/cache-deep-dive-iii-replacement-policies-prefetch-and-single-thread-memory-access-1e1a</guid>
      <description>&lt;p&gt;The previous article discussed the static structure of caches. This part moves into dynamic aspects: when a program continuously issues read requests, how the cache decides which lines to retain and which to evict; how hardware prefetchers pull data into the cache before the program even issues a request; and the performance characteristics of two extreme access patterns under single-threaded execution — sequential and random access.&lt;/p&gt;




&lt;h2&gt;
  
  
  Replacement and Placement Policies
&lt;/h2&gt;

&lt;p&gt;Cache replacement and placement are managed by hardware logic; in the vast majority of cases the programmer has no direct control. Modern ISAs do provide instructions that influence cache behavior: x86 has the &lt;code&gt;PREFETCH&lt;/code&gt; family, the &lt;code&gt;CLFLUSH&lt;/code&gt; / &lt;code&gt;CLWB&lt;/code&gt; flush and write-back instructions, and non-temporal stores (&lt;code&gt;MOVNTI&lt;/code&gt; and the like) that are intended to bypass the cache on writes. Their guarantees differ. A prefetch is only a hint: the architecture does not promise that the line will actually be fetched, although modern processors usually do turn it into a real request. &lt;code&gt;CLFLUSH&lt;/code&gt; and &lt;code&gt;CLWB&lt;/code&gt;, by contrast, have architecturally defined effects (the line is written back, and for &lt;code&gt;CLFLUSH&lt;/code&gt; also invalidated), though they may need fences to be ordered against other memory operations. Whether any of these instructions yields a net benefit depends on the prefetch distance, current bandwidth pressure, and cache state. None of them lets software choose the victim: which line gets evicted and where new data gets placed is always decided by hardware. If these decisions were delegated to the OS, every replacement would require an interrupt, a trap into kernel mode, and hundreds of software instructions to select a victim line, an overhead far greater than any gain a smarter choice could bring. In hardware, victim selection is done by dedicated logic, typically within about a clock cycle.&lt;/p&gt;

&lt;p&gt;When a program needs data that resides at level k+1, the hardware first looks in level k. Finding it there is a &lt;strong&gt;cache hit&lt;/strong&gt;, which saves the latency of accessing the lower level; not finding it is a &lt;strong&gt;cache miss&lt;/strong&gt;. Since level k is necessarily smaller than level k+1, and the amount of memory a program uses often exceeds the cache capacity, a full cache must evict an existing line to make room for a new one. Which line to evict is decided by the &lt;strong&gt;replacement policy&lt;/strong&gt;. The simplest policy is random replacement: pick a victim line at random. A more sophisticated one is LRU, which evicts the line that was Least Recently Used. LRU is non-trivial to implement in hardware, as it requires maintaining the access order of all lines within a set.&lt;/p&gt;

&lt;p&gt;In real chips, LRU approximations or alternatives are widely used. Both academic research and reverse engineering generally conclude that modern Intel last-level caches (since Haswell) use a replacement mechanism conceptually similar to &lt;strong&gt;RRIP&lt;/strong&gt; (Re-Reference Interval Prediction) rather than strict LRU. Each cache line carries a small Re-Reference Prediction Value (RRPV). A hit resets the RRPV to zero ("re-reference expected soon"), and a newly inserted line starts with a high RRPV instead of being treated as most recently used. On eviction, a line whose RRPV is at the maximum is selected; if no line in the set has reached the maximum, the RRPVs of all lines in the set are incremented until one does. SRRIP (Static RRIP) inserts new lines one step below the maximum, giving them a brief chance to be re-referenced, while BRRIP (Bimodal RRIP) inserts most new lines at the maximum so that they are evicted quickly. DRRIP (Dynamic RRIP) switches between the two adaptively. Compared to strict LRU, such mechanisms do not place every newly fetched line at the "most recently used" position, so a one-off scan over cold data cannot flush the hot lines, and mixed access patterns behave better.&lt;/p&gt;
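
&lt;p&gt;The RRPV bookkeeping can be sketched for a single set. This is a minimal illustration of SRRIP with hit promotion, assuming a 4-way set and 2-bit RRPVs; &lt;code&gt;SrripSet&lt;/code&gt; and its members are hypothetical names, and real LLC policies are undocumented and more elaborate.&lt;/p&gt;

<![CDATA[
```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Sketch of one 4-way cache set under SRRIP with hit promotion (2-bit RRPV).
// Assumption: illustrative model only, not any vendor's actual policy.
struct SrripSet {
    static constexpr int kWays = 4;
    static constexpr std::uint8_t kMaxRrpv = 3;  // "distant" re-reference

    std::array<std::uint64_t, kWays> tag{};
    std::array<std::uint8_t, kWays> rrpv{};
    std::array<bool, kWays> valid{};

    // Returns true on a hit, false on a miss (the line is filled on a miss).
    bool access(std::uint64_t t) {
        for (int w = 0; w < kWays; ++w) {
            if (valid[w] && tag[w] == t) {
                rrpv[w] = 0;  // hit: predict a near-immediate re-reference
                return true;
            }
        }
        const int victim = find_victim();
        tag[victim] = t;
        valid[victim] = true;
        rrpv[victim] = kMaxRrpv - 1;  // insert as "long", not as most recently used
        return false;
    }

private:
    int find_victim() {
        for (int w = 0; w < kWays; ++w) {
            if (!valid[w]) return w;  // use an empty way first
        }
        for (;;) {
            for (int w = 0; w < kWays; ++w) {
                if (rrpv[w] == kMaxRrpv) return w;
            }
            // No line is at the maximum yet: age every line in the set.
            for (auto& r : rrpv) {
                assert(r < kMaxRrpv);
                ++r;
            }
        }
    }
};
```
]]>

&lt;p&gt;In this model, a line that has been hit once survives a scan of six other tags through the same set, whereas a 4-way strict LRU set would have evicted it after the fourth new tag.&lt;/p&gt;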

&lt;p&gt;Note that the complex policies described above appear primarily in larger-capacity LLCs. Caches closer to the core (L1, L2) put more weight on access latency: the few hundred picoseconds of additional delay that an extra replacement state machine might introduce are already unacceptable on the L1 path. Simple LRU approximations (such as Tree-PLRU and NRU) and even random replacement are therefore common in L1/L2. Replacement-policy complexity grows outward along the cache hierarchy, in step with how much latency each level can tolerate.&lt;/p&gt;

&lt;p&gt;Beyond the replacement policy, the hardware must also decide where a new piece of data should be placed; this is the &lt;strong&gt;placement policy&lt;/strong&gt;. Together with the cache's capacity, it shapes which kinds of misses a program sees. If data is being accessed for the first time and is not in the cache, that is a &lt;strong&gt;cold miss&lt;/strong&gt;, which is unavoidable. If there is still available space in the cache, but mapping-rule constraints cause certain addresses to be repeatedly mapped to the same location while other locations sit empty, that is a &lt;strong&gt;conflict miss&lt;/strong&gt;. If the entire working set is too large and exceeds the cache capacity, that is a &lt;strong&gt;capacity miss&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;From a measurement perspective, the three types of misses can be distinguished by shrinking/enlarging the working set and cross-referencing against the miss counts from &lt;code&gt;perf stat&lt;/code&gt;: seeing misses even on a very small dataset (far smaller than the cache) → dominated by cold misses; miss rate varying with data distribution on a medium dataset → conflict misses; miss rate asymptotically approaching a fixed value on a large dataset → capacity misses.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hardware Prefetchers
&lt;/h2&gt;

&lt;p&gt;Since program execution inevitably requires data and instructions, can they be fetched asynchronously in advance to reduce or even eliminate misses? This is &lt;strong&gt;prefetching&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Modern CPU hardware prefetchers employ multiple strategies. The basic behavioral pattern of a prefetcher is: upon detecting a sequence of consecutive accesses, preemptively pull the next expected address into the cache before it is actually accessed. Implementations from different vendors each have their own emphasis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intel&lt;/strong&gt;'s typical configuration includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L1 Data Prefetcher&lt;/strong&gt;: monitors L1d access patterns, and upon detecting two consecutive cache-line loads (within the same 4 KB page), prefetches the next cache line into L1d.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2 Streamer&lt;/strong&gt;: monitors L1 miss requests, and upon detecting a sequence of misses at ascending or descending addresses, prefetches several subsequent cache lines into L2 along the same direction. The prefetch distance is not fixed; it adapts to the number of outstanding requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2 Adjacent Cache Line Prefetcher&lt;/strong&gt; (called the Spatial Prefetcher in Intel's documentation): within a 128-byte-aligned pair, when the L2 receives a miss request for one half, it also pulls the other half (the adjacent 64-byte line) into L2. This prefetcher does not rely on pattern detection; it is purely spatial.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next Page Prefetcher&lt;/strong&gt;: when the access sequence approaches a 4 KB page boundary, if the next page is already registered in the TLB, the prefetcher can continue prefetching across the page boundary — crossing the page boundary without stalling, provided no page fault is triggered.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;AMD Zen&lt;/strong&gt; series (Zen 4/5) has corresponding implementations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L1/L2 Stream Detector&lt;/strong&gt;: similar to Intel's L1 Data Prefetcher + L2 Streamer, detects sequential accesses and prefetches subsequent lines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L2 Up/Down Prefetcher&lt;/strong&gt;: a bidirectional prefetcher — prefetches not only in the direction of access, but also backward (fetching the previous cache line relative to the current line), more friendly to scenarios requiring bidirectional traversal (such as prefix and suffix scans).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prefetcher aggressiveness is tunable in specific BIOS or MSR settings, but is generally not directly controlled by the application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Software prefetch&lt;/strong&gt; allows the programmer to explicitly specify a prefetch address in code. GCC and Clang provide &lt;code&gt;__builtin_prefetch&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__builtin_prefetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;  &lt;span class="c1"&gt;// prefetch 16 steps ahead&lt;/span&gt;
    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This built-in function accepts three arguments: the target address, the access type (0 for a read, the default; 1 for a write), and a temporal-locality hint from 0 (use once, no need to keep the line) to 3 (the default: keep it in all cache levels as long as possible). On x86 it compiles to one of the &lt;code&gt;PREFETCHT0&lt;/code&gt; / &lt;code&gt;PREFETCHT1&lt;/code&gt; / &lt;code&gt;PREFETCHT2&lt;/code&gt; / &lt;code&gt;PREFETCHNTA&lt;/code&gt; instructions depending on the locality hint; the equivalent x86 intrinsic is &lt;code&gt;_mm_prefetch&lt;/code&gt;. If the prefetch distance is too short, the data has not yet arrived when the load executes, and the load still waits for most of the miss latency. If the distance is too long, the data arrives but is evicted by other accesses before it is used, and the bandwidth is wasted. Effective software prefetching therefore requires measurement and tuning; used carelessly, it brings no benefit, consumes additional bandwidth, and can evict useful data.&lt;/p&gt;
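
&lt;p&gt;Software prefetch tends to matter most where the hardware prefetcher cannot guess the address, for example an indexed gather. The following is a minimal sketch with all three arguments spelled out; &lt;code&gt;gather_sum&lt;/code&gt; and its &lt;code&gt;distance&lt;/code&gt; parameter are hypothetical, the default of 16 is an untuned assumption, and &lt;code&gt;__builtin_prefetch&lt;/code&gt; is specific to GCC and Clang.&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums table[idx[i]] for all i, prefetching the element that will be needed
// `distance` iterations later. `distance` is a tuning knob: the default here
// is an assumption, not a measured optimum.
std::uint64_t gather_sum(const std::vector<std::uint64_t>& table,
                         const std::vector<std::uint32_t>& idx,
                         std::size_t distance = 16) {
    std::uint64_t sum = 0;
    const std::size_t n = idx.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            // rw = 0 (read), locality = 3 (keep in all cache levels).
            __builtin_prefetch(&table[idx[i + distance]], 0, 3);
        }
        assert(idx[i] < table.size());
        sum += table[idx[i]];
    }
    return sum;
}
```
]]>

&lt;p&gt;The bounds check on &lt;code&gt;i + distance&lt;/code&gt; keeps the index lookup itself in range. The right distance depends on the per-iteration work and the memory latency of the target machine, so it should be chosen by measurement.&lt;/p&gt;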




&lt;h2&gt;
  
  
  Single-Thread Sequential Access
&lt;/h2&gt;

&lt;p&gt;The theoretical latencies given in the previous article — roughly 12 cycles for L2, roughly 200 cycles for main memory — rarely appear in their full magnitude during real sequential access. The reason is that hardware prefetchers hide most of the latency through overlapping.&lt;/p&gt;

&lt;p&gt;Consider the following scenario: a single thread sequentially traversing a &lt;code&gt;uint64_t&lt;/code&gt; array. When the total number of elements n is small and the entire array fits in L1d, the vast majority of accesses are L1 hits, with latency around 4–5 cycles. When n exceeds the L1d capacity but remains within the L2 range, the theoretical L2 hit penalty is 12–14 cycles, yet the measured effective cost typically lands around 8 cycles — the prefetcher has already moved subsequent data from L2 into L1d before the L1 miss occurs. When n grows large enough that main memory becomes the source, the single-access latency to DRAM is still about 200 cycles; but in a sequential streaming access, the prefetcher and MLP overlap multiple requests, bringing the amortized cost per element down to single-digit cycles. These values depend on microarchitecture and prefetcher configuration, but the direction is consistent across all modern CPUs: in sequential access, effective throughput far exceeds what single-access latency alone would suggest (these orders of magnitude can be reproduced on a target machine using Intel MLC or a comparable benchmark).&lt;/p&gt;

&lt;p&gt;The fundamental reason prefetchers can so effectively mask main-memory latency is that sequential access provides them with the most ideal input: the increment in access addresses is fixed and predictable. This allows the prefetcher's pattern detection to lock onto the direction and stride after the first few accesses, and subsequent prefetches can pull multiple lines at a time, forming a pipeline of inflowing data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Impact of Stride
&lt;/h3&gt;

&lt;p&gt;When the traversal stride increases, the efficiency of both the cache and the prefetcher drops. With a 64-byte stride (one element per cache line), every access touches a new line. With a 128-byte stride, every access still touches a new line but skips the one in between, so the prefetcher must cover twice the address range per element to keep up with the consumer. In both cases, and even more so at 256 bytes, the cache's "effective capacity" is diluted: the hardware still fetches full 64-byte lines, but only a small fraction of each line is actually used, and the remaining bytes are wasted.&lt;/p&gt;
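
&lt;p&gt;The dilution can be put into numbers with a few lines of arithmetic. This is a rough sketch; &lt;code&gt;useful_fraction&lt;/code&gt; is a hypothetical helper, and it assumes 64-byte lines, aligned elements that never straddle a line, and no extra lines pulled in by prefetchers.&lt;/p&gt;

<![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Fraction of the bytes fetched into the cache that the loop actually reads.
// Assumptions: aligned elements, no element straddles a line, and no extra
// lines fetched by prefetchers. line_bytes = 64 matches current x86 parts.
double useful_fraction(std::size_t used_bytes, std::size_t stride_bytes,
                       std::size_t line_bytes = 64) {
    assert(used_bytes > 0 && used_bytes <= stride_bytes && used_bytes <= line_bytes);
    // Strides below one line share a line among several elements, so each
    // element costs `stride_bytes`; larger strides cost one full line each.
    const std::size_t fetched_per_element = std::min(stride_bytes, line_bytes);
    return static_cast<double>(used_bytes) / static_cast<double>(fetched_per_element);
}
```
]]>

&lt;p&gt;For example, reading a 4-byte field at a 64-byte stride gives &lt;code&gt;useful_fraction(4, 64)&lt;/code&gt; = 0.0625, while a dense array of 4-byte values gives 1.0.&lt;/p&gt;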

&lt;p&gt;The most common engineering case is traversing an array of large structs while accessing only one field at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Entity&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;hot_field&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="n"&gt;Entity&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// accumulator for the hot field&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hot_field&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each loop iteration loads a 64-byte cache line but uses only 4 bytes (&lt;code&gt;hot_field&lt;/code&gt; being &lt;code&gt;int&lt;/code&gt;), for an effective utilization of 4/64 ≈ 6%. The memory bandwidth is filled almost entirely with unused data: even when the loop runs at full memory bandwidth, only about 6% of the bytes transferred are useful.&lt;/p&gt;

&lt;p&gt;In such scenarios, separating the hot fields into their own independent array can dramatically improve cache efficiency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Entities&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;hot_fields&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
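    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;paddings&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// std::array, since a raw char[60] cannot be a vector element&lt;/span&gt;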
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the transition from AoS (Array of Structures) to SoA (Structure of Arrays). Hot fields are now laid out contiguously; a single cache line can hold 16 &lt;code&gt;int&lt;/code&gt;s (64 B / 4 B), and each line the prefetcher brings back feeds 16 loop iterations. When a program accesses only a small subset of fields while ignoring the rest, SoA-style traversal typically significantly outperforms AoS. However, if all fields of the same element need to be accessed together (e.g., the &lt;code&gt;x&lt;/code&gt;, &lt;code&gt;y&lt;/code&gt;, &lt;code&gt;z&lt;/code&gt; coordinates of a &lt;code&gt;Particle&lt;/code&gt; struct), AoS instead makes fuller use of all the data pulled back per line by the prefetcher. This design principle, fundamental to Data-Oriented Design, is a basic strategy for cache optimization: arrange data by access pattern, not by conceptual model.&lt;/p&gt;
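
&lt;p&gt;A minimal sketch of the two traversals side by side may help. The type and function names (&lt;code&gt;EntityAoS&lt;/code&gt;, &lt;code&gt;EntitiesSoA&lt;/code&gt;, &lt;code&gt;to_soa&lt;/code&gt;, &lt;code&gt;sum_aos&lt;/code&gt;, &lt;code&gt;sum_soa&lt;/code&gt;) are hypothetical; both loops compute the same result, but the AoS loop touches one cache line per element while the SoA loop touches one line per 16 elements.&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct EntityAoS {
    int hot_field;
    char padding[60];  // cold data that the hot loop never reads
};
static_assert(sizeof(EntityAoS) == 64, "one element per 64-byte line");

struct EntitiesSoA {
    std::vector<int> hot_fields;  // only the field the hot loop reads
};

// Copies the hot field out of every element into a dense array.
EntitiesSoA to_soa(const std::vector<EntityAoS>& aos) {
    EntitiesSoA soa;
    soa.hot_fields.reserve(aos.size());
    for (const EntityAoS& e : aos) soa.hot_fields.push_back(e.hot_field);
    assert(soa.hot_fields.size() == aos.size());
    return soa;
}

// AoS traversal: every iteration pulls in a new 64-byte line for 4 useful bytes.
long long sum_aos(const std::vector<EntityAoS>& aos) {
    long long sum = 0;
    for (const EntityAoS& e : aos) sum += e.hot_field;
    return sum;
}

// SoA traversal: each 64-byte line supplies 16 consecutive ints.
long long sum_soa(const EntitiesSoA& soa) {
    return std::accumulate(soa.hot_fields.begin(), soa.hot_fields.end(), 0LL);
}
```
]]>

&lt;p&gt;Whether the one-time cost of &lt;code&gt;to_soa&lt;/code&gt; pays off depends on how often the hot loop runs; in a design that is SoA from the start, no conversion is needed.&lt;/p&gt;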




&lt;h2&gt;
  
  
  Single-Thread Random Access
&lt;/h2&gt;

&lt;p&gt;The discussion above is entirely based on sequential access — scenarios where prefetchers can function effectively. When the memory access pattern becomes completely random, the situation is reversed.&lt;/p&gt;

&lt;p&gt;Under sequential access, the amortized cost per element can be as low as a few cycles; under random access, because prefetchers and MLP struggle to be effective, the program is directly exposed to the hundreds of cycles of main-memory latency. Main memory itself has not become slower. The problem is that the hardware prefetcher has no pattern to lock onto: it either stays idle or, as with the adjacent-line prefetcher, keeps fetching lines that bear no relation to the data the program actually needs. Any useless data pulled back this way occupies memory bus bandwidth and can evict useful hot data from the cache. In the worst case, prefetching turns from a means of reducing latency into a burden on the system.&lt;/p&gt;

&lt;p&gt;The performance curves of sequential and random access exhibit fundamentally different shapes. Using a pointer-chasing benchmark to plot a latency-vs-working-set-size curve: sequential access appears as a staircase — clear latency steps at the capacity boundaries of each cache level (L1/L2/L3), with the prefetcher pressing latency down to near the theoretical hit latency at the edge of each step. Random access, by contrast, is a smoothly rising ramp — the larger the working set, the more the cache hit rate continuously declines, and latency slides gradually from L1 to L2, to L3, and finally to the DRAM plateau. On either side of the LLC capacity boundary, random-access latency transitions almost continuously rather than jumping — because the probability of a miss rises smoothly with the working set size, rather than flipping abruptly at a capacity threshold.&lt;/p&gt;

&lt;p&gt;The reason pointer chasing is the standard method for measuring random-access latency is that it constructs a dependency chain that is fundamentally impossible for any prefetcher to predict: allocate an array with elements randomly permuted, each element storing a pointer to the next element. The CPU cannot know the address of the next &lt;code&gt;load&lt;/code&gt; until the current &lt;code&gt;load&lt;/code&gt; completes — this is the most stringent form of data dependency. Pointer chasing does not merely suffer from a low cache hit rate; it simultaneously destroys the entire foundation on which prefetchers, MLP, and out-of-order execution rely to hide latency: the prefetcher is defeated because it cannot predict the next address, MLP is paralyzed because addresses are serially dependent, and all instructions in the ROB that indirectly depend on the load result stall and wait.&lt;/p&gt;
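
&lt;p&gt;The core of such a benchmark can be sketched as follows. This is an illustration rather than a complete measurement harness: &lt;code&gt;make_chase_cycle&lt;/code&gt; and &lt;code&gt;chase&lt;/code&gt; are hypothetical names, indices stand in for pointers, and Sattolo's algorithm is used here (an assumption about construction, not something the text prescribes) so that the permutation forms one single cycle covering the whole array.&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Builds next[] so that repeatedly following next[i] visits every slot exactly
// once before returning to the start (Sattolo's algorithm: a single cycle).
std::vector<std::size_t> make_chase_cycle(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), std::size_t{0});
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        // Pick j strictly below i; this is what rules out short cycles.
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// The address of each load depends on the result of the previous load, so the
// loads cannot overlap. Timing this loop and dividing by `steps` approximates
// the per-access latency for the chosen working-set size.
std::size_t chase(const std::vector<std::size_t>& next, std::size_t steps) {
    std::size_t p = 0;
    for (std::size_t s = 0; s < steps; ++s) p = next[p];
    return p;
}
```
]]>

&lt;p&gt;Returning the final index from &lt;code&gt;chase&lt;/code&gt; keeps the compiler from discarding the loop. A plain random shuffle would not be enough, because it can produce short cycles that stay inside the cache.&lt;/p&gt;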

&lt;p&gt;Random access is equally devastating to the TLB — when the number of pages in the working set exceeds the number of TLB entries, every new random jump may land on an uncached page, triggering a full page-table walk. Detailed analysis of this topic, however, is deferred to Part IV.&lt;/p&gt;

&lt;p&gt;From an engineering perspective, random-access performance cannot be recovered by adding cache: for data structures based primarily on pointer traversal (linked lists, chained hash tables, B-trees, skip lists), no matter how large the cache is, performance degrades sharply once the working set exceeds it. This is the fundamental dividing line between "cache-friendly data structures" and "memory-intensive data structures." A contiguous array is one of the data layouts most easily exploited by caches and prefetchers, not only because of sequential memory access, but more importantly because its memory layout is entirely transparent to the prefetcher. Any data structure built around indirect pointer access inherently surrenders a portion of its performance back to the memory wall. That said, in real systems, cache-optimized index structures (such as B+-trees that align internal nodes to cache-line size, or the Adaptive Radix Tree with path compression) do exploit intra-cache-line locality as much as possible, but their core access paths still involve at least one level of pointer indirection.&lt;/p&gt;




&lt;p&gt;The next part will focus on the TLB, the cost of page-table walks, and how huge pages alleviate both of these problems.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>computerscience</category>
      <category>performance</category>
      <category>systems</category>
    </item>
    <item>
      <title>Cache Deep Dive II — Cache Organization and CPU Topology</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Mon, 08 Jun 2026 12:57:34 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/cache-deep-dive-ii-cache-organization-and-cpu-topology-3fip</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/cache-deep-dive-ii-cache-organization-and-cpu-topology-3fip</guid>
      <description>&lt;p&gt;Part I discussed the physical roots of the memory wall, the design principles of the memory hierarchy, and the interaction between virtual and physical addresses during cache lookup. This part delves into cache internals: how addresses are partitioned into tag, set index, and block offset; the hardware trade-offs among the three organization schemes; the rationale behind the 64-byte cache line; the actual cache topologies of modern CPUs; and the inclusion policies between cache levels.&lt;/p&gt;




&lt;h2&gt;
  
  
  Address Partitioning
&lt;/h2&gt;

&lt;p&gt;When a 64-bit address is sent to the L1 data cache, the hardware partitions it into three fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;|←──── tag ─────→|←── set index ──→|← block offset →|
      T bits            S bits          O bits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a cache line size of 2^O bytes, the low O bits are the block offset, ensuring every byte within the line can be addressed. For a 64-byte cache line, O = 6. The middle S bits form the set index — the cache has 2^S sets, and the address maps to exactly one of them. The remaining high T bits constitute the tag, stored alongside the data in the cache line's metadata and used during set-internal comparison to confirm a match. The set index itself is not stored — all lines in the same set share the same index bits.&lt;/p&gt;

&lt;p&gt;Taking AMD Zen 5's L1d as an example for the bit-width calculation: 48 KB / 12-way / 64 B = 64 sets, so S = 6 (2^6 = 64) and O = 6 (2^6 = 64 B). Logically, T = 64 − S − O = 52 bits. The tag actually stored, however, only needs to cover the implemented physical address bits. On x86-64 the physical address width is implementation-specific (reported by CPUID) and architecturally capped at 52 bits; LAM and 5-level paging affect the virtual address, not the physical one. For a 48- to 52-bit physical address, subtracting the 12 bits for S + O leaves a tag of 36–40 bits. This calculation is also shaped by the VIPT design: as described in Part I, L1d is VIPT, so the low 12 bits of the virtual address (the page offset) directly serve as the set index and block offset, while the tag comparison uses the high-order physical address bits output by the TLB. Under VIPT, S + O ≤ 12 (the page offset width), a constraint that ensures the set index and block offset are identical in the virtual and physical address.&lt;/p&gt;
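
&lt;p&gt;The partitioning itself is a pair of shifts and masks. A minimal sketch, with &lt;code&gt;CacheGeometry&lt;/code&gt;, &lt;code&gt;AddressFields&lt;/code&gt;, and &lt;code&gt;split_address&lt;/code&gt; as hypothetical names and S = O = 6 taken from the example above:&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstdint>

struct CacheGeometry {
    unsigned offset_bits;  // O: log2(line size in bytes)
    unsigned index_bits;   // S: log2(number of sets)
};

struct AddressFields {
    std::uint64_t tag;
    std::uint64_t set_index;
    std::uint64_t block_offset;
};

// Splits an address into tag / set index / block offset for the given geometry.
AddressFields split_address(std::uint64_t addr, CacheGeometry g) {
    assert(g.offset_bits + g.index_bits < 64);
    const std::uint64_t offset_mask = (std::uint64_t{1} << g.offset_bits) - 1;
    const std::uint64_t index_mask = (std::uint64_t{1} << g.index_bits) - 1;
    return AddressFields{
        addr >> (g.offset_bits + g.index_bits),  // remaining high bits
        (addr >> g.offset_bits) & index_mask,    // middle S bits
        addr & offset_mask,                      // low O bits
    };
}
```
]]>

&lt;p&gt;With S = O = 6, the address &lt;code&gt;0x12345&lt;/code&gt; splits into tag &lt;code&gt;0x12&lt;/code&gt;, set index 13, and block offset 5.&lt;/p&gt;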

&lt;p&gt;For programmers, the most important corollary of address partitioning is this: if the access stride happens to be an integer multiple of 2^S × cache line size, every request maps to the same set. For example, with a 64-set, 64-byte-line cache (S = 6, O = 6), a stride of 64 × 64 = 4096 bytes (exactly 4 KB, one page) forces all requests into the same set. With 8-way associativity, the first 8 accesses fill the set, and from the 9th onward, each access triggers an eviction. This is the hardware root of &lt;strong&gt;conflict misses&lt;/strong&gt;: a programmer may see high miss rates even though the cache has plenty of empty capacity elsewhere, purely because of the addressing rules. Notably, a stride of 4096 not only triggers cache set conflicts but also makes every access land on a different page, a topic that will resurface in the discussion of the TLB.&lt;/p&gt;
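
&lt;p&gt;The effect is easy to check by counting how many distinct sets a strided walk touches. A small sketch, assuming the same 64-set, 64-byte-line geometry; &lt;code&gt;distinct_sets&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Number of distinct cache sets touched by `count` accesses at a fixed byte
// stride, for a cache with 2^index_bits sets and 2^offset_bits-byte lines.
std::size_t distinct_sets(std::uint64_t base, std::uint64_t stride,
                          std::size_t count, unsigned index_bits = 6,
                          unsigned offset_bits = 6) {
    assert(index_bits + offset_bits < 64);
    const std::uint64_t index_mask = (std::uint64_t{1} << index_bits) - 1;
    std::set<std::uint64_t> sets;
    for (std::size_t k = 0; k < count; ++k) {
        const std::uint64_t addr = base + k * stride;
        sets.insert((addr >> offset_bits) & index_mask);
    }
    return sets.size();
}
```
]]>

&lt;p&gt;A 4096-byte stride touches a single set, while padding each step by one cache line (a stride of 4160 bytes) spreads 64 accesses over all 64 sets. This is why padding the row length of a power-of-two-sized matrix is a common remedy for conflict misses.&lt;/p&gt;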




&lt;h2&gt;
  
  
  Three Cache Organization Schemes
&lt;/h2&gt;

&lt;p&gt;The most intuitive way to implement a cache is to allow every cache line to store any block from main memory. This is called a &lt;strong&gt;fully associative cache&lt;/strong&gt;. Its advantage is maximum cache utilization: new data can be placed in any empty slot. However, its lookup cost is prohibitive. For a 4 MB L2 cache with 64-byte lines, there are 65,536 lines. On every memory access, the processor would have to compare the target address's tag against the tag of every single line, 65,536 parallel comparisons per access, which is infeasible in power and timing. Fully associative designs are only viable for extremely small structures, such as the TLBs in some Intel CPUs. For L1i, L1d, and larger caches, other approaches are required.&lt;/p&gt;

&lt;p&gt;The other extreme is to map each main-memory address to a unique, fixed location in the cache — a &lt;strong&gt;direct-mapped cache&lt;/strong&gt;. On access, the processor extracts several bits from the address to compute the target slot and compares against only that one slot's tag. A single comparator and multiplexer suffice, making it extremely fast. But the drawback is obvious: if a program repeatedly accesses multiple addresses that map to the same slot, conflict misses occur — multiple addresses fight for one slot while others sit idle. Real programs rarely exhibit uniform access patterns, causing direct-mapped cache utilization to drop sharply. A classic degenerate scenario: a program alternates between two addresses spaced exactly one cache capacity apart — each access evicts the other, yielding a zero hit rate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set-associative caches&lt;/strong&gt; combine the strengths of both. The cache is partitioned into sets, each containing a fixed number of cache lines (the associativity, or number of "ways"). On access, the address first identifies the set, then all tags within that set are compared in parallel. Within a set, the behavior is fully associative; across sets, it is direct-mapped. This design mitigates conflict misses while preserving lookup speed. Virtually all contemporary CPU caches use set-associative designs.&lt;/p&gt;

&lt;p&gt;The fundamental formula:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cache capacity = number of sets × associativity (ways) × cache line size
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
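
&lt;p&gt;Rearranged for the number of sets, the formula becomes a one-line helper. A minimal sketch; &lt;code&gt;num_sets&lt;/code&gt; is a hypothetical name and the 64-byte default line size is an assumption that holds for current x86 parts:&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstdint>

// sets = capacity / (ways * line size).
// Returns 0 if the geometry is inconsistent (not evenly divisible).
constexpr std::uint64_t num_sets(std::uint64_t capacity_bytes, std::uint64_t ways,
                                 std::uint64_t line_bytes = 64) {
    return (ways == 0 || line_bytes == 0 ||
            capacity_bytes % (ways * line_bytes) != 0)
               ? 0
               : capacity_bytes / (ways * line_bytes);
}

// 48 KB, 12-way, 64-byte lines: 64 sets, so S = 6.
static_assert(num_sets(48 * 1024, 12) == 64, "48 KB / 12-way L1d");
```
]]>

&lt;p&gt;A 32 KB, 8-way cache also comes out at 64 sets, which is why both configurations keep S + O at 12 bits.&lt;/p&gt;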



&lt;p&gt;Cache associativities are not arbitrary — they represent trade-offs under physical design constraints for target workloads:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cache Level&lt;/th&gt;
&lt;th&gt;Zen 5&lt;/th&gt;
&lt;th&gt;Golden Cove (P-core)&lt;/th&gt;
&lt;th&gt;Apple M3 (P-core)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;L1d&lt;/td&gt;
&lt;td&gt;48 KB / 12-way&lt;/td&gt;
&lt;td&gt;48 KB / 12-way&lt;/td&gt;
&lt;td&gt;128 KB / 16-way&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L1i&lt;/td&gt;
&lt;td&gt;32 KB / 8-way&lt;/td&gt;
&lt;td&gt;32 KB / 8-way&lt;/td&gt;
&lt;td&gt;192 KB / 16-way&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L2&lt;/td&gt;
&lt;td&gt;1 MB / 16-way&lt;/td&gt;
&lt;td&gt;1.25 MB / 10-way (client); 2 MB / 16-way (server)&lt;/td&gt;
&lt;td&gt;32 MB / 20-way (per cluster)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3&lt;/td&gt;
&lt;td&gt;32 MB / 16-way (per CCD)&lt;/td&gt;
&lt;td&gt;36 MB / 12-way&lt;/td&gt;
&lt;td&gt;48 MB SLC&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;L1d and L1i associativities and capacities are constrained by VIPT (see Part I): under 4 KB pages, 12-way 48 KB or 8-way 32 KB are natural choices within that constraint. For L2, around 16 ways has become the common balance point among access latency, power, and conflict rate in current high-performance processors — too few ways raise conflict miss rates, while too many ways lengthen the tag-comparison timing path, requiring either frequency reduction or additional pipeline stages to accommodate the comparison logic. Apple's M3 achieves 20-way in some P-cluster L2s, partly enabled by its lower clock frequency target (~4 GHz vs. x86's ~5.5 GHz), offering more physical timing margin per cycle for parallel tag comparison.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why 64-Byte Cache Lines
&lt;/h2&gt;

&lt;p&gt;The cache line is not only the fundamental unit of data transfer, but also the minimum granularity at which cache coherence protocols maintain ownership — MESI and similar protocols track state, broadcast invalidations, and transfer ownership at the cache-line level. The false sharing problem discussed later is, at its core, multiple cores contending for ownership of the same cache line while operating on logically unrelated variables. Recognizing that "64 bytes is the common granularity for both data movement and ownership tracking" is prerequisite to understanding many multi-core performance problems.&lt;/p&gt;

&lt;p&gt;Cache line size is determined by three factors.&lt;/p&gt;

&lt;p&gt;First, &lt;strong&gt;tag overhead&lt;/strong&gt;: every line must store a tag and status bits (valid, dirty, MESI state). Smaller lines mean higher tag overhead. For a 4 MB cache: with 32-byte lines (~40-bit tag, ~5-bit status), there are 131,072 lines, tag overhead ≈ 750 KB (≈ 18%). With 64-byte lines, 65,536 lines, tag overhead ≈ 370 KB (≈ 9%).&lt;/p&gt;
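&lt;p&gt;The tag-overhead arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the function name &lt;code&gt;tag_overhead&lt;/code&gt; is made up for illustration, and the 40-bit tag and 5-bit status widths are the approximate figures assumed in the text, not values from any specific processor.&lt;/p&gt;

```python
def tag_overhead(cache_bytes, line_bytes, tag_bits=40, status_bits=5):
    """Return (line count, metadata bytes, metadata as a fraction of data capacity)."""
    lines = cache_bytes // line_bytes
    meta_bytes = lines * (tag_bits + status_bits) / 8
    return lines, meta_bytes, meta_bytes / cache_bytes

# 4 MB cache, comparing 32-byte and 64-byte lines (assumed widths, see above).
for line_bytes in (32, 64):
    lines, meta, frac = tag_overhead(4 * 1024 * 1024, line_bytes)
    print(f"{line_bytes}-byte lines: {lines} lines, {meta / 1024:.0f} KiB metadata, {frac:.1%}")
```

&lt;p&gt;With these assumptions the metadata comes to about 720 KiB (17.6%) for 32-byte lines and 360 KiB (8.8%) for 64-byte lines, in line with the rounded figures quoted above. Doubling the line size halves the number of tags, which is the whole effect.&lt;/p&gt;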

&lt;p&gt;Second, &lt;strong&gt;spatial locality&lt;/strong&gt;: larger cache lines pull in more nearby data on a single miss, indirectly improving hit rates.&lt;/p&gt;

&lt;p&gt;Third, &lt;strong&gt;DRAM physical transfer characteristics&lt;/strong&gt;: DDR SDRAM transfers data in bursts on consecutive clock edges. Once a row is activated, multiple columns can be read from it sequentially without additional activate overhead. A 64-byte line maps exactly onto one burst: DDR4 uses burst length 8 on a 64-bit channel (8 × 8 B = 64 B), and DDR5 uses burst length 16 on a 32-bit subchannel (16 × 4 B = 64 B).&lt;/p&gt;

&lt;p&gt;Under these three constraints, 64 bytes became the industry standard. Historically, the Intel Pentium (1993) used 32-byte cache lines; the Pentium 4 (2000) mixed 64-byte and 128-byte lines in some caches; from Core 2 (2006) onward, all caches unified at 64 bytes. Note that 64 bytes refers only to the data portion. Counting tag, valid bit, dirty bit, MESI state bits, and (on most designs) ECC or parity, each cache line actually occupies roughly 70–72 bytes, which is about 9–12% metadata overhead. A manufacturer-labeled 32 MB L3 cache therefore needs roughly 35–36 MB of SRAM on the die.&lt;/p&gt;

&lt;p&gt;C++17 provides &lt;code&gt;std::hardware_destructive_interference_size&lt;/code&gt; and &lt;code&gt;std::hardware_constructive_interference_size&lt;/code&gt; (declared in &lt;code&gt;&amp;lt;new&amp;gt;&lt;/code&gt;). These are implementation-defined constants rather than a guaranteed 64: mainstream x86-64 toolchains report 64 bytes, while some other targets report a larger destructive size. &lt;code&gt;alignas(std::hardware_destructive_interference_size)&lt;/code&gt; places two variables that may be concurrently written by different cores onto separate cache lines, avoiding false sharing (detailed in Part VI).&lt;/p&gt;




&lt;h2&gt;
  
  
  Modern CPU Cache Topology
&lt;/h2&gt;

&lt;p&gt;CPU cores are not directly connected to main memory; all cacheable reads and writes pass through the cache hierarchy. Caches are first divided into &lt;strong&gt;data caches&lt;/strong&gt; and &lt;strong&gt;instruction caches&lt;/strong&gt;. Intel adopted this split design starting with the Pentium in 1993 and has maintained it ever since. The L1 cache is divided into L1i and L1d, giving a &lt;strong&gt;Harvard architecture&lt;/strong&gt; at the L1 level (often called modified Harvard, since both caches are backed by the same memory): instruction fetch and data read can proceed in parallel, avoiding bandwidth contention on a single interface. L2 and L3 caches are generally unified, meaning instructions and data share the same storage. This achieves higher space utilization: when the workload is instruction-heavy, more space goes to instructions; when data-heavy, more goes to data.&lt;/p&gt;

&lt;p&gt;The above is a general description. Specific topologies differ significantly across vendors, with direct performance implications.&lt;/p&gt;

&lt;h3&gt;
  
  
  AMD: CCDs and Chiplet
&lt;/h3&gt;

&lt;p&gt;Since Zen 2, AMD has employed a chiplet architecture, dividing a single physical package into one I/O Die (IOD) and multiple Core Complex Dies (CCDs). Since Zen 3, each CCD contains up to 8 cores sharing one L3 cache (32 MB for both Zen 4 and Zen 5); Zen 2 split each CCD into two 4-core complexes with separate L3s. Each core has private L1i (32 KB) and L1d (48 KB on Zen 5, 32 KB on Zen 4), plus private L2 (1 MB). When a core accesses an address residing in its local CCD's L3, latency is roughly 50 cycles; if the address resides in a different CCD's L3, the request must be routed through the IOD's Infinity Fabric to the target CCD, raising latency to approximately 100 cycles or more.&lt;/p&gt;

&lt;p&gt;The direct implication for programmers: on dual-CCD consumer processors (e.g., Ryzen 9 7950X, two CCDs with 16 cores total), if a thread frequently migrates between CCDs, its hot cache lines in private L1/L2 must be transferred via the coherence protocol across the IOD, with each migration incurring the cost of inter-core RFO handshakes plus the physical trace delay across CCDs. On EPYC server platforms, a single package may contain up to 12 or 16 CCDs, making cross-CCD latency non-uniformity even more pronounced. Because the CCDs are separate dies, this is an in-package Non-Uniform Cache Access effect: distinct from traditional NUMA, which is defined by memory controller distance, but with similar performance impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Intel: Ring and Mesh
&lt;/h3&gt;

&lt;p&gt;Intel client processors (e.g., Core i9-14900K) use a ring bus connecting all cores, L3 slices, GPU, and memory controller. L3 is evenly divided into slices, with each core accessing any slice via the ring. Each ring hop takes about 4–5 cycles, giving a worst-case latency of roughly 20–30 cycles on an 8–12 node ring. Individual slices sit at different hop distances from a given core, but because physical addresses are hashed across the slices, every core sees roughly the same average L3 latency. The ring bus therefore provides approximately uniform latency, in contrast to AMD's CCD architecture.&lt;/p&gt;

&lt;p&gt;Server-class Xeon Scalable processors (e.g., Sapphire Rapids) employ a 2D mesh interconnect, with latency growing linearly with the number of mesh hops. CHAs (Caching &amp;amp; Home Agents) are distributed across mesh nodes, each responsible for directory tracking of a portion of the address space. A core accessing memory managed by its local CHA experiences lower latency; accessing a region managed by a remote CHA requires multiple mesh hops, with latency reaching 2–3× that of local access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Apple: P-Clusters and SLC
&lt;/h3&gt;

&lt;p&gt;Apple's M series adopts a cache hierarchy distinct from x86. Taking M3 as an example: P-cores have 128 KB L1d and 192 KB L1i (both 16-way). L2 configurations across different M-series SKUs vary significantly, typically with clusters of P-cores sharing large L2 caches (Apple has not published precise official specifications; publicly available data largely comes from reverse-engineering analysis). E-cores have smaller but still substantial L1 caches (64 KB L1d / 128 KB L1i in public reverse-engineering figures). All CPU clusters and the GPU share a System Level Cache (SLC): 8 MB on the base M3, up to 48 MB on Pro/Max variants. The SLC is part of the unified memory architecture: DRAM (LPDDR5) is packaged alongside the chip, and CPU and GPU access the same physical memory pool through the SLC, eliminating the need for dedicated video memory.&lt;/p&gt;

&lt;p&gt;Apple's L1i and L1d capacities far exceed contemporary x86: 128 KB L1d / 192 KB L1i vs. x86's 48 KB / 32 KB. This is enabled by the 16 KB default page size, which raises the VIPT capacity ceiling (16 KB × 16-way = 256 KB) and lets Apple invest far more SRAM budget at the L1 level than x86 can. Additionally, Apple's ultra-wide decode design (M3 is 8-wide issue) demands extremely high instruction supply bandwidth: L1i output bandwidth must be sufficient to keep the decoders fed, and the combination of a large L1i and a micro-op cache (estimated at roughly 4K–6K uops from M1 reverse engineering) sustains the frontend.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uncacheable Regions
&lt;/h3&gt;

&lt;p&gt;Certain memory regions are not cached, such as MMIO (Memory-Mapped I/O). The OS marks these physical pages as UC (Uncacheable) via page table attributes and hardware mechanisms such as PAT (Page Attribute Table) / MTRR (Memory Type Range Register). Reads and writes to such addresses fully bypass L1–L3 caches and go directly onto the bus to the device. Meanwhile, the ISA provides instructions that allow programmers to bypass the cache. For large volumes of "write-once, discard" data (such as streaming writes to a GPU framebuffer), non-temporal stores (x86 &lt;code&gt;MOVNTI&lt;/code&gt; and &lt;code&gt;MOVNTDQ&lt;/code&gt;, exposed as the compiler intrinsics &lt;code&gt;_mm_stream_si32&lt;/code&gt; and &lt;code&gt;_mm_stream_si128&lt;/code&gt; respectively) write to memory without allocating cache lines, avoiding cache pollution. These instructions direct data into write-combining buffers (WC buffers), which collect a full 64 bytes before issuing a single burst onto the bus, rather than sending one transaction per store. Detailed discussion of WC and UC mechanisms appears in Part V.&lt;/p&gt;




&lt;h2&gt;
  
  
  Inclusive, Exclusive, and Non-Inclusive
&lt;/h2&gt;

&lt;p&gt;The inclusion relationship between cache levels is an important microarchitectural choice that directly determines effective cache capacity and coherence protocol overhead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inclusive&lt;/strong&gt;: every line in L1 must also exist in L2; likewise for L2–L3. Evicting a clean line from the upper level needs no data transfer because the lower level already holds a copy, and the lower level's tags can answer coherence lookups. The costs are duplicated capacity and back-invalidation: evicting a line from the lower level forces it out of the upper level too.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exclusive&lt;/strong&gt;: a line of data exists in exactly one cache level, so a line in L1 is not also held in L2 or L3. No capacity is wasted, but every L1 eviction, clean or dirty, must move the line down into L2, which lengthens the eviction path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-inclusive&lt;/strong&gt;: inclusion is neither enforced nor forbidden. A lower level may or may not hold a copy of a line present in an upper level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Between L1 and L2, many modern processors are non-inclusive, with L1 and L2 storing data independently without mandatory duplication, though this is not universal: AMD documents Zen's L2 as inclusive of L1. Between L2 and L3, there are two camps.&lt;/p&gt;

&lt;p&gt;Intel Core client processors have seen significant changes in L2–L3 inclusion across microarchitecture generations. Early Nehalem through Broadwell (2008–2015) used strict inclusive LLC, motivated not by capacity management but by the &lt;strong&gt;snoop filter&lt;/strong&gt;: when Core A needs to know whether a cache line at a given address is held by other cores, full-die broadcast (querying every core individually) would cause interconnect traffic to grow linearly with core count. An inclusive L3 provides a shortcut — since every line in L2 must have a copy in L3, simply checking L3's tag array answers "which core holds this address." L3's tag array doubles as a snoop filter, suppressing coherence query broadcast traffic within L3. The cost is capacity loss: L3 effective capacity = nominal capacity − Σ(all core L2 capacities).&lt;/p&gt;
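&lt;p&gt;The capacity cost of inclusion can be put into numbers with the formula above. The following is a rough sketch: &lt;code&gt;effective_l3&lt;/code&gt; is a hypothetical helper, the 8-core part with 16 MiB of L3 and 1 MiB of L2 per core is an invented configuration, and the model assumes the worst case in which every L2 is full of lines no other L2 holds.&lt;/p&gt;

```python
MIB = 1024 * 1024

def effective_l3(l3_bytes, l2_bytes_per_core, cores, inclusive):
    """Unique L3 capacity left after subtracting mandatory copies of L2 contents."""
    duplicated = l2_bytes_per_core * cores if inclusive else 0
    return max(l3_bytes - duplicated, 0)

# Hypothetical 8-core part: 16 MiB L3, 1 MiB private L2 per core.
for inclusive in (True, False):
    eff = effective_l3(16 * MIB, 1 * MIB, 8, inclusive)
    print(f"inclusive={inclusive}: {eff // MIB} MiB of unique L3 data")
```

&lt;p&gt;In this made-up configuration half of the inclusive L3 is spent mirroring L2 contents. The penalty grows as private L2 caches get larger relative to L3, which is one way to read the later move away from strict inclusion.&lt;/p&gt;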

&lt;p&gt;Starting with the Skylake generation (the server variant, Skylake-SP, shipped a non-inclusive L3 in 2017), Intel gradually transitioned to non-inclusive or weakly-inclusive LLC, with client parts following later. Contemporary Golden Cove and Raptor Cove no longer require L2 lines to keep copies in L3, instead relying on distributed directory information and LLC metadata to independently track the ownership of each cache line. This shift eliminates the duplicate storage overhead of L2 data in L3, making L3's nominal capacity its effective capacity, but introduces the SRAM overhead of the directory itself and additional lookup latency.&lt;/p&gt;

&lt;p&gt;AMD's Zen architecture is non-inclusive between L2 and L3. There is no requirement that "L2 contents must be backed in L3"; the full 32 MB of L3 is used for independent data. Snooping functionality is achieved through independent probe filters or directory tracking, without relying on inclusion. This choice gives AMD higher effective utilization of the labeled L3 capacity — for memory-intensive workloads with large working sets and low data reuse, non-inclusive is superior.&lt;/p&gt;

&lt;p&gt;The Apple M-series SLC appears to have inclusion-like properties (in some versions it seems to hold a subset of L2 contents), but Apple has not disclosed the exact inclusion semantics between SLC and L2.&lt;/p&gt;

&lt;p&gt;Subject to correct memory model enforcement, the CPU enjoys considerable freedom in cache management. Take x86 TSO (Total Store Order): as long as Core 0's sequence of writes to A then B is observed by all other cores as A changing before B, any optimization is permitted above that TSO baseline — for instance, opportunistically writing back dirty cache lines to main memory during idle bus cycles and clearing their dirty bits. Such operations are fully transparent to the programmer as long as the memory model is not violated.&lt;/p&gt;




&lt;p&gt;This part analyzed the internal organization of caches. The next part moves into dynamic behavior: the hardware implementation of cache replacement policies, the classification and behavior of hardware prefetchers, and the performance characteristics of sequential versus random access under single-thread conditions, as shaped by prefetchers and TLBs.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>computerscience</category>
      <category>performance</category>
      <category>systems</category>
    </item>
    <item>
      <title>Cache Deep Dive I — The Memory Wall and Locality</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Mon, 08 Jun 2026 12:57:06 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/cache-deep-dive-i-the-memory-wall-and-locality-2l5i</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/cache-deep-dive-i-the-memory-wall-and-locality-2l5i</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A core conviction behind this series: engineers who truly understand the underlying systems possess an intuition in performance engineering that is difficult to replace. When code exhibits unexpected latency, they can trace the problem down through the cache hierarchy, pipeline state, coherence protocol, and even kernel scheduling paths to its physical root cause. This ability is not built on algorithm textbooks — it rests on low-level foundations that are easily overlooked: CPU caches, pipelines, NUMA, Linux kernel memory management, and more. None of these pieces are "difficult" in isolation, but once they combine into a complete picture, they fundamentally change how one reads code.&lt;/p&gt;

&lt;p&gt;This series assumes readers have a basic understanding of computer architecture and operating systems. If not, reading CSAPP first would be advisable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Memory Wall
&lt;/h2&gt;

&lt;p&gt;Since the 1980s, processor performance has grown at roughly 60% per year, while DRAM access latency has improved by only about 7% annually. By the mid-1990s, this divergence was stark enough that Wulf and McKee coined the term "Memory Wall" in 1995: the processor's computational capacity is constrained by the speed at which data reaches the registers from memory. Nearly three decades later, DRAM's absolute access latency still hovers in the 60–80 ns range; the core problem has not been eliminated by process advances.&lt;/p&gt;

&lt;p&gt;Typical access latencies across the storage hierarchy:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Storage Level&lt;/th&gt;
&lt;th&gt;Latency (Typical)&lt;/th&gt;
&lt;th&gt;Zen 5 (~5 GHz)&lt;/th&gt;
&lt;th&gt;Golden Cove (~5.5 GHz)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Register&lt;/td&gt;
&lt;td&gt;≤1 cycle&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TLB&lt;/td&gt;
&lt;td&gt;≤1 cycle&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L1 Cache&lt;/td&gt;
&lt;td&gt;~4–5 cycles, ~1 ns&lt;/td&gt;
&lt;td&gt;4 cycles&lt;/td&gt;
&lt;td&gt;5 cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L2 Cache&lt;/td&gt;
&lt;td&gt;~12–14 cycles, ~3 ns&lt;/td&gt;
&lt;td&gt;14 cycles&lt;/td&gt;
&lt;td&gt;13 cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;L3 Cache&lt;/td&gt;
&lt;td&gt;~40–50 cycles, ~10 ns&lt;/td&gt;
&lt;td&gt;50 cycles&lt;/td&gt;
&lt;td&gt;44 cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main Memory&lt;/td&gt;
&lt;td&gt;~150–250 cycles, ~60–80 ns&lt;/td&gt;
&lt;td&gt;~200 cycles&lt;/td&gt;
&lt;td&gt;~200 cycles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVMe SSD&lt;/td&gt;
&lt;td&gt;~15,000 ns&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HDD&lt;/td&gt;
&lt;td&gt;~5,000,000 ns&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For context, a lightweight syscall costs tens to hundreds of nanoseconds — meaning a single main-memory access is already comparable to a system call. With KPTI (Kernel Page Table Isolation, the Meltdown mitigation) enabled, additional page-table switching and TLB-related overhead narrow the gap further. It is also worth noting that Apple's M3 series uses 16 KB pages, giving TLB coverage inherently superior to 4 KB pages, and its system call mechanism differs from x86, with syscall latencies in the 15–30 ns range — platform differences mean the "memory wall" does not feel the same everywhere.&lt;/p&gt;

&lt;p&gt;The latency hierarchy stems from physical mechanisms. L1–L3 caches are built from SRAM, requiring six transistors per bit: fast, but large in area and high in cost. DRAM needs only one transistor and one capacitor per bit, offering high density at low cost, but an access that misses the currently open row must go through a full timing sequence of precharge, row activate, and column strobe. These tens of nanoseconds of overhead are the hard floor of DRAM physics. For NVMe SSDs, latency is the sum of the NAND array read, controller and flash-translation processing, and PCIe transfer plus NVMe protocol stack processing, landing in the range of microseconds to tens of microseconds. HDDs involve mechanical seek delays from the read head, belonging to an entirely different physical regime than electronic latencies.&lt;/p&gt;

&lt;p&gt;These latencies are not always fully exposed on every memory access. Modern CPUs rely on two complementary mechanisms to hide them.&lt;/p&gt;

&lt;p&gt;The first is &lt;strong&gt;out-of-order execution&lt;/strong&gt; (OOO): when a load instruction waits for memory, the CPU picks later instructions from the reorder buffer (ROB) that do not depend on that load's result and continues executing, overlapping computation with memory access in time. The ROB typically holds hundreds of instructions — valuable buffering within a 200-cycle DRAM latency window, but nowhere near enough to fill the entire wait.&lt;/p&gt;

&lt;p&gt;The second, equally important but often overlooked, is &lt;strong&gt;Memory-Level Parallelism&lt;/strong&gt; (MLP): the CPU's memory subsystem allows multiple outstanding memory requests to be in flight simultaneously. Hardware MSHRs (Miss Status Holding Registers; Intel calls its L1d equivalents Line Fill Buffers, LFBs) track each outstanding cache miss, and each core typically has 10–12 such tracking slots. If a program has two independent load instructions (for example, traversing two unrelated linked lists), the CPU can issue both to the memory subsystem concurrently, with the second request not waiting for the first to complete. Two 200-cycle requests are overlapped, yielding an effective latency of roughly 200 cycles rather than 400. Modern server CPUs survive the memory wall precisely through this MLP + OOO collaboration: OOO finds parallelizable memory operations in the instruction stream, and MLP enables them to actually proceed in parallel in the memory system.&lt;/p&gt;
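&lt;p&gt;The payoff from overlapping misses can be estimated with Little's law: sustained throughput equals the amount of data in flight divided by latency. The sketch below is only a back-of-the-envelope model; the function name is invented here, the 80 ns latency is taken from the upper end of the DRAM range quoted earlier, and the slot counts are illustrative.&lt;/p&gt;

```python
def sustained_bandwidth_gb_per_s(outstanding_misses, latency_ns, line_bytes=64):
    """Little's law: bytes in flight divided by latency (bytes per ns equals GB/s)."""
    return outstanding_misses * line_bytes / latency_ns

# One miss at a time (a dependency chain) versus several overlapped misses.
for mlp in (1, 4, 12):
    print(f"MLP={mlp:2d}: {sustained_bandwidth_gb_per_s(mlp, 80.0):.1f} GB/s per core")
```

&lt;p&gt;Under these assumptions a single outstanding miss sustains 0.8 GB/s, while 12 overlapped misses sustain 9.6 GB/s. The latency of each individual request is unchanged; only the number of requests overlapping it differs.&lt;/p&gt;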

&lt;p&gt;Both defenses share the same blind spot: data dependency chains. In &lt;code&gt;a = p-&amp;gt;next; b = a-&amp;gt;next;&lt;/code&gt;, the address of the second load depends on the result of the first, so the second cannot be issued until the first returns. MLP collapses to a single outstanding request at this point, and all instructions in the ROB that indirectly depend on these addresses stall, gradually exhausting the OOO window. This is the fatal weakness of dependency-chain-intensive operations such as pointer chasing, hash table probing, and B-tree traversal: the program hits DRAM's 200-cycle hard wall while the CPU's execution units sit idle. The fundamental motivation for cache design lies exactly here: by keeping "faster, smaller copies" across multiple storage levels, each step in the dependency chain can land in the few-cycle latency of L1 or L2 rather than in DRAM.&lt;/p&gt;

&lt;p&gt;These effects can be observed directly on a target machine: &lt;code&gt;perf stat&lt;/code&gt; reports the miss counts behind them, and Intel MLC (Memory Latency Checker) measures the latencies themselves. By constructing a pointer-chasing benchmark with a singly linked list, using &lt;code&gt;RDTSC&lt;/code&gt; for timestamping and &lt;code&gt;LFENCE&lt;/code&gt; to eliminate instruction-reordering bias, one can precisely measure the access latency of each cache level. The methodology will be explained further when discussing random access patterns later in this series.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Memory Hierarchy
&lt;/h2&gt;

&lt;p&gt;The central idea of the memory hierarchy: each level (level k) serves as a smaller, faster cache for level k+1. Data is transferred between two levels in fixed-size units called blocks. For example, if level k has 4 blocks and level k+1 has 16 blocks, then at any moment level k holds copies of at most 4 of those 16 blocks, and blocks are copied in and evicted as the access pattern shifts. The block size between any pair of adjacent levels is fixed (between main memory and cache, the block corresponds to a cache line), but different level pairs may use different block sizes. In modern CPUs, the block size across L1, L2, L3 caches and main memory is almost uniformly 64 bytes.&lt;/p&gt;

&lt;p&gt;Anchoring this in real CPUs, the cache parameters of major microarchitectures (production models as of 2024):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Microarchitecture&lt;/th&gt;
&lt;th&gt;L1d (per core)&lt;/th&gt;
&lt;th&gt;L2 (per core/cluster)&lt;/th&gt;
&lt;th&gt;L3 / LLC (shared)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AMD Zen 5&lt;/td&gt;
&lt;td&gt;48 KB / 12-way&lt;/td&gt;
&lt;td&gt;1 MB / 16-way&lt;/td&gt;
&lt;td&gt;32 MB / 16-way (per CCD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intel Raptor Cove&lt;/td&gt;
&lt;td&gt;48 KB / 12-way&lt;/td&gt;
&lt;td&gt;2 MB / 16-way (P-core)&lt;/td&gt;
&lt;td&gt;36 MB (i9-14900K)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apple M3 P-core&lt;/td&gt;
&lt;td&gt;128 KB / 16-way&lt;/td&gt;
&lt;td&gt;32 MB (per P-cluster, shared)&lt;/td&gt;
&lt;td&gt;48 MB (SLC)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That total cache capacity is roughly 1/1000 of main memory is no coincidence: SRAM is about 5–10× larger in area than DRAM and roughly 100× more expensive per bit. Equipping a processor with gigabytes of SRAM would push die area and power consumption far beyond manufacturing feasibility. This economic constraint fundamentally determines the capacity ratios across cache levels.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Locality Principle
&lt;/h2&gt;

&lt;p&gt;Caches work not because program memory access is uniform and random — quite the opposite: typical programs access memory in a highly non-uniform fashion. This non-uniformity manifests in two dimensions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Temporal locality&lt;/strong&gt;: an address, once accessed, is very likely to be accessed again in the near future. Typical sources include loop variables, frequently-called function stack frames, and short-lived counters or status flags.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spatial locality&lt;/strong&gt;: after accessing an address, nearby addresses are very likely to be accessed soon afterwards. Sources include sequential array traversal, contiguously-allocated struct fields, and the sequential execution of instruction streams.&lt;/p&gt;

&lt;p&gt;These two forms of locality are the prerequisite for caches to function at all: if a program accessed a working set far larger than the cache uniformly at random, the hit rate would be pinned near the ratio of cache size to working-set size, which approaches zero, and no cache hierarchy design could change that. Data from standard benchmarks (e.g., SPEC CPU 2017) shows that well-written programs typically achieve L1 data cache hit rates above 90%, with L2 hit rates exceeding 70% on the subset that misses L1. This means the vast majority of memory accesses never face DRAM's full latency.&lt;/p&gt;

&lt;p&gt;The quantitative tool that unifies temporal and spatial locality is &lt;strong&gt;reuse distance&lt;/strong&gt; (also called LRU stack distance): between two consecutive accesses to a given cache line, how many other distinct lines does the program access? If the reuse distance is smaller than the number of lines the cache holds, the access hits in a fully associative LRU cache; otherwise, it misses. Analyzing a program's reuse distance distribution allows one to predict cache behavior without running benchmarks. Cache simulators such as Valgrind's Cachegrind answer the same question by replaying the address trace through a modeled cache. In real hardware, set-associative structures also introduce conflict misses and replacement is only approximately LRU, so this relationship holds only as an approximate analytical tool.&lt;/p&gt;
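&lt;p&gt;To make reuse distance concrete, here is a small sketch that computes it for a toy address trace. It assumes a fully associative LRU cache with 64-byte lines; the function names and the trace are invented for illustration, and the quadratic list-based stack is chosen for readability rather than speed.&lt;/p&gt;

```python
def reuse_distances(addresses, line_bytes=64):
    """LRU stack distance per access, counted in distinct cache lines.

    None marks the first touch of a line (a cold miss in any cache).
    """
    stack = []      # least recently used first, most recently used last
    distances = []
    for addr in addresses:
        line = addr // line_bytes
        if line in stack:
            pos = stack.index(line)
            distances.append(len(stack) - 1 - pos)  # distinct lines touched since
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(line)
    return distances

def lru_hits(distances, cache_lines):
    """Accesses that hit in a fully associative LRU cache holding cache_lines lines."""
    return sum(1 for d in distances if d is not None and d in range(cache_lines))

trace = [0, 64, 128, 0, 64, 192, 0]   # byte addresses touching lines 0, 1, 2, 0, 1, 3, 0
dist = reuse_distances(trace)
print(dist)
print(lru_hits(dist, 2), lru_hits(dist, 4))
```

&lt;p&gt;Every reuse in this trace has distance 2, so a 2-line cache hits none of them while a cache of 3 or more lines hits all three. The same trace therefore yields the hit count for every cache size at once, which is what makes the metric useful for analysis.&lt;/p&gt;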

&lt;p&gt;Sequential access and random access form two extremes within this framework. Sequential access exhibits short reuse distances and strong spatial locality, allowing prefetchers to stay ahead; random access often has reuse distances exceeding the effective range of any cache level and prefetcher. Detailed analysis of these two access patterns appears in Parts III and IV of this series.&lt;/p&gt;




&lt;h2&gt;
  
  
  Virtual Addresses and Cache Addressing
&lt;/h2&gt;

&lt;p&gt;A cache line identifies its corresponding main-memory block via an address tag. This address can be virtual or physical. The choice between the two involves fundamental design trade-offs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VIVT&lt;/strong&gt; (Virtually Indexed, Virtually Tagged): Cache indexing and tagging both use virtual addresses. The advantage is that the cache lookup can complete without waiting for TLB translation, minimizing latency. However, the synonym problem arises: when the same physical page is mapped to multiple virtual addresses (common in shared-memory scenarios), multiple copies of the same data may simultaneously exist in the cache, and a write to one copy leaves the others stale. The reverse case, the homonym problem, also appears: the same virtual address in different processes refers to different physical addresses, so entries from one process would be wrongly matched by another, forcing a cache flush on every context switch. As a result, VIVT is almost never used in modern general-purpose processors, appearing only in a few special-purpose tiny caches (such as the internal structures of some TLB implementations).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PIPT&lt;/strong&gt; (Physically Indexed, Physically Tagged): Both indexing and tagging use physical addresses. Data correctness is guaranteed, but every cache access must first translate the virtual address to a physical address via the TLB — effectively adding TLB latency on top of cache latency. This is unacceptable for L1d, which has a total latency target of only 4–5 cycles and cannot afford an additional TLB translation stage. PIPT is therefore mainly used for L2 and lower-level caches, where latency budgets are more generous.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VIPT&lt;/strong&gt; (Virtually Indexed, Physically Tagged): The compromise for L1d. The cache uses the low-order bits of the virtual address as the set index while simultaneously sending the virtual address to the TLB for translation; after locating the target set, the physical address from the TLB is compared against the physical tags of each way in the set. The key insight: the low-order bits of the virtual address (the page offset) are identical to the low-order bits of the physical address — address translation only modifies the high-order bits. Therefore, as long as all the cache index bits fall within the page-offset range (i.e., bits that do not change during translation), cache indexing can proceed in parallel with TLB translation. Address bits beyond the page offset would be ambiguous until TLB translation completes, causing the aliasing problem.&lt;/p&gt;

&lt;p&gt;This constraint directly limits the maximum capacity of a VIPT L1d cache: with 4 KB pages, the page offset is 12 bits (bits 0–11). For the set index, together with the line offset, to fit within bits 0–11, each way can hold at most one page, so total cache capacity cannot exceed associativity × page size. For an 8-way set-associative cache, maximum capacity = 8 × 4 KB = 32 KB. This explains why x86 processors long had L1d caches of 32 KB 8-way: not a coincidence, but a VIPT addressing constraint. Intel moved to 48 KB 12-way with Sunny Cove, and AMD did the same with Zen 5 (12 × 4 KB = 48 KB; Zen 4 still used 32 KB 8-way). Apple's M3 L1d reaches 128 KB 16-way, which 16 KB pages make possible: the VIPT ceiling becomes 16 × 16 KB = 256 KB, far above the limit under 4 KB pages.&lt;/p&gt;
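&lt;p&gt;The VIPT ceiling is simple enough to check mechanically. The sketch below uses two hypothetical helpers to test whether a given L1 configuration keeps its index bits inside the page offset; the first three configurations are the ones discussed above, and the fourth is an invented counterexample.&lt;/p&gt;

```python
KIB = 1024

def vipt_max_capacity(ways, page_bytes):
    """Largest alias-free VIPT capacity: at most one page per way."""
    return ways * page_bytes

def index_fits_in_page_offset(capacity, ways, page_bytes):
    """True when set-index plus line-offset bits lie entirely inside the page offset."""
    bytes_per_way = capacity // ways
    return max(bytes_per_way, page_bytes) == page_bytes

configs = [
    ("32 KB, 8-way, 4 KB pages", 32 * KIB, 8, 4 * KIB),
    ("48 KB, 12-way, 4 KB pages", 48 * KIB, 12, 4 * KIB),
    ("128 KB, 16-way, 16 KB pages", 128 * KIB, 16, 16 * KIB),
    ("64 KB, 8-way, 4 KB pages", 64 * KIB, 8, 4 * KIB),   # invented counterexample
]
for name, cap, ways, page in configs:
    ceiling = vipt_max_capacity(ways, page) // KIB
    print(f"{name}: ceiling {ceiling} KB, fits={index_fits_in_page_offset(cap, ways, page)}")
```

&lt;p&gt;The first three configurations fit; the 64 KB 8-way case needs 8 KB per way, so one index bit would come from above the page offset. Growing L1 under 4 KB pages therefore means adding ways rather than sets.&lt;/p&gt;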

&lt;p&gt;Virtually all modern high-performance general-purpose processors use VIPT for L1d and L1i, with PIPT from L2 downwards. The fundamental reason for choosing VIPT is performance: it allows cache indexing and TLB translation to proceed in parallel within the same pipeline stage, saving a pipeline cycle that is decisive for the L1 critical latency path.&lt;/p&gt;

&lt;p&gt;The aliasing problem arises only when a VIPT cache's index bits extend beyond the page offset: different virtual addresses that map to the same physical address then carry identical physical tags but different virtual indices, so the same physical line can appear in multiple sets. Such designs need help from the OS during page allocation. Traditional Unix used page coloring to ensure that the relevant low-order bits of virtual addresses for shared pages match, thereby avoiding aliasing. Designs that keep the whole index inside the page offset, as in the capacity constraint above, avoid the problem in hardware, and modern operating systems increasingly rely on this together with page-mapping constraints to avoid aliasing-related correctness issues.&lt;/p&gt;




&lt;h2&gt;
  
  
  Measurement Primer
&lt;/h2&gt;

&lt;p&gt;The following two commands form the measurement foundation for all subsequent performance discussions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;perf &lt;span class="nb"&gt;stat&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; L1-dcache-load-misses,L1-dcache-loads,LLC-load-misses,LLC-loads ./program
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reports the miss counts and total accesses for the L1 data cache and the Last Level Cache (LLC, typically L3). High L1 miss rates usually point to data layout problems (see Part III); high LLC miss rates usually point to working sets exceeding cache capacity (see Part IV).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;perf &lt;span class="nb"&gt;stat&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; cycles,instructions,cache-misses,cache-references ./program
&lt;span class="c"&gt;# IPC = instructions / cycles; perf stat also prints this ratio as "insn per cycle"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ratio of &lt;code&gt;instructions&lt;/code&gt; to &lt;code&gt;cycles&lt;/code&gt; is IPC (Instructions Per Cycle). The theoretical peak is set by the machine width (Zen 5, for example, is 8-wide, with well-tuned code typically reaching about 4–6). When measured IPC is significantly below what the microarchitecture can sustain and &lt;code&gt;cache-misses&lt;/code&gt; is simultaneously high, the problem usually points to cache efficiency.&lt;/p&gt;
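&lt;p&gt;Turning the raw counters from the two &lt;code&gt;perf stat&lt;/code&gt; commands into ratios is a one-liner each. The sketch below shows the arithmetic only: the helper name is hypothetical and the counter values are made up, standing in for numbers copied by hand from real &lt;code&gt;perf stat&lt;/code&gt; output.&lt;/p&gt;

```python
def derived_metrics(counters):
    """Compute IPC and load miss rates from a dict of raw perf counter values."""
    return {
        "ipc": counters["instructions"] / counters["cycles"],
        "l1d_miss_rate": counters["L1-dcache-load-misses"] / counters["L1-dcache-loads"],
        "llc_miss_rate": counters["LLC-load-misses"] / counters["LLC-loads"],
    }

# Made-up counter values, for illustration only.
sample = {
    "cycles": 4_000_000_000,
    "instructions": 2_000_000_000,
    "L1-dcache-loads": 800_000_000,
    "L1-dcache-load-misses": 80_000_000,
    "LLC-loads": 20_000_000,
    "LLC-load-misses": 10_000_000,
}
for name, value in derived_metrics(sample).items():
    print(f"{name}: {value:.2f}")
```

&lt;p&gt;This invented sample shows an IPC of 0.50 together with a 50% LLC miss rate, the combination that would point at the working set rather than at the instruction mix.&lt;/p&gt;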

&lt;p&gt;To precisely measure the latency of each cache level, the core method is to construct a pointer-chasing benchmark: allocate an array internally linked as a singly linked list in random order, traverse the list, and measure the average time per step. By controlling the list length to be less than L1d capacity (measuring L1 latency), greater than L1d but less than L2 (measuring L2 latency), greater than L2 but less than L3 (measuring L3 latency), and greater than LLC (measuring main memory latency), one can precisely fit the access latency for each level.&lt;/p&gt;
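&lt;p&gt;The data structure behind such a benchmark can be sketched as follows. This example only builds and checks the randomly ordered chain; it is not a usable latency measurement, because interpreter overhead in Python dwarfs any cache effect, so a real benchmark would do the traversal in C or similar with one node per 64-byte line. The function names are invented, and Sattolo's algorithm is one reasonable way (an assumption here, not something the text prescribes) to guarantee the chain is a single cycle.&lt;/p&gt;

```python
import random

def build_chase_ring(n, seed=0):
    """Return next_index such that following it visits all n slots in one cycle.

    Sattolo's algorithm yields a single-cycle permutation, so the traversal
    cannot fall into a short loop that would fit in a smaller cache level.
    """
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j is strictly below i, which forces one cycle
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index

def chase(next_index, steps, start=0):
    """Follow the chain; each step depends on the result of the previous load."""
    pos = start
    for _ in range(steps):
        pos = next_index[pos]
    return pos

ring = build_chase_ring(1024)
visited = set()
pos = 0
for _ in range(len(ring)):
    visited.add(pos)
    pos = ring[pos]
print(len(visited), pos)  # all 1024 slots visited, back at the start
```

&lt;p&gt;A plain random shuffle can contain short cycles, in which case the traversal would loop inside a small subset and report the latency of the wrong cache level. Forcing a single cycle makes the working set equal to the full list length at every size tested.&lt;/p&gt;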




&lt;p&gt;This part begins at the top of the picture: why the memory wall exists, how the memory hierarchy is designed, why locality enables caching to work, and how virtual and physical addresses interact during cache lookup. The next part dives into the cache internals: the hardware trade-offs of the three organization schemes, why cache lines are 64 bytes, the cache topology and inclusion policies of modern CPUs, and the specific mechanism by which an address is partitioned into tag, set index, and block offset.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>computerscience</category>
      <category>performance</category>
      <category>systems</category>
    </item>
    <item>
      <title>Building an Interpreter from Scratch: What 1600 Lines of Modern C++ Can Do</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Fri, 05 Jun 2026 08:55:01 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/building-an-interpreter-from-scratch-what-1600-lines-of-modern-c-can-do-4p2j</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/building-an-interpreter-from-scratch-what-1600-lines-of-modern-c-can-do-4p2j</guid>
      <description>&lt;p&gt;When you type &lt;code&gt;python3 main.py&lt;/code&gt; and hit enter, what actually happens? How does text sitting on your hard drive end up executing on your CPU?&lt;/p&gt;

&lt;p&gt;The answer is a program called an &lt;strong&gt;interpreter&lt;/strong&gt;. Unlike a compiler, which translates source code into a standalone executable before running it, an interpreter reads your code directly, understands what it means, and executes it on the spot. Python, Ruby, JavaScript, Lua — the languages you use every day all run on interpreters under the hood.&lt;/p&gt;

&lt;p&gt;We built one. LoxInterp is a complete interpreter written in C++23. It has full lexical scoping, closures, class inheritance, constructors, &lt;code&gt;super&lt;/code&gt;, 39 token types, 13 AST node types — roughly 1600 lines of source, 1200 lines of tests, zero external dependencies.&lt;/p&gt;

&lt;p&gt;The project is based on Robert Nystrom's classic tutorial &lt;em&gt;Crafting Interpreters&lt;/em&gt;. The original uses Java; we hand-rolled a C++23 version. Open source at &lt;a href="https://github.com/Tenaryo/LoxInterp" rel="noopener noreferrer"&gt;https://github.com/Tenaryo/LoxInterp&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Looks Like
&lt;/h2&gt;

&lt;p&gt;Let's see it in action first. Here's a snippet of Lox — classes, inheritance, &lt;code&gt;super&lt;/code&gt;, instances, all in one shot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Animal {
    init(name) { this.name = name; }
    speak()  { print this.name; }
}
class Dog &amp;lt; Animal {
    speak() {
        super.speak();
        print "woof!";
    }
}
var d = Dog("Buddy");
d.speak();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save it as &lt;code&gt;demo.lox&lt;/code&gt; and run it with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;./build/LoxInterp run demo.lox
&lt;span class="go"&gt;Buddy
woof!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These dozen lines of Lox go through a full pipeline from raw text to CPU execution. Let's unwrap that pipeline layer by layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture: Simpler Than You Think
&lt;/h2&gt;

&lt;p&gt;An interpreter's skeleton has just four stages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Source Code
    │
    ▼
┌─────────────┐      ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│   Scanner   │ ───▶ │   Parser    │ ───▶ │  Resolver   │ ───▶ │ Interpreter │
│  (lexing)   │      │  (syntax)   │      │ (bind+check)│      │  (runtime)  │
└─────────────┘      └─────────────┘      └─────────────┘      └─────────────┘
 Token Stream              AST             Annotated AST           Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;run&lt;/code&gt; command in main.cpp is nothing more than these four steps called in sequence, maybe a dozen lines total:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scanner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scan_tokens&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;          &lt;span class="c1"&gt;// 1. Text → Token stream&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;statements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse_statements&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// 2. Tokens → AST&lt;/span&gt;
&lt;span class="n"&gt;resolver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statements&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                  &lt;span class="c1"&gt;// 3. Binding + semantic checks&lt;/span&gt;
&lt;span class="n"&gt;interpret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statements&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                         &lt;span class="c1"&gt;// 4. Walk the tree and execute&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One stage feeds into the next. Let's start at the front door — the Scanner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scanner: Text → Token Stream
&lt;/h2&gt;

&lt;p&gt;The Scanner's job is brutally mechanical. It doesn't care about logic, doesn't care about structure, doesn't even know whether &lt;code&gt;1 + 1&lt;/code&gt; is valid syntax. Its only job is to &lt;strong&gt;recognize what's in this blob of characters and slap labels on it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A Token is just four fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Token&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;TokenType&lt;/span&gt; &lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;        &lt;span class="c1"&gt;// LEFT_PAREN / NUMBER / STRING / IF / ...&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;lexeme&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;    &lt;span class="c1"&gt;// Raw text: "(" / "42" / "\"hello\"" / "if"&lt;/span&gt;
    &lt;span class="n"&gt;TokenLiteral&lt;/span&gt; &lt;span class="n"&gt;literal&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Parsed value: null / 42.0 / "hello" / null&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;// Line number, for error reporting&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Feed it &lt;code&gt;class A &amp;lt; B { fun f(){} }&lt;/code&gt; and it spits out 13 flat tokens (12 from the text plus the terminating &lt;code&gt;EOF&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLASS, IDENTIFIER(A), LESS, IDENTIFIER(B), LEFT_BRACE,
FUN, IDENTIFIER(f), LEFT_PAREN, RIGHT_PAREN,
LEFT_BRACE, RIGHT_BRACE, RIGHT_BRACE, EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



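&lt;p&gt;Notice &lt;code&gt;&amp;lt;&lt;/code&gt; is labeled &lt;code&gt;LESS&lt;/code&gt;. The Scanner doesn't know if this &lt;code&gt;&amp;lt;&lt;/code&gt; means inheritance or comparison; that's the Parser's problem. &lt;code&gt;class&lt;/code&gt; and &lt;code&gt;fun&lt;/code&gt; are recognized as keywords, not generic identifiers. The entire scanning process is one giant switch statement, dispatching on the first character. The listing is condensed: the &lt;code&gt;'0'...'9'&lt;/code&gt; case range is a GCC/Clang extension rather than standard C++, and &lt;code&gt;lexeme&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; here stand for the current token text and &lt;code&gt;kKeywords.end()&lt;/code&gt;:&lt;br&gt;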
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;Scanner&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;scan_token&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;char&lt;/span&gt; &lt;span class="n"&gt;ch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;advance&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'('&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;add_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LEFT_PAREN&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// Single-char: produce directly&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'"'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scan_string&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;// String: while peek != '"'&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'0'&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="sc"&gt;'9'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scan_number&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// Number: greedy consume&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'!'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;add_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'='&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="n"&gt;BANG_EQUAL&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;BANG&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Two-char&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="sc"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="n"&gt;skip_comment&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;add_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SLASH&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nl"&gt;default:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_alpha&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;                     &lt;span class="c1"&gt;// Identifier / keyword&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_alphanumeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;peek&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt; &lt;span class="n"&gt;advance&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kKeywords&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lexeme&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;add_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;second&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IDENTIFIER&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unexpected character."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Illegal character&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A keyword map decides which identifiers are "built-in" — sixteen entries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TokenType&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;kKeywords&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"print"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PRINT&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"var"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;VAR&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"if"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"while"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WHILE&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"class"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CLASS&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"fun"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FUN&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"and"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AND&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"or"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;OR&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"return"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RETURN&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"super"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SUPER&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"this"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;THIS&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is also why &lt;code&gt;print&lt;/code&gt; becomes the keyword &lt;code&gt;PRINT&lt;/code&gt; while &lt;code&gt;clock&lt;/code&gt; remains a plain &lt;code&gt;IDENTIFIER&lt;/code&gt; — &lt;code&gt;clock&lt;/code&gt; isn't in the table. The Scanner doesn't know about it. It only works as a function call because the Interpreter, at startup, manually stuffs a &lt;code&gt;clock&lt;/code&gt; function object into the global environment. That's where compile time and run time part ways.&lt;/p&gt;

&lt;p&gt;Encounter &lt;code&gt;@&lt;/code&gt; or some other character Lox doesn't use? The Scanner prints an error to stderr, sets a flag, and keeps going. Unlike the Parser — which throws exceptions to unwind — lexical errors don't cascade. The next token is still valid, so keep scanning.&lt;/p&gt;
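
&lt;p&gt;The "report, flag, keep going" behavior can be sketched with a toy scanner. &lt;code&gt;MiniScanner&lt;/code&gt;, its string-valued tokens, and the exact error message format are hypothetical and chosen only for illustration; they are not LoxInterp's actual types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;string_view&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical mini-scanner that only knows digits, '+', spaces and newlines.
struct MiniScanner {
    std::string_view source;
    int line = 1;
    bool had_error = false;
    std::vector&amp;lt;std::string&amp;gt; tokens;

    void scan() {
        std::size_t i = 0;
        while (i &amp;lt; source.size()) {
            const char ch = source[i];
            if (ch == '\n') {
                ++line;
                ++i;
            } else if (ch == ' ') {
                ++i;
            } else if (ch == '+') {
                tokens.emplace_back("PLUS");
                ++i;
            } else if (ch &amp;gt;= '0' &amp;amp;&amp;amp; ch &amp;lt;= '9') {
                const std::size_t start = i;
                while (i &amp;lt; source.size() &amp;amp;&amp;amp; source[i] &amp;gt;= '0' &amp;amp;&amp;amp; source[i] &amp;lt;= '9') ++i;
                tokens.emplace_back("NUMBER(" + std::string(source.substr(start, i - start)) + ")");
            } else {
                // Report, remember the failure, skip the character, keep scanning.
                std::cerr &amp;lt;&amp;lt; "[line " &amp;lt;&amp;lt; line &amp;lt;&amp;lt; "] Error: Unexpected character: " &amp;lt;&amp;lt; ch &amp;lt;&amp;lt; '\n';
                had_error = true;
                ++i;
            }
        }
        tokens.emplace_back("EOF");
    }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Scanning &lt;code&gt;1 @ + 2&lt;/code&gt; with this sketch reports the &lt;code&gt;@&lt;/code&gt; on stderr and still yields &lt;code&gt;NUMBER(1)&lt;/code&gt;, &lt;code&gt;PLUS&lt;/code&gt;, &lt;code&gt;NUMBER(2)&lt;/code&gt;, &lt;code&gt;EOF&lt;/code&gt;. The caller can check &lt;code&gt;had_error&lt;/code&gt; afterwards and refuse to run the program.&lt;/p&gt;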

&lt;h2&gt;
  
  
  Parser: Token Stream → AST
&lt;/h2&gt;

&lt;p&gt;The Scanner produces a &lt;strong&gt;flat, one-dimensional list of tokens&lt;/strong&gt;. The Parser's job is to shape them into a &lt;strong&gt;nested, tree-structured AST&lt;/strong&gt; (Abstract Syntax Tree). The statement &lt;code&gt;print 2 + 3 * 4;&lt;/code&gt; isn't a pile of independent tokens; it's a print statement wrapping an addition whose right-hand side is itself a multiplication.&lt;/p&gt;

&lt;p&gt;From a black-box perspective: the Parser takes &lt;code&gt;std::vector&amp;lt;Token&amp;gt;&lt;/code&gt; in, and produces &lt;code&gt;std::vector&amp;lt;Stmt&amp;gt;&lt;/code&gt; out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;Parser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;parse_statements&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Stmt&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Stmt&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;statements&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;is_at_end&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;statements&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;declaration&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;  &lt;span class="c1"&gt;// Parse one top-level decl at a time&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;statements&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each top-level construct enters &lt;code&gt;declaration()&lt;/code&gt;, which looks at the current token and dispatches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;declaration()
  ├─ Sees VAR    → var_declaration()          // var x = 1;
  ├─ Sees FUN    → function_declaration()     // fun foo(a,b) { ... }
  ├─ Sees CLASS  → class_declaration()        // class Dog &amp;lt; Animal { ... }
  └─ Otherwise   → statement()                // print / if / while / for / return / expression
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's trace a print statement: &lt;code&gt;print 2 + 3 * 4;&lt;/code&gt;. &lt;code&gt;statement()&lt;/code&gt; sees &lt;code&gt;PRINT&lt;/code&gt; and enters &lt;code&gt;print_statement()&lt;/code&gt;, which calls &lt;code&gt;expression()&lt;/code&gt; to parse the operand &lt;code&gt;2 + 3 * 4&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Expression parsing is where the Parser earns its keep. The technique is called &lt;strong&gt;recursive descent — lower-precedence operators wrap the higher-precedence ones&lt;/strong&gt;. The parse of &lt;code&gt;2 + 3 * 4&lt;/code&gt; goes like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;expression() descends to term(). term() handles + and -:
  1. Call factor() for the left operand. factor() handles * and /.
     factor() descends to primary(), gets Literal(2). No * or / in sight → returns Literal(2).
  2. term() gets Literal(2) as the left operand. Check the next token: is it + or -?
     Yes — it's PLUS. Gobble the PLUS, record op = "+".
  3. term() calls factor() for the right operand.
     factor() descends to primary(), gets Literal(3). Then sees STAR. Gobbles it.
     Descends again, gets Literal(4). Returns Binary(*, 3, 4).
  4. term() wraps everything: Binary(+, Literal(2), Binary(*, 3, 4)).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting tree — note how &lt;code&gt;*&lt;/code&gt; sits deeper, ensuring it evaluates first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Binary
├── left:  Literal(2)
├── op:    "+"
└── right: Binary
           ├── left:  Literal(3)
           ├── op:    "*"
           └── right: Literal(4)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every precedence level follows the same template — recursively grab the left operand, then loop matching your own operators, recursively grab the right operand, wrap into a node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="nf"&gt;term&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Expr&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                  &lt;span class="c1"&gt;// Left operand → delegate to lower level&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;PLUS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MINUS&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;   &lt;span class="c1"&gt;// Loop: my operators&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;previous&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factor&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;             &lt;span class="c1"&gt;// Right operand → delegate again&lt;/span&gt;
        &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Binary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;op&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;right&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// Wrap&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                           &lt;span class="c1"&gt;// My operators gone → return&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Class declarations follow the same recursive pattern. &lt;code&gt;class Dog &amp;lt; Animal { speak() { ... } }&lt;/code&gt; enters &lt;code&gt;class_declaration()&lt;/code&gt;: grab the class name, check for &lt;code&gt;&amp;lt;&lt;/code&gt; (superclass), then loop inside the braces until the closing &lt;code&gt;}&lt;/code&gt;, each time calling &lt;code&gt;function_declaration()&lt;/code&gt; to parse one method (methods are written without the &lt;code&gt;fun&lt;/code&gt; keyword):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="nf"&gt;class_declaration&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Stmt&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IDENTIFIER&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;            &lt;span class="c1"&gt;// "Dog"&lt;/span&gt;
    &lt;span class="n"&gt;optional&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Expr&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;superclass&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LESS&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;                            &lt;span class="c1"&gt;// Has superclass?&lt;/span&gt;
        &lt;span class="n"&gt;superclass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Variable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IDENTIFIER&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;  &lt;span class="c1"&gt;// "Animal"&lt;/span&gt;
    &lt;span class="n"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LEFT_BRACE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FunctionStmt&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RIGHT_BRACE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;push_back&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_declaration&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;  &lt;span class="c1"&gt;// "speak() {...}"&lt;/span&gt;
    &lt;span class="n"&gt;consume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RIGHT_BRACE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ClassStmt&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;superclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;methods&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One elegant design choice: &lt;code&gt;for&lt;/code&gt; loops don't get their own AST node. &lt;code&gt;for_statement()&lt;/code&gt; desugars them directly into &lt;code&gt;while + block&lt;/code&gt; at parse time. The Interpreter never needs to know &lt;code&gt;for&lt;/code&gt; exists — it only handles while and block. Fewer node types, simpler backend.&lt;/p&gt;
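
&lt;p&gt;As a rough illustration of that desugaring, the sketch below rewrites &lt;code&gt;for (init; cond; incr) body&lt;/code&gt; into &lt;code&gt;{ init; while (cond) { body; incr; } }&lt;/code&gt;, the scheme used in &lt;em&gt;Crafting Interpreters&lt;/em&gt;. The &lt;code&gt;Stmt&lt;/code&gt; type, the &lt;code&gt;make&lt;/code&gt; helper, and &lt;code&gt;desugar_for&lt;/code&gt; are hypothetical stand-ins (expressions are plain strings here), not LoxInterp's real AST:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;memory&amp;gt;
#include &amp;lt;optional&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;utility&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical stand-in for the AST: expressions are kept as plain strings.
struct Stmt {
    enum class Kind { Expression, Block, While };
    Kind kind = Kind::Expression;
    std::string expr;                         // Expression: the code; While: the condition
    std::vector&amp;lt;std::unique_ptr&amp;lt;Stmt&amp;gt;&amp;gt; body;  // Block: statements; While: the loop body
};
using StmtPtr = std::unique_ptr&amp;lt;Stmt&amp;gt;;

StmtPtr make(Stmt::Kind kind, std::string expr, std::vector&amp;lt;StmtPtr&amp;gt; body) {
    auto stmt = std::make_unique&amp;lt;Stmt&amp;gt;();
    stmt-&amp;gt;kind = kind;
    stmt-&amp;gt;expr = std::move(expr);
    stmt-&amp;gt;body = std::move(body);
    return stmt;
}

// for (init; cond; incr) body  ==&amp;gt;  { init; while (cond) { body; incr; } }
StmtPtr desugar_for(StmtPtr init, std::optional&amp;lt;std::string&amp;gt; cond,
                    std::optional&amp;lt;std::string&amp;gt; incr, StmtPtr body) {
    if (incr) {
        // Run the increment after the original body on every iteration.
        std::vector&amp;lt;StmtPtr&amp;gt; inner;
        inner.push_back(std::move(body));
        inner.push_back(make(Stmt::Kind::Expression, *incr, {}));
        body = make(Stmt::Kind::Block, "", std::move(inner));
    }
    std::vector&amp;lt;StmtPtr&amp;gt; loop_body;
    loop_body.push_back(std::move(body));
    // A missing condition means "loop forever".
    StmtPtr loop = make(Stmt::Kind::While, cond.value_or("true"), std::move(loop_body));
    if (!init) return loop;
    // Wrap in a block so the initializer runs once, in its own scope.
    std::vector&amp;lt;StmtPtr&amp;gt; outer;
    outer.push_back(std::move(init));
    outer.push_back(std::move(loop));
    return make(Stmt::Kind::Block, "", std::move(outer));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The outer block matters: it keeps a loop variable declared in the initializer from leaking into the enclosing scope, which a bare &lt;code&gt;while&lt;/code&gt; would not do.&lt;/p&gt;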

&lt;h2&gt;
  
  
  Resolver: Compile-Time Binding
&lt;/h2&gt;

&lt;p&gt;Between the Parser finishing and the Interpreter starting sits one more compile-time pass — the Resolver. Its core task: &lt;strong&gt;pre-compute, at compile time, where every variable lives, and stamp that information directly into the AST&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why? Consider this innocent-looking code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;var x = "global";
{
    fun f() { print x; }
    f();              // Prints "global" ✓
    var x = "local";   // A new x in the same scope
    f();              // Should still print "global" (closure semantics)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the Interpreter just naively walks up the scope chain at runtime, the second call to &lt;code&gt;f()&lt;/code&gt; would find the newly-declared &lt;code&gt;x = "local"&lt;/code&gt; and print the wrong thing. The Resolver prevents this.&lt;/p&gt;

&lt;p&gt;It maintains a &lt;strong&gt;scope stack&lt;/strong&gt; &lt;code&gt;scopes_&lt;/code&gt;, where each frame is a &lt;code&gt;map&amp;lt;name, bool&amp;gt;&lt;/code&gt;: &lt;code&gt;true&lt;/code&gt; means "fully defined, ready to use" and &lt;code&gt;false&lt;/code&gt; means "declared but not yet initialized" (the window between &lt;code&gt;var x&lt;/code&gt; and the end of its initializer). As it walks the AST, it records the result of each lookup in the &lt;code&gt;depth&lt;/code&gt; field of the corresponding &lt;code&gt;Variable&lt;/code&gt; node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Variable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Token&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// -1 = unresolved&lt;/span&gt;
    &lt;span class="c1"&gt;// &amp;gt;= 0 = "skip this many environment frames to find this variable"&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When it enters &lt;code&gt;{&lt;/code&gt;, it pushes an empty frame. When it exits &lt;code&gt;}&lt;/code&gt;, it pops. For each &lt;code&gt;Variable&lt;/code&gt; node, it scans the stack from top to bottom, counts how many frames separate the variable from its declaration, and writes that depth into the node. The Interpreter then uses &lt;code&gt;env-&amp;gt;get_at(depth, name)&lt;/code&gt; — no chain walking, no ambiguity, no pollution from later declarations.&lt;/p&gt;
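
&lt;p&gt;The depth computation described above might look roughly like the following. &lt;code&gt;ScopeStack&lt;/code&gt; and its member names are hypothetical, and returning &lt;code&gt;std::nullopt&lt;/code&gt; for "not found in any local scope" is an assumption about how globals are handled, not a statement about LoxInterp's code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;optional&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;unordered_map&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical stripped-down resolver: only the scope stack and the lookup.
class ScopeStack {
public:
    void begin_scope() { scopes_.emplace_back(); }  // entering '{'
    void end_scope() { scopes_.pop_back(); }        // leaving '}'

    // false = declared but not yet initialized, true = ready to use.
    void declare(const std::string&amp;amp; name) {
        if (!scopes_.empty()) scopes_.back()[name] = false;
    }
    void define(const std::string&amp;amp; name) {
        if (!scopes_.empty()) scopes_.back()[name] = true;
    }

    // Scan from the innermost frame outwards and return how many frames lie
    // between the use and the declaration; std::nullopt means "not local".
    std::optional&amp;lt;int&amp;gt; resolve_local(const std::string&amp;amp; name) const {
        for (std::size_t i = scopes_.size(); i &amp;gt; 0; --i) {
            if (scopes_[i - 1].count(name) != 0) {
                return static_cast&amp;lt;int&amp;gt;(scopes_.size() - i);
            }
        }
        return std::nullopt;
    }

private:
    std::vector&amp;lt;std::unordered_map&amp;lt;std::string, bool&amp;gt;&amp;gt; scopes_;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Replaying the earlier example with this sketch: when the body of &lt;code&gt;f&lt;/code&gt; is resolved, the block has not declared &lt;code&gt;x&lt;/code&gt; yet, so the lookup falls through to the global. The later &lt;code&gt;var x = "local"&lt;/code&gt; cannot change that, because the answer was already written into the node.&lt;/p&gt;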

&lt;p&gt;The Resolver also catches a bunch of &lt;strong&gt;semantic errors&lt;/strong&gt; at compile time — things that are syntactically valid but logically wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;return 42;                  // At top level? → Error
this.x = 1;                 // Outside a class? → Error
{ var x = x; }             // Self-reference in initializer? → Error
class Foo &amp;lt; Foo {}          // Inheriting from yourself? → Error
class Bar { m() { super.m(); } } // super without superclass? → Error
class Baz { init() { return 1; } } // Returning value from init? → Error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All caught with exit code 65 before a single line of code runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Interpreter: The Real Thing
&lt;/h2&gt;

&lt;p&gt;The Interpreter is the backend — it takes the depth-annotated AST from the Resolver, recursively walks the tree, and executes. Three functions, that's it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;interpret&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statements&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;          &lt;span class="c1"&gt;// Create global env, execute each stmt&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;          &lt;span class="c1"&gt;// Execute one statement (side effects)&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;LoxLiteral&lt;/span&gt;    &lt;span class="c1"&gt;// Evaluate one expression (produces a value)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;execute&lt;/code&gt; handles all &lt;strong&gt;statements&lt;/strong&gt;. &lt;code&gt;print x&lt;/code&gt; → evaluate &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;cout&lt;/code&gt; the result. &lt;code&gt;var x = 1&lt;/code&gt; → evaluate the right side, then &lt;code&gt;env-&amp;gt;define("x", 1.0)&lt;/code&gt;. &lt;code&gt;if (cond) { A } else { B }&lt;/code&gt; → evaluate the condition, check truthiness, execute the chosen branch. &lt;code&gt;while (cond) { body }&lt;/code&gt; → loop evaluating the condition and executing the body until false.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;evaluate&lt;/code&gt; handles all &lt;strong&gt;expressions&lt;/strong&gt;. &lt;code&gt;Literal&lt;/code&gt; → return the value directly. &lt;code&gt;Binary&lt;/code&gt; → evaluate left and right, then apply the operator. &lt;code&gt;Variable&lt;/code&gt; → skip &lt;code&gt;depth&lt;/code&gt; frames and grab it with &lt;code&gt;env-&amp;gt;get_at(depth, name)&lt;/code&gt;. &lt;code&gt;Call&lt;/code&gt; → evaluate the callee (must be a Callable), evaluate each argument, then invoke &lt;code&gt;.call()&lt;/code&gt;.&lt;/p&gt;
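&lt;p&gt;The split between &lt;code&gt;execute&lt;/code&gt; and &lt;code&gt;evaluate&lt;/code&gt; can be sketched in a few lines. The version below is an assumption-laden toy in Python: AST nodes are plain tuples, the environment is a single dict, and only two operators exist. It is not the project's C++ code, only the same recursive shape.&lt;/p&gt;

```python
# Toy tree-walker: node layouts and names are illustrative assumptions.
def evaluate(expr, env):
    """Expressions produce a value."""
    kind = expr[0]
    if kind == "literal":
        return expr[1]
    if kind == "variable":
        return env[expr[1]]
    if kind == "binary":
        _, op, left, right = expr
        lhs = evaluate(left, env)
        rhs = evaluate(right, env)
        if op == "+":
            return lhs + rhs
        if op == "*":
            return lhs * rhs
        raise ValueError(f"unknown operator {op}")
    raise ValueError(f"unknown expression kind {kind}")


def execute(stmt, env, out):
    """Statements are run for their side effects."""
    kind = stmt[0]
    if kind == "var":
        env[stmt[1]] = evaluate(stmt[2], env)
    elif kind == "print":
        out.append(evaluate(stmt[1], env))
    else:
        raise ValueError(f"unknown statement kind {kind}")


# var x = 1 + 2 * 3; print x;
program = [
    ("var", "x", ("binary", "+", ("literal", 1.0),
                  ("binary", "*", ("literal", 2.0), ("literal", 3.0)))),
    ("print", ("variable", "x")),
]
env = {}
output = []
for statement in program:
    execute(statement, env, output)
```

&lt;p&gt;After the loop, &lt;code&gt;output&lt;/code&gt; holds the single value 7.0. Printing is modeled as appending to a list so the sketch stays easy to check.&lt;/p&gt;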

&lt;h3&gt;
  
  
  The Environment Chain: Heart of the Interpreter
&lt;/h3&gt;

&lt;p&gt;Every time the program enters &lt;code&gt;{ }&lt;/code&gt; (a block), a fresh &lt;code&gt;Environment&lt;/code&gt; is created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LoxLiteral&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;values_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// Variables in this scope&lt;/span&gt;
    &lt;span class="n"&gt;shared_ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Environment&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;enclosing_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// Pointer to the outer scope&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These link into a &lt;strong&gt;scope chain&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Global env:        {clock: Callable}
                       ↑
Block env:         {a: 1.0}           enclosing_ → global
                       ↑
Function body env: {x: 42.0}         enclosing_ → block
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Variable lookup walks the chain outward through &lt;code&gt;enclosing_&lt;/code&gt;. Variable definition only writes in the current frame. Assignment walks the chain to find the frame where the variable was defined and updates it there.&lt;/p&gt;
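&lt;p&gt;Those three rules fit in a small class. The following Python sketch is only a model of the behavior described here; the method names follow the article (&lt;code&gt;define&lt;/code&gt;, &lt;code&gt;get_at&lt;/code&gt;), while &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;assign&lt;/code&gt;, and &lt;code&gt;ancestor&lt;/code&gt; are assumed helpers.&lt;/p&gt;

```python
# Illustrative model of the scope chain; not the project's C++ Environment.
class Environment:
    def __init__(self, enclosing=None):
        self.values = {}
        self.enclosing = enclosing

    def define(self, name, value):
        self.values[name] = value  # always writes the current frame

    def ancestor(self, depth):
        env = self
        for _ in range(depth):
            env = env.enclosing
        return env

    def get_at(self, depth, name):
        # Resolver-assisted lookup: hop a known number of frames.
        return self.ancestor(depth).values[name]

    def get(self, name):
        # Unassisted lookup: search outward by name.
        env = self
        while env is not None:
            if name in env.values:
                return env.values[name]
            env = env.enclosing
        raise NameError(f"undefined variable {name}")

    def assign(self, name, value):
        env = self
        while env is not None:
            if name in env.values:
                env.values[name] = value
                return
            env = env.enclosing
        raise NameError(f"undefined variable {name}")


global_env = Environment()
global_env.define("clock", "native fn")
block_env = Environment(global_env)
block_env.define("a", 1.0)
body_env = Environment(block_env)
body_env.define("x", 42.0)

found_by_walk = body_env.get("a")
found_by_depth = body_env.get_at(1, "a")
body_env.assign("a", 2.0)  # updates the block frame, not the body frame
```

&lt;p&gt;Both lookups return 1.0 before the assignment, and afterwards the new value lives in &lt;code&gt;block_env&lt;/code&gt; while &lt;code&gt;body_env&lt;/code&gt; still only holds &lt;code&gt;x&lt;/code&gt;.&lt;/p&gt;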

&lt;p&gt;Why &lt;code&gt;shared_ptr&lt;/code&gt;? Because &lt;strong&gt;closures grab the environment at definition time&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fun makeCounter() {
    var i = 0;                      // Local to makeCounter
    fun count() { i = i + 1; return i; }
    return count;                   // count escapes to the outside!
}
var c = makeCounter();
c();  // 1
c();  // 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;count&lt;/code&gt; is defined, &lt;code&gt;fn-&amp;gt;closure&lt;/code&gt; captures &lt;code&gt;makeCounter&lt;/code&gt;'s local environment — the one holding &lt;code&gt;i&lt;/code&gt;. After &lt;code&gt;makeCounter&lt;/code&gt; returns, that environment would normally evaporate. But &lt;code&gt;count&lt;/code&gt;'s closure still holds a &lt;code&gt;shared_ptr&lt;/code&gt; to it, so the reference count stays above zero and &lt;code&gt;i&lt;/code&gt; lives on. Every call to &lt;code&gt;c()&lt;/code&gt; enters &lt;code&gt;Function::call()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;LoxLiteral&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;func_env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_shared&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Environment&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;closure&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Parent = captured env&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;func_env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;          &lt;span class="c1"&gt;// Args bound in this frame&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func_env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Return&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;   &lt;span class="c1"&gt;// return unwinds here&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;monostate&lt;/span&gt;&lt;span class="p"&gt;{};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Parameters are bound in &lt;code&gt;func_env&lt;/code&gt; itself (depth 0). Outer variables are reachable through &lt;code&gt;enclosing_&lt;/code&gt;. That's closures, in their entirety.&lt;/p&gt;
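&lt;p&gt;As a rough illustration of why capturing the environment is enough, the sketch below replays the &lt;code&gt;makeCounter&lt;/code&gt; example in Python. Function bodies are ordinary Python callables standing in for AST statements, and every class here is a simplified assumption rather than the project's implementation. Python's own reference tracking plays the role that &lt;code&gt;shared_ptr&lt;/code&gt; plays in the C++ code.&lt;/p&gt;

```python
# Simplified stand-ins for Environment and Function; names are illustrative.
class Environment:
    def __init__(self, enclosing=None):
        self.values = {}
        self.enclosing = enclosing

    def define(self, name, value):
        self.values[name] = value

    def find(self, name):
        env = self
        while env is not None:
            if name in env.values:
                return env
            env = env.enclosing
        raise NameError(f"undefined variable {name}")

    def get(self, name):
        return self.find(name).values[name]

    def assign(self, name, value):
        self.find(name).values[name] = value


class Function:
    def __init__(self, params, body, closure):
        self.params = params
        self.body = body        # a Python callable taking the call environment
        self.closure = closure  # environment captured at definition time

    def call(self, args):
        func_env = Environment(self.closure)  # parent = captured environment
        for param, arg in zip(self.params, args):
            func_env.define(param, arg)
        return self.body(func_env)


def count_body(env):
    env.assign("i", env.get("i") + 1)
    return env.get("i")


def make_counter_body(env):
    env.define("i", 0)
    return Function([], count_body, env)  # count captures this environment


make_counter = Function([], make_counter_body, Environment())
c = make_counter.call([])
first = c.call([])
second = c.call([])
other = make_counter.call([]).call([])  # a fresh counter has its own i
```

&lt;p&gt;Here &lt;code&gt;first&lt;/code&gt; is 1, &lt;code&gt;second&lt;/code&gt; is 2, and &lt;code&gt;other&lt;/code&gt; is 1 again, because each call to &lt;code&gt;make_counter&lt;/code&gt; creates a separate environment for &lt;code&gt;i&lt;/code&gt;.&lt;/p&gt;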

&lt;h3&gt;
  
  
  Classes, Inheritance, and super
&lt;/h3&gt;

&lt;p&gt;A class itself is a value — it can be assigned to variables, passed as an argument, printed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// When a class is defined:&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;klass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_shared&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LoxClass&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;klass&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Dog"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;klass&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;methods_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"speak"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;speak_fn&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"init"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;init_fn&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Dog"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;klass&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Stored just like any other variable&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instantiation — &lt;code&gt;Dog("Buddy")&lt;/code&gt; — calls &lt;code&gt;LoxClass::call()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;LoxClass&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;LoxLiteral&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;make_shared&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LoxInstance&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;           &lt;span class="c1"&gt;// 1. Blank instance&lt;/span&gt;
    &lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;klass&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;shared_from_this&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;                  &lt;span class="c1"&gt;// 2. Tag its class&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;init&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;find_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"init"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                      &lt;span class="c1"&gt;// 3. Find constructor&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;bound&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;init&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                          &lt;span class="c1"&gt;// 4. Copy the function&lt;/span&gt;
        &lt;span class="n"&gt;bound&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;closure&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"this"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;         &lt;span class="c1"&gt;// 5. Bind this&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;superclass_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;bound&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;closure&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"super"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;superclass_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 6. Bind super&lt;/span&gt;
        &lt;span class="n"&gt;bound&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;                           &lt;span class="c1"&gt;// 7. Execute constructor&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                                      &lt;span class="c1"&gt;// 8. Always return instance&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The constructor's return value is discarded — &lt;code&gt;LoxClass::call()&lt;/code&gt; always returns the newly minted instance. That's the constructor guarantee.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inheritance&lt;/strong&gt; works through &lt;code&gt;find_method&lt;/code&gt;, which walks the chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;LoxClass&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;find_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;shared_ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Function&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;methods_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;methods_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;        &lt;span class="c1"&gt;// Check self&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;superclass_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;superclass_&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;find_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// Check parent&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                                            &lt;span class="c1"&gt;// Not found&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Method overriding is automatic — start from the subclass and go up; the first match wins.&lt;/p&gt;
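&lt;p&gt;The "first match wins" rule is easy to check with a tiny model. This Python sketch assumes methods are stored in a dict and uses strings in place of function objects; only the lookup order is meant to match &lt;code&gt;find_method&lt;/code&gt;.&lt;/p&gt;

```python
# Illustrative model of method lookup along the superclass chain.
class LoxClass:
    def __init__(self, name, methods, superclass=None):
        self.name = name
        self.methods = methods
        self.superclass = superclass

    def find_method(self, name):
        if name in self.methods:            # check self first
            return self.methods[name]
        if self.superclass is not None:     # then defer to the parent
            return self.superclass.find_method(name)
        return None                         # not found anywhere


animal = LoxClass("Animal", {"speak": "Animal.speak", "eat": "Animal.eat"})
dog = LoxClass("Dog", {"speak": "Dog.speak"}, animal)

overridden = dog.find_method("speak")
inherited = dog.find_method("eat")
missing = dog.find_method("fly")
```

&lt;p&gt;The subclass entry shadows the parent's &lt;code&gt;speak&lt;/code&gt;, &lt;code&gt;eat&lt;/code&gt; is found one level up, and an unknown name falls off the end of the chain as &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;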

&lt;h3&gt;
  
  
  How super Works, and Why declaring_class_ Exists
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;super&lt;/code&gt; has subtle semantics. Consider this inheritance chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class A { say() { print "A"; } }
class B &amp;lt; A { test() { super.say(); }   say() { print "B"; } }
class C &amp;lt; B {                            say() { print "C"; } }
C().test();  // Should print "A", not "B" or "C"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;C().test()&lt;/code&gt; → C doesn't have &lt;code&gt;test()&lt;/code&gt;, so walk up to B → execute B's &lt;code&gt;test()&lt;/code&gt;, which calls &lt;code&gt;super.say()&lt;/code&gt;. The question is: which class does &lt;code&gt;super&lt;/code&gt; refer to?&lt;/p&gt;

&lt;p&gt;Logically, &lt;code&gt;test&lt;/code&gt; was &lt;strong&gt;defined in class B&lt;/strong&gt;, so its &lt;code&gt;super&lt;/code&gt; should be B's parent — class A. But at runtime, &lt;code&gt;this&lt;/code&gt; points to the C instance. If we naively compute &lt;code&gt;this.klass.superclass_&lt;/code&gt;, C's parent is B — we'd find B's &lt;code&gt;say()&lt;/code&gt;, print &lt;code&gt;"B"&lt;/code&gt;, and get it wrong.&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;Function&lt;/code&gt; needs to remember which class it belongs to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Function&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;weak_ptr&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LoxClass&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;declaring_class_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// ← "Which class defined me?"&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why &lt;code&gt;weak_ptr&lt;/code&gt; rather than &lt;code&gt;shared_ptr&lt;/code&gt;? Because &lt;code&gt;LoxClass::methods_&lt;/code&gt; already holds a &lt;code&gt;shared_ptr&amp;lt;Function&amp;gt;&lt;/code&gt;. If &lt;code&gt;Function&lt;/code&gt; held a &lt;code&gt;shared_ptr&lt;/code&gt; back to &lt;code&gt;LoxClass&lt;/code&gt;, we'd have a cycle — each keeps the other alive, reference counts never reach zero, memory leaks. A &lt;code&gt;weak_ptr&lt;/code&gt; doesn't increment the reference count. It just asks: "are you still alive?" If the class is destroyed, the method doesn't need it anymore.&lt;/p&gt;

&lt;p&gt;When a method is bound to an instance — that is, when &lt;code&gt;instance.test()&lt;/code&gt; triggers &lt;code&gt;LoxInstance::get("test")&lt;/code&gt; — the declaring class is used to inject &lt;code&gt;super&lt;/code&gt; into the closure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;dc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;declaring_class_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;           &lt;span class="c1"&gt;// Get defining class&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;superclass_&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nb"&gt;nullptr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;bound&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;closure&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;define&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"super"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dc&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;superclass_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// super = B's parent = A&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No matter how deep the inheritance chain goes below it, &lt;code&gt;super&lt;/code&gt; always refers to the superclass of the method's birthplace.&lt;/p&gt;
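&lt;p&gt;The difference between the naive choice and the declaring-class choice can be reproduced with the A/B/C example. The Python sketch below is a hedged model: methods are stored together with the class that defined them, and &lt;code&gt;invoke&lt;/code&gt; and &lt;code&gt;add_method&lt;/code&gt; are hypothetical helpers, not functions from the repository.&lt;/p&gt;

```python
# Illustrative model: each method remembers the class that declared it.
class LoxClass:
    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.methods = {}

    def add_method(self, name, body):
        self.methods[name] = (self, body)  # (declaring class, body)

    def find_method(self, name):
        if name in self.methods:
            return self.methods[name]
        if self.superclass is not None:
            return self.superclass.find_method(name)
        return None


def invoke(instance_class, name, use_declaring_class=True):
    declaring, body = instance_class.find_method(name)
    if use_declaring_class:
        super_class = declaring.superclass       # parent of the method's birthplace
    else:
        super_class = instance_class.superclass  # naive: parent of the instance's class
    return body(super_class)


a = LoxClass("A")
a.add_method("say", lambda sup: "A")
b = LoxClass("B", a)
b.add_method("say", lambda sup: "B")
b.add_method("test", lambda sup: sup.find_method("say")[1](None))  # super.say()
c = LoxClass("C", b)
c.add_method("say", lambda sup: "C")

correct = invoke(c, "test")
naive = invoke(c, "test", use_declaring_class=False)
```

&lt;p&gt;With the declaring class, &lt;code&gt;correct&lt;/code&gt; is "A". With the naive rule, &lt;code&gt;naive&lt;/code&gt; is "B", which is exactly the bug described above.&lt;/p&gt;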

&lt;h3&gt;
  
  
  The init Constructor
&lt;/h3&gt;

&lt;p&gt;An &lt;code&gt;init&lt;/code&gt; method is mechanically identical to any other method — it just gets two special treatments. First, &lt;code&gt;LoxClass::call()&lt;/code&gt; always returns the instance regardless of what &lt;code&gt;init&lt;/code&gt; returns. Second, &lt;code&gt;Function&lt;/code&gt; carries an &lt;code&gt;is_init_&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_init_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;func_env&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;get_at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"this"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// Return this, not nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows chaining: &lt;code&gt;instance.init(x).someProp&lt;/code&gt; — after &lt;code&gt;init&lt;/code&gt; runs, the return value is the instance itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why return Throws an Exception
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;return&lt;/code&gt; statement can be buried at any depth — inside an &lt;code&gt;if&lt;/code&gt;, inside a &lt;code&gt;while&lt;/code&gt;, inside a block, inside another &lt;code&gt;if&lt;/code&gt;. If we passed the return value back through the call stack layer by layer, every single &lt;code&gt;execute&lt;/code&gt; would need to check "did something below me return?" That's noise in every handler.&lt;/p&gt;

&lt;p&gt;Instead, &lt;code&gt;Function::call()&lt;/code&gt; wraps the body execution in a try-catch. The &lt;code&gt;return&lt;/code&gt; statement throws a &lt;code&gt;Return&lt;/code&gt; object. No matter how deep the nesting, it unwinds directly to the catch block at the top of the function. The result is clean and free of boilerplate; the trade-off is that throwing a C++ exception costs more than an ordinary function return.&lt;/p&gt;
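&lt;p&gt;The unwinding trick is language-independent, so a short Python model shows it just as well. The statement tuples and the &lt;code&gt;call&lt;/code&gt; helper below are assumptions made for this sketch; the point is only that no statement handler checks for a pending return.&lt;/p&gt;

```python
# Illustrative model of return-as-exception; statement shapes are assumptions.
class Return(Exception):
    def __init__(self, value):
        super().__init__()
        self.value = value


def execute(stmt):
    kind = stmt[0]
    if kind == "return":
        raise Return(stmt[1])      # jump straight out of all nested statements
    if kind == "if":
        _, condition, body = stmt
        if condition:
            for inner in body:
                execute(inner)
    elif kind == "block":
        for inner in stmt[1]:
            execute(inner)


def call(body):
    try:
        for stmt in body:
            execute(stmt)
    except Return as ret:
        return ret.value           # the only place that handles a return
    return None                    # falling off the end yields nil


nested_body = [
    ("block", [("if", True, [("block", [("return", 42)])])]),
    ("return", 0),                 # never reached
]
nested_result = call(nested_body)
empty_result = call([("block", [])])
```

&lt;p&gt;The deeply nested &lt;code&gt;return&lt;/code&gt; produces 42 without any of the intermediate handlers knowing about it, and a body with no &lt;code&gt;return&lt;/code&gt; yields &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;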

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;From a raw string through the Scanner's 39 token types, through the Parser's recursive descent building the AST, through the Resolver's compile-time binding and semantic checks, to the Interpreter's environment chain and recursive execution — a working interpreter gets built one layer at a time. Around 1600 lines of source, 1200 lines of tests, zero external dependencies.&lt;/p&gt;

&lt;p&gt;The biggest takeaway from building this: interpreters aren't magic. The core engine is a few hundred lines of tree walking. Everything else — expression evaluation, scope management, function calls — is layering features on top of that kernel. After writing your own Scanner/Parser/Interpreter, you can open the CPython or V8 source code and immediately recognize which module does what.&lt;/p&gt;

&lt;p&gt;Of course, this implementation is heavily stripped down. No JIT compilation (tree-walk interpretation is generally the slowest of the common execution models), no bytecode generation, no tracing garbage collector (reference counting via &lt;code&gt;shared_ptr&lt;/code&gt; is the entire story, so reference cycles are never reclaimed), no tail-call optimization, no real error recovery (just the simplest panic-mode synchronize), no standard library beyond a single &lt;code&gt;clock&lt;/code&gt; function. Industrial interpreters such as CPython, V8, and LuaJIT invest hundreds of thousands of lines in these directions.&lt;/p&gt;

&lt;p&gt;But that's exactly the point. By peeling away all the performance optimizations and engineering complexity, what's left is the raw, unvarnished four-layer pipeline that every interpreter shares. If you want to see what that looks like, the code is at &lt;a href="https://github.com/Tenaryo/LoxInterp" rel="noopener noreferrer"&gt;https://github.com/Tenaryo/LoxInterp&lt;/a&gt;. Pull requests and nitpicks welcome.&lt;/p&gt;

</description>
      <category>computerscience</category>
      <category>cpp</category>
      <category>programming</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Building SQLite from Scratch: 740 Lines of C++23 to Understand Every Byte of a .db File</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Fri, 22 May 2026 11:56:56 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/building-sqlite-from-scratch-740-lines-of-c23-to-understand-every-byte-of-a-db-file-hl2</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/building-sqlite-from-scratch-740-lines-of-c23-to-understand-every-byte-of-a-db-file-hl2</guid>
      <description>&lt;p&gt;You fire up a MySQL client, connect to port 3306, send off your SQL, and the server parses, optimizes, hits an index, fetches rows, and packs the result back to you. You can picture that entire pipeline.&lt;/p&gt;

&lt;p&gt;SQLite has none of that. No server process, no port, no wire protocol. Just a single file: &lt;code&gt;my.db&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So the real question is — what exactly is stuffed inside that file that makes &lt;code&gt;SELECT * FROM apples WHERE color='Yellow'&lt;/code&gt; return the right answer?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Tenaryo/TinySqlite" rel="noopener noreferrer"&gt;TinySqlite&lt;/a&gt; takes this apart across 740 lines of C++23. It doesn't link against the official SQLite library. It opens a .db file's raw binary and pries the data directly out of the disk bytes. We'll follow its code path, peeling back SQLite's file format layer by layer.&lt;/p&gt;

&lt;p&gt;This article covers: file header → B-tree pages → varint encoding → the schema table → full table scans → index scans.&lt;/p&gt;

&lt;h2&gt;
  
  
  What SQLite Actually Is
&lt;/h2&gt;

&lt;p&gt;Let's get the definition straight first.&lt;/p&gt;

&lt;p&gt;SQLite is an &lt;strong&gt;embedded relational database engine&lt;/strong&gt; — in plain English: it's a C library you compile into your program, and once you open a file, you can run SQL against it. No server, no install, no root password.&lt;/p&gt;

&lt;p&gt;If you're familiar with MySQL, here's the mental model. MySQL is a restaurant — a dedicated kitchen (server process), waitstaff (connection handling), a complex ordering system (query optimizer). You sit down, say "SELECT," and the back of house scrambles to bring you the dish.&lt;/p&gt;

&lt;p&gt;SQLite is your fridge. Open it, grab what you need, nobody serves you. The entire database is a single &lt;code&gt;data.db&lt;/code&gt; file. Copy it, carry it, done.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional client-server database:           SQLite's embedded model:
┌─────────┐   TCP/network   ┌───────────┐   ┌──────────────────────────────┐
│ Your app │ ←───────────→ │ DB server  │   │ Your app                     │
└─────────┘                └───────────┘   │  ├── libsqlite3.so (engine)   │
                                           │  ├── data.db (the only file)  │
                                           │  └── all ops are local calls  │
                                           └──────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why should you care? You might not encounter MySQL every day, but you're almost certainly already using SQLite. Your phone's contacts, WeChat messages, Chrome's browsing history and cookies: all stored in SQLite. Every iPhone, every Android device, and every major browser ships with SQLite. It is very likely the most widely deployed database engine on the planet.&lt;/p&gt;

&lt;p&gt;Using it is trivial. Create a database, make a table, insert data, query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;sqlite3 test.db
sqlite&amp;gt; CREATE TABLE fruits &lt;span class="o"&gt;(&lt;/span&gt;name TEXT, price INT&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
sqlite&amp;gt; INSERT INTO fruits VALUES &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'apple'&lt;/span&gt;, 5&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
sqlite&amp;gt; SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM fruits WHERE price &amp;lt; 10&lt;span class="p"&gt;;&lt;/span&gt;
apple|5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're writing C/C++, include &lt;code&gt;sqlite3.h&lt;/code&gt; and a handful of lines embed a full database in your program.&lt;/p&gt;

&lt;p&gt;Great. You're using it comfortably. But do you actually know — what do the bytes inside &lt;code&gt;test.db&lt;/code&gt; look like?&lt;/p&gt;

&lt;p&gt;Now flip roles. Stop being the user, become the reverse engineer. TinySqlite is a set of reverse-engineering notes that dissects the .db file's binary structure piece by piece. Let's begin.&lt;/p&gt;

&lt;h2&gt;
  
  
  Opening the File — How a .db File Is Organized
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Entire File Is a Chain of Pages
&lt;/h3&gt;

&lt;p&gt;At the macro level, a .db file is astoundingly simple: it's a sequence of &lt;strong&gt;fixed-size pages&lt;/strong&gt; laid end to end. Every page is the same size (typically 4096 bytes), numbered starting from page 1.&lt;/p&gt;

&lt;p&gt;Picture a bookshelf where every shelf slot is the same width. The 3rd book starts at &lt;code&gt;2 × slot_width&lt;/code&gt; from the shelf edge, because two full slots come before it. SQLite pages work the same way: the data for page N starts at file offset &lt;code&gt;(N-1) × page_size&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my.db file:
┌─────── page 1 ───────┐┌─────── page 2 ───────┐┌─────── page 3 ───────┐┌── ...
│ file header (1st 100B)││  page header         ││  page header         │
│ page_size = 4096      ││  type = 0x0D (leaf)  ││  type = 0x05 (interior)
│ num_tables = 3        ││  cells = [row1,row2] ││  child page ptrs     │
│ ...                   ││  ...                 ││  ...                 │
└───────────────────────┘└──────────────────────┘└──────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Page 1 is special: its first 100 bytes form the &lt;strong&gt;file header&lt;/strong&gt;, storing global metadata, and its own page header only begins after that, at offset 100. Every later page starts directly with a page header followed by actual data.&lt;/p&gt;

&lt;h3&gt;
  
  
  What the File Header Carries
&lt;/h3&gt;

&lt;p&gt;The first 100 bytes of page 1 in every SQLite file follow a fixed format. The first 16 bytes are the magic string &lt;code&gt;"SQLite format 3\000"&lt;/code&gt; — the file's "ID card." It tells any program that tries to read the file: hey, I'm a SQLite 3 format database.&lt;/p&gt;

&lt;p&gt;Bytes 16–17 store the &lt;strong&gt;page size&lt;/strong&gt;. Note that it is stored &lt;strong&gt;big-endian&lt;/strong&gt;, high byte first. If these two bytes read &lt;code&gt;0x10 0x00&lt;/code&gt;, that's 4096. If the page size is 512, they'd be &lt;code&gt;0x02 0x00&lt;/code&gt;. One special case: the stored value 1 (&lt;code&gt;0x00 0x01&lt;/code&gt;) stands for a page size of 65536, which would not fit in two bytes.&lt;/p&gt;

&lt;p&gt;Here's how TinySqlite reads the page size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;constexpr&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;kPageSizeOffset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="nf"&gt;read_u16_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;uint16_t&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;
         &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint16_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// In the constructor:&lt;/span&gt;
&lt;span class="n"&gt;page_size_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_u16_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kPageSizeOffset&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two bytes assembled into a &lt;code&gt;uint16_t&lt;/code&gt;. No magic.&lt;/p&gt;

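&lt;p&gt;Another critical number sits at file offset 103. It is not part of the 100-byte file header: page 1's own B-tree page header starts at offset 100, and bytes 3–4 of that header (file offsets 103–104) hold the page's cell count. Page 1 is the root of the schema table, so as long as the schema fits on a single page, this count is the number of schema entries: tables, plus any indexes, views, and triggers. TinySqlite reads it as its table count.&lt;br&gt;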
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;constexpr&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;kSchemaCountOffset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;103&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;num_tables_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;read_u16_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kSchemaCountOffset&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
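
&lt;p&gt;To sanity-check both offsets against a real file, here is a small Python sketch (Python rather than TinySqlite's C++, purely for brevity). It uses the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module to create a throwaway database, then reads the raw bytes back. The names &lt;code&gt;read_header_fields&lt;/code&gt;, &lt;code&gt;page_offset&lt;/code&gt;, and &lt;code&gt;demo&lt;/code&gt; are made up for this illustration, and the sketch assumes the schema fits on page 1, so that the two bytes at offset 103 are that page's cell count.&lt;/p&gt;

```python
import os
import sqlite3
import tempfile

MAGIC = b"SQLite format 3\x00"  # the 16-byte magic string


def read_header_fields(buf):
    """Return (page_size, page1_cell_count) decoded from raw file bytes."""
    if buf[:16] != MAGIC:
        raise ValueError("not a SQLite 3 database file")
    page_size = int.from_bytes(buf[16:18], "big")
    if page_size == 1:  # a stored 1 stands for 65536
        page_size = 65536
    # Offset 103 = 100-byte file header + 3 bytes into page 1's page header.
    cell_count = int.from_bytes(buf[103:105], "big")
    return page_size, cell_count


def page_offset(page_no, page_size):
    """File offset at which page page_no (1-based) starts."""
    return (page_no - 1) * page_size


def demo():
    """Create a scratch database; return (decoded, reported_by_sqlite)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        con = sqlite3.connect(path)
        con.execute("CREATE TABLE apples (name TEXT, color TEXT)")
        con.execute("CREATE TABLE oranges (name TEXT)")
        con.execute("CREATE INDEX idx_apples_color ON apples (color)")
        con.commit()
        size = con.execute("PRAGMA page_size").fetchone()[0]
        rows = con.execute("SELECT count(*) FROM sqlite_master").fetchone()[0]
        con.close()
        with open(path, "rb") as fh:
            buf = fh.read()
    return read_header_fields(buf), (size, rows)
```

&lt;p&gt;Calling &lt;code&gt;demo()&lt;/code&gt; should return two equal pairs: the values decoded from the raw bytes and the values SQLite itself reports. Note that the count covers the index as well as the two tables.&lt;/p&gt;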



&lt;h3&gt;
  
  
  What a B-tree Page Looks Like
&lt;/h3&gt;

&lt;p&gt;Now that you know a file is a chain of pages, the next question is — what's inside a page? How is data actually organized?&lt;/p&gt;

&lt;p&gt;SQLite uses a &lt;strong&gt;B-tree&lt;/strong&gt; to organize data. Each table is a B-tree, and each node in that tree is a page. Pages have two roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Interior pages (type &lt;code&gt;0x05&lt;/code&gt;)&lt;/strong&gt;: These don't store actual data rows. They store "signposts" — child page numbers and key ranges. Their job is navigation: which subtree contains the data you're looking for.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leaf pages (type &lt;code&gt;0x0D&lt;/code&gt;)&lt;/strong&gt;: These hold the real row data. Every &lt;code&gt;INSERT&lt;/code&gt; ultimately lands in a cell on some leaf page.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Visually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;              Page 2 (interior, 0x05)
               /                      \
      Page ? (leaf, 0x0D)        Page ? (leaf, 0x0D)
       [Granny Smith]              [Fuji]
       [Golden Delicious]          [Honeycrisp]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A page's &lt;strong&gt;internal structure&lt;/strong&gt; (the page header) starts at page offset 0, except on page 1, where it starts at offset 100, right after the file header:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Offset&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1 byte&lt;/td&gt;
&lt;td&gt;page type (&lt;code&gt;0x05&lt;/code&gt;=interior table, &lt;code&gt;0x0D&lt;/code&gt;=leaf table, &lt;code&gt;0x02&lt;/code&gt;=interior index, &lt;code&gt;0x0A&lt;/code&gt;=leaf index)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2 bytes&lt;/td&gt;
&lt;td&gt;offset of first freeblock&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2 bytes&lt;/td&gt;
&lt;td&gt;number of cells&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2 bytes&lt;/td&gt;
&lt;td&gt;start of cell content area&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;1 byte&lt;/td&gt;
&lt;td&gt;number of fragmented free bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;4 bytes&lt;/td&gt;
&lt;td&gt;rightmost child pointer (interior pages only)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A leaf page's header is therefore 8 bytes long, while an interior page's header is 12 bytes: its last 4 bytes are the &lt;strong&gt;rightmost child pointer&lt;/strong&gt;, the page number of the rightmost child page. Right after the header comes the &lt;strong&gt;cell pointer array&lt;/strong&gt;: 2 bytes per cell, each giving that cell's offset within the page.&lt;/p&gt;

&lt;p&gt;This "rightmost child + cell pointer array" structure is what enables the B-tree to hop between pages. We'll expand on this when we cover full table scans.&lt;/p&gt;
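
&lt;p&gt;As a rough illustration, the header layout can be unpacked in a few lines of Python with the standard &lt;code&gt;struct&lt;/code&gt; module. This is a sketch rather than TinySqlite's code: &lt;code&gt;parse_page_header&lt;/code&gt; and its dictionary keys are hypothetical names, and the field positions follow the SQLite file format (a one-byte fragmented-bytes count at offset 7, and on interior pages a 4-byte rightmost child pointer at offset 8).&lt;/p&gt;

```python
import struct

INTERIOR_TYPES = (0x02, 0x05)  # interior index page, interior table page


def parse_page_header(page, header_start=0):
    """Decode a B-tree page header.

    header_start is 100 for page 1 (the file header comes first), else 0.
    """
    # ">BHHHB": big-endian u8, u16, u16, u16, u8 = the 8 bytes every page has.
    page_type, first_freeblock, num_cells, content_start, fragmented = (
        struct.unpack_from(">BHHHB", page, header_start)
    )
    header = {
        "page_type": page_type,
        "first_freeblock": first_freeblock,
        "num_cells": num_cells,
        # A stored 0 stands for 65536, which does not fit in two bytes.
        "content_start": content_start or 65536,
        "fragmented_free_bytes": fragmented,
        "right_child": None,
        "cell_ptr_array": header_start + 8,
    }
    if page_type in INTERIOR_TYPES:
        # Interior pages carry 4 extra header bytes: the rightmost child.
        header["right_child"] = struct.unpack_from(">I", page, header_start + 8)[0]
        header["cell_ptr_array"] = header_start + 12
    return header
```

&lt;p&gt;The &lt;code&gt;cell_ptr_array&lt;/code&gt; entry is the offset where the 2-byte cell pointers begin, which is the only place the leaf and interior layouts differ for a reader walking the tree.&lt;/p&gt;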

&lt;p&gt;At this point, you know three key facts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A .db file is a sequence of fixed-size pages&lt;/li&gt;
&lt;li&gt;The file header tells you the page size, and the cell count of page 1 tells you how many schema entries exist&lt;/li&gt;
&lt;li&gt;A B-tree organizes the data — interior pages navigate, leaf pages store rows&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The next natural question — how does SQLite know which tables exist and where each table's B-tree root lives? The answer is tucked inside a special table.&lt;/p&gt;

&lt;h2&gt;
  
  
  sqlite_master — The Database's "Table of Contents"
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Table of Tables
&lt;/h3&gt;

&lt;p&gt;SQLite has a hidden system table called &lt;strong&gt;sqlite_master&lt;/strong&gt;. Think of it as the table of contents at the front of a book — it doesn't store your business data, it &lt;strong&gt;describes&lt;/strong&gt; the structure of the entire database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sqlite_master (system table, exists in every .db file)
┌──────────┬───────────┬──────────┬─────────────────────────────┐
│ type     │ name      │ rootpage │ sql                         │
├──────────┼───────────┼──────────┼─────────────────────────────┤
│ "table"  │ "apples"  │ 2        │ "CREATE TABLE apples(...)"  │
│ "table"  │ "oranges" │ 4        │ "CREATE TABLE oranges(...)" │
│ "index"  │ "idx_..." │ 6        │ "CREATE INDEX..."           │
└──────────┴───────────┴──────────┴─────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



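&lt;p&gt;Each row represents a table, index, view, or trigger. (The real table has a fifth column, &lt;code&gt;tbl_name&lt;/code&gt;, sitting between &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;rootpage&lt;/code&gt;; the diagram leaves it out.) The two most important columns are &lt;code&gt;name&lt;/code&gt; (the name of the table or other object) and &lt;code&gt;rootpage&lt;/code&gt; (the root page number of that object's B-tree). When you later run &lt;code&gt;SELECT * FROM apples&lt;/code&gt;, that &lt;code&gt;rootpage = 2&lt;/code&gt; is how the engine finds the entry point to the apples table's data.&lt;/p&gt;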

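&lt;p&gt;So how do you find sqlite_master itself? Its B-tree root lives at a fixed location: page 1. The cell count at &lt;code&gt;kSchemaCountOffset&lt;/code&gt; (offset 103, inside page 1's page header, which begins at offset 100 right after the file header) tells you how many rows there are. The cell pointer array starts at &lt;code&gt;kCellPtrArrayStart&lt;/code&gt; (offset 108, that is, 100 plus the 8-byte leaf page header), and each 2-byte pointer references a cell within page 1 that belongs to sqlite_master. Both offsets assume the schema is small enough for page 1 to be a leaf page.&lt;/p&gt;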

&lt;p&gt;But before we can actually parse those cells, we need two encoding tools — varint and serial type. They're how SQLite "writes numbers" and "describes types" on disk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Varint: Waste Not, Want Not
&lt;/h3&gt;

&lt;p&gt;SQLite's on-disk format leans heavily on a variable-length integer encoding called &lt;strong&gt;varint&lt;/strong&gt;. The core idea is simple: small numbers take less space, big numbers take more.&lt;/p&gt;

&lt;p&gt;The rule: each byte contributes its lower 7 bits as data, and the highest bit (bit 7) is a "continue" flag. If bit 7 is 1, there are more bytes coming. If it's 0, this is the last byte. Up to 9 bytes, with the 9th byte using all 8 bits.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Encoding (hex)&lt;/th&gt;
&lt;th&gt;Bytes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;code&gt;05&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;&lt;code&gt;82 2C&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1000000&lt;/td&gt;
&lt;td&gt;&lt;code&gt;BD 84 40&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;300 = 0b_0000_0001_0010_1100&lt;/code&gt;. Split the low 14 bits into 7-bit groups: &lt;code&gt;0000010&lt;/code&gt; and &lt;code&gt;0101100&lt;/code&gt;. Add the continue bit: the high group gets 1 (&lt;code&gt;10000010 = 0x82&lt;/code&gt;), the low group gets 0 (&lt;code&gt;00101100 = 0x2C&lt;/code&gt;). &lt;code&gt;read_varint&lt;/code&gt; does the reverse, pulling the value back out of the byte stream:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;read_varint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;VarintResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;byte&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;byte&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x7F&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;byte&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unreachable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each iteration grabs 7 bits and shifts them into the result. The loop stops when it hits a byte whose high bit is 0. If it reaches the 9th byte without stopping, it uses the full 8 bits — that's varint's maximum width.&lt;/p&gt;
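
&lt;p&gt;For completeness, here is a sketch of the opposite direction in Python: an encoder that produces the bytes a decoder like &lt;code&gt;read_varint&lt;/code&gt; consumes, plus a small decoder so the two can be checked against each other. &lt;code&gt;encode_varint&lt;/code&gt; and &lt;code&gt;decode_varint&lt;/code&gt; are names invented for this example, not part of TinySqlite, and the code uses plain arithmetic (division and remainder by 128) instead of bit masks.&lt;/p&gt;

```python
def encode_varint(value):
    """Encode an unsigned 64-bit integer as a SQLite varint (1 to 9 bytes)."""
    if not (2**64 > value >= 0):
        raise ValueError("varint holds an unsigned 64-bit value")
    if value >= 2**56:
        # 9-byte form: eight flagged 7-bit groups, then one full 8-bit byte.
        high, last = divmod(value, 256)
        groups = [(high // 128**i) % 128 for i in range(7, -1, -1)]
        return bytes(g | 0x80 for g in groups) + bytes([last])
    groups = [value % 128]          # lowest 7 bits, emitted last
    value //= 128
    while value:
        groups.append(value % 128)
        value //= 128
    groups.reverse()                # most significant group goes first
    # Every byte except the final one gets the continue bit (0x80).
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])


def decode_varint(buf, offset=0):
    """Return (value, bytes_consumed) for the varint starting at buf[offset]."""
    value = 0
    for i in range(9):
        byte = buf[offset + i]
        if i == 8:                  # ninth byte contributes all 8 bits
            return value * 256 + byte, 9
        value = value * 128 + byte % 128
        if byte >= 0x80:            # continue bit set, keep reading
            continue
        return value, i + 1
```

&lt;p&gt;With these two helpers, &lt;code&gt;encode_varint(300)&lt;/code&gt; yields the bytes &lt;code&gt;82 2C&lt;/code&gt; worked out by hand above, and decoding them returns 300 together with a length of 2.&lt;/p&gt;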

&lt;h3&gt;
  
  
  Serial Type: What Exactly Is in This Column
&lt;/h3&gt;

&lt;p&gt;Every column of a record carries a &lt;strong&gt;serial type&lt;/strong&gt; code on disk. It tells the parser: is this column NULL, an integer, text, and how many bytes does it occupy?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type code&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Size in bytes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1 ~ 4&lt;/td&gt;
&lt;td&gt;1/2/3/4-byte integer&lt;/td&gt;
&lt;td&gt;equals the type code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;6-byte integer&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;8-byte integer&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;IEEE float&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;literal 0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;literal 1&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;≥12 and even&lt;/td&gt;
&lt;td&gt;BLOB&lt;/td&gt;
&lt;td&gt;(N - 12) / 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;≥13 and odd&lt;/td&gt;
&lt;td&gt;text string&lt;/td&gt;
&lt;td&gt;(N - 13) / 2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice the two special values 8 and 9 — the integer values 0 and 1 take up zero bytes on disk; the serial type alone encodes the value. SQLite's disk format is this miserly.&lt;/p&gt;

&lt;p&gt;The corresponding code in TinySqlite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;constexpr&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;serial_type_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt; &lt;span class="n"&gt;serial_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;noexcept&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}[&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;serial_type&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Decoding a Schema Row in Real Time
&lt;/h3&gt;

&lt;p&gt;With varint and serial type in hand, we can now decode a single row from sqlite_master. Say this row describes the apples table: &lt;code&gt;type="table", name="apples", rootpage=2, sql="CREATE TABLE apples(...)"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A schema table cell roughly follows this layout: &lt;code&gt;[payload size (varint)] [rowid (varint)] [header size (varint)] [5 serial types (varint)] [body: actual data for 5 columns]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;TinySqlite's &lt;code&gt;parse_schema_entry&lt;/code&gt; does exactly this: skip payload size and rowid → read the header size and the 5 serial types → compute per-column byte sizes from each serial type → sequentially read type, name, tbl_name, rootpage, and sql from the body region.&lt;/p&gt;

&lt;p&gt;You don't need to memorize every step — just know that this is how you get a table name and its rootpage out of raw binary. Once you have the rootpage, you start traversing that table's B-tree from the corresponding page.&lt;/p&gt;
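
&lt;p&gt;To make those steps concrete, here is a Python sketch that decodes every sqlite_master row straight from the bytes of page 1 and compares the result with what SQLite reports through SQL. It is an illustration under several assumptions, not TinySqlite's implementation: the function names are invented, the schema must fit on page 1 (a single leaf page), text is assumed to be UTF-8, no payload spills onto overflow pages, and float columns are deliberately not handled.&lt;/p&gt;

```python
import os
import sqlite3
import tempfile

INT_SIZES = {1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8}  # serial type: byte width


def read_varint(buf, pos):
    """Decode a SQLite varint at buf[pos]; return (value, next_pos)."""
    value = 0
    for i in range(9):
        byte = buf[pos + i]
        if i == 8:  # ninth byte contributes all 8 bits
            return value * 256 + byte, pos + 9
        value = value * 128 + byte % 128
        if byte >= 0x80:  # continue bit set, keep reading
            continue
        return value, pos + i + 1


def decode_column(buf, pos, serial_type):
    """Return (value, size_in_bytes) for one column of a record body."""
    if serial_type == 0:
        return None, 0
    if serial_type in INT_SIZES:
        size = INT_SIZES[serial_type]
        return int.from_bytes(buf[pos:pos + size], "big", signed=True), size
    if serial_type in (8, 9):  # literal 0 and literal 1 occupy no bytes
        return serial_type - 8, 0
    if serial_type >= 13 and serial_type % 2 == 1:  # text
        size = (serial_type - 13) // 2
        return buf[pos:pos + size].decode("utf-8"), size
    if serial_type >= 12:  # blob
        size = (serial_type - 12) // 2
        return bytes(buf[pos:pos + size]), size
    raise ValueError(f"serial type {serial_type} not handled in this sketch")


def parse_schema_rows(buf):
    """Decode all sqlite_master rows on page 1 (single leaf page assumed)."""
    if buf[100] != 0x0D:
        raise ValueError("schema spans several pages; not handled here")
    num_cells = int.from_bytes(buf[103:105], "big")
    rows = []
    for i in range(num_cells):
        cell = int.from_bytes(buf[108 + 2 * i:110 + 2 * i], "big")
        _payload_size, pos = read_varint(buf, cell)
        _rowid, pos = read_varint(buf, pos)
        header_start = pos
        header_size, pos = read_varint(buf, pos)  # counts its own bytes too
        header_end = header_start + header_size
        serial_types = []
        while header_end > pos:
            serial_type, pos = read_varint(buf, pos)
            serial_types.append(serial_type)
        row = []
        for serial_type in serial_types:  # body starts right at header_end
            value, size = decode_column(buf, pos, serial_type)
            row.append(value)
            pos += size
        rows.append(tuple(row))
    return rows


def demo():
    """Build a scratch database; return (rows_from_bytes, rows_from_sql)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        con = sqlite3.connect(path)
        con.execute("CREATE TABLE apples (name TEXT, color TEXT)")
        con.execute("CREATE INDEX idx_apples_color ON apples (color)")
        con.commit()
        expected = con.execute(
            "SELECT type, name, tbl_name, rootpage, sql "
            "FROM sqlite_master ORDER BY rowid"
        ).fetchall()
        con.close()
        with open(path, "rb") as fh:
            buf = fh.read()
    return parse_schema_rows(buf), expected
```

&lt;p&gt;Each decoded tuple holds the five schema columns in order, so the fourth element of the apples row is the rootpage that the next section starts from.&lt;/p&gt;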

&lt;h2&gt;
  
  
  SELECT + WHERE — How Data Actually Gets Found
&lt;/h2&gt;

&lt;p&gt;The first three sections laid all the groundwork: file format, B-tree, schema table, encoding primitives. Now let's answer the question that's been hanging since the beginning —&lt;/p&gt;

&lt;p&gt;You type &lt;code&gt;SELECT name FROM apples WHERE color='Yellow'&lt;/code&gt;. What does SQLite actually do?&lt;/p&gt;

&lt;p&gt;SQL parsing? TinySqlite handles it with string search — find &lt;code&gt;FROM&lt;/code&gt;, split column names, find &lt;code&gt;WHERE&lt;/code&gt;, extract conditions. Let's skip past that in one sentence.&lt;/p&gt;

&lt;p&gt;The interesting part is what comes next: finding the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Full Table Scan: Recursive Descent, Zero Missed Rows
&lt;/h3&gt;

&lt;p&gt;Without an index, SQLite's only option is a &lt;strong&gt;full table scan&lt;/strong&gt; — read every row of the target table top to bottom, then filter with the WHERE clause. Sounds simple enough, but when a table spans multiple pages, how do you guarantee nothing is skipped?&lt;/p&gt;

&lt;p&gt;The answer is in the B-tree traversal algorithm. The entry point is the rootpage — the page number recorded in sqlite_master. TinySqlite starts here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;rp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rootpage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From the rootpage, &lt;code&gt;read_columns_values&lt;/code&gt; recursively walks the entire B-tree. The core logic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;If it's an interior page (&lt;code&gt;0x05&lt;/code&gt;)&lt;/strong&gt;: each cell has a left child pointer pointing to a subtree. Iterate all cells, recursively process each subtree. There's also a rightmost child pointer pointing to the rightmost subtree — can't forget that one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;If it's a leaf page (&lt;code&gt;0x0D&lt;/code&gt;)&lt;/strong&gt;: each cell is a data row. Read them one by one and match against the WHERE condition.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rightmost child is the most overlooked piece of design — that 4-byte pointer sitting before the cell pointer array on interior pages. It points to the subpage covering the range "to the right" of all cells. Without it, the rightmost chunk of data simply gets dropped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Interior page processing logic&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;num_cells&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_u16_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;right_child&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_u32_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// ← don't forget this one&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;num_cells&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;cell_ptr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;page_offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;read_u16_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;page_offset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;child&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_u32_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cell_ptr&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// ← left child&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;child_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_columns_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;child&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;column_indices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// ...collect rows from the child page...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;right_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;read_columns_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;right_child&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;column_indices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// ...collect rows from the rightmost child...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The recursion keeps diving, stops at leaf pages, reads data, and bubbles it back up. The entire B-tree gets fully traversed — not a single row is missed.&lt;/p&gt;

&lt;p&gt;Column value extraction for each row relies on the serial type machinery from the previous section: read varints for serial types and byte widths, then pull data out of the payload region. If a column happens to be the one in the WHERE clause, compare on the spot.&lt;/p&gt;
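&lt;p&gt;To make the varint and serial-type step concrete, here is a minimal sketch. The names &lt;code&gt;read_varint&lt;/code&gt; and &lt;code&gt;serial_type_size&lt;/code&gt; are made up for illustration and are not TinySqlite's actual functions; the encoding rules themselves come from the SQLite file format.&lt;/p&gt;

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Decode a SQLite varint starting at buf[pos]: 1 to 9 bytes, big-endian,
// 7 payload bits in each of the first eight bytes and all 8 bits of the ninth.
// Returns {value, bytes consumed}; {0, 0} signals truncated input.
inline std::pair<std::uint64_t, std::size_t> read_varint(
        const std::vector<std::uint8_t>& buf, std::size_t pos) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        if (pos + i >= buf.size()) return {0, 0};  // ran off the end
        const std::uint8_t byte = buf[pos + i];
        value = (value << 7) | (byte & 0x7Fu);
        if ((byte & 0x80u) == 0) return {value, i + 1};  // high bit clear: last byte
    }
    if (pos + 8 >= buf.size()) return {0, 0};
    return {(value << 8) | buf[pos + 8], 9};  // ninth byte contributes 8 bits
}

// Number of payload bytes a column occupies, given its serial type.
inline std::size_t serial_type_size(std::uint64_t serial_type) {
    switch (serial_type) {
        case 0: case 8: case 9: return 0;  // NULL, constant 0, constant 1
        case 1: return 1;
        case 2: return 2;
        case 3: return 3;
        case 4: return 4;
        case 5: return 6;
        case 6: case 7: return 8;          // 64-bit integer, 64-bit float
        case 10: case 11: return 0;        // reserved, treated as empty here
        default: return static_cast<std::size_t>((serial_type - 12) / 2);  // BLOB (even) or TEXT (odd)
    }
}
```

&lt;p&gt;A serial type of 25, for example, means a 6-byte text value, which is exactly what &lt;code&gt;'Yellow'&lt;/code&gt; needs.&lt;/p&gt;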

&lt;h3&gt;
  
  
  Index Scan: Taking the B-tree Shortcut
&lt;/h3&gt;

&lt;p&gt;The problem with full table scans is obvious: you're only looking for &lt;code&gt;color='Yellow'&lt;/code&gt;, yet you're reading every apple of every color. When a table has hundreds of thousands of rows, this hurts.&lt;/p&gt;

&lt;p&gt;SQLite's solution is an &lt;strong&gt;index&lt;/strong&gt;. An index is itself a B-tree, but with a few twists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page types are &lt;code&gt;0x02&lt;/code&gt; (interior index page) and &lt;code&gt;0x0A&lt;/code&gt; (leaf index page)&lt;/li&gt;
&lt;li&gt;Cells don't carry full rows. They carry &lt;strong&gt;index column values + rowid&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The B-tree is sorted by the indexed column&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Back to &lt;code&gt;WHERE color='Yellow'&lt;/code&gt;. If the apples table has an index on the &lt;code&gt;color&lt;/code&gt; column, TinySqlite's path becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Search the index B-tree, collect matching rowids.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;index_search&lt;/code&gt; function traverses the index B-tree to find every entry where &lt;code&gt;color='Yellow'&lt;/code&gt;. Because index entries are sorted by the color column, the search can skip whole subtrees: compare the target with each key on the current page, descend into the left child if the target is lower, collect the rowid on a match (and keep checking the neighboring entries, since duplicates sit side by side), and move on to the next key, or finally the rightmost child, if the target is higher. Only the pages along the search path get read, which is vastly more efficient than a full table scan.&lt;/p&gt;
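&lt;p&gt;The payoff of a sorted index is easiest to see on a flattened version of it. The sketch below is not how TinySqlite's &lt;code&gt;index_search&lt;/code&gt; walks pages; it assumes the index entries have been copied into a sorted &lt;code&gt;std::vector&lt;/code&gt; (&lt;code&gt;IndexEntry&lt;/code&gt; and &lt;code&gt;lookup_rowids&lt;/code&gt; are hypothetical names) and finds the matching run with two binary searches.&lt;/p&gt;

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical flattened view of an index: one (key, rowid) entry per cell,
// already sorted by key, the way the index B-tree keeps them.
struct IndexEntry {
    std::string key;
    std::uint64_t rowid;
};

// Return the rowids of every entry whose key equals `target`.
// Entries outside the matching run are never visited one by one.
inline std::vector<std::uint64_t> lookup_rowids(const std::vector<IndexEntry>& sorted,
                                                std::string_view target) {
    const auto first = std::lower_bound(
        sorted.begin(), sorted.end(), target,
        [](const IndexEntry& e, std::string_view t) { return std::string_view{e.key} < t; });
    const auto last = std::upper_bound(
        first, sorted.end(), target,
        [](std::string_view t, const IndexEntry& e) { return t < std::string_view{e.key}; });

    std::vector<std::uint64_t> rowids;
    for (auto it = first; it != last; ++it) {
        rowids.push_back(it->rowid);
    }
    return rowids;
}
```

&lt;p&gt;The B-tree version does the same narrowing page by page, so the work grows with the depth of the tree plus the number of matches rather than with the size of the table.&lt;/p&gt;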

&lt;p&gt;&lt;strong&gt;Step 2: Use rowids to locate individual rows in the table B-tree.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With a list of rowids in hand, &lt;code&gt;read_row_by_rowid&lt;/code&gt; does point lookups on the table B-tree. Each point lookup follows a path similar to the recursive scan — compare rowids on interior pages to decide which child page to descend into — but it hunts for a single row rather than traversing every cell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;idx_rp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_rootpage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;column&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;idx_rp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint64_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;rowids&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;idx_rp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;where&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rowids&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;rid&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rowids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_row_by_rowid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col_indices&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="c1"&gt;// ...collect results...&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// No index, fall back to full table scan&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;QueryResult&lt;/span&gt;&lt;span class="p"&gt;{.&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read_columns_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;rp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col_indices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;)};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code above is the core decision logic in TinySqlite's &lt;code&gt;execute_query&lt;/code&gt;: &lt;strong&gt;if there's an index, use it; otherwise, do a full scan.&lt;/strong&gt; SQL parsing, schema rootpage lookup, B-tree traversal, WHERE filtering, index search — everything covered so far converges into these dozen lines.&lt;/p&gt;

&lt;p&gt;Of course, production-grade SQLite is far more complex. It has a query optimizer to pick among indexes, rollback and WAL journaling for crash recovery, locking that lets readers and a writer share one file, and B-tree page splits and rebalancing. But the core skeleton (file header → page → B-tree → schema → full scan / index scan) is exactly this.&lt;/p&gt;

&lt;h2&gt;
  
  
  Databases Aren't That Mysterious
&lt;/h2&gt;

&lt;p&gt;Starting from the client-server architecture contrast with MySQL, we've peeled all the way down to the .db file's bedrock: fixed-size pages strung together into a tree, a handful of header bytes telling you page size and table count, a schema table whose varint-encoded rows describe where every table lives, interior B-tree pages serving as signposts and leaf pages holding the actual data, and indexes as separate B-trees already sorted for you.&lt;/p&gt;

&lt;p&gt;740 lines of C++23, zero external dependencies, spanning the full path from a binary file header to a SELECT query result. It won't run TPC-C, and it's not going to replace libsqlite3 in your project. But if you want to see what every byte in a .db file is doing, it's exactly enough.&lt;/p&gt;

&lt;p&gt;Code at &lt;a href="https://github.com/Tenaryo/TinySqlite" rel="noopener noreferrer"&gt;https://github.com/Tenaryo/TinySqlite&lt;/a&gt; — this is a teaching-grade implementation, not an industrial one. Issues and feedback are still welcome.&lt;/p&gt;

</description>
      <category>computerscience</category>
      <category>cpp</category>
      <category>database</category>
      <category>systems</category>
    </item>
    <item>
      <title>Building Kafka from Scratch: A Message Broker in 1800 Lines of C++23</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Thu, 21 May 2026 02:43:49 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/building-kafka-from-scratch-a-message-broker-in-1800-lines-of-c23-3kho</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/building-kafka-from-scratch-a-message-broker-in-1800-lines-of-c23-3kho</guid>
      <description>&lt;p&gt;You wrote a web scraper. It crawls product pages and pipes the results downstream for processing. You wired it up with raw TCP, fire-and-forget style. Then the downstream service crashed. After the restart, the messages were gone, and your scraper had no idea what was sent and what wasn't.&lt;/p&gt;

&lt;p&gt;You need something that holds onto messages until the consumer is ready to pick them up. In other words, you need a &lt;strong&gt;message queue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Enter Kafka.&lt;/p&gt;

&lt;p&gt;Kafka is one of the most widely deployed distributed messaging engines on the planet, powering data pipelines at LinkedIn, Uber, Netflix, and basically anywhere that moves serious volume. You give it a topic (say, &lt;code&gt;crawler-results&lt;/code&gt;), producers push messages in, consumers pull them out. In between sits the Broker, handling connections, persisting data to disk, and routing traffic. Messages survive restarts, ordering is preserved within each partition, and scaling is mostly a matter of adding more machines.&lt;/p&gt;

&lt;p&gt;But real Kafka clocks in at roughly half a million lines of Java. Even browsing the source tree is enough to make most people close the tab.&lt;/p&gt;

&lt;p&gt;So I stripped it to the bone. The result is &lt;a href="https://github.com/Tenaryo/TinyKafka" rel="noopener noreferrer"&gt;TinyKafka&lt;/a&gt;, written from scratch in C++23: ~1,800 lines of core code, zero external dependencies, pure standard library and POSIX sockets. It implements four essential APIs (Produce, Fetch, DescribeTopicPartitions, and ApiVersions), plus a hand-rolled Kafka binary protocol stack and disk-backed persistence. Over 3,200 lines of tests verify every byte on the wire and every write to disk.&lt;/p&gt;

&lt;p&gt;Running it is trivial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;TinyKafka &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./build.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./build/kafka
Waiting &lt;span class="k"&gt;for &lt;/span&gt;clients to connect...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It binds port 9092 and sits there waiting for Kafka clients to show up.&lt;/p&gt;

&lt;p&gt;So what actually happens to a message from the moment it arrives to the moment it leaves? Let's crack it open, layer by layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Kafka Actually Is
&lt;/h2&gt;

&lt;p&gt;Let's get the concepts out of the way in two paragraphs.&lt;/p&gt;

&lt;p&gt;Kafka's core model has three pieces: a &lt;strong&gt;Producer&lt;/strong&gt; writes messages into a logical channel called a &lt;strong&gt;Topic&lt;/strong&gt;, a &lt;strong&gt;Consumer&lt;/strong&gt; reads messages from that Topic, and the &lt;strong&gt;Broker&lt;/strong&gt; in the middle stores and forwards everything. A Topic can be split into multiple &lt;strong&gt;Partitions&lt;/strong&gt;, spreading data across them so throughput scales horizontally.&lt;/p&gt;

&lt;p&gt;It solves three problems: &lt;strong&gt;decoupling&lt;/strong&gt; (producers and consumers don't need to know about each other), &lt;strong&gt;buffering&lt;/strong&gt; (messages pile up on disk and get consumed at the consumer's pace), and &lt;strong&gt;durability&lt;/strong&gt; (messages hit the disk and survive restarts).&lt;/p&gt;

&lt;p&gt;Alright, concepts done. Now let's see what a Broker looks like when you remove everything that isn't essential.&lt;/p&gt;

&lt;h2&gt;
  
  
  The While Loop Is the Whole Engine
&lt;/h2&gt;

&lt;p&gt;Here's TinyKafka's entire flow in pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Startup: read metadata from disk → now you know what topics and partitions exist
2. Bind port 9092, start accepting client connections
3. For each connected client, detach a thread:
   while (connection alive) {
       read 4 bytes → now you know the message length
       read the full message body
       parse_request()   → binary blob becomes typed struct
       Broker::handle()  → do the actual work
       serialize()       → response back to binary
       send_all()        → fire it back to the client
   }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's &lt;code&gt;main.cpp&lt;/code&gt; in its entirety, 88 lines. Here's the core of the real thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1. Read KRaft metadata from disk&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse_cluster_metadata_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"/tmp/kraft-combined-logs/__cluster_metadata-0/00000000000000000000.log"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Start TCP server on port 9092&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;9092&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. Accept loop: accept → hand off to thread&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;client_fd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kr"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;client_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;Broker&lt;/span&gt; &lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/tmp/kraft-combined-logs"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// Receive: first read 4-byte length prefix&lt;/span&gt;
                &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;len_buf&lt;/span&gt;&lt;span class="p"&gt;{};&lt;/span&gt;
                &lt;span class="n"&gt;recv_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;len_buf&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;message_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decode_int32_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;len_buf&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

                &lt;span class="c1"&gt;// Read the message body&lt;/span&gt;
                &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message_length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="n"&gt;recv_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

                &lt;span class="c1"&gt;// Binary → typed request → handle → serialize → send back&lt;/span&gt;
                &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;broker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;serialize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="n"&gt;send_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client_fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="n"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;detach()&lt;/code&gt;. Each client gets its own thread running its own receive-parse-handle-send loop. Multiple producers and consumers can connect simultaneously without stepping on each other. Simple, blunt, and effective.&lt;/p&gt;

&lt;p&gt;Real Kafka is far less cowboy about this. It uses thread pools and a Reactor pattern to avoid the cost of spawning and tearing down threads constantly, and Java NIO for non-blocking I/O. TinyKafka's one-thread-per-connection model is more of a proof of concept: it lets you see the concurrency model in one glance. The real thing adds enormous engineering (zero-copy &lt;code&gt;sendfile&lt;/code&gt;, memory-mapped sparse offset indexes that find a message with a binary search, log segmentation, and a dozen other things), but the skeleton loop (receive request, process, send response) is identical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speaking Kafka's Language: The Binary Protocol
&lt;/h2&gt;

&lt;p&gt;So what's &lt;code&gt;parse_request()&lt;/code&gt; actually doing? Turning raw network bytes into C++ structs, and that's where TinyKafka's grittiest code lives: a hand-built implementation of the Kafka binary wire protocol.&lt;/p&gt;

&lt;p&gt;Every Kafka message is structured as three parts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+------------------+------------------+------------------+
|  message_size    |     Header       |      Body        |
|  (4 bytes, BE)   |  (variable)      |  (API-dependent) |
+------------------+------------------+------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;message_size&lt;/strong&gt;: a 4-byte big-endian integer that tells the other side "here's how many more bytes to read." TCP is a stream protocol with no built-in message boundaries. This length prefix is a simple framing layer; without it you'd never know when one message ends and the next begins.&lt;/p&gt;
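&lt;p&gt;Framing can be tried without any sockets. The sketch below assumes received bytes are simply appended to a &lt;code&gt;std::vector&lt;/code&gt;; &lt;code&gt;pop_frame&lt;/code&gt; (a hypothetical helper, not part of TinyKafka) hands back a message only once the prefix and the full body have arrived.&lt;/p&gt;

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Pop one length-prefixed frame off the front of `stream`, if a whole frame
// is available. The 4-byte big-endian prefix counts the bytes that follow it.
inline std::optional<std::vector<std::uint8_t>> pop_frame(std::vector<std::uint8_t>& stream) {
    if (stream.size() < 4) {
        return std::nullopt;  // length prefix not complete yet
    }
    const std::uint32_t len = (std::uint32_t{stream[0]} << 24) |
                              (std::uint32_t{stream[1]} << 16) |
                              (std::uint32_t{stream[2]} << 8) |
                              std::uint32_t{stream[3]};
    if (stream.size() - 4 < len) {
        return std::nullopt;  // body not complete yet
    }
    const auto body_begin = stream.begin() + 4;
    const auto body_end = body_begin + static_cast<std::ptrdiff_t>(len);
    std::vector<std::uint8_t> frame(body_begin, body_end);
    stream.erase(stream.begin(), body_end);  // leave any following bytes in place
    return frame;
}
```

&lt;p&gt;Because leftover bytes stay in the buffer, two messages that arrive in one TCP segment, or one message split across several, come out the same way.&lt;/p&gt;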

&lt;p&gt;&lt;strong&gt;Header&lt;/strong&gt; carries three critical fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;api_key&lt;/code&gt; (2 bytes): what kind of message this is. 0 = Produce, 1 = Fetch, 18 = ApiVersions, 75 = DescribeTopicPartitions. Real Kafka defines dozens of API keys; we implemented four.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;api_version&lt;/code&gt; (2 bytes): the version of this API. Kafka keeps multiple versions of the same API alive simultaneously. An old client speaks v0, a newer one speaks v16, and each client picks the highest version that both it and the broker support.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;correlation_id&lt;/code&gt; (4 bytes): a sequence number for matching responses to requests. The client stamps it on the request, the broker echoes it back, and the client uses it to figure out "this response goes with that request I sent earlier."&lt;/li&gt;
&lt;/ul&gt;
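&lt;p&gt;Assuming the three fields sit back to back at the start of the message, right after &lt;code&gt;message_size&lt;/code&gt;, pulling them out could look like this. &lt;code&gt;RequestHeaderPrefix&lt;/code&gt; and &lt;code&gt;parse_header_prefix&lt;/code&gt; are illustrative names; the real request header carries further fields (such as a client id) that this sketch ignores.&lt;/p&gt;

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// The first eight bytes of a request, after the 4-byte message_size.
struct RequestHeaderPrefix {
    std::int16_t api_key;
    std::int16_t api_version;
    std::int32_t correlation_id;
};

// `message` is the frame body, i.e. everything after message_size.
inline std::optional<RequestHeaderPrefix> parse_header_prefix(
        const std::vector<std::uint8_t>& message) {
    if (message.size() < 8) {
        return std::nullopt;  // too short to hold the three fields
    }
    // Big-endian readers: most significant byte first.
    const auto u16 = [&](std::size_t i) {
        return static_cast<std::uint16_t>((message[i] << 8) | message[i + 1]);
    };
    const auto u32 = [&](std::size_t i) {
        return (std::uint32_t{message[i]} << 24) | (std::uint32_t{message[i + 1]} << 16) |
               (std::uint32_t{message[i + 2]} << 8) | std::uint32_t{message[i + 3]};
    };
    return RequestHeaderPrefix{static_cast<std::int16_t>(u16(0)),   // bytes 0-1
                               static_cast<std::int16_t>(u16(2)),   // bytes 2-3
                               static_cast<std::int32_t>(u32(4))};  // bytes 4-7
}
```

&lt;p&gt;Everything after these eight bytes depends on &lt;code&gt;api_key&lt;/code&gt; and &lt;code&gt;api_version&lt;/code&gt;, which is why they have to be read first.&lt;/p&gt;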

&lt;p&gt;&lt;strong&gt;Body&lt;/strong&gt;: varies by api_key. A Produce body carries a topic name and a blob of record batch bytes. A Fetch body carries a topic UUID and a list of partitions. The structure is strictly defined by the Kafka protocol spec.&lt;/p&gt;

&lt;p&gt;Why binary instead of a text format like JSON over HTTP? Because it's &lt;strong&gt;compact&lt;/strong&gt;. An int32 written as text is the string &lt;code&gt;"2147483647"&lt;/code&gt;, ten bytes. In binary it's always exactly four bytes. Kafka moves trillions of messages a day; that difference is not academic. And fixed-width binary fields mean no per-character scanning like JSON parsing: in the header, bytes 0-1 are always the &lt;code&gt;api_key&lt;/code&gt; and bytes 2-3 are always the &lt;code&gt;api_version&lt;/code&gt;, so one fixed-width read (plus a byte swap on little-endian machines) and you're done.&lt;/p&gt;

&lt;p&gt;Since the Kafka protocol sends fixed-width integers in big-endian (network) byte order, TinyKafka has a small arsenal of hand-rolled encode/decode primitives. Reading an int32, for instance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="nf"&gt;decode_int32_be&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int32_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
           &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int32_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
           &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int32_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
           &lt;span class="k"&gt;static_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int32_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four bytes. Most significant in &lt;code&gt;data[0]&lt;/code&gt;, least significant in &lt;code&gt;data[3]&lt;/code&gt;. This function looks trivial, but the entire Kafka protocol stack is built out of hundreds of calls just like it.&lt;/p&gt;
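&lt;p&gt;The write side is the mirror image. A sketch, assuming a fixed four-byte array as output (TinyKafka's own encoder may be shaped differently); this version does its shifting in unsigned arithmetic, which sidesteps sign-bit surprises on older compilers. The decoder is repeated in the same style only so the round trip can be checked.&lt;/p&gt;

```cpp
#include <array>
#include <cstdint>

// Write a 32-bit integer as four bytes, most significant byte first.
inline std::array<std::uint8_t, 4> encode_int32_be(std::int32_t value) {
    const auto u = static_cast<std::uint32_t>(value);
    return {static_cast<std::uint8_t>(u >> 24), static_cast<std::uint8_t>(u >> 16),
            static_cast<std::uint8_t>(u >> 8), static_cast<std::uint8_t>(u)};
}

// Inverse of the function above, taking an array instead of a std::span.
inline std::int32_t decode_int32_be(const std::array<std::uint8_t, 4>& bytes) {
    const std::uint32_t u = (std::uint32_t{bytes[0]} << 24) | (std::uint32_t{bytes[1]} << 16) |
                            (std::uint32_t{bytes[2]} << 8) | std::uint32_t{bytes[3]};
    return static_cast<std::int32_t>(u);
}
```

&lt;p&gt;Whatever the value, the output is always four bytes, which is the compactness argument from earlier in code form.&lt;/p&gt;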

&lt;p&gt;On top of these primitives sit &lt;code&gt;ByteReader&lt;/code&gt; and &lt;code&gt;ByteWriter&lt;/code&gt;, two utility classes that read and write int16/int32/varints/compact strings sequentially over &lt;code&gt;std::span&lt;/code&gt;. &lt;code&gt;parser.cpp&lt;/code&gt; runs 290 lines, &lt;code&gt;serializer.cpp&lt;/code&gt; 225, both standing on the shoulders of these two helpers.&lt;/p&gt;
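&lt;p&gt;For a feel of what such a reader involves, here is a stripped-down sketch. &lt;code&gt;MiniReader&lt;/code&gt; is a hypothetical stand-in, not TinyKafka's &lt;code&gt;ByteReader&lt;/code&gt;, and it reads from a &lt;code&gt;std::vector&lt;/code&gt; rather than a &lt;code&gt;std::span&lt;/code&gt;; the unsigned varint and compact string layouts follow the Kafka protocol.&lt;/p&gt;

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Sequential reader over a byte buffer. The buffer must outlive the reader.
class MiniReader {
public:
    explicit MiniReader(const std::vector<std::uint8_t>& data) : data_(data) {}

    std::size_t remaining() const { return data_.size() - pos_; }

    // Big-endian 16-bit integer.
    std::optional<std::int16_t> read_int16() {
        if (remaining() < 2) return std::nullopt;
        const auto u = static_cast<std::uint16_t>((data_[pos_] << 8) | data_[pos_ + 1]);
        pos_ += 2;
        return static_cast<std::int16_t>(u);
    }

    // Kafka unsigned varint: 7 bits per byte, least significant group first,
    // high bit set on every byte except the last.
    std::optional<std::uint32_t> read_unsigned_varint() {
        std::uint32_t value = 0;
        for (int shift = 0; shift <= 28; shift += 7) {
            if (remaining() < 1) return std::nullopt;
            const std::uint8_t byte = data_[pos_++];
            value |= static_cast<std::uint32_t>(byte & 0x7Fu) << shift;
            if ((byte & 0x80u) == 0) return value;
        }
        return std::nullopt;  // more than five bytes: malformed
    }

    // Compact string: unsigned varint holding (length + 1), then the bytes.
    // A stored 0 means null; this sketch reports null and errors the same way.
    std::optional<std::string> read_compact_string() {
        const auto n = read_unsigned_varint();
        if (!n || *n == 0) return std::nullopt;
        const std::size_t len = *n - 1;
        if (remaining() < len) return std::nullopt;
        std::string s(reinterpret_cast<const char*>(data_.data() + pos_), len);
        pos_ += len;
        return s;
    }

private:
    const std::vector<std::uint8_t>& data_;
    std::size_t pos_ = 0;
};
```

&lt;p&gt;Returning &lt;code&gt;std::optional&lt;/code&gt; from every read is one design choice among several; it keeps a truncated or malformed request from turning into an out-of-bounds read.&lt;/p&gt;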

&lt;p&gt;The &lt;code&gt;api_key&lt;/code&gt; in the header determines how the body gets parsed and how the request gets handled. We implemented four of them, 0, 1, 18, and 75. Let's take them one at a time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four APIs, Four Kinds of Work
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;broker.cpp&lt;/code&gt; and you'll find a single method, &lt;code&gt;Broker::handle()&lt;/code&gt;, that does all the heavy lifting. But before we look at the dispatch mechanism, let's understand what each of the four APIs actually does.&lt;/p&gt;

&lt;h3&gt;
  
  
  ApiVersions (api_key = 18): The Handshake
&lt;/h3&gt;

&lt;p&gt;The first thing a Kafka client typically does after connecting is ask the broker: "What APIs do you support, and which version ranges?"&lt;/p&gt;

&lt;p&gt;The response is a table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;ApiVersionEntry&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// API number&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;min_version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// lowest supported version&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;max_version&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;// highest supported version&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;ApiVersionsResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;correlation_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ApiVersionEntry&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;api_keys&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// our four entries&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;throttle_time_ms&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TinyKafka's answer is this compile-time table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="k"&gt;constexpr&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ApiVersionEntry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;kSupportedApis&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt;
    &lt;span class="p"&gt;{.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Produce&lt;/span&gt;
    &lt;span class="p"&gt;{.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;// Fetch&lt;/span&gt;
    &lt;span class="p"&gt;{.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;// ApiVersions&lt;/span&gt;
    &lt;span class="p"&gt;{.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;min_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;   &lt;span class="c1"&gt;// DescribeTopicPartitions&lt;/span&gt;
&lt;span class="p"&gt;}};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the client sends a version outside &lt;code&gt;[0, 4]&lt;/code&gt;, the broker fires back error_code = 35 (&lt;code&gt;UNSUPPORTED_VERSION&lt;/code&gt;) and that's the end of the conversation. That's Kafka version negotiation in its entirety, simpler than HTTP Content-Negotiation by a mile.&lt;/p&gt;
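
&lt;p&gt;As a rough illustration of that range check, here is a minimal sketch. The field types of &lt;code&gt;ApiVersionEntry&lt;/code&gt;, the &lt;code&gt;check_version&lt;/code&gt; helper, and the decision to treat an unknown &lt;code&gt;api_key&lt;/code&gt; as unsupported are assumptions made for this example, not code taken from TinyKafka:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;array&amp;gt;
#include &amp;lt;cstdint&amp;gt;

// Assumed shape of the table entry; the article does not show its definition.
struct ApiVersionEntry {
    int16_t api_key;
    int16_t min_version;
    int16_t max_version;
};

inline constexpr std::array&amp;lt;ApiVersionEntry, 4&amp;gt; kSupportedApis{{
    {.api_key = 0,  .min_version = 0, .max_version = 11},  // Produce
    {.api_key = 1,  .min_version = 0, .max_version = 16},  // Fetch
    {.api_key = 18, .min_version = 0, .max_version = 4},   // ApiVersions
    {.api_key = 75, .min_version = 0, .max_version = 0},   // DescribeTopicPartitions
}};

inline constexpr int16_t kNoError = 0;
inline constexpr int16_t kUnsupportedVersion = 35;

// Hypothetical helper: returns 0 if (api_key, version) is in the table's range,
// 35 (UNSUPPORTED_VERSION) otherwise.
constexpr int16_t check_version(int16_t api_key, int16_t version) {
    for (const auto&amp;amp; entry : kSupportedApis) {
        if (entry.api_key == api_key) {
            const bool in_range =
                version &amp;gt;= entry.min_version &amp;amp;&amp;amp; version &amp;lt;= entry.max_version;
            return in_range ? kNoError : kUnsupportedVersion;
        }
    }
    return kUnsupportedVersion;  // assumption: unknown api_key is also rejected
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this table, &lt;code&gt;check_version(18, 4)&lt;/code&gt; yields 0 and &lt;code&gt;check_version(18, 5)&lt;/code&gt; yields 35. Keeping the helper &lt;code&gt;constexpr&lt;/code&gt; means the table can also be sanity-checked at compile time.&lt;/p&gt;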

&lt;h3&gt;
  
  
  DescribeTopicPartitions (api_key = 75): "What partitions does this topic have?"
&lt;/h3&gt;

&lt;p&gt;A client wants to know about a topic's metadata. Does it exist? What partitions does it have? Who's the leader of each partition?&lt;/p&gt;

&lt;p&gt;TinyKafka handles this by looking up the topic name in an in-memory &lt;code&gt;ClusterMetadata&lt;/code&gt; structure. That structure gets built at startup by parsing a KRaft metadata log file, &lt;code&gt;__cluster_metadata-0/00000000000000000000.log&lt;/code&gt;, which contains the canonical record of every topic and partition.&lt;/p&gt;

&lt;p&gt;Found it? Here's the partition list. Didn't find it? error_code = 3 (&lt;code&gt;UNKNOWN_TOPIC_OR_PARTITION&lt;/code&gt;). Results come back sorted alphabetically by topic name, because the Kafka spec demands it. You can see this sorting behavior verified byte-for-byte in the integration tests: send &lt;code&gt;["zebra", "apple"]&lt;/code&gt;, get back &lt;code&gt;["apple", "zebra"]&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fetch (api_key = 1): Consumer Pulling Messages
&lt;/h3&gt;

&lt;p&gt;A consumer says: "Give me the messages for partition 0 of the topic with UUID &lt;code&gt;a1b2c3d4...&lt;/code&gt;."&lt;/p&gt;

&lt;p&gt;A Fetch request comes with a pile of fields (&lt;code&gt;max_wait_ms&lt;/code&gt;, &lt;code&gt;min_bytes&lt;/code&gt;, &lt;code&gt;max_bytes&lt;/code&gt;, &lt;code&gt;isolation_level&lt;/code&gt;, &lt;code&gt;session_id&lt;/code&gt;, &lt;code&gt;session_epoch&lt;/code&gt;), all controlling fetch behavior. TinyKafka keeps only the two that matter for the minimal path: &lt;strong&gt;topic UUID&lt;/strong&gt; and &lt;strong&gt;partition_index&lt;/strong&gt;. Everything else is parsed past and ignored. Real Kafka uses those extra fields for long-polling, transactional isolation, and other advanced features, but our goal is just to get the bytes flowing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;FetchTopicRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;topic_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;           &lt;span class="c1"&gt;// 16-byte UUID&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FetchPartitionRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;FetchPartitionResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int32_t&lt;/span&gt; &lt;span class="n"&gt;partition_index&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;int16_t&lt;/span&gt; &lt;span class="n"&gt;error_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// the payload: raw record batch bytes&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the broker finds the topic, it reads the entire partition log file from disk and stuffs it directly into the &lt;code&gt;records&lt;/code&gt; field. The consumer gets raw record batch bytes and does its own decoding. This is Kafka's philosophy: the broker should touch message content as little as possible. It stores, it forwards. Decoding is the client's problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Produce (api_key = 0): Producer Sending Messages
&lt;/h3&gt;

&lt;p&gt;This is where the scraper's data from the opening story finally enters Kafka:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;ProduceTopicRequest&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;topic_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ProducePartitionRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="c1"&gt;// each partition carries a blob of records (record batch bytes)&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TinyKafka looks up the topic by name, verifies the partition exists, and appends the record batch bytes to a disk file. Kafka doesn't store messages one at a time. They're packed into &lt;strong&gt;record batches&lt;/strong&gt;, each batch containing multiple records, with a magic byte tagging the format version. This batching is a big part of what gives Kafka its legendary throughput: it dramatically reduces the number of disk I/O operations.&lt;/p&gt;

&lt;p&gt;Four APIs down. Now let's see how the broker routes a request to the right handler.&lt;/p&gt;

&lt;h3&gt;
  
  
  variant + visit + overloaded: The Compiler Won't Let You Forget
&lt;/h3&gt;

&lt;p&gt;TinyKafka models the four request types as a &lt;code&gt;std::variant&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;
    &lt;span class="n"&gt;ApiVersionsRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DescribeTopicPartitionsRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;FetchRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ProduceRequest&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Response works the same way. Then &lt;code&gt;Broker::handle()&lt;/code&gt; dispatches every case in one shot using &lt;code&gt;std::visit&lt;/code&gt; with the &lt;code&gt;overloaded&lt;/code&gt; pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;Broker&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overloaded&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;ApiVersionsRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* version negotiation */&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;DescribeTopicPartitionsRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* lookup */&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;FetchRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* read from disk */&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;ProduceRequest&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* write to disk */&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the MoonieCode post we called &lt;code&gt;std::variant&lt;/code&gt; a "paranoid envelope": it holds exactly one of the declared types, nothing else, and the compiler forces you to handle every single case. Forget to write the Produce handler? Your build breaks. No virtual function overhead, no if-else chain, no missed cases. It's cleaner and safer than traditional OOP with virtual dispatch.&lt;/p&gt;
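
&lt;p&gt;The &lt;code&gt;overloaded&lt;/code&gt; helper itself is not shown above. A common way to write it is sketched below; the &lt;code&gt;Ping&lt;/code&gt;, &lt;code&gt;Echo&lt;/code&gt;, &lt;code&gt;DemoRequest&lt;/code&gt;, and &lt;code&gt;describe&lt;/code&gt; names are stand-ins invented for this example and are much smaller than TinyKafka's real request types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;string&amp;gt;
#include &amp;lt;variant&amp;gt;

// Inherit from every lambda and pull all their operator() into one overload set.
template &amp;lt;class... Ts&amp;gt;
struct overloaded : Ts... {
    using Ts::operator()...;
};
// Deduction guide: required in C++17, redundant but harmless from C++20 on.
template &amp;lt;class... Ts&amp;gt;
overloaded(Ts...) -&amp;gt; overloaded&amp;lt;Ts...&amp;gt;;

// Hypothetical stand-in request types for the demo.
struct Ping {};
struct Echo { std::string text; };
using DemoRequest = std::variant&amp;lt;Ping, Echo&amp;gt;;

// Removing either lambda leaves one alternative without a matching call
// operator, so std::visit fails to compile.
std::string describe(const DemoRequest&amp;amp; req) {
    return std::visit(overloaded{
        [](const Ping&amp;amp;) -&amp;gt; std::string { return "ping"; },
        [](const Echo&amp;amp; e) -&amp;gt; std::string { return "echo:" + e.text; }
    }, req);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One caveat worth keeping in mind: the exhaustiveness guarantee only holds while there is no catch-all such as a generic &lt;code&gt;[](const auto&amp;amp;)&lt;/code&gt; lambda in the set, because that overload would silently absorb any alternative you forgot.&lt;/p&gt;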

&lt;h2&gt;
  
  
  Where Messages Live: The Disk Layout
&lt;/h2&gt;

&lt;p&gt;DescribeTopicPartitions depends on metadata read from disk. Fetch reads from disk. Produce writes to disk. They all converge on TinyKafka's storage layer. So what exactly is sitting on that filesystem?&lt;/p&gt;

&lt;h3&gt;
  
  
  Directory Structure: Exactly Like Real Kafka
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;/tmp/kraft-combined-logs/
├── __cluster_metadata-0/
│   └── 00000000000000000000.log    ← KRaft metadata
├── orders-0/
│   └── 00000000000000000000.log    ← partition 0 of the "orders" topic
├── crawler-results-0/
│   └── 00000000000000000000.log    ← partition 0 of "crawler-results"
└── ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



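&lt;p&gt;The naming convention is &lt;code&gt;{topic}-{partition}/00000000000000000000.log&lt;/code&gt;, identical to real Kafka. The 20-digit file name is the base offset of the segment, so the first segment starts at zero. TinyKafka keeps a single segment file per partition and never rolls it. Real Kafka starts a new segment once the current one hits a size or age threshold and names it after the first offset it contains (for example &lt;code&gt;00000000000000000020.log&lt;/code&gt; if the next record has offset 20).&lt;/p&gt;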

&lt;h3&gt;
  
  
  The Starting Point: Metadata Files
&lt;/h3&gt;

&lt;p&gt;That &lt;code&gt;__cluster_metadata-0/00000000000000000000.log&lt;/code&gt; is the KRaft metadata log. KRaft mode, introduced as early access in Kafka 2.8, lets you run without ZooKeeper: cluster metadata lives as record batches inside this file.&lt;/p&gt;

&lt;p&gt;At startup, TinyKafka slurps it into memory and walks the record batch v2 format layer by layer: first identify record batch boundaries (magic byte = 2), then parse each record's value. The value itself is a compact varint-encoded frame where the critical field is &lt;code&gt;type&lt;/code&gt;: type=2 means it's a topic record (name + UUID), type=3 means it's a partition record (partition ID + parent topic UUID). Topics and partitions get linked by UUID.&lt;/p&gt;
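
&lt;p&gt;The first of those layers, finding batch boundaries, can be sketched roughly as follows. The header offsets come from the Kafka record batch v2 layout (8-byte base offset, 4-byte batch length, 4-byte partition leader epoch, then the magic byte), while the &lt;code&gt;read_be32&lt;/code&gt; and &lt;code&gt;split_batches&lt;/code&gt; helpers are hypothetical names for this example and not TinyKafka's actual parser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;span&amp;gt;
#include &amp;lt;vector&amp;gt;

// Read a big-endian 32-bit integer starting at p[0].
inline uint32_t read_be32(const uint8_t* p) {
    return (uint32_t{p[0]} &amp;lt;&amp;lt; 24) | (uint32_t{p[1]} &amp;lt;&amp;lt; 16) |
           (uint32_t{p[2]} &amp;lt;&amp;lt; 8) | uint32_t{p[3]};
}

// Header bytes: base_offset [0, 8), batch_length [8, 12),
// partition_leader_epoch [12, 16), magic [16].
// batch_length counts every byte that follows the batch_length field.
inline std::vector&amp;lt;std::span&amp;lt;const uint8_t&amp;gt;&amp;gt; split_batches(std::span&amp;lt;const uint8_t&amp;gt; log) {
    constexpr std::size_t kPrefix = 12;       // base_offset + batch_length
    constexpr std::size_t kMagicOffset = 16;
    std::vector&amp;lt;std::span&amp;lt;const uint8_t&amp;gt;&amp;gt; batches;
    std::size_t pos = 0;
    while (log.size() - pos &amp;gt; kMagicOffset) {
        const std::size_t total = kPrefix + read_be32(log.data() + pos + 8);
        // Stop on a truncated batch, an implausibly short one, or a non-v2 magic byte.
        if (total &amp;lt;= kMagicOffset || total &amp;gt; log.size() - pos ||
            log[pos + kMagicOffset] != 2) {
            break;
        }
        batches.push_back(log.subspan(pos, total));
        pos += total;
    }
    return batches;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each returned span covers one whole batch, so the per-record parsing (varints, the &lt;code&gt;type&lt;/code&gt; field) can then work on one batch at a time. The sketch stops at the first malformed batch rather than trying to resynchronize.&lt;/p&gt;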

&lt;p&gt;The parsed result becomes a &lt;code&gt;ClusterMetadata&lt;/code&gt; struct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;ClusterMetadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;TopicInfo&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;topics&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;                              &lt;span class="c1"&gt;// all topics&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;name_to_topic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// lookup by name&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;unordered_map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;size_t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;UuidHash&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;uuid_to_topic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// lookup by UUID&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three tables. Two lookup paths. O(1) on average to find any topic. With these maps built, all routing falls into place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Produce routing&lt;/strong&gt;: topic name → &lt;code&gt;name_to_topic&lt;/code&gt; → find partition list → verify partition exists → write to disk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fetch routing&lt;/strong&gt;: topic UUID → &lt;code&gt;uuid_to_topic&lt;/code&gt; → find topic name → construct file path → read from disk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice that Produce uses the name and Fetch uses the UUID. This isn't arbitrary; it follows the Kafka protocol. The Produce versions TinyKafka supports identify topics by name, while recent Fetch versions (v13 and later) identify them by UUID, which the consumer already obtained from the &lt;code&gt;DescribeTopicPartitions&lt;/code&gt; step.&lt;/p&gt;
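
&lt;p&gt;To make the two lookup paths concrete, here is a small self-contained sketch. The &lt;code&gt;TopicInfo&lt;/code&gt; fields, the FNV-1a based &lt;code&gt;UuidHash&lt;/code&gt;, and the &lt;code&gt;add&lt;/code&gt;, &lt;code&gt;by_name&lt;/code&gt;, and &lt;code&gt;by_uuid&lt;/code&gt; members are assumptions for illustration; TinyKafka's real definitions may differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;array&amp;gt;
#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;unordered_map&amp;gt;
#include &amp;lt;utility&amp;gt;
#include &amp;lt;vector&amp;gt;

using Uuid = std::array&amp;lt;uint8_t, 16&amp;gt;;

// One plausible hash for a 16-byte UUID: 64-bit FNV-1a over the bytes.
struct UuidHash {
    std::size_t operator()(const Uuid&amp;amp; id) const noexcept {
        uint64_t h = 14695981039346656037ULL;  // FNV offset basis
        for (uint8_t b : id) {
            h ^= b;
            h *= 1099511628211ULL;             // FNV prime
        }
        return static_cast&amp;lt;std::size_t&amp;gt;(h);
    }
};

struct TopicInfo {  // assumed fields
    std::string name;
    Uuid id;
    std::vector&amp;lt;int32_t&amp;gt; partitions;
};

struct ClusterMetadata {
    std::vector&amp;lt;TopicInfo&amp;gt; topics;
    std::unordered_map&amp;lt;std::string, std::size_t&amp;gt; name_to_topic;
    std::unordered_map&amp;lt;Uuid, std::size_t, UuidHash&amp;gt; uuid_to_topic;

    // Register a topic under both keys; the maps store an index into `topics`.
    void add(TopicInfo t) {
        name_to_topic[t.name] = topics.size();
        uuid_to_topic[t.id] = topics.size();
        topics.push_back(std::move(t));
    }
    // Produce path: look up by name. Returns nullptr if the topic is unknown.
    const TopicInfo* by_name(const std::string&amp;amp; name) const {
        auto it = name_to_topic.find(name);
        return it == name_to_topic.end() ? nullptr : &amp;amp;topics[it-&amp;gt;second];
    }
    // Fetch path: look up by UUID. Returns nullptr if the topic is unknown.
    const TopicInfo* by_uuid(const Uuid&amp;amp; id) const {
        auto it = uuid_to_topic.find(id);
        return it == uuid_to_topic.end() ? nullptr : &amp;amp;topics[it-&amp;gt;second];
    }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Storing indices rather than pointers in the maps keeps them valid when the &lt;code&gt;topics&lt;/code&gt; vector reallocates. The pointers returned by the lookups, on the other hand, should not be held across a later &lt;code&gt;add&lt;/code&gt;.&lt;/p&gt;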

&lt;h3&gt;
  
  
  Writing and Reading
&lt;/h3&gt;

&lt;p&gt;Writing (the Produce path) takes only a few lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;error_code&lt;/span&gt; &lt;span class="n"&gt;ec&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// filled in by create_directories instead of throwing&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}/{}-{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;root_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;filesystem&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;create_directories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ec&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;   &lt;span class="c1"&gt;// auto-create directory tree&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}/00000000000000000000.log"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dir&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ofstream&lt;/span&gt; &lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ios&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;binary&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ios&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// append mode&lt;/span&gt;
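&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;reinterpret_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;  &lt;span class="c1"&gt;// write() takes const char*, so cast the byte buffer&lt;/span&gt;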
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;create_directories&lt;/code&gt; handles the first write to a new partition by building the directory tree automatically. &lt;code&gt;ios::app&lt;/code&gt; means every write appends to the end of the file, never overwriting existing data.&lt;/p&gt;

&lt;p&gt;Reading (for Fetch and metadata) is just as short: &lt;code&gt;ifstream&lt;/code&gt; open, &lt;code&gt;tellg&lt;/code&gt; for size, one shot into a &lt;code&gt;vector&amp;lt;uint8_t&amp;gt;&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ifstream&lt;/span&gt; &lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ios&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;binary&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ios&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ate&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;sz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tellg&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;uint8_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sz&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
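&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seekg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;ios&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;beg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// ios::ate left the cursor at EOF; rewind before reading&lt;/span&gt;
&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;reinterpret_cast&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="o"&gt;*&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt; &lt;span class="n"&gt;sz&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;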
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Real Kafka would never read entire files into memory like this. It memory-maps per-segment offset index files to find the byte position for a requested offset, then reads only that slice of the log and typically hands it to the socket with zero-copy &lt;code&gt;sendfile&lt;/code&gt;. But for an 1,800-line prototype, simple and direct is exactly the right call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Best Way to Understand Something Is to Build It
&lt;/h2&gt;

&lt;p&gt;That's TinyKafka: every layer from network protocol to disk storage, peeled open. In 1,800 lines it packs a full binary protocol stack, handlers for four APIs, KRaft metadata parsing, and disk-backed persistence. Think of it as a miniature Kafka anatomy model.&lt;/p&gt;

&lt;p&gt;I didn't build TinyKafka to create a production-grade broker. I built it to &lt;strong&gt;understand&lt;/strong&gt;. Kafka's documentation and source code are intimidatingly large, but once you've built a minimal version yourself, you realize the core skeleton isn't that complicated: accept binary requests over TCP, dispatch by api_key, route through metadata to the right disk file, read or write. Everything else (zero-copy, segmented indices, replica synchronization, ISR management, transactional support) is engineering built on top of that skeleton.&lt;/p&gt;

&lt;p&gt;There's a learning philosophy here: instead of letting half a million lines of source code intimidate you, spend a day building a minimal prototype. Afterwards, those massive codebases stop looking like alien artifacts. You recognize the bones.&lt;/p&gt;

&lt;p&gt;Code is on &lt;a href="https://github.com/Tenaryo/TinyKafka" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Stars, issues, and ruthless code review are all welcome.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>distributedsystems</category>
      <category>systemdesign</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building Claude Code from Scratch: A Minimal Agent in 393 Lines of C++</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Wed, 20 May 2026 04:38:16 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/building-claude-code-from-scratch-a-minimal-agent-in-393-lines-of-c-3bgi</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/building-claude-code-from-scratch-a-minimal-agent-in-393-lines-of-c-3bgi</guid>
      <description>&lt;p&gt;An AI coding assistant that reads your files, writes code, and runs shell commands. The core logic? A single while loop. I thought it was bullshit too, until I built one myself.&lt;/p&gt;

&lt;p&gt;The project is called MoonieCode, and the code lives here: &lt;a href="https://github.com/Tenaryo/MoonieCode" rel="noopener noreferrer"&gt;https://github.com/Tenaryo/MoonieCode&lt;/a&gt;. Written in C++23, clocking in at 393 lines of source (637 if you count tests). Here's what it looks like in action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;./moonie-code &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"list all .cpp files in the project"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few seconds later Claude spits back your file list. What just happened? You gave MoonieCode a sentence. It put that sentence into an HTTP request and shipped it off to a Claude Haiku model somewhere in the cloud. Claude decided it needed to run &lt;code&gt;find&lt;/code&gt;, MoonieCode ran it on Claude's behalf and fed the output back, and Claude formatted it into something human-readable.&lt;/p&gt;

&lt;p&gt;That first step wasn't running bash. First it had to talk to the LLM. So let's start there: how do you get C++ and Claude to shake hands?&lt;/p&gt;

&lt;h2&gt;
  
  
  Shaking Hands with Claude
&lt;/h2&gt;

&lt;p&gt;Talking to an LLM boils down to two moves: you HTTP POST a blob of JSON at it, and it sends a blob of JSON back. MoonieCode's &lt;code&gt;HttpClient&lt;/code&gt; is a 25-line class whose guts are basically this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;cpr&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Response&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cpr&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;cpr&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Url&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_url_&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"/chat/completions"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;cpr&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt;&lt;span class="s"&gt;"Authorization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Bearer "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;api_key_&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="n"&gt;cpr&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;request_body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;cpr&lt;/code&gt; is a C++ wrapper around libcurl that handles the HTTP plumbing so you don't have to. You stuff your API key into the &lt;code&gt;Authorization&lt;/code&gt; header, pack your JSON into the body, and POST to OpenRouter, an LLM API gateway that forwards the request to Claude for you.&lt;/p&gt;

&lt;p&gt;So what's in that JSON? Two things: &lt;code&gt;messages&lt;/code&gt; and &lt;code&gt;tools&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;messages&lt;/code&gt; is an array holding the conversation history between you and Claude. At the start it's just one entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"list all .cpp files in the project"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;tools&lt;/code&gt; is another array that tells Claude "here's what you have at your disposal." Each tool is a JSON object with a name, a description, and a parameter schema. Claude scans the list and goes, alright, I can ask this program to read files, write files, and run commands for me.&lt;/p&gt;

&lt;p&gt;After you fire off the request, Claude sends back a JSON response. And here's where it gets fun: Claude's response comes in exactly two flavors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flavor one, straight text.&lt;/strong&gt; You ask "what's 1+1" and it just answers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1+1 equals 2"&lt;/span&gt;&lt;span class="p"&gt;}}]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Flavor two, tool call.&lt;/strong&gt; You ask it to "list all cpp files" and it can't answer directly, so it asks for help:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"tool_calls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"call_abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;command&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;find . -name '*.cpp'&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}]}}]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's saying "I can't do this myself, but run this command for me and I'll take it from there." Notice that &lt;code&gt;arguments&lt;/code&gt; is a string that itself contains JSON: Claude packed a shell command inside it, so it has to be parsed a second time.&lt;/p&gt;

&lt;p&gt;Now the hard part: how does your code tell these two cases apart? If Claude gives you text, print it. If it wants a tool run, execute the tool. You need those two paths separated cleanly.&lt;/p&gt;

&lt;p&gt;MoonieCode solves this with a very C++ move:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;ParsedResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ContentResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ToolCall&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;std::variant&lt;/code&gt; works like a paranoid envelope: it contains either a letter (&lt;code&gt;ContentResult&lt;/code&gt;) or a toolbox (a list of &lt;code&gt;ToolCall&lt;/code&gt; objects), never both, never neither. And when you dispatch on it with &lt;code&gt;std::visit&lt;/code&gt;, the compiler makes sure you handle both cases. Omit one, and your build fails.&lt;/p&gt;

&lt;p&gt;Handling the variant means pairing it with &lt;code&gt;std::visit&lt;/code&gt; and a classic C++ pattern called &lt;code&gt;overloaded&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;template&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="nc"&gt;Ts&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;overloaded&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Ts&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;Ts&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;operator&lt;/span&gt;&lt;span class="p"&gt;()...;&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Those two lines of template code let you dispatch elegantly with lambdas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overloaded&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;ContentResult&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* Claude answered, print it */&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ToolCall&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;tcs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* Claude wants tools, run them */&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty of this pattern is type safety. You physically cannot write code that forgets to handle one of the two possibilities. The compiler will chase you down until every branch exists. People love to complain that C++ is verbose, but this flavor of compile-time guardrail is genuinely satisfying when you're building something that has to not crash.&lt;/p&gt;
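&lt;p&gt;To see the whole pattern compile in one piece, here's a minimal sketch. The fields on &lt;code&gt;ContentResult&lt;/code&gt; and &lt;code&gt;ToolCall&lt;/code&gt; and the &lt;code&gt;describe&lt;/code&gt; function are made up for illustration and are not MoonieCode's actual definitions. The deduction guide is only there so the snippet also builds as C++17:&lt;/p&gt;

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the project's types; the real fields may differ.
struct ContentResult { std::string text; };
struct ToolCall { std::string id; std::string name; std::string arguments; };

using ParsedResponse = std::variant<ContentResult, std::vector<ToolCall>>;

template <class... Ts>
struct overloaded : Ts... { using Ts::operator()...; };
// Deduction guide: needed in C++17, redundant from C++20 on.
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

// Turns whichever alternative is active into a one-line summary.
std::string describe(const ParsedResponse& parsed) {
    return std::visit(overloaded{
        [](const ContentResult& r) { return "text: " + r.text; },
        [](const std::vector<ToolCall>& tcs) {
            return "tool calls: " + std::to_string(tcs.size());
        },
    }, parsed);
}
```

&lt;p&gt;Delete either lambda and &lt;code&gt;std::visit&lt;/code&gt; stops compiling, which is exactly the guardrail described above.&lt;/p&gt;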

&lt;p&gt;Alright, your program now knows what Claude wants. Next question: if Claude asked for a tool, what happens?&lt;/p&gt;

&lt;h2&gt;
  
  
  The While Loop Is the Soul of the Agent
&lt;/h2&gt;

&lt;p&gt;Here's the entire agent loop in pseudocode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;push the user's prompt into messages
while (not done) {
    pack messages + tools into JSON
    POST to Claude
    parse Claude's response
    if (response is text) {
        print it, we're done
    } else if (response is tool calls) {
        append Claude's tool call records to messages
        for (each tool call) {
            execute it locally
            append the result to messages
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No black magic, no secret sauce. Peel back the marketing and you find a while loop wrapping a four-step cycle: ask the LLM, see what it wants, if it answered you're done, if it asked for a tool you run it and ask again.&lt;/p&gt;

&lt;p&gt;One detail that's easy to overlook: that &lt;code&gt;messages&lt;/code&gt; array keeps growing. The "conversation history" with Claude isn't wiped between rounds, it just piles up layer by layer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Starts with one &lt;code&gt;role: "user"&lt;/code&gt; message&lt;/li&gt;
&lt;li&gt;Claude says "run this command," so you append a &lt;code&gt;role: "assistant"&lt;/code&gt; message with &lt;code&gt;tool_calls&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Command finishes, you append a &lt;code&gt;role: "tool"&lt;/code&gt; message with the output&lt;/li&gt;
&lt;li&gt;Next request carries the entire history, so Claude sees "last time I told you to run this, the result was this, now I will..."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the agent's "memory." No vector database, no fancy RAG pipeline, just &lt;code&gt;push_back&lt;/code&gt; on a JSON array. Claude reads the full history and naturally chains multi-step reasoning.&lt;/p&gt;

&lt;p&gt;What about stopping? MoonieCode has &lt;code&gt;maxIterations = 30&lt;/code&gt;. If Claude goes 30 rounds of tool calls without giving a final answer, the program pulls the plug. It's a safety fuse that keeps the agent from spinning its wheels forever.&lt;/p&gt;
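&lt;p&gt;The loop and its safety fuse can be sketched in plain C++ with the network call stubbed out. Everything here is a simplified assumption: &lt;code&gt;Message&lt;/code&gt;, &lt;code&gt;ToolCall&lt;/code&gt;, &lt;code&gt;Reply&lt;/code&gt;, &lt;code&gt;run_agent&lt;/code&gt; and the two callbacks are hypothetical names, and real messages are JSON objects rather than a pair of strings.&lt;/p&gt;

```cpp
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Message { std::string role; std::string content; };
struct ToolCall { std::string name; std::string arguments; };
using Reply = std::variant<std::string, std::vector<ToolCall>>;

// `ask` stands in for "pack messages into JSON, POST, parse the response".
// `run_tool` stands in for executing one tool call locally.
std::string run_agent(std::vector<Message>& messages,
                      const std::function<Reply(const std::vector<Message>&)>& ask,
                      const std::function<std::string(const ToolCall&)>& run_tool,
                      int max_iterations = 30) {
    for (int i = 0; i < max_iterations; ++i) {
        Reply reply = ask(messages);
        if (const auto* text = std::get_if<std::string>(&reply)) {
            return *text;  // plain text means a final answer: we're done
        }
        const auto& calls = std::get<std::vector<ToolCall>>(reply);

        // First record what the model asked for...
        std::string requested;
        for (const auto& call : calls) requested += call.name + " ";
        messages.push_back({"assistant", requested});

        // ...then what actually happened when each tool ran.
        for (const auto& call : calls) {
            messages.push_back({"tool", run_tool(call)});
        }
    }
    return "[stopped: iteration limit reached]";
}
```

&lt;p&gt;Passing the model in as a callback is just a convenience for the sketch: it lets you drive the loop with a scripted fake and watch &lt;code&gt;messages&lt;/code&gt; grow without touching the network.&lt;/p&gt;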

&lt;p&gt;Of course, the real Claude Code is a different beast. Public information suggests its repo weighs in at over half a million lines of TypeScript. It doesn't use a crude 30-iteration cap, it runs a dynamic token budget system. It dispatches sub-agents to handle different tasks in parallel. It asks for confirmation before doing anything dangerous. It supports checkpointing so you can roll back when things explode. It speaks MCP to plug into external data sources. MoonieCode is roughly three orders of magnitude away from the real thing.&lt;/p&gt;

&lt;p&gt;And yet. No matter how many layers of engineering get piled on top, the skeleton underneath is the same loop: ask the LLM, check what it wants, execute on its behalf, feed the result back in. That's what MoonieCode strips bare and shows you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Doing Claude's Dirty Work
&lt;/h2&gt;

&lt;p&gt;Claude says "I want to run &lt;code&gt;find&lt;/code&gt;." That intent arrives as a JSON blob. Who turns it into an actual system call? &lt;code&gt;ToolExecutor&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;MoonieCode gives Claude three weapons: Read, Write, and Bash. When a tool call comes in, &lt;code&gt;ToolExecutor::execute&lt;/code&gt; checks the &lt;code&gt;name&lt;/code&gt; field and routes it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;ToolExecutor&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;ToolCall&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"Read"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handle_read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"Write"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handle_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"Bash"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;handle_bash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;runtime_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unknown tool: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. A plain if-else chain mapping an LLM's "intent" to local C++ functions. No reflection. No plugin registry. No factory pattern. A 393-line project doesn't need design patterns.&lt;/p&gt;

&lt;p&gt;Of the three tools, Bash is the star because it hands Claude the nuclear launch codes: it can run literally any command. Read and Write could technically be emulated with Bash (read with &lt;code&gt;cat&lt;/code&gt;, write with &lt;code&gt;tee&lt;/code&gt;), but they got their own tools because file I/O is so frequent that channeling it all through a shell would be wasteful and error-prone.&lt;/p&gt;

&lt;p&gt;Here's what's inside Bash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;ToolExecutor&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;handle_bash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;nlohmann&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;full_cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;command&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;" 2&amp;gt;&amp;amp;1"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// capture stderr too&lt;/span&gt;

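    &lt;span class="kt"&gt;FILE&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;popen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c_str&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s"&gt;"r"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;runtime_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"popen failed"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// popen returns NULL on failure; never fread from it&lt;/span&gt;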
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;char&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;{};&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;bytes_read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;bytes_read&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;bytes_read&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pclose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;exit_code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;WIFEXITED&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="n"&gt;WEXITSTATUS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[exit code: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;to_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exit_code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"]"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull the &lt;code&gt;command&lt;/code&gt; field out of the JSON, tack on &lt;code&gt;2&amp;gt;&amp;amp;1&lt;/code&gt; to swallow stderr too, &lt;code&gt;popen&lt;/code&gt; it, loop &lt;code&gt;fread&lt;/code&gt; until the pipe runs dry, &lt;code&gt;pclose&lt;/code&gt; to clean up and grab the exit code, then mash stdout, stderr, and exit code into one string and toss it back.&lt;/p&gt;

&lt;p&gt;Where does that string go? Right back into the &lt;code&gt;messages&lt;/code&gt; array, wearing the &lt;code&gt;role: "tool"&lt;/code&gt; badge. Next time Claude gets a request, it reads that message and knows exactly what happened when the command ran. Loop this, and Claude starts to feel like a pilot in a cockpit: the dashboard (&lt;code&gt;messages&lt;/code&gt;) shows current state, the joystick (tools) lets it take action.&lt;/p&gt;

&lt;p&gt;Read and Write follow the exact same formula: yank parameters from JSON, do local I/O, return a result string. Read uses &lt;code&gt;ifstream&lt;/code&gt; to slurp files whole. Write uses &lt;code&gt;ofstream&lt;/code&gt; and auto-creates parent directories with &lt;code&gt;create_directories&lt;/code&gt;. So clean there's not much else to say.&lt;/p&gt;
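&lt;p&gt;Here's roughly what that formula looks like, as a sketch that takes plain paths and strings instead of JSON arguments. &lt;code&gt;write_file&lt;/code&gt; and &lt;code&gt;read_file&lt;/code&gt; are hypothetical names, and the exact result strings are invented for the example:&lt;/p&gt;

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helpers; the project's handlers take JSON arguments instead.
std::string write_file(const std::filesystem::path& path, const std::string& content) {
    if (path.has_parent_path()) {
        // Creates every missing directory on the way; does nothing if they exist.
        std::filesystem::create_directories(path.parent_path());
    }
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return "Error: cannot open " + path.string() + " for writing";
    out << content;
    return "Wrote " + std::to_string(content.size()) + " bytes to " + path.string();
}

std::string read_file(const std::filesystem::path& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return "Error: cannot open " + path.string();
    std::ostringstream buffer;
    buffer << in.rdbuf();  // slurp the whole file in one go
    return buffer.str();
}
```

&lt;p&gt;Returning errors as strings instead of throwing is a deliberate choice in this sketch: whatever comes back gets fed to the model as the tool result, so a readable error message gives it a chance to correct course.&lt;/p&gt;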

&lt;h2&gt;
  
  
  What 393 Lines Actually Mean
&lt;/h2&gt;

&lt;p&gt;The real Claude Code is reportedly over half a million lines of TypeScript. It has sub-agent dispatching, permission gatekeeping, checkpoint rollback, MCP integration, multi-model routing, context window compression, and a long list of features you won't find anywhere in MoonieCode. In terms of capabilities, MoonieCode isn't even a rounding error.&lt;/p&gt;

&lt;p&gt;But here's the counterintuitive part: no matter how much engineering gets layered on, the agent loop at the center is the same one. Ask the LLM, receive tool calls, execute locally, feed results back. Those four steps are this space's Newton's laws. Everything else is engineering.&lt;/p&gt;

&lt;p&gt;MoonieCode's 393 lines don't have the right to be compared to Claude Code on features. But they do one thing well: they strip the agent skeleton down to the bone, rip off every layer of engineering skin, and let you stare directly at the heartbeat of an AI coding assistant. Once you've internalized those 393 lines, every AI coding tool you encounter will auto-decompile in your head into "okay, the permissions system is on top, sub-agent scheduling underneath, and at the very bottom... still a while loop."&lt;/p&gt;

</description>
      <category>agents</category>
      <category>claude</category>
      <category>cpp</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Building BitTorrent from Scratch: What 2500 Lines of Modern C++ Can Do</title>
      <dc:creator>Tyler Tan</dc:creator>
      <pubDate>Mon, 18 May 2026 15:51:50 +0000</pubDate>
      <link>https://dev.to/tyler_tan_13b1f742020d35a/building-bittorrent-from-scratch-what-2500-lines-of-modern-c-can-do-3hhn</link>
      <guid>https://dev.to/tyler_tan_13b1f742020d35a/building-bittorrent-from-scratch-what-2500-lines-of-modern-c-can-do-3hhn</guid>
      <description>&lt;p&gt;A working BitTorrent downloader — from raw TCP sockets to SHA-1 hashing, all written by hand.&lt;/p&gt;

&lt;p&gt;This project starts at the socket level: I wrote my own SHA-1, hand-rolled HTTP requests, implemented bencoding from scratch, defined all seven peer wire protocol message types one by one, and finally spawned multiple peer connections with std::jthread for parallel downloading. It supports both .torrent files and magnet links, and comes with 83 unit tests. Apart from a JSON formatting library and the test framework, it has zero external dependencies.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Tenaryo/TinyBitTorrent" rel="noopener noreferrer"&gt;TinyBitTorrent&lt;/a&gt;, built with C++23.&lt;/p&gt;

&lt;h2&gt;
  
  
  What BitTorrent Is
&lt;/h2&gt;

&lt;p&gt;Before diving into the implementation, let's take a minute to understand what BitTorrent actually does.&lt;/p&gt;

&lt;p&gt;The traditional file download model is straightforward: you click a link, your browser sends an HTTP request, and the server pushes the file to you. The bottleneck is equally straightforward — all the bandwidth pressure sits on a single server. More users means slower speeds, and if the server goes down, the file is gone.&lt;/p&gt;

&lt;p&gt;BitTorrent turns this model on its head by making every downloader an uploader at the same time. A file is split into many small chunks called pieces, each with its own SHA-1 hash. Instead of downloading all pieces from one central server, you grab a few from each of dozens — or hundreds — of peers who are also downloading, or have already finished. Meanwhile, the pieces you already have can be uploaded to other peers. Paradoxically, the more people participate, the faster the entire distribution network becomes.&lt;/p&gt;

&lt;p&gt;To implement this protocol, the first problem to solve is: how do you encode and transmit data and metadata? BitTorrent uses a format called bencoding — simple, compact, and unambiguous. Let's start there.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bencoding: BitTorrent's JSON
&lt;/h2&gt;

&lt;p&gt;Bencoding is BitTorrent's native serialization format. You can think of it as JSON's binary cousin. Where JSON uses curly braces and square brackets to mark structure, bencoding uses type prefixes and length prefixes. There are only four types.&lt;/p&gt;

&lt;p&gt;The first is the string, formatted as &lt;code&gt;length:content&lt;/code&gt;. For example, &lt;code&gt;4:spam&lt;/code&gt; means the string "spam", and &lt;code&gt;11:hello world&lt;/code&gt; means "hello world". The number before the colon must be a decimal integer with no leading zeros.&lt;/p&gt;

&lt;p&gt;The second type is the integer, wrapped in &lt;code&gt;i&lt;/code&gt; and &lt;code&gt;e&lt;/code&gt;. So &lt;code&gt;i42e&lt;/code&gt; is 42, and &lt;code&gt;i-3e&lt;/code&gt; is -3. Leading zeros are forbidden, and &lt;code&gt;i-0e&lt;/code&gt; is not allowed either.&lt;/p&gt;

&lt;p&gt;The third type is the list, wrapped in &lt;code&gt;l&lt;/code&gt; and &lt;code&gt;e&lt;/code&gt;, containing any number of bencoded values. For instance, &lt;code&gt;l4:spami42ee&lt;/code&gt; is a list with the string "spam" and the integer 42. Lists can nest other lists and dictionaries.&lt;/p&gt;

&lt;p&gt;The fourth type is the dictionary, wrapped in &lt;code&gt;d&lt;/code&gt; and &lt;code&gt;e&lt;/code&gt;, with keys and values alternating. Keys must be strings; values can be any type. Something like &lt;code&gt;d3:foo3:bar4:infod6:lengthi1024eee&lt;/code&gt; represents &lt;code&gt;{"foo": "bar", "info": {"length": 1024}}&lt;/code&gt;. Dictionary keys must appear in sorted order when encoding, compared as raw byte strings rather than alphanumerically; the protocol explicitly requires this.&lt;/p&gt;
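&lt;p&gt;The four encoding rules fit in a few lines. This is a minimal encoder sketch, separate from the project's parser: the function names are hypothetical, and lists and dictionaries take children that are already encoded so the example stays free of recursive types.&lt;/p&gt;

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative helper names, not taken from the project.
std::string encode_string(const std::string& s) {
    return std::to_string(s.size()) + ":" + s;  // decimal length, colon, raw bytes
}

std::string encode_int(std::int64_t n) {
    return "i" + std::to_string(n) + "e";  // to_string never emits leading zeros or "-0"
}

// Children arrive already encoded, so a list is just concatenation between 'l' and 'e'.
std::string encode_list(const std::vector<std::string>& encoded_items) {
    std::string out = "l";
    for (const auto& item : encoded_items) out += item;
    return out + "e";
}

// Keys are plain strings, values are already encoded.
// std::map iterates in ascending key order, which gives the required sorting.
std::string encode_dict(const std::map<std::string, std::string>& encoded_values) {
    std::string out = "d";
    for (const auto& [key, value] : encoded_values) {
        out += encode_string(key) + value;
    }
    return out + "e";
}
```

&lt;p&gt;Because &lt;code&gt;std::map&lt;/code&gt; iterates its keys in ascending order, the dictionary comes out sorted no matter what order the entries were inserted in.&lt;/p&gt;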

&lt;p&gt;At this point you might wonder — why not just use JSON? Two reasons. First, JSON can't directly represent binary data like SHA-1 hashes without Base64 encoding, which is costly. Second, bencoding is extremely simple to parse — no quote escaping, no Unicode handling, none of the complexity a JSON parser has to deal with. For BitTorrent in 2001, a format with zero library dependencies was the right call.&lt;/p&gt;

&lt;p&gt;My implementation uses std::variant as the data model. Strings and integers are plain aliases, lists and dictionaries are small structs, and all four are wrapped together in a variant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;int64_t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Forward declarations so that Value can be named before List and Dict are defined&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;items_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Dict&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;items_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's an interesting circular dependency here: the definition of Value uses List and Dict, and both List and Dict contain Value. Forward-declaring List and Dict breaks the cycle at the naming level: Value can be declared as an alias while the two structs are still incomplete, because an alias by itself does not instantiate the variant. std::vector has officially supported incomplete element types since C++17, and by the time a Value is actually created both structs are complete, so mainstream compilers accept the recursive pattern in practice. It's the cleanest way to write it, so that's what I went with.&lt;/p&gt;

&lt;p&gt;The parser is a recursive descent design that takes a mutable string_view reference and dispatches on the first character:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string_view&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Value&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="sc"&gt;'0'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="sc"&gt;'9'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;likely&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;colon&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;':'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="cm"&gt;/* parse int from data[0..colon) */&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remove_prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;colon&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;)};&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remove_prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'i'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;unlikely&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* parse integer... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'l'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;unlikely&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* parse list... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sc"&gt;'d'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;unlikely&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* parse dict... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the first character is a digit, we enter the string branch (the most common case, marked &lt;code&gt;[[likely]]&lt;/code&gt;); &lt;code&gt;i&lt;/code&gt; means integer, and so on. The encoder runs in reverse, using std::format_to to assemble the prefix strings, with dict keys sorted via std::ranges::sort before encoding.&lt;/p&gt;

&lt;h2&gt;
  
  
  .torrent Files: the Download Shopping List
&lt;/h2&gt;

&lt;p&gt;With bencoding in place, parsing .torrent files is the natural next step. So why do we even need a torrent file? The answer is simple: to download something, you have to know what it is, how big it is, and where to find people who have it. A .torrent file is exactly that shopping list — it tells you the file size, how many pieces it's split into, the hash of each piece, and the tracker URL for finding peers.&lt;/p&gt;

&lt;p&gt;A .torrent file is essentially a single bencoded dictionary. At the top level there are two critical keys: &lt;code&gt;announce&lt;/code&gt;, which is the tracker URL, and &lt;code&gt;info&lt;/code&gt;, a sub-dictionary containing everything directly related to the download — &lt;code&gt;length&lt;/code&gt; (total file size in bytes), &lt;code&gt;piece length&lt;/code&gt; (the size of each piece, typically 256 KB to 1 MB), and &lt;code&gt;pieces&lt;/code&gt; (a long string of all 20-byte SHA-1 hashes concatenated together).&lt;/p&gt;

&lt;p&gt;My Metainfo struct stores these values in five fields, including the computed info_hash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Metainfo&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;announce_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;              &lt;span class="c1"&gt;// tracker URL&lt;/span&gt;
    &lt;span class="kt"&gt;int64_t&lt;/span&gt; &lt;span class="n"&gt;length_&lt;/span&gt;&lt;span class="p"&gt;{};&lt;/span&gt;                  &lt;span class="c1"&gt;// total file size&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;info_hash_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;// 20-byte raw SHA1&lt;/span&gt;
    &lt;span class="kt"&gt;int64_t&lt;/span&gt; &lt;span class="n"&gt;piece_length_&lt;/span&gt;&lt;span class="p"&gt;{};&lt;/span&gt;            &lt;span class="c1"&gt;// size of each piece&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;piece_hashes_&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// hex hashes per piece&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Parsing is a two-level iteration. First pass over the top-level dict grabs announce and info; second pass over the info sub-dict extracts length, piece length, and pieces. The pieces field needs a bit of special handling — the raw data is every 20-byte SHA-1 hash concatenated end-to-end. I slice it into 20-byte chunks and convert each one into a 40-character hex string for storage.&lt;/p&gt;
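
&lt;p&gt;The slicing of the pieces field can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: &lt;code&gt;to_hex&lt;/code&gt; and &lt;code&gt;split_piece_hashes&lt;/code&gt; are hypothetical names, and rejecting input whose length is not a multiple of 20 is an assumption about how malformed data should be handled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;stdexcept&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;string_view&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical helper: convert raw bytes to lowercase hex.
std::string to_hex(std::string_view raw) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(raw.size() * 2);
    for (unsigned char byte : raw) {
        out.push_back(digits[byte &amp;gt;&amp;gt; 4]);
        out.push_back(digits[byte &amp;amp; 0x0F]);
    }
    return out;
}

// Hypothetical helper: slice the concatenated `pieces` string into
// one 40-character hex string per 20-byte SHA-1 digest.
std::vector&amp;lt;std::string&amp;gt; split_piece_hashes(std::string_view pieces) {
    constexpr std::size_t kHashSize = 20;
    if (pieces.size() % kHashSize != 0) {
        throw std::invalid_argument("pieces length is not a multiple of 20");
    }
    std::vector&amp;lt;std::string&amp;gt; hashes;
    hashes.reserve(pieces.size() / kHashSize);
    for (std::size_t off = 0; off &amp;lt; pieces.size(); off += kHashSize) {
        hashes.push_back(to_hex(pieces.substr(off, kHashSize)));
    }
    return hashes;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Storing hex makes the hashes easy to print and compare as text; keeping the raw 20 bytes would work just as well and halve the memory.&lt;/p&gt;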

&lt;p&gt;The most noteworthy step is computing the info_hash. This isn't just any hash — you re-bencode the entire info dictionary, then compute SHA-1 over the encoded result. Think of it as taking a "fingerprint" of the info dict. Everything downstream — tracker requests, peer handshakes — identifies the file by this fingerprint. The info_hash is the file's universal identity card in the BitTorrent world.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;util&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Sha1&lt;/span&gt; &lt;span class="n"&gt;hasher&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="n"&gt;hasher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bencode&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="n"&gt;info_hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hasher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;finalize&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a side note, there's also a from_info_dict function that reconstructs a Metainfo from an info dictionary obtained through the ut_metadata extension protocol. This comes into play with magnet link downloads, which I'll cover later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finding Peers, Shaking Hands, Downloading
&lt;/h2&gt;

&lt;p&gt;Once you have the metadata from a .torrent file, the first order of business is finding who has your data. That's the tracker's job.&lt;/p&gt;

&lt;p&gt;A tracker is essentially an HTTP service. You send it your info_hash and your peer_id (a 20-byte random string that identifies you), and it returns a list of peers currently downloading or seeding that file. I construct an HTTP GET request with the info_hash, peer_id, port number, and download progress as URL parameters, appending &lt;code&gt;compact=1&lt;/code&gt; at the end. compact=1 means "give me the peer list in compact form" — 6 bytes per peer: 4 for the IP address and 2 for the port. This keeps the tracker response tiny; even dozens of peers fit in a few hundred bytes. After parsing the response, I split the peers field into 6-byte chunks, extract the IP and port from each, and the peer list is ready.&lt;/p&gt;
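
&lt;p&gt;Decoding the compact peer list might look something like the sketch below. &lt;code&gt;PeerAddr&lt;/code&gt; and &lt;code&gt;parse_compact_peers&lt;/code&gt; are hypothetical names chosen for illustration (the &lt;code&gt;ip_&lt;/code&gt; and &lt;code&gt;port_&lt;/code&gt; members mirror the ones used later in the article), and it assumes IPv4 entries with the port in network byte order.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;stdexcept&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;string_view&amp;gt;
#include &amp;lt;utility&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical peer record.
struct PeerAddr {
    std::string ip_;
    std::uint16_t port_{};
};

// Split the compact `peers` string into 6-byte entries:
// 4 bytes of IPv4 address followed by a 2-byte big-endian port.
std::vector&amp;lt;PeerAddr&amp;gt; parse_compact_peers(std::string_view peers) {
    constexpr std::size_t kEntry = 6;
    if (peers.size() % kEntry != 0) {
        throw std::invalid_argument("peers length is not a multiple of 6");
    }
    std::vector&amp;lt;PeerAddr&amp;gt; out;
    out.reserve(peers.size() / kEntry);
    for (std::size_t off = 0; off &amp;lt; peers.size(); off += kEntry) {
        // Read one byte of the current entry as an unsigned value.
        auto byte = [&amp;amp;](std::size_t i) {
            return static_cast&amp;lt;unsigned&amp;gt;(
                static_cast&amp;lt;unsigned char&amp;gt;(peers[off + i]));
        };
        std::string ip = std::to_string(byte(0)) + '.' + std::to_string(byte(1)) +
                         '.' + std::to_string(byte(2)) + '.' + std::to_string(byte(3));
        auto port = static_cast&amp;lt;std::uint16_t&amp;gt;((byte(4) &amp;lt;&amp;lt; 8) | byte(5));
        out.push_back(PeerAddr{std::move(ip), port});
    }
    return out;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Going through &lt;code&gt;unsigned char&lt;/code&gt; matters here: on platforms where &lt;code&gt;char&lt;/code&gt; is signed, any byte above 127 would otherwise turn into a negative number.&lt;/p&gt;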

&lt;p&gt;With a peer's IP and port in hand, the next step is to open a TCP connection and perform the BitTorrent handshake. The handshake packet is a neat 68 bytes, each segment with a clear purpose. Byte 1 is the protocol string length (always 19). The next 19 bytes are "BitTorrent protocol". Then 8 reserved bytes (where bit 4, mask 0x10, of byte 26 signals extension protocol support if set; counting from zero, that is index 25). Then 20 bytes of info_hash. Finally, 20 bytes of peer_id. The peer responds with an identically formatted packet; I verify the protocol string and info_hash match, and the handshake is done. The code for this is shorter than the description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="nf"&gt;make_handshake&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string_view&lt;/span&gt; &lt;span class="n"&gt;info_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;string_view&lt;/span&gt; &lt;span class="n"&gt;peer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;reserve_extensions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;string&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;68&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sc"&gt;'\0'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"BitTorrent protocol"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;reserve_extensions&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="sc"&gt;'\x10'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="sc"&gt;'\x00'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info_hash&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;peer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
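
&lt;p&gt;The receiving side of the handshake, checking the peer's reply, could be sketched like this. &lt;code&gt;handshake_matches&lt;/code&gt; and &lt;code&gt;peer_supports_extensions&lt;/code&gt; are hypothetical helpers, and the offsets simply follow the 68-byte layout described above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;string_view&amp;gt;

// Hypothetical check of a peer's 68-byte handshake reply:
// length byte, protocol string, then info_hash at offset 28.
bool handshake_matches(std::string_view reply, std::string_view info_hash) {
    constexpr std::string_view kProtocol = "BitTorrent protocol";
    if (reply.size() != 68 || info_hash.size() != 20) return false;
    if (static_cast&amp;lt;unsigned char&amp;gt;(reply[0]) != 19) return false;
    if (reply.substr(1, 19) != kProtocol) return false;
    return reply.substr(28, 20) == info_hash;
}

// Extension support is signalled by mask 0x10 at offset 25.
bool peer_supports_extensions(std::string_view reply) {
    return reply.size() == 68 &amp;amp;&amp;amp;
           (static_cast&amp;lt;unsigned char&amp;gt;(reply[25]) &amp;amp; 0x10) != 0;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The peer_id in the last 20 bytes is deliberately not checked here, since a tracker's compact response does not carry peer IDs to compare against.&lt;/p&gt;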



&lt;p&gt;After the handshake, both sides enter a simple state machine. The peer first sends a bitfield message — a bitmap where each bit indicates whether the peer has the corresponding piece. After inspecting the bitfield, I send an interested message, essentially saying "I'd like to download from you." Then I wait for an unchoke message. Only after receiving it am I officially granted permission to request data. Choke and unchoke form BitTorrent's flow control mechanism; a peer can choke you at any time to deny transfers, though in practice most peers unchoke right after receiving an interested message.&lt;/p&gt;

&lt;p&gt;The logic for actually downloading a piece is the most interesting part of the entire project. A piece can be several megabytes; you can't just request it all at once — that would be slow, and the retransmission cost after packet loss would be punishing. BitTorrent's approach is to split each piece into 16 KB blocks, sending a separate Request message for each block with the piece index, the block's offset within the piece, and its length. But waiting for one block to arrive before requesting the next wastes network bandwidth. The better approach is pipelining: keep up to 5 requests in flight at all times. Whenever a Piece message arrives, I copy the data into the piece buffer at the correct offset and immediately send a new Request to fill the freed slot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Fill the pipeline: send up to 5 block requests at once&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;send_idx&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;total_blocks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;piece_index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;send_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;begin_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;blocks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;send_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;length_&lt;/span&gt;&lt;span class="p"&gt;}));&lt;/span&gt;
    &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;send_idx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Event loop: receive Pieces, fill buffer, replenish requests&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;total_blocks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;recv_message&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;visit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Overloaded&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;Piece&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;pce&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pce&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;piece_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;pce&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;begin_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;received&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;send_idx&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;total_blocks&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* send next request */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;[](&lt;/span&gt;&lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="k"&gt;auto&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;  &lt;span class="c1"&gt;// ignore other message types&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// After all blocks arrive, verify SHA-1&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sha1_hex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;piece_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;expected_hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="p"&gt;...;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once all blocks are in, I run SHA-1 over the assembled piece and compare it against the hash recorded in the .torrent file. A match means the piece is good. A mismatch means something went wrong in transit or the peer gave us bad data, so we throw an exception.&lt;/p&gt;
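
&lt;p&gt;The block table that the request loop walks over can be derived from the piece size alone. The sketch below is one way to build it; &lt;code&gt;BlockSpan&lt;/code&gt;, &lt;code&gt;make_blocks&lt;/code&gt;, and &lt;code&gt;piece_size_at&lt;/code&gt; are hypothetical names, with &lt;code&gt;begin_&lt;/code&gt; and &lt;code&gt;length_&lt;/code&gt; chosen to match the members used in the loop above. It assumes a valid piece index and a single-file torrent.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;algorithm&amp;gt;
#include &amp;lt;cstdint&amp;gt;
#include &amp;lt;vector&amp;gt;

// Hypothetical block descriptor: offset within the piece and length.
struct BlockSpan {
    std::int64_t begin_{};
    std::int64_t length_{};
};

// Real size of piece `index`; the final piece is usually shorter
// than the nominal piece length.
std::int64_t piece_size_at(std::int64_t total_length, std::int64_t piece_length,
                           std::int64_t index) {
    return std::min(piece_length, total_length - index * piece_length);
}

// Cut one piece into 16 KB blocks; the last block takes the remainder.
std::vector&amp;lt;BlockSpan&amp;gt; make_blocks(std::int64_t piece_size) {
    constexpr std::int64_t kBlock = 16 * 1024;
    std::vector&amp;lt;BlockSpan&amp;gt; blocks;
    for (std::int64_t off = 0; off &amp;lt; piece_size; off += kBlock) {
        blocks.push_back(BlockSpan{off, std::min(kBlock, piece_size - off)});
    }
    return blocks;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Getting the short final block right matters: requesting a full 16 KB past the end of the last piece asks the peer for data that does not exist.&lt;/p&gt;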

&lt;h2&gt;
  
  
  Multithreading: Full-Speed Download
&lt;/h2&gt;

&lt;p&gt;If you can download one piece, you can download the entire file. The logic connecting these two concepts is surprisingly straightforward.&lt;/p&gt;

&lt;p&gt;First, I set the output file to its final size with ftruncate. Think of this as "reserving your spot": the file has its full length from the start, and each piece's data just gets written to its correct offset with pwrite. (On most filesystems ftruncate produces a sparse file, so disk blocks are only allocated as pieces are written; posix_fallocate is the call that actually reserves the space.) No need to accumulate a file-sized buffer in memory.&lt;/p&gt;

&lt;p&gt;Then comes the multithreading. I spawn one std::jthread worker per peer, each responsible for a contiguous range of pieces. Within a thread, a single TCP connection is established and reused for all pieces in that worker's range (saving handshake overhead). Across threads, everything runs in parallel, each talking to a different peer. The core logic is clean enough to fit in a handful of lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pre-allocate the file&lt;/span&gt;
&lt;span class="n"&gt;ftruncate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metainfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;length_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;num_workers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;emplace_back&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;peer_idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;establish_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;peers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;peer_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;ip_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;peers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;peer_idx&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;port_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...);&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;download_piece_on_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metainfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;pwrite&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                   &lt;span class="n"&gt;pi&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;metainfo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;piece_length_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Error handling is taken care of too. If any worker throws, I capture the first exception with std::exception_ptr behind a mutex, and rethrow it after all threads have joined. This ensures a single failure doesn't crash the whole process before other threads have a chance to clean up their resources.&lt;/p&gt;
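
&lt;p&gt;Put together, the range split and the first-exception capture might look roughly like this. It is a sketch under assumptions, not the project's code: &lt;code&gt;piece_range&lt;/code&gt; and &lt;code&gt;run_workers&lt;/code&gt; are hypothetical names, the even split with the remainder spread over the first workers is just one reasonable policy, and plain &lt;code&gt;std::thread&lt;/code&gt; with explicit joins stands in for &lt;code&gt;std::jthread&lt;/code&gt; so that it builds as C++17.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;algorithm&amp;gt;
#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;exception&amp;gt;
#include &amp;lt;functional&amp;gt;
#include &amp;lt;mutex&amp;gt;
#include &amp;lt;thread&amp;gt;
#include &amp;lt;utility&amp;gt;
#include &amp;lt;vector&amp;gt;

// Contiguous [start, end) piece range for worker `i` of `num_workers`.
// The first (total % num_workers) workers get one extra piece.
std::pair&amp;lt;std::size_t, std::size_t&amp;gt; piece_range(std::size_t total_pieces,
                                                std::size_t num_workers,
                                                std::size_t i) {
    const std::size_t base = total_pieces / num_workers;
    const std::size_t extra = total_pieces % num_workers;
    const std::size_t start = i * base + std::min(i, extra);
    const std::size_t len = base + (i &amp;lt; extra ? 1 : 0);
    return {start, start + len};
}

// Run job(start, end) on one thread per worker. The first exception
// is stored behind a mutex and rethrown only after every thread joined.
void run_workers(std::size_t total_pieces, std::size_t num_workers,
                 const std::function&amp;lt;void(std::size_t, std::size_t)&amp;gt;&amp;amp; job) {
    std::mutex err_mutex;
    std::exception_ptr first_error;
    std::vector&amp;lt;std::thread&amp;gt; workers;
    workers.reserve(num_workers);
    for (std::size_t i = 0; i &amp;lt; num_workers; ++i) {
        const auto range = piece_range(total_pieces, num_workers, i);
        workers.emplace_back([&amp;amp;job, &amp;amp;err_mutex, &amp;amp;first_error, range] {
            try {
                job(range.first, range.second);
            } catch (...) {
                std::lock_guard&amp;lt;std::mutex&amp;gt; lock(err_mutex);
                if (!first_error) first_error = std::current_exception();
            }
        });
    }
    for (auto&amp;amp; worker : workers) worker.join();
    if (first_error) std::rethrow_exception(first_error);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Catching inside the thread body is the important part: an exception that escapes a thread's entry function calls std::terminate, so it has to be parked somewhere and rethrown from the thread that joins.&lt;/p&gt;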

&lt;h2&gt;
  
  
  Magnet Links: Throw Away the Torrent File
&lt;/h2&gt;

&lt;p&gt;The .torrent file path is done, but there's another — arguably more common — way to start a download: magnet links. You've definitely seen something like this: &lt;code&gt;magnet:?xt=urn:btih:abc123...&amp;amp;dn=filename&amp;amp;tr=tracker_url&lt;/code&gt;. At its core, it's just a URL embedding the file's info_hash, a suggested display name, and one or more tracker addresses.&lt;/p&gt;

&lt;p&gt;Why magnet links? A .torrent file may be small, but it's still a file — you have to get it from a website, a forum, or some other channel first. A magnet link is just a string. Sharing a link is infinitely more convenient than sharing a file. For the BitTorrent network, magnet links are also more decentralized: even if every torrent index site goes down, as long as someone is still seeding, pasting a link is enough to start downloading.&lt;/p&gt;

&lt;p&gt;The full magnet download flow adds one critical step compared to the .torrent path: since you don't have a torrent file, you have no idea how big the file is or what its piece hashes are — you have to ask a peer for this information. The rough flow goes like this: parse the magnet link to extract the info_hash and tracker URL, query the tracker for a peer list, establish a TCP connection and perform the base handshake, then use the extension protocol to request the info dictionary from the peer. Once the info_dict passes verification, the rest is exactly the same as the .torrent path — download all the pieces as usual.&lt;/p&gt;

&lt;p&gt;Parsing the magnet link itself is straightforward string processing: check for the &lt;code&gt;magnet:?&lt;/code&gt; prefix, find the 40-character hex hash after &lt;code&gt;xt=urn:btih:&lt;/code&gt; and convert it to 20 raw bytes, locate the tracker URL after &lt;code&gt;tr=&lt;/code&gt; and URL-decode it. Compared to bencoding, this is about as hard as drinking a glass of water.&lt;/p&gt;
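
&lt;p&gt;As a rough illustration of the hash extraction, here is a minimal sketch. &lt;code&gt;hex_value&lt;/code&gt; and &lt;code&gt;info_hash_from_magnet&lt;/code&gt; are hypothetical names; the sketch only accepts the 40-character hex form and leaves out the &lt;code&gt;tr=&lt;/code&gt; lookup and URL-decoding.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;#include &amp;lt;cstddef&amp;gt;
#include &amp;lt;optional&amp;gt;
#include &amp;lt;string&amp;gt;
#include &amp;lt;string_view&amp;gt;

// Value of one hex digit, or -1 if the character is not a hex digit.
int hex_value(char c) {
    if (c &amp;gt;= '0' &amp;amp;&amp;amp; c &amp;lt;= '9') return c - '0';
    if (c &amp;gt;= 'a' &amp;amp;&amp;amp; c &amp;lt;= 'f') return c - 'a' + 10;
    if (c &amp;gt;= 'A' &amp;amp;&amp;amp; c &amp;lt;= 'F') return c - 'A' + 10;
    return -1;
}

// Extract the 20 raw info_hash bytes from a magnet URI, or nullopt
// if the prefix, the xt parameter, or the hex digits are invalid.
std::optional&amp;lt;std::string&amp;gt; info_hash_from_magnet(std::string_view uri) {
    constexpr std::string_view kScheme = "magnet:?";
    constexpr std::string_view kTopic = "xt=urn:btih:";
    if (uri.substr(0, kScheme.size()) != kScheme) return std::nullopt;
    const auto pos = uri.find(kTopic);
    if (pos == std::string_view::npos) return std::nullopt;
    std::string_view hex = uri.substr(pos + kTopic.size());
    hex = hex.substr(0, hex.find('&amp;amp;'));  // stop at the next parameter
    if (hex.size() != 40) return std::nullopt;
    std::string raw;
    raw.reserve(20);
    for (std::size_t i = 0; i &amp;lt; hex.size(); i += 2) {
        const int hi = hex_value(hex[i]);
        const int lo = hex_value(hex[i + 1]);
        if (hi &amp;lt; 0 || lo &amp;lt; 0) return std::nullopt;
        raw.push_back(static_cast&amp;lt;char&amp;gt;((hi &amp;lt;&amp;lt; 4) | lo));
    }
    return raw;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning &lt;code&gt;std::optional&lt;/code&gt; keeps a malformed link from being mistaken for a valid hash; the caller decides whether that is an error or just a prompt to try again.&lt;/p&gt;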

&lt;h2&gt;
  
  
  The Extension Handshake
&lt;/h2&gt;

&lt;p&gt;The core challenge of magnet link downloads is "without a torrent file, how do I know what to download?" BitTorrent's answer is the ut_metadata extension defined in BEP 9, which allows peers to exchange torrent info dictionaries. But to use ut_metadata, you first need to complete the extension handshake defined in BEP 10.&lt;/p&gt;

&lt;p&gt;The extension handshake is an extra round of negotiation that happens immediately after the standard handshake. First, bit 4 of byte 26 (the 6th byte of the reserved field, i.e. reserved[5]) in my handshake packet is set to 1. This flag tells the peer "I speak the extension protocol." If the peer also supports it, it will set the same bit in its handshake response.&lt;/p&gt;

&lt;p&gt;Right after the handshake, I send an extension handshake message. This is an Extended message (BitTorrent message ID 20) whose extended message ID is 0 (by convention, extended ID 0 is always the extension handshake), and its payload is the bencoded dictionary &lt;code&gt;{"m": {"ut_metadata": 1}}&lt;/code&gt;. This says "I support the ut_metadata extension; use ID 1 when you send it to me." The peer responds with a similarly structured dictionary &lt;code&gt;{"m": {"ut_metadata": N}}&lt;/code&gt;, telling me which ID it expects for ut_metadata (it might be 1, 2, or some other number). From this point on, every ut_metadata message I send to that peer uses N, and the messages it sends back arrive with my ID, 1.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// After the standard handshake, check if the peer supports extensions&lt;/span&gt;
&lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;has_ext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hs_buf&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;has_ext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Send extension handshake: {"m": {"ut_metadata": 1}}&lt;/span&gt;
    &lt;span class="n"&gt;Dict&lt;/span&gt; &lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt;&lt;span class="s"&gt;"m"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt;&lt;span class="s"&gt;"ut_metadata"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}}}}};&lt;/span&gt;
    &lt;span class="n"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Extended&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bencode&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ext&lt;/span&gt;&lt;span class="p"&gt;})}));&lt;/span&gt;

    &lt;span class="c1"&gt;// Parse the response to get the peer's ut_metadata message ID&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;recv_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;metadata_ext_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse_ext_handshake_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this step completes, I know exactly which message ID to use when requesting metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  Asking a Peer for Metadata
&lt;/h2&gt;

&lt;p&gt;With the peer's ut_metadata message ID in hand, requesting metadata means constructing the bencoded dictionary &lt;code&gt;{"msg_type": 0, "piece": 0}&lt;/code&gt; and sending it as an Extended message. msg_type=0 means "this is a request," and piece=0 means "give me chunk 0 of the metadata." (The ut_metadata protocol splits the info dictionary into 16 KiB chunks for transmission. An info dict that fits in a single chunk, which is typical for small torrents, needs only piece=0; a larger one takes one request per chunk.)&lt;/p&gt;
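
&lt;p&gt;To make the byte layout concrete, here is a minimal sketch of how such a request could be framed on the wire. It is not the project's &lt;code&gt;send_metadata_request&lt;/code&gt;: the helper name &lt;code&gt;build_metadata_request&lt;/code&gt; and its parameters are made up for illustration, and it assumes the usual peer wire framing (a 4-byte big-endian length prefix, message ID 20 for extended messages as defined in BEP 10, then the ID the peer advertised for ut_metadata).&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the article's code base): builds the full
// wire bytes of a ut_metadata request for one metadata chunk.
std::vector<std::uint8_t> build_metadata_request(std::uint8_t peer_ext_id, int piece) {
    assert(piece >= 0);

    // Bencoded dictionary; keys must appear in sorted order ("msg_type" < "piece").
    const std::string dict = "d8:msg_typei0e5:piecei" + std::to_string(piece) + "ee";

    // The length prefix counts everything after itself:
    // 1 byte message ID + 1 byte extended message ID + the bencoded payload.
    const std::uint32_t len = static_cast<std::uint32_t>(2 + dict.size());

    std::vector<std::uint8_t> out;
    out.reserve(4 + len);
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((len >> shift) & 0xFF));
    }
    out.push_back(20);           // BEP 10: "extended" message ID
    out.push_back(peer_ext_id);  // ID the peer advertised for ut_metadata
    out.insert(out.end(), dict.begin(), dict.end());
    return out;
}
```
]]>

&lt;p&gt;For piece 0 the bencoded part is the 25-byte string &lt;code&gt;d8:msg_typei0e5:piecei0ee&lt;/code&gt;, so the whole message comes to 31 bytes.&lt;/p&gt;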

&lt;p&gt;The peer responds with an Extended message whose payload is the bencoded dictionary &lt;code&gt;{"msg_type": 1, "piece": 0, "total_size": N}&lt;/code&gt; followed immediately by the raw bytes of the requested chunk. msg_type=1 means this is a data response, and total_size tells me how many bytes the whole info dictionary's bencoded form takes. The key operation is separating that trailing data from the dictionary: when the info dict fits in one chunk, the last total_size bytes of the payload are the complete bencoded info dictionary.&lt;/p&gt;
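
&lt;p&gt;A more defensive way to get at the trailing bytes is to find where the bencoded dictionary ends instead of counting back from the end of the payload. The sketch below does that with a tiny bencode skipper. Both function names are hypothetical, and it assumes the payload starts at the dictionary (the extended message ID byte already stripped). It also does not limit recursion depth, which a real client facing untrusted peers would want.&lt;/p&gt;

<![CDATA[
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: returns the index one past the bencoded value starting at `pos`.
std::size_t skip_bencoded(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) throw std::runtime_error("truncated bencode");
    const char c = s[pos];
    if (c == 'i') {  // integer: i<digits>e
        const std::size_t end = s.find('e', pos);
        if (end == std::string::npos) throw std::runtime_error("unterminated integer");
        return end + 1;
    }
    if (c == 'l' || c == 'd') {  // list or dict: nested values until the closing 'e'
        ++pos;
        while (pos < s.size() && s[pos] != 'e') pos = skip_bencoded(s, pos);
        if (pos >= s.size()) throw std::runtime_error("unterminated container");
        return pos + 1;
    }
    if (std::isdigit(static_cast<unsigned char>(c))) {  // string: <length>:<bytes>
        std::size_t len = 0;
        while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
            len = len * 10 + static_cast<std::size_t>(s[pos] - '0');
            ++pos;
        }
        if (pos >= s.size() || s[pos] != ':') throw std::runtime_error("bad string length");
        ++pos;
        if (len > s.size() - pos) throw std::runtime_error("truncated string");
        return pos + len;
    }
    throw std::runtime_error("unexpected byte in bencode");
}

// Hypothetical helper: splits a ut_metadata data message into
// (bencoded header dictionary, raw chunk bytes that follow it).
std::pair<std::string, std::string> split_metadata_payload(const std::string& payload) {
    const std::size_t header_end = skip_bencoded(payload, 0);
    assert(header_end <= payload.size());
    return {payload.substr(0, header_end), payload.substr(header_end)};
}
```
]]>

&lt;p&gt;The second half of the returned pair is the chunk data. For a single-chunk info dict its length should equal total_size, which is worth checking before hashing.&lt;/p&gt;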

&lt;p&gt;Once I have info_bencode, I do two things. First, bencode-decode it and feed it into from_info_dict to reconstruct a Metainfo — now I have piece_hashes, length, and piece_length, everything I need. Second, and this is the critical part, I compute SHA-1 over info_bencode and compare it against the info_hash from the magnet link. If they don't match, the peer gave me bogus data — throw an exception, try a different peer. This is "trust but verify"; the entire BitTorrent protocol's security rests on hash verification.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Request metadata&lt;/span&gt;
&lt;span class="n"&gt;send_metadata_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata_ext_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Receive the response&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;recv_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sock&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;info_bencode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse_metadata_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;payload_&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Parse the info dict&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;info_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bencode&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info_bencode&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;metainfo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;from_info_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info_dict&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Verify: info_hash must match the magnet link&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sha1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;info_bencode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;info_hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="p"&gt;...;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once verification passes, the path forward is identical to the .torrent download: use the Metainfo to query the tracker for a peer list, spawn multiple threads for parallel piece download, and pwrite everything to disk. Magnet links and .torrent files converge on the same destination.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;From raw TCP sockets to bencoding, from .torrent parsing and tracker communication to the peer wire protocol's handshake and block pipelining, from multithreaded parallel download to magnet links and extension protocols — bit by bit, a working BitTorrent client came together. Around 2500 lines of source code, just under 3500 including tests and build configuration.&lt;/p&gt;

&lt;p&gt;The biggest takeaway from this project is that the best way to understand a protocol or a system is to implement it yourself. The BitTorrent protocol specification is only a handful of pages. But there's an ocean of difference between calling someone else's library and filling every byte of a socket buffer by hand, cross-referencing BEP documents to figure out why the peer won't send an unchoke.&lt;/p&gt;

&lt;p&gt;Of course, this implementation is aggressively minimal. No seeding (download-only), no DHT for decentralized peer discovery (fully tracker-dependent), no UDP tracker support (HTTP only), no rarest-first piece selection (just sequential assignment), no PEX (peer exchange), and no end-game mode. These are the clear dividing lines between a production-grade BitTorrent client and a wheel reinvented purely for learning. As a practical tool, it doesn't hold a candle to qBittorrent or Transmission. As a learning exercise, it did everything I wanted it to do.&lt;/p&gt;

&lt;p&gt;If the project interests you, the code is at &lt;a href="https://github.com/Tenaryo/TinyBitTorrent" rel="noopener noreferrer"&gt;https://github.com/Tenaryo/TinyBitTorrent&lt;/a&gt;. Feedback and drive-by comments welcome.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>networking</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
