<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shrijith Venkatramana</title>
    <description>The latest articles on DEV Community by Shrijith Venkatramana (@shrsv).</description>
    <link>https://dev.to/shrsv</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1001514%2F17b7d334-44b1-417a-9268-346e6a34988a.jpg</url>
      <title>DEV Community: Shrijith Venkatramana</title>
      <link>https://dev.to/shrsv</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shrsv"/>
    <language>en</language>
    <item>
      <title>FlashAttention Explained: The Optimization That Made Modern LLMs Practical</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Thu, 11 Jun 2026 17:32:44 +0000</pubDate>
      <link>https://dev.to/shrsv/flashattention-explained-the-optimization-that-made-modern-llms-practical-2ik7</link>
      <guid>https://dev.to/shrsv/flashattention-explained-the-optimization-that-made-modern-llms-practical-2ik7</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Large language models keep getting bigger.&lt;/p&gt;

&lt;p&gt;Context windows have grown from a few thousand tokens to hundreds of thousands, and some models now advertise context lengths measured in millions of tokens.&lt;/p&gt;

&lt;p&gt;Yet for years, one part of the Transformer threatened to become the bottleneck:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Attention.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not because it required too much math.&lt;/p&gt;

&lt;p&gt;Because it moved too much data.&lt;/p&gt;

&lt;p&gt;FlashAttention is one of the most important optimizations in modern AI infrastructure because it attacks exactly that problem. It doesn't change the Transformer architecture, approximate attention, or introduce a new model design. Instead, it rethinks how attention is executed on GPUs.&lt;/p&gt;

&lt;p&gt;The result is dramatically lower memory usage, faster training, faster inference, and practical long-context models.&lt;/p&gt;

&lt;p&gt;Let's see how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Real Cost of Attention
&lt;/h2&gt;

&lt;p&gt;Every Transformer layer computes attention using three matrices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query (Q)&lt;/li&gt;
&lt;li&gt;Key (K)&lt;/li&gt;
&lt;li&gt;Value (V)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The standard attention equation is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attention(Q,K,V) = softmax(QK^T / sqrt(d))V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Conceptually, every token compares itself against every other token.&lt;/p&gt;
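
&lt;p&gt;To make the equation concrete, here is a minimal pure-Python sketch of standard attention. The function names and the tiny 2 x 2 inputs are illustrative assumptions, not taken from any library. Note that it builds the full score matrix before doing anything else.&lt;/p&gt;

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def naive_attention(Q, K, V):
    # Q and K are N x d, V is N x d_v, all stored as lists of rows.
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    # Materialize the full N x N score matrix QK^T / sqrt(d).
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]
    probs = [softmax(row) for row in scores]
    # Each output row is a probability-weighted average of the rows of V.
    return [[sum(p * v[c] for p, v in zip(row, V)) for c in range(len(V[0]))]
            for row in probs]

# Hypothetical toy inputs.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = naive_attention(Q, K, V)
```

&lt;p&gt;Every entry of &lt;code&gt;scores&lt;/code&gt; and &lt;code&gt;probs&lt;/code&gt; exists at the same time, which is harmless for two tokens and expensive for thousands.&lt;/p&gt;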

&lt;p&gt;For a sequence of N tokens, the attention score matrix contains:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;N x N
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;elements.&lt;/p&gt;

&lt;p&gt;A sequence length of 16,384 tokens requires:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;16,384 x 16,384 = 268,435,456 (about 268 million)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;attention scores.&lt;/p&gt;
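
&lt;p&gt;As a rough back-of-the-envelope check, the snippet below turns that count into bytes. It assumes 16-bit scores (2 bytes each) and counts a single attention head for a single sequence; both are assumptions for illustration, and real models multiply this by heads and batch size.&lt;/p&gt;

```python
def score_matrix_bytes(seq_len, bytes_per_element=2):
    # One N x N score matrix for a single head and a single sequence.
    return seq_len * seq_len * bytes_per_element

n = 16_384
elements = n * n
size_mib = score_matrix_bytes(n) / (1024 ** 2)
print(f"{elements:,} scores, {size_mib:.0f} MiB per head in 16-bit precision")
```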

&lt;p&gt;The first instinct is usually:&lt;/p&gt;

&lt;p&gt;"That's a lot of computation."&lt;/p&gt;

&lt;p&gt;But on modern GPUs, the bigger problem is often memory traffic.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. GPUs Are Often Memory-Bound, Not Compute-Bound
&lt;/h2&gt;

&lt;p&gt;Modern GPUs can perform enormous numbers of floating-point operations per second.&lt;/p&gt;

&lt;p&gt;What they cannot do nearly as efficiently is move massive amounts of data between memory levels.&lt;/p&gt;

&lt;p&gt;A simplified GPU memory hierarchy looks like:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HBM (GPU memory)
    |
L2 Cache
    |
Shared Memory
    |
Registers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The farther away the data is from the compute units, the more expensive it is to access.&lt;/p&gt;

&lt;p&gt;Traditional attention repeatedly writes the full attention matrix to HBM and then reads it back.&lt;/p&gt;

&lt;p&gt;The workflow looks roughly like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Load Q
Load K

Compute QK^T

Write attention matrix to memory

Read attention matrix

Apply softmax

Write result

Read result

Multiply by V

Write output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A huge amount of time is spent moving data rather than performing useful computation.&lt;/p&gt;
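
&lt;p&gt;The workflow above can be turned into a rough traffic estimate. This sketch simply counts elements moved to or from HBM at each step, assuming a head dimension of 64 (an assumption, not a figure from this article). The "ideal" function is only a lower bound that reads each input once and writes the output once; a real FlashAttention kernel re-reads K and V tiles, so its traffic sits above that bound.&lt;/p&gt;

```python
def naive_hbm_elements(n, d):
    # Element counts moved to or from HBM in the unfused workflow.
    load_q_k = 2 * n * d      # load Q and K
    write_scores = n * n      # write QK^T
    read_scores = n * n       # read it back for softmax
    write_probs = n * n       # write the softmax result
    read_probs = n * n        # read it back for the multiply by V
    load_v = n * d
    write_out = n * d
    return (load_q_k + write_scores + read_scores
            + write_probs + read_probs + load_v + write_out)

def ideal_fused_hbm_elements(n, d):
    # Idealized lower bound: read Q, K, V once and write the output once.
    return 4 * n * d

n, d = 16_384, 64
ratio = naive_hbm_elements(n, d) / ideal_fused_hbm_elements(n, d)
print(f"naive traffic is about {ratio:.0f}x the idealized bound")
```

&lt;p&gt;The N x N terms dominate, which is why avoiding the intermediate matrix matters more than shaving arithmetic.&lt;/p&gt;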

&lt;p&gt;This is the problem FlashAttention solves.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. The Core Insight: Never Materialize the Attention Matrix
&lt;/h2&gt;

&lt;p&gt;The key observation behind FlashAttention is surprisingly simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You don't actually need to store the full attention matrix.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional attention explicitly creates:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QK^T
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;and writes it to memory.&lt;/p&gt;

&lt;p&gt;FlashAttention never does.&lt;/p&gt;

&lt;p&gt;Instead, it processes attention in small blocks.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Q Block 1 x K Block 1
Q Block 1 x K Block 2
Q Block 1 x K Block 3
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each block is loaded, processed, contributes to the final result, and is discarded.&lt;/p&gt;

&lt;p&gt;The gigantic attention matrix never exists in memory.&lt;/p&gt;

&lt;p&gt;This immediately removes one of the largest memory bottlenecks in Transformer execution.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Why This Is Harder Than It Sounds
&lt;/h2&gt;

&lt;p&gt;At first glance, block-wise processing seems impossible.&lt;/p&gt;

&lt;p&gt;The softmax operation requires information from an entire row.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;softmax(x_i) = exp(x_i) / sum(exp(x_j))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;To compute a single probability, you need a denominator that depends on all scores.&lt;/p&gt;

&lt;p&gt;If only one block is visible at a time, how can softmax be computed correctly?&lt;/p&gt;

&lt;p&gt;This is where FlashAttention becomes clever.&lt;/p&gt;

&lt;p&gt;Instead of storing all scores, it maintains running statistics while processing blocks.&lt;/p&gt;

&lt;p&gt;These include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Running maximum&lt;/li&gt;
&lt;li&gt;Running normalization term&lt;/li&gt;
&lt;li&gt;Running output accumulator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As each block arrives, these values are updated.&lt;/p&gt;

&lt;p&gt;When all blocks have been processed, the result is mathematically identical to standard attention.&lt;/p&gt;

&lt;p&gt;Not approximate.&lt;/p&gt;

&lt;p&gt;Not close.&lt;/p&gt;

&lt;p&gt;Exactly identical, apart from floating-point rounding differences.&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Online Softmax: The Trick That Makes It Work
&lt;/h2&gt;

&lt;p&gt;The breakthrough behind FlashAttention is often called &lt;strong&gt;Online Softmax&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Imagine attention scores arriving in chunks:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Block A
Block B
Block C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A naive implementation would need every score before computing softmax.&lt;/p&gt;

&lt;p&gt;Online Softmax instead maintains enough information to update the result incrementally.&lt;/p&gt;

&lt;p&gt;The algorithm keeps track of:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Current maximum score
Current normalization factor
Current weighted output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When a new block arrives:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Update maximum

Rescale previous values

Accumulate new contributions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This allows FlashAttention to process attention as a stream rather than as a giant matrix.&lt;/p&gt;

&lt;p&gt;The memory savings are enormous.&lt;/p&gt;

&lt;p&gt;More importantly, the final output is identical to what standard attention would have produced.&lt;/p&gt;
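
&lt;p&gt;The update, rescale, and accumulate steps can be sketched in a few lines of pure Python. This is a simplified illustration for a single query row with scalar values; the function and variable names are assumptions made for the example, not FlashAttention's actual code.&lt;/p&gt;

```python
import math

def online_softmax_weighted_sum(score_blocks, value_blocks):
    running_max = float("-inf")   # current maximum score
    denom = 0.0                   # current normalization factor
    acc = 0.0                     # current weighted output (unnormalized)
    for scores, values in zip(score_blocks, value_blocks):
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old maximum.
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc *= correction
        # Accumulate the new block's contributions.
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            denom += w
            acc += w * v
        running_max = new_max
    return acc / denom

def reference(scores, values):
    # Ordinary softmax-weighted sum that sees every score at once.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

scores = [0.5, 2.0, -1.0, 3.5, 0.0, 1.2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
blocks_s = [scores[0:2], scores[2:4], scores[4:6]]   # Block A, B, C
blocks_v = [values[0:2], values[2:4], values[4:6]]
streamed = online_softmax_weighted_sum(blocks_s, blocks_v)
full = reference(scores, values)
```

&lt;p&gt;Here &lt;code&gt;streamed&lt;/code&gt; and &lt;code&gt;full&lt;/code&gt; agree to within floating-point rounding, even though the streaming version never holds more than one block of scores.&lt;/p&gt;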
&lt;h2&gt;
  
  
  6. Tiling: Making GPUs Happy
&lt;/h2&gt;

&lt;p&gt;FlashAttention is often described as an &lt;strong&gt;IO-aware algorithm&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;IO-aware means the algorithm is designed around memory movement rather than purely around arithmetic complexity.&lt;/p&gt;

&lt;p&gt;The implementation uses tiling.&lt;/p&gt;

&lt;p&gt;Instead of operating on huge matrices, FlashAttention loads small chunks into fast on-chip memory:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Load tile into shared memory

Compute attention

Update softmax statistics

Accumulate output

Discard tile

Load next tile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Because shared memory is dramatically faster than HBM, the GPU spends much more time performing useful work.&lt;/p&gt;

&lt;p&gt;A useful mental model is:&lt;/p&gt;

&lt;p&gt;Traditional Attention:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory -&amp;gt; Compute -&amp;gt; Memory -&amp;gt; Compute -&amp;gt; Memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;FlashAttention:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory -&amp;gt; Compute -&amp;gt; Compute -&amp;gt; Compute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The computation remains approximately O(N^2), but memory traffic is dramatically reduced.&lt;/p&gt;

&lt;p&gt;That's where most of the speedup comes from.&lt;/p&gt;
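
&lt;p&gt;Putting tiling and the running softmax statistics together gives the following CPU-only sketch. It is a toy model of the loop structure under simplifying assumptions (one query row at a time, Python lists standing in for on-chip memory, an arbitrary tile size of 2), and it is compared against a reference that materializes every score row.&lt;/p&gt;

```python
import math

def tiled_attention(Q, K, V, block=2):
    scale = 1.0 / math.sqrt(len(Q[0]))
    d_v = len(V[0])
    out = []
    for q in Q:  # a real kernel handles a tile of query rows at once
        running_max = float("-inf")
        denom = 0.0
        acc = [0.0] * d_v
        for start in range(0, len(K), block):
            # "Load tile": only this slice of K and V is touched.
            k_tile = K[start:start + block]
            v_tile = V[start:start + block]
            s = [scale * sum(a * b for a, b in zip(q, k)) for k in k_tile]
            # Update softmax statistics and rescale earlier work.
            new_max = max(running_max, max(s))
            correction = math.exp(running_max - new_max)
            denom *= correction
            acc = [a * correction for a in acc]
            # Accumulate this tile's contribution to the output.
            for s_j, v in zip(s, v_tile):
                w = math.exp(s_j - new_max)
                denom += w
                acc = [a + w * v_c for a, v_c in zip(acc, v)]
            running_max = new_max
        out.append([a / denom for a in acc])
    return out

def full_attention(Q, K, V):
    # Reference that computes each complete score row before softmax.
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        s = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(s)
        w = [math.exp(x - m) for x in s]
        total = sum(w)
        out.append([sum(w_j * v[c] for w_j, v in zip(w, V)) / total
                    for c in range(len(V[0]))])
    return out

# Hypothetical toy inputs.
Q = [[0.1, 0.2], [0.3, -0.4], [1.0, 0.5]]
K = [[0.5, 0.1], [-0.2, 0.7], [0.9, 0.9], [0.0, -1.0], [0.3, 0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
tiled = tiled_attention(Q, K, V, block=2)
full = full_attention(Q, K, V)
max_diff = max(abs(a - b) for r1, r2 in zip(tiled, full) for a, b in zip(r1, r2))
```

&lt;p&gt;The arithmetic is the same in both functions; only the order of work and the amount of intermediate state differ.&lt;/p&gt;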
&lt;h2&gt;
  
  
  7. FlashAttention-2, FlashAttention-3, and Production Systems
&lt;/h2&gt;

&lt;p&gt;The original FlashAttention paper introduced the core idea.&lt;/p&gt;

&lt;p&gt;FlashAttention-2 improved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU utilization&lt;/li&gt;
&lt;li&gt;Parallelism&lt;/li&gt;
&lt;li&gt;Work partitioning&lt;/li&gt;
&lt;li&gt;Training throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result was significantly higher performance on modern accelerators.&lt;/p&gt;

&lt;p&gt;FlashAttention-3 pushed things further for newer NVIDIA Hopper GPUs and introduced support for modern low-precision formats such as FP8.&lt;/p&gt;

&lt;p&gt;Today, FlashAttention is used throughout the AI ecosystem.&lt;/p&gt;

&lt;p&gt;You'll find it in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PyTorch&lt;/li&gt;
&lt;li&gt;Hugging Face Transformers&lt;/li&gt;
&lt;li&gt;vLLM&lt;/li&gt;
&lt;li&gt;TensorRT-LLM&lt;/li&gt;
&lt;li&gt;Many open-weight LLMs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many developers, enabling FlashAttention can be as simple as selecting an optimized attention backend.&lt;/p&gt;

&lt;p&gt;The model architecture remains unchanged.&lt;/p&gt;

&lt;p&gt;The performance characteristics improve dramatically.&lt;/p&gt;
&lt;h2&gt;
  
  
  What FlashAttention Does Not Fix
&lt;/h2&gt;

&lt;p&gt;FlashAttention is powerful, but it doesn't magically eliminate all scaling problems.&lt;/p&gt;

&lt;p&gt;The computation is still approximately:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;O(N^2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;with respect to sequence length.&lt;/p&gt;

&lt;p&gt;Very long contexts still require substantial compute.&lt;/p&gt;

&lt;p&gt;FlashAttention primarily reduces memory traffic and memory footprint.&lt;/p&gt;

&lt;p&gt;It makes attention much more efficient, but it does not change the fundamental quadratic interaction pattern of standard attention.&lt;/p&gt;

&lt;p&gt;That's why researchers continue exploring alternatives such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sliding-window attention&lt;/li&gt;
&lt;li&gt;Linear attention&lt;/li&gt;
&lt;li&gt;State-space models&lt;/li&gt;
&lt;li&gt;Hybrid architectures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These approaches attempt to address the computational scaling problem itself rather than the memory movement problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;FlashAttention is one of the rare breakthroughs that became foundational almost immediately.&lt;/p&gt;

&lt;p&gt;It didn't replace Transformers.&lt;/p&gt;

&lt;p&gt;It didn't invent a new attention mechanism.&lt;/p&gt;

&lt;p&gt;It didn't require retraining models.&lt;/p&gt;

&lt;p&gt;Instead, it recognized that modern GPUs spend an enormous amount of time moving data around and redesigned attention to minimize that movement.&lt;/p&gt;

&lt;p&gt;By treating memory access as the bottleneck rather than arithmetic, FlashAttention transformed attention from a memory-heavy operation into a much more hardware-efficient one.&lt;/p&gt;

&lt;p&gt;Many of today's long-context LLMs would be significantly slower, more expensive, or simply impractical without it.&lt;/p&gt;

&lt;p&gt;As AI systems continue to scale, FlashAttention serves as an important reminder that sometimes the biggest breakthroughs don't come from changing the algorithm itself. They come from understanding how that algorithm interacts with real hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What do you think has been the most important infrastructure breakthrough for LLMs so far: FlashAttention, quantization, KV caching, speculative decoding, or something else entirely?&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Mixture of Experts (MoE) Explained Simply: How Modern AI Models Get Bigger Without Getting Slower</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Wed, 10 Jun 2026 18:01:16 +0000</pubDate>
      <link>https://dev.to/shrsv/mixture-of-experts-moe-explained-simply-how-modern-ai-models-get-bigger-without-getting-slower-25mm</link>
      <guid>https://dev.to/shrsv/mixture-of-experts-moe-explained-simply-how-modern-ai-models-get-bigger-without-getting-slower-25mm</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Large language models keep getting larger.&lt;/p&gt;

&lt;p&gt;Hundreds of billions of parameters. Trillions of parameters. Yet somehow, many of these models remain surprisingly fast and affordable to run.&lt;/p&gt;

&lt;p&gt;How?&lt;/p&gt;

&lt;p&gt;The trick is that many modern frontier models don't use all of their parameters for every token.&lt;/p&gt;

&lt;p&gt;Instead, they use a technique called &lt;strong&gt;Mixture of Experts (MoE)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Think of it like replacing a single giant software service with a fleet of specialized microservices. Rather than every request hitting every service, a router decides which specialists should handle a particular request.&lt;/p&gt;

&lt;p&gt;That's the core idea behind MoE.&lt;/p&gt;

&lt;p&gt;Let's break down how it works, why it matters, and what challenges engineers face when running MoE models in production.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. The Scaling Problem
&lt;/h1&gt;

&lt;p&gt;Traditional transformer models are &lt;strong&gt;dense&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When a token enters a transformer layer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Attention runs.&lt;/li&gt;
&lt;li&gt;The feed-forward network (MLP) runs.&lt;/li&gt;
&lt;li&gt;Every parameter in that layer participates in computation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you double the model size, you roughly double the compute cost per token.&lt;/p&gt;

&lt;p&gt;This creates a painful tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More parameters → better quality&lt;/li&gt;
&lt;li&gt;More parameters → slower inference and more expensive training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Researchers wanted a way to increase model capacity without increasing computation proportionally.&lt;/p&gt;

&lt;p&gt;MoE emerged as one of the most successful solutions. Instead of activating every parameter, MoE activates only a small subset for each token.&lt;/p&gt;

&lt;h1&gt;
  
  
  2. The Restaurant Analogy
&lt;/h1&gt;

&lt;p&gt;Imagine a restaurant with eight specialists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pizza chef&lt;/li&gt;
&lt;li&gt;Sushi chef&lt;/li&gt;
&lt;li&gt;Pastry chef&lt;/li&gt;
&lt;li&gt;Grill chef&lt;/li&gt;
&lt;li&gt;Salad chef&lt;/li&gt;
&lt;li&gt;Soup chef&lt;/li&gt;
&lt;li&gt;Pasta chef&lt;/li&gt;
&lt;li&gt;Dessert chef&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a customer orders pizza, there's no reason for all eight chefs to work on the order.&lt;/p&gt;

&lt;p&gt;The restaurant manager simply routes the request to the relevant specialists.&lt;/p&gt;

&lt;p&gt;MoE applies the same idea.&lt;/p&gt;

&lt;p&gt;Instead of one large neural network handling every token:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple expert networks exist&lt;/li&gt;
&lt;li&gt;A router chooses which experts should process each token&lt;/li&gt;
&lt;li&gt;Only selected experts perform computation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a model that can contain many more parameters than are actually used for any individual inference step.&lt;/p&gt;

&lt;h1&gt;
  
  
  3. What Actually Changes Inside a Transformer?
&lt;/h1&gt;

&lt;p&gt;One surprising fact about MoE:&lt;/p&gt;

&lt;p&gt;Most of the transformer remains unchanged.&lt;/p&gt;

&lt;p&gt;Typically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attention layers stay dense&lt;/li&gt;
&lt;li&gt;Embeddings stay dense&lt;/li&gt;
&lt;li&gt;Normalization layers stay dense&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The feed-forward network (MLP) is replaced by a collection of experts.&lt;/p&gt;

&lt;p&gt;A standard transformer block looks roughly like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input
  ↓
Attention
  ↓
Feed Forward Network
  ↓
Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;An MoE block becomes:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input
  ↓
Attention
  ↓
Router
  ↓
Selected Experts
  ↓
Combine Outputs
  ↓
Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each expert is often just another feed-forward network.&lt;/p&gt;

&lt;p&gt;Instead of one MLP, you may have:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Expert 1
Expert 2
Expert 3
...
Expert 64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The router decides which ones should handle each token.&lt;/p&gt;
&lt;h1&gt;
  
  
  4. How Routing Works
&lt;/h1&gt;

&lt;p&gt;The router is usually a lightweight neural network.&lt;/p&gt;

&lt;p&gt;For each token it produces scores:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token: "database"

Expert 1: 0.05
Expert 2: 0.61
Expert 3: 0.09
Expert 4: 0.25
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The model then selects the top experts.&lt;/p&gt;
&lt;h3&gt;
  
  
  Top-2 Routing
&lt;/h3&gt;

&lt;p&gt;Historically, many MoE systems used Top-2 routing:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Selected:
Expert 2
Expert 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Both experts process the token.&lt;/p&gt;

&lt;p&gt;Their outputs are combined using the router probabilities as weights.&lt;/p&gt;
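
&lt;p&gt;A minimal sketch of this selection and mixing step is shown below, reusing the four router scores from the "database" example. The toy experts are hypothetical stand-ins, and renormalizing the selected probabilities so they sum to 1 is an assumption; implementations differ on that detail.&lt;/p&gt;

```python
def route_top_k(router_probs, k=2):
    # Indices of the k highest-probability experts, best first.
    ranked = sorted(range(len(router_probs)),
                    key=lambda i: router_probs[i], reverse=True)
    chosen = ranked[:k]
    # Assumption: renormalize the selected probabilities so weights sum to 1.
    total = sum(router_probs[i] for i in chosen)
    return [(i, router_probs[i] / total) for i in chosen]

def moe_layer(token, experts, router_probs, k=2):
    # Run only the selected experts and mix their outputs by routing weight.
    output = [0.0] * len(token)
    for index, weight in route_top_k(router_probs, k):
        expert_out = experts[index](token)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output

# Hypothetical toy experts: each scales its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
router_probs = [0.05, 0.61, 0.09, 0.25]   # Expert 1..4 scores for "database"
selected = route_top_k(router_probs, k=2)
mixed = moe_layer([1.0, 1.0], experts, router_probs, k=2)
```

&lt;p&gt;With zero-based indices, &lt;code&gt;selected&lt;/code&gt; picks indices 1 and 3, which correspond to Expert 2 and Expert 4. Calling the same function with &lt;code&gt;k=1&lt;/code&gt; keeps only the single best expert.&lt;/p&gt;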
&lt;h3&gt;
  
  
  Switch Routing
&lt;/h3&gt;

&lt;p&gt;Later, Google's Switch Transformer simplified this further.&lt;/p&gt;

&lt;p&gt;Instead of selecting two experts:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Selected:
Expert 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Only one expert runs.&lt;/p&gt;

&lt;p&gt;This significantly reduces communication and inference overhead while preserving much of the benefit.&lt;/p&gt;
&lt;h1&gt;
  
  
  5. Why MoE Models Are So Efficient
&lt;/h1&gt;

&lt;p&gt;Let's compare two hypothetical models.&lt;/p&gt;
&lt;h3&gt;
  
  
  Dense Model
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;100B parameters
100B active per token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  MoE Model
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;8 experts × 100B parameters
= 800B total parameters

Only 2 experts active
= 200B active parameters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


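&lt;p&gt;In this example the MoE model holds 8x the parameters of the dense model but activates only a quarter of its own expert weights per token, so capacity grows much faster than per-token compute. (The comparison ignores the attention and embedding weights that every token shares.)&lt;/p&gt;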

&lt;p&gt;This is often called &lt;strong&gt;conditional computation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Different inputs trigger different computation paths.&lt;/p&gt;

&lt;p&gt;The model effectively says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Not every problem requires every part of my brain."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is one reason MoE architectures became attractive for large-scale LLMs. They allow parameter counts to grow much faster than inference cost.&lt;/p&gt;
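
&lt;p&gt;The arithmetic behind the hypothetical comparison can be written out directly. Like the example, this sketch counts expert parameters only and ignores the shared attention and embedding weights; the function name is an assumption for illustration.&lt;/p&gt;

```python
def moe_parameter_counts(num_experts, params_per_expert, active_experts):
    # Total parameters stored versus parameters used for one token.
    total = num_experts * params_per_expert
    active = active_experts * params_per_expert
    return total, active

BILLION = 10 ** 9
total, active = moe_parameter_counts(8, 100 * BILLION, 2)
active_fraction = active / total
print(f"total={total // BILLION}B active={active // BILLION}B fraction={active_fraction:.2f}")
```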
&lt;h1&gt;
  
  
  6. The Hidden Engineering Challenges
&lt;/h1&gt;

&lt;p&gt;The basic idea sounds simple.&lt;/p&gt;

&lt;p&gt;Production systems quickly reveal the hard parts.&lt;/p&gt;
&lt;h2&gt;
  
  
  Challenge 1: Expert Collapse
&lt;/h2&gt;

&lt;p&gt;Suppose the router learns:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;90% of tokens → Expert 7
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now Expert 7 receives almost all training.&lt;/p&gt;

&lt;p&gt;Other experts receive little data and become useless.&lt;/p&gt;

&lt;p&gt;Researchers combat this with load-balancing losses that encourage more even utilization.&lt;/p&gt;
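
&lt;p&gt;One common form of such a loss, following the auxiliary loss described for the Switch Transformer, multiplies the fraction of tokens sent to each expert by the mean router probability for that expert and sums over experts. The sketch below computes only that value on made-up routing data; real training also scales it by a small coefficient and backpropagates through the router probabilities.&lt;/p&gt;

```python
def load_balancing_loss(router_probs, assignments, num_experts):
    # router_probs: one probability list per token.
    # assignments: index of the expert each token was dispatched to.
    num_tokens = len(assignments)
    # f_i: fraction of tokens dispatched to expert i.
    fraction = [assignments.count(i) / num_tokens for i in range(num_experts)]
    # P_i: mean router probability given to expert i.
    mean_prob = [sum(p[i] for p in router_probs) / num_tokens
                 for i in range(num_experts)]
    return num_experts * sum(f * p for f, p in zip(fraction, mean_prob))

# Hypothetical data: four tokens spread evenly over four experts.
balanced_probs = [[0.25, 0.25, 0.25, 0.25]] * 4
balanced = load_balancing_loss(balanced_probs, [0, 1, 2, 3], 4)

# Hypothetical data: every token collapses onto expert 0.
collapsed_probs = [[0.91, 0.03, 0.03, 0.03]] * 4
collapsed = load_balancing_loss(collapsed_probs, [0, 0, 0, 0], 4)
```

&lt;p&gt;The evenly routed batch scores 1.0, the minimum for this formulation, while the collapsed batch scores noticeably higher, so minimizing the term pushes the router toward even utilization.&lt;/p&gt;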
&lt;h2&gt;
  
  
  Challenge 2: Distributed Communication
&lt;/h2&gt;

&lt;p&gt;Imagine:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GPU 1 → Experts 1-8
GPU 2 → Experts 9-16
GPU 3 → Experts 17-24
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A batch of tokens may need experts spread across multiple machines.&lt;/p&gt;

&lt;p&gt;Now inference becomes a networking problem.&lt;/p&gt;

&lt;p&gt;Token activations must be shuffled between devices before expert computation can occur.&lt;/p&gt;

&lt;p&gt;In many MoE deployments, communication becomes a significant bottleneck.&lt;/p&gt;
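&lt;p&gt;To get a feel for the traffic, here is a small sketch that counts how many routed tokens have to leave their GPU. It assumes the placement shown above (eight experts per GPU, 1-based expert ids); the function names and the sample assignments are invented for illustration.&lt;/p&gt;

```python
def device_of(expert_id, experts_per_device=8):
    """Experts 1-8 live on GPU 1, 9-16 on GPU 2, and so on."""
    return (expert_id - 1) // experts_per_device + 1


def cross_device_sends(assignments, experts_per_device=8):
    """Count tokens that must travel between GPUs.

    assignments: list of (source_gpu, expert_id), one entry per routed token.
    Returns {(source_gpu, target_gpu): token_count} for remote sends only.
    """
    traffic = {}
    for source_gpu, expert_id in assignments:
        target_gpu = device_of(expert_id, experts_per_device)
        if target_gpu != source_gpu:
            key = (source_gpu, target_gpu)
            traffic[key] = traffic.get(key, 0) + 1
    return traffic


# Four tokens: one stays local on GPU 1, three cross the network.
sends = cross_device_sends([(1, 3), (1, 12), (2, 20), (1, 9)])
```

&lt;p&gt;Every entry in &lt;code&gt;sends&lt;/code&gt; is activation data that has to move before the expert can start computing.&lt;/p&gt;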
&lt;h2&gt;
  
  
  Challenge 3: Load Imbalance
&lt;/h2&gt;

&lt;p&gt;Real traffic isn't uniform.&lt;/p&gt;

&lt;p&gt;Some experts become hot.&lt;/p&gt;

&lt;p&gt;Others remain mostly idle.&lt;/p&gt;

&lt;p&gt;This creates GPU utilization problems similar to uneven request distribution in distributed systems.&lt;/p&gt;

&lt;p&gt;Modern routing approaches focus heavily on balancing expert workloads.&lt;/p&gt;
&lt;h2&gt;
  
  
  Challenge 4: Token Dropping
&lt;/h2&gt;

&lt;p&gt;Experts often have limited capacity.&lt;/p&gt;

&lt;p&gt;If too many tokens are routed to one expert:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Capacity: 1000 tokens
Incoming: 1500 tokens
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Some tokens may need to be rerouted or dropped.&lt;/p&gt;

&lt;p&gt;Managing these overflow situations becomes part of production MoE serving infrastructure.&lt;/p&gt;
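&lt;p&gt;A rough sketch of the overflow logic, assuming a simple first-come, first-served policy and the common convention that capacity is a "capacity factor" times the average number of tokens per expert. Both function names are hypothetical, and real systems may reroute overflow instead of dropping it.&lt;/p&gt;

```python
import math


def expert_capacity(num_tokens, num_experts, capacity_factor=1.25):
    """Per-expert token budget: a slack factor over the even share."""
    return math.ceil(capacity_factor * num_tokens / num_experts)


def dispatch_with_capacity(token_experts, capacity):
    """Split token ids into those that fit and those that overflow.

    token_experts[i] is the expert chosen for token i.
    Tokens are admitted first-come, first-served until an expert is full.
    """
    load = {}
    kept, overflow = [], []
    for token_id, expert in enumerate(token_experts):
        used = load.get(expert, 0)
        if capacity > used:
            load[expert] = used + 1
            kept.append(token_id)
        else:
            overflow.append(token_id)
    return kept, overflow


# The scenario above: 1500 tokens all want expert 7, which holds 1000.
kept, overflow = dispatch_with_capacity([7] * 1500, capacity=1000)
```

&lt;p&gt;Here 500 tokens end up in &lt;code&gt;overflow&lt;/code&gt;. What happens to them next (reroute, skip the layer, or drop) is a design decision for the serving stack.&lt;/p&gt;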
&lt;h1&gt;
  
  
  7. What MoE Looks Like in Production
&lt;/h1&gt;

&lt;p&gt;For developers building AI systems, the practical implications are interesting.&lt;/p&gt;
&lt;h3&gt;
  
  
  Memory Footprint
&lt;/h3&gt;

&lt;p&gt;An MoE model may advertise:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;600B parameters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;But only a fraction are active for any token.&lt;/p&gt;

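&lt;p&gt;Compute cost may resemble a much smaller dense model, but memory does not shrink with it: every expert's weights must still be loaded, so the memory footprint tracks the full parameter count.&lt;/p&gt;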
&lt;h3&gt;
  
  
  Inference Isn't Automatically Cheaper
&lt;/h3&gt;

&lt;p&gt;Many developers assume:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fewer active parameters
=
Lower latency
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Not always.&lt;/p&gt;

&lt;p&gt;Routing overhead, expert communication, and distributed synchronization can erase part of the theoretical gain.&lt;/p&gt;

&lt;p&gt;Serving MoE efficiently often requires specialized inference stacks.&lt;/p&gt;
&lt;h3&gt;
  
  
  Observability Matters
&lt;/h3&gt;

&lt;p&gt;Production teams increasingly monitor:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expert utilization&lt;/li&gt;
&lt;li&gt;Router entropy&lt;/li&gt;
&lt;li&gt;Token distribution&lt;/li&gt;
&lt;li&gt;Expert hot spots&lt;/li&gt;
&lt;li&gt;Cross-device traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An overloaded expert can become the AI equivalent of a hot database shard.&lt;/p&gt;
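&lt;p&gt;Two of these metrics are easy to compute from a log of routing decisions. The sketch below assumes you already have a list of chosen expert ids; the helper names are illustrative, and the entropy is measured in bits over whatever distribution you pass in.&lt;/p&gt;

```python
import math
from collections import Counter


def expert_utilization(assignments, num_experts):
    """Fraction of routed tokens that landed on each expert."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def entropy_bits(distribution):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


# Hypothetical routing log for a 4-expert layer.
utilization = expert_utilization([0, 0, 1, 3], num_experts=4)
spread = entropy_bits(utilization)
```

&lt;p&gt;With four experts, an entropy close to 2 bits means traffic is spread evenly, and a value near 0 means one expert is taking almost everything.&lt;/p&gt;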
&lt;h3&gt;
  
  
  Routing Becomes Product Behavior
&lt;/h3&gt;

&lt;p&gt;Recent research suggests routing patterns can become task-specific.&lt;/p&gt;

&lt;p&gt;Different prompt categories often activate different expert combinations, meaning the routing system itself becomes part of the model's learned intelligence.&lt;/p&gt;
&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Mixture of Experts is one of the most important ideas behind modern large-scale AI systems.&lt;/p&gt;

&lt;p&gt;Instead of making every token pass through every parameter, MoE introduces specialization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Experts perform different computations&lt;/li&gt;
&lt;li&gt;Routers choose which experts to use&lt;/li&gt;
&lt;li&gt;Only a small subset activates per token&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is a model that can grow dramatically in total capacity while keeping computation relatively manageable.&lt;/p&gt;

&lt;p&gt;For software engineers, MoE feels surprisingly familiar.&lt;/p&gt;

&lt;p&gt;It's essentially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service routing&lt;/li&gt;
&lt;li&gt;Load balancing&lt;/li&gt;
&lt;li&gt;Resource scheduling&lt;/li&gt;
&lt;li&gt;Distributed systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...implemented inside a neural network.&lt;/p&gt;

&lt;p&gt;As AI systems continue scaling, understanding MoE is becoming as important as understanding transformers themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; If you were designing an MoE model, would you optimize primarily for maximum model quality, or for predictable production latency and infrastructure simplicity? Why?&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>beginners</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Speculative Decoding: How LLMs Generate Tokens Faster Without Changing the Answer</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Tue, 09 Jun 2026 17:01:40 +0000</pubDate>
      <link>https://dev.to/shrsv/speculative-decoding-how-llms-generate-tokens-faster-without-changing-the-answer-38i5</link>
      <guid>https://dev.to/shrsv/speculative-decoding-how-llms-generate-tokens-faster-without-changing-the-answer-38i5</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Large Language Models keep getting smarter.&lt;/p&gt;

&lt;p&gt;But there's a problem: users don't experience intelligence directly. They experience latency.&lt;/p&gt;

&lt;p&gt;If a model takes 30 seconds to write an answer instead of 3 seconds, most users won't care that it scored higher on some benchmark.&lt;/p&gt;

&lt;p&gt;This creates an interesting engineering challenge:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do we make LLMs generate text faster without making them worse?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the most important techniques to emerge in recent years is &lt;strong&gt;speculative decoding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The idea sounds almost absurd at first:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if a small model could guess what a large model is about to say, and the large model simply verifies those guesses?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Surprisingly, that's exactly what happens.&lt;/p&gt;

&lt;p&gt;Let's see how it works.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fundamental Bottleneck of LLM Inference
&lt;/h2&gt;

&lt;p&gt;To understand speculative decoding, we first need to understand why LLMs are slow.&lt;/p&gt;

&lt;p&gt;Imagine a model is generating:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The capital of France is Paris.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model doesn't generate the entire sentence at once.&lt;/p&gt;

&lt;p&gt;Instead it generates one token at a time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The
The capital
The capital of
The capital of France
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Each new token requires another forward pass through the model.&lt;/p&gt;

&lt;p&gt;For a large model with hundreds of billions of parameters, every token is expensive.&lt;/p&gt;

&lt;p&gt;This means generation is inherently sequential:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Token 1 → Token 2 → Token 3 → Token 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can't generate token 4 until you know token 3.&lt;/p&gt;

&lt;p&gt;This sequential nature becomes one of the biggest sources of inference latency.&lt;/p&gt;
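&lt;p&gt;A toy version of that loop makes the cost visible. Here &lt;code&gt;toy_model&lt;/code&gt; is a stand-in that simply replays a fixed sentence; in a real system each call would be a full forward pass through the network.&lt;/p&gt;

```python
def generate(next_token, prompt, max_new_tokens, stop="."):
    """Plain autoregressive decoding: one model call per new token."""
    tokens = list(prompt)
    passes = 0
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))  # stands in for a forward pass
        passes += 1
        if tokens[-1] == stop:
            break
    return tokens, passes


SENTENCE = ["The", "capital", "of", "France", "is", "Paris", "."]


def toy_model(tokens):
    """Hypothetical model: returns the next word of a fixed sentence."""
    return SENTENCE[len(tokens)]


tokens, passes = generate(toy_model, ["The"], max_new_tokens=10)
```

&lt;p&gt;Six new tokens cost six model calls, and none of them can start before the previous one has finished.&lt;/p&gt;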
&lt;h2&gt;
  
  
  The Intuition: Let a Smaller Model Predict Ahead
&lt;/h2&gt;

&lt;p&gt;Suppose you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A large model (expensive)&lt;/li&gt;
&lt;li&gt;A small model (cheap)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The small model is usually less accurate.&lt;/p&gt;

&lt;p&gt;But it's often correct about obvious next tokens.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;Prompt:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The capital of France is
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Small model prediction:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Paris
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Large model prediction:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Paris
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Both agree.&lt;/p&gt;

&lt;p&gt;Now imagine the small model predicts several tokens:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Paris, which is
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Instead of asking the large model for each token individually, we ask it to verify the entire sequence in one pass.&lt;/p&gt;

&lt;p&gt;If the predictions are correct, we've effectively skipped multiple expensive decoding steps.&lt;/p&gt;

&lt;p&gt;This is the core idea behind speculative decoding.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Simple Example
&lt;/h2&gt;

&lt;p&gt;Let's say the prompt is &lt;code&gt;The weather today is&lt;/code&gt; and our draft model completes the sentence as:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The weather today is sunny and warm.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;That gives four proposed tokens:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sunny
and
warm
.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The large model then evaluates these proposed tokens.&lt;/p&gt;

&lt;p&gt;Possible outcome:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Token&lt;/th&gt;
&lt;th&gt;Draft Model&lt;/th&gt;
&lt;th&gt;Large Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sunny&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;and&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;warm&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;.&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Everything matches.&lt;/p&gt;

&lt;p&gt;The large model accepts all four tokens.&lt;/p&gt;

&lt;p&gt;Instead of generating four separate tokens sequentially, we've effectively generated four tokens in a single verification step.&lt;/p&gt;

&lt;p&gt;That's a major latency reduction.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Happens When They Disagree?
&lt;/h2&gt;

&lt;p&gt;This is where the algorithm becomes interesting.&lt;/p&gt;

&lt;p&gt;Suppose the draft model predicts:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The weather today is rainy and cold.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The large model evaluates the proposal.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rainy   ✓
and     ✓
cold    ✗
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The large model agrees until "cold".&lt;/p&gt;

&lt;p&gt;At that point:&lt;/p&gt;

&lt;ol&gt;
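&lt;li&gt;Tokens accepted before the disagreement are kept.&lt;/li&gt;
&lt;li&gt;The rejected token and everything drafted after it are discarded.&lt;/li&gt;
&lt;li&gt;The large model supplies its own token at that position, and drafting resumes from there.&lt;/li&gt;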
&lt;/ol&gt;

&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The weather today is rainy and pleasant.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Only part of the speculation was useful.&lt;/p&gt;

&lt;p&gt;But even partial acceptance can significantly improve throughput.&lt;/p&gt;
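&lt;p&gt;The accept-and-correct step can be sketched in a few lines. This assumes greedy decoding and that the large model has already produced its own pick for every drafted position in a single pass; &lt;code&gt;accept_prefix&lt;/code&gt; is a name invented for this example.&lt;/p&gt;

```python
def accept_prefix(proposed, target_picks):
    """Keep drafted tokens up to the first disagreement.

    target_picks[i] is the large model's own choice at position i,
    computed with proposed[:i] as context.
    Returns (accepted_tokens, correction); correction is None when
    every drafted token was accepted.
    """
    accepted = []
    for drafted, wanted in zip(proposed, target_picks):
        if drafted != wanted:
            return accepted, wanted
        accepted.append(drafted)
    return accepted, None


# The example above: the models agree on two tokens, then diverge.
accepted, correction = accept_prefix(
    ["rainy", "and", "cold", "."],
    ["rainy", "and", "pleasant", "."],
)
```

&lt;p&gt;Two drafted tokens survive, and the large model's "pleasant" replaces "cold". Whatever the draft proposed after the mismatch is ignored, because it was conditioned on a token that is no longer in the output.&lt;/p&gt;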
&lt;h2&gt;
  
  
  Why This Doesn't Change Model Quality
&lt;/h2&gt;

&lt;p&gt;A common misconception is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Aren't we replacing the big model with a smaller model?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;The large model remains the source of truth.&lt;/p&gt;

&lt;p&gt;The draft model merely proposes candidates.&lt;/p&gt;

&lt;p&gt;The verification process guarantees that the final output follows the same probability distribution as standard decoding.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Normal Decoding
---------------
Large Model → Token

Speculative Decoding
--------------------
Small Model → Proposed Tokens
Large Model → Verify Tokens
Accepted Tokens → Output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The final answer is still determined by the large model.&lt;/p&gt;

&lt;p&gt;The user gets the same quality, but faster.&lt;/p&gt;
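&lt;p&gt;When the large model samples instead of picking the top token, the guarantee is usually obtained with a rejection-sampling rule: accept a drafted token with probability min(1, p/q), where p and q are the large and draft model probabilities for it, and on rejection sample from the leftover mass max(0, p - q). The sketch below uses tiny hand-written distributions; the function name and the numbers are assumptions for illustration.&lt;/p&gt;

```python
import random


def accept_or_resample(token, p_target, q_draft, rng):
    """Return the token to emit for one drafted position.

    p_target and q_draft map token to probability at this position.
    token must have been sampled from q_draft.
    """
    ratio = p_target.get(token, 0.0) / q_draft[token]
    if min(1.0, ratio) > rng.random():  # accept with probability min(1, p/q)
        return token
    # Rejected: sample from the residual max(0, p - q), renormalized.
    residual = {t: max(0.0, p - q_draft.get(t, 0.0)) for t, p in p_target.items()}
    total = sum(residual.values())
    tokens = list(residual)
    weights = [residual[t] / total for t in tokens]
    return rng.choices(tokens, weights=weights)[0]


# Hypothetical distributions where the draft model is badly off.
P_TARGET = {"a": 0.6, "b": 0.4}
Q_DRAFT = {"a": 0.3, "b": 0.7}

rng = random.Random(0)
samples = []
for _ in range(20000):
    drafted = rng.choices(list(Q_DRAFT), weights=list(Q_DRAFT.values()))[0]
    samples.append(accept_or_resample(drafted, P_TARGET, Q_DRAFT, rng))
share_of_a = samples.count("a") / len(samples)
```

&lt;p&gt;Even though the draft model prefers "b", the share of "a" in the output should land close to the large model's 0.6. A worse draft model only lowers the acceptance rate, not the quality.&lt;/p&gt;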
&lt;h2&gt;
  
  
  A Simplified Algorithm
&lt;/h2&gt;

&lt;p&gt;At a high level:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;finished&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="c1"&gt;# Draft k tokens cheaply, conditioned on the output so far&lt;/span&gt;
    &lt;span class="n"&gt;proposed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;small_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# One large-model pass scores every proposed position&lt;/span&gt;
    &lt;span class="n"&gt;verification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;large_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proposed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;accepted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;longest_matching_prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;proposed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;verification&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accepted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# On a mismatch, the large model supplies the next token itself&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;accepted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;proposed&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;large_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;next_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;In practice, the real algorithm is more sophisticated because it must preserve exact sampling behavior.&lt;/p&gt;

&lt;p&gt;But this captures the overall workflow.&lt;/p&gt;
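&lt;p&gt;For something you can actually run, here is a toy end-to-end version. It assumes greedy decoding, and both "models" are hypothetical lookup tables keyed on the last token. The per-position calls to &lt;code&gt;target_next&lt;/code&gt; stand in for what would be one batched forward pass in a real system, so the code counts each verification round as a single large-model pass.&lt;/p&gt;

```python
def speculative_generate(prompt, draft_next, target_next, k=4, max_tokens=16, stop="."):
    """Greedy speculative decoding with toy next-token functions."""
    output = list(prompt)
    target_passes = 0
    while stop != output[-1] and max_tokens > len(output) - len(prompt):
        # 1. Draft up to k tokens with the cheap model.
        proposed = []
        for _ in range(k):
            proposed.append(draft_next(output + proposed))
            if proposed[-1] == stop:
                break
        # 2. Verify the draft (counted as one large-model pass).
        target_passes += 1
        accepted, correction = [], None
        for token in proposed:
            expected = target_next(output + accepted)
            if token != expected:
                correction = expected
                break
            accepted.append(token)
        # 3. Keep the agreed prefix, plus the correction on a mismatch.
        output.extend(accepted)
        if correction is not None:
            output.append(correction)
    return output, target_passes


# Hypothetical toy models: next token depends only on the last token.
TARGET = {"is": "rainy", "rainy": "and", "and": "pleasant", "pleasant": "."}
DRAFT = {"is": "rainy", "rainy": "and", "and": "cold", "cold": ".", "pleasant": "."}

text, passes = speculative_generate(
    ["The", "weather", "today", "is"],
    draft_next=lambda ctx: DRAFT.get(ctx[-1], "."),
    target_next=lambda ctx: TARGET.get(ctx[-1], "."),
)
```

&lt;p&gt;The four new tokens arrive after two verification rounds instead of four sequential large-model steps, and the text is exactly what the target tables alone would have produced.&lt;/p&gt;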
&lt;h2&gt;
  
  
  Why It Works So Well
&lt;/h2&gt;

&lt;p&gt;Speculative decoding exploits an observation about language:&lt;/p&gt;

&lt;p&gt;Most tokens are predictable.&lt;/p&gt;

&lt;p&gt;Consider:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Once upon a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Most models will predict:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Similarly:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Thank you for your
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Likely:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;help
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Large models spend a surprising amount of compute confirming obvious continuations.&lt;/p&gt;

&lt;p&gt;A smaller model can often predict these easy regions accurately.&lt;/p&gt;

&lt;p&gt;The large model only needs to intervene when things become ambiguous.&lt;/p&gt;

&lt;p&gt;This creates a useful division of labor:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small Model&lt;/td&gt;
&lt;td&gt;Predict likely tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large Model&lt;/td&gt;
&lt;td&gt;Verify and correct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User&lt;/td&gt;
&lt;td&gt;Receives faster output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
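&lt;p&gt;How much this helps depends on how often the draft is right. Under the simplifying assumption that each drafted token is accepted independently with the same probability, the expected number of tokens produced per large-model pass has a closed form, (1 - a^(g+1)) / (1 - a), where a is the acceptance rate and g the number of drafted tokens. The parameter names below are my own shorthand.&lt;/p&gt;

```python
def expected_tokens_per_pass(alpha, gamma):
    """Expected tokens emitted per large-model pass.

    alpha: probability that a drafted token is accepted (assumed i.i.d.).
    gamma: number of tokens drafted per round.
    The count includes the one token the large model always contributes.
    """
    if alpha == 1.0:
        return gamma + 1
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


good_draft = expected_tokens_per_pass(0.8, 4)   # roughly 3.36
poor_draft = expected_tokens_per_pass(0.3, 4)   # roughly 1.43
```

&lt;p&gt;A draft model that is right 80% of the time more than triples the tokens per pass, while one that is right 30% of the time barely helps once you account for the cost of drafting.&lt;/p&gt;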
&lt;h2&gt;
  
  
  Modern Variants
&lt;/h2&gt;

&lt;p&gt;Research and production systems have extended the original idea in several directions.&lt;/p&gt;
&lt;h3&gt;
  
  
  Self-Speculative Decoding
&lt;/h3&gt;

&lt;p&gt;Instead of using two separate models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Early layers generate drafts&lt;/li&gt;
&lt;li&gt;Full model verifies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This avoids maintaining a second model entirely.&lt;/p&gt;
&lt;h3&gt;
  
  
  Multi-Token Prediction
&lt;/h3&gt;

&lt;p&gt;Some architectures are trained to predict multiple future tokens directly.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Predict token N+1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;They predict:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;N+1
N+2
N+3
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This increases opportunities for speculative execution.&lt;/p&gt;
&lt;h3&gt;
  
  
  Tree-Based Speculation
&lt;/h3&gt;

&lt;p&gt;Rather than proposing a single sequence:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A → B → C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The draft model proposes multiple branches:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dot"&gt;&lt;code&gt;      &lt;span class="nv"&gt;B1&lt;/span&gt;
    &lt;span class="err"&gt;/&lt;/span&gt;
&lt;span class="nv"&gt;A&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;
    &lt;span class="err"&gt;\&lt;/span&gt;
      &lt;span class="nv"&gt;B2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The verifier can then select among several possible continuations.&lt;/p&gt;

&lt;p&gt;These approaches push throughput even further.&lt;/p&gt;
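&lt;p&gt;A simplified sketch of branch selection, assuming greedy verification and treating the tree as a plain list of candidate paths. Real systems typically verify all branches in one batched pass; this toy version checks them one by one, and every name in it is illustrative.&lt;/p&gt;

```python
def best_branch(context, branches, target_next):
    """Return the longest verified prefix among candidate branches."""
    best = []
    for branch in branches:
        accepted = []
        for token in branch:
            if token != target_next(context + accepted):
                break
            accepted.append(token)
        if len(accepted) > len(best):
            best = accepted
    return best


# Hypothetical target model that prefers the second branch.
TARGET = {"A": "B2", "B2": "C2"}
winner = best_branch(
    ["A"],
    [["B1", "C1"], ["B2", "C2"]],
    target_next=lambda ctx: TARGET.get(ctx[-1], "?"),
)
```

&lt;p&gt;With a single draft sequence starting with B1, nothing would have been accepted here. Proposing a second branch lets two tokens through.&lt;/p&gt;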
&lt;h2&gt;
  
  
  Where You'll Encounter It
&lt;/h2&gt;

&lt;p&gt;Many developers use speculative decoding without realizing it.&lt;/p&gt;

&lt;p&gt;Variants of it are frequently used in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server-side LLM inference platforms&lt;/li&gt;
&lt;li&gt;High-throughput API providers&lt;/li&gt;
&lt;li&gt;Optimized open-source inference engines&lt;/li&gt;
&lt;li&gt;Enterprise deployment stacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whenever you see a large model streaming unusually quickly, there's a decent chance some form of speculative execution is happening behind the scenes.&lt;/p&gt;

&lt;p&gt;It's becoming one of the standard techniques for making frontier models economically viable at scale.&lt;/p&gt;
&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Speculative decoding is a beautiful example of an engineering idea that sounds counterintuitive but turns out to be remarkably effective.&lt;/p&gt;

&lt;p&gt;Instead of trying to make large models inherently faster, it asks a different question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if most of the work they're doing is already predictable?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By letting a smaller model make educated guesses and allowing the larger model to verify them, we can reduce latency dramatically while preserving output quality.&lt;/p&gt;

&lt;p&gt;As LLM deployment scales to millions of users and billions of generated tokens, techniques like speculative decoding are likely to matter just as much as advances in model architecture itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; If you were deploying a large LLM in production, would you prefer investing in a better model, a faster model, or inference optimizations like speculative decoding? Why?&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/HexmosTech" rel="noopener noreferrer"&gt;
        HexmosTech
      &lt;/a&gt; / &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;
        git-lrc
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Free, Micro AI Code Reviews That Run on Commit
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;p&gt;| &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.da.md" rel="noopener noreferrer"&gt;🇩🇰 Dansk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.es.md" rel="noopener noreferrer"&gt;🇪🇸 Español&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fa.md" rel="noopener noreferrer"&gt;🇮🇷 Farsi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fi.md" rel="noopener noreferrer"&gt;🇫🇮 Suomi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ja.md" rel="noopener noreferrer"&gt;🇯🇵 日本語&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.nn.md" rel="noopener noreferrer"&gt;🇳🇴 Norsk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.pt.md" rel="noopener noreferrer"&gt;🇵🇹 Português&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ru.md" rel="noopener noreferrer"&gt;🇷🇺 Русский&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.sq.md" rel="noopener noreferrer"&gt;🇦🇱 Shqip&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.zh.md" rel="noopener noreferrer"&gt;🇨🇳 中文&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.hi.md" rel="noopener noreferrer"&gt;🇮🇳 हिन्दी&lt;/a&gt; |&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;img width="60" alt="git-lrc logo" src="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;/a&gt;
&lt;br&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;git-lrc&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Free, Micro AI Code Reviews That Run on Commit&lt;/h2&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.producthunt.com/products/git-lrc?embed=true&amp;amp;utm_source=badge-top-post-badge&amp;amp;utm_medium=badge&amp;amp;utm_campaign=badge-git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt" width="200" src="https://camo.githubusercontent.com/87bf2d4283c1e0aa99e254bd17fefb1c67c0c0d39300043a243a4aa633b6cecc/68747470733a2f2f6170692e70726f6475637468756e742e636f6d2f776964676574732f656d6265642d696d6167652f76312f746f702d706f73742d62616467652e7376673f706f73745f69643d31303739323632267468656d653d6c6967687426706572696f643d6461696c7926743d31373731373439313730383638"&gt;&lt;/a&gt;
&amp;nbsp;&lt;/p&gt;
&lt;br&gt;
&lt;a href="https://discord.gg/sGdnKwB3qq" rel="nofollow noopener noreferrer"&gt;
  &lt;img alt="Discord Community" src="https://camo.githubusercontent.com/b8f979318aaabc8dec512b9d4e6e2a12431fba3c8a3b8738e1a97a0722d4e4bf/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d436f6d6d756e6974792d3538363546323f6c6f676f3d646973636f7264266c6162656c436f6c6f723d7768697465"&gt;
&lt;/a&gt; &lt;a href="https://goreportcard.com/report/github.com/HexmosTech/git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Go Report Card" src="https://camo.githubusercontent.com/e74c0651c3ee9165a2ed01cb0f6842c494029960df30eb9c24cf622d3d21bf46/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f4865786d6f73546563682f6769742d6c7263"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml" rel="noopener noreferrer"&gt;&lt;img alt="gitleaks.yml" title="gitleaks.yml: Secret scanning workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml" rel="noopener noreferrer"&gt;&lt;img alt="osv-scanner.yml" title="osv-scanner.yml: Dependency vulnerability scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml" rel="noopener noreferrer"&gt;&lt;img alt="govulncheck.yml" title="govulncheck.yml: Go vulnerability check" src="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml" rel="noopener noreferrer"&gt;&lt;img alt="semgrep.yml" title="semgrep.yml: Static analysis security scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a rel="noopener noreferrer" href="https://github.com/HexmosTech/git-lrc/./gfx/dependabot-enabled.svg"&gt;&lt;img alt="dependabot-enabled" title="dependabot-enabled: Automated dependency updates are enabled" 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FHexmosTech%2Fgit-lrc%2FHEAD%2F.%2Fgfx%2Fdependabot-enabled.svg"&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;

&lt;p&gt;AI agents write code fast. They also &lt;em&gt;silently remove logic&lt;/em&gt;, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;git-lrc&lt;/code&gt; fixes this.&lt;/strong&gt; It hooks into &lt;code&gt;git commit&lt;/code&gt; and reviews every diff &lt;em&gt;before&lt;/em&gt; it lands. 60-second setup. Completely free.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;See It In Action&lt;/h2&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;See git-lrc catch serious security issues such as leaked credentials, expensive cloud
operations, and sensitive material in log statements&lt;/p&gt;
&lt;/blockquote&gt;

  
    
    
  

  

  


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;🤖 &lt;strong&gt;AI agents silently break things.&lt;/strong&gt; Code removed. Logic changed. Edge cases gone. You won't notice until production.&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Catch it before it ships.&lt;/strong&gt; AI-powered inline comments show you &lt;em&gt;exactly&lt;/em&gt; what changed and what looks wrong.&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Top Golang Mocking Libraries in 2026: A Practical Comparison</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Mon, 08 Jun 2026 18:34:47 +0000</pubDate>
      <link>https://dev.to/shrsv/the-top-golang-mocking-libraries-in-2026-a-practical-comparison-6kg</link>
      <guid>https://dev.to/shrsv/the-top-golang-mocking-libraries-in-2026-a-practical-comparison-6kg</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A few years ago, choosing a Go mocking framework was mostly a matter of personal preference.&lt;/p&gt;

&lt;p&gt;Today, things are different.&lt;/p&gt;

&lt;p&gt;Many Go developers now have at least one AI coding assistant generating tests alongside them. Some teams even generate the majority of their unit tests automatically. Yet one area remains surprisingly messy: mocks.&lt;/p&gt;

&lt;p&gt;Ask an LLM to write a test for the same interface and you'll often get completely different results depending on whether your project uses GoMock, Mockery, MockIO, Minimock, Moq, or hand-written test doubles.&lt;/p&gt;

&lt;p&gt;The problem isn't that the models are bad.&lt;/p&gt;

&lt;p&gt;The problem is that mocking libraries represent very different philosophies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strict vs flexible&lt;/li&gt;
&lt;li&gt;Generated vs runtime-created&lt;/li&gt;
&lt;li&gt;DSL-heavy vs idiomatic Go&lt;/li&gt;
&lt;li&gt;Feature-rich vs minimalist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this article we'll compare the most popular Go mocking libraries in 2026, examine their strengths and weaknesses, and discuss which one may be the best fit for your project.&lt;/p&gt;




&lt;h1&gt;
  
  
  What Makes a Good Mocking Library?
&lt;/h1&gt;

&lt;p&gt;Before comparing tools, it's worth defining what matters.&lt;/p&gt;

&lt;p&gt;A good mocking library should ideally provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy mock generation&lt;/li&gt;
&lt;li&gt;Clear test failures&lt;/li&gt;
&lt;li&gt;Minimal boilerplate&lt;/li&gt;
&lt;li&gt;Strong refactoring support&lt;/li&gt;
&lt;li&gt;Good IDE experience&lt;/li&gt;
&lt;li&gt;Readable tests&lt;/li&gt;
&lt;li&gt;Reliable call verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Different libraries optimize for different parts of this list.&lt;/p&gt;

&lt;p&gt;That's why there is no universally correct answer.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. GoMock: The Enterprise Workhorse
&lt;/h1&gt;

&lt;p&gt;GoMock remains one of the most widely used mocking frameworks in the Go ecosystem.&lt;/p&gt;

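&lt;p&gt;Originally created by Google (as &lt;code&gt;golang/mock&lt;/code&gt;, since archived) and now actively maintained by Uber (as &lt;code&gt;go.uber.org/mock&lt;/code&gt;), it has become the standard choice for many large organizations.&lt;/p&gt;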

&lt;p&gt;Its philosophy is straightforward: define expectations explicitly and verify them rigorously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctrl&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;gomock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewController&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewMockUserRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EXPECT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gomock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

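    &lt;span class="c"&gt;// NewUserService is a hypothetical constructor that wires the mock into the service under test.&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;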

    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What It Does Well
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Excellent matcher support&lt;/li&gt;
&lt;li&gt;Strong verification guarantees&lt;/li&gt;
&lt;li&gt;Call ordering support&lt;/li&gt;
&lt;li&gt;Mature ecosystem&lt;/li&gt;
&lt;li&gt;Well understood across large teams&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Drawbacks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Requires code generation&lt;/li&gt;
&lt;li&gt;Can become verbose&lt;/li&gt;
&lt;li&gt;DSL feels heavy in simple tests&lt;/li&gt;
&lt;li&gt;Generated files add maintenance overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Best Fit
&lt;/h2&gt;

&lt;p&gt;Large codebases where consistency and strictness matter more than simplicity.&lt;/p&gt;
&lt;h1&gt;
  
  
  2. Testify + Mockery: The Safe Default
&lt;/h1&gt;

&lt;p&gt;If you started a new Go project today and asked ten developers which mocking stack to use, this would probably be the most common answer.&lt;/p&gt;

&lt;p&gt;Testify provides assertions and mocking support while Mockery generates mocks from interfaces.&lt;/p&gt;

&lt;p&gt;The combination has become the default choice for many teams.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;mocks&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewUserRepository&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EXPECT&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Anything&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Return&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Once&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

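    &lt;span class="c"&gt;// NewUserService is a hypothetical constructor that wires the mock into the service under test.&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;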

    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What It Does Well
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Familiar API&lt;/li&gt;
&lt;li&gt;Large community&lt;/li&gt;
&lt;li&gt;Excellent assertion integration&lt;/li&gt;
&lt;li&gt;Good balance between flexibility and verification&lt;/li&gt;
&lt;li&gt;Easy onboarding for new developers&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Drawbacks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Less strict than GoMock&lt;/li&gt;
&lt;li&gt;Generated mocks can grow large&lt;/li&gt;
&lt;li&gt;Expectations are easier to misconfigure&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Best Fit
&lt;/h2&gt;

&lt;p&gt;Most application teams.&lt;/p&gt;

&lt;p&gt;If you're unsure what to choose, this is usually the safest answer.&lt;/p&gt;
&lt;h1&gt;
  
  
  3. MockIO: The Most Interesting Newcomer
&lt;/h1&gt;

&lt;p&gt;MockIO takes a different approach.&lt;/p&gt;

&lt;p&gt;Unlike traditional Go mocking frameworks, it supports runtime-created mocks and offers a modern matcher system inspired by frameworks from other languages.&lt;/p&gt;

&lt;p&gt;For developers tired of constantly regenerating mocks, this is immediately appealing.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctrl&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewMockController&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;mockopts&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;StrictVerify&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Mock&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;UserRepository&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WhenDouble&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AnyContext&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;mock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ThenReturn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

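    &lt;span class="c"&gt;// NewUserService is a hypothetical constructor that wires the mock into the service under test.&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;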

    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What It Does Well
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Runtime mocks&lt;/li&gt;
&lt;li&gt;Rich matcher support&lt;/li&gt;
&lt;li&gt;Powerful argument capture&lt;/li&gt;
&lt;li&gt;Less dependency on generated code&lt;/li&gt;
&lt;li&gt;Modern API design&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Drawbacks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Smaller ecosystem&lt;/li&gt;
&lt;li&gt;Depends on compiler internals and unsafe features&lt;/li&gt;
&lt;li&gt;Less proven in very large codebases&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Best Fit
&lt;/h2&gt;

&lt;p&gt;Developers looking for a modern alternative to traditional code-generation workflows.&lt;/p&gt;
&lt;h1&gt;
  
  
  4. Minimock: Fast and Strict
&lt;/h1&gt;

&lt;p&gt;Minimock focuses on simplicity and performance.&lt;/p&gt;

&lt;p&gt;It generates lightweight mocks and automatically verifies expectations when tests finish.&lt;/p&gt;

&lt;p&gt;The result is a relatively small API surface with strong guarantees.&lt;/p&gt;
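&lt;p&gt;To make "automatically verifies expectations when tests finish" concrete, here is a minimal sketch of the general idea: a controller registers its verification step through &lt;code&gt;Cleanup&lt;/code&gt;, so unmet expectations surface without an explicit call. This is not Minimock's actual implementation; &lt;code&gt;controller&lt;/code&gt;, &lt;code&gt;expectation&lt;/code&gt;, &lt;code&gt;fakeT&lt;/code&gt;, and &lt;code&gt;runScenario&lt;/code&gt; are hypothetical names, and &lt;code&gt;fakeT&lt;/code&gt; stands in for &lt;code&gt;*testing.T&lt;/code&gt; so the example can run on its own.&lt;/p&gt;

```go
package main

import "fmt"

// testingT is the small subset of *testing.T this sketch relies on.
type testingT interface {
	Cleanup(func())
	Errorf(format string, args ...any)
}

// expectation tracks how often a mocked method should be called and was called.
type expectation struct {
	method string
	want   int
	got    int
}

// controller owns expectations and checks them when the test finishes.
type controller struct {
	t            testingT
	expectations []*expectation
}

// newController registers the verification step with t.Cleanup,
// so the test author never has to call it explicitly.
func newController(t testingT) *controller {
	c := new(controller)
	c.t = t
	t.Cleanup(c.verify)
	return c
}

// expect declares that method must be called exactly want times.
func (c *controller) expect(method string, want int) *expectation {
	e := new(expectation)
	e.method = method
	e.want = want
	c.expectations = append(c.expectations, e)
	return e
}

// verify reports every expectation whose call count does not match.
func (c *controller) verify() {
	for _, e := range c.expectations {
		if e.got != e.want {
			c.t.Errorf("%s: want %d calls, got %d", e.method, e.want, e.got)
		}
	}
}

// fakeT stands in for *testing.T so the sketch can run outside go test.
type fakeT struct {
	cleanups []func()
	failures []string
}

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

func (f *fakeT) Errorf(format string, args ...any) {
	f.failures = append(f.failures, fmt.Sprintf(format, args...))
}

// finish runs the registered cleanups, as the test runner would at test end.
func (f *fakeT) finish() {
	for _, fn := range f.cleanups {
		fn()
	}
}

// runScenario sets one expectation, performs the given number of calls,
// and returns the failures reported at cleanup time.
func runScenario(calls int) []string {
	t := new(fakeT)
	c := newController(t)
	e := c.expect("GetUser", 1)
	for i := 0; i != calls; i++ {
		e.got++
	}
	t.finish()
	return t.failures
}

func main() {
	fmt.Println(len(runScenario(1))) // expectation met: no failures recorded
	fmt.Println(runScenario(0))      // expectation missed: one failure recorded
}
```

&lt;p&gt;In a real test you would pass &lt;code&gt;*testing.T&lt;/code&gt; itself, which already provides &lt;code&gt;Cleanup&lt;/code&gt; and &lt;code&gt;Errorf&lt;/code&gt;.&lt;/p&gt;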
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctrl&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;minimock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewController&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewUserRepositoryMock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUserMock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;When&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minimock&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;AnyContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
        &lt;span class="n"&gt;Then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

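    &lt;span class="c"&gt;// NewUserService is a hypothetical constructor that wires the mock into the service under test.&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;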

    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What It Does Well
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Fast execution&lt;/li&gt;
&lt;li&gt;Strict verification&lt;/li&gt;
&lt;li&gt;Clean generated code&lt;/li&gt;
&lt;li&gt;Automatic cleanup integration&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Drawbacks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Smaller community&lt;/li&gt;
&lt;li&gt;Fewer advanced capabilities&lt;/li&gt;
&lt;li&gt;Less flexibility than GoMock&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Best Fit
&lt;/h2&gt;

&lt;p&gt;Teams that value strict tests and fast feedback cycles.&lt;/p&gt;
&lt;h1&gt;
  
  
  5. Moq: The Go-Like Option
&lt;/h1&gt;

&lt;p&gt;Moq has a philosophy that many Go developers appreciate:&lt;/p&gt;

&lt;p&gt;Don't build a framework if ordinary Go code can do the job.&lt;/p&gt;

&lt;p&gt;Instead of constructing a large expectation DSL, Moq generates structs whose behavior is implemented through functions.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Moq generates pointer-receiver methods, so take the mock's address.&lt;/span&gt;
    &lt;span class="n"&gt;repo&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;UserRepositoryMock&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;GetUserFunc&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// NewUserService is a hypothetical constructor that wires the mock into the service under test.&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;NewUserService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"123"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Equal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"John"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What It Does Well
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Extremely simple&lt;/li&gt;
&lt;li&gt;Minimal abstraction&lt;/li&gt;
&lt;li&gt;Highly readable tests&lt;/li&gt;
&lt;li&gt;Easy to debug&lt;/li&gt;
&lt;li&gt;Feels like ordinary Go&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Drawbacks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Limited matcher support&lt;/li&gt;
&lt;li&gt;Manual verification is sometimes necessary&lt;/li&gt;
&lt;li&gt;Less suitable for highly complex interaction testing&lt;/li&gt;
&lt;/ul&gt;
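&lt;p&gt;What "manual verification" looks like in practice: Moq-generated mocks record their calls and expose them through per-method accessors such as &lt;code&gt;GetUserCalls()&lt;/code&gt;, and the test asserts on that record itself. The sketch below is a simplified, hand-written stand-in rather than real generated code: it records only the ID argument and is not safe for concurrent use. &lt;code&gt;lookupName&lt;/code&gt; and &lt;code&gt;newStubRepo&lt;/code&gt; are hypothetical helpers, and the &lt;code&gt;UserRepository&lt;/code&gt; shape is assumed from the examples above.&lt;/p&gt;

```go
package main

import (
	"context"
	"fmt"
)

// UserRepository mirrors the interface used in the examples above (assumed shape).
type UserRepository interface {
	GetUser(ctx context.Context, id string) (string, error)
}

// UserRepositoryMock is a hand-written stand-in shaped like a Moq-style mock:
// behavior comes from a function field, and every call is recorded.
type UserRepositoryMock struct {
	GetUserFunc func(ctx context.Context, id string) (string, error)
	calls       []string // IDs passed to GetUser, in call order
}

// GetUser records the call and delegates to GetUserFunc.
func (m *UserRepositoryMock) GetUser(ctx context.Context, id string) (string, error) {
	m.calls = append(m.calls, id)
	return m.GetUserFunc(ctx, id)
}

// GetUserCalls returns the recorded IDs so a test can verify interactions by hand.
func (m *UserRepositoryMock) GetUserCalls() []string {
	return m.calls
}

// newStubRepo builds a mock whose GetUser always returns the given name.
func newStubRepo(name string) *UserRepositoryMock {
	m := new(UserRepositoryMock)
	m.GetUserFunc = func(ctx context.Context, id string) (string, error) {
		return name, nil
	}
	return m
}

// lookupName is a hypothetical function under test that depends on the repository.
func lookupName(repo UserRepository, id string) (string, error) {
	return repo.GetUser(context.Background(), id)
}

func main() {
	repo := newStubRepo("John")

	name, err := lookupName(repo, "123")
	fmt.Println(name, err)

	// Manual verification: exactly one call, made with the expected argument.
	calls := repo.GetUserCalls()
	fmt.Println(len(calls) == 1, calls[0] == "123")
}
```

&lt;p&gt;The trade-off is visible here: nothing fails on its own if the call never happens, so the assertion on the recorded calls has to be written in every test that cares about it.&lt;/p&gt;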
&lt;h2&gt;
  
  
  Best Fit
&lt;/h2&gt;

&lt;p&gt;Developers who prefer explicit code over frameworks.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Bigger Trend: Fewer Mocks, More Fakes
&lt;/h1&gt;

&lt;p&gt;One of the most interesting testing trends in 2026 is that many experienced Go teams are using fewer mocks than they did a few years ago.&lt;/p&gt;

&lt;p&gt;Instead of mocking every dependency, they're increasingly creating lightweight in-memory implementations.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;FakeUserRepo&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;FakeUserRepo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;GetUser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
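&lt;p&gt;One detail worth noting: the fake above returns a zero-value &lt;code&gt;User&lt;/code&gt; and a &lt;code&gt;nil&lt;/code&gt; error for unknown IDs. A slightly fuller sketch might report missing users explicitly, which is part of what makes a fake behave more like the real thing. The &lt;code&gt;User&lt;/code&gt; fields, &lt;code&gt;ErrUserNotFound&lt;/code&gt;, and &lt;code&gt;NewFakeUserRepo&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// User is a stand-in for the article's User type; its fields are assumed.
type User struct {
	ID   string
	Name string
}

// ErrUserNotFound is a hypothetical sentinel error for missing users.
var ErrUserNotFound = errors.New("user not found")

// FakeUserRepo is an in-memory implementation backed by a map.
type FakeUserRepo struct {
	users map[string]User
}

// NewFakeUserRepo seeds the fake with the given users.
func NewFakeUserRepo(seed ...User) *FakeUserRepo {
	repo := new(FakeUserRepo)
	repo.users = make(map[string]User, len(seed))
	for _, u := range seed {
		repo.users[u.ID] = u
	}
	return repo
}

// GetUser returns the stored user, or ErrUserNotFound for unknown IDs.
func (r *FakeUserRepo) GetUser(ctx context.Context, id string) (User, error) {
	u, ok := r.users[id]
	if !ok {
		return User{}, ErrUserNotFound
	}
	return u, nil
}

func main() {
	repo := NewFakeUserRepo(User{ID: "123", Name: "John"})

	// Known ID: the seeded user comes back.
	u, err := repo.GetUser(context.Background(), "123")
	fmt.Println(u.Name, err)

	// Unknown ID: the caller gets an error it can branch on.
	_, err = repo.GetUser(context.Background(), "999")
	fmt.Println(errors.Is(err, ErrUserNotFound))
}
```

&lt;p&gt;Because the fake holds real state, the same instance can back many tests without any per-test expectation setup.&lt;/p&gt;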


&lt;p&gt;Compared to mocks, fakes often provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better readability&lt;/li&gt;
&lt;li&gt;More realistic behavior&lt;/li&gt;
&lt;li&gt;Easier maintenance&lt;/li&gt;
&lt;li&gt;Reduced brittleness&lt;/li&gt;
&lt;li&gt;Better AI-generated tests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mocks remain valuable for external boundaries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment providers&lt;/li&gt;
&lt;li&gt;Email services&lt;/li&gt;
&lt;li&gt;Message queues&lt;/li&gt;
&lt;li&gt;LLM providers&lt;/li&gt;
&lt;li&gt;Third-party APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But many teams no longer mock every interface by default.&lt;/p&gt;
&lt;h1&gt;
  
  
  Which One Should You Choose?
&lt;/h1&gt;

&lt;p&gt;If you're starting a new project today:&lt;/p&gt;
&lt;h3&gt;
  
  
  Choose GoMock if
&lt;/h3&gt;

&lt;p&gt;You want maximum verification and are working in a large organization.&lt;/p&gt;
&lt;h3&gt;
  
  
  Choose Testify + Mockery if
&lt;/h3&gt;

&lt;p&gt;You want the safest and most widely adopted option.&lt;/p&gt;
&lt;h3&gt;
  
  
  Choose MockIO if
&lt;/h3&gt;

&lt;p&gt;You want modern runtime mocking and fewer code-generation steps.&lt;/p&gt;
&lt;h3&gt;
  
  
  Choose Minimock if
&lt;/h3&gt;

&lt;p&gt;You prioritize speed and strictness.&lt;/p&gt;
&lt;h3&gt;
  
  
  Choose Moq if
&lt;/h3&gt;

&lt;p&gt;You believe tests should look as much like ordinary Go as possible.&lt;/p&gt;
&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;The most important shift in Go testing isn't a new mocking framework.&lt;/p&gt;

&lt;p&gt;It's that maintainability has become more important than capability.&lt;/p&gt;

&lt;p&gt;In 2026, every major mocking library can mock interfaces effectively. The real differentiator is what your tests look like six months later when someone else has to understand them.&lt;/p&gt;

&lt;p&gt;The best mocking framework is rarely the one with the longest feature list.&lt;/p&gt;

&lt;p&gt;It's the one your team can read, trust, and maintain.&lt;/p&gt;

&lt;p&gt;And increasingly, it's the one that both humans and AI assistants can work with comfortably.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does your team use today: a mocking framework, hand-written fakes, or a mix of both? Have your testing practices changed since AI coding assistants became part of your workflow?&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/HexmosTech" rel="noopener noreferrer"&gt;
        HexmosTech
      &lt;/a&gt; / &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;
        git-lrc
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Free, Micro AI Code Reviews That Run on Commit
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;p&gt;| &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.da.md" rel="noopener noreferrer"&gt;🇩🇰 Dansk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.es.md" rel="noopener noreferrer"&gt;🇪🇸 Español&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fa.md" rel="noopener noreferrer"&gt;🇮🇷 Farsi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fi.md" rel="noopener noreferrer"&gt;🇫🇮 Suomi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ja.md" rel="noopener noreferrer"&gt;🇯🇵 日本語&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.nn.md" rel="noopener noreferrer"&gt;🇳🇴 Norsk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.pt.md" rel="noopener noreferrer"&gt;🇵🇹 Português&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ru.md" rel="noopener noreferrer"&gt;🇷🇺 Русский&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.sq.md" rel="noopener noreferrer"&gt;🇦🇱 Shqip&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.zh.md" rel="noopener noreferrer"&gt;🇨🇳 中文&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.hi.md" rel="noopener noreferrer"&gt;🇮🇳 हिन्दी&lt;/a&gt; |&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;img width="60" alt="git-lrc logo" src="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;/a&gt;
&lt;br&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;git-lrc&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Free, Micro AI Code Reviews That Run on Commit&lt;/h2&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.producthunt.com/products/git-lrc?embed=true&amp;amp;utm_source=badge-top-post-badge&amp;amp;utm_medium=badge&amp;amp;utm_campaign=badge-git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt" width="200" src="https://camo.githubusercontent.com/87bf2d4283c1e0aa99e254bd17fefb1c67c0c0d39300043a243a4aa633b6cecc/68747470733a2f2f6170692e70726f6475637468756e742e636f6d2f776964676574732f656d6265642d696d6167652f76312f746f702d706f73742d62616467652e7376673f706f73745f69643d31303739323632267468656d653d6c6967687426706572696f643d6461696c7926743d31373731373439313730383638"&gt;&lt;/a&gt;
&amp;nbsp;&lt;/p&gt;
&lt;br&gt;
&lt;a href="https://discord.gg/sGdnKwB3qq" rel="nofollow noopener noreferrer"&gt;
  &lt;img alt="Discord Community" src="https://camo.githubusercontent.com/b8f979318aaabc8dec512b9d4e6e2a12431fba3c8a3b8738e1a97a0722d4e4bf/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d436f6d6d756e6974792d3538363546323f6c6f676f3d646973636f7264266c6162656c436f6c6f723d7768697465"&gt;
&lt;/a&gt; &lt;a href="https://goreportcard.com/report/github.com/HexmosTech/git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Go Report Card" src="https://camo.githubusercontent.com/e74c0651c3ee9165a2ed01cb0f6842c494029960df30eb9c24cf622d3d21bf46/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f4865786d6f73546563682f6769742d6c7263"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml" rel="noopener noreferrer"&gt;&lt;img alt="gitleaks.yml" title="gitleaks.yml: Secret scanning workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml" rel="noopener noreferrer"&gt;&lt;img alt="osv-scanner.yml" title="osv-scanner.yml: Dependency vulnerability scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml" rel="noopener noreferrer"&gt;&lt;img alt="govulncheck.yml" title="govulncheck.yml: Go vulnerability check" src="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml" rel="noopener noreferrer"&gt;&lt;img alt="semgrep.yml" title="semgrep.yml: Static analysis security scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a rel="noopener noreferrer" href="https://github.com/HexmosTech/git-lrc/./gfx/dependabot-enabled.svg"&gt;&lt;img alt="dependabot-enabled" title="dependabot-enabled: Automated dependency updates are enabled" 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FHexmosTech%2Fgit-lrc%2FHEAD%2F.%2Fgfx%2Fdependabot-enabled.svg"&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;

&lt;p&gt;AI agents write code fast. They also &lt;em&gt;silently remove logic&lt;/em&gt;, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;git-lrc&lt;/code&gt; fixes this.&lt;/strong&gt; It hooks into &lt;code&gt;git commit&lt;/code&gt; and reviews every diff &lt;em&gt;before&lt;/em&gt; it lands. 60-second setup. Completely free.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;See It In Action&lt;/h2&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;See git-lrc catch serious security issues such as leaked credentials, expensive cloud
operations, and sensitive material in log statements&lt;/p&gt;
&lt;/blockquote&gt;

  
    
    
  

  

  


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;🤖 &lt;strong&gt;AI agents silently break things.&lt;/strong&gt; Code removed. Logic changed. Edge cases gone. You won't notice until production.&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Catch it before it ships.&lt;/strong&gt; AI-powered inline comments show you &lt;em&gt;exactly&lt;/em&gt; what changed and what looks wrong.&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>go</category>
    </item>
    <item>
      <title>What's Happening With AI Token Prices, and What It Means for Your Budget</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Sun, 07 Jun 2026 15:36:48 +0000</pubDate>
      <link>https://dev.to/shrsv/whats-happening-with-ai-token-prices-and-what-it-means-for-your-budget-4i3o</link>
      <guid>https://dev.to/shrsv/whats-happening-with-ai-token-prices-and-what-it-means-for-your-budget-4i3o</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you prefer watching a video version of this article, check out:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/To-uMGMejAA"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Every few months, we seem to hear two completely different stories about AI pricing.&lt;/p&gt;

&lt;p&gt;On one side, AI labs and industry commentators tell us that AI is getting dramatically cheaper. If current trends continue, they argue, we'll eventually stop thinking much about token costs at all because intelligence will be abundant and inexpensive.&lt;/p&gt;

&lt;p&gt;On the other side, developers are staring at real AI bills. Teams building agents are running into rate limits. Companies are spending meaningful amounts of money on inference. For many organizations, token costs are still a very real constraint.&lt;/p&gt;

&lt;p&gt;So which story is true?&lt;/p&gt;

&lt;p&gt;The answer is: both.&lt;/p&gt;

&lt;p&gt;To understand why, it helps to look beyond the headlines and examine what's actually happening to the economics of AI inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Software Eating the World to AI Eating Software
&lt;/h2&gt;

&lt;p&gt;Over the last few decades, software transformed almost every industry.&lt;/p&gt;

&lt;p&gt;Healthcare, finance, education, transportation, government—software touched all of them. Marc Andreessen famously described this trend as "software eating the world."&lt;/p&gt;

&lt;p&gt;Now we may be entering the next phase.&lt;/p&gt;

&lt;p&gt;Increasingly, AI agents are starting to eat software itself. Instead of clicking through interfaces and workflows, users can simply tell an AI what they want done and let it handle the work.&lt;/p&gt;

&lt;p&gt;That's why token economics matter.&lt;/p&gt;

&lt;p&gt;The cost of tokens ultimately determines the cost of the products, agents, and applications being built on top of modern AI models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Token Economics Matters
&lt;/h2&gt;

&lt;p&gt;Take AI-powered code review tools as an example.&lt;/p&gt;

&lt;p&gt;At LiveReview, we compete with products like CodeRabbit and other AI-assisted code review platforms. One of the most important questions in that market is simple:&lt;/p&gt;

&lt;p&gt;If a customer spends one dollar, how much value do they get?&lt;/p&gt;

&lt;p&gt;Can we provide more useful reviews? Better analysis? More actionable feedback?&lt;/p&gt;

&lt;p&gt;Those questions are directly tied to token costs because token costs sit underneath almost every AI product.&lt;/p&gt;

&lt;p&gt;One of the reasons we're optimistic about the future is that customers should be able to buy more intelligence for the same amount of money over time.&lt;/p&gt;

&lt;p&gt;In practical terms, that means the same budget should eventually buy more reviews, more analysis, more automation, and better outcomes.&lt;/p&gt;

&lt;p&gt;To understand why, it's worth revisiting a familiar idea: Moore's Law.&lt;/p&gt;

&lt;h2&gt;
  
  
  Moore's Law and the Economics of Compute
&lt;/h2&gt;

&lt;p&gt;Moore's Law is one of the most important trends in the history of technology.&lt;/p&gt;

&lt;p&gt;Named after Intel co-founder Gordon Moore, it observes that the number of transistors on a chip doubles roughly every two years. In practice, that meant the amount of compute available per dollar roughly doubled on a similar schedule.&lt;/p&gt;

&lt;p&gt;If one dollar bought 100 units of compute today, then in two years it might buy 200 units. Two years after that, 400 units. Then 800.&lt;/p&gt;

&lt;p&gt;That steady increase in compute-per-dollar helped drive the software revolution.&lt;/p&gt;

&lt;p&gt;But what's interesting is that parts of the AI industry now appear to be moving even faster than traditional Moore's Law.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Emergence of Tiered Super-Moore's Law
&lt;/h2&gt;

&lt;p&gt;Recent research has introduced the idea of "Tiered Super-Moore's Law."&lt;/p&gt;

&lt;p&gt;The basic observation is straightforward.&lt;/p&gt;

&lt;p&gt;New AI capabilities usually appear first in expensive frontier models. Then competition, optimization, infrastructure improvements, and engineering innovations rapidly drive down the cost of delivering those same capabilities.&lt;/p&gt;

&lt;p&gt;You can think of the market as having three broad tiers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpka2dxlyfp6qshx2340g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpka2dxlyfp6qshx2340g.png" alt="tier list" width="799" height="164"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Frontier Models
&lt;/h3&gt;

&lt;p&gt;These are the most capable models available.&lt;/p&gt;

&lt;p&gt;Think GPT, Claude Opus, Gemini Pro, and other flagship systems operating at the cutting edge.&lt;/p&gt;

&lt;p&gt;These models are typically priced above $5 per million input tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mid-Tier Models
&lt;/h3&gt;

&lt;p&gt;These models offer strong performance at much lower prices.&lt;/p&gt;

&lt;p&gt;Pricing generally falls between $0.50 and $5 per million tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Economy Models
&lt;/h3&gt;

&lt;p&gt;These are highly optimized models designed for large-scale production workloads.&lt;/p&gt;

&lt;p&gt;Pricing is often below $0.50 per million tokens, making them attractive for repetitive or high-volume tasks.&lt;/p&gt;

&lt;p&gt;The key insight is that capabilities don't stay in the frontier tier forever.&lt;/p&gt;

&lt;p&gt;A capability that first appears in an expensive model eventually works its way down into cheaper and cheaper models.&lt;/p&gt;
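
&lt;p&gt;As a rough illustration, the three tiers above can be written as a small classifier. This is only a sketch: the function name &lt;code&gt;classify_tier&lt;/code&gt; is made up for this example, the sample prices are not real quotes, and treating a price of exactly $5 or exactly $0.50 as mid-tier is an assumption, since the boundaries above are approximate.&lt;/p&gt;

```python
# Thresholds are the per-million-token prices quoted in the text above.
FRONTIER_FLOOR = 5.00  # above this: frontier
MID_TIER_FLOOR = 0.50  # from here up to the frontier floor: mid-tier


def classify_tier(price_per_million):
    """Return the pricing tier for a price in USD per million tokens."""
    if price_per_million > FRONTIER_FLOOR:
        return "frontier"
    if price_per_million >= MID_TIER_FLOOR:
        return "mid-tier"
    return "economy"


# Illustrative prices only, not real quotes from any provider.
for price in (15.00, 3.00, 0.25):
    print(price, classify_tier(price))
```

&lt;p&gt;A model does not change tiers by getting worse. It changes tiers when its price drops past one of these thresholds, which is the downward movement described above.&lt;/p&gt;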

&lt;h2&gt;
  
  
  AI Prices Are Falling Remarkably Fast
&lt;/h2&gt;

&lt;p&gt;According to the research, frontier and mid-tier model prices have often fallen by 10× to 30× per year.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7g39ec083wynil00159p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7g39ec083wynil00159p.png" alt="tiered price fall" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's an astonishing rate of change.&lt;/p&gt;

&lt;p&gt;Imagine a task that costs $100 today.&lt;/p&gt;

&lt;p&gt;If prices fall by 10× over the next year, that same task costs about $10.&lt;/p&gt;

&lt;p&gt;Another year of similar improvement brings the cost down to roughly $1.&lt;/p&gt;

&lt;p&gt;That's not a small efficiency gain. It's a dramatic reduction in the cost of intelligence.&lt;/p&gt;

&lt;p&gt;The pattern is becoming increasingly familiar: new capabilities show up first in expensive models, and then the industry gets very good at making those capabilities cheaper.&lt;/p&gt;
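
&lt;p&gt;The $100 example above is simple compounding, and it can be checked in a few lines. This is a minimal sketch that assumes prices fall by the same factor every year, which real prices will not do exactly. The name &lt;code&gt;projected_cost&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
def projected_cost(cost_today, yearly_factor, years):
    """Cost after `years` if prices fall by `yearly_factor` each year."""
    return cost_today / (yearly_factor ** years)


# The example from the text: a $100 task with prices falling 10x per year.
for year in range(3):
    print(f"Year {year}: ${projected_cost(100, 10, year):.2f}")

# The upper end of the quoted range, 30x per year, for comparison.
print(f"After 2 years at 30x: ${projected_cost(100, 30, 2):.2f}")
```

&lt;p&gt;At 10× per year the output follows the $100, $10, $1 sequence described above. At 30× per year, the same task would cost about 11 cents after two years.&lt;/p&gt;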

&lt;h2&gt;
  
  
  How Frontier Capabilities Become Affordable
&lt;/h2&gt;

&lt;p&gt;The process is surprisingly predictable.&lt;/p&gt;

&lt;p&gt;A frontier model demonstrates a new capability. Then researchers and engineers figure out how to reproduce similar results more efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7s5y961yk8wppjxyf38.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe7s5y961yk8wppjxyf38.png" alt="techniques" width="800" height="458"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some of the techniques used include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quantization — Reduces the numerical precision of model weights and computations to lower memory usage and speed up inference.&lt;/li&gt;
&lt;li&gt;Distillation — Trains a smaller model to mimic the behavior of a larger model, preserving much of its capability at a lower cost.&lt;/li&gt;
&lt;li&gt;Mixture-of-Experts architectures — Activates only a subset of specialized model components for each token, reducing compute requirements.&lt;/li&gt;
&lt;li&gt;Flash Attention — An optimized attention algorithm that improves speed and memory efficiency when processing long contexts.&lt;/li&gt;
&lt;li&gt;Speculative decoding — Uses a smaller draft model to propose several tokens ahead, which the larger model then verifies in parallel, so responses are generated faster.&lt;/li&gt;
&lt;li&gt;KV-cache optimization — Reuses previously computed attention keys and values so the model does not need to recompute them for every token.&lt;/li&gt;
&lt;li&gt;Prompt caching — Stores and reuses computations for repeated prompt prefixes, reducing latency and inference costs.&lt;/li&gt;
&lt;li&gt;Parameter-efficient fine-tuning — Adapts models to new tasks by updating only a small subset of parameters instead of retraining the entire model.&lt;/li&gt;
&lt;li&gt;Model routing — Directs requests to the most appropriate model for a given task, balancing cost, speed, and quality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every one of these innovations reduces the amount of compute needed to achieve a given result.&lt;/p&gt;

&lt;p&gt;Over time, capabilities that were once expensive become available to almost everyone.&lt;/p&gt;

&lt;p&gt;That's one of the reasons AI pricing behaves differently from traditional software pricing. The thing being sold—the intelligence itself—is constantly becoming cheaper to produce.&lt;/p&gt;
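
&lt;p&gt;Of the techniques listed above, model routing is the easiest to sketch in code. The example below is only an illustration: the model names, prices, and capability scores are invented, and real routers usually estimate task difficulty with a classifier or heuristics rather than taking it as an input.&lt;/p&gt;

```python
# Hypothetical catalogue: names, prices, and capability scores are made up.
MODELS = [
    {"name": "economy-model", "price_per_million": 0.25, "capability": 1},
    {"name": "mid-tier-model", "price_per_million": 3.00, "capability": 2},
    {"name": "frontier-model", "price_per_million": 15.00, "capability": 3},
]


def route(required_capability):
    """Pick the cheapest model whose capability meets the requirement."""
    candidates = [m for m in MODELS if m["capability"] >= required_capability]
    if not candidates:
        raise ValueError("no model is capable enough for this task")
    return min(candidates, key=lambda m: m["price_per_million"])


print(route(1)["name"])  # a simple task goes to the cheapest model
print(route(3)["name"])  # a hard task falls through to the frontier model
```

&lt;p&gt;The saving comes from the first case: if most requests are simple, most of the traffic never touches the expensive model.&lt;/p&gt;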

&lt;h2&gt;
  
  
  The Economy Tier Effect
&lt;/h2&gt;

&lt;p&gt;What's particularly interesting is that even economy models continue getting cheaper.&lt;/p&gt;

&lt;p&gt;The rate of improvement is slower than at the frontier, but it's still meaningful.&lt;/p&gt;

&lt;p&gt;Suppose a task costs $100 today using economy-tier models.&lt;/p&gt;

&lt;p&gt;If costs fall roughly 2× per year:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Year 1: $50&lt;/li&gt;
&lt;li&gt;Year 2: $25&lt;/li&gt;
&lt;li&gt;Year 3: $12.50&lt;/li&gt;
&lt;li&gt;Year 4: $6.25&lt;/li&gt;
&lt;li&gt;Year 5: About $3.13&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Five years later, a workload that once cost $100 might cost only a few dollars.&lt;/p&gt;

&lt;p&gt;Even if model quality stayed exactly the same, the cost of accessing that quality would continue to decline.&lt;/p&gt;
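
&lt;p&gt;To compare the slower economy-tier decline with the frontier rate, it helps to ask how long each takes to reach the same price. The sketch below assumes a constant yearly factor, and the name &lt;code&gt;years_to_reach&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
import math


def years_to_reach(cost_today, target_cost, yearly_factor):
    """Years of steady decline needed for cost_today to fall to target_cost."""
    return math.log(cost_today / target_cost) / math.log(yearly_factor)


# How long until a $100 workload costs $1 (a 100x reduction)?
print(f"Economy tier, 2x per year: {years_to_reach(100, 1, 2):.1f} years")
print(f"Frontier tier, 10x per year: {years_to_reach(100, 1, 10):.1f} years")
```

&lt;p&gt;Under these assumptions, a 100× reduction takes roughly 6.6 years at 2× per year, compared with 2 years at 10× per year.&lt;/p&gt;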

&lt;h2&gt;
  
  
  Buying More Intelligence Per Dollar
&lt;/h2&gt;

&lt;p&gt;Another way to think about all of this is that you're buying more intelligence for the same amount of money.&lt;/p&gt;

&lt;p&gt;Of course, intelligence isn't a perfectly measurable unit.&lt;/p&gt;

&lt;p&gt;But the practical effect is easy to see.&lt;/p&gt;

&lt;p&gt;For the same budget, organizations can often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Process more data&lt;/li&gt;
&lt;li&gt;Analyze more context&lt;/li&gt;
&lt;li&gt;Run more agents&lt;/li&gt;
&lt;li&gt;Generate more outputs&lt;/li&gt;
&lt;li&gt;Perform more reviews&lt;/li&gt;
&lt;li&gt;Execute more reasoning steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, the amount of useful cognitive work available per dollar keeps increasing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Customers Should Demand
&lt;/h2&gt;

&lt;p&gt;This has important implications for both buyers and builders of AI products.&lt;/p&gt;

&lt;p&gt;Customers shouldn't expect AI products to stay static.&lt;/p&gt;

&lt;p&gt;If intelligence is getting cheaper, then products should become more valuable over time.&lt;/p&gt;

&lt;p&gt;That value might show up as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More reviews per month&lt;/li&gt;
&lt;li&gt;Better analysis&lt;/li&gt;
&lt;li&gt;Larger context windows&lt;/li&gt;
&lt;li&gt;Improved security scanning&lt;/li&gt;
&lt;li&gt;More comprehensive monitoring&lt;/li&gt;
&lt;li&gt;Higher reliability&lt;/li&gt;
&lt;li&gt;Better user experiences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the cost of intelligence falls, customers should benefit from those gains.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for LiveReview
&lt;/h2&gt;

&lt;p&gt;For us at LiveReview, the goal is pretty simple.&lt;/p&gt;

&lt;p&gt;As AI economics improve, we want those improvements to show up in the product.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More code reviews per dollar&lt;/li&gt;
&lt;li&gt;More analysis per dollar&lt;/li&gt;
&lt;li&gt;Better answers to developer questions&lt;/li&gt;
&lt;li&gt;Increased security coverage&lt;/li&gt;
&lt;li&gt;Improved production stability&lt;/li&gt;
&lt;li&gt;Greater overall effectiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The objective isn't simply to capture efficiency gains internally.&lt;/p&gt;

&lt;p&gt;The objective is to turn those gains into more value for customers.&lt;/p&gt;

&lt;p&gt;Over time, customers should expect to get substantially more from the same subscription spend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Trend Is Likely to Continue
&lt;/h2&gt;

&lt;p&gt;Many of the techniques responsible for previous cost reductions have already delivered enormous gains.&lt;/p&gt;

&lt;p&gt;But there is still plenty of room for improvement.&lt;/p&gt;

&lt;p&gt;Future advances may include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better model compression techniques, allowing models to deliver similar performance while using fewer resources.&lt;/li&gt;
&lt;li&gt;More sophisticated routing systems that direct tasks to the most efficient model for the job instead of always using the most expensive option.&lt;/li&gt;
&lt;li&gt;Improved reasoning architectures that can solve complex problems with fewer computational steps.&lt;/li&gt;
&lt;li&gt;More efficient inference algorithms that reduce the amount of compute required to generate high-quality outputs.&lt;/li&gt;
&lt;li&gt;Hardware improvements, including new generations of AI chips designed specifically for large-scale inference workloads.&lt;/li&gt;
&lt;li&gt;Lower energy consumption, which can significantly reduce the operational costs of running AI systems.&lt;/li&gt;
&lt;li&gt;Better datacenter utilization, helping providers get more value from existing infrastructure and pass some of those savings on to customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each layer of optimization pushes the cost of intelligence lower.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The conversation around AI costs often sounds contradictory because people are looking at different parts of the same trend.&lt;/p&gt;

&lt;p&gt;Yes, AI can still be expensive today.&lt;/p&gt;

&lt;p&gt;Yes, developers still hit rate limits and budget constraints.&lt;/p&gt;

&lt;p&gt;But the broader pattern is clear: the cost of intelligence is falling rapidly.&lt;/p&gt;

&lt;p&gt;New capabilities appear first in expensive frontier models. Then optimization, distillation, compression, competition, and infrastructure improvements make those capabilities dramatically cheaper.&lt;/p&gt;

&lt;p&gt;The result is that every year, a dollar tends to buy more intelligence than it did before.&lt;/p&gt;

&lt;p&gt;For companies building AI products, the challenge is straightforward: take those falling costs and turn them into better outcomes for customers.&lt;/p&gt;

&lt;p&gt;The companies that do that best are likely to be some of the biggest winners in the next phase of AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;


&lt;ul&gt;
&lt;li&gt;Tiered Super-Moore's Law: Price Evolution, Production Frontiers, and Market Competition in Large Language Model Inference Services (&lt;a href="https://arxiv.org/abs/2603.28576" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The Price of Progress: Price Performance and the Future of AI (&lt;a href="https://arxiv.org/abs/2603.28576" rel="noopener noreferrer"&gt;arXiv&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;






</description>
      <category>ai</category>
      <category>tokeneconomics</category>
      <category>inferencecosts</category>
      <category>llms</category>
    </item>
    <item>
      <title>Getting Started with Genkit in Go: Building Production-Ready AI Applications Without Reinventing the Wheel</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Sat, 06 Jun 2026 18:18:42 +0000</pubDate>
      <link>https://dev.to/shrsv/getting-started-with-genkit-in-go-building-production-ready-ai-applications-without-reinventing-26lf</link>
      <guid>https://dev.to/shrsv/getting-started-with-genkit-in-go-building-production-ready-ai-applications-without-reinventing-26lf</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Large Language Models have made it surprisingly easy to generate text.&lt;/p&gt;

&lt;p&gt;Building a reliable AI application, however, is a completely different problem.&lt;/p&gt;

&lt;p&gt;Once you move beyond a simple "send prompt, get response" demo, you quickly encounter real-world concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt management&lt;/li&gt;
&lt;li&gt;Structured outputs&lt;/li&gt;
&lt;li&gt;Multi-step workflows&lt;/li&gt;
&lt;li&gt;Tool calling&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;li&gt;Evaluation&lt;/li&gt;
&lt;li&gt;Model switching&lt;/li&gt;
&lt;li&gt;Production debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many teams end up creating custom frameworks around OpenAI, Anthropic, Gemini, or local models just to manage these concerns.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Genkit&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;Originally developed by Google, Genkit provides a framework for building AI-powered applications with a focus on workflows, tooling, observability, evaluation, and production readiness.&lt;/p&gt;

&lt;p&gt;While most examples online focus on Node.js, Genkit now has growing support for Go, making it an interesting option for backend engineers who want AI capabilities without introducing an entirely separate application stack.&lt;/p&gt;

&lt;p&gt;In this article we'll build practical examples and explore how Genkit helps structure real-world AI systems.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why Genkit Exists
&lt;/h1&gt;

&lt;p&gt;Most AI applications evolve like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;callLLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Everything seems simple.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retry logic&lt;/li&gt;
&lt;li&gt;Prompt versioning&lt;/li&gt;
&lt;li&gt;JSON outputs&lt;/li&gt;
&lt;li&gt;Tool integrations&lt;/li&gt;
&lt;li&gt;Tracing&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Human review workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now your codebase starts accumulating AI-specific infrastructure.&lt;/p&gt;

&lt;p&gt;Genkit attempts to provide these building blocks from day one.&lt;/p&gt;

&lt;p&gt;Think of it as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Spring Boot for AI workflows" rather than "an LLM SDK."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;
  
  
  Installing Genkit for Go
&lt;/h1&gt;

&lt;p&gt;Create a new project:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;genkit-demo
&lt;span class="nb"&gt;cd &lt;/span&gt;genkit-demo

go mod init github.com/example/genkit-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Install Genkit:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get github.com/firebase/genkit/go/ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Depending on your provider, you'll also install provider plugins.&lt;/p&gt;

&lt;p&gt;For Gemini:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get github.com/firebase/genkit/go/plugins/googlegenai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  Your First AI Call
&lt;/h1&gt;

&lt;p&gt;Let's start with a simple generation.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/firebase/genkit/go/ai"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/firebase/genkit/go/genkit"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/firebase/genkit/go/plugins/googlegenai"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// In Genkit Go 1.x, Init returns only the *Genkit instance.&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPlugins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;googlegenai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GoogleAI&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;APIKey&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"YOUR_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithModelName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"googleai/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Explain vector databases in one paragraph."&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This resembles a normal LLM call, but Genkit's value becomes more apparent when applications grow beyond this stage.&lt;/p&gt;


&lt;h1&gt;
  
  
  Structured Outputs: Stop Parsing AI Text
&lt;/h1&gt;

&lt;p&gt;One of the most common mistakes in AI systems is asking models to return text and then parsing it manually.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name: John
Score: 87
Risk: Medium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Use schemas.&lt;/p&gt;

&lt;p&gt;Imagine a customer-support ticket classifier.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;TicketClassification&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Category&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"category"`&lt;/span&gt;
    &lt;span class="n"&gt;Priority&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"priority"`&lt;/span&gt;
    &lt;span class="n"&gt;Summary&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"summary"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Prompt:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Classify this support ticket.

Return JSON matching the schema.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Once the reply is decoded into &lt;code&gt;TicketClassification&lt;/code&gt;, downstream services read typed fields instead of parsing free-form text.&lt;/p&gt;

&lt;p&gt;Real-world uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lead qualification&lt;/li&gt;
&lt;li&gt;Risk analysis&lt;/li&gt;
&lt;li&gt;Invoice extraction&lt;/li&gt;
&lt;li&gt;Customer support routing&lt;/li&gt;
&lt;li&gt;Contract review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Structured outputs dramatically reduce prompt fragility.&lt;/p&gt;
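&lt;p&gt;As a rough sketch of consuming such a result defensively, the snippet below decodes a model reply into the &lt;code&gt;TicketClassification&lt;/code&gt; struct using only the Go standard library. The &lt;code&gt;parseClassification&lt;/code&gt; helper, the allowed priority values, and the sample JSON are assumptions made for illustration; they are not part of Genkit.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// TicketClassification mirrors the struct shown earlier in the article.
type TicketClassification struct {
	Category string `json:"category"`
	Priority string `json:"priority"`
	Summary  string `json:"summary"`
}

// allowedPriorities is a hypothetical set of values this service accepts.
var allowedPriorities = map[string]bool{"low": true, "medium": true, "high": true}

// parseClassification decodes raw model output and rejects anything that
// does not match the expected shape, instead of passing it downstream.
func parseClassification(raw string) (TicketClassification, error) {
	result := new(TicketClassification)
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // unexpected keys are treated as errors
	if err := dec.Decode(result); err != nil {
		return TicketClassification{}, fmt.Errorf("decode model output: %w", err)
	}
	if result.Category == "" || result.Summary == "" {
		return TicketClassification{}, fmt.Errorf("missing required field")
	}
	if !allowedPriorities[result.Priority] {
		return TicketClassification{}, fmt.Errorf("unknown priority %q", result.Priority)
	}
	return *result, nil
}

func main() {
	raw := `{"category":"billing","priority":"high","summary":"Charged twice"}`
	ticket, err := parseClassification(raw)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println(ticket.Category, ticket.Priority)
}
```

&lt;p&gt;The point of the validation step is that a malformed or surprising reply fails loudly at the boundary rather than somewhere deep in a downstream service.&lt;/p&gt;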
&lt;h1&gt;
  
  
  Building Multi-Step AI Workflows
&lt;/h1&gt;

&lt;p&gt;Most production AI systems involve multiple steps.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Customer email arrives.&lt;/p&gt;

&lt;p&gt;Workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Summarize email&lt;/li&gt;
&lt;li&gt;Detect sentiment&lt;/li&gt;
&lt;li&gt;Extract action items&lt;/li&gt;
&lt;li&gt;Generate response draft&lt;/li&gt;
&lt;li&gt;Send for human review&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without a framework:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Controller
 ├─ LLM Call #1
 ├─ LLM Call #2
 ├─ LLM Call #3
 └─ LLM Call #4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Logic becomes difficult to maintain.&lt;/p&gt;

&lt;p&gt;With Genkit, you can model the workflow as a flow.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;summaryFlow&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefineFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"summarizeCustomerEmail"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

        &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithModelName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"googleai/gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Summarize:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;%s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Flows become reusable application components rather than scattered LLM calls.&lt;/p&gt;
&lt;h1&gt;
  
  
  Tool Calling: Let the Model Use Your Systems
&lt;/h1&gt;

&lt;p&gt;A common misconception is that AI models should know everything.&lt;/p&gt;

&lt;p&gt;In reality:&lt;/p&gt;

&lt;p&gt;Models should reason.&lt;/p&gt;

&lt;p&gt;Systems should provide facts.&lt;/p&gt;

&lt;p&gt;Imagine an order-tracking assistant.&lt;/p&gt;

&lt;p&gt;Instead of teaching the model about orders:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Order #78291
Status: Shipped
Carrier: FedEx
ETA: Tomorrow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Expose a tool.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// GetOrderStatus is a stub: a real tool would look up orderID in the order system.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetOrderStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;orderID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"Shipped"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The model decides:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need order information.
Call tool.
Read result.
Answer user.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This pattern enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database lookups&lt;/li&gt;
&lt;li&gt;CRM access&lt;/li&gt;
&lt;li&gt;Internal APIs&lt;/li&gt;
&lt;li&gt;Inventory systems&lt;/li&gt;
&lt;li&gt;Knowledge bases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many enterprise AI systems are essentially:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM + Tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;rather than&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM + More Prompting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h1&gt;
  
  
  Observability: The Feature Most Teams Discover Too Late
&lt;/h1&gt;

&lt;p&gt;Suppose users report:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The AI gave a terrible answer."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without tracing, you're blind.&lt;/p&gt;

&lt;p&gt;Questions immediately arise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which prompt was used?&lt;/li&gt;
&lt;li&gt;Which model answered?&lt;/li&gt;
&lt;li&gt;What context was supplied?&lt;/li&gt;
&lt;li&gt;Which tool calls executed?&lt;/li&gt;
&lt;li&gt;How much did it cost?&lt;/li&gt;
&lt;/ul&gt;

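&lt;p&gt;Genkit records traces for flows and model calls, and its local Developer UI lets you inspect each step's input and output, which makes most of these questions answerable without adding your own logging.&lt;/p&gt;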

&lt;p&gt;Traditional debugging:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error at line 87
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;AI debugging:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt
→ Context
→ Tool Calls
→ Model Output
→ Final Result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is often the difference between a manageable production system and weeks of confusion.&lt;/p&gt;
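&lt;p&gt;For intuition about what a trace contains, here is a small hand-rolled sketch in plain Go. It is not Genkit's tracing implementation; the &lt;code&gt;Span&lt;/code&gt; and &lt;code&gt;Trace&lt;/code&gt; types and the stubbed tool and model calls are assumptions made for illustration.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"time"
)

// Span records one step of a request: what ran, with what input and
// output, and how long it took.
type Span struct {
	Step     string
	Input    string
	Output   string
	Err      error
	Duration time.Duration
}

// Trace collects spans for a single request so a bad answer can be
// replayed step by step.
type Trace struct {
	Spans []Span
}

// Record runs fn and appends a span describing the call.
func (t *Trace) Record(step, input string, fn func(string) (string, error)) (string, error) {
	start := time.Now()
	out, err := fn(input)
	t.Spans = append(t.Spans, Span{
		Step: step, Input: input, Output: out, Err: err,
		Duration: time.Since(start),
	})
	return out, err
}

func main() {
	trace := new(Trace)
	// Stubs stand in for a tool call and a model call.
	status, _ := trace.Record("tool:getOrderStatus", "78291", func(string) (string, error) {
		return "Shipped", nil
	})
	trace.Record("model:answer", status, func(in string) (string, error) {
		return "Order status: " + in, nil
	})
	for _, s := range trace.Spans {
		fmt.Printf("%-20s in=%q out=%q took=%s\n", s.Step, s.Input, s.Output, s.Duration)
	}
}
```

&lt;p&gt;With every step recorded this way, "the AI gave a terrible answer" becomes a question you can answer by reading the spans in order.&lt;/p&gt;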
&lt;h1&gt;
  
  
  Real Example: AI-Powered Incident Summaries
&lt;/h1&gt;

&lt;p&gt;Imagine you're running a platform team.&lt;/p&gt;

&lt;p&gt;Every incident generates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slack messages&lt;/li&gt;
&lt;li&gt;Alerts&lt;/li&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Jira tickets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Engineers spend time creating incident reports.&lt;/p&gt;

&lt;p&gt;A Genkit workflow could:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Collect incident data&lt;/li&gt;
&lt;li&gt;Summarize timeline&lt;/li&gt;
&lt;li&gt;Identify root cause indicators&lt;/li&gt;
&lt;li&gt;Draft postmortem&lt;/li&gt;
&lt;li&gt;Suggest follow-up actions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pseudo-flow:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Alerts
   ↓
Summarization
   ↓
Root Cause Analysis
   ↓
Draft Postmortem
   ↓
Engineer Review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is exactly the type of repeatable, multi-step process where Genkit shines.&lt;/p&gt;
&lt;h1&gt;
  
  
  Model Portability Matters More Than Most Teams Expect
&lt;/h1&gt;

&lt;p&gt;Early-stage teams often assume they'll stay with one model forever.&lt;/p&gt;

&lt;p&gt;Reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pricing changes&lt;/li&gt;
&lt;li&gt;New models appear&lt;/li&gt;
&lt;li&gt;Performance shifts&lt;/li&gt;
&lt;li&gt;Compliance requirements emerge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Today's choice:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gemini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Six months later:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Anthropic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Twelve months later:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Local model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Frameworks that separate application logic from model providers reduce migration pain.&lt;/p&gt;

&lt;p&gt;Genkit encourages this separation.&lt;/p&gt;

&lt;p&gt;Your workflow logic remains relatively stable while models evolve underneath.&lt;/p&gt;
&lt;h1&gt;
  
  
  Common Mistakes When Adopting Genkit
&lt;/h1&gt;
&lt;h3&gt;
  
  
  1. Treating It Like Another SDK
&lt;/h3&gt;

&lt;p&gt;Genkit is most valuable when you embrace workflows, tools, schemas, and evaluation.&lt;/p&gt;

&lt;p&gt;Using it only for text generation leaves much of its value unused.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Over-Automating
&lt;/h3&gt;

&lt;p&gt;Not every process should become autonomous.&lt;/p&gt;

&lt;p&gt;Many successful systems use:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI → Human Review → Action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;rather than&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI → Action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Ignoring Evaluations
&lt;/h3&gt;

&lt;p&gt;A workflow that works today may degrade after:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt changes&lt;/li&gt;
&lt;li&gt;Model upgrades&lt;/li&gt;
&lt;li&gt;Data changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Evaluation should be treated as seriously as unit testing.&lt;/p&gt;
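&lt;p&gt;A very small evaluation harness might look like the following. This is a sketch in plain Go rather than Genkit's evaluation tooling: &lt;code&gt;EvalCase&lt;/code&gt;, &lt;code&gt;runEval&lt;/code&gt;, and the stubbed &lt;code&gt;classify&lt;/code&gt; workflow are hypothetical, and a substring check is a deliberately crude stand-in for a real quality metric.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// EvalCase pairs an input with a substring the output must contain.
// Real evaluations are usually richer; this is a deliberately small check.
type EvalCase struct {
	Input       string
	MustContain string
}

// runEval calls the workflow on every case and returns the pass rate
// together with the inputs that failed.
func runEval(workflow func(string) string, cases []EvalCase) (float64, []string) {
	if len(cases) == 0 {
		return 0, nil
	}
	var failed []string
	for _, c := range cases {
		if !strings.Contains(workflow(c.Input), c.MustContain) {
			failed = append(failed, c.Input)
		}
	}
	passed := len(cases) - len(failed)
	return float64(passed) / float64(len(cases)), failed
}

func main() {
	// classify is a stub standing in for a model-backed workflow.
	classify := func(ticket string) string {
		if strings.Contains(strings.ToLower(ticket), "refund") {
			return "category=billing"
		}
		return "category=general"
	}
	cases := []EvalCase{
		{Input: "I want a refund", MustContain: "billing"},
		{Input: "How do I reset my password?", MustContain: "account"},
	}
	rate, failed := runEval(classify, cases)
	fmt.Printf("pass rate: %.2f, failed: %v\n", rate, failed)
}
```

&lt;p&gt;Running a fixed case set like this after every prompt change or model upgrade turns "it seems worse" into a number you can compare across runs.&lt;/p&gt;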
&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;The AI ecosystem currently has no shortage of model providers.&lt;/p&gt;

&lt;p&gt;What many teams actually need is better infrastructure around those models.&lt;/p&gt;

&lt;p&gt;Genkit addresses a practical gap between simple API calls and production-grade AI systems. It provides a structured way to build workflows, integrate tools, monitor behavior, and evolve applications as models change.&lt;/p&gt;

&lt;p&gt;For Go developers, that's particularly valuable because it allows AI capabilities to live inside existing backend services rather than forcing a separate JavaScript stack.&lt;/p&gt;

&lt;p&gt;The interesting question is no longer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Which model should I use?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's increasingly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How do I build a system that can survive five generations of models?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Frameworks like Genkit are one possible answer.&lt;/p&gt;

&lt;p&gt;If you were building an AI-powered product today, which capability would you invest in first:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;better models, better prompts, better tools, or better workflows?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And more importantly, which of those do you think will still be a competitive advantage three years from now?&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/HexmosTech" rel="noopener noreferrer"&gt;
        HexmosTech
      &lt;/a&gt; / &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;
        git-lrc
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Free, Micro AI Code Reviews That Run on Commit
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;p&gt;| &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.da.md" rel="noopener noreferrer"&gt;🇩🇰 Dansk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.es.md" rel="noopener noreferrer"&gt;🇪🇸 Español&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fa.md" rel="noopener noreferrer"&gt;🇮🇷 Farsi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fi.md" rel="noopener noreferrer"&gt;🇫🇮 Suomi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ja.md" rel="noopener noreferrer"&gt;🇯🇵 日本語&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.nn.md" rel="noopener noreferrer"&gt;🇳🇴 Norsk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.pt.md" rel="noopener noreferrer"&gt;🇵🇹 Português&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ru.md" rel="noopener noreferrer"&gt;🇷🇺 Русский&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.sq.md" rel="noopener noreferrer"&gt;🇦🇱 Shqip&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.zh.md" rel="noopener noreferrer"&gt;🇨🇳 中文&lt;/a&gt; |&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;img width="60" alt="git-lrc logo" src="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;/a&gt;
&lt;br&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;git-lrc&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Free, Micro AI Code Reviews That Run on Commit&lt;/h2&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.producthunt.com/products/git-lrc?embed=true&amp;amp;utm_source=badge-top-post-badge&amp;amp;utm_medium=badge&amp;amp;utm_campaign=badge-git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt" width="200" src="https://camo.githubusercontent.com/87bf2d4283c1e0aa99e254bd17fefb1c67c0c0d39300043a243a4aa633b6cecc/68747470733a2f2f6170692e70726f6475637468756e742e636f6d2f776964676574732f656d6265642d696d6167652f76312f746f702d706f73742d62616467652e7376673f706f73745f69643d31303739323632267468656d653d6c6967687426706572696f643d6461696c7926743d31373731373439313730383638"&gt;&lt;/a&gt;
&amp;nbsp;&lt;/p&gt;
&lt;br&gt;
&lt;a href="https://discord.gg/sGdnKwB3qq" rel="nofollow noopener noreferrer"&gt;
  &lt;img alt="Discord Community" src="https://camo.githubusercontent.com/b8f979318aaabc8dec512b9d4e6e2a12431fba3c8a3b8738e1a97a0722d4e4bf/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d436f6d6d756e6974792d3538363546323f6c6f676f3d646973636f7264266c6162656c436f6c6f723d7768697465"&gt;
&lt;/a&gt; &lt;a href="https://goreportcard.com/report/github.com/HexmosTech/git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Go Report Card" src="https://camo.githubusercontent.com/e74c0651c3ee9165a2ed01cb0f6842c494029960df30eb9c24cf622d3d21bf46/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f4865786d6f73546563682f6769742d6c7263"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml" rel="noopener noreferrer"&gt;&lt;img alt="gitleaks.yml" title="gitleaks.yml: Secret scanning workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml" rel="noopener noreferrer"&gt;&lt;img alt="osv-scanner.yml" title="osv-scanner.yml: Dependency vulnerability scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml" rel="noopener noreferrer"&gt;&lt;img alt="govulncheck.yml" title="govulncheck.yml: Go vulnerability check" src="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml" rel="noopener noreferrer"&gt;&lt;img alt="semgrep.yml" title="semgrep.yml: Static analysis security scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a rel="noopener noreferrer" href="https://github.com/HexmosTech/git-lrc/./gfx/dependabot-enabled.svg"&gt;&lt;img alt="dependabot-enabled" title="dependabot-enabled: Automated dependency updates are enabled" 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FHexmosTech%2Fgit-lrc%2FHEAD%2F.%2Fgfx%2Fdependabot-enabled.svg"&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;

&lt;p&gt;AI agents write code fast. They also &lt;em&gt;silently remove logic&lt;/em&gt;, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;git-lrc&lt;/code&gt; fixes this.&lt;/strong&gt; It hooks into &lt;code&gt;git commit&lt;/code&gt; and reviews every diff &lt;em&gt;before&lt;/em&gt; it lands. 60-second setup. Completely free.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;See It In Action&lt;/h2&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;See git-lrc catch serious security issues such as leaked credentials, expensive cloud
operations, and sensitive material in log statements&lt;/p&gt;
&lt;/blockquote&gt;

  
    
    
  

  

  


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;🤖 &lt;strong&gt;AI agents silently break things.&lt;/strong&gt; Code removed. Logic changed. Edge cases gone. You won't notice until production.&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Catch it before it ships.&lt;/strong&gt; AI-powered inline comments show you &lt;em&gt;exactly&lt;/em&gt; what changed and what looks wrong.&lt;/li&gt;
&lt;li&gt;🔁 &lt;strong&gt;Build a&lt;/strong&gt;…&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Steering Vectors: The Hidden Control Knobs Inside Large Language Models</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Thu, 04 Jun 2026 18:52:02 +0000</pubDate>
      <link>https://dev.to/shrsv/steering-vectors-the-hidden-control-knobs-inside-large-language-models-3hj0</link>
      <guid>https://dev.to/shrsv/steering-vectors-the-hidden-control-knobs-inside-large-language-models-3hj0</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;What if you could change how an AI thinks without retraining it?&lt;/p&gt;

&lt;p&gt;Not by rewriting prompts. Not by fine-tuning billions of parameters. Not by collecting another mountain of training data.&lt;/p&gt;

&lt;p&gt;Instead, imagine finding a direction inside the model's internal representation space and nudging the model a little in that direction.&lt;/p&gt;

&lt;p&gt;A small push.&lt;/p&gt;

&lt;p&gt;A different behavior.&lt;/p&gt;

&lt;p&gt;This idea sits at the heart of one of the most fascinating areas of modern AI interpretability: &lt;strong&gt;steering vectors&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Steering vectors suggest that many behaviors we care about—careful reasoning, honesty, coding style, security awareness, verbosity, and more—may already exist inside a model. The challenge is learning how to activate them.&lt;/p&gt;

&lt;p&gt;Let's explore what steering vectors are, how they're created, and why they might become one of the most practical tools for controlling AI systems.&lt;/p&gt;

&lt;h1&gt;
  
  
  1. What Exactly Is a Steering Vector?
&lt;/h1&gt;

&lt;p&gt;Large language models process information through layers of high-dimensional activations.&lt;/p&gt;

&lt;p&gt;At any point during generation, the model's internal state can be represented as a vector containing thousands of numbers.&lt;/p&gt;

&lt;p&gt;Researchers discovered something surprising:&lt;/p&gt;

&lt;p&gt;Different behaviors often correspond to different regions of this activation space.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing Python code&lt;/li&gt;
&lt;li&gt;Solving math problems&lt;/li&gt;
&lt;li&gt;Speaking French&lt;/li&gt;
&lt;li&gt;Explaining concepts carefully&lt;/li&gt;
&lt;li&gt;Producing insecure code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each tends to produce distinctive activation patterns.&lt;/p&gt;

&lt;p&gt;A steering vector is essentially the difference between two activation patterns.&lt;/p&gt;

&lt;p&gt;Suppose we gather examples where the model is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Careful&lt;/li&gt;
&lt;li&gt;Methodical&lt;/li&gt;
&lt;li&gt;Thorough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;and compare them to examples where it is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rushed&lt;/li&gt;
&lt;li&gt;Superficial&lt;/li&gt;
&lt;li&gt;Incomplete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The average difference between these internal states becomes a steering vector.&lt;/p&gt;

&lt;p&gt;At inference time, we can add that vector to the model's activations at the layer it was extracted from:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;new_activation = activation + α × steering_vector
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;where α controls the steering strength.&lt;/p&gt;

&lt;p&gt;Conceptually, it's like moving the model's internal state toward a desired behavior.&lt;/p&gt;
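
&lt;p&gt;The update rule can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the function name &lt;code&gt;steer&lt;/code&gt; and the three-dimensional vectors are made up for the example, and a real implementation would apply the same arithmetic to a chosen layer's activations during the forward pass.&lt;/p&gt;

```python
# Toy illustration only: real activations have thousands of dimensions and
# would be modified inside the model's forward pass at a chosen layer.

def steer(activation, steering_vector, alpha):
    """Return activation + alpha * steering_vector, element by element."""
    if len(activation) != len(steering_vector):
        raise ValueError("activation and steering vector must have the same length")
    return [a + alpha * v for a, v in zip(activation, steering_vector)]


activation = [0.5, -1.0, 2.0]        # made-up hidden state
steering_vector = [1.0, 0.0, -0.5]   # made-up direction

# alpha = 0 leaves the state untouched; larger values push harder.
for alpha in (0.0, 1.0, 4.0):
    print(alpha, steer(activation, steering_vector, alpha))
```

&lt;p&gt;With α set to zero the state is unchanged, and a negative α would push the state away from the behavior instead of toward it.&lt;/p&gt;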
&lt;h1&gt;
  
  
  2. Why Steering Vectors Matter
&lt;/h1&gt;

&lt;p&gt;Traditionally, changing model behavior meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More training&lt;/li&gt;
&lt;li&gt;More data&lt;/li&gt;
&lt;li&gt;More compute&lt;/li&gt;
&lt;li&gt;More cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Steering vectors challenge that assumption.&lt;/p&gt;

&lt;p&gt;They suggest that many capabilities already exist inside the model and merely need to be activated.&lt;/p&gt;

&lt;p&gt;This has an important implication:&lt;/p&gt;

&lt;p&gt;The model may know more than it appears to know.&lt;/p&gt;

&lt;p&gt;The behavior is already present, but not always dominant.&lt;/p&gt;

&lt;p&gt;Instead of teaching the model something new, steering often means amplifying a latent behavior that already exists.&lt;/p&gt;

&lt;p&gt;This is one reason steering vectors have attracted significant attention from interpretability researchers.&lt;/p&gt;

&lt;p&gt;They provide a glimpse into how concepts may be organized internally.&lt;/p&gt;
&lt;h1&gt;
  
  
  3. The Most Useful Coding Applications
&lt;/h1&gt;

&lt;p&gt;For software engineering, steering vectors could be particularly valuable.&lt;/p&gt;
&lt;h2&gt;
  
  
  Careful Code Review
&lt;/h2&gt;

&lt;p&gt;Imagine building a vector from examples of excellent code reviews versus weak reviews.&lt;/p&gt;

&lt;p&gt;When applied, the model might become more likely to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify edge cases&lt;/li&gt;
&lt;li&gt;Spot race conditions&lt;/li&gt;
&lt;li&gt;Notice missing validation&lt;/li&gt;
&lt;li&gt;Highlight maintainability concerns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;without changing the prompt itself.&lt;/p&gt;
&lt;h2&gt;
  
  
  Security-Oriented Coding
&lt;/h2&gt;

&lt;p&gt;A vector could be constructed from secure versus insecure implementations.&lt;/p&gt;

&lt;p&gt;The model may become more likely to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validate inputs&lt;/li&gt;
&lt;li&gt;Sanitize outputs&lt;/li&gt;
&lt;li&gt;Handle failures explicitly&lt;/li&gt;
&lt;li&gt;Avoid common vulnerabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Better Refactoring
&lt;/h2&gt;

&lt;p&gt;Some code is technically correct but difficult to maintain.&lt;/p&gt;

&lt;p&gt;A refactoring-oriented steering vector could encourage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clearer abstractions&lt;/li&gt;
&lt;li&gt;Better naming&lt;/li&gt;
&lt;li&gt;Simpler control flow&lt;/li&gt;
&lt;li&gt;Reduced complexity&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Thinking Before Coding
&lt;/h2&gt;

&lt;p&gt;Perhaps the most interesting possibility is steering toward analysis before implementation.&lt;/p&gt;

&lt;p&gt;Many coding assistants jump directly into code generation.&lt;/p&gt;

&lt;p&gt;A steering vector could encourage the model to spend more effort evaluating requirements, assumptions, and tradeoffs before writing the first line of code.&lt;/p&gt;
&lt;h1&gt;
  
  
  4. How Researchers Create Steering Vectors
&lt;/h1&gt;

&lt;p&gt;The simplest approach is surprisingly straightforward.&lt;/p&gt;

&lt;p&gt;First, collect two sets of examples.&lt;/p&gt;
&lt;h3&gt;
  
  
  Positive Examples
&lt;/h3&gt;

&lt;p&gt;Examples that exhibit the target behavior.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-quality code reviews&lt;/li&gt;
&lt;li&gt;Secure implementations&lt;/li&gt;
&lt;li&gt;Careful reasoning traces&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Negative Examples
&lt;/h3&gt;

&lt;p&gt;Examples lacking that behavior.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Superficial reviews&lt;/li&gt;
&lt;li&gt;Insecure implementations&lt;/li&gt;
&lt;li&gt;Rushed solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run both datasets through the model.&lt;/li&gt;
&lt;li&gt;Capture activations from a chosen layer.&lt;/li&gt;
&lt;li&gt;Compute the average activation for each group.&lt;/li&gt;
&lt;li&gt;Subtract the negative-group average from the positive-group average.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The resulting difference vector becomes the steering vector.&lt;/p&gt;

&lt;p&gt;Researchers often call this a &lt;strong&gt;difference-of-means&lt;/strong&gt; vector, and adding it back at inference time is commonly known as contrastive activation addition.&lt;/p&gt;
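
&lt;p&gt;The four steps above can be sketched as follows. This is a minimal illustration, not a real extraction pipeline: the helper names and the tiny three-dimensional "activations" are hypothetical, and capturing activations from an actual model (steps 1 and 2) is assumed to have happened already.&lt;/p&gt;

```python
def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def difference_of_means(positive_acts, negative_acts):
    """Steering vector = mean(positive activations) - mean(negative activations)."""
    pos_mean = mean_vector(positive_acts)
    neg_mean = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


# Made-up activations captured at one layer (hypothetical data).
careful_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
rushed_acts = [[0.0, 2.0, 1.0], [0.0, 2.0, 3.0]]

steering_vector = difference_of_means(careful_acts, rushed_acts)
print(steering_vector)
```

&lt;p&gt;Dimensions where the two groups agree cancel to zero, so the resulting vector only points along the directions in which the groups differ on average.&lt;/p&gt;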

&lt;p&gt;More advanced approaches use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear probes&lt;/li&gt;
&lt;li&gt;PCA&lt;/li&gt;
&lt;li&gt;Sparse Autoencoders (SAEs)&lt;/li&gt;
&lt;li&gt;Contrastive learning techniques&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;to identify cleaner and more interpretable directions.&lt;/p&gt;
&lt;h1&gt;
  
  
  5. How Do You Evaluate a Steering Vector?
&lt;/h1&gt;

&lt;p&gt;Creating a steering vector is easy.&lt;/p&gt;

&lt;p&gt;Proving it works is much harder.&lt;/p&gt;

&lt;p&gt;A common mistake is assuming a behavior improved simply because the output changed.&lt;/p&gt;

&lt;p&gt;Researchers typically evaluate steering vectors by running controlled benchmarks.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate a set of coding tasks.&lt;/li&gt;
&lt;li&gt;Run the baseline model.&lt;/li&gt;
&lt;li&gt;Run the steered model.&lt;/li&gt;
&lt;li&gt;Compare measurable outcomes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Metrics might include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bugs discovered&lt;/li&gt;
&lt;li&gt;Security issues identified&lt;/li&gt;
&lt;li&gt;Test coverage quality&lt;/li&gt;
&lt;li&gt;Correctness&lt;/li&gt;
&lt;li&gt;False positive rates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Human review is equally important.&lt;/p&gt;

&lt;p&gt;Many steering vectors initially appear useful but primarily increase verbosity.&lt;/p&gt;

&lt;p&gt;Longer answers often look smarter, even when they aren't.&lt;/p&gt;

&lt;p&gt;A good evaluation distinguishes genuine capability improvements from stylistic changes.&lt;/p&gt;
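
&lt;p&gt;One way to keep verbosity from masquerading as improvement is to report length-normalized metrics next to the raw ones. The sketch below assumes each task has already been scored by hand or by a test harness. The numbers are invented purely for illustration, and the metric names are hypothetical.&lt;/p&gt;

```python
from statistics import mean

# Hypothetical per-task results for the same tasks, in the same order.
# Each entry: (true issues found, false positives, output length in tokens).
baseline = [(2, 1, 200), (1, 0, 150), (3, 2, 250)]
steered = [(3, 1, 400), (1, 1, 300), (3, 2, 500)]


def summarize(results):
    """Aggregate raw counts into metrics that do not reward length alone."""
    found = sum(r[0] for r in results)
    false_pos = sum(r[1] for r in results)
    tokens = sum(r[2] for r in results)
    return {
        "issues_per_task": found / len(results),
        "precision": found / (found + false_pos),
        "issues_per_100_tokens": 100 * found / tokens,
        "mean_tokens": mean(r[2] for r in results),
    }


base_summary = summarize(baseline)
steered_summary = summarize(steered)
for name in base_summary:
    print(name, round(base_summary[name], 3), round(steered_summary[name], 3))
```

&lt;p&gt;In this made-up data the steered model finds slightly more issues per task, but its answers are twice as long and its precision is lower, which is the kind of pattern a raw "issues found" count would hide.&lt;/p&gt;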
&lt;h1&gt;
  
  
  6. Where Steering Vectors Are Heading Next
&lt;/h1&gt;

&lt;p&gt;The most exciting research is moving beyond single dense vectors.&lt;/p&gt;

&lt;p&gt;A common criticism of steering vectors is that they often blend multiple concepts together.&lt;/p&gt;

&lt;p&gt;A "careful reasoning" vector might simultaneously influence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Length&lt;/li&gt;
&lt;li&gt;Formality&lt;/li&gt;
&lt;li&gt;Confidence&lt;/li&gt;
&lt;li&gt;Attention to detail&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Recent interpretability work attempts to break these behaviors into smaller, more precise features.&lt;/p&gt;

&lt;p&gt;Instead of steering toward a broad concept like "good coding," future systems may activate specific internal features such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checking edge cases&lt;/li&gt;
&lt;li&gt;Searching for counterexamples&lt;/li&gt;
&lt;li&gt;Validating assumptions&lt;/li&gt;
&lt;li&gt;Looking for security risks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The long-term vision is not merely controlling outputs.&lt;/p&gt;

&lt;p&gt;It is understanding and controlling the internal computations that generate those outputs.&lt;/p&gt;

&lt;p&gt;If successful, steering could become one of the most practical bridges between interpretability research and real-world AI systems.&lt;/p&gt;
&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Steering vectors reveal something profound about large language models.&lt;/p&gt;

&lt;p&gt;Many behaviors that appear mysterious from the outside may correspond to surprisingly simple geometric directions on the inside.&lt;/p&gt;

&lt;p&gt;We are still far from fully understanding these representations.&lt;/p&gt;

&lt;p&gt;But the idea that a model's behavior can be altered by moving through activation space—without retraining and sometimes without even changing the prompt—offers a fascinating glimpse into how intelligence may be organized inside neural networks.&lt;/p&gt;

&lt;p&gt;And perhaps more importantly, it suggests that the future of AI control might involve understanding the model's internal world rather than merely observing its outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; If you could build a steering vector for your coding assistant today, what behavior would you choose: deeper reasoning, stronger security awareness, better code reviews, more maintainable code, or something else entirely?&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! git-lrc is online, source-available, and ready for anyone to use.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Use claude-lrc Reviews to Make Your Prod More Stable</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Wed, 03 Jun 2026 19:24:38 +0000</pubDate>
      <link>https://dev.to/shrsv/use-claude-lrc-reviews-to-make-your-prod-more-stable-dk5</link>
      <guid>https://dev.to/shrsv/use-claude-lrc-reviews-to-make-your-prod-more-stable-dk5</guid>
      <description>&lt;p&gt;With tools like Claude, code generation is now fast and relatively inexpensive. For decades, writing code was expensive and reviewing it was relatively cheap, and most engineering processes were built around that reality.&lt;/p&gt;

&lt;p&gt;A handful of senior engineers could review the output of many developers because code generation itself was the constraint.&lt;/p&gt;

&lt;p&gt;Today, a single engineer can generate migrations, APIs, tests, infrastructure code, documentation, and refactors in a fraction of the time it previously required.&lt;/p&gt;

&lt;p&gt;AI has made code generation abundant.&lt;/p&gt;

&lt;p&gt;Most engineering processes are still adapting.&lt;/p&gt;

&lt;p&gt;Scrutiny has become the scarce resource.&lt;/p&gt;

&lt;h2&gt;
  
  
  The New Failure Mode
&lt;/h2&gt;

&lt;p&gt;The biggest risk with AI-generated code comes from assumptions that never receive enough scrutiny.&lt;/p&gt;

&lt;p&gt;Humans have always written imperfect code.&lt;/p&gt;

&lt;p&gt;Production stability usually comes from somebody asking difficult questions.&lt;/p&gt;

&lt;p&gt;Claude-generated code often looks reasonable.&lt;/p&gt;

&lt;p&gt;It compiles.&lt;/p&gt;

&lt;p&gt;It passes tests.&lt;/p&gt;

&lt;p&gt;It follows conventions.&lt;/p&gt;

&lt;p&gt;Yet production failures often emerge from assumptions that nobody questioned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A retry loop that amplifies load.&lt;/li&gt;
&lt;li&gt;A missing authorization check.&lt;/li&gt;
&lt;li&gt;A concurrency edge case.&lt;/li&gt;
&lt;li&gt;A migration that works in staging but struggles in production.&lt;/li&gt;
&lt;li&gt;A cache invalidation strategy that quietly fails under real traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As AI increases code output, the amount of code requiring scrutiny grows as well.&lt;/p&gt;

&lt;p&gt;The same reviewers, staff engineers, security engineers, and technical leads are expected to evaluate far more than before.&lt;/p&gt;

&lt;p&gt;Review quality declines when reviews become rushed, and delivery slows when reviews become overloaded.&lt;/p&gt;

&lt;p&gt;Neither outcome improves software quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Centralized Review Breaks Down. So: Decentralize It!
&lt;/h2&gt;

&lt;p&gt;A small group of senior engineers can no longer carry the entire review burden.&lt;/p&gt;

&lt;p&gt;That model fit an environment where code generation was naturally constrained.&lt;/p&gt;

&lt;p&gt;Today, Claude and other AI tools allow every developer to produce significantly more code than before.&lt;/p&gt;

&lt;p&gt;The path forward involves spreading scrutiny throughout the development process.&lt;/p&gt;

&lt;p&gt;More developers need access to the kinds of questions experienced reviewers ask.&lt;/p&gt;

&lt;p&gt;More assumptions need examination before code reaches a formal review queue.&lt;/p&gt;

&lt;p&gt;More scrutiny needs to happen across the team instead of flowing through a handful of overloaded people.&lt;/p&gt;

&lt;p&gt;Review must scale.&lt;/p&gt;

&lt;p&gt;And scaling review means decentralizing it.&lt;/p&gt;

&lt;p&gt;Human reviewers remain essential.&lt;/p&gt;

&lt;p&gt;Their expertise has greater impact when scrutiny becomes as routine for every developer as brushing their teeth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Review Must Become as Ubiquitous and Obvious as Brushing One's Teeth
&lt;/h2&gt;

&lt;p&gt;The cheapest and most effective review happens before code gets committed.&lt;/p&gt;

&lt;p&gt;Developers benefit from challenging assumptions while they still have context and while fixes still take minutes.&lt;/p&gt;

&lt;p&gt;Instead of waiting for formal review, they can ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What risks am I missing?&lt;/li&gt;
&lt;li&gt;What assumptions am I making?&lt;/li&gt;
&lt;li&gt;What would a skeptical reviewer question?&lt;/li&gt;
&lt;li&gt;What reliability issues exist here?&lt;/li&gt;
&lt;li&gt;What security concerns should I investigate further?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are exactly the kinds of questions developers can ask Claude while they're still building.&lt;/p&gt;

&lt;p&gt;The earlier these questions are asked, the easier they are to answer.&lt;/p&gt;

&lt;p&gt;Broader review throughout development creates more opportunities to catch issues early.&lt;/p&gt;

&lt;p&gt;Frequent scrutiny while context is fresh helps teams move faster with greater confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Review as Engineering Hygiene
&lt;/h2&gt;

&lt;p&gt;Review is most valuable when it becomes a routine habit rather than a special event.&lt;/p&gt;

&lt;p&gt;Nobody waits until they have a cavity before brushing their teeth.&lt;/p&gt;

&lt;p&gt;The value comes from small preventative actions performed consistently.&lt;/p&gt;

&lt;p&gt;Software quality works the same way.&lt;/p&gt;

&lt;p&gt;A five-minute review before a commit can prevent hours of debugging after deployment.&lt;/p&gt;

&lt;p&gt;A quick challenge to an assumption can eliminate the need for an incident retrospective.&lt;/p&gt;

&lt;p&gt;The best production incident is the one that never happens.&lt;/p&gt;

&lt;p&gt;That requires making review part of everyday engineering work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where claude-lrc Fits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbbnmttmjxmqm6wgs8rp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frbbnmttmjxmqm6wgs8rp.png" alt="claude-lrc everyday" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/HexmosTech/claude-lrc" rel="noopener noreferrer"&gt;claude-lrc&lt;/a&gt; makes continuous review practical by helping developers challenge assumptions while they work.&lt;/p&gt;

&lt;p&gt;The goal is continuous scrutiny that fits naturally into development: review that developers can perform regularly.&lt;/p&gt;

&lt;p&gt;Inside Claude Code.&lt;/p&gt;

&lt;p&gt;Using natural language.&lt;/p&gt;

&lt;p&gt;Using slash commands.&lt;/p&gt;

&lt;p&gt;Using workflows they already use every day.&lt;/p&gt;

&lt;p&gt;Alongside code generation, claude-lrc helps developers examine the code they create.&lt;/p&gt;

&lt;p&gt;Throughout development, &lt;a href="https://github.com/HexmosTech/claude-lrc" rel="noopener noreferrer"&gt;claude-lrc&lt;/a&gt; helps distribute review across many small moments of scrutiny.&lt;/p&gt;

&lt;p&gt;Developers can challenge assumptions earlier.&lt;/p&gt;

&lt;p&gt;Teams can surface risks before formal review.&lt;/p&gt;

&lt;p&gt;Senior engineers can spend more time on high-value judgment and architectural decisions.&lt;/p&gt;

&lt;p&gt;Small reviews.&lt;/p&gt;

&lt;p&gt;Frequent reviews.&lt;/p&gt;

&lt;p&gt;Just like running tests, review should become a normal part of building software.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Next Engineering Habit
&lt;/h2&gt;

&lt;p&gt;Continuous review is becoming the next standard engineering practice.&lt;/p&gt;

&lt;p&gt;Version control became standard.&lt;/p&gt;

&lt;p&gt;Testing became standard.&lt;/p&gt;

&lt;p&gt;Continuous integration became standard.&lt;/p&gt;

&lt;p&gt;Observability became standard.&lt;/p&gt;

&lt;p&gt;AI-assisted development is creating demand for another standard practice.&lt;/p&gt;

&lt;p&gt;Claude can help generate more code than ever before.&lt;/p&gt;

&lt;p&gt;Delivering reliable software still depends on careful evaluation and sound judgment.&lt;/p&gt;

&lt;p&gt;Human attention remains a finite resource.&lt;/p&gt;

&lt;p&gt;The teams that succeed will scale scrutiny alongside generation.&lt;/p&gt;

&lt;p&gt;That means making review continuous.&lt;/p&gt;

&lt;p&gt;And making it available to everyone, not just reviewers.&lt;/p&gt;

&lt;p&gt;We already treat testing as a continuous activity.&lt;/p&gt;

&lt;p&gt;Should code review become continuous too?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>The Most Surprising Idea in AI Image Generation: Sculpting Meaning from Noise</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Tue, 02 Jun 2026 17:08:22 +0000</pubDate>
      <link>https://dev.to/shrsv/the-most-surprising-idea-in-ai-image-generation-sculpting-meaning-from-noise-gkc</link>
      <guid>https://dev.to/shrsv/the-most-surprising-idea-in-ai-image-generation-sculpting-meaning-from-noise-gkc</guid>
      <description>&lt;p&gt;When most developers first encounter AI image generation, they imagine something like a digital artist.&lt;/p&gt;

&lt;p&gt;You type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A red Ferrari on a mountain road at sunset&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and somewhere inside the model, it must be drawing wheels, painting reflections, and composing a scene.&lt;/p&gt;

&lt;p&gt;But that's not what happens.&lt;/p&gt;

&lt;p&gt;In many modern image generation systems, the process begins with something much stranger: pure static. Random noise. The visual equivalent of a detuned television.&lt;/p&gt;

&lt;p&gt;The AI doesn't start with a Ferrari.&lt;/p&gt;

&lt;p&gt;It starts with chaos.&lt;/p&gt;

&lt;p&gt;Then, step by step, it sculpts that chaos into an image that matches your prompt.&lt;/p&gt;

&lt;p&gt;Once you understand this idea, many things about modern AI suddenly make more sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Counterintuitive Discovery: Destroying Images Is Easy
&lt;/h2&gt;

&lt;p&gt;Imagine you have a photograph of a Ferrari.&lt;/p&gt;

&lt;p&gt;Now add a tiny amount of random noise.&lt;/p&gt;

&lt;p&gt;Then add a little more.&lt;/p&gt;

&lt;p&gt;Then more.&lt;/p&gt;

&lt;p&gt;Eventually, the image becomes indistinguishable from static.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ferrari
   ↓
Slightly Noisy Ferrari
   ↓
Very Noisy Ferrari
   ↓
Static
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This process is trivial. Anyone can write it in a few lines of code.&lt;/p&gt;
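
&lt;p&gt;As a rough sketch of that forward process, the snippet below repeatedly mixes Gaussian noise into a tiny stand-in "image". The article does not specify a noise schedule, so the details here are assumptions: the square-root scaling is the variance-preserving form common in DDPM-style diffusion models, the constant &lt;code&gt;beta&lt;/code&gt; is a simplification, and the function names are made up for the example.&lt;/p&gt;

```python
import math
import random


def add_noise(pixels, beta, rng):
    """One forward step: shrink the signal slightly, then mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]


def forward_process(pixels, num_steps, beta, rng):
    """Apply add_noise repeatedly and return every intermediate image."""
    history = [list(pixels)]
    for _ in range(num_steps):
        history.append(add_noise(history[-1], beta, rng))
    return history


rng = random.Random(0)              # fixed seed so runs are repeatable
image = [1.0, -1.0, 0.5, 0.0]       # stand-in for a tiny image scaled to [-1, 1]
steps = forward_process(image, num_steps=200, beta=0.05, rng=rng)

# After t steps the original signal has been scaled by (1 - beta) ** (t / 2).
remaining = (1.0 - 0.05) ** (200 / 2)
print(len(steps), remaining)
```

&lt;p&gt;With these settings less than one percent of the original signal survives after 200 steps, so the final entry in &lt;code&gt;steps&lt;/code&gt; is effectively static.&lt;/p&gt;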

&lt;p&gt;In 2015, Jascha Sohl-Dickstein and his collaborators proposed that, because the forward process is so easy to define, the reverse process might be something a model could learn.&lt;/p&gt;

&lt;p&gt;In other words:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If we know how to destroy structure, can a neural network learn how to rebuild it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That simple question eventually became the foundation of diffusion models.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Sculptor Analogy That Finally Made It Click for Me
&lt;/h2&gt;

&lt;p&gt;Many explanations describe diffusion as "removing noise."&lt;/p&gt;

&lt;p&gt;Technically correct.&lt;/p&gt;

&lt;p&gt;But I think there's a better mental model.&lt;/p&gt;

&lt;p&gt;Imagine a sculptor standing beside a random lump of clay.&lt;/p&gt;

&lt;p&gt;The prompt doesn't create the clay.&lt;/p&gt;

&lt;p&gt;The prompt tells the sculptor what to carve.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Random Clay
      +
"Ferrari"
      ↓
Sculpting
      ↓
Ferrari
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same thing happens in diffusion models.&lt;/p&gt;

&lt;p&gt;The initial noise is usually independent of the prompt.&lt;/p&gt;

&lt;p&gt;Instead, the prompt influences every refinement step afterward.&lt;/p&gt;

&lt;p&gt;The model repeatedly asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If this image is supposed to become a Ferrari, what should I change next?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Over time, rough shapes emerge.&lt;/p&gt;

&lt;p&gt;Then wheels.&lt;/p&gt;

&lt;p&gt;Then reflections.&lt;/p&gt;

&lt;p&gt;Then details.&lt;/p&gt;

&lt;p&gt;The image isn't retrieved from memory. It's progressively constructed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Training Loop Is Shockingly Small
&lt;/h2&gt;

&lt;p&gt;One of the most surprising things about diffusion models is how simple the core training loop is.&lt;/p&gt;

&lt;p&gt;At a high level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;caption&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;random_timestep&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;noise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;randn_like&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;noisy_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;add_noise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;noise&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;predicted_noise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;noisy_image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;caption&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;mse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_noise&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;noise&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;backward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's essentially the whole idea.&lt;/p&gt;

&lt;p&gt;The model is not directly learning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Draw a Ferrari.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is learning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Given a noisy image and a caption, predict what noise was added.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sounds almost too simple.&lt;/p&gt;

&lt;p&gt;Yet the behavior that emerges is remarkable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Generation Is Just Training in Reverse
&lt;/h2&gt;

&lt;p&gt;During inference, we start from pure noise.&lt;/p&gt;

&lt;p&gt;Then we repeatedly ask the model what noise should be removed.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;random_noise&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;reversed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;

    &lt;span class="n"&gt;predicted_noise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;remove_noise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;predicted_noise&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The process looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Static
   ↓
Less Static
   ↓
Rough Shapes
   ↓
Car-Like Shapes
   ↓
Ferrari
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model performs the same skill it learned during training.&lt;/p&gt;

&lt;p&gt;The only difference is where the noisy image came from.&lt;/p&gt;

&lt;p&gt;During training, it was a real photograph with noise deliberately added.&lt;/p&gt;

&lt;p&gt;During inference, it comes from randomness itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does the Noise Secretly Contain the Ferrari?
&lt;/h2&gt;

&lt;p&gt;This is one of the most common misconceptions.&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;The starting noise doesn't secretly contain a hidden Ferrari.&lt;/p&gt;

&lt;p&gt;It is genuinely random.&lt;/p&gt;

&lt;p&gt;However, the noise acts as a seed.&lt;/p&gt;

&lt;p&gt;Consider two different random starting points:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Noise A → Ferrari A
Noise B → Ferrari B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same prompt.&lt;/p&gt;

&lt;p&gt;Different image.&lt;/p&gt;

&lt;p&gt;Different camera angle.&lt;/p&gt;

&lt;p&gt;Different lighting.&lt;/p&gt;

&lt;p&gt;Different details.&lt;/p&gt;

&lt;p&gt;The prompt answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What should this image become?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The random seed answers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which version of that thing?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In practice, both matter.&lt;/p&gt;

&lt;p&gt;The prompt provides direction.&lt;/p&gt;

&lt;p&gt;The noise provides variation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Do Transformers Enter the Picture?
&lt;/h2&gt;

&lt;p&gt;Many developers assume diffusion models replaced transformers.&lt;/p&gt;

&lt;p&gt;Not exactly.&lt;/p&gt;

&lt;p&gt;In most modern text-to-image systems, they work together.&lt;/p&gt;

&lt;p&gt;A simplified architecture looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt
   ↓
Transformer
   ↓
Text Embedding
   ↓
Diffusion Model
   ↓
Image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The transformer's job is understanding language.&lt;/p&gt;

&lt;p&gt;It learns relationships such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ferrari is a car&lt;/li&gt;
&lt;li&gt;Red modifies Ferrari&lt;/li&gt;
&lt;li&gt;Mountain road describes the scene&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The diffusion model then uses that understanding while denoising.&lt;/p&gt;

&lt;p&gt;At every step, the image generation process is guided by the prompt representation produced by the transformer.&lt;/p&gt;

&lt;p&gt;One model understands meaning.&lt;/p&gt;

&lt;p&gt;The other turns that meaning into pixels.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unusual Origin Story: A Physics Idea That Changed AI
&lt;/h2&gt;

&lt;p&gt;What makes diffusion particularly interesting is where it came from.&lt;/p&gt;

&lt;p&gt;The 2015 paper by Jascha Sohl-Dickstein and his collaborators wasn't framed primarily as a computer vision breakthrough.&lt;/p&gt;

&lt;p&gt;It drew heavily from ideas in statistical physics and nonequilibrium thermodynamics.&lt;/p&gt;

&lt;p&gt;The original insight was not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do we draw images?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It was closer to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do complex probability distributions evolve into simple ones?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can we learn the reverse transformation?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift in perspective is what makes the idea feel so elegant.&lt;/p&gt;

&lt;p&gt;Many breakthroughs happen when someone enters a field carrying mental models from another discipline.&lt;/p&gt;

&lt;p&gt;Diffusion models are a great example.&lt;/p&gt;

&lt;p&gt;They treat image generation not as drawing, but as reversing a physical process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The next time an image model generates a stunning scene from a short prompt, it's worth remembering what happened under the hood.&lt;/p&gt;

&lt;p&gt;The system didn't start with a sketch.&lt;/p&gt;

&lt;p&gt;It didn't search a database for the closest image.&lt;/p&gt;

&lt;p&gt;It started with randomness.&lt;/p&gt;

&lt;p&gt;Then, over dozens to hundreds of steps, it made small corrections guided by your prompt.&lt;/p&gt;

&lt;p&gt;A Ferrari emerged from static.&lt;/p&gt;

&lt;p&gt;A castle emerged from noise.&lt;/p&gt;

&lt;p&gt;Meaning emerged from chaos.&lt;/p&gt;

&lt;p&gt;And perhaps that's the most surprising part of all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you had been working on image generation in 2014, would you have tried to teach a model how to draw images—or would it ever have occurred to you to teach it how to remove noise instead?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Building Truly Cross-Platform Claude Code Hooks with Go, Bash, PowerShell, WSL, and Git-Bash</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Sun, 31 May 2026 16:57:59 +0000</pubDate>
      <link>https://dev.to/shrsv/building-truly-cross-platform-claude-code-hooks-with-go-bash-powershell-wsl-and-git-bash-1ceo</link>
      <guid>https://dev.to/shrsv/building-truly-cross-platform-claude-code-hooks-with-go-bash-powershell-wsl-and-git-bash-1ceo</guid>
      <description>&lt;p&gt;Claude Code hooks are powerful. They let you intercept tool execution, enforce policies, run validations, collect telemetry, or integrate external systems before and after Claude performs actions.&lt;/p&gt;

&lt;p&gt;Unfortunately, the moment you try to distribute hooks to real developers, you run into a problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some developers use Linux&lt;/li&gt;
&lt;li&gt;Some use macOS&lt;/li&gt;
&lt;li&gt;Some use Windows PowerShell&lt;/li&gt;
&lt;li&gt;Some use Git-Bash&lt;/li&gt;
&lt;li&gt;Some use WSL&lt;/li&gt;
&lt;li&gt;Some use combinations of all of the above&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple shell script quickly turns into a compatibility nightmare.&lt;/p&gt;

&lt;p&gt;After experimenting with several approaches, I arrived at a surprisingly effective pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use thin platform-specific wrappers whose only job is downloading and launching a Go binary. Put all real logic inside the Go executable.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This gives you the convenience of native hooks while keeping the implementation portable, testable, and maintainable.&lt;/p&gt;

&lt;p&gt;Let's walk through the architecture.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Cross-Platform Hook Problem
&lt;/h1&gt;

&lt;p&gt;Suppose you build a hook that validates commands before Claude executes them.&lt;/p&gt;

&lt;p&gt;The naive implementation might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

python validate.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Looks fine until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python isn't installed&lt;/li&gt;
&lt;li&gt;The user runs PowerShell&lt;/li&gt;
&lt;li&gt;The user runs Git-Bash&lt;/li&gt;
&lt;li&gt;The user runs WSL&lt;/li&gt;
&lt;li&gt;Path handling differs&lt;/li&gt;
&lt;li&gt;Quoting rules differ&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now you have:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;validate.sh
validate.ps1
validate.py
requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;And eventually:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;works-on-my-machine/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The problem isn't Claude.&lt;/p&gt;

&lt;p&gt;The problem is that shells, and the runtimes a script depends on, differ from one operating system and environment to the next.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Better Architecture
&lt;/h1&gt;

&lt;p&gt;Instead, think of the hook as a bootstrapper.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Claude Hook
     |
     v
Thin Wrapper
     |
     v
Download Go Binary (if needed)
     |
     v
Execute Go Binary
     |
     v
Actual Hook Logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The wrapper becomes extremely small.&lt;/p&gt;

&lt;p&gt;The Go executable contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Policy checks&lt;/li&gt;
&lt;li&gt;Configuration loading&lt;/li&gt;
&lt;li&gt;JSON parsing&lt;/li&gt;
&lt;li&gt;API calls&lt;/li&gt;
&lt;li&gt;Logging&lt;/li&gt;
&lt;li&gt;Cross-platform filesystem access&lt;/li&gt;
&lt;li&gt;Everything else&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the binary exists locally, future hook invocations bypass installation entirely.&lt;/p&gt;
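
&lt;p&gt;To make the "policy checks" and "JSON parsing" items more concrete, here is a rough sketch of what the core of such a binary might look like. The payload shape (&lt;code&gt;tool_name&lt;/code&gt;, &lt;code&gt;tool_input.command&lt;/code&gt;), the &lt;code&gt;decide&lt;/code&gt; function, and the "rm -rf" rule are assumptions made for illustration, not a description of the exact hook contract.&lt;/p&gt;

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// hookInput models the JSON payload the hook is assumed to receive.
// The field names here are illustrative assumptions.
type hookInput struct {
    ToolName  string `json:"tool_name"`
    ToolInput struct {
        Command string `json:"command"`
    } `json:"tool_input"`
}

// decide reports whether the command is allowed, plus a reason when it is not.
func decide(payload []byte) (bool, string) {
    in := new(hookInput)
    if err := json.Unmarshal(payload, in); err != nil {
        return false, "unreadable hook payload: " + err.Error()
    }
    // Hypothetical policy: refuse recursive force-deletes.
    if strings.Contains(in.ToolInput.Command, "rm -rf") {
        return false, "blocked by policy: rm -rf"
    }
    return true, ""
}

func main() {
    sample := []byte(`{"tool_name":"Bash","tool_input":{"command":"rm -rf /tmp/build"}}`)
    allowed, reason := decide(sample)
    fmt.Println(allowed, reason)
}
```

&lt;p&gt;A real hook would read the payload from standard input instead of a hardcoded sample and report its decision in whatever way the hook contract expects. Keeping the decision in a pure function like this makes it easy to unit test on any platform.&lt;/p&gt;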
&lt;h1&gt;
  
  
  Bootstrapping on First Run
&lt;/h1&gt;

&lt;p&gt;The wrapper checks whether the executable exists.&lt;/p&gt;

&lt;p&gt;If not:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect platform&lt;/li&gt;
&lt;li&gt;Download correct binary&lt;/li&gt;
&lt;li&gt;Make executable if needed&lt;/li&gt;
&lt;li&gt;Run binary&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example release layout:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;releases/
├── hook-linux-amd64
├── hook-linux-arm64
├── hook-darwin-amd64
├── hook-darwin-arm64
├── hook-windows-amd64.exe
└── hook-windows-arm64.exe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A GitHub Releases page works well for hosting these binaries.&lt;/p&gt;

&lt;p&gt;Example Bash wrapper:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="nv"&gt;HOOK_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.claude-hooks"&lt;/span&gt;
&lt;span class="nv"&gt;BIN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOOK_DIR&lt;/span&gt;&lt;span class="s2"&gt;/hook"&lt;/span&gt;

&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOOK_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;curl &lt;span class="nt"&gt;-L&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      https://example.com/hook-linux-amd64 &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

    &lt;span class="nb"&gt;chmod&lt;/span&gt; +x &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BIN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$@&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The wrapper is tiny and almost never changes.&lt;/p&gt;
&lt;h1&gt;
  
  
  Supporting PowerShell
&lt;/h1&gt;

&lt;p&gt;Windows users deserve first-class support.&lt;/p&gt;

&lt;p&gt;PowerShell wrapper:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$HookDir&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;USERPROFILE&lt;/span&gt;&lt;span class="s2"&gt;\.claude-hooks"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$Binary&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HookDir&lt;/span&gt;&lt;span class="s2"&gt;\hook.exe"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="n"&gt;New-Item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-ItemType&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Directory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Force&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$HookDir&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Out-Null&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="kr"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Test-Path&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$Binary&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;

    &lt;/span&gt;&lt;span class="n"&gt;Invoke-WebRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;-Uri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/hook-windows-amd64.exe"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;`
&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;-OutFile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$Binary&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$Binary&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="bp"&gt;$args&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="kr"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$LASTEXITCODE&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The important detail is that PowerShell's quoting rules differ significantly from Bash's.&lt;/p&gt;

&lt;p&gt;By moving all logic into Go, you avoid maintaining duplicate implementations.&lt;/p&gt;
&lt;h1&gt;
  
  
  Handling WSL and Git-Bash
&lt;/h1&gt;

&lt;p&gt;This is where things become interesting.&lt;/p&gt;

&lt;p&gt;Many Windows developers don't actually run PowerShell.&lt;/p&gt;

&lt;p&gt;They run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WSL Ubuntu&lt;/li&gt;
&lt;li&gt;Git-Bash&lt;/li&gt;
&lt;li&gt;MSYS2&lt;/li&gt;
&lt;li&gt;Cygwin&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each environment reports itself differently.&lt;/p&gt;

&lt;p&gt;A good Go bootstrapper can detect them.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;detectEnvironment&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOOS&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;"windows"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"native"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"WSL_DISTRO_NAME"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"wsl"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"MSYSTEM"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"git-bash"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"powershell"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can then adjust behavior.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;detectEnvironment&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"wsl"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="c"&gt;// Linux paths&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"git-bash"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="c"&gt;// Mixed Windows/POSIX paths&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"powershell"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="c"&gt;// Native Windows paths&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This is dramatically easier than maintaining separate shell implementations.&lt;/p&gt;
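
&lt;p&gt;As one example of what the "git-bash" branch might need, here is a hedged sketch that converts an MSYS-style path such as &lt;code&gt;/c/Users/me&lt;/code&gt; into a native Windows path. The helper name &lt;code&gt;toWindowsPath&lt;/code&gt; is hypothetical, and it deliberately handles only the simple drive-letter case; real Git-Bash setups have more cases (mount points, UNC paths) than this covers.&lt;/p&gt;

```go
package main

import (
    "fmt"
    "strings"
)

// toWindowsPath converts an MSYS-style path such as /c/Users/me into
// C:\Users\me. Paths that do not start with a single drive letter
// segment are returned unchanged.
func toWindowsPath(p string) string {
    // For "/c/Users/me", parts is ["", "c", "Users/me"].
    parts := strings.SplitN(p, "/", 3)
    if len(parts) != 3 || parts[0] != "" || len(parts[1]) != 1 ||
        !strings.Contains("abcdefghijklmnopqrstuvwxyz", strings.ToLower(parts[1])) {
        return p
    }
    drive := strings.ToUpper(parts[1])
    return drive + `:\` + strings.ReplaceAll(parts[2], "/", `\`)
}

func main() {
    fmt.Println(toWindowsPath("/c/Users/me/.claude-hooks")) // C:\Users\me\.claude-hooks
    fmt.Println(toWindowsPath("/home/me/.claude-hooks"))    // unchanged
}
```

&lt;p&gt;Note that a single-letter top-level directory would also be treated as a drive, which is one reason to treat this as a starting point and not a complete solution.&lt;/p&gt;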
&lt;h1&gt;
  
  
  Cross-Platform Techniques in Go
&lt;/h1&gt;

&lt;p&gt;The Go standard library already solves most portability issues.&lt;/p&gt;
&lt;h2&gt;
  
  
  Use filepath
&lt;/h2&gt;

&lt;p&gt;Avoid hardcoded separators.&lt;/p&gt;

&lt;p&gt;Bad:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;home&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"/config/settings.json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Good:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"config"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s"&gt;"settings.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Use os.UserHomeDir
&lt;/h2&gt;

&lt;p&gt;Avoid platform assumptions.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;home&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UserHomeDir&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Works on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux&lt;/li&gt;
&lt;li&gt;macOS&lt;/li&gt;
&lt;li&gt;Windows&lt;/li&gt;
&lt;li&gt;WSL&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Use os.Executable
&lt;/h2&gt;

&lt;p&gt;Finding your own binary location:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;exe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Executable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Useful when loading bundled resources.&lt;/p&gt;
&lt;h2&gt;
  
  
  Detect Operating System
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOOS&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"windows"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="c"&gt;// Windows&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"linux"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="c"&gt;// Linux&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="s"&gt;"darwin"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="c"&gt;// macOS&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  Detect Architecture
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOARCH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Common values include:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amd64
arm64
386
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Useful for selecting downloads.&lt;/p&gt;
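
&lt;p&gt;For instance, the release file names from the example layout earlier can be derived from these two values. This is a small sketch; &lt;code&gt;assetName&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;hook-OS-ARCH&lt;/code&gt; naming scheme is just the one used in this article's example.&lt;/p&gt;

```go
package main

import (
    "fmt"
    "runtime"
)

// assetName builds a release file name such as hook-linux-amd64,
// adding the .exe suffix for Windows builds.
func assetName(goos, goarch string) string {
    name := fmt.Sprintf("hook-%s-%s", goos, goarch)
    if goos == "windows" {
        name += ".exe"
    }
    return name
}

func main() {
    // Name for the machine this program is running on.
    fmt.Println(assetName(runtime.GOOS, runtime.GOARCH))
    fmt.Println(assetName("windows", "arm64")) // hook-windows-arm64.exe
}
```

&lt;p&gt;If a platform has no matching release asset, the launcher should fail with a clear message instead of downloading the wrong binary.&lt;/p&gt;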
&lt;h1&gt;
  
  
  Full Example Downloader
&lt;/h1&gt;

&lt;p&gt;A minimal download helper for the launcher:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"io"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;download&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dest&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Copy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Combined with:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOOS&lt;/span&gt;
&lt;span class="n"&gt;runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GOARCH&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;you can dynamically fetch the correct binary.&lt;/p&gt;
&lt;h1&gt;
  
  
  Why This Pattern Scales Better
&lt;/h1&gt;

&lt;p&gt;The biggest benefit isn't portability.&lt;/p&gt;

&lt;p&gt;It's maintainability.&lt;/p&gt;

&lt;p&gt;Without this pattern:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hook.sh
hook.ps1
hook.py
hook.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With this pattern:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hook.sh      (tiny)
hook.ps1     (tiny)

hook-go/
    all logic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The wrappers rarely change.&lt;/p&gt;

&lt;p&gt;The Go binary evolves independently.&lt;/p&gt;

&lt;p&gt;Testing becomes easier.&lt;/p&gt;

&lt;p&gt;Distribution becomes easier.&lt;/p&gt;

&lt;p&gt;Versioning becomes easier.&lt;/p&gt;

&lt;p&gt;And most importantly, you stop fighting shell differences.&lt;/p&gt;
&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;Many engineering teams start by writing Claude hooks as shell scripts because it feels fast.&lt;/p&gt;

&lt;p&gt;That works for one machine.&lt;/p&gt;

&lt;p&gt;The moment multiple operating systems enter the picture, the complexity grows rapidly.&lt;/p&gt;

&lt;p&gt;A small bootstrap wrapper plus a Go executable gives you a surprisingly robust deployment model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bash support&lt;/li&gt;
&lt;li&gt;PowerShell support&lt;/li&gt;
&lt;li&gt;Linux support&lt;/li&gt;
&lt;li&gt;macOS support&lt;/li&gt;
&lt;li&gt;Windows support&lt;/li&gt;
&lt;li&gt;WSL support&lt;/li&gt;
&lt;li&gt;Git Bash support&lt;/li&gt;
&lt;li&gt;Single implementation of business logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shell becomes a launcher.&lt;/p&gt;

&lt;p&gt;Go becomes the platform.&lt;/p&gt;

&lt;p&gt;That's usually the point where hook maintenance stops being a headache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How are you handling cross-platform automation today—shell scripts, Node.js, Python, or compiled binaries? I'd be interested to hear which approach has held up best as your team and environments grew.&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/HexmosTech" rel="noopener noreferrer"&gt;
        HexmosTech
      &lt;/a&gt; / &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;
        git-lrc
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Free, Micro AI Code Reviews That Run on Commit
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;p&gt;| &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.da.md" rel="noopener noreferrer"&gt;🇩🇰 Dansk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.es.md" rel="noopener noreferrer"&gt;🇪🇸 Español&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fa.md" rel="noopener noreferrer"&gt;🇮🇷 Farsi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fi.md" rel="noopener noreferrer"&gt;🇫🇮 Suomi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ja.md" rel="noopener noreferrer"&gt;🇯🇵 日本語&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.nn.md" rel="noopener noreferrer"&gt;🇳🇴 Norsk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.pt.md" rel="noopener noreferrer"&gt;🇵🇹 Português&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ru.md" rel="noopener noreferrer"&gt;🇷🇺 Русский&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.sq.md" rel="noopener noreferrer"&gt;🇦🇱 Shqip&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.zh.md" rel="noopener noreferrer"&gt;🇨🇳 中文&lt;/a&gt; |&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;img width="60" alt="git-lrc logo" src="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;/a&gt;
&lt;br&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;git-lrc&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Free, Micro AI Code Reviews That Run on Commit&lt;/h2&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.producthunt.com/products/git-lrc?embed=true&amp;amp;utm_source=badge-top-post-badge&amp;amp;utm_medium=badge&amp;amp;utm_campaign=badge-git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt" width="200" src="https://camo.githubusercontent.com/87bf2d4283c1e0aa99e254bd17fefb1c67c0c0d39300043a243a4aa633b6cecc/68747470733a2f2f6170692e70726f6475637468756e742e636f6d2f776964676574732f656d6265642d696d6167652f76312f746f702d706f73742d62616467652e7376673f706f73745f69643d31303739323632267468656d653d6c6967687426706572696f643d6461696c7926743d31373731373439313730383638"&gt;&lt;/a&gt;
&amp;nbsp;&lt;/p&gt;
&lt;br&gt;
&lt;a href="https://discord.gg/sGdnKwB3qq" rel="nofollow noopener noreferrer"&gt;
  &lt;img alt="Discord Community" src="https://camo.githubusercontent.com/b8f979318aaabc8dec512b9d4e6e2a12431fba3c8a3b8738e1a97a0722d4e4bf/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d436f6d6d756e6974792d3538363546323f6c6f676f3d646973636f7264266c6162656c436f6c6f723d7768697465"&gt;
&lt;/a&gt; &lt;a href="https://goreportcard.com/report/github.com/HexmosTech/git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Go Report Card" src="https://camo.githubusercontent.com/e74c0651c3ee9165a2ed01cb0f6842c494029960df30eb9c24cf622d3d21bf46/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f4865786d6f73546563682f6769742d6c7263"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/confidence.yml" rel="noopener noreferrer"&gt;&lt;img alt="confidence.yml" title="confidence.yml: Minimum confidence workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/confidence.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/status-doc-link-check.yml" rel="noopener noreferrer"&gt;&lt;img alt="status-doc-link-check.yml" title="status-doc-link-check.yml: Status document integrity workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/status-doc-link-check.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml" rel="noopener noreferrer"&gt;&lt;img alt="gitleaks.yml" title="gitleaks.yml: Secret scanning workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml" rel="noopener noreferrer"&gt;&lt;img alt="osv-scanner.yml" title="osv-scanner.yml: Dependency vulnerability scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml" rel="noopener noreferrer"&gt;&lt;img alt="govulncheck.yml" title="govulncheck.yml: Go vulnerability check" src="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a 
href="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml" rel="noopener noreferrer"&gt;&lt;img alt="semgrep.yml" title="semgrep.yml: Static analysis security scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a rel="noopener noreferrer" href="https://github.com/HexmosTech/git-lrc/./gfx/dependabot-enabled.svg"&gt;&lt;img alt="dependabot-enabled" title="dependabot-enabled: Automated dependency updates are enabled" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FHexmosTech%2Fgit-lrc%2FHEAD%2F.%2Fgfx%2Fdependabot-enabled.svg"&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;

&lt;p&gt;AI agents write code fast. They also &lt;em&gt;silently remove logic&lt;/em&gt;, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;git-lrc&lt;/code&gt; fixes this.&lt;/strong&gt; It hooks into &lt;code&gt;git commit&lt;/code&gt; and reviews every diff &lt;em&gt;before&lt;/em&gt; it lands. 60-second setup. Completely free.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;See It In Action&lt;/h2&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;See git-lrc catch serious security issues such as leaked credentials, expensive cloud
operations, and sensitive material in log statements&lt;/p&gt;
&lt;/blockquote&gt;

  
    
    &lt;span class="m-1"&gt;git-lrc-intro-60s.mp4&lt;/span&gt;
    
  

  

  


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;🤖 &lt;strong&gt;AI agents silently break things.&lt;/strong&gt; Code removed. Logic changed. Edge cases gone. You won't notice until production.&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Catch it before it ships.&lt;/strong&gt; AI-powered inline comments show you &lt;em&gt;exactly&lt;/em&gt; what changed and what looks wrong.&lt;/li&gt;
&lt;li&gt;🔁 &lt;strong&gt;Build a&lt;/strong&gt;…&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Lean4 Might Be the Missing Piece in AI: Why Theorem Provers Are Suddenly Everywhere</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Sat, 30 May 2026 17:23:37 +0000</pubDate>
      <link>https://dev.to/shrsv/lean4-might-be-the-missing-piece-in-ai-why-theorem-provers-are-suddenly-everywhere-3b7l</link>
      <guid>https://dev.to/shrsv/lean4-might-be-the-missing-piece-in-ai-why-theorem-provers-are-suddenly-everywhere-3b7l</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Most discussions about AI focus on larger models, larger datasets, and larger GPUs.&lt;/p&gt;

&lt;p&gt;But there is an uncomfortable reality that every engineer building production AI systems eventually runs into:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LLMs can produce convincing answers, but they cannot guarantee correctness.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Ask an LLM to write code, reason about a distributed system, derive a mathematical formula, or analyze a security protocol. The result might be brilliant. It might also be subtly wrong.&lt;/p&gt;

&lt;p&gt;The problem isn't intelligence.&lt;/p&gt;

&lt;p&gt;The problem is verification.&lt;/p&gt;

&lt;p&gt;That is why a relatively obscure technology from the world of formal methods is suddenly attracting attention:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lean4.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A theorem prover originally designed for mathematicians is increasingly being viewed as a way to build AI systems that can not only generate answers, but actually prove that those answers are correct.&lt;/p&gt;

&lt;p&gt;Let's look at what Lean4 is, how it works, and why some researchers believe theorem provers may become a critical layer in future AI systems.&lt;/p&gt;

&lt;h1&gt;
  
  
  The Fundamental Problem: LLMs Don't Know What's True
&lt;/h1&gt;

&lt;p&gt;Large language models operate by predicting likely sequences of tokens.&lt;/p&gt;

&lt;p&gt;That sounds obvious, but the implications are important.&lt;/p&gt;

&lt;p&gt;When ChatGPT generates a response, it isn't checking whether a statement is true.&lt;/p&gt;

&lt;p&gt;It is generating text that statistically resembles text associated with the prompt.&lt;/p&gt;

&lt;p&gt;Consider a simple coding example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
               &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Looks reasonable.&lt;/p&gt;

&lt;p&gt;But there is a subtle bug.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;is sorted, yet the function returns &lt;code&gt;False&lt;/code&gt; because it uses &lt;code&gt;&amp;lt;&lt;/code&gt; instead of &lt;code&gt;&amp;lt;=&lt;/code&gt;.&lt;/p&gt;
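
&lt;p&gt;To make the difference concrete, here is a small sketch that puts the strict comparison next to a non-decreasing one. The name &lt;code&gt;is_sorted_strict&lt;/code&gt; is only an illustrative label for the version above, rewritten in an equivalent form.&lt;/p&gt;

```python
def is_sorted_strict(arr):
    # Same behaviour as the version in the text: any equal or descending
    # neighbouring pair counts as a violation.
    return not any(a >= b for a, b in zip(arr, arr[1:]))


def is_sorted(arr):
    # Non-decreasing order: only a larger element placed before a smaller
    # one counts as a violation, so equal neighbours are allowed.
    return not any(a > b for a, b in zip(arr, arr[1:]))


strict_result = is_sorted_strict([1, 1, 2, 3])
fixed_result = is_sorted([1, 1, 2, 3])
```

&lt;p&gt;A test suite without duplicate values would pass for both functions, which is why the bug can go unnoticed.&lt;/p&gt;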

&lt;p&gt;Many tests might pass.&lt;/p&gt;

&lt;p&gt;A code reviewer might miss it.&lt;/p&gt;

&lt;p&gt;An LLM might confidently explain why the implementation is correct.&lt;/p&gt;

&lt;p&gt;None of these establish correctness.&lt;/p&gt;

&lt;p&gt;Testing can show the presence of bugs.&lt;/p&gt;

&lt;p&gt;It cannot prove the absence of bugs.&lt;/p&gt;

&lt;p&gt;That distinction is what theorem proving is about.&lt;/p&gt;
&lt;h1&gt;
  
  
  What Exactly Is Lean4?
&lt;/h1&gt;

&lt;p&gt;Lean4 is two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A programming language&lt;/li&gt;
&lt;li&gt;A theorem prover&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The theorem prover part is the interesting piece.&lt;/p&gt;

&lt;p&gt;Instead of writing code and then testing it, you describe properties that must always hold.&lt;/p&gt;

&lt;p&gt;Lean then requires a mathematical proof that those properties are true.&lt;/p&gt;

&lt;p&gt;For example, consider a simple theorem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For every natural number n, n + 0 = n&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In Lean this becomes something that must be formally proven.&lt;/p&gt;

&lt;p&gt;The system does not accept hand-wavy reasoning.&lt;/p&gt;

&lt;p&gt;Every logical step must be justified.&lt;/p&gt;

&lt;p&gt;If any step is invalid, the proof fails.&lt;/p&gt;
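
&lt;p&gt;As a rough illustration, the statement above could be written in Lean 4 along these lines. The theorem name &lt;code&gt;add_zero_example&lt;/code&gt; is made up for this sketch; the standard library already ships this fact as &lt;code&gt;Nat.add_zero&lt;/code&gt;.&lt;/p&gt;

```lean
-- For every natural number n, n + 0 = n.
-- `rfl` succeeds because addition on Nat is defined by recursion on the
-- second argument, so `n + 0` reduces to `n` by definition.
theorem add_zero_example (n : Nat) : n + 0 = n := by
  rfl
```

&lt;p&gt;If the statement were false, no proof term would type-check and Lean would reject the theorem.&lt;/p&gt;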

&lt;p&gt;This is fundamentally different from traditional software validation.&lt;/p&gt;

&lt;p&gt;Traditional testing:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input A -&amp;gt; Pass
Input B -&amp;gt; Pass
Input C -&amp;gt; Pass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Formal proof:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For all valid inputs:
    Property P always holds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The theorem checker verifies the proof mechanically.&lt;/p&gt;

&lt;p&gt;No intuition.&lt;/p&gt;

&lt;p&gt;No hidden assumptions.&lt;/p&gt;

&lt;p&gt;No trust.&lt;/p&gt;

&lt;p&gt;Only proof.&lt;/p&gt;
&lt;h1&gt;
  
  
  Why Lean Feels Different From Traditional Formal Methods
&lt;/h1&gt;

&lt;p&gt;Formal verification has existed for decades.&lt;/p&gt;

&lt;p&gt;Historically it suffered from two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tools were difficult to use&lt;/li&gt;
&lt;li&gt;Formalization was extremely expensive&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Lean changes the equation in several ways.&lt;/p&gt;

&lt;p&gt;First, it is designed as a practical programming language.&lt;/p&gt;

&lt;p&gt;Second, it has Mathlib, a large community-maintained library containing many thousands of formally verified definitions and theorems.&lt;/p&gt;

&lt;p&gt;Instead of proving everything from scratch, developers can build on existing verified foundations.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Natural numbers
Integers
Groups
Rings
Calculus
Probability
Linear algebra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Much of this already exists inside the ecosystem.&lt;/p&gt;

&lt;p&gt;This makes Lean feel closer to software engineering than traditional theorem proving systems.&lt;/p&gt;

&lt;p&gt;You are often composing verified building blocks rather than creating everything from first principles.&lt;/p&gt;
&lt;h1&gt;
  
  
  The AI + Lean Workflow Is What Makes This Interesting
&lt;/h1&gt;

&lt;p&gt;The most exciting development is not Lean itself.&lt;/p&gt;

&lt;p&gt;It's the combination of Lean and LLMs.&lt;/p&gt;

&lt;p&gt;Think about the typical AI workflow today:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt
    ↓
LLM
    ↓
Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now compare that with an emerging architecture:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Prompt
    ↓
LLM
    ↓
Candidate Solution
    ↓
Lean
    ↓
Verification
    ↓
Accepted / Rejected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The LLM becomes a generator.&lt;/p&gt;

&lt;p&gt;Lean becomes a verifier.&lt;/p&gt;

&lt;p&gt;This separation is powerful.&lt;/p&gt;

&lt;p&gt;Humans already work this way.&lt;/p&gt;

&lt;p&gt;A mathematician may invent a proof.&lt;/p&gt;

&lt;p&gt;A journal referee verifies it.&lt;/p&gt;

&lt;p&gt;An engineer may write code.&lt;/p&gt;

&lt;p&gt;Tests verify it.&lt;/p&gt;

&lt;p&gt;An architect proposes a design.&lt;/p&gt;

&lt;p&gt;Structural calculations verify it.&lt;/p&gt;

&lt;p&gt;The same pattern can apply to AI systems.&lt;/p&gt;

&lt;p&gt;Generation and verification become separate concerns.&lt;/p&gt;
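
&lt;p&gt;The generator/verifier split can be sketched in a few lines of Python. Everything here is a stand-in: &lt;code&gt;first_verified&lt;/code&gt;, &lt;code&gt;proposed&lt;/code&gt;, and &lt;code&gt;checks_out&lt;/code&gt; are hypothetical names, the list of candidates plays the role of the LLM, and a plain arithmetic check plays the role that Lean would play in a real pipeline.&lt;/p&gt;

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def first_verified(candidates: Iterable[T], verify: Callable[[T], bool]) -> Optional[T]:
    """Return the first candidate the verifier accepts, or None if all are rejected."""
    for candidate in candidates:
        if verify(candidate):
            return candidate
    return None


# Stand-in generator output: proposed answers to "what is 12 * 12?".
proposed = [124, 144, 154]


def checks_out(answer: int) -> bool:
    # Stand-in verifier: an exact check that does not trust the generator.
    return answer == 12 * 12


accepted = first_verified(proposed, checks_out)
```

&lt;p&gt;The generator is free to be wrong as often as it likes; only candidates that pass the verifier are ever returned.&lt;/p&gt;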
&lt;h1&gt;
  
  
  A Concrete Example: Finding Bugs Automatically
&lt;/h1&gt;

&lt;p&gt;Imagine an LLM generating a sorting algorithm.&lt;/p&gt;

&lt;p&gt;The desired property is:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;For any list L:

sort(L) returns:
    1. A permutation of L
    2. Elements in non-decreasing order
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;An LLM might generate:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;At first glance it appears to work.&lt;/p&gt;

&lt;p&gt;But duplicates disappear.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;becomes:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The algorithm violates the permutation property.&lt;/p&gt;
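
&lt;p&gt;The two properties can be spelled out as ordinary Python checks. This is only a runtime check on one input, which is much weaker than a proof, but it shows which property breaks. The helper names are illustrative, and &lt;code&gt;buggy_sort&lt;/code&gt; is the candidate from above under a different name.&lt;/p&gt;

```python
from collections import Counter


def is_permutation(xs, ys):
    """True when ys holds exactly the same elements as xs, duplicates included."""
    return Counter(xs) == Counter(ys)


def is_non_decreasing(xs):
    """True when no element is greater than the one that follows it."""
    return not any(a > b for a, b in zip(xs, xs[1:]))


def buggy_sort(xs):
    # Converting to a set silently drops duplicate values.
    return sorted(set(xs))


original = [1, 1, 2]
result = buggy_sort(original)
ordered_ok = is_non_decreasing(result)             # ordering property
permutation_ok = is_permutation(original, result)  # permutation property
```

&lt;p&gt;Here the ordering property holds while the permutation property fails, because one of the two &lt;code&gt;1&lt;/code&gt; values is gone.&lt;/p&gt;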

&lt;p&gt;A theorem prover can catch this immediately.&lt;/p&gt;

&lt;p&gt;The interesting part is that verification is not based on finding a counterexample through testing.&lt;/p&gt;

&lt;p&gt;The proof obligation itself fails.&lt;/p&gt;

&lt;p&gt;The algorithm cannot be proven correct.&lt;/p&gt;

&lt;p&gt;This is fundamentally stronger than conventional testing approaches.&lt;/p&gt;
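
&lt;p&gt;For a sense of what the proof obligation might look like, here is one possible way to state the specification in Lean 4. The name &lt;code&gt;SortSpec&lt;/code&gt; is hypothetical, and the sketch assumes a recent toolchain in which &lt;code&gt;List.Perm&lt;/code&gt; and &lt;code&gt;List.Pairwise&lt;/code&gt; are available without extra imports (older setups may need Batteries or Mathlib).&lt;/p&gt;

```lean
-- Hypothetical specification: a correct `sort` returns a permutation of
-- its input whose elements are in non-decreasing order.
def SortSpec (sort : List Nat → List Nat) : Prop :=
  ∀ xs : List Nat,
    List.Perm (sort xs) xs ∧ List.Pairwise (· ≤ ·) (sort xs)
```

&lt;p&gt;A function that drops duplicates cannot satisfy the permutation half of this statement, so any attempted proof of &lt;code&gt;SortSpec&lt;/code&gt; for it gets stuck.&lt;/p&gt;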
&lt;h1&gt;
  
  
  Why This Matters Beyond Mathematics
&lt;/h1&gt;

&lt;p&gt;Many people hear "theorem prover" and assume this is only useful for mathematicians.&lt;/p&gt;

&lt;p&gt;That is increasingly false.&lt;/p&gt;

&lt;p&gt;Formal verification is already used in areas such as:&lt;/p&gt;
&lt;h3&gt;
  
  
  Compilers
&lt;/h3&gt;

&lt;p&gt;CompCert, a C compiler verified with the Coq proof assistant, demonstrates that compiler correctness can be formally proven.&lt;/p&gt;
&lt;h3&gt;
  
  
  Cryptography
&lt;/h3&gt;

&lt;p&gt;Security protocols often rely on formal proofs.&lt;/p&gt;

&lt;p&gt;A tiny mistake can compromise billions of dollars.&lt;/p&gt;
&lt;h3&gt;
  
  
  Aerospace
&lt;/h3&gt;

&lt;p&gt;Flight control systems require exceptionally high confidence.&lt;/p&gt;
&lt;h3&gt;
  
  
  Finance
&lt;/h3&gt;

&lt;p&gt;Smart contracts and trading infrastructure can benefit from machine-checked guarantees.&lt;/p&gt;
&lt;h3&gt;
  
  
  AI Agents
&lt;/h3&gt;

&lt;p&gt;Agents increasingly perform actions instead of merely generating text.&lt;/p&gt;

&lt;p&gt;As autonomy increases, verification becomes more valuable.&lt;/p&gt;

&lt;p&gt;The more expensive a mistake becomes, the more attractive formal guarantees become.&lt;/p&gt;
&lt;h1&gt;
  
  
  The Bigger Picture: Probabilistic Intelligence + Deterministic Verification
&lt;/h1&gt;

&lt;p&gt;There is a tendency to think of theorem provers and LLMs as competing technologies.&lt;/p&gt;

&lt;p&gt;They're not.&lt;/p&gt;

&lt;p&gt;In many ways they complement each other.&lt;/p&gt;

&lt;p&gt;LLMs are excellent at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search&lt;/li&gt;
&lt;li&gt;Exploration&lt;/li&gt;
&lt;li&gt;Creativity&lt;/li&gt;
&lt;li&gt;Pattern matching&lt;/li&gt;
&lt;li&gt;Generating candidate solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Theorem provers are excellent at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verification&lt;/li&gt;
&lt;li&gt;Correctness&lt;/li&gt;
&lt;li&gt;Logical consistency&lt;/li&gt;
&lt;li&gt;Mathematical guarantees&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One generates.&lt;/p&gt;

&lt;p&gt;The other validates.&lt;/p&gt;

&lt;p&gt;A useful analogy is software development itself.&lt;/p&gt;

&lt;p&gt;We don't replace programmers with compilers.&lt;/p&gt;

&lt;p&gt;We use compilers to verify what programmers produce.&lt;/p&gt;

&lt;p&gt;Future AI systems may look similar:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM = Generator

Theorem Prover = Verifier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The combination is potentially far more powerful than either component alone.&lt;/p&gt;
&lt;h1&gt;
  
  
  Final Thoughts
&lt;/h1&gt;

&lt;p&gt;For years the AI industry has largely optimized for capability.&lt;/p&gt;

&lt;p&gt;Can the model write code?&lt;/p&gt;

&lt;p&gt;Can it solve math problems?&lt;/p&gt;

&lt;p&gt;Can it reason?&lt;/p&gt;

&lt;p&gt;Those are important questions.&lt;/p&gt;

&lt;p&gt;But another question is becoming increasingly important:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do we know the answer is actually correct?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Theorem provers such as Lean4 offer one possible answer.&lt;/p&gt;

&lt;p&gt;They provide a mechanism for transforming "the model thinks this is right" into "this has been formally verified."&lt;/p&gt;

&lt;p&gt;Whether Lean itself becomes dominant remains to be seen.&lt;/p&gt;

&lt;p&gt;But the broader idea—combining probabilistic generation with formal verification—feels less like a niche research direction and more like a plausible next step in the evolution of AI systems.&lt;/p&gt;

&lt;p&gt;What do you think?&lt;/p&gt;

&lt;p&gt;Will theorem provers become a standard component of future AI stacks, or will they remain specialized tools used only in high-assurance domains?&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/HexmosTech" rel="noopener noreferrer"&gt;
        HexmosTech
      &lt;/a&gt; / &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;
        git-lrc
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Free, Micro AI Code Reviews That Run on Commit
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;p&gt;| &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.da.md" rel="noopener noreferrer"&gt;🇩🇰 Dansk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.es.md" rel="noopener noreferrer"&gt;🇪🇸 Español&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fa.md" rel="noopener noreferrer"&gt;🇮🇷 Farsi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fi.md" rel="noopener noreferrer"&gt;🇫🇮 Suomi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ja.md" rel="noopener noreferrer"&gt;🇯🇵 日本語&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.nn.md" rel="noopener noreferrer"&gt;🇳🇴 Norsk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.pt.md" rel="noopener noreferrer"&gt;🇵🇹 Português&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ru.md" rel="noopener noreferrer"&gt;🇷🇺 Русский&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.sq.md" rel="noopener noreferrer"&gt;🇦🇱 Shqip&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.zh.md" rel="noopener noreferrer"&gt;🇨🇳 中文&lt;/a&gt; |&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;img width="60" alt="git-lrc logo" src="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;/a&gt;
&lt;br&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;git-lrc&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Free, Micro AI Code Reviews That Run on Commit&lt;/h2&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.producthunt.com/products/git-lrc?embed=true&amp;amp;utm_source=badge-top-post-badge&amp;amp;utm_medium=badge&amp;amp;utm_campaign=badge-git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt" width="200" src="https://camo.githubusercontent.com/87bf2d4283c1e0aa99e254bd17fefb1c67c0c0d39300043a243a4aa633b6cecc/68747470733a2f2f6170692e70726f6475637468756e742e636f6d2f776964676574732f656d6265642d696d6167652f76312f746f702d706f73742d62616467652e7376673f706f73745f69643d31303739323632267468656d653d6c6967687426706572696f643d6461696c7926743d31373731373439313730383638"&gt;&lt;/a&gt;
&amp;nbsp;&lt;/p&gt;
&lt;br&gt;
&lt;a href="https://discord.gg/sGdnKwB3qq" rel="nofollow noopener noreferrer"&gt;
  &lt;img alt="Discord Community" src="https://camo.githubusercontent.com/b8f979318aaabc8dec512b9d4e6e2a12431fba3c8a3b8738e1a97a0722d4e4bf/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d436f6d6d756e6974792d3538363546323f6c6f676f3d646973636f7264266c6162656c436f6c6f723d7768697465"&gt;
&lt;/a&gt; &lt;a href="https://goreportcard.com/report/github.com/HexmosTech/git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Go Report Card" src="https://camo.githubusercontent.com/e74c0651c3ee9165a2ed01cb0f6842c494029960df30eb9c24cf622d3d21bf46/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f4865786d6f73546563682f6769742d6c7263"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/confidence.yml" rel="noopener noreferrer"&gt;&lt;img alt="confidence.yml" title="confidence.yml: Minimum confidence workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/confidence.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/status-doc-link-check.yml" rel="noopener noreferrer"&gt;&lt;img alt="status-doc-link-check.yml" title="status-doc-link-check.yml: Status document integrity workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/status-doc-link-check.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml" rel="noopener noreferrer"&gt;&lt;img alt="gitleaks.yml" title="gitleaks.yml: Secret scanning workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml" rel="noopener noreferrer"&gt;&lt;img alt="osv-scanner.yml" title="osv-scanner.yml: Dependency vulnerability scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml" rel="noopener noreferrer"&gt;&lt;img alt="govulncheck.yml" title="govulncheck.yml: Go vulnerability check" src="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a 
href="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml" rel="noopener noreferrer"&gt;&lt;img alt="semgrep.yml" title="semgrep.yml: Static analysis security scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a rel="noopener noreferrer" href="https://github.com/HexmosTech/git-lrc/./gfx/dependabot-enabled.svg"&gt;&lt;img alt="dependabot-enabled" title="dependabot-enabled: Automated dependency updates are enabled" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FHexmosTech%2Fgit-lrc%2FHEAD%2F.%2Fgfx%2Fdependabot-enabled.svg"&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;

&lt;p&gt;AI agents write code fast. They also &lt;em&gt;silently remove logic&lt;/em&gt;, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;git-lrc&lt;/code&gt; fixes this.&lt;/strong&gt; It hooks into &lt;code&gt;git commit&lt;/code&gt; and reviews every diff &lt;em&gt;before&lt;/em&gt; it lands. 60-second setup. Completely free.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;See It In Action&lt;/h2&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;See git-lrc catch serious security issues such as leaked credentials, expensive cloud
operations, and sensitive material in log statements&lt;/p&gt;
&lt;/blockquote&gt;

  
    
    &lt;span class="m-1"&gt;git-lrc-intro-60s.mp4&lt;/span&gt;
    
  

  

  


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;🤖 &lt;strong&gt;AI agents silently break things.&lt;/strong&gt; Code removed. Logic changed. Edge cases gone. You won't notice until production.&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Catch it before it ships.&lt;/strong&gt; AI-powered inline comments show you &lt;em&gt;exactly&lt;/em&gt; what changed and what looks wrong.&lt;/li&gt;
&lt;li&gt;🔁 &lt;strong&gt;Build a&lt;/strong&gt;…&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I Pointed Chrome's Prompt API at a 1.25 Million Character Memoir, and It Got Interesting Fast</title>
      <dc:creator>Shrijith Venkatramana</dc:creator>
      <pubDate>Fri, 29 May 2026 18:36:12 +0000</pubDate>
      <link>https://dev.to/shrsv/i-pointed-chromes-prompt-api-at-a-125-million-character-memoir-and-it-got-interesting-fast-2069</link>
      <guid>https://dev.to/shrsv/i-pointed-chromes-prompt-api-at-a-125-million-character-memoir-and-it-got-interesting-fast-2069</guid>
      <description>&lt;p&gt;&lt;em&gt;Hello, I'm Shrijith Venkatramana. I'm building git-lrc, an AI code reviewer that runs on every commit. &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;Star Us&lt;/a&gt; to help devs discover the project. Do give it a try and share your feedback for improving the product.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;A straightforward engineering question: what happens when you feed a long book to an on-device language model in Chrome and start adjusting the parameters?&lt;/p&gt;

&lt;p&gt;To explore this, I built a small experiment called &lt;strong&gt;Gemini Nano Book Lab&lt;/strong&gt;: a Chrome extension sidepanel that uses Chrome’s built-in &lt;strong&gt;Prompt API&lt;/strong&gt; to answer questions about Richard Wagner’s &lt;em&gt;My Life&lt;/em&gt;, while also exposing some of the underlying mechanics.&lt;/p&gt;

&lt;p&gt;The response is only part of it. The experiment also captures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model download behavior
&lt;/li&gt;
&lt;li&gt;Retrieval cost
&lt;/li&gt;
&lt;li&gt;Time to first token
&lt;/li&gt;
&lt;li&gt;Context window pressure
&lt;/li&gt;
&lt;li&gt;Effects of different chunking strategies
&lt;/li&gt;
&lt;li&gt;Places where the API works well, and where its limits become obvious
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re an engineer interested in systems that have rough edges—and therefore teach you something—this is a useful area to explore.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Prompt API Is
&lt;/h2&gt;

&lt;p&gt;Chrome’s Prompt API is part of the browser’s built-in AI features. Instead of sending prompts to a cloud endpoint, a web app or extension can request an on-device language model session and prompt it locally.&lt;/p&gt;

&lt;p&gt;Resources:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://developer.chrome.com/docs/ai/prompt-api" rel="noopener noreferrer"&gt;The Prompt API&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.chrome.com/docs/ai/session-management" rel="noopener noreferrer"&gt;Session management best practices&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.chrome.com/docs/ai/structured-output-for-prompt-api" rel="noopener noreferrer"&gt;Structured output for the Prompt API&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.chrome.com/docs/ai/understand-built-in-model-management" rel="noopener noreferrer"&gt;Built-in model management in Chrome&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.chrome.com/docs/ai/debug-gemini-nano" rel="noopener noreferrer"&gt;Debug Gemini Nano&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Core capabilities:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local inference
&lt;/li&gt;
&lt;li&gt;Streaming results
&lt;/li&gt;
&lt;li&gt;Availability check before session creation
&lt;/li&gt;
&lt;li&gt;Context usage measurement
&lt;/li&gt;
&lt;li&gt;Events like &lt;code&gt;contextoverflow&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;(In some environments) sampling parameters like temperature and top-k
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it more than a simple text box—it becomes an environment for experimentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Long Book?
&lt;/h2&gt;

&lt;p&gt;Long inputs expose the interesting problems. Short prompts hide a lot; a paragraph‑long demo can make any model look magical. A long corpus forces concrete decisions:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What chunk size works well?
&lt;/li&gt;
&lt;li&gt;Should chunks overlap?
&lt;/li&gt;
&lt;li&gt;How many chunks should you retrieve?
&lt;/li&gt;
&lt;li&gt;What latency comes from retrieval vs. prompting?
&lt;/li&gt;
&lt;li&gt;How much of the context window is spent just staging evidence?
&lt;/li&gt;
&lt;li&gt;When does naive retrieval return technically relevant but semantically weak passages?
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the first version, I used Project Gutenberg’s plain text of Richard Wagner’s &lt;em&gt;My Life&lt;/em&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.gutenberg.org/cache/epub/5197/pg5197.txt" rel="noopener noreferrer"&gt;Project Gutenberg source text&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gave a corpus of &lt;strong&gt;219,572 words&lt;/strong&gt; and &lt;strong&gt;1,251,663 characters&lt;/strong&gt; in the run shown below.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;The demo is a &lt;strong&gt;Chrome extension sidepanel&lt;/strong&gt; rather than a normal web app. This was a deliberate choice. Extensions provide a more reliable built‑in AI surface in Chrome, and they allow a compact benchmark UI where controls, streamed output, and telemetry live side by side.&lt;/p&gt;

&lt;p&gt;The extension has three tasks:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Load and normalize the Wagner text.
&lt;/li&gt;
&lt;li&gt;Chunk it and retrieve relevant excerpts for a user question.
&lt;/li&gt;
&lt;li&gt;Create a Prompt API session, run the prompt, and record timings and context usage.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fald1lcx2p1iu6nxigs5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fald1lcx2p1iu6nxigs5h.png" alt="Gemini Nano Book Lab full view" width="799" height="580"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Design
&lt;/h2&gt;

&lt;p&gt;The benchmark starts simple. I didn’t begin with embeddings, vector databases, or sophisticated semantic retrieval. I wanted a baseline that is easy to reason about.&lt;/p&gt;

&lt;p&gt;The first‑version controls are:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;question text
&lt;/li&gt;
&lt;li&gt;chunk size (characters)
&lt;/li&gt;
&lt;li&gt;chunk overlap (characters)
&lt;/li&gt;
&lt;li&gt;number of retrieved chunks
&lt;/li&gt;
&lt;li&gt;streaming on/off
&lt;/li&gt;
&lt;li&gt;temperature and top‑k (when exposed)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This provides enough surface to see the tradeoffs without making the experiment too complex.&lt;/p&gt;
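&lt;p&gt;To get a feel for how the chunk size and overlap controls interact on a corpus of this size, a rough estimate helps. The sketch below assumes hard cuts at exactly the chunk size, with no snapping to paragraph or sentence boundaries, so real counts will differ somewhat. The function name &lt;code&gt;estimateChunkCount&lt;/code&gt; and the sample settings (2,000 characters with 200 of overlap) are illustrative assumptions, not values from the experiment.&lt;/p&gt;

```typescript
// Rough estimate only: assumes every chunk is cut at exactly chunkSize
// characters and each new chunk starts `overlap` characters before the
// previous one ended. Boundary snapping would change the numbers.
function estimateChunkCount(totalChars: number, chunkSize: number, overlap: number) {
    if (!(totalChars > 0)) {
        return { chunks: 0, repeatedChars: 0 }
    }
    if (chunkSize >= totalChars) {
        return { chunks: 1, repeatedChars: 0 }
    }
    const stride = chunkSize - overlap // characters of new text per chunk
    if (!(stride > 0)) {
        throw new RangeError('overlap must be smaller than chunkSize')
    }
    const chunks = Math.ceil((totalChars - overlap) / stride)
    // Every chunk after the first re-reads `overlap` characters.
    return { chunks, repeatedChars: (chunks - 1) * overlap }
}

// Corpus size from the article; the chunk settings are hypothetical.
const estimate = estimateChunkCount(1251663, 2000, 200)
console.log(estimate) // { chunks: 696, repeatedChars: 139000 }
```

&lt;p&gt;With these hypothetical settings, roughly 11% of the corpus is stored twice. That is the cost overlap pays for keeping passages intact across chunk edges.&lt;/p&gt;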

&lt;h2&gt;
  
  
  Prompt API Availability and Session Creation
&lt;/h2&gt;

&lt;p&gt;The first question isn’t “What should I prompt?” but “Is the model available here?”&lt;/p&gt;

&lt;p&gt;Here’s the availability check that runs before any session is created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getPromptApi&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;PromptApi&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;maybePromptApi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;globalThis&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;globalThis&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;LanguageModel&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;PromptApi&lt;/span&gt;
    &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;LanguageModel&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;maybePromptApi&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;inspectPromptApi&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PromptApiCapabilities&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;promptApi&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getPromptApi&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;promptApi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;supported&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;availability&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unavailable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;statusMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LanguageModel is unavailable in this browser context. Use a recent Chrome build with the Prompt API enabled.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;defaultTemperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;maxTemperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;defaultTopK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;maxTopK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;availability&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;promptApi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;availability&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;expectedInputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="na"&gt;expectedOutputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;languages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;supported&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;availability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;statusMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nx"&gt;availability&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;available&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
                &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Prompt API ready.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
                &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Model can be downloaded or is unavailable on this device.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;defaultTemperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;maxTemperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;defaultTopK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;maxTopK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This may not look exciting, but it matters. One early lesson with built‑in AI is that &lt;strong&gt;availability is part of your product surface&lt;/strong&gt;. Hardware support, model download state, and browser support determine whether your app works at all.&lt;/p&gt;
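&lt;p&gt;The wrapper above folds every state other than &lt;code&gt;'available'&lt;/code&gt; into one message. Chrome’s documentation describes four availability values, so a UI can say something more specific for each. A minimal sketch, assuming those four strings; the helper name &lt;code&gt;describeAvailability&lt;/code&gt; and the message wording are hypothetical, not taken from the extension.&lt;/p&gt;

```typescript
// Availability states as described in Chrome's Prompt API documentation.
type Availability = 'unavailable' | 'downloadable' | 'downloading' | 'available'

// Hypothetical helper: one status message per state instead of a single fallback.
function describeAvailability(availability: Availability): string {
    switch (availability) {
        case 'available':
            return 'Prompt API ready.'
        case 'downloadable':
            return 'Model is supported but must be downloaded first.'
        case 'downloading':
            return 'Model download is in progress.'
        default:
            // 'unavailable', or any value this sketch does not know about
            return 'Model is not available on this device or configuration.'
    }
}

console.log(describeAvailability('downloading')) // Model download is in progress.
```

&lt;p&gt;Separating the states matters for a benchmark UI, because a model that can still be downloaded calls for a different prompt to the user than a device that cannot run it at all.&lt;/p&gt;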
&lt;h2&gt;
  
  
  Chunking a Large Corpus
&lt;/h2&gt;

&lt;p&gt;After loading the book, I split it into overlapping chunks. The code tries to respect paragraph and sentence boundaries rather than slicing blindly at exactly &lt;code&gt;N&lt;/code&gt; characters.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;CorpusChunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeChunkSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeOverlap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;clampOverlap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;safeChunkSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;overlap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CorpusChunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;startOffset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;startOffset&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;desiredEnd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;startOffset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;safeChunkSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;endOffset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
            &lt;span class="nx"&gt;desiredEnd&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
                &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;
                &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;findBoundary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;startOffset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;desiredEnd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textSlice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;startOffset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;endOffset&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;textSlice&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;index&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`chunk-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;padStart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;textSlice&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;startOffset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;endOffset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;endOffset&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nx"&gt;startOffset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;endOffset&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;safeOverlap&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;startOffset&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Chunk size and overlap change the system’s behavior. Small chunks improve retrieval precision but can split related context apart. Large chunks preserve narrative structure but consume more of the context budget. Overlap protects passages that straddle a boundary, at the cost of repeated text and extra token pressure. Engineering often comes down to choosing which tradeoff you can accept.&lt;/p&gt;
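&lt;p&gt;The chunking code calls &lt;code&gt;clampOverlap&lt;/code&gt; and &lt;code&gt;findBoundary&lt;/code&gt; without showing them here. The sketch below is one plausible way to write them, not the extension’s actual implementation: the half-chunk overlap cap, the separator order, and the rule of only accepting boundaries in the second half of the window are all assumptions made for illustration.&lt;/p&gt;

```typescript
// Hypothetical helper: keep overlap non-negative and at most half the chunk,
// so every iteration of the chunking loop makes real forward progress.
function clampOverlap(chunkSize: number, overlap: number): number {
    const upper = Math.floor(chunkSize / 2)
    return Math.min(Math.max(0, Math.floor(overlap)), upper)
}

// Hypothetical helper: search backwards from desiredEnd for a paragraph
// break, then a sentence end, then a space. Only the second half of the
// window is searched so chunks do not shrink too much. If nothing is
// found, fall back to a hard cut at desiredEnd.
function findBoundary(text: string, startOffset: number, desiredEnd: number): number {
    const minEnd = startOffset + Math.floor((desiredEnd - startOffset) / 2)
    const tail = text.slice(minEnd, desiredEnd)
    for (const separator of ['\n\n', '. ', ' ']) {
        const position = tail.lastIndexOf(separator)
        if (position >= 0) {
            // Cut just after the separator so the next chunk starts on text.
            return minEnd + position + separator.length
        }
    }
    return desiredEnd
}

const sample = 'aaaa bbbb. cccc dddd'
const cut = findBoundary(sample, 0, 18)
console.log(JSON.stringify(sample.slice(0, cut))) // "aaaa bbbb. "
```

&lt;p&gt;Preferring paragraph breaks over sentence ends over spaces keeps most chunks readable on their own, which helps when inspecting retrieved excerpts by hand.&lt;/p&gt;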
&lt;h2&gt;
  
  
  Cheap Retrieval on Purpose
&lt;/h2&gt;

&lt;p&gt;The first retriever is lexical, not semantic. That keeps the failure modes visible. If retrieval is too smart too early, you skip an educational stage.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;rankChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CorpusChunk&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;maxChunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;RankedChunk&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryTokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;matchedTerms&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;scoreChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;queryTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nx"&gt;matchedTerms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;left&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;right&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;right&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;left&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxChunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This retriever scores each chunk by how many terms it shares with the question. It is fast, explainable, and flawed, which is exactly what I wanted for a baseline.&lt;/p&gt;
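&lt;p&gt;To make the idea of term overlap concrete, here is a minimal sketch of what such a scorer could look like. The names &lt;code&gt;tokenize&lt;/code&gt; and &lt;code&gt;overlapScore&lt;/code&gt; are hypothetical and are not the demo's actual &lt;code&gt;scoreChunk&lt;/code&gt;, which may weight terms differently.&lt;/p&gt;

```typescript
// Hypothetical helpers for illustration; the article's real scoreChunk may weight terms differently.
function tokenize(text: string): string[] {
    // Lowercase the text and keep runs of letters and digits as terms.
    return text.toLowerCase().match(/[a-z0-9]+/g) ?? []
}

function overlapScore(chunkText: string, queryTokens: string[]) {
    const chunkTerms = new Set(tokenize(chunkText))
    // Deduplicate query terms so a repeated word is not counted twice.
    const uniqueQueryTerms = Array.from(new Set(queryTokens))
    const matchedTerms = uniqueQueryTerms.filter((term) => chunkTerms.has(term))
    return { score: matchedTerms.length, matchedTerms }
}
```

&lt;p&gt;A scorer like this rewards any chunk that repeats the question's words, whether or not the chunk actually answers the question. That is the flaw a baseline is meant to expose.&lt;/p&gt;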
&lt;h2&gt;
  
  
  Measuring More Than the Final Answer
&lt;/h2&gt;

&lt;p&gt;The benchmark records more than whether the model answered correctly. It measures:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;corpus load time
&lt;/li&gt;
&lt;li&gt;chunking time
&lt;/li&gt;
&lt;li&gt;retrieval time
&lt;/li&gt;
&lt;li&gt;session setup time
&lt;/li&gt;
&lt;li&gt;context measurement time
&lt;/li&gt;
&lt;li&gt;prompt time
&lt;/li&gt;
&lt;li&gt;time to first streamed chunk
&lt;/li&gt;
&lt;li&gt;context usage before and after the run
&lt;/li&gt;
&lt;/ul&gt;
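&lt;p&gt;One way to collect per-stage durations like the ones listed above is a small wrapper around each step. This is only a sketch: &lt;code&gt;timeStage&lt;/code&gt; and &lt;code&gt;StageTimings&lt;/code&gt; are hypothetical names, and the demo may record its timings differently.&lt;/p&gt;

```typescript
// Hypothetical stage timer; the stage names passed in are placeholders, not the author's identifiers.
type StageTimings = { [stage: string]: number }

async function timeStage(timings: StageTimings, stage: string, run: () => unknown) {
    const start = performance.now()
    // Awaiting works for both synchronous and asynchronous stages.
    const result = await run()
    // Elapsed milliseconds are stored under the stage name.
    timings[stage] = performance.now() - start
    return result
}
```

&lt;p&gt;In this sketch the stage result is passed through untyped, and a stage that throws records no timing, so a real benchmark would likely want to handle failures explicitly.&lt;/p&gt;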

&lt;p&gt;This is the core flow:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;loadWagnerCorpus&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunkSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chunkOverlap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;selectedChunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;rankChunks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retrievedChunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createPromptSession&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nf"&gt;onDownloadProgress&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;downloadProgress&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;estimatedInputUsage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;measureContextUsage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;firstChunkMs&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;executePrompt&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;streaming&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;streaming&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;onChunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onChunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;At this point the demo becomes less a “chatbot” and more an instrument panel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wfhd961ssgv3vgysv24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wfhd961ssgv3vgysv24.png" alt="Telemetry panel close-up" width="419" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpekof2p9t0ui9u888jep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpekof2p9t0ui9u888jep.png" alt="Retrieved excerpts close-up" width="441" height="877"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What the Results Show
&lt;/h2&gt;

&lt;p&gt;In the run shown in the screenshots, the app reported approximately:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;corpus size: &lt;strong&gt;219,572 words / 1,251,663 characters&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;chunk count: &lt;strong&gt;477&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;retrieval time: &lt;strong&gt;8.7 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;session setup: &lt;strong&gt;7.5 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;prompt time: &lt;strong&gt;17,396 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;first chunk: &lt;strong&gt;7,226 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;total: &lt;strong&gt;32,766 ms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;context usage: &lt;strong&gt;3417 / 9216&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Several observations stand out.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Retrieval is essentially free compared to inference
&lt;/h3&gt;

&lt;p&gt;Lexical retrieval took &lt;strong&gt;8.7 ms&lt;/strong&gt;. That is roughly 2,000 times shorter than the &lt;strong&gt;17.4‑second&lt;/strong&gt; prompt time. For early‑stage RAG in the browser, the lesson is simple: before over‑optimizing retrieval, understand your inference costs. In this setup, retrieval is not the bottleneck. Prompting is.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Time to first token matters more than people think
&lt;/h3&gt;

&lt;p&gt;The first chunk arrived after about &lt;strong&gt;7.2 seconds&lt;/strong&gt;. That number shapes how the product feels. If the first token arrives quickly, the experience feels responsive. If it takes several seconds, users may wonder whether the app has hung or whether they asked for too much. A good benchmark should capture that moment, not just the final duration.&lt;/p&gt;
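&lt;p&gt;Capturing that moment can be as simple as noting the clock when the first streamed chunk arrives. The sketch below assumes a chunk callback similar in spirit to the &lt;code&gt;onChunk&lt;/code&gt; option shown earlier. &lt;code&gt;createFirstChunkTimer&lt;/code&gt; is a hypothetical helper, and the clock is injectable so the logic can be checked without a real model.&lt;/p&gt;

```typescript
// Hypothetical helper: records the delay until the first streamed chunk.
function createFirstChunkTimer(now: () => number = () => performance.now()) {
    const startedAt = now()
    let firstChunkMs: number | null = null
    let chunkCount = 0
    return {
        // Call this for every streamed chunk; only the first call sets firstChunkMs.
        onChunk(_chunk: string) {
            chunkCount += 1
            if (firstChunkMs === null) firstChunkMs = now() - startedAt
        },
        // firstChunkMs stays null if no chunk ever arrived.
        summary() {
            return { firstChunkMs, chunkCount, totalMs: now() - startedAt }
        },
    }
}
```

&lt;p&gt;Keeping &lt;code&gt;firstChunkMs&lt;/code&gt; separate from &lt;code&gt;totalMs&lt;/code&gt; lets the UI report both the wait before generation starts and the full duration.&lt;/p&gt;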
&lt;h3&gt;
  
  
  3. The context window is generous but not infinite
&lt;/h3&gt;

&lt;p&gt;The run used about &lt;strong&gt;3417&lt;/strong&gt; units of a &lt;strong&gt;9216&lt;/strong&gt;‑unit context window, roughly 37% of the budget. That sounds comfortable, but long‑form exploration can consume the budget quickly. If you increase chunk size, overlap, or retrieved chunk count, the window fills with evidence before the model has room to answer. That’s why the demo exposes chunk controls prominently.&lt;/p&gt;
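&lt;p&gt;The budgeting arithmetic is easy to sketch. The usage figures below are the ones from this run. The per-chunk cost and answer reserve in &lt;code&gt;extraChunksThatFit&lt;/code&gt; are hypothetical parameters that would need to be measured for a real configuration.&lt;/p&gt;

```typescript
function contextBudget(used: number, windowSize: number) {
    const remaining = Math.max(0, windowSize - used)
    // Percentage of the window already used, rounded to one decimal place.
    const percentUsed = Math.round((used / windowSize) * 1000) / 10
    return { remaining, percentUsed }
}

// Hypothetical per-chunk cost and answer reserve; measure real values before relying on this.
function extraChunksThatFit(remaining: number, unitsPerChunk: number, reserveForAnswer: number) {
    return Math.max(0, Math.floor((remaining - reserveForAnswer) / unitsPerChunk))
}

const budget = contextBudget(3417, 9216)
// budget.remaining is 5799 and budget.percentUsed is 37.1
```

&lt;p&gt;Reserving part of the window for the answer is a design choice in this sketch. Without it, retrieved evidence could crowd out the response entirely.&lt;/p&gt;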
&lt;h3&gt;
  
  
  4. Total time tells a bigger story than prompt time alone
&lt;/h3&gt;

&lt;p&gt;The total was about &lt;strong&gt;32.8 seconds&lt;/strong&gt;, roughly 15.4 seconds more than the prompt time alone. That gap covers real product behavior: corpus loading, chunking, preparation work, model readiness, UI update overhead, and one‑time costs that don’t appear if you only look at &lt;code&gt;prompt()&lt;/code&gt;. For engineers, this is an important shift: users experience the whole pipeline, not just the API call.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Limits Show Up Quickly
&lt;/h2&gt;

&lt;p&gt;The Prompt API is interesting not because it’s limitless, but because its limits are visible and teach you something. Here are the main ones I encountered.&lt;/p&gt;
&lt;h3&gt;
  
  
  Long context is still a budgeting problem
&lt;/h3&gt;

&lt;p&gt;You cannot stuff an entire million‑character book into a prompt. Even when the corpus lives locally, context remains scarce. That pushes you toward retrieval, chunking, and prompt construction strategies sooner than you might expect.&lt;/p&gt;
&lt;h3&gt;
  
  
  Lexical retrieval is fast but semantically clumsy
&lt;/h3&gt;

&lt;p&gt;The retrieved excerpts screenshot shows this clearly. Some selected chunks are genuinely relevant to the query “How does Wagner describe his early artistic ambitions?” Others were selected mostly because they contain overlapping words like “early”, “artistic”, or “ambitions”, not because they are the best narrative evidence. That is a useful failure mode, because it shows why better retrieval becomes necessary.&lt;/p&gt;
&lt;h3&gt;
  
  
  Availability is not guaranteed
&lt;/h3&gt;

&lt;p&gt;The Prompt API is not a universal browser primitive yet. It depends on Chrome support, device capability, model management, and the environment. Every serious app needs a plan for unsupported devices, first‑time model download, delayed readiness, and the possibility that the model is unavailable or removed.&lt;/p&gt;
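&lt;p&gt;A plan for those cases can start as a small decision table. The sketch below assumes the availability states that Chrome documented for the Prompt API at the time of writing (&lt;code&gt;"unavailable"&lt;/code&gt;, &lt;code&gt;"downloadable"&lt;/code&gt;, &lt;code&gt;"downloading"&lt;/code&gt;, &lt;code&gt;"available"&lt;/code&gt;). &lt;code&gt;planForAvailability&lt;/code&gt; is a hypothetical helper, and the status strings should be checked against the current documentation.&lt;/p&gt;

```typescript
// Status strings are an assumption based on Chrome's documented availability states; verify before use.
type AvailabilityStatus = "unavailable" | "downloadable" | "downloading" | "available"

function planForAvailability(status: AvailabilityStatus) {
    switch (status) {
        case "available":
            return { canPromptNow: true, action: "create a session and prompt" }
        case "downloadable":
            return { canPromptNow: false, action: "ask the user before starting the model download" }
        case "downloading":
            return { canPromptNow: false, action: "show download progress and wait" }
        case "unavailable":
            return { canPromptNow: false, action: "fall back to a non-AI or server-side path" }
    }
}
```

&lt;p&gt;Keeping this mapping as a pure function means each unsupported or delayed state can be exercised without a browser that ships the model.&lt;/p&gt;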
&lt;h3&gt;
  
  
  Streaming helps, but latency is still real
&lt;/h3&gt;

&lt;p&gt;Streaming makes the wait feel more humane after generation starts, but it does not remove the wait before generation starts. A slow first‑token experience remains an issue.&lt;/p&gt;
&lt;h3&gt;
  
  
  Browser AI is not the same as exact system telemetry
&lt;/h3&gt;

&lt;p&gt;In the current version, I can measure prompt timing and context usage cleanly. What I cannot claim cleanly is exact model memory consumption, the way I might with a dedicated server‑side runtime. Some metrics are authoritative; some are approximate. Good benchmarking should label the difference honestly.&lt;/p&gt;
&lt;h2&gt;
  
  
  What I Like About This API
&lt;/h2&gt;

&lt;p&gt;Even with those limits, building on a browser‑native AI surface has clear benefits. You ask the browser what is available. You create a session. You stream output. You inspect context pressure. You see download progress. You can build a real experiment around that.&lt;/p&gt;

&lt;p&gt;For an engineer, that means you can learn about product design, retrieval systems, latency, UI feedback, and model constraints all within one project.&lt;/p&gt;
&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Obvious and useful extensions:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare lexical retrieval against something semantic.
&lt;/li&gt;
&lt;li&gt;Run the same question across several chunk sizes side by side.
&lt;/li&gt;
&lt;li&gt;Test how structured output changes context usage.
&lt;/li&gt;
&lt;li&gt;Make failure cases first‑class, not hidden.
&lt;/li&gt;
&lt;li&gt;Add exportable benchmark traces so results can be compared over time.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This becomes less about whether the model answered, and more about &lt;strong&gt;why this configuration behaved the way it did&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The Prompt API made me think less about “AI features” and more about systems behavior under constraints. That is why this experiment felt worth building. The model answered a question about Wagner—fine. But the more interesting outcome was watching the browser become a measurable inference environment with its own quirks, bottlenecks, and product tradeoffs.&lt;/p&gt;

&lt;p&gt;If you are early in your engineering journey, this is the kind of project I would recommend: one that looks like a demo from a distance, but up close turns into a lesson about architecture. And that is usually where the real learning starts.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Feedback and contributions are welcome! It's online, source-available, and ready for anyone to use.&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/HexmosTech" rel="noopener noreferrer"&gt;
        HexmosTech
      &lt;/a&gt; / &lt;a href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;
        git-lrc
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Free, Micro AI Code Reviews That Run on Commit
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;p&gt;| &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.da.md" rel="noopener noreferrer"&gt;🇩🇰 Dansk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.es.md" rel="noopener noreferrer"&gt;🇪🇸 Español&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fa.md" rel="noopener noreferrer"&gt;🇮🇷 Farsi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.fi.md" rel="noopener noreferrer"&gt;🇫🇮 Suomi&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ja.md" rel="noopener noreferrer"&gt;🇯🇵 日本語&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.nn.md" rel="noopener noreferrer"&gt;🇳🇴 Norsk&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.pt.md" rel="noopener noreferrer"&gt;🇵🇹 Português&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.ru.md" rel="noopener noreferrer"&gt;🇷🇺 Русский&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.sq.md" rel="noopener noreferrer"&gt;🇦🇱 Shqip&lt;/a&gt; | &lt;a href="https://github.com/HexmosTech/git-lrc/readme/README.zh.md" rel="noopener noreferrer"&gt;🇨🇳 中文&lt;/a&gt; |&lt;/p&gt;
&lt;br&gt;
&lt;br&gt;
&lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;img width="60" alt="git-lrc logo" src="https://camo.githubusercontent.com/948c8f2d5cf41b48985cd364d48c3a2dc9bfbfd42eab3e0a9a1b3e61f5f17ce3/68747470733a2f2f6865786d6f732e636f6d2f66726565646576746f6f6c732f7075626c69632f6c725f6c6f676f2e737667"&gt;&lt;/a&gt;
&lt;br&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;git-lrc&lt;/h1&gt;
&lt;/div&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Free, Micro AI Code Reviews That Run on Commit&lt;/h2&gt;
&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.producthunt.com/products/git-lrc?embed=true&amp;amp;utm_source=badge-top-post-badge&amp;amp;utm_medium=badge&amp;amp;utm_campaign=badge-git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt" width="200" src="https://camo.githubusercontent.com/87bf2d4283c1e0aa99e254bd17fefb1c67c0c0d39300043a243a4aa633b6cecc/68747470733a2f2f6170692e70726f6475637468756e742e636f6d2f776964676574732f656d6265642d696d6167652f76312f746f702d706f73742d62616467652e7376673f706f73745f69643d31303739323632267468656d653d6c6967687426706572696f643d6461696c7926743d31373731373439313730383638"&gt;&lt;/a&gt;
&amp;nbsp;&lt;/p&gt;
&lt;br&gt;
&lt;a href="https://discord.gg/sGdnKwB3qq" rel="nofollow noopener noreferrer"&gt;
  &lt;img alt="Discord Community" src="https://camo.githubusercontent.com/b8f979318aaabc8dec512b9d4e6e2a12431fba3c8a3b8738e1a97a0722d4e4bf/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d436f6d6d756e6974792d3538363546323f6c6f676f3d646973636f7264266c6162656c436f6c6f723d7768697465"&gt;
&lt;/a&gt; &lt;a href="https://goreportcard.com/report/github.com/HexmosTech/git-lrc" rel="nofollow noopener noreferrer"&gt;&lt;img alt="Go Report Card" src="https://camo.githubusercontent.com/e74c0651c3ee9165a2ed01cb0f6842c494029960df30eb9c24cf622d3d21bf46/68747470733a2f2f676f7265706f7274636172642e636f6d2f62616467652f6769746875622e636f6d2f4865786d6f73546563682f6769742d6c7263"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/confidence.yml" rel="noopener noreferrer"&gt;&lt;img alt="confidence.yml" title="confidence.yml: Minimum confidence workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/confidence.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/status-doc-link-check.yml" rel="noopener noreferrer"&gt;&lt;img alt="status-doc-link-check.yml" title="status-doc-link-check.yml: Status document integrity workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/status-doc-link-check.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml" rel="noopener noreferrer"&gt;&lt;img alt="gitleaks.yml" title="gitleaks.yml: Secret scanning workflow" src="https://github.com/HexmosTech/git-lrc/actions/workflows/gitleaks.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml" rel="noopener noreferrer"&gt;&lt;img alt="osv-scanner.yml" title="osv-scanner.yml: Dependency vulnerability scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/osv-scanner.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a href="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml" rel="noopener noreferrer"&gt;&lt;img alt="govulncheck.yml" title="govulncheck.yml: Go vulnerability check" src="https://github.com/HexmosTech/git-lrc/actions/workflows/govulncheck.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a 
href="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml" rel="noopener noreferrer"&gt;&lt;img alt="semgrep.yml" title="semgrep.yml: Static analysis security scan" src="https://github.com/HexmosTech/git-lrc/actions/workflows/semgrep.yml/badge.svg"&gt;&lt;/a&gt;&amp;nbsp;&lt;a rel="noopener noreferrer" href="https://github.com/HexmosTech/git-lrc/./gfx/dependabot-enabled.svg"&gt;&lt;img alt="dependabot-enabled" title="dependabot-enabled: Automated dependency updates are enabled" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2FHexmosTech%2Fgit-lrc%2FHEAD%2F.%2Fgfx%2Fdependabot-enabled.svg"&gt;&lt;/a&gt;
&lt;/div&gt;
&lt;br&gt;
&lt;br&gt;

&lt;p&gt;AI agents write code fast. They also &lt;em&gt;silently remove logic&lt;/em&gt;, change behavior, and introduce bugs -- without telling you. You often find out in production.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;git-lrc&lt;/code&gt; fixes this.&lt;/strong&gt; It hooks into &lt;code&gt;git commit&lt;/code&gt; and reviews every diff &lt;em&gt;before&lt;/em&gt; it lands. 60-second setup. Completely free.&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;See It In Action&lt;/h2&gt;
&lt;/div&gt;
&lt;blockquote&gt;
&lt;p&gt;See git-lrc catch serious security issues such as leaked credentials, expensive cloud
operations, and sensitive material in log statements&lt;/p&gt;
&lt;/blockquote&gt;

  
    
    &lt;span class="m-1"&gt;git-lrc-intro-60s.mp4&lt;/span&gt;
    
  

  

  


&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why&lt;/h2&gt;

&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;🤖 &lt;strong&gt;AI agents silently break things.&lt;/strong&gt; Code removed. Logic changed. Edge cases gone. You won't notice until production.&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Catch it before it ships.&lt;/strong&gt; AI-powered inline comments show you &lt;em&gt;exactly&lt;/em&gt; what changed and what looks wrong.&lt;/li&gt;
&lt;li&gt;🔁 &lt;strong&gt;Build a&lt;/strong&gt;…&lt;/li&gt;
&lt;/ul&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/HexmosTech/git-lrc" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
