<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alberto Nieto</title>
    <description>The latest articles on DEV Community by Alberto Nieto (@albertocodes).</description>
    <link>https://dev.to/albertocodes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3803172%2Fa2d1083a-bdd7-433f-99b3-7091a36a8cd8.png</url>
      <title>DEV Community: Alberto Nieto</title>
      <link>https://dev.to/albertocodes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/albertocodes"/>
    <language>en</language>
    <item>
      <title>From one model to seven — what it took to make TurboQuant model-portable</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Wed, 01 Apr 2026 00:42:31 +0000</pubDate>
      <link>https://dev.to/albertocodes/from-one-model-to-seven-what-it-took-to-make-turboquant-model-portable-4fjc</link>
      <guid>https://dev.to/albertocodes/from-one-model-to-seven-what-it-took-to-make-turboquant-model-portable-4fjc</guid>
      <description>&lt;p&gt;A KV cache compression plugin that only works on one model is a demo, not a tool. turboquant-vllm v1.0.0 shipped four days ago with one validated architecture: Molmo2. v1.3.0 validates seven — Llama 3.1, Mistral 7B, Qwen2.5, Phi-3-mini, Phi-4, Gemma-2, and Gemma-3. The path between those two points was more interesting than the destination.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fused paged kernels (v1.2.0).&lt;/strong&gt; The original architecture decompressed KV cache from TQ4 to FP16 in HBM, then ran standard attention on the result. The new fused kernel reads compressed blocks directly from vLLM's page table, decompresses in SRAM, and computes attention in a single pass. HBM traffic: 1,160 → 136 bytes per token.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# One flag. Same as before.
&lt;/span&gt;&lt;span class="n"&gt;vllm&lt;/span&gt; &lt;span class="n"&gt;serve&lt;/span&gt; &lt;span class="n"&gt;meta&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;llama&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Llama&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;3.1&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="n"&gt;B&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;attention&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="n"&gt;CUSTOM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
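&lt;p&gt;The 136-byte figure falls out of the TQ4 block layout: 4-bit nibble-packed indices plus one fp32 norm per vector. A back-of-envelope sketch, assuming &lt;code&gt;head_dim=128&lt;/code&gt; and counting per attention head per layer (the 1,160-byte baseline presumably covers more than raw K/V reads, so only the compressed side is reconstructed here):&lt;/p&gt;

```python
# Bytes of compressed KV read per decoded token in the fused path,
# assuming head_dim=128, 4-bit packed indices, one fp32 norm per vector.
head_dim = 128
packed_indices = head_dim * 4 // 8    # two 4-bit indices per byte: 64 B
norm = 4                              # fp32 norm: 4 B
per_vector = packed_indices + norm    # 68 B per K or V vector
per_token = 2 * per_vector            # K plus V, per head per layer: 136 B
print(per_token)  # 136
```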



&lt;p&gt;&lt;strong&gt;Non-pow2 head dimensions (v1.3.0).&lt;/strong&gt; Triton's &lt;code&gt;tl.arange&lt;/code&gt; requires power-of-two ranges. Phi-3-mini has head_dim=96. Gemma has head_dim=256. All five Triton kernels needed pad-to-next-power-of-two with boundary masking. 23 new tests cover the three new dimension classes.&lt;/p&gt;
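&lt;p&gt;The pad-and-mask pattern is simple to state even though wiring it through five kernels was not: round the head dimension up to the next power of two for &lt;code&gt;tl.arange&lt;/code&gt;, then mask off the padding lanes on every load and store. A minimal sketch of the index math in plain Python (the Triton kernels themselves are not reproduced here):&lt;/p&gt;

```python
def next_pow2(n):
    # Smallest power of two that is at least n (n a positive int).
    return 2 ** (n - 1).bit_length()

def lane_mask(head_dim):
    # Boundary mask over the padded range: True for lanes holding real
    # data, False for padding lanes that must be ignored.
    padded = next_pow2(head_dim)
    return [lane in range(head_dim) for lane in range(padded)]

assert next_pow2(96) == 128      # Phi-3-mini pads 96 up to 128
assert next_pow2(256) == 256     # Gemma is already a power of two
assert sum(lane_mask(96)) == 96  # 96 real lanes, 32 masked out
```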

&lt;p&gt;&lt;strong&gt;Sliding window attention bypass (v1.3.0).&lt;/strong&gt; Gemma-2 and Gemma-3 mix global and sliding window attention layers. Compressing SWA layers breaks cache eviction. The fix: SWA layers bypass compression automatically via the &lt;code&gt;is_sliding&lt;/code&gt; attribute. Global layers compress normally.&lt;/p&gt;
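&lt;p&gt;The dispatch amounts to a per-layer check on &lt;code&gt;is_sliding&lt;/code&gt;. A hypothetical sketch (the attribute name comes from the post; the surrounding plumbing is invented for illustration):&lt;/p&gt;

```python
def store_kv(layer, kv_block, compress):
    # SWA layers bypass compression so sliding-window cache eviction
    # keeps working; global-attention layers compress normally.
    if getattr(layer, "is_sliding", False):
        return kv_block            # stored uncompressed
    return compress(kv_block)      # TQ4-compressed

class GlobalLayer:
    is_sliding = False

class SlidingLayer:
    is_sliding = True

assert store_kv(SlidingLayer(), "kv", lambda b: "tq4") == "kv"
assert store_kv(GlobalLayer(), "kv", lambda b: "tq4") == "tq4"
```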

&lt;p&gt;&lt;strong&gt;Verify CLI.&lt;/strong&gt; Check any model in thirty seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; turboquant_vllm.verify &lt;span class="nt"&gt;--model&lt;/span&gt; google/gemma-2-2b &lt;span class="nt"&gt;--bits&lt;/span&gt; 4
&lt;span class="c"&gt;# PASS — all layers, cosine 0.9951&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Design
&lt;/h2&gt;

&lt;p&gt;The fused kernel architecture was a prerequisite for everything else. Without it, model expansion would have multiplied a slow path — decompress-to-HBM on every decode step across more models means more wasted bandwidth. Fusing first meant each new model gets the fast path for free.&lt;/p&gt;

&lt;p&gt;The non-pow2 fix was not a config change. It was a kernel rewrite across five files, each with different padding constraints depending on whether the kernel reads keys, values, or both. The ~5–15% throughput penalty for non-pow2 dimensions is real and documented — but for head_dim=128 models (the majority), it's zero.&lt;/p&gt;

&lt;p&gt;The production hotfixes (v1.2.1, v1.2.2) are worth mentioning because they came from container benchmarking, not unit tests. Running TQ4 inside the vLLM container against real video clips surfaced OOM bugs that synthetic tests never would. Both patches landed within 24 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;turboquant-vllm[vllm]&amp;gt;&lt;span class="o"&gt;=&lt;/span&gt;1.3.0
vllm serve meta-llama/Llama-3.1-8B &lt;span class="nt"&gt;--attention-backend&lt;/span&gt; CUSTOM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify compression quality on any supported model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; turboquant_vllm.verify &lt;span class="nt"&gt;--model&lt;/span&gt; &amp;lt;model-id&amp;gt; &lt;span class="nt"&gt;--bits&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Validated models:&lt;/strong&gt; Molmo2-4B, Llama 3.1 8B, Mistral 7B, Qwen2.5-3B, Phi-3-mini, Phi-4, Gemma-2-2b, Gemma-3-4B-it. All pass at cosine ≥0.99.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmarks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VLM (Molmo2-4B, FP16 baseline): 3.76x KV compression&lt;/li&gt;
&lt;li&gt;Text-only (Llama 3.1 / Mistral, FP8 baseline): 1.88x KV capacity, lossless at temperature=0&lt;/li&gt;
&lt;li&gt;At 16K context: 6 concurrent requests vs. 3 for the baseline&lt;/li&gt;
&lt;/ul&gt;
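&lt;p&gt;Both headline ratios are consistent with a 68-byte TQ4 vector (64 bytes of packed 4-bit indices plus a 4-byte fp32 norm) at &lt;code&gt;head_dim=128&lt;/code&gt;:&lt;/p&gt;

```python
# Sanity check of the benchmark ratios against per-vector byte counts,
# assuming head_dim=128.
head_dim = 128
tq4 = head_dim * 4 // 8 + 4   # 64 B packed indices + 4 B fp32 norm = 68 B
fp16 = head_dim * 2           # 256 B per vector
fp8 = head_dim                # 128 B per vector

assert round(fp16 / tq4, 2) == 3.76   # the VLM number, FP16 baseline
assert round(fp8 / tq4, 2) == 1.88    # the text-only number, FP8 baseline
```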

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Upstream vLLM contribution (&lt;a href="https://github.com/vllm-project/vllm/issues/38171" rel="noopener noreferrer"&gt;vllm#38171&lt;/a&gt; — 49 upvotes)&lt;/li&gt;
&lt;li&gt;Flash Attention kernel fusion for multi-layer correctness&lt;/li&gt;
&lt;li&gt;VL-Cache stacking for multiplicative VLM savings&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://pypi.org/project/turboquant-vllm/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; | &lt;a href="https://alberto-codes.github.io/turboquant-vllm/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; | &lt;a href="https://github.com/Alberto-Codes/turboquant-vllm" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>vllm</category>
      <category>gpu</category>
      <category>triton</category>
    </item>
    <item>
      <title>Compressed VLM inference from a single Containerfile — turboquant-vllm v1.1</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Sat, 28 Mar 2026 17:56:54 +0000</pubDate>
      <link>https://dev.to/albertocodes/compressed-vlm-inference-from-a-single-containerfile-turboquant-vllm-v11-35bm</link>
      <guid>https://dev.to/albertocodes/compressed-vlm-inference-from-a-single-containerfile-turboquant-vllm-v11-35bm</guid>
      <description>&lt;p&gt;The hardest part of GPU inference isn't the model — it's the environment. CUDA versions, driver compatibility, pip dependency conflicts. You can have a working quantization plugin and still spend an hour getting it to run on a fresh machine.&lt;/p&gt;

&lt;p&gt;turboquant-vllm v1.1.0 ships a &lt;code&gt;Containerfile&lt;/code&gt; that eliminates that setup. It extends the official vLLM image, installs the TQ4 compression plugin from PyPI, and verifies the plugin entry point at build time — not at runtime when you're debugging a silent fallback to uncompressed attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed in v1.1
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Container support.&lt;/strong&gt; A single &lt;code&gt;Containerfile&lt;/code&gt; bakes turboquant-vllm into the official &lt;code&gt;vllm-openai&lt;/code&gt; image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Alberto-Codes/turboquant-vllm.git
&lt;span class="nb"&gt;cd &lt;/span&gt;turboquant-vllm
podman build &lt;span class="nt"&gt;-t&lt;/span&gt; vllm-turboquant &lt;span class="nt"&gt;-f&lt;/span&gt; infra/Containerfile.vllm &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then serve a vision-language model with compressed KV cache:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;podman run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--device&lt;/span&gt; nvidia.com/gpu&lt;span class="o"&gt;=&lt;/span&gt;all &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--shm-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8g &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  vllm-turboquant &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; allenai/Molmo2-8B &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attention-backend&lt;/span&gt; CUSTOM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One flag: &lt;code&gt;--attention-backend CUSTOM&lt;/code&gt;. That's it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation site.&lt;/strong&gt; Auto-generated API reference from docstrings, usage guides for vLLM, HuggingFace, and container deployment — including Quadlet examples for running as a systemd service. Deployed to GitHub Pages on every release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-layer quality tests.&lt;/strong&gt; 12 new cosine similarity tests verify compression fidelity at each of the 36 transformer layers, not just end-to-end output. This catches layer-specific precision degradation that whole-model tests miss.&lt;/p&gt;
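&lt;p&gt;The shape of such a test is easy to sketch: compare reference and compressed activations layer by layer, so a regression pinpoints the offending layer instead of surfacing as garbled end-to-end output. A toy illustration (not the project's actual test code):&lt;/p&gt;

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

reference = [[1.0, 2.0, 3.0], [0.5, 0.5, 1.0]]     # per-layer outputs
compressed = [[1.01, 1.99, 3.0], [0.5, 0.51, 0.99]]
for i, (ref, comp) in enumerate(zip(reference, compressed)):
    # min(...) == 0.99 holds exactly when similarity is at least 0.99.
    assert min(cosine(ref, comp), 0.99) == 0.99, f"layer {i} degraded"
```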

&lt;h2&gt;
  
  
  Why This Design
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;Containerfile&lt;/code&gt; is deliberately minimal — 11 lines. It installs from PyPI (not from source) and verifies the plugin entry point at build time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; docker.io/vllm/vllm-openai:v0.18.0&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; TURBOQUANT_VERSION=1.1.0&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="s2"&gt;"turboquant-vllm[vllm]==&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TURBOQUANT_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s2"&gt;import importlib.metadata; &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s2"&gt;eps = [e for e in importlib.metadata.entry_points(group='vllm.general_plugins') &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s2"&gt;       if e.name == 'tq4_backend']; &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s2"&gt;assert len(eps) == 1, 'TQ4 entry point not found'"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build-time verification matters because vLLM's plugin discovery is silent. If the entry point isn't registered, vLLM falls back to its default attention backend without any error. You'd serve uncompressed inference thinking you had 3.76x compression. The &lt;code&gt;assert&lt;/code&gt; in the Containerfile makes that failure loud and early.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;TURBOQUANT_VERSION&lt;/code&gt; build arg means you can pin or upgrade versions without editing the file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;Install from PyPI if you don't need the container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;turboquant-vllm[vllm]
vllm serve allenai/Molmo2-8B &lt;span class="nt"&gt;--attention-backend&lt;/span&gt; CUSTOM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or build the container for reproducible deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;podman build &lt;span class="nt"&gt;-t&lt;/span&gt; vllm-turboquant &lt;span class="nt"&gt;-f&lt;/span&gt; infra/Containerfile.vllm &lt;span class="nb"&gt;.&lt;/span&gt;
podman run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--device&lt;/span&gt; nvidia.com/gpu&lt;span class="o"&gt;=&lt;/span&gt;all &lt;span class="nt"&gt;-p&lt;/span&gt; 8000:8000 &lt;span class="se"&gt;\&lt;/span&gt;
  vllm-turboquant &lt;span class="nt"&gt;--model&lt;/span&gt; allenai/Molmo2-8B &lt;span class="nt"&gt;--attention-backend&lt;/span&gt; CUSTOM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify compression is active in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INFO [cuda.py:257] Using AttentionBackendEnum.CUSTOM backend.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Upstream vLLM contribution (&lt;a href="https://github.com/vllm-project/vllm/issues/38171" rel="noopener noreferrer"&gt;vllm#38171&lt;/a&gt; — 49 upvotes)&lt;/li&gt;
&lt;li&gt;Full Flash Attention-style kernel fusion for multi-layer correctness&lt;/li&gt;
&lt;li&gt;Stacking with token pruning (VL-Cache) for multiplicative VLM savings&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://pypi.org/project/turboquant-vllm/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; | &lt;a href="https://alberto-codes.github.io/turboquant-vllm/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; | &lt;a href="https://github.com/Alberto-Codes/turboquant-vllm" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>vllm</category>
      <category>gpu</category>
      <category>containers</category>
    </item>
    <item>
      <title>I shipped Google's TurboQuant as a vLLM plugin 72 hours after the paper — here's what nobody else tested</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Fri, 27 Mar 2026 22:57:34 +0000</pubDate>
      <link>https://dev.to/albertocodes/i-shipped-googles-turboquant-as-a-vllm-plugin-72-hours-after-the-paper-heres-what-nobody-else-473g</link>
      <guid>https://dev.to/albertocodes/i-shipped-googles-turboquant-as-a-vllm-plugin-72-hours-after-the-paper-heres-what-nobody-else-473g</guid>
      <description>&lt;p&gt;Google published &lt;a href="https://arxiv.org/abs/2504.19874" rel="noopener noreferrer"&gt;TurboQuant&lt;/a&gt; at ICLR 2026 — a technique that compresses transformer KV caches to 4 bits per coordinate with zero accuracy loss. The paper reports 5-6x memory reduction on H100 GPUs, tested on text models like Gemma and Mistral.&lt;/p&gt;

&lt;p&gt;I wanted to know: does it work on a &lt;strong&gt;vision-language model&lt;/strong&gt; processing &lt;strong&gt;video&lt;/strong&gt;? On a &lt;strong&gt;consumer GPU&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;72 hours later, &lt;code&gt;turboquant-vllm&lt;/code&gt; is on PyPI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;turboquant-vllm[vllm]
vllm serve allenai/Molmo2-8B &lt;span class="nt"&gt;--attention-backend&lt;/span&gt; CUSTOM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The plugin auto-registers via vLLM's entry point system. No code changes, no forking, no monkey-patching.&lt;/p&gt;

&lt;p&gt;For HuggingFace users:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DynamicCache&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;turboquant_vllm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CompressedDynamicCache&lt;/span&gt;

&lt;span class="n"&gt;cache&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DynamicCache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;compressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CompressedDynamicCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Pass cache (not wrapper) to model.generate()
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why Vision-Language Models Matter
&lt;/h2&gt;

&lt;p&gt;Every other TurboQuant implementation tests on text-only models with hundreds of tokens. But a 12-second video clip through Molmo2-4B produces ~11,000 visual tokens — 1.6 GB of KV cache on a 24 GB GPU.&lt;/p&gt;
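&lt;p&gt;The 1.6 GB figure is plausible from first principles. A rough FP16 sizing sketch (the config numbers here are illustrative assumptions, not read from the Molmo2 config):&lt;/p&gt;

```python
# Back-of-envelope FP16 KV cache size for ~11,000 visual tokens,
# assuming 36 layers, 8 KV heads, head_dim=128, 2 bytes per value.
layers, kv_heads, head_dim, fp16 = 36, 8, 128, 2
per_token = 2 * kv_heads * head_dim * fp16 * layers   # K and V, all layers
total_mib = 11_000 * per_token / 2**20

assert per_token == 147_456
assert round(total_mib) == 1547   # same order as the 1.6 GB in the text
```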

&lt;p&gt;That's 10x more memory, 10x more opportunities for precision bugs to compound across 36 transformer layers. The existing VLM compression literature (VL-Cache, Dynamic-LLaVA, ZipVL) is all token pruning — deciding which tokens to discard. TurboQuant compresses the tokens you keep. They're complementary approaches, and nobody had validated whether vector quantization survives the visual token regime.&lt;/p&gt;

&lt;p&gt;It does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Molmo2-4B on RTX 4090, 11K visual tokens from a Seinfeld video clip:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;TQ4 Compressed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;KV cache&lt;/td&gt;
&lt;td&gt;1,639 MiB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;435 MiB (3.76x)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output quality&lt;/td&gt;
&lt;td&gt;Detailed scene description&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Near-identical&lt;/strong&gt; (100+ tokens match)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decode overhead&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;1.78x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Molmo2-8B: same 3.76x ratio, correctly identifies all Seinfeld characters. Full 23-minute episode processed at 24 tok/s.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built Differently
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Plugin, not fork
&lt;/h3&gt;

&lt;p&gt;Other vLLM TurboQuant efforts are forks or monkey-patches. &lt;code&gt;turboquant-vllm&lt;/code&gt; uses vLLM's official plugin entry point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[project.entry-points."vllm.general_plugins"]&lt;/span&gt;
&lt;span class="py"&gt;tq4_backend&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"turboquant_vllm.vllm:register_tq4_backend"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Incremental dequantization
&lt;/h3&gt;

&lt;p&gt;The naive approach decompresses the full KV cache at every layer, every step — 3.36x overhead. Incremental dequantization decompresses only the single new token generated each step and appends it to a running buffer. Overhead drops to 1.78x. This isn't in Google's paper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-platform Triton
&lt;/h3&gt;

&lt;p&gt;Fused kernels run on both NVIDIA CUDA and AMD ROCm without code changes. 84/84 GPU tests pass on a Radeon 890M iGPU.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bugs Nobody Else Has Found
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FP16 norms fail at scale.&lt;/strong&gt; Works at 11,385 tokens, garbles output at 11,397 tokens. The 0.01% error per vector compounds across 36 layers. Always use fp32.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;QJL correction is invisible in standard attention.&lt;/strong&gt; The paper's Stage 2 (2-bit MSE + 1-bit QJL) wastes 1 bit of precision — standard &lt;code&gt;Q @ K^T&lt;/code&gt; can't use the correction. Full 3-bit MSE produces identical output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-layer precision drift in fused kernels.&lt;/strong&gt; A 0.023 cosine gap per layer between fp32 Triton and bf16 SDPA compounds to produce "pizza pizza pizza" at 36 layers. Flash Attention-style fusion needed.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Validation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;180+ tests, 9 test files, 95%+ coverage&lt;/li&gt;
&lt;li&gt;16 GPU experiments with documented failures&lt;/li&gt;
&lt;li&gt;Cross-platform: NVIDIA RTX 4090 + AMD Radeon 890M&lt;/li&gt;
&lt;li&gt;End-to-end: installed from PyPI into stock &lt;code&gt;vllm/vllm-openai:latest&lt;/code&gt; container&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Upstream contribution to vLLM (&lt;a href="https://github.com/vllm-project/vllm/issues/38171" rel="noopener noreferrer"&gt;issue #38171&lt;/a&gt;, 49 upvotes)&lt;/li&gt;
&lt;li&gt;Full Flash Attention fusion for the fused Triton kernels&lt;/li&gt;
&lt;li&gt;Stacking with VL-Cache-style token pruning for multiplicative VLM savings&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://pypi.org/project/turboquant-vllm/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; | &lt;a href="https://github.com/Alberto-Codes/turboquant-vllm/tree/main/docs" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; | &lt;a href="https://github.com/Alberto-Codes/turboquant-vllm" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>gpu</category>
    </item>
    <item>
      <title>I Implemented Google's TurboQuant and Tested It on a Vision-Language Model — Here's What the Paper Doesn't Tell You</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Thu, 26 Mar 2026 08:24:13 +0000</pubDate>
      <link>https://dev.to/albertocodes/i-implemented-googles-turboquant-and-tested-it-on-a-vision-language-model-heres-what-the-paper-22a9</link>
      <guid>https://dev.to/albertocodes/i-implemented-googles-turboquant-and-tested-it-on-a-vision-language-model-heres-what-the-paper-22a9</guid>
      <description>&lt;p&gt;Google published &lt;a href="https://arxiv.org/abs/2504.19874" rel="noopener noreferrer"&gt;TurboQuant&lt;/a&gt; at ICLR 2026 — a technique that compresses transformer KV caches to 3-4 bits per coordinate with zero accuracy loss. Their paper reports 5-6x memory reduction and 8x attention speedup on H100 GPUs, tested on text-only models like Gemma and Mistral.&lt;/p&gt;

&lt;p&gt;I wanted to know: does it work on a &lt;strong&gt;vision-language model&lt;/strong&gt; processing &lt;strong&gt;video&lt;/strong&gt;? On a &lt;strong&gt;consumer GPU&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;So I implemented it from the paper and ran it on Molmo2-4B analyzing Seinfeld clips (~11,000 visual tokens) on an RTX 4090.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;TQ4 Compressed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;KV cache&lt;/td&gt;
&lt;td&gt;1,639 MiB&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;435 MiB (3.76x)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output quality&lt;/td&gt;
&lt;td&gt;Detailed scene description&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Near-identical&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decode overhead&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;1.78x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model describes the same Seinfeld scene with the same visual details. Different phrasing on minor points, both equally valid.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Things the Paper Doesn't Tell You
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. 4-bit nibble packing beats 3-bit unpacked
&lt;/h3&gt;

&lt;p&gt;The paper focuses on 3-bit quantization (8 centroids). But 3-bit values cross byte boundaries — no existing Python/Triton implementation actually packs them. Every implementation I found stores 3-bit indices in 8-bit bytes, wasting 62.5% of storage.&lt;/p&gt;

&lt;p&gt;4-bit indices pack trivially into nibbles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Pack two 4-bit indices into one byte
&lt;/span&gt;&lt;span class="n"&gt;packed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;indices&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;indices&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Unpack
&lt;/span&gt;&lt;span class="n"&gt;high&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;packed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;packed&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="mh"&gt;0x0F&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And 16 centroids (4-bit) give ~97% cosine similarity vs ~95% for 8 centroids (3-bit). &lt;strong&gt;Better quality AND nearly double the compression&lt;/strong&gt; of unpacked 3-bit, with two lines of code.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Storage&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3-bit unpacked (uint8)&lt;/td&gt;
&lt;td&gt;132 B/block&lt;/td&gt;
&lt;td&gt;1.94x&lt;/td&gt;
&lt;td&gt;~95% cosine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4-bit nibble-packed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;68 B/block&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.76x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~97% cosine&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
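&lt;p&gt;The storage column follows directly from the block layout at &lt;code&gt;head_dim=128&lt;/code&gt;: one index per coordinate plus a 4-byte fp32 norm, against a 256-byte FP16 block:&lt;/p&gt;

```python
# Reproduce the table's storage and compression numbers from the layout.
head_dim, norm = 128, 4                  # 4 B fp32 norm per block
fp16_block = head_dim * 2                # 256 B uncompressed

unpacked_3bit = head_dim + norm          # one uint8 per 3-bit index: 132 B
packed_4bit = head_dim * 4 // 8 + norm   # two 4-bit indices per byte: 68 B

assert (unpacked_3bit, packed_4bit) == (132, 68)
assert round(fp16_block / unpacked_3bit, 2) == 1.94
assert round(fp16_block / packed_4bit, 2) == 3.76
```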

&lt;h3&gt;
  
  
  2. FP16 norms fail silently at scale
&lt;/h3&gt;

&lt;p&gt;I initially stored vector norms as float16 to save 2 bytes per vector. It worked fine on short sequences.&lt;/p&gt;

&lt;p&gt;Then I tested with an 11,397-token video clip and got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Output: "In the video,1.0 0 0 0 0 0 0 0 0 0 0 0..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model produced 4 correct tokens ("In the video,") then completely degenerated. The same prompt with 11,385 tokens (12 fewer) worked perfectly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; FP16 precision loss (~0.01% per vector) accumulated across 36 transformer layers, shifting attention logits at low-confidence decision boundaries. Token-by-token analysis showed the divergence at step 5 where the logit margin was &amp;lt; 0.5.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Float32 norms. The 2 extra bytes per vector barely affect the compression ratio (1.97x → 1.94x). No other TurboQuant implementation I've found documents this failure mode.&lt;/p&gt;
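&lt;p&gt;The size of the per-vector error matches fp16's precision: roughly 11 significand bits, so storing a norm in fp16 introduces a relative rounding error of up to about 2^-11 (~0.05%). A quick demonstration of the round-trip loss:&lt;/p&gt;

```python
import numpy as np

# Round-trip a vector norm through fp16 and measure the relative error.
rng = np.random.default_rng(0)
v = rng.standard_normal(128).astype(np.float32)
norm32 = float(np.linalg.norm(v))
norm16 = float(np.float16(norm32))       # what fp16 storage would keep
rel_err = abs(norm16 - norm32) / norm32

# Tiny for one vector (within fp16's ~2**-11 rounding), but these
# errors accumulate across 36 layers of attention.
assert min(rel_err, 5e-4) == rel_err
```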

&lt;h3&gt;
  
  
  3. The boring optimization wins
&lt;/h3&gt;

&lt;p&gt;I built a fused Triton kernel that computes &lt;code&gt;Q @ compressed_K^T&lt;/code&gt; directly from nibble-packed indices. It achieves &lt;strong&gt;17.8x speedup&lt;/strong&gt; on the micro-benchmark by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-rotating queries once (&lt;code&gt;q_rot = Q @ rotation_matrix^T&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Having the kernel do simple centroid lookups and dot products&lt;/li&gt;
&lt;li&gt;Eliminating the 128x128 rotation matmul from the inner loop&lt;/li&gt;
&lt;/ul&gt;
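&lt;p&gt;The pre-rotation step is safe because the rotation is orthogonal: rotating both queries and keys leaves every dot product unchanged, so the rotation can be applied to Q once up front instead of being undone for K inside the inner loop. A numerical check with a random orthogonal matrix:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
# Random orthogonal rotation from a QR decomposition.
rotation, _ = np.linalg.qr(rng.standard_normal((128, 128)))

Q = rng.standard_normal((4, 128))    # queries
K = rng.standard_normal((16, 128))   # keys (stands in for dequantized K)

q_rot = Q @ rotation.T               # done once, outside the kernel
k_rot = K @ rotation.T               # what the quantizer operates on

# Q R^T (K R^T)^T = Q R^T R K^T = Q K^T, since R^T R = I.
assert np.allclose(Q @ K.T, q_rot @ k_rot.T)
```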

&lt;p&gt;Sounds great. But when I wired it into all 36 layers of Molmo2, the output degenerated into "pizza pizza pizza pizza."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; The fused kernel computes in fp32, but the model expects bf16 attention behavior (via SDPA/FlashAttention). The 0.023 per-layer cosine gap between fp32 kernel output and bf16 SDPA output compounds catastrophically across 36 layers.&lt;/p&gt;

&lt;p&gt;The fix that actually worked: &lt;strong&gt;incremental dequantization&lt;/strong&gt;. Instead of decompressing the entire 11K-token cache at every layer on every decode step (the 3.36x overhead), decompress only the single new token and append it to a running buffer. Standard SDPA handles the attention.&lt;/p&gt;

&lt;p&gt;Overhead went from 3.36x to 1.78x. No custom kernels needed.&lt;/p&gt;
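&lt;p&gt;The decode-time pattern, as a sketch (names are illustrative, not the project's API):&lt;/p&gt;

```python
class IncrementalKV:
    """Keep a running decompressed buffer and dequantize only the newly
    appended token each decode step, instead of re-decompressing the
    entire cache at every layer."""

    def __init__(self, decompress):
        self.decompress = decompress
        self.buffer = []              # stands in for the running K/V tensor

    def append(self, compressed_token):
        # O(1) work per step: one token, not the whole 11K-token cache.
        self.buffer.append(self.decompress(compressed_token))
        return self.buffer            # standard SDPA attends over this

kv = IncrementalKV(decompress=lambda t: t * 2)
kv.append(1)
kv.append(2)
assert kv.buffer == [2, 4]
```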

&lt;h2&gt;
  
  
  The Fused Kernel Isn't Wasted
&lt;/h2&gt;

&lt;p&gt;The kernel is correct (1.0 cosine similarity on the micro-benchmark) and fast (17.8x). It just needs to be part of a full Flash Attention-style fusion — computing softmax and V multiplication inside the kernel, not just Q@K^T scores. That's a future project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation Details
&lt;/h2&gt;

&lt;p&gt;The full implementation is at &lt;strong&gt;&lt;a href="https://github.com/Alberto-Codes/turboquant-consumer" rel="noopener noreferrer"&gt;github.com/Alberto-Codes/turboquant-consumer&lt;/a&gt;&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core algorithm:&lt;/strong&gt; Lloyd-Max codebook solver, TurboQuantMSE (Stage 1), TurboQuantProd (Stage 2 with QJL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CompressedDynamicCache:&lt;/strong&gt; Drop-in KV cache wrapper with nibble packing and incremental dequant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fused Triton kernel:&lt;/strong&gt; Nibble unpack + centroid gather + GQA mapping&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark harness:&lt;/strong&gt; A/B testing CLI for any HuggingFace model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;62 tests&lt;/strong&gt; including long-sequence regression (36 layers, 1024 tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 experiment logs&lt;/strong&gt; with full results
&lt;/li&gt;
&lt;/ul&gt;
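&lt;p&gt;The nibble packing mentioned above stores two 4-bit codes per byte. A minimal sketch of the round trip, with hypothetical helper names rather than the library's API:&lt;/p&gt;

```python
def pack_nibbles(codes):
    """Pack pairs of 4-bit codes (0..15) into single bytes."""
    out = bytearray()
    for i in range(0, len(codes), 2):
        hi = codes[i]
        lo = codes[i + 1] if i + 1 != len(codes) else 0  # pad odd lengths
        out.append(hi * 16 + lo)  # high nibble in the top 4 bits
    return bytes(out)

def unpack_nibbles(packed):
    """Recover the 4-bit codes from packed bytes."""
    codes = []
    for b in packed:
        codes.append(b // 16)  # high nibble
        codes.append(b % 16)   # low nibble
    return codes

packed = pack_nibbles([3, 12, 15, 1])
print(list(packed), unpack_nibbles(packed))  # [60, 241] [3, 12, 15, 1]
```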

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick start&lt;/span&gt;
git clone https://github.com/Alberto-Codes/turboquant-consumer.git
&lt;span class="nb"&gt;cd &lt;/span&gt;turboquant-consumer &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv &lt;span class="nb"&gt;sync&lt;/span&gt;

&lt;span class="c"&gt;# Run tests&lt;/span&gt;
uv run pytest tests/ &lt;span class="nt"&gt;-v&lt;/span&gt;

&lt;span class="c"&gt;# Benchmark (requires GPU)&lt;/span&gt;
uv run python &lt;span class="nt"&gt;-m&lt;/span&gt; turboquant_consumer.benchmark &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--model&lt;/span&gt; allenai/Molmo2-4B &lt;span class="nt"&gt;--bits&lt;/span&gt; 4 &lt;span class="nt"&gt;--compressed&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--video&lt;/span&gt; /path/to/video.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Molmo2-8B validation&lt;/strong&gt; — the 8B model recognizes character names&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flash Attention-style fused kernel&lt;/strong&gt; — full softmax+V fusion for multi-layer correctness&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vLLM integration&lt;/strong&gt; — waiting for upstream cache backend API&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This is the first TurboQuant implementation validated on a vision-language model with video input. If you're working on KV cache compression, I'd love to hear about your experiences — especially if you've hit the fp16 norms issue.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>gpu</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Your docstrings are lying — docvet 1.14 catches them</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Mon, 23 Mar 2026 00:04:09 +0000</pubDate>
      <link>https://dev.to/albertocodes/your-docstrings-are-lying-docvet-114-catches-them-9a</link>
      <guid>https://dev.to/albertocodes/your-docstrings-are-lying-docvet-114-catches-them-9a</guid>
      <description>&lt;p&gt;A &lt;a href="https://arxiv.org/abs/2404.03114" rel="noopener noreferrer"&gt;2024 study by Macke &amp;amp; Doyle&lt;/a&gt; found that incorrect documentation degrades LLM task success by 22.6 percentage points. Missing documentation? No statistically significant effect. Your AI coding assistant performs &lt;em&gt;worse&lt;/em&gt; with wrong docs than with no docs at all.&lt;/p&gt;

&lt;p&gt;That's the gap docvet fills. And with v1.14, it closes the gap further — checking not just whether your docstrings exist, but whether they &lt;em&gt;match your code&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Parameter Agreement Checks
&lt;/h3&gt;

&lt;p&gt;Two new rules — &lt;code&gt;missing-param-in-docstring&lt;/code&gt; and &lt;code&gt;extra-param-in-docstring&lt;/code&gt; — compare function signatures against &lt;code&gt;Args:&lt;/code&gt; sections, parameter by parameter.&lt;/p&gt;

&lt;p&gt;You know the drill: you rename &lt;code&gt;retries&lt;/code&gt; to &lt;code&gt;max_retries&lt;/code&gt; across a refactor, update every call site, and forget the docstring. Now docvet catches it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/client.py:47: missing-param-in-docstring Function 'connect' has parameters not documented in Args: max_retries [required]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Handles positional-only params (PEP 570), keyword-only, &lt;code&gt;self&lt;/code&gt;/&lt;code&gt;cls&lt;/code&gt; exclusion, and both Google and Sphinx styles.&lt;/p&gt;
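&lt;p&gt;The core of a parameter-agreement check fits in a few lines. This is my own simplification, not docvet's implementation: compare the signature against names listed under a Google-style &lt;code&gt;Args:&lt;/code&gt; section.&lt;/p&gt;

```python
import inspect
import re

def args_in_docstring(doc):
    """Collect parameter names listed under a Google-style Args: section."""
    in_args = False
    names = set()
    for line in (doc or "").splitlines():
        if line.strip() == "Args:":
            in_args = True
            continue
        m = re.match(r"\s+(\w+)\s*(\(.*\))?:", line) if in_args else None
        if m:
            names.add(m.group(1))
    return names

def check(fn):
    """Return (missing-from-docstring, extra-in-docstring) name sets."""
    sig = {p for p in inspect.signature(fn).parameters if p not in ("self", "cls")}
    doc = args_in_docstring(fn.__doc__)
    return sig - doc, doc - sig

def connect(host, max_retries=3):
    """Connect.

    Args:
        host: Hostname.
        retries: Retry count.
    """

missing, extra = check(connect)
print(missing, extra)  # {'max_retries'} {'retries'}
```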

&lt;h3&gt;
  
  
  Reverse Enrichment Checks
&lt;/h3&gt;

&lt;p&gt;Before 1.14, docvet asked "did the docstring mention this behavior?" Now it also asks the reverse: "does the docstring &lt;em&gt;claim&lt;/em&gt; behavior the code doesn't exhibit?"&lt;/p&gt;

&lt;p&gt;Three new rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;extra-raises-in-docstring&lt;/code&gt; — documents exceptions the function never raises&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;extra-yields-in-docstring&lt;/code&gt; — documents yields in a non-generator&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;extra-returns-in-docstring&lt;/code&gt; — documents returns the function never makes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A docstring that claims &lt;code&gt;FileNotFoundError&lt;/code&gt; when the function never raises anything is a trap. Callers write &lt;code&gt;try/except&lt;/code&gt; blocks for phantom exceptions. AI tools generate defensive code for errors that can't happen.&lt;/p&gt;
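&lt;p&gt;Detecting a phantom &lt;code&gt;Raises:&lt;/code&gt; entry is a static comparison: exceptions named in the docstring versus &lt;code&gt;raise&lt;/code&gt; statements in the AST. A simplified sketch, not docvet's actual implementation:&lt;/p&gt;

```python
import ast
import textwrap

SOURCE = textwrap.dedent('''
    def label(code):
        """Look up a label.

        Raises:
            KeyError: If the code is unknown.
        """
        return LABELS.get(code, "unknown")  # .get() never raises KeyError
''')

def raised_names(fn_node):
    """Exception names the function body actually raises."""
    names = set()
    for node in ast.walk(fn_node):
        if isinstance(node, ast.Raise) and node.exc is not None:
            exc = node.exc
            if isinstance(exc, ast.Call):
                exc = exc.func
            if isinstance(exc, ast.Name):
                names.add(exc.id)
    return names

fn = ast.parse(SOURCE).body[0]
documented = {"KeyError"}  # as parsed from the Raises: section above
phantom = documented - raised_names(fn)
print(phantom)  # {'KeyError'} is documented but never raised
```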

&lt;h3&gt;
  
  
  Trivial Docstring Detection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get user.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This passes every presence check but adds zero information. The &lt;code&gt;trivial-docstring&lt;/code&gt; rule decomposes symbol names and summaries into word sets, filters stop words, and flags cases where the summary is just an echo of the name.&lt;/p&gt;
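&lt;p&gt;The word-set comparison behind the rule can be approximated like this; it's a simplification of the actual heuristic:&lt;/p&gt;

```python
import re

STOP_WORDS = {"the", "a", "an", "this", "that", "of", "to"}

def words(text):
    """Lowercased word set of a name or summary, stop words removed."""
    return {w.lower() for w in re.findall(r"[A-Za-z]+", text)} - STOP_WORDS

def is_trivial(name, summary):
    """The summary adds nothing if its words are a subset of the name's."""
    return words(summary) != set() and words(summary).issubset(words(name))

print(is_trivial("get_user", "Get user."))  # True: pure echo of the name
print(is_trivial("get_user", "Fetch the current user from the session."))  # False
```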

&lt;h3&gt;
  
  
  Also in This Release
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;missing-deprecation&lt;/strong&gt; — catches &lt;code&gt;warnings.warn(DeprecationWarning)&lt;/code&gt; or &lt;code&gt;@deprecated&lt;/code&gt; (PEP 702) without a deprecation notice in the docstring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;missing-return-type&lt;/strong&gt; (opt-in) — flags &lt;code&gt;Returns:&lt;/code&gt; sections with no type when there's no return annotation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;undocumented-init-params&lt;/strong&gt; (opt-in) — catches &lt;code&gt;__init__&lt;/code&gt; methods with parameters but no &lt;code&gt;Args:&lt;/code&gt; section&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Design note:&lt;/strong&gt; Reverse checks use &lt;code&gt;recommended&lt;/code&gt; severity (not &lt;code&gt;required&lt;/code&gt;) to account for delegation patterns. Two rules are opt-in for progressive adoption. &lt;a href="https://alberto.codes/blog/2026-03-22-when-docstrings-lie-your-ai-tools-pay-the-price" rel="noopener noreferrer"&gt;Full design tradeoffs in the blog post.&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;&lt;span class="nv"&gt;docvet&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;1.14.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Param agreement and reverse checks are on by default. Opt-in rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.docvet.enrichment]&lt;/span&gt;
&lt;span class="py"&gt;require-return-type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;require-init-params&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docvet check src/ &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Semantic verification — not just "did you document the parameters?" but "is what you said about them accurate?"&lt;/li&gt;
&lt;li&gt;Expanding multi-style support across all rule categories&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://pypi.org/project/docvet/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; | &lt;a href="https://alberto-codes.github.io/docvet/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; | &lt;a href="https://github.com/Alberto-Codes/docvet" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>documentation</category>
      <category>ai</category>
      <category>developertools</category>
    </item>
    <item>
      <title>How docvet learned to read Sphinx and NumPy docstrings</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Sun, 08 Mar 2026 16:27:53 +0000</pubDate>
      <link>https://dev.to/albertocodes/how-docvet-learned-to-read-sphinx-and-numpy-docstrings-2o6</link>
      <guid>https://dev.to/albertocodes/how-docvet-learned-to-read-sphinx-and-numpy-docstrings-2o6</guid>
      <description>&lt;h2&gt;
  
  
  The problem: one inspector, one language
&lt;/h2&gt;

&lt;p&gt;docvet checks whether your Python docstrings are present, complete, accurate, and renderable. Since v1.0, it's caught missing &lt;code&gt;Raises:&lt;/code&gt; sections, stale docstrings after code changes, broken mkdocs rendering, and more — 22 rules across five check modules.&lt;/p&gt;

&lt;p&gt;But it only understood Google-style.&lt;/p&gt;

&lt;p&gt;That's a problem, because a huge portion of the Python ecosystem uses something else:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sphinx/RST style&lt;/strong&gt; (&lt;code&gt;:param name:&lt;/code&gt;, &lt;code&gt;:returns:&lt;/code&gt;, &lt;code&gt;:raises:&lt;/code&gt;) — Django, Flask, SQLAlchemy, requests, boto3, CPython stdlib&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NumPy style&lt;/strong&gt; (underlined section headers) — NumPy, SciPy, pandas, scikit-learn, matplotlib&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you maintain a Django app or a scientific Python library, docvet's enrichment checks couldn't parse your docstrings. As of v1.13.0, that's fixed.&lt;/p&gt;

&lt;h2&gt;
  
  
  How style support works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It's a project-level setting, not auto-detection
&lt;/h3&gt;

&lt;p&gt;docvet doesn't guess your style per-file. You tell it once in &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.docvet]&lt;/span&gt;
&lt;span class="py"&gt;docstring-style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sphinx"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two valid options: &lt;code&gt;"google"&lt;/code&gt; (default) and &lt;code&gt;"sphinx"&lt;/code&gt;. The setting applies project-wide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The NumPy twist:&lt;/strong&gt; NumPy-style underlined headers are recognized automatically in the default Google mode. If your project uses NumPy-style docstrings, you don't need to change anything — it already works.&lt;/p&gt;

&lt;h3&gt;
  
  
  Sphinx/RST parsing
&lt;/h3&gt;

&lt;p&gt;When you set &lt;code&gt;docstring-style = "sphinx"&lt;/code&gt;, docvet maps field-list directives to the same internal section model used for Google-style:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sphinx directive&lt;/th&gt;
&lt;th&gt;Maps to&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;:param:&lt;/code&gt;, &lt;code&gt;:type:&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Args&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;:returns:&lt;/code&gt;, &lt;code&gt;:return:&lt;/code&gt;, &lt;code&gt;:rtype:&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Returns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;:raises:&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Raises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;:ivar:&lt;/code&gt;, &lt;code&gt;:cvar:&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Attributes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.. seealso::&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;See Also&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;.. code-block::&lt;/code&gt;, &lt;code&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Examples&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This means all existing enrichment rules apply — &lt;code&gt;missing-raises&lt;/code&gt;, &lt;code&gt;missing-examples&lt;/code&gt;, &lt;code&gt;missing-attributes&lt;/code&gt;, and the rest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5432&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Connection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Open a database connection.

    :param host: Hostname or IP address.
    :param port: Port number.
    :returns: An active connection object.
    :raises ConnectionError: If the host is unreachable.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;docvet checks this the same way it checks a Google-style docstring — are all raised exceptions documented? Are there parameters in the signature not covered by &lt;code&gt;:param:&lt;/code&gt; directives?&lt;/p&gt;
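&lt;p&gt;Extracting those field lists is mostly pattern matching. Under stated assumptions (a simplified regex, not docvet's parser), it looks roughly like:&lt;/p&gt;

```python
import re

DOC = """Open a database connection.

:param host: Hostname or IP address.
:param port: Port number.
:returns: An active connection object.
:raises ConnectionError: If the host is unreachable.
"""

# Map each field-list directive onto the section it feeds.
params = re.findall(r"^:param (\w+):", DOC, re.MULTILINE)
raises = re.findall(r"^:raises (\w+):", DOC, re.MULTILINE)

signature_params = ["host", "port"]
undocumented = [p for p in signature_params if p not in params]
print(params, raises, undocumented)  # ['host', 'port'] ['ConnectionError'] []
```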

&lt;p&gt;&lt;strong&gt;Auto-disabled rules:&lt;/strong&gt; Five enrichment rules that have no Sphinx/RST equivalent are automatically disabled in Sphinx mode: &lt;code&gt;require_yields&lt;/code&gt;, &lt;code&gt;require_receives&lt;/code&gt;, &lt;code&gt;require_warns&lt;/code&gt;, &lt;code&gt;require_other_parameters&lt;/code&gt;, and &lt;code&gt;prefer_fenced_code_blocks&lt;/code&gt;. You can override any of them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[tool.docvet]&lt;/span&gt;
&lt;span class="py"&gt;docstring-style&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"sphinx"&lt;/span&gt;

&lt;span class="nn"&gt;[tool.docvet.enrichment]&lt;/span&gt;
&lt;span class="py"&gt;require_yields&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c"&gt;# re-enable if your project uses a yields convention&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Griffe compatibility:&lt;/strong&gt; The griffe check is auto-skipped in Sphinx mode, since griffe's parser targets Google-style docstrings.&lt;/p&gt;

&lt;h3&gt;
  
  
  NumPy section recognition
&lt;/h3&gt;

&lt;p&gt;NumPy-style uses section headers with matching-length underlines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Apply transformation along an axis.

    Parameters
    ----------
    data : array_like
        Input data.
    axis : int, optional
        Axis along which to operate.

    Returns
    -------
    result : ndarray
        Transformed data.

    Raises
    ------
    ValueError
        If axis is out of bounds.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the default Google mode, docvet already recognizes these headers alongside Google colon-format headers. The section parser looks for 3+ consecutive dashes or equals signs on the line following a known header name. No config change needed.&lt;/p&gt;
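&lt;p&gt;That recognition rule is small enough to sketch; this is a simplification of the real parser:&lt;/p&gt;

```python
import re

KNOWN = {"Parameters", "Returns", "Raises", "Yields", "Examples"}

def is_numpy_header(line, next_line):
    """A known header name underlined by 3 or more dashes or equals signs."""
    return (
        line.strip() in KNOWN
        and re.fullmatch(r"[-=]{3,}", next_line.strip()) is not None
    )

print(is_numpy_header("Parameters", "----------"))  # True
print(is_numpy_header("Parameters", "--"))          # False: underline too short
print(is_numpy_header("Notes?", "-------"))         # False: unknown header
```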

&lt;p&gt;NumPy-specific sections like &lt;code&gt;Notes&lt;/code&gt;, &lt;code&gt;References&lt;/code&gt;, &lt;code&gt;Warnings&lt;/code&gt;, &lt;code&gt;Extended Summary&lt;/code&gt;, and &lt;code&gt;Methods&lt;/code&gt; are recognized for section boundary detection but don't have their own enforcement rules — they won't trigger findings.&lt;/p&gt;

&lt;h2&gt;
  
  
  New rules
&lt;/h2&gt;

&lt;h3&gt;
  
  
  missing-returns
&lt;/h3&gt;

&lt;p&gt;Functions that return a value should document what they return:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docvet flags this — return value is undocumented
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_total&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Sum all item prices.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rule skips cases where a return section doesn't make sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stubs&lt;/strong&gt; (&lt;code&gt;...&lt;/code&gt; or &lt;code&gt;pass&lt;/code&gt; body) — interface definitions, not implementations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;__init__&lt;/code&gt; methods&lt;/strong&gt; — return &lt;code&gt;None&lt;/code&gt; by convention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Properties&lt;/strong&gt; — the getter docstring describes the attribute, not a return value&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-raise-only functions&lt;/strong&gt; — they don't meaningfully "return"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Works in both Google and Sphinx modes.&lt;/p&gt;
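&lt;p&gt;The skip logic above is mechanical AST inspection. A condensed sketch (a hypothetical helper, not docvet's code) covering the stub, &lt;code&gt;__init__&lt;/code&gt;, and property cases:&lt;/p&gt;

```python
import ast
import textwrap

def needs_returns_section(src):
    """True if a function returns a value and isn't a stub/__init__/property."""
    fn = ast.parse(textwrap.dedent(src)).body[0]
    # Docstrings and `...` are Expr/Constant nodes; drop them to find stubs.
    stmts = [
        s for s in fn.body
        if not (isinstance(s, ast.Expr) and isinstance(s.value, ast.Constant))
    ]
    if not stmts or all(isinstance(s, ast.Pass) for s in stmts):
        return False  # stub: interface definition, not implementation
    if fn.name == "__init__":
        return False  # returns None by convention
    decorators = {d.id for d in fn.decorator_list if isinstance(d, ast.Name)}
    if "property" in decorators:
        return False  # getter docstring describes the attribute
    return any(
        isinstance(n, ast.Return) and n.value is not None for n in ast.walk(fn)
    )
```

The re-raise-only case is omitted here for brevity.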

&lt;h3&gt;
  
  
  overload-has-docstring
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;@typing.overload&lt;/code&gt; signatures describe distinct call patterns. They deserve docstrings explaining &lt;em&gt;when&lt;/em&gt; to use each variant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@overload&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="nd"&gt;@overload&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Parse input data into a dictionary.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;docvet flags the overload signatures missing docstrings. The existing &lt;code&gt;missing-docstring&lt;/code&gt; rule skips overloads to avoid double-reporting — each rule owns its scope cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;This release is about &lt;strong&gt;reach&lt;/strong&gt;. docvet's quality model — six layers from presence to rendering — applies regardless of docstring style. The rules don't change; the parser learned new dialects.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Style&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;Default, no config needed&lt;/td&gt;
&lt;td&gt;Original support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NumPy&lt;/td&gt;
&lt;td&gt;Default, no config needed&lt;/td&gt;
&lt;td&gt;Recognized automatically in Google mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sphinx/RST&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docstring-style = "sphinx"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One line in pyproject.toml&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're on a Django team, a scientific Python project, or any codebase using Sphinx-style docs, docvet is ready.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; docvet
docvet check &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;22 rules across five check modules. Zero runtime dependencies beyond typer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://alberto-codes.github.io/docvet/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; | &lt;a href="https://pypi.org/project/docvet/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>documentation</category>
      <category>devtools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Encrypt Google ADK Sessions in 5 Minutes</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Sat, 07 Mar 2026 07:09:05 +0000</pubDate>
      <link>https://dev.to/albertocodes/encrypt-google-adk-sessions-in-5-minutes-5b9f</link>
      <guid>https://dev.to/albertocodes/encrypt-google-adk-sessions-in-5-minutes-5b9f</guid>
      <description>&lt;p&gt;Google ADK stores everything your agent knows — tool calls, user messages, conversation context — in plaintext SQLite. If that makes you uncomfortable, this post fixes it.&lt;/p&gt;

&lt;p&gt;This is the recipe card. Ingredients, steps, done.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Python 3.12+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An existing ADK agent&lt;/strong&gt; using &lt;code&gt;DatabaseSessionService&lt;/code&gt; (or a willingness to create a minimal one)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No system libraries, no C compilation, no Docker. The library is pure Python with two runtime dependencies: &lt;code&gt;google-adk&lt;/code&gt; and &lt;code&gt;cryptography&lt;/code&gt;. A short ingredient list.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;adk-secure-sessions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv add adk-secure-sessions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import adk_secure_sessions; print('OK')"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Swap the Import
&lt;/h2&gt;

&lt;p&gt;Your agent code probably has something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before — ADK default (unencrypted):
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DatabaseSessionService&lt;/span&gt;

&lt;span class="n"&gt;session_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DatabaseSessionService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite+aiosqlite:///sessions.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# After — encrypted:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;adk_secure_sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EncryptedSessionService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FernetBackend&lt;/span&gt;

&lt;span class="n"&gt;session_service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;EncryptedSessionService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite+aiosqlite:///sessions.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;FernetBackend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-secret-passphrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two changes: the import line and the constructor. Everything else in your agent stays the same — &lt;code&gt;create_session&lt;/code&gt;, &lt;code&gt;get_session&lt;/code&gt;, &lt;code&gt;list_sessions&lt;/code&gt;, &lt;code&gt;delete_session&lt;/code&gt;, &lt;code&gt;append_event&lt;/code&gt; — the full ADK session lifecycle, identical behavior. The difference is what hits the disk.&lt;/p&gt;
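&lt;p&gt;To see that difference, here is the Fernet layer used directly via the &lt;code&gt;cryptography&lt;/code&gt; package. How adk-secure-sessions derives its key from the passphrase is an internal detail; this sketch just generates a random key:&lt;/p&gt;

```python
# Sketch of the Fernet layer only, using the `cryptography` package directly.
# Key derivation from a passphrase is an adk-secure-sessions internal; a
# random key stands in for it here.
import json
from cryptography.fernet import Fernet

f = Fernet(Fernet.generate_key())

state = {"patient_name": "Jane Doe", "api_key": "sk-secret-key-12345"}
ciphertext = f.encrypt(json.dumps(state).encode())  # this is what hits the disk

print(b"Jane Doe" in ciphertext)          # False: the token is opaque
print(json.loads(f.decrypt(ciphertext)))  # the original state dict
```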




&lt;h2&gt;
  
  
  Step 3: Use the Async Context Manager
&lt;/h2&gt;

&lt;p&gt;For proper connection cleanup, wrap the service in &lt;code&gt;async with&lt;/code&gt;. Here's a complete, runnable script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;adk_secure_sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EncryptedSessionService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FernetBackend&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FernetBackend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-secret-passphrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;EncryptedSessionService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite+aiosqlite:///sessions.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Create a session with sensitive state
&lt;/span&gt;        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;patient_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Jane Doe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;diagnosis_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;J06.9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-secret-key-12345&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Created session: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Retrieve — state is automatically decrypted
&lt;/span&gt;        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decrypted state: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# List sessions for this app/user
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_sessions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sessions found: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sessions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Clean up when you're done
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Session deleted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy this into a file and run it. The API behaves identically to ADK's &lt;code&gt;DatabaseSessionService&lt;/code&gt; — same methods, same signatures, same return types. The only difference is what's stored on disk: you've swapped a glass jar for a lockbox. Same ingredients go in, same ingredients come out, but nobody can peek inside without the key.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Verify the Encryption
&lt;/h2&gt;

&lt;p&gt;Trust but verify. Open the SQLite database directly and confirm the data is actually encrypted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using the &lt;code&gt;sqlite3&lt;/code&gt; CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sqlite3 sessions.db &lt;span class="s2"&gt;"SELECT state FROM sessions LIMIT 1;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see a base64-encoded string — the encrypted envelope — not readable JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AQFnQUFBQUJuVm1Gc2RX...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Using Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;

&lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sessions.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT state FROM sessions LIMIT 1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchone&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# First 60 chars of the encrypted envelope
&lt;/span&gt;&lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you won't see: &lt;code&gt;{"patient_name": "Jane Doe", "diagnosis_code": "J06.9"}&lt;/code&gt;. That's the point. With &lt;code&gt;DatabaseSessionService&lt;/code&gt;, anyone with file access reads your mise en place. With &lt;code&gt;EncryptedSessionService&lt;/code&gt;, they see noise.&lt;/p&gt;

&lt;p&gt;For a more convincing demo, run the &lt;a href="https://github.com/Alberto-Codes/adk-secure-sessions/blob/main/examples/basic_usage.py" rel="noopener noreferrer"&gt;basic usage example&lt;/a&gt; from the repo — it runs a real multi-turn ADK agent with Ollama and then inspects the raw database to prove no plaintext leaks. After a three-turn conversation about patient intake, the database contains zero occurrences of "Jane Doe" or "headache."&lt;/p&gt;
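&lt;p&gt;That leak check is easy to reproduce yourself. Here's a minimal stdlib sketch of the same idea: scan every stored state value for a known plaintext marker. The table layout and envelope string below are hypothetical stand-ins, not the library's actual schema:&lt;/p&gt;

```python
import sqlite3

# Minimal leak check (hypothetical table layout, not the real schema):
# scan every stored state value for a known plaintext marker.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT, state TEXT)")
# What an encrypted store holds: an opaque envelope, not readable JSON
conn.execute(
    "INSERT INTO sessions VALUES (?, ?)",
    ("session-1", "AQFnQUFBQUJuVm1Gc2RX..."),
)
rows = conn.execute("SELECT state FROM sessions").fetchall()
leaks = [state for (state,) in rows if "Jane Doe" in state]
print(f"Plaintext leaks found: {len(leaks)}")  # 0
conn.close()
```

&lt;p&gt;Point the same scan at a real &lt;code&gt;sessions.db&lt;/code&gt; and a plaintext-backed store will fail it immediately.&lt;/p&gt;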




&lt;h2&gt;
  
  
  Step 5: Manage Your Passphrase
&lt;/h2&gt;

&lt;p&gt;The passphrase is the only secret. Never hardcode it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;adk_secure_sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EncryptedSessionService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FernetBackend&lt;/span&gt;

&lt;span class="n"&gt;backend&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FernetBackend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SESSION_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set it in your environment, your &lt;code&gt;.env&lt;/code&gt; file, or your secrets manager. The library handles everything else — &lt;code&gt;FernetBackend&lt;/code&gt; derives a cryptographic key using PBKDF2-HMAC-SHA256 with 480,000 iterations. You don't need to generate, store, or rotate raw key material.&lt;/p&gt;
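&lt;p&gt;For intuition, the derivation step looks roughly like this stdlib sketch. The passphrase and salt below are illustrative; &lt;code&gt;FernetBackend&lt;/code&gt;'s actual salt handling and encoding are internal to the library:&lt;/p&gt;

```python
import base64
import hashlib

# Illustrative PBKDF2-HMAC-SHA256 derivation, not FernetBackend's
# internals. A fixed salt is used here for reproducibility; a real
# implementation generates a random salt and stores it with the data.
passphrase = b"correct-horse-battery-staple"
salt = b"example-salt-16b"  # hypothetical; real salts are random bytes
key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 480_000, dklen=32)

# Fernet consumes a urlsafe-base64-encoded 32-byte key
fernet_key = base64.urlsafe_b64encode(key)
print(len(key), len(fernet_key))  # 32 44
```

&lt;p&gt;The high iteration count is the point: it makes brute-forcing the passphrase expensive while costing you a one-time fraction of a second at startup.&lt;/p&gt;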

&lt;p&gt;If you try to read a session with a passphrase other than the one that encrypted it, you get a clear &lt;code&gt;DecryptionError&lt;/code&gt; — never garbage data, never silent corruption.&lt;/p&gt;
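&lt;p&gt;That guarantee comes from authenticated encryption: like Fernet, the service verifies an integrity tag before trusting the payload, so a key mismatch is detected rather than decoded into nonsense. A toy stdlib analogue of the verify-before-read step (illustration only, with no actual secrecy):&lt;/p&gt;

```python
import hashlib
import hmac
import os

# Toy authenticate-then-read envelope showing why a wrong key produces
# a clean error instead of garbage: the MAC is checked before the
# payload is trusted. Illustration only; Fernet also encrypts.
def seal(key, data):
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data

def open_sealed(key, envelope):
    tag, data = envelope[:32], envelope[32:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")  # DecryptionError analogue
    return data

right, wrong = os.urandom(32), os.urandom(32)
envelope = seal(right, b'{"diagnosis_code": "J06.9"}')
print(open_sealed(right, envelope))  # original bytes back
try:
    open_sealed(wrong, envelope)
except ValueError as exc:
    print(exc)  # wrong key or tampered data
```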




&lt;h2&gt;
  
  
  What You Just Built
&lt;/h2&gt;

&lt;p&gt;Five steps, plaintext to encrypted-at-rest:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Installed&lt;/strong&gt; — &lt;code&gt;pip install adk-secure-sessions&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swapped&lt;/strong&gt; — one import, one constructor change&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ran&lt;/strong&gt; — same API, encrypted storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verified&lt;/strong&gt; — the database contains ciphertext, not JSON&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secured&lt;/strong&gt; — passphrase in the environment, not the codebase&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your agent still works the same way. Your tests still pass. But the SQLite file is now useless without the key — like a walk-in freezer with a combination lock. Nothing changes about how the food is stored or retrieved, but the back door isn't open anymore.&lt;/p&gt;




&lt;h2&gt;
  
  
  Error Handling
&lt;/h2&gt;

&lt;p&gt;When things go wrong, the library tells you what happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ConfigurationError&lt;/code&gt;&lt;/strong&gt; — raised at startup if the backend is misconfigured. You'll catch this before any data is written.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;DecryptionError&lt;/code&gt;&lt;/strong&gt; — raised if you read a session with the wrong key. The library never returns garbage.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;adk_secure_sessions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ConfigurationError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DecryptionError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;EncryptedSessionService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;FernetBackend&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;EncryptedSessionService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlite+aiosqlite:///sessions.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;FernetBackend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correct-passphrase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;some-session-id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Session not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;ConfigurationError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Backend doesn&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t conform to EncryptionBackend protocol&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;DecryptionError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Wrong key — cannot decrypt session data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One install, one import change&lt;/strong&gt; — &lt;code&gt;pip install adk-secure-sessions&lt;/code&gt;, swap the constructor, done&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full ADK lifecycle&lt;/strong&gt; — &lt;code&gt;create_session&lt;/code&gt;, &lt;code&gt;get_session&lt;/code&gt;, &lt;code&gt;list_sessions&lt;/code&gt;, &lt;code&gt;delete_session&lt;/code&gt;, and &lt;code&gt;append_event&lt;/code&gt; all work identically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify it yourself&lt;/strong&gt; — inspect the SQLite file to confirm ciphertext, not plaintext&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Passphrase management&lt;/strong&gt; — use environment variables, never hardcode secrets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear errors&lt;/strong&gt; — &lt;code&gt;DecryptionError&lt;/code&gt; for wrong keys, &lt;code&gt;ConfigurationError&lt;/code&gt; for bad setup&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://alberto-codes.github.io/adk-secure-sessions/" rel="noopener noreferrer"&gt;adk-secure-sessions documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alberto-codes.github.io/adk-secure-sessions/getting-started/" rel="noopener noreferrer"&gt;Getting Started guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/adk-secure-sessions/" rel="noopener noreferrer"&gt;adk-secure-sessions on PyPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/sessions/session/" rel="noopener noreferrer"&gt;Google ADK Session docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Alberto-Codes/adk-secure-sessions" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Let an Algorithm Rewrite My AI Agent's Prompts. It Found Things I Never Would Have.</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Fri, 06 Mar 2026 07:13:19 +0000</pubDate>
      <link>https://dev.to/albertocodes/i-let-an-algorithm-rewrite-my-ai-agents-prompts-it-found-things-i-never-would-have-30dm</link>
      <guid>https://dev.to/albertocodes/i-let-an-algorithm-rewrite-my-ai-agents-prompts-it-found-things-i-never-would-have-30dm</guid>
      <description>&lt;p&gt;I started with this instruction for a Google ADK agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Greet the user appropriately."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Five words. Seemed fine. The agent produced decent greetings. I could've shipped it.&lt;/p&gt;

&lt;p&gt;Instead, I ran it through an evolutionary optimizer. Three iterations later, the instruction was three paragraphs long — covering formality tiers, period-appropriate language for different honorifics, tonal variation based on social context, and specific vocabulary constraints I never would have thought to include.&lt;/p&gt;

&lt;p&gt;The agent's quality score went from 0.35 to 0.81. Same model, same training examples, completely different output quality. The only thing that changed was the instruction text — and I didn't write a single word of the new one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Prompt engineering is guess-and-check. You write something, test it on a couple examples, tweak a word, test again. It works, kind of — like seasoning food without tasting it. You'll get something edible, but you'll never find the version that's genuinely great.&lt;/p&gt;

&lt;p&gt;The core issue: the space of possible instructions is infinite, and your intuition can only explore a tiny corner of it. You get stuck in local optima. You test against too few examples. You optimize for what &lt;em&gt;feels&lt;/em&gt; wrong instead of what &lt;em&gt;measurably&lt;/em&gt; underperforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Let an LLM Critique and Rewrite the Prompts
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Alberto-Codes/gepa-adk" rel="noopener noreferrer"&gt;gepa-adk&lt;/a&gt; automates this loop using evolutionary optimization (based on the &lt;a href="https://arxiv.org/abs/2507.19457" rel="noopener noreferrer"&gt;GEPA paper&lt;/a&gt;):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run&lt;/strong&gt; the agent on training examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt; outputs with a critic agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reflect&lt;/strong&gt; — an LLM analyzes what went wrong&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mutate&lt;/strong&gt; — proposes a better instruction based on the analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep or discard&lt;/strong&gt; based on whether scores improve&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The mutation isn't random. The reflection model sees every output, every score, and every piece of critic feedback. It makes targeted changes. Think of it less like genetic mutation and more like a head chef tasting every plate and adjusting the recipe.&lt;/p&gt;
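&lt;p&gt;Stripped of the ADK machinery, the keep-or-discard loop can be sketched in a few lines. Every function below is a toy stand-in for the real agent run, critic scoring, and LLM reflection, not gepa-adk's API:&lt;/p&gt;

```python
# Hypothetical sketch of the evolve loop. All functions are stand-ins
# for the real agent run, critic scoring, and reflection steps.
def toy_score(instruction):
    # Stand-in critic: rewards longer, more specific instructions
    return min(len(instruction) / 200, 1.0)

def toy_mutate(instruction):
    # Stand-in reflection step: a targeted rewrite, not a random tweak
    return instruction + " Match formality to the speaker's social rank."

def evolve(instruction, iterations=3):
    best, best_score = instruction, toy_score(instruction)
    for _ in range(iterations):
        candidate = toy_mutate(best)
        candidate_score = toy_score(candidate)
        if candidate_score > best_score:  # keep or discard
            best, best_score = candidate, candidate_score
    return best, best_score

evolved, score = evolve("Greet the user appropriately.")
print(score > toy_score("Greet the user appropriately."))  # True
```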

&lt;p&gt;Here's the entire thing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;LlmAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;gepa_adk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;evolve_sync&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SimpleCriticOutput&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;greeter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Greet the user appropriately.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;critic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score for formal, Dickens-style greetings. 0.0-1.0.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SimpleCriticOutput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trainset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I am His Majesty, the King.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I am your mother.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I am a close friend.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evolve_sync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trainset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;original_score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; -&amp;gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;final_score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. &lt;code&gt;evolve_sync&lt;/code&gt; handles the loop. You get back the evolved instruction and the score trajectory.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Else Can Evolve
&lt;/h2&gt;

&lt;p&gt;Instructions are the default target, but gepa-adk can also optimize output schemas (Pydantic models), generation config (temperature, top-p), and even multi-agent systems — evolving how multiple agents coordinate together.&lt;/p&gt;

&lt;p&gt;The multi-agent case is where it gets wild. In a pipeline, one agent's instruction affects another agent's input. Evolving them together finds coordination patterns you'd never discover tuning each agent in isolation.&lt;/p&gt;

&lt;h2&gt;
  
  
  When This Makes Sense
&lt;/h2&gt;

&lt;p&gt;It shines when you have measurable quality criteria and diverse inputs, and when you're building for production, where the difference between 0.65 and 0.82 matters at scale.&lt;/p&gt;

&lt;p&gt;It's overkill for one-off prompts or tasks where "good enough" is actually good enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;gepa-adk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/Alberto-Codes/gepa-adk" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://pypi.org/project/gepa-adk/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt; | &lt;a href="https://alberto-codes.github.io/gepa-adk/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; | &lt;a href="https://github.com/Alberto-Codes/gepa-adk/discussions/303" rel="noopener noreferrer"&gt;v1.0.0 Announcement&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the full deep dive on the evolution loop, critic agents, and architecture: &lt;a href="https://alberto.codes/blog/stop-writing-ai-agent-prompts-by-hand" rel="noopener noreferrer"&gt;Stop Writing AI Agent Prompts by Hand&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the &lt;a href="https://arxiv.org/abs/2507.19457" rel="noopener noreferrer"&gt;GEPA paper&lt;/a&gt; — built on &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Google ADK&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the worst prompt you've manually tuned into submission? I'm curious if evolution would've found something better.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Give Your AI Coding Agent a Docstring Quality Tool (MCP Setup for VS Code, Cursor, and Claude Code)</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Wed, 04 Mar 2026 14:56:40 +0000</pubDate>
      <link>https://dev.to/albertocodes/give-your-ai-coding-agent-a-docstring-quality-tool-mcp-setup-for-vs-code-cursor-and-claude-code-4hdm</link>
      <guid>https://dev.to/albertocodes/give-your-ai-coding-agent-a-docstring-quality-tool-mcp-setup-for-vs-code-cursor-and-claude-code-4hdm</guid>
      <description>&lt;p&gt;Your AI coding agent can read your code, run your tests, and search your repo. But can it check whether your docstrings actually match what the code does?&lt;/p&gt;

&lt;p&gt;Research shows incorrect documentation drops LLM task success by &lt;a href="https://arxiv.org/abs/2404.03114" rel="noopener noreferrer"&gt;22.6 percentage points&lt;/a&gt;. Missing docs are annoying. &lt;em&gt;Wrong&lt;/em&gt; docs are toxic — they create false confidence in generated code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Alberto-Codes/docvet" rel="noopener noreferrer"&gt;docvet&lt;/a&gt; catches these gaps: 19 rules that flag docstrings that have drifted from the code they describe. Since v1.8, it ships an MCP server — meaning any MCP-aware editor can give its AI agent direct, programmatic access to those checks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Your Agent Gets
&lt;/h2&gt;

&lt;p&gt;Two tools appear in the agent's toolbox:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;docvet_check&lt;/code&gt;&lt;/strong&gt; — Run checks on any Python file or directory. Returns structured JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"findings"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/pipeline/extract.py"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"line"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"symbol"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"extract_text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"missing-raises"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Function 'extract_text' raises ValueError but has no Raises section"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"required"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"by_category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"recommended"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"files_checked"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;docvet_rules&lt;/code&gt;&lt;/strong&gt; — List all 19 rules with descriptions and categories.&lt;/p&gt;

&lt;p&gt;No CLI output to parse. No regex. Typed fields the agent reasons about directly.&lt;/p&gt;
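&lt;p&gt;As a sketch of what "typed fields the agent reasons about" can look like in practice, here is a small, hypothetical Python handler for a &lt;code&gt;docvet_check&lt;/code&gt; response shaped like the JSON above. The grouping logic is illustrative, not part of docvet:&lt;br&gt;
&lt;/p&gt;

```python
import json

def group_findings(payload: dict) -> dict:
    """Group docvet_check findings by category so "required" fixes surface first."""
    grouped: dict = {}
    for finding in payload.get("findings", []):
        grouped.setdefault(finding["category"], []).append(finding)
    return grouped

# A response shaped like the docvet_check example above.
report = json.loads("""
{"findings": [{"file": "src/pipeline/extract.py", "line": 42,
               "symbol": "extract_text", "rule": "missing-raises",
               "message": "raises ValueError but has no Raises section",
               "category": "required"}],
 "summary": {"total": 1, "by_category": {"required": 1}, "files_checked": 8}}
""")

for finding in group_findings(report).get("required", []):
    print(f"{finding['file']}:{finding['line']}  {finding['rule']}")
# prints: src/pipeline/extract.py:42  missing-raises
```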

&lt;h2&gt;
  
  
  Setup: One Block of JSON
&lt;/h2&gt;

&lt;p&gt;The MCP server runs on stdio via &lt;code&gt;uvx&lt;/code&gt; — no &lt;code&gt;pip install&lt;/code&gt; in your project, no virtual environment pollution, no global packages. &lt;code&gt;uvx&lt;/code&gt; downloads and runs docvet in an isolated environment automatically. You add the config and it just works.&lt;/p&gt;

&lt;h3&gt;
  
  
  VS Code
&lt;/h3&gt;

&lt;p&gt;Add to &lt;code&gt;.vscode/mcp.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"docvet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"docvet[mcp]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; VS Code uses &lt;code&gt;"servers"&lt;/code&gt;, not &lt;code&gt;"mcpServers"&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Cursor
&lt;/h3&gt;

&lt;p&gt;Add to &lt;code&gt;.cursor/mcp.json&lt;/code&gt; (project) or &lt;code&gt;~/.cursor/mcp.json&lt;/code&gt; (global):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"docvet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uvx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"docvet[mcp]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Claude Code
&lt;/h3&gt;

&lt;p&gt;One command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add &lt;span class="nt"&gt;--transport&lt;/span&gt; stdio &lt;span class="nt"&gt;--scope&lt;/span&gt; project docvet &lt;span class="nt"&gt;--&lt;/span&gt; uvx &lt;span class="s2"&gt;"docvet[mcp]"&lt;/span&gt; mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Others
&lt;/h3&gt;

&lt;p&gt;Windsurf, Claude Desktop, and anything that speaks MCP — same &lt;code&gt;mcpServers&lt;/code&gt; pattern. &lt;a href="https://alberto-codes.github.io/docvet/editor-integration/#client-configuration" rel="noopener noreferrer"&gt;Full configs here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Workflow
&lt;/h2&gt;

&lt;p&gt;Once configured, the agent uses docvet as part of its normal flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Agent opens a Python file to modify&lt;/li&gt;
&lt;li&gt;Agent runs &lt;code&gt;docvet_check&lt;/code&gt; on the file&lt;/li&gt;
&lt;li&gt;Findings come back — missing Raises sections, stale signatures, undocumented attributes&lt;/li&gt;
&lt;li&gt;Agent fixes the docstrings alongside the code change&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The feedback loop becomes automatic — like a line cook who taste-tests every dish before it leaves the pass. Code and documentation stay in sync because the agent checks both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Add the &lt;code&gt;.vscode/mcp.json&lt;/code&gt; block above&lt;/li&gt;
&lt;li&gt;Open a Python file with a known gap (function raises an exception, no &lt;code&gt;Raises:&lt;/code&gt; section)&lt;/li&gt;
&lt;li&gt;Ask your AI agent to check the file with docvet&lt;/li&gt;
&lt;li&gt;Watch it fix the docstring&lt;/li&gt;
&lt;/ol&gt;
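&lt;p&gt;If you don't have a file with a known gap handy, a minimal hypothetical one for step 2 looks like the following. The function raises &lt;code&gt;ValueError&lt;/code&gt;, but its docstring has no &lt;code&gt;Raises:&lt;/code&gt; section:&lt;br&gt;
&lt;/p&gt;

```python
# demo_gap.py: a deliberately flawed example file (names are made up).
# extract_text raises ValueError, but the docstring never documents it,
# which is exactly the kind of gap described in step 2.
def extract_text(path: str) -> str:
    """Return the text contents of the file at the given path."""
    if not path.endswith(".txt"):
        raise ValueError("only .txt files are supported")
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```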




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://alberto-codes.github.io/docvet/editor-integration/#__tabbed_1_3" rel="noopener noreferrer"&gt;VS Code MCP setup (copy-paste config)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alberto-codes.github.io/docvet/editor-integration/" rel="noopener noreferrer"&gt;Full editor integration docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Alberto-Codes/docvet/discussions" rel="noopener noreferrer"&gt;GitHub announcement: docvet is on the MCP Registry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Alberto-Codes/docvet" rel="noopener noreferrer"&gt;docvet on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/docvet/" rel="noopener noreferrer"&gt;docvet on PyPI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://registry.modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP Registry listing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://alberto.codes/blog/2026-02-25-your-ai-reads-your-docstrings" rel="noopener noreferrer"&gt;Previous post: Your AI Reads Your Docstrings. Are They Right?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>python</category>
      <category>mcp</category>
      <category>vscode</category>
      <category>ai</category>
    </item>
    <item>
      <title>Your AI Reads Your Docstrings. Are They Right?</title>
      <dc:creator>Alberto Nieto</dc:creator>
      <pubDate>Tue, 03 Mar 2026 06:35:17 +0000</pubDate>
      <link>https://dev.to/albertocodes/your-ai-reads-your-docstrings-are-they-right-2g89</link>
      <guid>https://dev.to/albertocodes/your-ai-reads-your-docstrings-are-they-right-2g89</guid>
      <description>&lt;p&gt;Copilot, Claude Code, Cursor — they all read your docstrings to understand your code. When those docstrings are wrong, your AI makes confident, wrong suggestions.&lt;/p&gt;

&lt;p&gt;And wrong docs are worse than no docs. Studies show incorrect documentation drops LLM task success by &lt;a href="https://arxiv.org/abs/2404.03114" rel="noopener noreferrer"&gt;22.6 percentage points&lt;/a&gt; compared to correct docs.&lt;/p&gt;

&lt;p&gt;Your linter checks &lt;strong&gt;style&lt;/strong&gt;. But who checks that the docstring is actually &lt;strong&gt;accurate&lt;/strong&gt;?&lt;/p&gt;

&lt;h2&gt;
  
  
  The gap in your toolchain
&lt;/h2&gt;

&lt;p&gt;Existing tools cover the basics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ruff&lt;/strong&gt; — docstring style and formatting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;interrogate&lt;/strong&gt; — docstring presence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But neither checks whether your docstring &lt;em&gt;matches the code&lt;/em&gt;. A function that raises &lt;code&gt;ValueError&lt;/code&gt; but doesn't document it. A parameter added last sprint but missing from the docstring. Code that changed but the docstring didn't.&lt;/p&gt;

&lt;p&gt;That's layers 3–6 of docstring quality — and nothing was checking them.&lt;/p&gt;
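&lt;p&gt;The "parameter added last sprint" case is easy to reproduce. Below is a toy accuracy check (a crude substring match, not docvet's actual algorithm) run against a hypothetical drifted function:&lt;br&gt;
&lt;/p&gt;

```python
import inspect

def undocumented_params(func) -> list:
    """Toy accuracy check: parameters present in the signature but
    never mentioned in the docstring (crude substring match)."""
    doc = func.__doc__ or ""
    return [name for name in inspect.signature(func).parameters
            if name not in doc]

# Hypothetical drifted function: `timeout` was added to the signature,
# but the docstring still documents only `url`.
def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Download a resource.

    Args:
        url: Address to fetch.
    """
    raise NotImplementedError

print(undocumented_params(fetch))  # prints: ['timeout']
```

&lt;p&gt;docvet's real checks are more precise than a substring match; this only shows the category of drift they target.&lt;/p&gt;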

&lt;h2&gt;
  
  
  docvet fills that gap
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Alberto-Codes/docvet" rel="noopener noreferrer"&gt;docvet&lt;/a&gt; is a CLI tool that vets docstrings across six quality layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Presence&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docvet presence&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Public symbols with no docstring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Completeness&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docvet enrichment&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Missing Raises, Yields, Attributes sections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docvet freshness&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Code changed, docstring didn't&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rendering&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docvet griffe&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Docstrings that break mkdocs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visibility&lt;/td&gt;
&lt;td&gt;&lt;code&gt;docvet coverage&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Modules invisible to doc generators&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;docvet
docvet check &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it on your codebase. You'll probably find something.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for AI
&lt;/h2&gt;

&lt;p&gt;Docstrings are no longer just for humans reading your code. They're the context window for every AI tool touching your codebase. Accurate docstrings create a feedback loop: better context → better AI suggestions → better code.&lt;/p&gt;

&lt;p&gt;docvet keeps that contract honest.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://alberto-codes.github.io/docvet/" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; · &lt;a href="https://github.com/Alberto-Codes/docvet" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://pypi.org/project/docvet/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
