<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alan West</title>
    <description>The latest articles on DEV Community by Alan West (@alanwest).</description>
    <link>https://dev.to/alanwest</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3834047%2F6413d0cf-9d90-4ccc-80a9-123656fd78ba.png</url>
      <title>DEV Community: Alan West</title>
      <link>https://dev.to/alanwest</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/alanwest"/>
    <language>en</language>
    <item>
      <title>Why your quantized LLM loses its MTP heads and how to keep them</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Wed, 27 May 2026 16:00:08 +0000</pubDate>
      <link>https://dev.to/alanwest/why-your-quantized-llm-loses-its-mtp-heads-and-how-to-keep-them-m7h</link>
      <guid>https://dev.to/alanwest/why-your-quantized-llm-loses-its-mtp-heads-and-how-to-keep-them-m7h</guid>
      <description>&lt;h2&gt;
  
  
  The frustrating problem
&lt;/h2&gt;

&lt;p&gt;Last month a teammate pinged me with a classic head-scratcher. He'd taken a base model with multi-token prediction (MTP) heads, ran it through a standard quantization pipeline to ship a smaller GGUF for edge inference, and the latency numbers came back &lt;em&gt;worse&lt;/em&gt; than expected. The model still generated coherent text, but the speculative decoding speedup he'd built his benchmarks around was gone.&lt;/p&gt;

&lt;p&gt;We poked around for an hour before the penny dropped. The MTP heads had silently been dropped on the floor during conversion. The base weights survived. The extra prediction heads — the whole reason MTP exists — did not.&lt;/p&gt;

&lt;p&gt;If you've worked with models that ship MTP layers (the technique popularized by DeepSeek-V3, where the model predicts the next N tokens in parallel as draft tokens), you might have already run into this. The conversion toolchain assumes anything that isn't a vanilla transformer block is dead weight and trims it. Here's why it happens and how to stop it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MTP heads actually are
&lt;/h2&gt;

&lt;p&gt;Quick refresher so we're on the same page. MTP (multi-token prediction) adds auxiliary heads on top of the base model that each predict a future token at offset +1, +2, +3, etc. At inference time you can use them as a built-in draft model for speculative decoding, which gives you a real throughput win without needing a separate small model.&lt;/p&gt;

&lt;p&gt;The key thing: these heads are &lt;strong&gt;architecturally distinct&lt;/strong&gt; from the regular &lt;code&gt;lm_head&lt;/code&gt;. They live in their own module tree, often named something like &lt;code&gt;model.mtp.layers.0&lt;/code&gt;, &lt;code&gt;model.mtp.layers.1&lt;/code&gt; and so on. They reference shared embeddings but have their own normalization, attention, and projection weights.&lt;/p&gt;

&lt;p&gt;That naming convention is exactly what trips up the tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Root cause: conversion scripts have an opinionated allowlist
&lt;/h2&gt;

&lt;p&gt;Most quantization toolchains weren't designed with MTP in mind. They walk the state dict and apply transformations based on regex matches against expected layer names. Anything that doesn't match is either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Silently dropped (worst case)&lt;/li&gt;
&lt;li&gt;Left in fp16/fp32 in the output (works but bloats the file)&lt;/li&gt;
&lt;li&gt;Renamed in a way the loader can't recover (subtle breakage)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I dug into the llama.cpp conversion script for the project, the relevant logic was essentially this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified version of what most converters do
&lt;/span&gt;&lt;span class="n"&gt;KNOWN_PREFIXES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model.layers.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model.embed_tokens.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model.norm.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lm_head.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;KNOWN_PREFIXES&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# MTP heads land here and get skipped
&lt;/span&gt;        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skipping unknown tensor: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="nf"&gt;write_quantized&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;logger.debug&lt;/code&gt; is the killer. Unless you run conversion with debug logging on, you never see the skip messages. The file converts "successfully" and you walk away thinking everything's fine.&lt;/p&gt;

&lt;p&gt;GPTQ-style quantizers have a related but different failure mode. They calibrate against forward passes through the model, and if your calibration code only exercises the main &lt;code&gt;lm_head&lt;/code&gt; path, the MTP heads never see calibration data. Even if the weights are preserved, the resulting quantized heads are essentially random.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-step solution
&lt;/h2&gt;

&lt;p&gt;Here's the workflow I now use whenever I touch a model with MTP heads.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Inventory the heads before you touch anything
&lt;/h3&gt;

&lt;p&gt;Before any conversion, dump the full state dict and grep for MTP-related modules. This sets your baseline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;safetensors&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;safe_open&lt;/span&gt;

&lt;span class="n"&gt;mtp_tensors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;safe_open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model.safetensors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;framework&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="c1"&gt;# Adjust prefix to whatever your model uses
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mtp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;multi_token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;mtp_tensors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_shape&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shape&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;mtp_tensors&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Total MTP tensors: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mtp_tensors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save this output. You'll diff against it after every conversion step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Patch the converter's allowlist
&lt;/h3&gt;

&lt;p&gt;For llama.cpp style converters, you need to extend the known prefix list and add a mapping rule for the MTP heads. The clean way is to subclass or monkey-patch rather than editing the upstream script directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;convert_hf_to_gguf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Model&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MTPAwareModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;map_tensor_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Handle MTP heads explicitly before falling through
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model.mtp.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="c1"&gt;# Preserve the layer index and submodule path
&lt;/span&gt;            &lt;span class="c1"&gt;# Output name needs to match what your loader expects
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model.mtp.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mtp.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;map_tensor_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;modify_tensors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Skip the parent class's filter for MTP layers
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mtp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map_tensor_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;modify_tensors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bid&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical bit is overriding &lt;code&gt;modify_tensors&lt;/code&gt; — the default implementation has the silent skip we saw earlier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: For GPTQ, calibrate through the MTP path
&lt;/h3&gt;

&lt;p&gt;If you're using GPTQ-style quantization, your calibration loop needs to actually hit the MTP heads. The default &lt;code&gt;model(input_ids)&lt;/code&gt; forward pass only routes through the main LM head. You need to force the MTP heads to see activations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calibration_forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Standard forward populates main path activations
&lt;/span&gt;    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_hidden_states&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Manually invoke MTP heads using the final hidden state
&lt;/span&gt;    &lt;span class="c1"&gt;# This ensures each head gets calibration statistics
&lt;/span&gt;    &lt;span class="n"&gt;hidden&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hidden_states&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mtp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;layers&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Shift input so head i predicts token at position +i+1
&lt;/span&gt;        &lt;span class="n"&gt;shifted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hidden&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shifted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, your MTP heads quantize to garbage even though the file looks complete.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Verify post-conversion
&lt;/h3&gt;

&lt;p&gt;Re-run the inventory script against the converted file. The tensor count should match. If you went GGUF, you can also dump metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# llama.cpp ships a metadata inspection tool&lt;/span&gt;
./gguf-dump model-quantized.gguf | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; mtp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run a quick speculative decoding sanity check. If the MTP heads are intact and properly calibrated, you should see your tokens-per-second numbers match (or get very close to) the unquantized baseline's speedup ratio.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention tips
&lt;/h2&gt;

&lt;p&gt;A few habits that have saved me repeated pain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always run converters with debug logging enabled.&lt;/strong&gt; The skip messages are the single most useful signal you'll get, and they're hidden by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tensor-count diff as part of CI.&lt;/strong&gt; If your pipeline converts models automatically, fail the build when the output has fewer tensors than the input minus a known allowlist of intentionally-dropped weights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test speculative decoding throughput, not just generation quality.&lt;/strong&gt; A model can produce fluent text with broken MTP heads — your end-to-end latency benchmark is the only thing that will catch the regression.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin your converter version.&lt;/strong&gt; Upstream conversion scripts change their tensor-name handling more often than you'd think. A model that converted cleanly six months ago might silently break today.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MTP is one of those features where the failure mode is invisible until you measure the thing the feature was supposed to improve. Treat the conversion pipeline as untrusted by default, and you'll avoid burning an afternoon on it like we did.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>llm</category>
      <category>python</category>
      <category>quantization</category>
    </item>
    <item>
      <title>How to build reliable geo-restrictions that actually hold up in production</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Wed, 27 May 2026 01:41:07 +0000</pubDate>
      <link>https://dev.to/alanwest/how-to-build-reliable-geo-restrictions-that-actually-hold-up-in-production-1n83</link>
      <guid>https://dev.to/alanwest/how-to-build-reliable-geo-restrictions-that-actually-hold-up-in-production-1n83</guid>
      <description>&lt;p&gt;Last week I saw another platform get blocked in a European market because their geo-restriction setup was, charitably, optimistic. A single header check. No IP verification. Nothing to handle VPNs or the weird middle-ground of corporate proxies. The result? Regulators noticed users in restricted regions were still getting through, and the whole product got pulled.&lt;/p&gt;

&lt;p&gt;I've shipped jurisdiction-based access controls for a fintech and a streaming-adjacent product, and I'll tell you up front: this stuff is harder than it looks. The problem isn't "detect the country" — that part has been solved for years. The problem is doing it reliably enough that compliance won't yell at you, without breaking your legitimate users in the process.&lt;/p&gt;

&lt;p&gt;Let's walk through what actually goes wrong and how to build something that holds up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why naive geo-blocking fails
&lt;/h2&gt;

&lt;p&gt;Most teams start with something like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Don't do this in production
app.use((req, res, next) =&amp;gt; {
  const country = req.headers['cf-ipcountry']; // or similar
  if (BLOCKED_COUNTRIES.includes(country)) {
    return res.status(451).send('Not available in your region');
  }
  next();
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Looks fine. Ships in five minutes. And it falls apart roughly the moment a real user hits it. Here's what I've seen break:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;IP geolocation is probabilistic, not deterministic.&lt;/strong&gt; Free databases are accurate maybe 95-97% at the country level. The remaining few percent is users hitting your block page in the wrong country, or the right country thinking they're somewhere else.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile carriers route weirdly.&lt;/strong&gt; A user in Madrid on a mobile network might appear to come from a carrier hub in Frankfurt. I've seen this trip up half a dozen geo checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CDN headers can be cached.&lt;/strong&gt; If you're caching responses upstream of your geo check, you can serve a Spanish user a German-cached page. Whoops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPN and residential proxy traffic.&lt;/strong&gt; This is the big one. If you only check IP, you're trivially bypassed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The root cause across all of these is that we're trying to make a binary decision (allow/deny) based on a fundamentally fuzzy signal (where is this packet coming from). The fix is to stop pretending the signal is clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  A layered approach that actually works
&lt;/h2&gt;

&lt;p&gt;The pattern I've landed on after migrating three projects is: &lt;strong&gt;multiple independent signals, scored, with explicit handling for ambiguous cases&lt;/strong&gt;. Not a single check.&lt;/p&gt;

&lt;p&gt;Here's the rough shape of the middleware I use. Adapt to your stack — this is Node, but the logic ports cleanly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const geoip = require('geoip-lite'); // open-source, MaxMind GeoLite2 data

async function evaluateJurisdiction(req) {
  const signals = {
    ipCountry: null,
    headerCountry: null,
    declaredCountry: null,
    vpnLikely: false,
  };

  // Signal 1: server-side IP lookup (don't trust client headers alone)
  const ip = getClientIp(req); // see notes below on extracting this safely
  const lookup = geoip.lookup(ip);
  signals.ipCountry = lookup?.country || null;

  // Signal 2: edge/CDN-provided country header, if you have one
  // Useful as a cross-check, NOT as your only source
  signals.headerCountry = req.headers['x-edge-country'] || null;

  // Signal 3: account-declared jurisdiction from signup/KYC
  // This is the most reliable for authenticated users
  signals.declaredCountry = req.user?.declaredCountry || null;

  // Signal 4: VPN/proxy heuristics — ASN-based, since residential proxies
  // are detectable by the ASNs they route through
  signals.vpnLikely = await checkAsnAgainstVpnList(ip);

  return signals;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The key idea: collect signals first, decide second. The decision logic then looks something like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function shouldBlock(signals, restrictedCountries) {
  // Authenticated user with a declared country we restrict? Always block.
  // This is the strongest signal — they told us themselves.
  if (signals.declaredCountry &amp;amp;&amp;amp; restrictedCountries.includes(signals.declaredCountry)) {
    return { block: true, reason: 'declared_jurisdiction' };
  }

  // Two independent network signals agree on a restricted country? Block.
  const networkAgrees =
    signals.ipCountry === signals.headerCountry &amp;amp;&amp;amp;
    restrictedCountries.includes(signals.ipCountry);
  if (networkAgrees) {
    return { block: true, reason: 'ip_and_edge_agree' };
  }

  // VPN detected from a country we restrict, OR VPN with no other signal?
  // Force re-verification rather than silently allowing or blocking.
  if (signals.vpnLikely) {
    return { block: false, requireVerification: true, reason: 'vpn_detected' };
  }

  // IP says restricted but no corroborating signal — soft block with appeal path
  if (restrictedCountries.includes(signals.ipCountry)) {
    return { block: true, reason: 'ip_only', appealable: true };
  }

  return { block: false };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Notice what this does that the naive version doesn't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It distinguishes between "definitely restricted" and "probably restricted"&lt;/li&gt;
&lt;li&gt;It gives users a path to appeal a false positive&lt;/li&gt;
&lt;li&gt;It treats VPN traffic as a signal to verify, not silently allow&lt;/li&gt;
&lt;li&gt;It uses the user's own declared jurisdiction as the strongest signal when available&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting the client IP right
&lt;/h2&gt;

&lt;p&gt;One footgun worth calling out: extracting the actual client IP. If you naively use &lt;code&gt;req.ip&lt;/code&gt; behind a load balancer or CDN, you might get the proxy's IP. And if you blindly trust &lt;code&gt;X-Forwarded-For&lt;/code&gt;, anyone can spoof it.&lt;/p&gt;

&lt;p&gt;The pattern:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function getClientIp(req) {
  // X-Forwarded-For is a chain: client, proxy1, proxy2, ...
  // Trust only the rightmost N entries where N = number of YOUR trusted proxies
  const forwarded = req.headers['x-forwarded-for'];
  if (!forwarded) return req.socket.remoteAddress;

  const chain = forwarded.split(',').map(s =&amp;gt; s.trim());
  const trustedProxyCount = 1; // however many proxies sit in front of you
  const clientIndex = chain.length - 1 - trustedProxyCount;
  return chain[Math.max(0, clientIndex)] || req.socket.remoteAddress;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you're on Express, use &lt;code&gt;app.set('trust proxy', N)&lt;/code&gt; with the right number — the docs at &lt;a href="https://expressjs.com/en/guide/behind-proxies.html" rel="noopener noreferrer"&gt;expressjs.com&lt;/a&gt; cover this in more detail than I will here. The mistake I see most often is &lt;code&gt;trust proxy: true&lt;/code&gt;, which trusts everything in the chain and is exploitable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention: things to bake in from day one
&lt;/h2&gt;

&lt;p&gt;A few things I wish I'd done sooner:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Log every blocked request with the signal breakdown.&lt;/strong&gt; When compliance asks "how do you verify users aren't in X?", you want auditable logs, not just "we have a check."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build the appeal flow before you need it.&lt;/strong&gt; Some percentage of your blocks will be wrong. Users hate hitting a wall with no recourse. A simple "verify with ID" path solves the majority of false positives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep your GeoIP database current.&lt;/strong&gt; &lt;a href="https://dev.maxmind.com/geoip/geolite2-free-geolocation-data" rel="noopener noreferrer"&gt;MaxMind's GeoLite2&lt;/a&gt; ships updates twice a week. If yours is six months stale, your accuracy is degrading silently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with VPNs as part of CI.&lt;/strong&gt; I have a smoke test that hits the staging environment through a few known VPN exit nodes. Catches regressions in the VPN detection logic faster than waiting for a user report.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The broader lesson: any time you're making a regulatory decision based on a network signal, you should assume the signal is partially wrong, partially gameable, and occasionally cached. Build for that, not against it. Your future self — and your compliance team — will thank you.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>security</category>
      <category>backend</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why your VPS might be part of a botnet — and how to find out</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Tue, 26 May 2026 03:02:13 +0000</pubDate>
      <link>https://dev.to/alanwest/why-your-vps-might-be-part-of-a-botnet-and-how-to-find-out-7l</link>
      <guid>https://dev.to/alanwest/why-your-vps-might-be-part-of-a-botnet-and-how-to-find-out-7l</guid>
      <description>&lt;p&gt;Last week I got a 3am email from my hosting provider. Subject: "Abuse report — your IP is participating in a DDoS." My first reaction was disbelief. My second was opening a laptop in bed and SSH'ing in like a chump.&lt;/p&gt;

&lt;p&gt;This happens more often than people admit. A server you set up six months ago, forgot about, and never patched becomes someone else's attack tool. Recent law enforcement actions against bulletproof hosting operations have made one thing clear — a lot of compromised infrastructure being used in attacks is just neglected developer boxes. Old WordPress installs, exposed Redis, weak SSH keys.&lt;/p&gt;

&lt;p&gt;Let's walk through how to actually figure out if your box is one of them, and what to do about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The symptoms are quieter than you'd expect
&lt;/h2&gt;

&lt;p&gt;The Hollywood version of a compromised server is CPU pegged at 100% and obvious malware. Real life is duller. Modern attack toolkits are tuned to stay under the radar so the operator can keep using the box.&lt;/p&gt;

&lt;p&gt;What you'll actually see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slightly elevated baseline CPU (5-15% from nothing)&lt;/li&gt;
&lt;li&gt;Unexplained outbound traffic to weird ports&lt;/li&gt;
&lt;li&gt;New cron entries you didn't write&lt;/li&gt;
&lt;li&gt;SSH login attempts spiking from your own server's outbound logs (it's scanning for the next victim)&lt;/li&gt;
&lt;li&gt;Mysterious processes with names like &lt;code&gt;kdevtmpfsi&lt;/code&gt;, &lt;code&gt;xmrig&lt;/code&gt;, or randomly-generated 6-char names&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Look at outbound connections first
&lt;/h2&gt;

&lt;p&gt;Ingress filtering is easy. Egress is where most teams have nothing in place, and it's exactly where a compromised box gives itself away.&lt;/p&gt;

&lt;p&gt;Start with &lt;code&gt;ss&lt;/code&gt; — it's faster than &lt;code&gt;netstat&lt;/code&gt; and ships with iproute2 on most modern distros:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Show all established outbound TCP connections with the owning process&lt;/span&gt;
ss &lt;span class="nt"&gt;-tnpo&lt;/span&gt; state established

&lt;span class="c"&gt;# Group by remote IP to spot fan-out patterns&lt;/span&gt;
ss &lt;span class="nt"&gt;-tn&lt;/span&gt; state established | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'NR&amp;gt;1 {print $5}'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;: &lt;span class="nt"&gt;-f1&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; | &lt;span class="nb"&gt;uniq&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; | &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;-rn&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see hundreds of connections to a single IP, or a wide spray to random IPs on port 22, 23, 80, or 445, that's scanner behavior. Legitimate apps usually talk to a small set of known endpoints.&lt;/p&gt;

&lt;p&gt;Next, pull the live process tree. I like &lt;code&gt;pstree -p&lt;/code&gt; because it shows parentage — a lot of malware spawns from cron or from a web server worker, and the parent process is the giveaway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pstree &lt;span class="nt"&gt;-panu&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look for processes whose parent is &lt;code&gt;cron&lt;/code&gt;, &lt;code&gt;sh&lt;/code&gt;, or your web server but whose command line is something opaque like a long base64 string or a binary in &lt;code&gt;/tmp&lt;/code&gt; or &lt;code&gt;/dev/shm&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Check the usual hiding spots
&lt;/h2&gt;

&lt;p&gt;Attackers are creatures of habit. Here's the rapid sweep I run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# World-writable temp dirs are the #1 dropzone&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-lat&lt;/span&gt; /tmp /var/tmp /dev/shm 2&amp;gt;/dev/null | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-30&lt;/span&gt;

&lt;span class="c"&gt;# Recently modified files in system bin dirs&lt;/span&gt;
find /usr/bin /usr/sbin /usr/local/bin &lt;span class="nt"&gt;-mtime&lt;/span&gt; &lt;span class="nt"&gt;-30&lt;/span&gt; &lt;span class="nt"&gt;-ls&lt;/span&gt; 2&amp;gt;/dev/null

&lt;span class="c"&gt;# Cron entries for every user (not just root)&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;u &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;: &lt;span class="nt"&gt;-f1&lt;/span&gt; /etc/passwd&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"--- &lt;/span&gt;&lt;span class="nv"&gt;$u&lt;/span&gt;&lt;span class="s2"&gt; ---"&lt;/span&gt;
  crontab &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$u&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt; 2&amp;gt;/dev/null
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Systemd timers and services added recently&lt;/span&gt;
systemctl list-timers &lt;span class="nt"&gt;--all&lt;/span&gt;
find /etc/systemd/system /lib/systemd/system &lt;span class="nt"&gt;-mtime&lt;/span&gt; &lt;span class="nt"&gt;-60&lt;/span&gt; &lt;span class="nt"&gt;-ls&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/dev/shm&lt;/code&gt; trick catches a lot of cryptominers — they drop the binary into shared memory because it's tmpfs (no disk writes, less forensic evidence) and runs from RAM.&lt;/p&gt;

&lt;p&gt;If you find a suspicious binary, before you delete it, get a hash and check it against VirusTotal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sha256sum&lt;/span&gt; /tmp/.suspicious_binary
&lt;span class="c"&gt;# Then upload the hash (not the file) to virustotal.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Trace the entry point with auditd
&lt;/h2&gt;

&lt;p&gt;This is the part most tutorials skip, and it's the most important. Cleaning the malware without knowing how it got in means you'll be doing this exact same dance next week.&lt;/p&gt;

&lt;p&gt;Install &lt;code&gt;auditd&lt;/code&gt; if you don't already have it, and set up rules to watch the obvious vectors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Watch execve syscalls — every program execution&lt;/span&gt;
auditctl &lt;span class="nt"&gt;-a&lt;/span&gt; always,exit &lt;span class="nt"&gt;-F&lt;/span&gt; &lt;span class="nb"&gt;arch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;b64 &lt;span class="nt"&gt;-S&lt;/span&gt; execve &lt;span class="nt"&gt;-k&lt;/span&gt; exec_trace

&lt;span class="c"&gt;# Watch writes to common malware dropzones&lt;/span&gt;
auditctl &lt;span class="nt"&gt;-w&lt;/span&gt; /tmp &lt;span class="nt"&gt;-p&lt;/span&gt; wa &lt;span class="nt"&gt;-k&lt;/span&gt; tmp_writes
auditctl &lt;span class="nt"&gt;-w&lt;/span&gt; /dev/shm &lt;span class="nt"&gt;-p&lt;/span&gt; wa &lt;span class="nt"&gt;-k&lt;/span&gt; shm_writes

&lt;span class="c"&gt;# Watch SSH key file changes (very common persistence trick)&lt;/span&gt;
auditctl &lt;span class="nt"&gt;-w&lt;/span&gt; /root/.ssh/authorized_keys &lt;span class="nt"&gt;-p&lt;/span&gt; wa &lt;span class="nt"&gt;-k&lt;/span&gt; ssh_keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then search the existing logs for evidence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Look for shell execution chains from web server users&lt;/span&gt;
ausearch &lt;span class="nt"&gt;-k&lt;/span&gt; exec_trace &lt;span class="nt"&gt;--start&lt;/span&gt; recent | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'uid=(33|www-data|nginx|apache)'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my 3am incident, this is what cracked it — a PHP file in an abandoned WordPress install was being POSTed to, spawning &lt;code&gt;/bin/sh&lt;/code&gt;, which pulled down a payload via &lt;code&gt;curl&lt;/code&gt;. Classic webshell chain. The site hadn't been touched in two years.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Fix it properly
&lt;/h2&gt;

&lt;p&gt;Don't just &lt;code&gt;kill -9&lt;/code&gt; and move on. The malware will respawn from whatever persistence hook it installed. Order of operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Snapshot first.&lt;/strong&gt; If your provider supports it, take a disk snapshot before you change anything. You may need it for forensics or to satisfy an abuse report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cut network egress&lt;/strong&gt; before killing processes. A common mistake is killing the miner first, which triggers a watchdog reinstall. Block outbound first:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Drop all outbound traffic except SSH from your management IP&lt;/span&gt;
nft add table inet filter
nft add chain inet filter output &lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nb"&gt;type &lt;/span&gt;filter hook output priority 0 &lt;span class="se"&gt;\;&lt;/span&gt; policy drop &lt;span class="se"&gt;\;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
nft add rule inet filter output ct state established,related accept
nft add rule inet filter output oifname lo accept
&lt;span class="c"&gt;# Add your specific allowlist rules here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Kill the processes and remove persistence.&lt;/strong&gt; Cron, systemd units, &lt;code&gt;~/.bashrc&lt;/code&gt; lines, &lt;code&gt;/etc/ld.so.preload&lt;/code&gt;, modified SSH &lt;code&gt;authorized_keys&lt;/code&gt;. Check all of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rotate every credential&lt;/strong&gt; that touched the box. SSH keys, API tokens in env files, database creds, cloud provider credentials. Assume they're all in someone's wallet now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reinstall, don't clean.&lt;/strong&gt; I know. It's annoying. But a rootkit you didn't find will outlive your cleanup. If the box held anything sensitive, the right move is destroy and rebuild from a known-good config.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prevention: stop being an easy target
&lt;/h2&gt;

&lt;p&gt;Most compromises I've cleaned up came from a small handful of root causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unattended upgrades disabled.&lt;/strong&gt; Turn them on. &lt;code&gt;unattended-upgrades&lt;/code&gt; on Debian/Ubuntu, &lt;code&gt;dnf-automatic&lt;/code&gt; on RHEL-likes. Yes it's a small risk. Running months-old kernels is a much bigger one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSH password auth enabled.&lt;/strong&gt; Set &lt;code&gt;PasswordAuthentication no&lt;/code&gt; and use keys. Add &lt;code&gt;fail2ban&lt;/code&gt; for the noise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No egress filtering.&lt;/strong&gt; Default-deny outbound is annoying to set up but it's the single thing that would have prevented every botnet enrollment I've seen. Even a basic rule blocking outbound to common scan ports (22, 23, 445, 3389) from anywhere but a specific allowlist would catch most of them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgotten services.&lt;/strong&gt; That staging box. That old CMS. That Redis you exposed for "five minutes" to test something. Run &lt;code&gt;nmap&lt;/code&gt; against your own external IPs once a quarter. You will be surprised.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The boring stuff genuinely works. I've stopped chasing exotic hardening guides and just kept a checklist of these basics for every new box. The 3am abuse emails have gone to zero.&lt;/p&gt;

</description>
      <category>security</category>
      <category>devops</category>
      <category>linux</category>
      <category>sysadmin</category>
    </item>
    <item>
      <title>How to Fix Tool-Use Loops in Autonomous Coding Agents</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Tue, 26 May 2026 01:38:16 +0000</pubDate>
      <link>https://dev.to/alanwest/how-to-fix-tool-use-loops-in-autonomous-coding-agents-540e</link>
      <guid>https://dev.to/alanwest/how-to-fix-tool-use-loops-in-autonomous-coding-agents-540e</guid>
      <description>&lt;p&gt;Last month I was helping a friend debug their autonomous coding agent. It had been "working" on a task for 47 minutes, burned through roughly twelve bucks in API costs, and somehow ended up exactly where it started. The logs showed it had called &lt;code&gt;read_file&lt;/code&gt; on the same five files 23 times.&lt;/p&gt;

&lt;p&gt;If you've built or experimented with AI coding agents, you've probably seen something like this. It's not a fun bug to debug — the agent isn't crashing, it isn't erroring, it just... never finishes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Why Agents Loop Forever
&lt;/h2&gt;

&lt;p&gt;Tool-use loops are the most expensive failure mode in agent design. From the outside, the agent looks busy. It's reading files, calling tools, generating thoughts, producing output. But it's not making progress toward the goal.&lt;/p&gt;

&lt;p&gt;The shape is almost always the same:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent reads file A&lt;/li&gt;
&lt;li&gt;Agent realizes it needs context from file B&lt;/li&gt;
&lt;li&gt;Reads file B, gets confused by something unexpected&lt;/li&gt;
&lt;li&gt;Goes back to file A "to double-check"&lt;/li&gt;
&lt;li&gt;Reads file B again because file A didn't have what it needed&lt;/li&gt;
&lt;li&gt;Repeat until your wallet cries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've now seen this in three different agent setups across two side projects and one client engagement. The symptoms are identical every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Root Cause: Stateless Decision-Making
&lt;/h2&gt;

&lt;p&gt;The fundamental issue is that the agent's working state looks nearly identical at step N and step N+5. Same task description in the system prompt, same files implicitly available, same general feel of the conversation. So the model — given essentially the same inputs — makes essentially the same decision.&lt;/p&gt;

&lt;p&gt;There are three concrete causes worth separating:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No explicit action history.&lt;/strong&gt; The agent has called &lt;code&gt;read_file("config.yaml")&lt;/code&gt; four times, but each turn the model mostly "sees" the latest tool result, not the pattern of what it's already tried.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No reflection step.&lt;/strong&gt; Nothing in the loop ever asks "am I actually making progress?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Errors get summarized away.&lt;/strong&gt; A tool failure gets compressed into a vague "the previous call had an issue" and the model retries with the same broken inputs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let's walk through fixing each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Track Tool Calls Explicitly
&lt;/h2&gt;

&lt;p&gt;Don't rely on the conversation history to encode what's been tried. Build a structured log the model can actually reason about.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ToolCallLog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Counts repeated (tool_name, args) pairs so we can detect loops
&lt;/span&gt;    &lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;_hash_args&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# stable hash of args
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;args&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result_preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;summary_for_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Surface repeated calls so the model SEES the loop forming
&lt;/span&gt;        &lt;span class="n"&gt;repeated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;repeated&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No repeated tool calls so far.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; called &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="nf"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;repeated&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Repeated calls detected:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then inject &lt;code&gt;log.summary_for_model()&lt;/code&gt; into the system prompt every turn. Suddenly the model can see that it's about to call &lt;code&gt;read_file("config.yaml")&lt;/code&gt; for the fifth time, and most modern models will course-correct on their own.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Add a Loop Detector
&lt;/h2&gt;

&lt;p&gt;Don't trust the model to always notice. Add a circuit breaker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;MAX_IDENTICAL_CALLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="n"&gt;MAX_TOTAL_STEPS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;should_force_reflection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolCallLog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Return a reflection prompt if we detect a loop, else None
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;calls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_IDENTICAL_CALLS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;
            &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve called &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; with the same args &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; times. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This is a loop. Stop and explain in one sentence what you &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;actually need, then choose a different strategy.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_TOTAL_STEPS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve taken many steps without finishing. Summarize what you &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;know, what you still need, and propose a single next action.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When this triggers, inject the returned string as a user message before the next model call. I've found this single change cuts wasted tokens by something like half on the workflows I've tested. Your mileage will vary, but the direction is consistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Force Reflection on a Schedule
&lt;/h2&gt;

&lt;p&gt;Even without a detected loop, models drift on long tasks. A periodic forced reflection helps. The cadence I've landed on is every 8–10 tool calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;REFLECTION_INTERVAL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;maybe_reflect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;REFLECTION_INTERVAL&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Pause. Original task: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;In 3 short bullets, answer:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1. What have I actually accomplished?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2. What is still blocking completion?&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3. Is my current approach working, or should I change it?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is borrowed from human pair programming — "hey, where are we?" every so often is healthy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Make Errors Loud
&lt;/h2&gt;

&lt;p&gt;The last fix is the most boring but probably the most important. When a tool fails, don't soften the error message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;format_tool_error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Be specific about what failed. Generic errors invite retries.
&lt;/span&gt;    &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL ERROR: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; failed with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Inputs were: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Do NOT retry with identical arguments. Either fix the inputs &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;or choose a different tool.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "Do NOT retry with identical arguments" line sounds silly but actually moves the needle. I tested with and without it on the same task three times — without it, the agent retried failing calls about 60% of the time. With it, closer to 10%. Tiny sample size, but the effect was obvious.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention: Design Choices That Help
&lt;/h2&gt;

&lt;p&gt;A few patterns I now reach for by default when building agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cap context per tool.&lt;/strong&gt; Truncate &lt;code&gt;read_file&lt;/code&gt; results to the relevant section instead of dumping whole files. Less noise, more signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use scratchpad files.&lt;/strong&gt; Give the agent a &lt;code&gt;notes.md&lt;/code&gt; it can write to. Externalized memory is cheaper than re-deriving state from chat history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Separate planning from execution.&lt;/strong&gt; A small "planner" call that emits a 5-step plan, followed by an executor that follows it, loops far less than a single agent doing both.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log everything during development.&lt;/strong&gt; You cannot debug what you cannot see. Persist full tool histories to disk for the first few weeks of any new agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is novel — the broader agent research community has been writing about reflection, planning, and memory for a while. But it's easy to skip these when you're hacking together a prototype and assume "the model will figure it out." It won't. Not reliably.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Tool-use loops are not a model problem so much as a harness problem. The model is doing exactly what you'd expect given identical inputs every turn. Your job, as the person building the loop around the model, is to make sure the inputs aren't identical — that the agent can see its own history, get nudged when it's stuck, and feel the weight of its errors.&lt;/p&gt;

&lt;p&gt;Fix those four things and most of your runaway agent costs go away. The rest is just tuning.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>debugging</category>
    </item>
    <item>
      <title>How to Work Around MySQL's View Subquery Limitation (Bug #11472)</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Tue, 26 May 2026 01:01:45 +0000</pubDate>
      <link>https://dev.to/alanwest/how-to-work-around-mysqls-view-subquery-limitation-bug-11472-3k27</link>
      <guid>https://dev.to/alanwest/how-to-work-around-mysqls-view-subquery-limitation-bug-11472-3k27</guid>
      <description>&lt;h2&gt;
  
  
  That Moment Your VIEW Refuses to Compile
&lt;/h2&gt;

&lt;p&gt;You're refactoring a reporting query. It's gnarly — three layers of aggregation, a couple of LEFT JOINs, the works. You wrap it in a subquery in the FROM clause, run it, get the right numbers. Nice. Now you decide to encapsulate the whole thing in a VIEW so the analytics folks don't have to copy-paste it.&lt;/p&gt;

&lt;p&gt;You run &lt;code&gt;CREATE VIEW&lt;/code&gt;. MySQL throws this in your face:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR 1349 (HY000): View's SELECT contains a subquery in the FROM clause
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you've worked with MySQL long enough, this error has shown up at least once. It's the symptom of one of MySQL's oldest documented limitations — tracked as &lt;a href="https://bugs.mysql.com/bug.php?id=11472" rel="noopener noreferrer"&gt;Bug #11472&lt;/a&gt; — and developers have been finding workarounds for it for two decades.&lt;/p&gt;

&lt;p&gt;I want to walk through why this happens, how to work around it cleanly with the tools that exist today, and what the reportedly-landed fix might change for your codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Actually Going On
&lt;/h2&gt;

&lt;p&gt;MySQL's VIEW implementation resolves views in one of two modes: MERGE and TEMPTABLE. MERGE rewrites the view's definition into the calling query. TEMPTABLE materializes the view into a temporary table first.&lt;/p&gt;

&lt;p&gt;The catch: a subquery in the FROM clause — what the SQL spec calls a &lt;em&gt;derived table&lt;/em&gt; — historically couldn't sit inside a view definition. The parser would let you create some shapes, but as soon as it hit a &lt;code&gt;FROM (SELECT ...) AS something&lt;/code&gt; inside &lt;code&gt;CREATE VIEW&lt;/code&gt;, you'd get error 1349.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/create-view.html" rel="noopener noreferrer"&gt;official VIEW docs&lt;/a&gt; describe the restriction. According to community discussion, a fix has reportedly landed — I haven't tested it against the official changelog myself, so I'm hedging there. Either way, the workarounds below are what most production codebases have been using.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Classic Workaround: Nested Views
&lt;/h2&gt;

&lt;p&gt;The most common trick is to split the view into two. The inner subquery becomes its own view, and the outer view selects from it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Step 1: the inner view does the heavy lifting&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;order_totals_per_customer&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 2: the outer view joins to the inner one&lt;/span&gt;
&lt;span class="c1"&gt;-- (instead of inlining a derived table)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;customer_revenue_report&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_spent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lifetime_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;order_totals_per_customer&lt;/span&gt; &lt;span class="n"&gt;ot&lt;/span&gt;
    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works. It's also a pain — you now have two views to maintain, and the names tend to multiply when the query is more complex. I've seen schemas with 40+ "helper" views that exist solely to dodge this restriction.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Cleaner Approach: Common Table Expressions
&lt;/h2&gt;

&lt;p&gt;If you're on MySQL 8.0 or later, CTEs are your friend. They were added in 8.0 (see the &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/with.html" rel="noopener noreferrer"&gt;WITH syntax docs&lt;/a&gt;) and they're allowed inside view definitions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="n"&gt;customer_revenue_report&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;order_totals&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_count&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;
    &lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_spent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lifetime_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;LEFT&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;order_totals&lt;/span&gt; &lt;span class="n"&gt;ot&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same logic, one view, no extra schema noise. I migrated a reporting pipeline from the nested-view approach to CTEs last year — went from 18 supporting views to four. Made the on-call rotation much happier.&lt;/p&gt;

&lt;p&gt;A couple of caveats from that migration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTEs in MySQL aren't always inlined the way you'd expect. The optimizer can choose to materialize them, and that may or may not be what you want for a given workload.&lt;/li&gt;
&lt;li&gt;If you need recursion (parent/child trees, for example), &lt;code&gt;WITH RECURSIVE&lt;/code&gt; works inside views too — just watch the &lt;code&gt;cte_max_recursion_depth&lt;/code&gt; setting.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Debugging When You Still Hit Error 1349
&lt;/h2&gt;

&lt;p&gt;If you've wrapped your query in a CTE and you're &lt;em&gt;still&lt;/em&gt; hitting the error, there are a few usual suspects:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;You're on MySQL 5.7 or earlier.&lt;/strong&gt; CTEs are 8.0+. There's no backport. Upgrade or stick with the nested-view workaround.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The view definer lacks permission on the underlying tables.&lt;/strong&gt; This isn't error 1349 specifically, but it shows up at view-creation time and the message can be misleading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You've got a derived table hiding inside the CTE.&lt;/strong&gt; A correlated subquery in the SELECT list is fine; a &lt;code&gt;FROM (SELECT ...)&lt;/code&gt; inside the CTE body can still trip the historic restriction.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To check which algorithm MySQL picked for your view, query &lt;code&gt;information_schema.VIEWS&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;is_updatable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- 'MERGE', 'TEMPTABLE', or 'UNDEFINED'&lt;/span&gt;
    &lt;span class="n"&gt;algorithm&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;information_schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;views&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;table_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see &lt;code&gt;TEMPTABLE&lt;/code&gt; where you expected &lt;code&gt;MERGE&lt;/code&gt;, that's a hint that something in the view forced materialization — non-deterministic functions, aggregates, &lt;code&gt;DISTINCT&lt;/code&gt;, or until recently, a derived table.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Fix Might Change
&lt;/h2&gt;

&lt;p&gt;Based on what's circulating in the bug tracker thread, the fix reportedly relaxes the derived-table restriction inside view definitions. I'll be honest — I haven't tested it thoroughly yet, and the exact MySQL version that ships the fix matters a lot for whether you can adopt it.&lt;/p&gt;

&lt;p&gt;A few things I'd want to verify before refactoring production code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which exact MySQL release contains the change (check the official changelog, not blog posts)&lt;/li&gt;
&lt;li&gt;Whether &lt;code&gt;ALGORITHM = MERGE&lt;/code&gt; works with derived tables, or if everything falls back to TEMPTABLE&lt;/li&gt;
&lt;li&gt;How the optimizer handles the inlined derived table compared to an equivalent CTE&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Until you can confirm those on your own deployment, the CTE pattern above is the safer bet. It's been in 8.0 since launch and the behavior is well-documented.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention: Habits That Save Future-You
&lt;/h2&gt;

&lt;p&gt;A few rules I've ended up living by after too many "why doesn't this view work" debugging sessions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write the query as a plain SELECT first&lt;/strong&gt;, get it returning the right rows, then wrap it in a view. Don't debug DDL and query logic at the same time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prefer CTEs over nested views&lt;/strong&gt; when you're on 8.0+. They're easier to read and they don't pollute the schema.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check &lt;code&gt;algorithm&lt;/code&gt; in information_schema.&lt;/strong&gt; If TEMPTABLE shows up where you didn't expect it, your view won't be updatable and may not perform the way the planner makes it look.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't bury joins inside derived tables&lt;/strong&gt; when a CTE would do the same job. It reads better and the optimizer has more to work with.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Twenty years is a long time for a bug to sit open. The workarounds are well-worn, the tooling around CTEs is solid, and most of the pain this caused has been quietly absorbed into MySQL muscle memory at this point. Worth knowing both the old patterns and the new ones — you'll run into both in any sufficiently old codebase.&lt;/p&gt;

</description>
      <category>mysql</category>
      <category>database</category>
      <category>sql</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why your browser multitrack audio drifts out of sync (and how to fix it)</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Tue, 26 May 2026 00:08:02 +0000</pubDate>
      <link>https://dev.to/alanwest/why-your-browser-multitrack-audio-drifts-out-of-sync-and-how-to-fix-it-48l5</link>
      <guid>https://dev.to/alanwest/why-your-browser-multitrack-audio-drifts-out-of-sync-and-how-to-fix-it-48l5</guid>
      <description>&lt;p&gt;If you've ever tried to build anything more ambitious than a single &lt;code&gt;&amp;lt;audio&amp;gt;&lt;/code&gt; tag in the browser, you've probably hit this wall: you start two or three audio tracks at "the same time", and within 30 seconds they sound like a drunk wedding band. Drums leading the bass by 80ms, vocals lagging the guitar, the whole thing falling apart.&lt;/p&gt;

&lt;p&gt;I hit this exact problem on a project last month — building a small multitrack practice tool for a friend who teaches guitar. First version used &lt;code&gt;&amp;lt;audio&amp;gt;&lt;/code&gt; elements and &lt;code&gt;play()&lt;/code&gt; calls in a loop. It worked beautifully for about 4 seconds before the tracks started smearing.&lt;/p&gt;

&lt;p&gt;Let's dig into why this happens, and how to actually solve it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The root cause: two clocks, no coordination
&lt;/h2&gt;

&lt;p&gt;The core problem is that the browser has multiple timing systems and they do not agree with each other. When you call &lt;code&gt;audio.play()&lt;/code&gt; on an HTMLAudioElement, you're at the mercy of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The main thread event loop (which can stall during GC, layout, anything)&lt;/li&gt;
&lt;li&gt;The media element's internal scheduler (which buffers independently per element)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Date.now()&lt;/code&gt; / &lt;code&gt;performance.now()&lt;/code&gt; (wall clock, not audio clock)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each &lt;code&gt;&amp;lt;audio&amp;gt;&lt;/code&gt; element schedules itself against its own internal clock. There is no shared timebase. So when you do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This is the trap. Looks correct. Is not.&lt;/span&gt;
&lt;span class="nx"&gt;track1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;play&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;track2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;play&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;track3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;play&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three &lt;code&gt;play()&lt;/code&gt; calls return immediately, but each element starts producing samples whenever its decoder feels ready. On Chrome the gap might be 5ms. On Firefox under load, 40ms. And the elements keep drifting because they aren't slaved to the same sample clock.&lt;/p&gt;

&lt;p&gt;A 1ms drift on a kick drum is audible. A 10ms drift is unusable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: schedule against the AudioContext clock
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API" rel="noopener noreferrer"&gt;Web Audio API&lt;/a&gt; was designed specifically to solve this. The &lt;code&gt;AudioContext&lt;/code&gt; exposes a single, sample-accurate clock (&lt;code&gt;context.currentTime&lt;/code&gt;), and every source node you create is scheduled against that clock with sub-millisecond precision.&lt;/p&gt;

&lt;p&gt;The trick is to decode your audio into &lt;code&gt;AudioBuffer&lt;/code&gt; objects up front, then schedule playback at a single future timestamp.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// One context = one shared clock for everything&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AudioContext&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadTrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;arrayBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="c1"&gt;// decodeAudioData returns a fully decoded buffer in memory&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decodeAudioData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;drums&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;bass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;guitar&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nf"&gt;loadTrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/drums.wav&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nf"&gt;loadTrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/bass.wav&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nf"&gt;loadTrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/guitar.wav&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now here's the part most tutorials get wrong. You don't start the sources at &lt;code&gt;ctx.currentTime&lt;/code&gt;. You schedule them slightly in the future, so all three &lt;code&gt;start()&lt;/code&gt; calls definitely land before the playback time arrives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;playSynced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 100ms lookahead. Long enough to absorb main-thread jitter,&lt;/span&gt;
  &lt;span class="c1"&gt;// short enough that the user does not perceive delay.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buffers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;src&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createBufferSource&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="c1"&gt;// All three sources are armed against the SAME future timestamp&lt;/span&gt;
    &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;startAt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;sources&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The magic is in that shared &lt;code&gt;startAt&lt;/code&gt; value. The audio thread (which runs at high priority, separate from the main thread) sees three sources all scheduled for the same sample, and starts them in lockstep. No drift. Ever.&lt;/p&gt;

&lt;h2&gt;
  
  
  Per-track gain, mute, and solo
&lt;/h2&gt;

&lt;p&gt;Once you have the basics, you almost always want per-track volume control. Insert a &lt;code&gt;GainNode&lt;/code&gt; between each source and the destination:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createTrack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createGain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;gain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;gain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Set volume with a tiny ramp to avoid clicks on sudden changes&lt;/span&gt;
    &lt;span class="nf"&gt;setVolume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;gain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setTargetAtTime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;currentTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nf"&gt;mute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setVolume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="nf"&gt;unmute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setVolume&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note &lt;code&gt;setTargetAtTime&lt;/code&gt; instead of assigning &lt;code&gt;gain.gain.value&lt;/code&gt; directly. Direct assignment causes a zipper-noise click because the value jumps between sample frames. The exponential ramp smooths it out over ~10ms.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gotcha nobody mentions: source nodes are one-shot
&lt;/h2&gt;

&lt;p&gt;This tripped me up for an embarrassing amount of time. An &lt;code&gt;AudioBufferSourceNode&lt;/code&gt; can only be &lt;code&gt;start()&lt;/code&gt;ed once. Ever. If you stop playback and want to play again, you have to create a new source node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Track&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createGain&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;play&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;startAt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Always create a fresh source — they are disposable&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createBufferSource&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gain&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;startAt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disconnect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;AudioBuffer&lt;/code&gt; is the expensive thing — decoded PCM data sitting in memory. The source node is cheap, ephemeral plumbing. Treat them differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention: a checklist
&lt;/h2&gt;

&lt;p&gt;After shipping a couple of multitrack browser tools, here's what I check before any audio code goes anywhere near production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One AudioContext per app.&lt;/strong&gt; Creating multiple contexts means multiple clocks, which defeats the entire point.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decode once, play many.&lt;/strong&gt; Hold &lt;code&gt;AudioBuffer&lt;/code&gt; objects in memory; never re-decode on each play.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always schedule with lookahead.&lt;/strong&gt; Never &lt;code&gt;start(ctx.currentTime)&lt;/code&gt;. Use at least &lt;code&gt;ctx.currentTime + 0.05&lt;/code&gt; to absorb jitter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resume the context on user gesture.&lt;/strong&gt; Browsers ship &lt;code&gt;AudioContext&lt;/code&gt; in &lt;code&gt;suspended&lt;/code&gt; state. Call &lt;code&gt;ctx.resume()&lt;/code&gt; inside a click handler or playback silently fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;setTargetAtTime&lt;/code&gt; for parameter changes.&lt;/strong&gt; Direct assignment to &lt;code&gt;.value&lt;/code&gt; clicks. Ramps don't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch your buffer sizes.&lt;/strong&gt; A decoded stereo 44.1kHz minute is about 10MB in memory. Long sessions with many tracks add up fast.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Web Audio API has a steep initial learning curve because it inverts how you usually think about timing — you describe &lt;em&gt;when&lt;/em&gt; things should happen, then let the audio thread execute it, instead of imperatively saying &lt;em&gt;do it now&lt;/em&gt;. Once that clicks, the drift problems disappear entirely.&lt;/p&gt;

&lt;p&gt;For a deeper dive into scheduling patterns, Chris Wilson's &lt;a href="https://web.dev/articles/audio-scheduling" rel="noopener noreferrer"&gt;A Tale of Two Clocks&lt;/a&gt; is still the canonical reference and worth bookmarking. It's the article that finally made all of this make sense to me.&lt;/p&gt;

</description>
      <category>webaudio</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>audio</category>
    </item>
    <item>
      <title>Why LLM Coding Agents Drift on Long Back End Tasks (and How to Fix It)</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Mon, 25 May 2026 23:42:42 +0000</pubDate>
      <link>https://dev.to/alanwest/why-llm-coding-agents-drift-on-long-back-end-tasks-and-how-to-fix-it-53n2</link>
      <guid>https://dev.to/alanwest/why-llm-coding-agents-drift-on-long-back-end-tasks-and-how-to-fix-it-53n2</guid>
      <description>&lt;p&gt;Last month I spent three days debugging a Django service where the AI agent had written... mostly correct code. The endpoints worked. The tests passed. But somewhere around the fourth file, it had quietly dropped a database transaction wrapper around a multi-step write. By file seven, it had forgotten that one of the models required tenant scoping.&lt;/p&gt;

&lt;p&gt;This is constraint decay. And once you start watching for it, you see it everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  What constraint decay actually is
&lt;/h2&gt;

&lt;p&gt;When you hand an LLM agent a backend task, you give it a pile of constraints. Some are explicit (use this ORM, scope by &lt;code&gt;tenant_id&lt;/code&gt;, wrap writes in transactions). Some are implicit (auth middleware applies to all routes, errors map to specific status codes). Early in the task, those constraints are fresh in context and the agent honors them.&lt;/p&gt;

&lt;p&gt;As the task drags on, something predictable happens. The agent generates more code. That generated code pushes the original constraints further from the attention window. By the time it's writing the eighth function, the original instructions are competing with thousands of tokens of its own output for attention weight. Constraints fade. Output drifts.&lt;/p&gt;

&lt;p&gt;I should say upfront: I haven't read every paper on this in detail, and recent work like the &lt;em&gt;Constraint Decay&lt;/em&gt; preprint on arXiv is still being discussed. But the phenomenon itself is reproducible at home. Build a long enough agent loop with enough constraints and you'll watch it happen on your own machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Root cause: it's not memory, it's signal-to-noise
&lt;/h2&gt;

&lt;p&gt;The first instinct when you see drift is "well, just put it in the context window." Modern models have huge context windows. But window size isn't really the issue.&lt;/p&gt;

&lt;p&gt;The issue is that attention is a softmax over the entire context. When your system prompt is 200 tokens and the surrounding generated code is 8000 tokens of similar-looking function names, types, and patterns, the relative weight on the constraint shrinks. The constraint is &lt;em&gt;present&lt;/em&gt;. It's just not &lt;em&gt;salient&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;You can verify this with a quick experiment. Give an agent a constraint like "every database write must go through &lt;code&gt;audit_log()&lt;/code&gt;." Have it write five files. By file four, direct writes will often sneak in. Re-prompting with just the original constraint restores compliance immediately. The constraint never left the model — the model just stopped weighting it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-step fix
&lt;/h2&gt;

&lt;p&gt;Here's the pattern I've landed on after maybe a dozen agent-driven projects this year. It's not perfect. It does cut drift significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Externalize constraints as checked artifacts
&lt;/h3&gt;

&lt;p&gt;Don't rely on the agent remembering. Make the constraint a thing you can mechanically verify.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# constraints.py — source of truth for cross-cutting rules
&lt;/span&gt;&lt;span class="n"&gt;INVARIANTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_scoping&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;# any query on a multi-tenant model must include tenant_id
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applies_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invoice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Subscription&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;every .filter() / .get() includes tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audit_log&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;# mutations to sensitive tables must be logged
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;applies_to&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;billing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;calls audit_log() before commit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then write a small AST-walking linter that checks these. Now the constraint has a teeth-having enforcer that doesn't decay.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Chunk the work, refresh between chunks
&lt;/h3&gt;

&lt;p&gt;Long single-shot generation is where decay is worst. Break the task into chunks, and between chunks, replay the relevant constraints.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decompose_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Constraints go near the top, fresh, every chunk
&lt;/span&gt;        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;prior_summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# summary, not full output
&lt;/span&gt;            &lt;span class="n"&gt;current_chunk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key move is &lt;code&gt;summarize(results)&lt;/code&gt; instead of dumping all prior code. A summary preserves the architectural decisions without crowding the constraint with thousands of code tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Use a separate constraint-check pass
&lt;/h3&gt;

&lt;p&gt;After every chunk, run a separate, narrow LLM call whose only job is to check the new code against the constraints. Single responsibility, fresh context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check this code against the constraints below. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;For each constraint, answer PASS or FAIL with one line of evidence.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONSTRAINTS:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;format_constraints&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;constraints&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CODE:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;generated_code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;narrow_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# smaller, cheaper model is fine
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is much more reliable than asking the main agent to self-check, because the checker isn't carrying the cognitive load of generation. Its context is short, its attention is undivided.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Make violations fail loud
&lt;/h3&gt;

&lt;p&gt;When the checker finds a violation, don't try to "patch" the offending file. Roll back the chunk and regenerate with the violated constraint pinned at the very top — sometimes repeated. Repetition is ugly but it works. Models weight constraints that appear multiple times more heavily.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;regenerate_with_emphasis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;violations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;emphasized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRITICAL CONSTRAINT (do not violate): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;violations&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Yes, we repeat. Yes, it helps.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;emphasized&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;emphasized&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Keep human review on the constraint surface, not the code
&lt;/h3&gt;

&lt;p&gt;This is the part people skip. You don't need to review every line the agent writes. You need to review the &lt;em&gt;constraint set&lt;/em&gt; and the &lt;em&gt;checker&lt;/em&gt;. If those two are correct, drift is bounded.&lt;/p&gt;

&lt;p&gt;I have a habit now of starting every agent project by writing the constraints file first. Before any code. It feels weird because you're writing rules against code that doesn't exist yet, but it forces you to articulate the invariants up front, while you're still thinking clearly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention: design for bounded drift
&lt;/h2&gt;

&lt;p&gt;A few patterns that keep the problem small in the first place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prefer narrow tasks.&lt;/strong&gt; "Add a new endpoint" is bounded. "Build the whole admin panel" is not. Decay scales with task length, so shorter tasks decay less.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use typed interfaces aggressively.&lt;/strong&gt; When the agent has to satisfy a type signature, the type acts as a local, always-visible constraint. Tools like &lt;a href="https://mypy.readthedocs.io/" rel="noopener noreferrer"&gt;mypy&lt;/a&gt; or TypeScript catch a surprising amount of drift for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lean on existing scaffolding.&lt;/strong&gt; If your codebase has a &lt;code&gt;BaseRepository&lt;/code&gt; that already enforces tenant scoping, the agent inherits the constraint by inheritance. The framework remembers what the agent forgets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust passing tests.&lt;/strong&gt; An agent that wrote both the code and the tests has aligned them to each other. Run the actual app. Hit the endpoints with &lt;code&gt;curl&lt;/code&gt;. Check the database directly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The honest summary: LLM agents are fantastic at the first hour of a task and progressively worse at the fourth. If you architect your workflow around that reality — short chunks, external constraints, mechanical checking — the drift becomes manageable. If you treat the agent like a junior dev who remembers everything you said, you'll be debugging silent constraint violations for days.&lt;/p&gt;

&lt;p&gt;Three days, in my case. I've structured my projects differently since.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>backend</category>
      <category>debugging</category>
    </item>
    <item>
      <title>How to Fix Context Loss in Multi-Step AI Agent Workflows</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Mon, 25 May 2026 19:56:13 +0000</pubDate>
      <link>https://dev.to/alanwest/how-to-fix-context-loss-in-multi-step-ai-agent-workflows-2343</link>
      <guid>https://dev.to/alanwest/how-to-fix-context-loss-in-multi-step-ai-agent-workflows-2343</guid>
      <description>&lt;p&gt;I spent last weekend debugging an agent that kept forgetting what it was doing. It would happily call three tools in sequence, then on the fourth one... blank stare. Wrong arguments. Hallucinated file paths. The classic "who am I, where am I, what was I doing" moment.&lt;/p&gt;

&lt;p&gt;If you've built anything that chains together more than a couple of tool calls, you've probably seen this. The agent starts strong, makes a plan, and then somewhere around step three or four it starts looking like it took a nap and woke up in someone else's session.&lt;/p&gt;

&lt;p&gt;Let's talk about why this happens and how to actually fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Your Agent Has Amnesia
&lt;/h2&gt;

&lt;p&gt;Here's the symptom. You build an agent skill that's supposed to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read a config file&lt;/li&gt;
&lt;li&gt;Validate it against a schema&lt;/li&gt;
&lt;li&gt;Apply a transformation&lt;/li&gt;
&lt;li&gt;Write the result back&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Steps 1 and 2 work fine. By step 3, the agent has lost track of the variable names it pulled out in step 1. By step 4, it's writing to the wrong path entirely. You add logging. You add retries. You add a stern system prompt that says "DO NOT FORGET THE FILENAME." Nothing helps.&lt;/p&gt;

&lt;p&gt;I ran into this on a code review agent I was building. It would read a file, identify three issues, then when asked to apply fixes, it would invent issues that didn't exist. The original analysis just... evaporated.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Root Cause: Tool Calls Are Stateless
&lt;/h2&gt;

&lt;p&gt;Here's the thing nobody tells you when you start building agents: &lt;strong&gt;the model itself has no memory between tool calls&lt;/strong&gt;. The only "memory" is the message history you keep sending back.&lt;/p&gt;

&lt;p&gt;When the agent calls a tool, here's what happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model sees the entire message history&lt;/li&gt;
&lt;li&gt;It picks a tool and arguments based on what it sees&lt;/li&gt;
&lt;li&gt;The tool runs, returns a result&lt;/li&gt;
&lt;li&gt;The result gets appended to the history&lt;/li&gt;
&lt;li&gt;The model sees the updated history and picks the next move&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if step 1 returned a giant blob of JSON, step 2 returned another blob, step 3's context window is now mostly old tool output. The model's attention is spread thin. Important details — like the filename you extracted in step 1 — get drowned in noise.&lt;/p&gt;

&lt;p&gt;This isn't a bug. It's how the architecture works. Each turn is a fresh inference over a growing message log.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-Step Solution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Externalize State Into a Scratchpad
&lt;/h3&gt;

&lt;p&gt;The single most effective fix I've found is to give the agent an explicit "scratchpad" — a small structured object the agent reads from and writes to between steps. Don't rely on the model to remember things. Make it write them down.&lt;/p&gt;

&lt;p&gt;Here's a minimal version in Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;asdict&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentScratchpad&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Persistent across all tool calls in this run
&lt;/span&gt;    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;decisions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;to_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Inject this back into the system prompt every turn
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CURRENT STATE:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Two tools the agent uses to manage its own memory
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scratchpad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;scratchpad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scratchpad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;scratchpad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decisions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Decision logged&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key move: you expose &lt;code&gt;remember&lt;/code&gt; and &lt;code&gt;decide&lt;/code&gt; as tools the agent can call. After step 1, it calls &lt;code&gt;remember("config_path", "/etc/app.yml")&lt;/code&gt;. By step 4, that fact is still right there in the system prompt, front and center.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Compress Old Tool Output
&lt;/h3&gt;

&lt;p&gt;The second problem is sheer volume. A &lt;code&gt;read_file&lt;/code&gt; call on a 500-line config returns 500 lines. Three of those calls and the context is mostly raw file dumps.&lt;/p&gt;

&lt;p&gt;Fix it by replacing old tool results with summaries once they're no longer the active focus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compress_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keep_recent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Keep system message and recent N turns verbatim
&lt;/span&gt;    &lt;span class="c1"&gt;# Summarize everything in between
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;keep_recent&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;

    &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# system prompt
&lt;/span&gt;    &lt;span class="n"&gt;tail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;keep_recent&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
    &lt;span class="n"&gt;middle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;keep_recent&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# Replace verbose tool results with one-line summaries
&lt;/span&gt;    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;summarize_turns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;middle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# your own summarizer
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tail&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I usually run this every 5-6 turns. The agent still has access to the scratchpad for facts that matter, and the recent turns for the immediate context. The middle gets squashed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Validate State Before Every Critical Tool Call
&lt;/h3&gt;

&lt;p&gt;This is the prevention piece. For any tool that mutates something — writes a file, hits an API, runs a migration — wrap it in a guard that checks the scratchpad first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_file_safe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scratchpad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Refuse to write to a path the agent never recorded reading
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;scratchpad&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;facts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;known_paths&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; was never read in this session. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the file first or update scratchpad.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;do_write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches the hallucination case cleanly. If the agent invents a path, the tool rejects it and tells the agent why. Nine times out of ten, the next turn the agent corrects itself and reads the right file.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Pattern I've Started Using
&lt;/h2&gt;

&lt;p&gt;After migrating three agents to this approach, I converged on a structure that looks roughly like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;System prompt&lt;/strong&gt; holds the goal and tool definitions (static)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scratchpad&lt;/strong&gt; holds extracted facts and decisions (mutable, re-injected each turn)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recent messages&lt;/strong&gt; hold the last few turns verbatim (rolling window)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Summary&lt;/strong&gt; replaces older middle turns (compressed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent never needs to scan 50 turns of history to find a filename. It looks at the scratchpad, sees the fact, and acts on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention Tips
&lt;/h2&gt;

&lt;p&gt;A few things I wish I'd done from the start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Design state-aware tools first.&lt;/strong&gt; Every tool should either read from or write to the scratchpad, not just return data into the void.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test long chains.&lt;/strong&gt; Most agent bugs only show up after 5+ tool calls. Write integration tests that force the agent through a long workflow and assert on final state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log token counts per turn.&lt;/strong&gt; When you see context creeping past 60-70% of the window, that's where attention starts to degrade. Compress earlier than you think you need to.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust the model to remember.&lt;/strong&gt; If a fact matters in step 5, write it to the scratchpad in step 1. Treat the model like a brilliant intern with severe short-term memory loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The broader lesson here: agent reliability isn't really about prompt engineering. It's about state management. The model is doing inference; you're doing the bookkeeping. Get the bookkeeping right and the agent stops forgetting things.&lt;/p&gt;

&lt;p&gt;I haven't tested the scratchpad pattern on workflows longer than about 20 tool calls, so your mileage may vary at extreme scale. But for the typical 3-10 step agent workflow, this fixed every context-loss bug I had.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>python</category>
      <category>debugging</category>
    </item>
    <item>
      <title>How to do partial page updates without shipping a framework</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Mon, 25 May 2026 19:01:36 +0000</pubDate>
      <link>https://dev.to/alanwest/how-to-do-partial-page-updates-without-shipping-a-framework-1dc0</link>
      <guid>https://dev.to/alanwest/how-to-do-partial-page-updates-without-shipping-a-framework-1dc0</guid>
      <description>&lt;h2&gt;
  
  
  The problem: swapping part of a page is harder than it should be
&lt;/h2&gt;

&lt;p&gt;You click a "Load more" button. Behind the scenes, your app fetches some HTML and shoves it into a &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;. Simple, right?&lt;/p&gt;

&lt;p&gt;Except it never stays simple. You hit edge cases with orphaned event listeners, focus jumping around, scripts that mysteriously don't execute, and weird flashes when you replace a chunk of DOM.&lt;/p&gt;

&lt;p&gt;I ran into this again last month while refactoring a dashboard. The team had jQuery doing partial swaps for the better part of a decade and finally wanted out. The "modern" alternatives all looked great in demos. Each one came with its own quirks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the web makes this awkward
&lt;/h2&gt;

&lt;p&gt;HTML, at its core, doesn't have a primitive for "replace this region with the contents of that response." You can navigate the whole page (&lt;code&gt;&amp;lt;a href&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;form action&amp;gt;&lt;/code&gt;), or you can call &lt;code&gt;fetch()&lt;/code&gt; and write the result into &lt;code&gt;innerHTML&lt;/code&gt; yourself. There's no middle ground built into the platform.&lt;/p&gt;

&lt;p&gt;That gap is why entire categories of tools exist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTMX&lt;/strong&gt; turns attributes into AJAX swaps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turbo (Hotwire)&lt;/strong&gt; wraps regions in &lt;code&gt;&amp;lt;turbo-frame&amp;gt;&lt;/code&gt; and intercepts navigation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React/Vue/Svelte&lt;/strong&gt; rebuild the page from a virtual representation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unpoly&lt;/strong&gt; and friends do progressive enhancement around forms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of them is solving a problem the browser itself never tried to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  A quick tour of the pain
&lt;/h2&gt;

&lt;p&gt;Here's the typical "just fetch and replace" approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Replace #content with HTML from the server&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;loadFragment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks fine. But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inline &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tags in &lt;code&gt;html&lt;/code&gt; won't run (a long-standing quirk of &lt;code&gt;innerHTML&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Any event listeners attached to the old &lt;code&gt;#content&lt;/code&gt; children are gone.&lt;/li&gt;
&lt;li&gt;If the user had focus inside the replaced region, it's lost.&lt;/li&gt;
&lt;li&gt;Custom elements may re-run constructors in surprising ways.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After getting bitten by all of these in production, most teams reach for a library.&lt;/p&gt;

&lt;h2&gt;
  
  
  How HTMX solves it today
&lt;/h2&gt;

&lt;p&gt;HTMX handles a lot of this for you with declarative attributes. The swap target and strategy live in the markup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;button&lt;/span&gt;
  &lt;span class="na"&gt;hx-get=&lt;/span&gt;&lt;span class="s"&gt;"/api/comments?page=2"&lt;/span&gt;
  &lt;span class="na"&gt;hx-target=&lt;/span&gt;&lt;span class="s"&gt;"#comments"&lt;/span&gt;
  &lt;span class="na"&gt;hx-swap=&lt;/span&gt;&lt;span class="s"&gt;"beforeend"&lt;/span&gt;
&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  Load more
&lt;span class="nt"&gt;&amp;lt;/button&amp;gt;&lt;/span&gt;

&lt;span class="nt"&gt;&amp;lt;ul&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"comments"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="c"&gt;&amp;lt;!-- existing items --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/ul&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It handles script execution, optional out-of-band swaps, focus preservation via &lt;code&gt;hx-preserve&lt;/code&gt;, and history integration. Worth reading the &lt;a href="https://htmx.org/docs/" rel="noopener noreferrer"&gt;official docs&lt;/a&gt; if you haven't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Turbo Frames, briefly
&lt;/h2&gt;

&lt;p&gt;If you're in the Rails/Hotwire world, &lt;code&gt;&amp;lt;turbo-frame&amp;gt;&lt;/code&gt; does similar work by wrapping regions and intercepting links inside them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;turbo-frame&lt;/span&gt; &lt;span class="na"&gt;id=&lt;/span&gt;&lt;span class="s"&gt;"comments"&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"/posts/42/comments"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  Loading…
&lt;span class="nt"&gt;&amp;lt;/turbo-frame&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both approaches work. Both require shipping a library. Both reinvent things the browser arguably should handle itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Chrome is reportedly exploring
&lt;/h2&gt;

&lt;p&gt;According to the &lt;a href="https://developer.chrome.com/blog/declarative-partial-updates" rel="noopener noreferrer"&gt;Chrome developers blog post on declarative partial updates&lt;/a&gt;, there's an early-stage proposal to give the platform a native way to express "replace this region with what comes back from the server." I haven't shipped anything against the proposal yet — at the time of writing it reads as an explainer, not a stable API — but the direction is interesting.&lt;/p&gt;

&lt;p&gt;The general idea, as I read it, is that you describe the swap declaratively in markup and the browser does the fetch and DOM update for you. Think of it as the platform absorbing patterns that libraries like HTMX have demonstrated work well.&lt;/p&gt;

&lt;p&gt;If you want to follow along officially, the right places to watch are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;a href="https://chromestatus.com/" rel="noopener noreferrer"&gt;Chrome status entries&lt;/a&gt; for the proposal&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://github.com/whatwg" rel="noopener noreferrer"&gt;WHATWG repos&lt;/a&gt; where standardization discussion happens&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://web-platform-tests.org/" rel="noopener noreferrer"&gt;Web Platform Tests&lt;/a&gt; repo once an implementation lands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would resist building anything load-bearing on a proposal-stage API. The shape will almost certainly shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do in the meantime
&lt;/h2&gt;

&lt;p&gt;If you're feeling the pain today, here's the order I'd try things in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use the simplest tool that fits.&lt;/strong&gt; If you only need partial updates in two places, hand-rolled &lt;code&gt;fetch&lt;/code&gt; plus &lt;code&gt;replaceChildren&lt;/code&gt; is fine.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reach for HTMX or Turbo if you're doing this everywhere.&lt;/strong&gt; Don't build your own framework. I've watched two teams try; both ended up with a worse HTMX.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep server-rendered HTML as the source of truth.&lt;/strong&gt; Returning JSON and reconstructing markup on the client is what got us into the SPA mess in the first place.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's a safer vanilla pattern I lean on when a library would be overkill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;swapFragment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;targetSelector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;Accept&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text/html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Fragment fetch failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createElement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;template&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerHTML&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// parsed in an inert context, not executed&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;targetSelector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// replaceChildren preserves more state than reassigning innerHTML&lt;/span&gt;
  &lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replaceChildren&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;childNodes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few notes on why this is less awful than the naive version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;template&amp;gt;&lt;/code&gt; parses HTML in an inert context, so custom elements don't upgrade twice.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;replaceChildren&lt;/code&gt; is generally kinder to scroll position and selection than overwriting &lt;code&gt;innerHTML&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;We still need to manually re-run any inline scripts the server sends. Honestly, I just don't.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prevention tips so this doesn't haunt you again
&lt;/h2&gt;

&lt;p&gt;A handful of habits that have saved me grief:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pick one swap strategy per project.&lt;/strong&gt; Mixing HTMX, Turbo, and hand-rolled fetches makes debugging miserable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test focus, scroll, and selection&lt;/strong&gt; as part of normal QA — not just "did the right content appear."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat the response shape as a contract.&lt;/strong&gt; If one endpoint returns JSON for some callers and HTML for others, you've doubled your test surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch the standards work.&lt;/strong&gt; If a declarative API does land in browsers, your migration story will be much smoother if your current solution is markup-first rather than imperative JS scattered across the codebase.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;Partial page updates are one of those problems that look tiny from the outside and turn into a tar pit on the inside. Libraries have papered over the gap for years and done a respectable job. If browsers eventually expose a native primitive for this, a lot of glue code will go away. Until then: pick a sane tool, keep markup as the source of truth, and don't roll your own. I've tried. It's not worth it.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>html</category>
      <category>javascript</category>
      <category>htmx</category>
    </item>
    <item>
      <title>Migrating off Google Analytics: Umami vs Plausible vs Fathom</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Mon, 25 May 2026 18:39:21 +0000</pubDate>
      <link>https://dev.to/alanwest/migrating-off-google-analytics-umami-vs-plausible-vs-fathom-1bnj</link>
      <guid>https://dev.to/alanwest/migrating-off-google-analytics-umami-vs-plausible-vs-fathom-1bnj</guid>
      <description>&lt;p&gt;I've been on a slow march away from Google Analytics for about two years now. Three client projects, one personal blog, and a small SaaS — all moved over to privacy-focused alternatives. The decision usually starts with a cookie banner complaint and ends with someone asking "wait, why are we sending visitor data to an ad company again?"&lt;/p&gt;

&lt;p&gt;If you're thinking about making the switch, this post walks through the three options I've actually shipped to production: Umami, Plausible, and Fathom. I'll cover what each does well, where they stumble, and what the migration actually looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother migrating at all?
&lt;/h2&gt;

&lt;p&gt;The usual reasons, in roughly the order clients bring them up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GDPR/ePrivacy compliance&lt;/strong&gt; without the cookie banner gymnastics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Page weight&lt;/strong&gt; — the GA4 script weighs in around 50KB+ gzipped; the alternatives here are all under 3KB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard sanity&lt;/strong&gt; — GA4's UI is, charitably, an acquired taste&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data ownership&lt;/strong&gt; — especially relevant if you self-host&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are reasons to migrate a high-traffic e-commerce site overnight. But for content sites, marketing pages, and most SaaS dashboards, the tradeoff is usually worth it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The contenders
&lt;/h2&gt;

&lt;p&gt;Here's the honest side-by-side. I've used all three in production within the last 18 months.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Umami&lt;/th&gt;
&lt;th&gt;Plausible&lt;/th&gt;
&lt;th&gt;Fathom&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;AGPL-3.0&lt;/td&gt;
&lt;td&gt;Proprietary (Fathom Lite is MIT)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-host&lt;/td&gt;
&lt;td&gt;Yes (free)&lt;/td&gt;
&lt;td&gt;Yes (Community Edition)&lt;/td&gt;
&lt;td&gt;No (hosted only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosted option&lt;/td&gt;
&lt;td&gt;Yes (Umami Cloud)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (only option)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Script size&lt;/td&gt;
&lt;td&gt;~2KB&lt;/td&gt;
&lt;td&gt;&amp;lt;1KB&lt;/td&gt;
&lt;td&gt;~2KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cookies&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;MySQL or PostgreSQL&lt;/td&gt;
&lt;td&gt;PostgreSQL + ClickHouse&lt;/td&gt;
&lt;td&gt;N/A (managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Built with&lt;/td&gt;
&lt;td&gt;Next.js / Node&lt;/td&gt;
&lt;td&gt;Elixir / Phoenix&lt;/td&gt;
&lt;td&gt;Closed source&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few clarifications since this stuff changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fathom Analytics&lt;/strong&gt; (the paid product) is closed source. There's an older project called &lt;em&gt;Fathom Lite&lt;/em&gt; that's MIT licensed, but it hasn't been actively developed in years. Don't confuse them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plausible Community Edition&lt;/strong&gt; is the self-host option. The hosted version has extra features that aren't in CE — check their docs before assuming parity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Umami&lt;/strong&gt; is the most permissive license-wise, which matters if you want to embed it in a commercial product.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the tracking code looks like
&lt;/h2&gt;

&lt;p&gt;This is roughly what you replace your GA4 snippet with. All three are a single script tag, which is part of the point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Umami&lt;/strong&gt; (self-hosted or cloud):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- Drop this in your &amp;lt;head&amp;gt;, replace with your own website ID --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;script
  &lt;/span&gt;&lt;span class="na"&gt;defer&lt;/span&gt;
  &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://your-umami-instance.com/script.js"&lt;/span&gt;
  &lt;span class="na"&gt;data-website-id=&lt;/span&gt;&lt;span class="s"&gt;"abc123-your-id-here"&lt;/span&gt;
&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Plausible:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script
  &lt;/span&gt;&lt;span class="na"&gt;defer&lt;/span&gt;
  &lt;span class="na"&gt;data-domain=&lt;/span&gt;&lt;span class="s"&gt;"yourdomain.com"&lt;/span&gt;
  &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://plausible.io/js/script.js"&lt;/span&gt;
&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fathom:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;script
  &lt;/span&gt;&lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"https://cdn.usefathom.com/script.js"&lt;/span&gt;
  &lt;span class="na"&gt;data-site=&lt;/span&gt;&lt;span class="s"&gt;"ABCDEFGH"&lt;/span&gt;
  &lt;span class="na"&gt;defer&lt;/span&gt;
&lt;span class="nt"&gt;&amp;gt;&amp;lt;/script&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice none of them ask you to configure consent mode, set up data streams, or wire up tag manager. That's mostly the appeal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tracking a custom event
&lt;/h2&gt;

&lt;p&gt;This is where the APIs diverge a bit. Here's the same "button clicked" event in each:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Umami — global function attached to window&lt;/span&gt;
&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;umami&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;signup-clicked&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Plausible — same pattern, slightly different shape&lt;/span&gt;
&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plausible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Signup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;props&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pro&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Fathom — uses trackEvent&lt;/span&gt;
&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fathom&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trackEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;signup clicked&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fathom's custom event API is the most limited of the three — no arbitrary properties on events in the same way. If you need rich event metadata, Umami and Plausible are better fits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-hosting Umami in about 10 minutes
&lt;/h2&gt;

&lt;p&gt;This is the workflow I've used for the last two client deployments. Assumes Docker is installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Grab the official compose file&lt;/span&gt;
curl &lt;span class="nt"&gt;-o&lt;/span&gt; docker-compose.yml &lt;span class="se"&gt;\&lt;/span&gt;
  https://raw.githubusercontent.com/umami-software/umami/master/docker-compose.yml

&lt;span class="c"&gt;# Generate a hash salt for sessions&lt;/span&gt;
openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 32

&lt;span class="c"&gt;# Edit docker-compose.yml: set DATABASE_URL and APP_SECRET&lt;/span&gt;
&lt;span class="c"&gt;# Then bring it up&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The default admin login is &lt;code&gt;admin&lt;/code&gt; / &lt;code&gt;umami&lt;/code&gt; — change that immediately. Then point a reverse proxy at port 3000 and you're done. I've run this on a $6/month VPS handling a few hundred thousand pageviews per month without trouble.&lt;/p&gt;

&lt;p&gt;Plausible CE is similar but heavier — it needs ClickHouse, which is a bigger commitment if you're not already running it. For small/medium sites, Umami is the easier self-host story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration steps (the boring but important part)
&lt;/h2&gt;

&lt;p&gt;Here's the rough order I follow when moving a site off GA4:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stand up the new analytics tool&lt;/strong&gt; and verify it's collecting data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run both in parallel for 2-4 weeks&lt;/strong&gt; — you want to see how the numbers compare before you cut over&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export your historical GA data&lt;/strong&gt; (BigQuery export if you have GA4; the UI export is rough)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document the metric differences&lt;/strong&gt; for stakeholders — privacy-focused tools count visitors differently, and your numbers WILL drop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove the GA snippet&lt;/strong&gt; and any GTM containers that only existed to feed it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step about numbers dropping — be ready for it. Without cross-site cookies, returning visitors get counted as new more often. Bot filtering is also different. I've seen drops of 15-30% in reported sessions, and it usually has nothing to do with actual traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  So which one should you pick?
&lt;/h2&gt;

&lt;p&gt;My actual recommendations after shipping all three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pick Umami if&lt;/strong&gt; you want to self-host on a budget, value the MIT license, or need richer custom events. It's my default recommendation for technical teams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick Plausible if&lt;/strong&gt; you want a polished hosted experience and don't mind paying, or if you're already running ClickHouse and want a more battle-tested backend. The Plausible team also publishes a lot of useful research on web analytics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick Fathom if&lt;/strong&gt; you want zero infrastructure and a simple dashboard, and don't need self-hosting. It's the most "set it and forget it" of the three.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  One caveat about hedging your bets
&lt;/h2&gt;

&lt;p&gt;I ran Plausible and Umami side-by-side on a client site for a month last year. The visitor numbers were within ~3% of each other, which was reassuring. They're both counting things in roughly the same way — the differences come down to how each handles edge cases like prerendering and ad blockers.&lt;/p&gt;

&lt;p&gt;If you want to verify the methodology, both projects publish their counting logic. Umami's docs are at &lt;a href="https://umami.is/docs" rel="noopener noreferrer"&gt;umami.is/docs&lt;/a&gt; and Plausible's data policy is at &lt;a href="https://plausible.io/data-policy" rel="noopener noreferrer"&gt;plausible.io/data-policy&lt;/a&gt;. Worth reading before you commit to either.&lt;/p&gt;

&lt;p&gt;The TL;DR: any of these will be a meaningful upgrade over GA4 for most sites. The migration is genuinely not that hard. The hardest part is convincing whoever owns the marketing dashboard that the numbers are different but not wrong.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>privacy</category>
      <category>analytics</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why Your PyTorch Training Crawls on a Beefy GPU (And How to Fix It)</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Sun, 24 May 2026 22:36:09 +0000</pubDate>
      <link>https://dev.to/alanwest/why-your-pytorch-training-crawls-on-a-beefy-gpu-and-how-to-fix-it-52g8</link>
      <guid>https://dev.to/alanwest/why-your-pytorch-training-crawls-on-a-beefy-gpu-and-how-to-fix-it-52g8</guid>
      <description>&lt;p&gt;Last month I was helping a friend debug a training loop that was running at maybe 15% GPU utilization on an A100. Fifteen percent. On a card that costs more than my first car. He'd already tried bumping the batch size, swapping the optimizer, and rewriting the data loader — nothing moved the needle.&lt;/p&gt;

&lt;p&gt;This is one of those frustrating problems where the obvious knobs do nothing, because the obvious knobs aren't where the bottleneck lives. So let's actually walk through how to figure out &lt;em&gt;why&lt;/em&gt; your model is slow, instead of just throwing batch sizes at the wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three regimes nobody tells you about
&lt;/h2&gt;

&lt;p&gt;When a deep learning workload is slow, it's almost always slow for one of three reasons. Horace He laid this out really clearly in his &lt;a href="https://horace.io/brrr_intro.html" rel="noopener noreferrer"&gt;"Making Deep Learning Go Brrrr From First Principles"&lt;/a&gt; post back in 2022, and the framing has stuck with me ever since:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compute-bound&lt;/strong&gt; — you're actually saturating the matmul units. Rare. Usually only happens with huge dense layers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory-bandwidth-bound&lt;/strong&gt; — the GPU is mostly waiting on data to move between HBM and the SMs. Way more common than people realize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overhead-bound&lt;/strong&gt; — Python, the framework dispatcher, or kernel launch latency is dominating. Death by a thousand papercuts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The punchline: most "my model is slow" problems are not compute-bound, even though that's where everyone instinctively looks first. If you're running a transformer with a bunch of small ops between the big matmuls, you're probably stuck in regime two or three.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Figure out which regime you're in
&lt;/h2&gt;

&lt;p&gt;Don't guess. Profile. PyTorch's built-in profiler will tell you most of what you need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch.profiler&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ProfilerActivity&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyModel&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Warm up — first iterations include cudnn autotuning and allocator setup
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;activities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ProfilerActivity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ProfilerActivity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CUDA&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;record_shapes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;prof&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;backward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Sort by self CUDA time to see what's actually burning GPU cycles
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prof&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;key_averages&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sort_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;self_cuda_time_total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row_limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you're looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a few big &lt;code&gt;gemm&lt;/code&gt; / &lt;code&gt;conv&lt;/code&gt; kernels dominate → likely compute-bound, and you should be happy.&lt;/li&gt;
&lt;li&gt;If you see a sea of tiny kernels (&lt;code&gt;add&lt;/code&gt;, &lt;code&gt;mul&lt;/code&gt;, &lt;code&gt;relu&lt;/code&gt;, &lt;code&gt;layer_norm&lt;/code&gt; components, etc.) eating real time → memory-bandwidth-bound.&lt;/li&gt;
&lt;li&gt;If CPU time is way higher than CUDA time, or kernel launches are spaced out with gaps → overhead-bound.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the friend's model, the profile showed hundreds of tiny pointwise kernels per step. Classic memory bandwidth problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: The arithmetic intensity check
&lt;/h2&gt;

&lt;p&gt;Here's the back-of-the-envelope check that explains &lt;em&gt;why&lt;/em&gt; small ops are murder. For each operation, ask: how many FLOPs am I doing per byte of memory I touch?&lt;/p&gt;

&lt;p&gt;A modern GPU like an A100 does roughly 312 TFLOPs of fp16 matmul but only has about 2 TB/s of HBM bandwidth. That's a ratio of ~150 FLOPs per byte. If your operation does fewer FLOPs per byte than that, you're memory-bound — full stop. No amount of bigger batches will help if the math isn't there.&lt;/p&gt;

&lt;p&gt;A pointwise &lt;code&gt;relu&lt;/code&gt; on an fp32 tensor? You read 4 bytes, write 4 bytes, do 1 FLOP. That's 0.125 FLOPs per byte. You are &lt;em&gt;wildly&lt;/em&gt; memory-bound. The GPU spends 99% of its time waiting on memory and 1% doing the actual work.&lt;/p&gt;

&lt;p&gt;A dense matmul on big enough matrices? Hundreds of FLOPs per byte. Now you're cooking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Fuse the small stuff
&lt;/h2&gt;

&lt;p&gt;The fix for memory-bandwidth-bound code is almost always &lt;strong&gt;operator fusion&lt;/strong&gt;. Instead of running ten separate kernels that each round-trip through HBM, you run one kernel that keeps intermediate values in registers or shared memory.&lt;/p&gt;

&lt;p&gt;The easiest win in modern PyTorch is &lt;code&gt;torch.compile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyModel&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# mode='reduce-overhead' uses CUDA graphs to also chip away at launch overhead
# mode='max-autotune' spends more time compiling but can fuse more aggressively
&lt;/span&gt;&lt;span class="n"&gt;compiled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;reduce-overhead&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# First call is slow — it's tracing and compiling
&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compiled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Subsequent calls hit the cached compiled graph
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;compiled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've seen this give 1.5x–3x speedups on transformer-ish workloads with basically zero code changes. Your mileage varies a lot based on how dynamic your shapes are — if every batch has a different sequence length, you'll trigger recompiles and lose most of the win. See the &lt;a href="https://pytorch.org/docs/stable/generated/torch.compile.html" rel="noopener noreferrer"&gt;torch.compile docs&lt;/a&gt; for the dynamic-shape options.&lt;/p&gt;

&lt;p&gt;If you need more control, you can write fused kernels yourself in &lt;a href="https://triton-lang.org/" rel="noopener noreferrer"&gt;Triton&lt;/a&gt;. For a pointwise chain, it's usually not worth it — &lt;code&gt;torch.compile&lt;/code&gt; will fuse those for you. For attention or other patterns with cross-element communication, hand-written kernels (or things like &lt;a href="https://github.com/Dao-AILab/flash-attention" rel="noopener noreferrer"&gt;FlashAttention&lt;/a&gt;) are still where the big wins live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Crush overhead with CUDA graphs
&lt;/h2&gt;

&lt;p&gt;If your profile shows lots of small gaps between kernels on the GPU timeline, you're overhead-bound. Each kernel launch has fixed CPU-side cost — Python, the dispatcher, CUDA itself. With small kernels, that overhead can be bigger than the kernel runtime.&lt;/p&gt;

&lt;p&gt;CUDA graphs let you record a sequence of kernel launches once and replay them as a single submission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyModel&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;static_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;224&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Warm up on a side stream before capture
&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current_stream&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;static_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;static_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;current_stream&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;wait_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Capture the graph
&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CUDAGraph&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;static_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;static_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# To run: copy new data into static_input in place, then replay
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;batch&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataloader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;static_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy_&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# in-place copy, same buffer
&lt;/span&gt;    &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replay&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="c1"&gt;# static_output now holds the result
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The big gotcha: input tensors have to live at the same memory addresses each call. You're reusing buffers, not allocating new ones. That's why we &lt;code&gt;copy_&lt;/code&gt; instead of reassigning. &lt;code&gt;torch.compile(mode='reduce-overhead')&lt;/code&gt; does basically this for you under the hood, which is why I usually reach for that first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention tips
&lt;/h2&gt;

&lt;p&gt;A few habits that have saved me a lot of grief:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Profile before optimizing.&lt;/strong&gt; Always. I've wasted entire afternoons "optimizing" things that were 2% of the runtime.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Watch your shapes.&lt;/strong&gt; Dynamic shapes break &lt;code&gt;torch.compile&lt;/code&gt; caches and CUDA graphs. If you can pad to a few bucket sizes, do it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop sprinkling &lt;code&gt;.cpu()&lt;/code&gt; and &lt;code&gt;.item()&lt;/code&gt; calls.&lt;/strong&gt; Each one forces a sync and stalls the pipeline. If you're doing it inside the training loop for logging, batch it up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check &lt;code&gt;nvidia-smi&lt;/code&gt; while training.&lt;/strong&gt; If utilization is below ~70%, something's wrong. That's your signal to break out the profiler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the assembly when it really matters.&lt;/strong&gt; For hot kernels, Triton lets you dump the PTX and see what actually got generated. Sometimes the autoscheduler does something silly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The meta-lesson here is that GPU performance is a first-principles problem, not a vibes-based one. Once you know whether you're starved for FLOPs, bandwidth, or launches, the fix usually becomes obvious. The frustrating part is just resisting the urge to skip the profiling step.&lt;/p&gt;

</description>
      <category>pytorch</category>
      <category>performance</category>
      <category>machinelearning</category>
      <category>gpu</category>
    </item>
    <item>
      <title>How to sandbox AI coding agents without crippling them</title>
      <dc:creator>Alan West</dc:creator>
      <pubDate>Sun, 24 May 2026 20:05:55 +0000</pubDate>
      <link>https://dev.to/alanwest/how-to-sandbox-ai-coding-agents-without-crippling-them-116c</link>
      <guid>https://dev.to/alanwest/how-to-sandbox-ai-coding-agents-without-crippling-them-116c</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: Your AI Agent Has Root
&lt;/h2&gt;

&lt;p&gt;A few months back I was helping a team set up a self-hosted AI coding agent. Standard setup — an LLM with tool access, running on a shared dev server, able to read files, execute commands, hit APIs. The usual.&lt;/p&gt;

&lt;p&gt;Then someone ran a prompt that included pasted output from an untrusted webpage. The agent dutifully interpreted some embedded instructions and started &lt;code&gt;rm -rf&lt;/code&gt;'ing a directory it had no business touching.&lt;/p&gt;

&lt;p&gt;Nothing critical was lost. But it could have been.&lt;/p&gt;

&lt;p&gt;This is the dirty secret of running agents that execute code — by default, they run with whatever permissions your process has. If that process is your dev environment, your agent has access to your SSH keys, your cloud credentials, your git history. Everything.&lt;/p&gt;

&lt;p&gt;Let me walk through how to actually sandbox these things properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Just Use Docker" Isn't Enough
&lt;/h2&gt;

&lt;p&gt;The obvious answer is to stick the agent in a container. And yes, that's a start. But naive Docker setups still have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Root inside the container by default (escapable through several known paths)&lt;/li&gt;
&lt;li&gt;Full network access to your internal services&lt;/li&gt;
&lt;li&gt;Bind mounts you didn't think hard enough about&lt;/li&gt;
&lt;li&gt;No syscall filtering — kernel exploits exist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've seen "sandboxed" setups where &lt;code&gt;docker.sock&lt;/code&gt; was mounted in for convenience. That's not a sandbox. That's a hot tub.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Layered Approach
&lt;/h2&gt;

&lt;p&gt;The way I've come around to thinking about this: defense in depth. Each layer assumes the previous one was bypassed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Drop root
&lt;/h3&gt;

&lt;p&gt;Containers should not run as root. Basic, but skipped constantly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:22.04&lt;/span&gt;

&lt;span class="c"&gt;# Dedicated user, no sudo, no shell escalation&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;useradd &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; /bin/bash agent

&lt;span class="c"&gt;# Switch BEFORE any app setup so caches/files are owned correctly&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; agent&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /home/agent&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --chown=agent:agent ./app /home/agent/app&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Layer 2: User namespaces
&lt;/h3&gt;

&lt;p&gt;Even as non-root inside the container, you want the container's UIDs remapped on the host. So even if the agent somehow becomes root in the container, it's an unprivileged UID on the outside.&lt;/p&gt;

&lt;p&gt;Configure it in &lt;code&gt;/etc/docker/daemon.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"userns-remap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After restarting the daemon, container UIDs get shifted to a high host range. A "root" process inside has zero privileges against the host filesystem. See the &lt;a href="https://docs.docker.com/engine/security/userns-remap/" rel="noopener noreferrer"&gt;Docker user namespace docs&lt;/a&gt; for the full setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: Seccomp filtering
&lt;/h3&gt;

&lt;p&gt;This is the layer most people skip. &lt;a href="https://man7.org/linux/man-pages/man2/seccomp.2.html" rel="noopener noreferrer"&gt;seccomp&lt;/a&gt; lets you whitelist syscalls — so even if the agent compromises the container, it can't make syscalls you haven't allowed.&lt;/p&gt;

&lt;p&gt;Docker ships a default seccomp profile that blocks around 40 dangerous syscalls. For agent workloads I tighten it further:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaultAction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_ACT_ERRNO"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"syscalls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"names"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"close"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"stat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fstat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lstat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mmap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"brk"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"rt_sigaction"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execve"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exit_group"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"futex"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"clone"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fork"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wait4"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SCMP_ACT_ALLOW"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-opt&lt;/span&gt; &lt;span class="nv"&gt;seccomp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./agent-seccomp.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-opt&lt;/span&gt; no-new-privileges &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cap-drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ALL &lt;span class="se"&gt;\&lt;/span&gt;
  agent-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--cap-drop=ALL&lt;/code&gt; strips every Linux capability. &lt;code&gt;--no-new-privileges&lt;/code&gt; blocks setuid binaries from elevating. Together they shrink the attack surface inside the container down to almost nothing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 4: Network egress control
&lt;/h3&gt;

&lt;p&gt;Agents need to make HTTP calls. They do not need to scan your internal network.&lt;/p&gt;

&lt;p&gt;The cleanest pattern I've found is routing the container through a proxy that whitelists destinations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agent-image&lt;/span&gt;
    &lt;span class="c1"&gt;# Agent shares the proxy's network namespace — no direct egress&lt;/span&gt;
    &lt;span class="na"&gt;network_mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service:proxy"&lt;/span&gt;

  &lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:alpine&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./proxy.conf:/etc/nginx/nginx.conf:ro&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;egress&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;egress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy allows only the endpoints the agent legitimately needs. The agent has no network interface of its own — every packet has to go through nginx, which has to recognize the host.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 5: Filesystem isolation
&lt;/h3&gt;

&lt;p&gt;Mount points are where I see the most mistakes. The agent needs to work on code, but the principle is: mount exactly what's needed, read-only where possible, and never anything sensitive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--read-only&lt;/span&gt; &lt;span class="se"&gt;\ &lt;/span&gt;                             &lt;span class="c"&gt;# Root FS is immutable&lt;/span&gt;
  &lt;span class="nt"&gt;--tmpfs&lt;/span&gt; /tmp:size&lt;span class="o"&gt;=&lt;/span&gt;100M &lt;span class="se"&gt;\ &lt;/span&gt;                  &lt;span class="c"&gt;# Scratch space, capped&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_DIR&lt;/span&gt;&lt;span class="s2"&gt;:/workspace:rw"&lt;/span&gt; &lt;span class="se"&gt;\ &lt;/span&gt;         &lt;span class="c"&gt;# The actual work dir&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT_FILE&lt;/span&gt;&lt;span class="s2"&gt;:/input/prompt:ro"&lt;/span&gt; &lt;span class="se"&gt;\ &lt;/span&gt;      &lt;span class="c"&gt;# Read-only inputs&lt;/span&gt;
  agent-image
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what is NOT mounted: no &lt;code&gt;~/.ssh&lt;/code&gt;, no &lt;code&gt;~/.aws&lt;/code&gt;, no &lt;code&gt;docker.sock&lt;/code&gt;, no parent directories that happen to contain a &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Multi-Session Workloads
&lt;/h2&gt;

&lt;p&gt;If multiple developers share the agent infrastructure, isolation between sessions becomes its own problem. The fix is straightforward — one container per session, lifecycle tied to the session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_agent_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;container_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;container_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--rm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                              &lt;span class="c1"&gt;# Auto-cleanup on stop
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2g&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                    &lt;span class="c1"&gt;# Hard memory cap
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--cpus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                     &lt;span class="c1"&gt;# CPU quota
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--pids-limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;               &lt;span class="c1"&gt;# Prevent fork bombs
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-v&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;project_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:/workspace&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cgroup limits (&lt;code&gt;--memory&lt;/code&gt;, &lt;code&gt;--cpus&lt;/code&gt;, &lt;code&gt;--pids-limit&lt;/code&gt;) are the unsung heroes. Without them, one runaway agent can take down the host. I learned this one the hard way after an agent got stuck in a loop spawning subprocesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prevention Tips
&lt;/h2&gt;

&lt;p&gt;A few things I've learned that weren't obvious to me at first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Treat the agent's environment as untrusted.&lt;/strong&gt; Anything in its filesystem or env vars can be exfiltrated via prompt injection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit your mounts every single time.&lt;/strong&gt; Bind mounts are the #1 source of escapes I've actually witnessed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log every command the agent runs.&lt;/strong&gt; When something goes wrong, you'll want the trail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set timeouts on everything.&lt;/strong&gt; Agents that should take 30 seconds sometimes try to run for 30 hours. Kill them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use ephemeral containers.&lt;/strong&gt; Reusing the same container across sessions invites state pollution and credential leakage between users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mental shift that helped me — stop thinking of the agent as "code I trust running on infra I trust." Think of it as a stranger you handed a terminal. Then design accordingly.&lt;/p&gt;

&lt;p&gt;The layers above won't make an agent invulnerable. But they'll turn a single bad prompt from a catastrophe into a footnote.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
