<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hiroshi Toyama</title>
    <description>The latest articles on DEV Community by Hiroshi Toyama (@toyama0919).</description>
    <link>https://dev.to/toyama0919</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F41804%2F7c54b968-6837-4276-9b8a-ac8f060c6f18.png</url>
      <title>DEV Community: Hiroshi Toyama</title>
      <link>https://dev.to/toyama0919</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/toyama0919"/>
    <language>en</language>
    <item>
      <title>Why TPUs Aren't Popular (Even Though They're Cheaper Per Token)</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Fri, 05 Jun 2026 08:35:50 +0000</pubDate>
      <link>https://dev.to/toyama0919/why-tpus-arent-popular-even-though-theyre-cheaper-per-token-188g</link>
      <guid>https://dev.to/toyama0919/why-tpus-arent-popular-even-though-theyre-cheaper-per-token-188g</guid>
      <description>&lt;p&gt;If you only look at the spec sheet, the TPU story is overwhelming: lower cost-per-token, dramatically better watts-per-token, deterministic latency. Trainium tells the same story. And yet a large share of the industry — including most of the inference traffic behind consumer chat UIs like ChatGPT — still runs on NVIDIA. The gap between "cheaper on paper" and "what people actually deploy" is not a marketing failure. It's an architectural tax that systolic-array silicon charges you in code, pipelines, and org structure. This post is about where that tax comes from and why only a handful of companies can afford to pay it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one architectural fact that explains everything: static shapes
&lt;/h2&gt;

&lt;p&gt;NVIDIA GPUs are SIMT (Single Instruction, Multiple Threads) processors. The hardware schedules warps of threads dynamically at runtime, and kernels can be launched with whatever tensor sizes show up. TPUs and AWS Trainium are not GPUs. Their matrix engines are &lt;strong&gt;systolic arrays&lt;/strong&gt;: a grid of multiply-accumulate units wired directly to their neighbors, fed by an ahead-of-time compiler (XLA for TPU, the Neuron compiler for Trainium).&lt;/p&gt;

&lt;p&gt;A systolic array hits peak utilization only when the shape of the data flowing through it is &lt;strong&gt;fixed at compile time&lt;/strong&gt;. Weights are loaded once and stay stationary in the processing elements; activations slide through like a bucket brigade. Change the sequence length or batch size by even one token and the data routes and memory addresses have to be recomputed — which means the compiler has to generate a &lt;em&gt;new binary&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That single constraint is the source of every downstream pain. Here's what it forces on you at inference time:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Runtime input&lt;/th&gt;
&lt;th&gt;NVIDIA (dynamic)&lt;/th&gt;
&lt;th&gt;TPU / Trainium (static)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Larger than the compiled bucket&lt;/td&gt;
&lt;td&gt;Handled by dynamic allocation&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Shape-mismatch crash&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Smaller than the bucket&lt;/td&gt;
&lt;td&gt;Handled with no waste&lt;/td&gt;
&lt;td&gt;JIT recompile stall (minutes) &lt;strong&gt;or&lt;/strong&gt; zero-pad waste&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;New, unseen length&lt;/td&gt;
&lt;td&gt;Just runs&lt;/td&gt;
&lt;td&gt;New binary must exist, or it stalls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;So before any token reaches the chip, you need an answer to: "what shape is this, and which precompiled binary does it route to?" On NVIDIA you never ask that question.&lt;/p&gt;
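&lt;p&gt;To make that routing question concrete, here is a minimal sketch in plain Python. The bucket sizes and the &lt;code&gt;route_to_bucket&lt;/code&gt; helper are hypothetical and not part of XLA or Neuron; a real router would also hand back the matching precompiled binary.&lt;/p&gt;

```python
import bisect

# Hypothetical bucket sizes, sorted ascending. A real deployment would pick
# these from observed traffic and precompile one binary per size.
BUCKETS = [512, 1024, 2048, 4096, 8192]

def route_to_bucket(num_tokens, buckets=BUCKETS):
    """Return the smallest precompiled bucket that fits num_tokens, or None."""
    idx = bisect.bisect_left(buckets, num_tokens)
    if idx == len(buckets):
        return None  # larger than every compiled shape: reject or split upstream
    return buckets[idx]
```

&lt;p&gt;With these sizes a 100-token request lands in the 512 bucket, a 1,025-token request spills into 2,048, and anything past the largest bucket has to be rejected or split before it reaches the chip.&lt;/p&gt;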

&lt;h2&gt;
  
  
  The dynamic vs. static analogy: Python vs. Java
&lt;/h2&gt;

&lt;p&gt;The cleanest mental model: &lt;strong&gt;NVIDIA is Python, TPU/Trainium is Java.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA = Python.&lt;/strong&gt; Dynamic typing ≈ dynamic shapes. The runtime absorbs chaos. You throw a 100-token prompt or a 50,000-token prompt at the same &lt;code&gt;forward&lt;/code&gt; and it just works, "good enough" fast, with no compile step in your face.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TPU/Trainium = Java.&lt;/strong&gt; Static typing ≈ static shapes. Nothing runs until it's compiled to a fixed binary (&lt;code&gt;NEFF&lt;/code&gt; for Neuron, an XLA executable for TPU). In exchange for boilerplate and rigid discipline, you get extreme execution efficiency — once everything fits the contract.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AMD's Instinct line (CDNA, ROCm) sits firmly on the &lt;strong&gt;NVIDIA/Python side&lt;/strong&gt;: SIMT, dynamic shapes, &lt;code&gt;PagedAttention&lt;/code&gt; support, and a &lt;code&gt;HIPIFY&lt;/code&gt; toolchain whose purpose is to translate your existing CUDA source into HIP with as few manual changes as possible. The static/dynamic split is the real fault line, not the vendor logos.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "handle dynamic input on static hardware" actually costs you in code
&lt;/h2&gt;

&lt;p&gt;Suppose three users hit your endpoint at once: 3,000 / 4,000 / 1,000 tokens. On NVIDIA you don't pad and you don't build a mask. You concatenate them into one flat 8,000-token buffer and hand &lt;code&gt;FlashAttention&lt;/code&gt; a &lt;code&gt;cu_seqlens&lt;/code&gt; index marking the boundaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
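&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flash_attn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;flash_attn_varlen_func&lt;/span&gt;

&lt;span class="c1"&gt;# NVIDIA: variable-length attention. No padding, no mask matrix.
# q, k, v are flat [total_tokens, n_heads, head_dim] tensors; the boundaries are
# the cumulative sequence lengths [0, 3000, 7000, 8000].
&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;flash_attn_varlen_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;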
    &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;cu_seqlens_q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cu_seqlens_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_seqlen_q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_seqlen_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The kernel reads the boundary index and keeps each user's context isolated, so no FLOPs are spent on cross-user attention. The code is "just the model logic."&lt;/p&gt;
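&lt;p&gt;For reference, the boundary index is just a running sum of the request lengths. A minimal sketch in plain Python; &lt;code&gt;build_cu_seqlens&lt;/code&gt; is a hypothetical helper, and the real kernel expects these values as an int32 tensor on the GPU rather than a Python list.&lt;/p&gt;

```python
from itertools import accumulate

def build_cu_seqlens(lengths):
    """Cumulative sequence lengths with a leading 0, one entry per boundary."""
    return list(accumulate(lengths, initial=0))

lengths = [3000, 4000, 1000]           # the three requests from the example
cu_seqlens = build_cu_seqlens(lengths)
max_seqlen = max(lengths)              # longest single request in the batch
total_tokens = cu_seqlens[-1]          # size of the flat buffer
```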

&lt;p&gt;On a TPU you can't reshape the systolic array, so you do the opposite: force everything into one fixed &lt;code&gt;[batch, STATIC_SEQ_LEN]&lt;/code&gt; rectangle and use math to erase the parts you don't want computed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch.nn.functional&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch_xla.core.xla_model&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;xm&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StaticShapeAttention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Module&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_heads&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;n_heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;n_heads&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Linear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d_model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention_mask&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# x is ALWAYS [batch, STATIC_SEQ_LEN, d_model]. The shape never varies.
&lt;/span&gt;        &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;q&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d_k&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d_k&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;v&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_heads&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d_k&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;d_k&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# The systolic array DID compute every cell, including padding and
&lt;/span&gt;        &lt;span class="c1"&gt;# other users' regions. We retroactively delete them: e^(-1e9) -&amp;gt; 0.
&lt;/span&gt;        &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;masked_fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attention_mask&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;1e9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;attn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;attn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;transpose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;contiguous&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;out&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things about running this on XLA are pure consequences of static silicon:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;xm.mark_step()&lt;/code&gt; is the real execution trigger.&lt;/strong&gt; That &lt;code&gt;torch_xla&lt;/code&gt; import at the top isn't decoration. Unlike PyTorch's eager execution on CUDA, calling &lt;code&gt;model(x)&lt;/code&gt; on XLA only &lt;em&gt;accumulates a graph&lt;/em&gt;. Nothing runs on the chip until &lt;code&gt;mark_step()&lt;/code&gt;, called in your serving loop rather than inside &lt;code&gt;forward&lt;/code&gt;, cuts the accumulated graph, compiles it into one fixed binary (or reuses a cached binary for a graph it has already seen), and ships it. New shape → new compile. (Recent PyTorch/XLA adds an eager mode that hides this, but the underlying compile-per-shape model is unchanged.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;masked_fill(..., -1e9)&lt;/code&gt; is a hack, not an optimization.&lt;/strong&gt; NVIDIA's &lt;code&gt;varlen&lt;/code&gt; path &lt;em&gt;skips&lt;/em&gt; the cross-user multiplications entirely. The systolic array can't skip — it must multiply every cell of the rectangle, including the zeros, and then you mathematically null them out in softmax afterward. You burn the watts, then throw the result away.&lt;/li&gt;
&lt;/ol&gt;
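&lt;p&gt;The second point is easy to check by hand. Below is a minimal plain-Python sketch of the mask-then-softmax step on a single row of scores; &lt;code&gt;masked_softmax&lt;/code&gt; is a hypothetical helper written for illustration, not the XLA kernel. Every score has already been computed before the mask throws the unwanted ones away.&lt;/p&gt;

```python
import math

def masked_softmax(scores, mask, fill=-1e9):
    """Softmax over one row, with masked-out positions replaced by `fill`."""
    masked = [s if keep else fill for s, keep in zip(scores, mask)]
    peak = max(masked)                       # subtract the max for stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Four scores were computed; the last two belong to padding or another user.
weights = masked_softmax([2.0, 1.0, 0.5, 3.0], mask=[1, 1, 0, 0])
```

&lt;p&gt;The two masked positions come back with weight exactly 0.0 because &lt;code&gt;exp&lt;/code&gt; underflows, and the remaining weights still sum to 1. The multiplications that produced the discarded scores were real work.&lt;/p&gt;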

&lt;h3&gt;
  
  
  The "smallest input" trap
&lt;/h3&gt;

&lt;p&gt;The crash-on-overflow case is intuitive: feed 1,025 tokens into a binary compiled for 1,024 and you get a shape mismatch. The nastier case is &lt;em&gt;underflow&lt;/em&gt;, where a 100-token request hits a binary compiled for 1,024 tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Let it through:&lt;/strong&gt; XLA sees a new shape and triggers a JIT recompile. In production that's a multi-minute freeze. Stall.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pad to 1,024:&lt;/strong&gt; the array dutifully runs &lt;code&gt;0 × 0 + 0&lt;/code&gt; across ~90% of its cells, consuming full power to compute nothing. Utilization collapses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The escape hatch is &lt;strong&gt;packing&lt;/strong&gt;: instead of one user per bucket, tile multiple users' requests into a fixed rectangle like Tetris, and generate a segment-ID mask so attention can't bleed across users.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fixed bucket [ 8192 tokens ]
├─ User A query (3000)
├─ User B query (4000)
├─ User C query (1000)
└─ padding      (192)   &amp;lt;-- the only waste
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
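&lt;p&gt;A segment-ID mask can be sketched in a few lines of plain Python. The helper names are hypothetical, the sizes are scaled down from 3000 / 4000 / 1000 to 3 / 4 / 1 so the result is readable, and causal masking is left out to keep the idea visible.&lt;/p&gt;

```python
def build_segment_ids(lengths, bucket_len):
    """Label each slot with its user's segment number; 0 marks padding."""
    if sum(lengths) > bucket_len:
        raise ValueError("requests do not fit in the bucket")
    ids = []
    for segment, length in enumerate(lengths, start=1):
        ids.extend([segment] * length)
    ids.extend([0] * (bucket_len - len(ids)))
    return ids

def build_segment_mask(ids):
    """mask[i][j] is 1 only when slots i and j belong to the same real user."""
    return [[int(a == b and a != 0) for b in ids] for a in ids]

# Toy bucket of 10 slots holding three users of length 3, 4 and 1.
ids = build_segment_ids([3, 4, 1], bucket_len=10)
mask = build_segment_mask(ids)
```

&lt;p&gt;The mask is block-diagonal: each user sees only their own slots, and padding slots see nothing. This is the matrix that the &lt;code&gt;attention_mask == 0&lt;/code&gt; test in &lt;code&gt;forward&lt;/code&gt; consumes.&lt;/p&gt;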



&lt;p&gt;It helps to be concrete about what "the rectangle" physically is. When you compile with &lt;code&gt;BATCH_SIZE = 4, STATIC_SEQ_LEN = 8192&lt;/code&gt;, XLA reserves &lt;strong&gt;one contiguous &lt;code&gt;[4, 8192]&lt;/code&gt; static region&lt;/strong&gt; in the TPU's HBM — not four independent "rooms," but one big sheet the compiler hard-wires the array routes for. A single user rarely fills even one 8,192 lane, so the serving layer packs &lt;em&gt;multiple&lt;/em&gt; users across the four lanes at once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ One TPU processor: one static [4 x 8192] sheet ]

lane[0] (8192): [ A(2000) + B(5000) + C(1000) + pad(192) ]
lane[1] (8192): [ D(8000)                      + pad(192) ]
lane[2] (8192): [ E(3000) + F(3000) + G(2100)  + pad(92)  ]
lane[3] (8192): [ H(4000) + I(4000)            + pad(192) ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Physically there are 4 lanes (32K of space); logically the proxy just crammed &lt;strong&gt;9 ragged users (A–I)&lt;/strong&gt; into them. From the application side it looks like one TPU is concurrently servicing many small requests in parallel — but underneath it's one rigid sheet with a segment mask drawn over it. The reason the hardware wants one fat sheet instead of pre-carved small rooms is pure systolic-array physics: the bigger the matrix, the higher the array's fill rate and the fewer idle cycles between feeds.&lt;/p&gt;
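&lt;p&gt;How the proxy decides who goes in which lane is a bin-packing problem. The sketch below uses first-fit decreasing, which is one common heuristic and only an assumption here; the post does not say which algorithm a production proxy uses, and &lt;code&gt;pack_first_fit&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
def pack_first_fit(requests, lane_len, num_lanes):
    """Place requests (name to token count) into fixed-size lanes, largest first.

    Returns (lanes, free, leftover): names per lane, unused tokens per lane,
    and the names that did not fit anywhere.
    """
    lanes = [[] for _ in range(num_lanes)]
    free = [lane_len] * num_lanes
    leftover = []
    ordered = sorted(requests.items(), key=lambda item: item[1], reverse=True)
    for name, tokens in ordered:
        for i in range(num_lanes):
            if free[i] >= tokens:        # first lane with enough room wins
                lanes[i].append(name)
                free[i] -= tokens
                break
        else:
            leftover.append(name)        # would wait for the next sheet
    return lanes, free, leftover

# The nine requests from the diagram above.
requests = {"A": 2000, "B": 5000, "C": 1000, "D": 8000, "E": 3000,
            "F": 3000, "G": 2100, "H": 4000, "I": 4000}
lanes, free, leftover = pack_first_fit(requests, lane_len=8192, num_lanes=4)
```

&lt;p&gt;All nine requests fit. The lane assignment differs from the diagram, but the total padding is the same 668 tokens out of 32,768.&lt;/p&gt;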

&lt;p&gt;Done right, MFU (Model FLOPs Utilization) climbs toward the 50–60% band that well-tuned large-model workloads reach on this hardware, versus the single digits a naive one-user-per-bucket scheme collapses to. The reference point cited here is a training number rather than a serving one: PyTorch/XLA reports ~53% training MFU for Llama 2 70B on TPU. 100% is a ceiling nobody touches; the point is that packing recovers most of the loss. But notice what you just built: a high-throughput Go/C++ proxy in front of the cluster whose only job is to catch ragged input and pack it into rectangles in real time. On NVIDIA, that entire layer &lt;strong&gt;does not exist&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's not one function — the whole pipeline forks
&lt;/h2&gt;

&lt;p&gt;People assume &lt;code&gt;torch_xla&lt;/code&gt; abstracts the hardware away because &lt;code&gt;xm.xla_device()&lt;/code&gt; transparently targets both TPU and Trainium (thanks to the shared OpenXLA/PJRT runtime — &lt;code&gt;libtpu.so&lt;/code&gt; for TPU, &lt;code&gt;libneuronpjrt.so&lt;/code&gt; for Neuron). That's true for &lt;code&gt;model.to(device)&lt;/code&gt; and basic ops. It is emphatically &lt;em&gt;not&lt;/em&gt; true for the parts that matter.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;forward&lt;/code&gt; signature itself diverges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NVIDIA forward: ragged data + boundary index. Length is arbitrary every call.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cu_seqlens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_seqlen&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flash_attn_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cu_seqlens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_seqlen&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Static forward: fixed rectangle + a mask matrix you must build yourself.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention_mask&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;  &lt;span class="c1"&gt;# input_ids is [batch, FixedSeqLen]
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;static_attn_func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;attention_mask&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it cascades all the way down:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;NVIDIA pipeline&lt;/th&gt;
&lt;th&gt;Trainium pipeline&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Inference engine&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;vLLM&lt;/code&gt; (CUDA), &lt;code&gt;TensorRT-LLM&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;NxD&lt;/code&gt; / &lt;code&gt;vllm-neuron&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom kernels&lt;/td&gt;
&lt;td&gt;Triton, CUDA C++ (&lt;code&gt;FlashAttention&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;NKI (Neuron Kernel Interface), rewritten from scratch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Base image&lt;/td&gt;
&lt;td&gt;&lt;code&gt;nvcr.io/nvidia/pytorch&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;AWS Neuron DLC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI build artifact&lt;/td&gt;
&lt;td&gt;weights + CUDA/Triton binaries&lt;/td&gt;
&lt;td&gt;weights + &lt;strong&gt;NEFF static binaries per bucket&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy target&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;g5&lt;/code&gt; / &lt;code&gt;p5&lt;/code&gt; instances&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;trn1&lt;/code&gt; / &lt;code&gt;inf2&lt;/code&gt; instances&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;nvidia-smi&lt;/code&gt;, DCGM exporter&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;neuron-top&lt;/code&gt;, Neuron exporter&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two completely parallel worlds. Your CUDA container, your eval scripts, your autoscaling triggers — none of it carries over. vLLM's hardware-plugin mechanism gives you "one skin" at the business-logic layer, but the engine underneath is 100% separate code with separate bugs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Precision makes it worse
&lt;/h3&gt;

&lt;p&gt;The data-type story isn't symmetric either. BF16 (which Google's TPU pioneered) is stable on both sides — its FP32-range exponent survives the &lt;code&gt;-1e9&lt;/code&gt; mask values without going NaN. But FP8, the current throughput play, favors NVIDIA: FP8 attention scores swing hard and need &lt;strong&gt;dynamic scaling&lt;/strong&gt; at runtime to avoid clipping. A static compiler has to bake in a fixed scale factor at compile time, so on TPU/Trainium aggressive FP8 attention risks clipping that degrades model quality. "Just switch to FP8" is a one-liner on NVIDIA and a research project on static silicon.&lt;/p&gt;
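&lt;p&gt;The clipping risk can be shown with a toy model. The sketch below simulates only the range limit of FP8 E4M3 (largest finite value 448) and ignores mantissa rounding; the scores, the calibrated range, and the &lt;code&gt;fake_fp8_roundtrip&lt;/code&gt; helper are all made up for illustration.&lt;/p&gt;

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format

def fake_fp8_roundtrip(values, amax):
    """Scale so that amax maps to the FP8 limit, clip, then scale back.

    Models range clipping only; mantissa rounding is ignored.
    """
    scale = FP8_E4M3_MAX / amax
    clipped = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v * scale)) for v in values]
    return [c / scale for c in clipped]

scores = [1.0, 6.0, 20.0]    # hypothetical attention scores seen at runtime
CALIBRATED_AMAX = 8.0        # hypothetical range baked in at compile time

# Static scaling: the scale was fixed before this input was ever seen.
static_out = fake_fp8_roundtrip(scores, CALIBRATED_AMAX)
# Dynamic scaling: the scale is recomputed from the values actually present.
dynamic_out = fake_fp8_roundtrip(scores, max(abs(v) for v in scores))
```

&lt;p&gt;With the static scale, the 20.0 outlier comes back as 8.0. With the per-call scale it survives. That gap is the quality risk described above.&lt;/p&gt;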

&lt;h2&gt;
  
  
  The hidden cost: your org chart breaks
&lt;/h2&gt;

&lt;p&gt;This is the part that kills adoption and nobody puts on a slide. On NVIDIA there's a clean abstraction boundary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ AI engineer / data scientist ]
   architecture, hyperparams, eval
        │
        ▼  boundary: Hugging Face weights / standard PyTorch
        │
[ MLOps / LLMOps engineer ]
   drop into vLLM, configure PagedAttention, scale out
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data scientist never thinks about memory layout. The MLOps engineer never reads the attention math. They ship artifacts across a clean interface.&lt;/p&gt;

&lt;p&gt;On TPU that wall &lt;strong&gt;disappears&lt;/strong&gt;, because model structure is directly coupled to physical constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The packing scheme (MLOps) and the segment-mask logic inside &lt;code&gt;forward&lt;/code&gt; (AI engineer) are two halves of one design. Change the batching strategy and the math has to change in lockstep. You cannot split that across a spec doc.&lt;/li&gt;
&lt;li&gt;An AI engineer casually adding an &lt;code&gt;if&lt;/code&gt; branch or changing layer count alters the compiled graph topology — and triggers JIT stalls or OOM in production. Debugging that requires dumping the XLA HLO graph, which pulls the AI engineer into an "infra" incident.&lt;/li&gt;
&lt;li&gt;"BF16 → FP8 for 2x throughput" (MLOps) collides head-on with "FP8 static scaling causes hallucinations on certain tasks" (data scientist). On NVIDIA the runtime negotiates this for you. On TPU the two humans have to negotiate it face to face.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The organizations furthest along on TPU/Trainium — Google's Gemini team (custom silicon end to end), Anthropic's Claude team, and increasingly Meta, which began renting Google TPUs in 2026 to test Llama on both training and inference — lean away from the horizontal "data science dept / infra dept" split entirely. They run a single vertically-integrated team of people fluent in &lt;em&gt;both&lt;/em&gt; the attention math and the compiler internals. Most companies cannot staff that, and the projects that try to keep the old division of labor die in a pile of compile errors and OOMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  So why does anyone use them? Because the input is locked
&lt;/h2&gt;

&lt;p&gt;The whole calculus flips when &lt;strong&gt;you control the input channel&lt;/strong&gt; so the shapes are predictable. Two clean examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google / YouTube summaries.&lt;/strong&gt; The exact internal pipeline isn't public, but the shape is forced by the constraints: Google doesn't re-watch the video. At upload time, an async batch job (on spare TPU cycles) runs ASR and stores timestamped text in storage like Bigtable. When you ask for a summary, the exact text length is &lt;em&gt;already known down to the token&lt;/em&gt; — so the router picks a just-right bucket, packing waste is near zero, and a light model like Gemini Flash scans pre-computed text. The "summarize a 2-hour video instantly" magic is really "scan a tiny text index that was built months ago for nearly free."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic / Claude Code.&lt;/strong&gt; A CLI coding agent has an almost fully determined input: repo structure, tool definitions, git diffs, system prompt. The first ~90% of the context is invariant, which is exactly what static compilation and prompt caching love. Anthropic in fact serves Claude across a &lt;em&gt;mix&lt;/em&gt; of Trainium, TPU, and NVIDIA — matching workloads to the most suitable chip — and runs Trainium fleets at scale (&lt;code&gt;neuronx-distributed&lt;/code&gt;); a high-throughput Go/C++ packing proxy is the natural front-end for the static path, though Anthropic hasn't published the exact per-product split. Claude Code is — read cynically — close to the perfect input-locking channel that makes a Java-style chip worth the pain. Long-context workloads help too: a 200K-token prefill packs many buckets back-to-back, so the &lt;em&gt;relative&lt;/em&gt; padding waste shrinks toward zero — the static array's weakness fades exactly where Claude is strongest.&lt;/li&gt;
&lt;/ul&gt;
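
&lt;p&gt;To make the bucket arithmetic concrete, here is a minimal Python sketch. The bucket sizes and the helper names (&lt;code&gt;pick_bucket&lt;/code&gt;, &lt;code&gt;padding_waste&lt;/code&gt;) are assumptions for illustration, not anyone's production router. It picks the smallest compiled bucket that fits a known length, then measures what fraction of the padded shape is wasted when an input is split across one or more buckets.&lt;/p&gt;

```python
import bisect

# Hypothetical sequence lengths the model was compiled for, sorted ascending.
BUCKETS = [128, 256, 512, 1024, 2048]


def pick_bucket(length, buckets=BUCKETS):
    """Return the smallest bucket that can hold `length` tokens."""
    index = bisect.bisect_left(buckets, length)
    if index == len(buckets):
        raise ValueError("input is longer than the largest compiled bucket")
    return buckets[index]


def padding_waste(length, bucket):
    """Fraction of padded tokens when `length` is split into `bucket`-sized chunks."""
    chunks = -(-length // bucket)  # ceiling division
    padded = chunks * bucket
    return (padded - length) / padded


if __name__ == "__main__":
    short = 300
    print(pick_bucket(short), padding_waste(short, pick_bucket(short)))
    print(padding_waste(200_000, 2048))
```

&lt;p&gt;With these assumed numbers, a 300-token request lands in the 512 bucket and wastes about 41% of the shape, while a 200,000-token prefill chunked into 2048-token buckets wastes well under 1%. Only the last chunk is padded, which is why the relative waste shrinks as the context grows.&lt;/p&gt;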

&lt;p&gt;The inverse is just as logical, and it explains why the &lt;em&gt;chat&lt;/em&gt; UIs lean hardest on dynamic SIMT hardware. ChatGPT and Claude.ai's web frontends accept arbitrary text, surprise image uploads, and topic switches mid-conversation. The system can't predict the shape until the user hits send. That chaos is precisely what dynamic SIMT + &lt;code&gt;PagedAttention&lt;/code&gt; were built for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TPUs aren't unpopular because they're slow or expensive — they're cheaper per token.&lt;/strong&gt; They're unpopular because cheapness is conditional on a discipline most teams can't enforce: every tensor shape fixed at compile time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The cost moved, it didn't vanish.&lt;/strong&gt; Static silicon pushes all the uncertainty out of the hardware and onto your software (packing, masking, bucket routing) and your people (collapsed dev/ops boundary). You trade infrastructure spend (silicon, power) for people spend (elite engineers maintaining hack layers).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The decision rule is about the channel, not the chip.&lt;/strong&gt; If you own the input (a CLI, a fixed business workflow, your own storage pipeline), TPU/Trainium are a weapon. If your input is a free-form chat box or a third-party API integration, NVIDIA (or AMD) is the only sane choice, and reaching for TPU or Trainium on cloud sticker price alone is how MFU quietly collapses to single digits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The spec sheet was never lying about cost-per-token. It just wasn't pricing in the engineers, the forked pipeline, and the org redesign you have to buy first.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>python</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Upgrading Google ADK to 2.0 on a Cloud SQL Postgres Backend: The Three Things That Bit Us</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Thu, 04 Jun 2026 12:21:39 +0000</pubDate>
      <link>https://dev.to/toyama0919/upgrading-google-adk-to-20-on-a-cloud-sql-postgres-backend-the-three-things-that-bit-us-43ff</link>
      <guid>https://dev.to/toyama0919/upgrading-google-adk-to-20-on-a-cloud-sql-postgres-backend-the-three-things-that-bit-us-43ff</guid>
      <description>&lt;p&gt;We run an agent built on &lt;a href="https://pypi.org/project/google-adk/" rel="noopener noreferrer"&gt;Google's Agent Development Kit (ADK)&lt;/a&gt;, deployed on Cloud Run with a Cloud SQL (PostgreSQL) session store via ADK's &lt;code&gt;DatabaseSessionService&lt;/code&gt;. Bumping &lt;code&gt;google-adk&lt;/code&gt; from 1.x to &lt;code&gt;&amp;gt;=2.0.0&lt;/code&gt; looked like a one-line dependency change. It wasn't.&lt;/p&gt;

&lt;p&gt;Three things bit us, in increasing order of subtlety:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ADK 2.0 talks to Postgres through &lt;strong&gt;asyncpg&lt;/strong&gt;, which forces a connection-URL change — and that URL is shared with sync code.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;events&lt;/code&gt; table needs &lt;strong&gt;two new columns&lt;/strong&gt; that ADK 2.0 reads unconditionally. Deploy without them and chat silently 500s.&lt;/li&gt;
&lt;li&gt;The legacy &lt;strong&gt;v0 (Pickle) schema&lt;/strong&gt; still works, but throws a deprecation warning. Migrating to v1 (JSON) is optional and &lt;em&gt;cannot&lt;/em&gt; be done in place.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's the field report.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The async driver switch — and the URL you now share with sync code
&lt;/h2&gt;

&lt;p&gt;ADK 2.0's session service is async and expects an async Postgres driver. In practice that means your &lt;code&gt;DATABASE_URL&lt;/code&gt; changes scheme:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgresql://appuser:...@host/db           # 1.x
postgresql+asyncpg://appuser:...@host/db   # 2.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Easy enough: update the secret, redeploy. The catch is that the &lt;em&gt;same&lt;/em&gt; URL is read by code that is &lt;strong&gt;not&lt;/strong&gt; async. We have custom storage (token storage, pending-state storage) built on plain synchronous SQLAlchemy, and a sync &lt;code&gt;create_engine()&lt;/code&gt; cannot drive the async-only &lt;code&gt;asyncpg&lt;/code&gt; driver. Feed it the 2.0 URL and it tries to load an async driver into a sync engine and falls over.&lt;/p&gt;

&lt;p&gt;The fix is a tiny normalization layer: store the async URL (because ADK is the primary consumer), and strip the driver suffix at the point where sync engines are created.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.engine&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_sync_db_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Normalize an async-driver URL for use with a sync SQLAlchemy engine.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+asyncpg://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_db_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Engine&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;_sync_db_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_url&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;pool_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_overflow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;pool_pre_ping&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;pool_recycle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The design decision worth calling out: &lt;strong&gt;one URL, normalized at the edge&lt;/strong&gt; rather than two secrets. ADK gets the &lt;code&gt;+asyncpg&lt;/code&gt; form it wants; every sync consumer goes through &lt;code&gt;create_db_engine()&lt;/code&gt; and gets the driver suffix stripped. The &lt;code&gt;replace(..., 1)&lt;/code&gt; replaces only the first occurrence, which is the scheme whenever the URL starts with &lt;code&gt;postgresql+asyncpg://&lt;/code&gt;, so a password that happens to contain the literal substring is left alone. If you have any synchronous DB access alongside ADK 2.0, you need a shim like this; otherwise the async URL leaks into &lt;code&gt;create_engine()&lt;/code&gt; and you get an import error at startup that looks unrelated to the upgrade.&lt;/p&gt;
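
&lt;p&gt;If the same helper may also receive a URL that is already in sync form, a prefix-anchored variant is a little stricter. The sketch below is one possible way to write it, using only the standard library; &lt;code&gt;to_sync_url&lt;/code&gt; is a hypothetical name, not part of ADK or SQLAlchemy.&lt;/p&gt;

```python
ASYNC_PREFIX = "postgresql+asyncpg://"
SYNC_PREFIX = "postgresql://"


def to_sync_url(db_url):
    """Swap the async scheme for the sync one, touching nothing but the prefix."""
    if db_url.startswith(ASYNC_PREFIX):
        return SYNC_PREFIX + db_url[len(ASYNC_PREFIX):]
    # Already sync (or some other scheme): return it unchanged.
    return db_url


if __name__ == "__main__":
    print(to_sync_url("postgresql+asyncpg://appuser:secret@host/db"))
```

&lt;p&gt;Because the check is anchored to the start of the string, a URL that is already &lt;code&gt;postgresql://&lt;/code&gt; passes through untouched even if its password contains the async scheme as a substring.&lt;/p&gt;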

&lt;h2&gt;
  
  
  2. The missing event columns — a silent 500 in production
&lt;/h2&gt;

&lt;p&gt;This is the one that actually took the service down in our dev environment before we caught it.&lt;/p&gt;

&lt;p&gt;ADK 2.0 added two columns to the &lt;code&gt;events&lt;/code&gt; table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;input_transcription&lt;/span&gt;  &lt;span class="n"&gt;jsonb&lt;/span&gt;
&lt;span class="n"&gt;output_transcription&lt;/span&gt;  &lt;span class="n"&gt;jsonb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ADK 2.0 reads these columns unconditionally on session GET and on the &lt;code&gt;/run_sse&lt;/code&gt; streaming endpoint. If your database was created under 1.x, the columns don't exist, and the query fails with asyncpg's &lt;code&gt;UndefinedColumnError&lt;/code&gt; (Postgres SQLSTATE 42703, &lt;code&gt;undefined_column&lt;/code&gt;). The symptom is not a clear startup crash: the container boots fine and &lt;code&gt;/health&lt;/code&gt; returns 200, but &lt;strong&gt;every chat turn 500s&lt;/strong&gt; and session reads fail. We reproduced it in dev as exactly that: healthy container, dead chat.&lt;/p&gt;

&lt;p&gt;The fix is a forward-compatible &lt;code&gt;ALTER TABLE&lt;/code&gt; that you must run &lt;strong&gt;before&lt;/strong&gt; deploying the 2.0 image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;input_transcription&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;output_transcription&lt;/span&gt; &lt;span class="n"&gt;jsonb&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;IF NOT EXISTS&lt;/code&gt; makes it idempotent, and adding a nullable column with no default is a metadata-only change on Postgres: no table rewrite, just a brief &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock, so it is safe on a live DB as long as no long-running transaction is holding the table. The ordering matters: patch the DB first, then deploy. Do it the other way and you have a window where the new image is live against the old schema and chat is down.&lt;/p&gt;

&lt;p&gt;Connecting through the Cloud SQL Auth Proxy, the whole patch is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cloud_sql_proxy &lt;span class="nt"&gt;-instances&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;PROJECT:asia-northeast1:INSTANCE&lt;span class="o"&gt;=&lt;/span&gt;tcp:127.0.0.1:15433 &amp;amp;

&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_PASSWORD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; psql &lt;span class="nt"&gt;-h&lt;/span&gt; 127.0.0.1 &lt;span class="nt"&gt;-p&lt;/span&gt; 15433 &lt;span class="nt"&gt;-U&lt;/span&gt; appuser &lt;span class="nt"&gt;-d&lt;/span&gt; appdb &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;SQL&lt;/span&gt;&lt;span class="sh"&gt;'
ALTER TABLE events ADD COLUMN IF NOT EXISTS input_transcription jsonb;
ALTER TABLE events ADD COLUMN IF NOT EXISTS output_transcription jsonb;
SELECT column_name FROM information_schema.columns
WHERE table_name = 'events'
  AND column_name IN ('input_transcription', 'output_transcription');
-- expect 2 rows
&lt;/span&gt;&lt;span class="no"&gt;SQL
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good news for rollback: these columns are ignored by ADK 1.x, so adding them doesn't break the old version. You can patch ahead of time without committing to the upgrade.&lt;/p&gt;
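
&lt;p&gt;If you want the pre-deploy check to be scriptable, a sketch like the following could work out which statements are still needed from a list of column names (for example, the output of the &lt;code&gt;information_schema&lt;/code&gt; query above). &lt;code&gt;statements_to_run&lt;/code&gt; is a hypothetical helper; it only builds SQL text and does not connect to a database.&lt;/p&gt;

```python
# The two columns named in this section.
REQUIRED_COLUMNS = ("input_transcription", "output_transcription")


def statements_to_run(existing_columns):
    """Return the ALTER TABLE statements for required columns that are missing."""
    present = set(existing_columns)
    return [
        f"ALTER TABLE events ADD COLUMN IF NOT EXISTS {name} jsonb;"
        for name in REQUIRED_COLUMNS
        if name not in present
    ]


if __name__ == "__main__":
    # Example: a 1.x database that has neither column yet.
    for statement in statements_to_run(["id", "session_id", "actions"]):
        print(statement)
```

&lt;p&gt;An empty result means the schema is already patched, which makes it a convenient gate to run in CI before the 2.0 image is allowed to deploy.&lt;/p&gt;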

&lt;h2&gt;
  
  
  3. The v0 → v1 schema migration is optional (and you probably want to defer it)
&lt;/h2&gt;

&lt;p&gt;On startup, ADK 2.0 logs this if your DB was created under 1.x:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The database is using the legacy v0 schema, which uses Pickle to serialize
event actions. The v0 schema will not be supported going forward and will be
deprecated in a few rollouts. Please migrate to the v1 schema which uses JSON
serialization for event data.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key realization: &lt;strong&gt;ADK 2.0 reads and writes v0 fine.&lt;/strong&gt; This is a deprecation warning, not a hard requirement. We chose to run 2.0 on the v0 schema and defer the migration — the upgrade and the migration are independent decisions, and decoupling them shrinks the risky deploy.&lt;/p&gt;

&lt;p&gt;When you do migrate, the important constraint is that it &lt;strong&gt;cannot be done in place.&lt;/strong&gt; The schemas are structurally different:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;
&lt;code&gt;events&lt;/code&gt; column&lt;/th&gt;
&lt;th&gt;v0&lt;/th&gt;
&lt;th&gt;v1&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;actions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;bytea&lt;/code&gt; (Pickle)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;event_data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;jsonb&lt;/code&gt; (all event data)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;metadata table&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;&lt;code&gt;adk_internal_metadata&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;v0 stores event actions as individual columns plus a pickled blob; v1 collapses everything into one &lt;code&gt;event_data&lt;/code&gt; JSONB column. Because the column set changes, ADK ships a migration command that reads from one DB and writes to a freshly created one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# CREATE DATABASE can't run inside a transaction — separate statement&lt;/span&gt;
psql ... &lt;span class="nt"&gt;-d&lt;/span&gt; postgres &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"CREATE DATABASE appdb_v1;"&lt;/span&gt;

&lt;span class="nv"&gt;SOURCE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgresql://appuser:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PW&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;@127.0.0.1:15433/appdb"&lt;/span&gt;
&lt;span class="nv"&gt;DEST_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgresql://appuser:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PW&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;@127.0.0.1:15433/appdb_v1"&lt;/span&gt;

uv run adk migrate session &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source_db_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SOURCE_URL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dest_db_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DEST_URL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;adk migrate session&lt;/code&gt; covers ADK's own four tables: &lt;code&gt;app_states&lt;/code&gt;, &lt;code&gt;user_states&lt;/code&gt;, &lt;code&gt;sessions&lt;/code&gt;, &lt;code&gt;events&lt;/code&gt;. Anything you added yourself (OAuth tokens, app-specific state) is &lt;em&gt;not&lt;/em&gt; touched and has to be copied separately — but that's outside ADK's scope and outside this post.&lt;/p&gt;

&lt;p&gt;Verify the destination after migrating:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1 means v1&lt;/span&gt;
psql ... &lt;span class="nt"&gt;-d&lt;/span&gt; appdb_v1 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"SELECT value FROM adk_internal_metadata WHERE key='schema_version';"&lt;/span&gt;

&lt;span class="c"&gt;# event_data present, actions gone&lt;/span&gt;
psql ... &lt;span class="nt"&gt;-d&lt;/span&gt; appdb_v1 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"SELECT column_name FROM information_schema.columns WHERE table_name='events';"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cut over by repointing the connection secret at the new DB and redeploying. Because you migrated into a &lt;em&gt;new&lt;/em&gt; database, the original is untouched — rollback is just repointing the secret back. No data loss, no destructive step until you're confident.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deploy order that actually works
&lt;/h2&gt;

&lt;p&gt;Pulling it together, the sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Patch the DB&lt;/strong&gt; (&lt;code&gt;ALTER TABLE events ...&lt;/code&gt;) — before anything else, to prevent the 500 window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch the URL&lt;/strong&gt; to &lt;code&gt;postgresql+asyncpg://&lt;/code&gt; (and make sure sync consumers normalize it back).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy&lt;/strong&gt; the 2.0 image.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smoke test&lt;/strong&gt;: &lt;code&gt;/health&lt;/code&gt; → 200, an existing session GET → not 500, a new &lt;code&gt;/run_sse&lt;/code&gt; chat → streams a response.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;(Optional, later)&lt;/em&gt; migrate v0 → v1 into a new DB and cut over.&lt;/li&gt;
&lt;/ol&gt;
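
&lt;p&gt;The smoke-test step can be sketched as a small table of checks. Everything here is illustrative: the session path is a placeholder, and &lt;code&gt;fetch_status&lt;/code&gt; stands for whatever HTTP client you already use (it takes a path and returns a status code). A streaming check for &lt;code&gt;/run_sse&lt;/code&gt; is left out because it depends on your request payload.&lt;/p&gt;

```python
def run_smoke_checks(checks, fetch_status):
    """Return one message per failed check; an empty list means all passed."""
    failures = []
    for label, path, is_ok in checks:
        status = fetch_status(path)
        if not is_ok(status):
            failures.append(f"{label}: {path} returned {status}")
    return failures


# Placeholder paths: substitute the real routes of your own deployment.
CHECKS = [
    ("health", "/health", lambda status: status == 200),
    ("existing session read", "/PLACEHOLDER/session-read-path", lambda status: status != 500),
]


if __name__ == "__main__":
    # Simulate the failure mode from section 2: healthy container, dead reads.
    simulated = {"/health": 200, "/PLACEHOLDER/session-read-path": 500}
    for failure in run_smoke_checks(CHECKS, simulated.get):
        print(failure)
```

&lt;p&gt;Injecting &lt;code&gt;fetch_status&lt;/code&gt; keeps the check logic testable without a live service, and the simulated run shows why a passing health check alone proves nothing about the schema.&lt;/p&gt;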

&lt;h2&gt;
  
  
  Gotchas worth pinning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;pg_dump&lt;/code&gt; version skew.&lt;/strong&gt; Don't reach for &lt;code&gt;pg_dump&lt;/code&gt; to copy data if your local client is older than the Cloud SQL server (e.g. client 16 vs server 17) — it just refuses. Either match versions or copy via a script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;CREATE DATABASE&lt;/code&gt; outside a transaction.&lt;/strong&gt; It can't run inside one, so it has to be its own statement — not bundled into a &lt;code&gt;BEGIN ... COMMIT&lt;/code&gt; block with the grants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session compatibility across versions.&lt;/strong&gt; Sessions written by 2.0 may not be readable by 1.x (especially older 1.x). Treat the version downgrade as lossy for any session created after cutover, and keep the old image only as a short-term escape hatch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/health&lt;/code&gt; lies.&lt;/strong&gt; A 200 from your health check says nothing about whether the schema matches. Smoke-test an actual session read and a real chat turn.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;google-adk&lt;/code&gt; 2.0 bump is small on paper and sharp in practice. The async driver switch ripples into any sync DB code sharing the URL; the new &lt;code&gt;events&lt;/code&gt; columns turn a healthy-looking container into a chat outage if you deploy before patching; and the v0 deprecation warning is loud but not load-bearing — you can stay on v0 and migrate on your own schedule into a fresh DB. Patch first, normalize the URL at the edge, smoke-test the real path, and treat the schema migration as a separate project.&lt;/p&gt;

</description>
      <category>python</category>
      <category>gcp</category>
      <category>postgres</category>
      <category>ai</category>
    </item>
    <item>
      <title>Chrome 126+ Broke My WXT Extension Dev Setup — Here's What Changed and How to Fix It</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Tue, 02 Jun 2026 12:12:47 +0000</pubDate>
      <link>https://dev.to/toyama0919/chrome-126-broke-my-wxt-extension-dev-setup-heres-what-changed-and-how-to-fix-it-24ak</link>
      <guid>https://dev.to/toyama0919/chrome-126-broke-my-wxt-extension-dev-setup-heres-what-changed-and-how-to-fix-it-24ak</guid>
      <description>&lt;p&gt;I spent a weekend debugging a Chrome extension dev environment that stopped working after a Chrome update. No error messages. The extension loaded — I could open its options page — but content scripts never ran, service workers never started, and the UI stayed unchanged.&lt;/p&gt;

&lt;p&gt;This post is about three separate failure modes I hit, why each happens, and the minimal fix for each.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;The project uses &lt;a href="https://wxt.dev/" rel="noopener noreferrer"&gt;WXT&lt;/a&gt; (a Chrome extension framework built on Vite) with a custom &lt;code&gt;dev.sh&lt;/code&gt; script that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Starts the WXT dev server for hot reload&lt;/li&gt;
&lt;li&gt;Launches a dedicated Chrome instance with the extension loaded&lt;/li&gt;
&lt;li&gt;Exposes Chrome's remote debugging port for MCP/DevTools access&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A clean dev-build → chrome-start → edit-reload loop. At least, it used to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 1: &lt;code&gt;--load-extension&lt;/code&gt; No Longer Starts Service Workers in Chrome 126+
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What broke
&lt;/h3&gt;

&lt;p&gt;Chrome has a flag &lt;code&gt;--load-extension=/path/to/ext&lt;/code&gt; that loads an unpacked extension at startup. Before Chrome 126, this worked well for local development. After Chrome 126, the extension appears to load — &lt;code&gt;chrome-extension://ID/options.html&lt;/code&gt; is accessible — but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The background service worker never appears in &lt;code&gt;Target.getTargets&lt;/code&gt; CDP results&lt;/li&gt;
&lt;li&gt;Content scripts declared in &lt;code&gt;manifest.content_scripts&lt;/code&gt; are not injected&lt;/li&gt;
&lt;li&gt;The extension is not written to the Chrome profile's &lt;code&gt;Secure Preferences&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last point is the key. With &lt;code&gt;--load-extension&lt;/code&gt;, Chrome 126+ treats the extension as ephemeral. It's accessible as a filesystem resource but not actually "installed" in the profile, so Chrome's normal extension machinery (service worker lifecycle, content script injection) doesn't activate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why web-ext switched to &lt;code&gt;Extensions.loadUnpacked&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/mozilla/web-ext" rel="noopener noreferrer"&gt;web-ext&lt;/a&gt; project documented this in their issue tracker and switched away from &lt;code&gt;--load-extension&lt;/code&gt; for Chrome 126+. Their new approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start Chrome with &lt;code&gt;--remote-debugging-pipe&lt;/code&gt; and &lt;code&gt;--enable-unsafe-extension-debugging&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Call the &lt;code&gt;Extensions.loadUnpacked&lt;/code&gt; CDP command via the pipe&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;Extensions.loadUnpacked&lt;/code&gt; writes to &lt;code&gt;Secure Preferences&lt;/code&gt; just like clicking "Load unpacked" in &lt;code&gt;chrome://extensions/&lt;/code&gt;. Once written, the extension is a real installed extension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Critical caveat&lt;/strong&gt;: &lt;code&gt;Extensions.loadUnpacked&lt;/code&gt; is only available via pipe-based CDP (&lt;code&gt;--remote-debugging-pipe&lt;/code&gt;), not via WebSocket-based CDP (&lt;code&gt;--remote-debugging-port&lt;/code&gt;). Connecting via port returns &lt;code&gt;"Method not available."&lt;/code&gt; even with &lt;code&gt;--enable-unsafe-extension-debugging&lt;/code&gt;.&lt;/p&gt;
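
&lt;p&gt;For a sense of what "via the pipe" means on the wire, here is a rough Python sketch of the message framing only. It assumes the pipe transport as used by clients such as Puppeteer: Chrome reads commands on file descriptor 3, writes replies on file descriptor 4, and each JSON message is terminated by a NUL byte. The helper names and the extension path are hypothetical, and the sketch does not launch Chrome.&lt;/p&gt;

```python
import json


def encode_message(message_id, method, params):
    """Serialize one CDP command as NUL-terminated JSON bytes."""
    payload = {"id": message_id, "method": method, "params": params}
    return json.dumps(payload).encode("utf-8") + b"\0"


def decode_messages(buffer):
    """Split a receive buffer into complete messages plus leftover bytes."""
    *complete, rest = buffer.split(b"\0")
    return [json.loads(chunk) for chunk in complete], rest


if __name__ == "__main__":
    # Hypothetical extension directory; the command takes a `path` parameter.
    command = encode_message(1, "Extensions.loadUnpacked", {"path": "/path/to/ext"})
    print(command)
```

&lt;p&gt;A real client would write &lt;code&gt;command&lt;/code&gt; to the descriptor Chrome reads from and feed incoming bytes through &lt;code&gt;decode_messages&lt;/code&gt;, keeping the leftover bytes for the next read.&lt;/p&gt;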

&lt;h3&gt;
  
  
  The simpler fix
&lt;/h3&gt;

&lt;p&gt;After trying the pipe approach, I discovered something: if you already have the extension registered in &lt;code&gt;Secure Preferences&lt;/code&gt; from a previous pipe-based install, you can start Chrome normally (no &lt;code&gt;--load-extension&lt;/code&gt;) with &lt;code&gt;--enable-unsafe-extension-debugging&lt;/code&gt; and it loads from the profile automatically.&lt;/p&gt;

&lt;p&gt;But there's an even simpler path: &lt;code&gt;--load-extension&lt;/code&gt; + &lt;code&gt;--enable-unsafe-extension-debugging&lt;/code&gt; together. Testing showed that when &lt;code&gt;--enable-unsafe-extension-debugging&lt;/code&gt; is present, Chrome treats &lt;code&gt;--load-extension&lt;/code&gt; extensions as real installs and injects content scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHROME_BINARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user-data-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;USER_DATA_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--remote-debugging-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;9222 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load-extension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EXTENSION_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-unsafe-extension-debugging&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire fix for service worker / content script injection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 2: WXT Dev Server Exits Immediately When Backgrounded
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What broke
&lt;/h3&gt;

&lt;p&gt;The dev script ran WXT in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev &amp;amp;
&lt;span class="nv"&gt;WXT_PID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$!&lt;/span&gt;
&lt;span class="c"&gt;# ... wait for build, start Chrome ...&lt;/span&gt;
&lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WXT_PID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WXT would build the extension, print "Load manually", and exit. The &lt;code&gt;wait&lt;/code&gt; returned. The dev server was gone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why
&lt;/h3&gt;

&lt;p&gt;WXT uses Node.js &lt;code&gt;readline&lt;/code&gt; for its interactive keyboard shortcuts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;rl&lt;/span&gt; &lt;span class="o"&gt;??=&lt;/span&gt; &lt;span class="nx"&gt;readline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createInterface&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a non-interactive script runs &lt;code&gt;npm run dev &amp;amp;&lt;/code&gt;, job control is off, so the shell connects the background job's stdin to &lt;code&gt;/dev/null&lt;/code&gt;. &lt;code&gt;readline&lt;/code&gt; immediately gets EOF on &lt;code&gt;process.stdin&lt;/code&gt;, emits &lt;code&gt;close&lt;/code&gt;, and WXT exits.&lt;/p&gt;

&lt;p&gt;This only manifests when WXT is backgrounded — running it in the foreground is fine. But backgrounding is necessary because you need to start Chrome after the build completes.&lt;/p&gt;
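
&lt;p&gt;The same end-of-file condition is easy to reproduce outside Node. This Python sketch (an illustration, not WXT's code) starts a child process whose stdin is the null device and reports what the child managed to read.&lt;/p&gt;

```python
import subprocess
import sys

# The child reads all of stdin and prints its repr so an empty read is visible.
CHILD = "import sys; print(repr(sys.stdin.read()))"


def read_from_null_stdin():
    """Run a child whose stdin is the null device and return what it read."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(read_from_null_stdin())
```

&lt;p&gt;The child returns from its read immediately with an empty string instead of blocking, which is the condition any stdin-driven prompt loop sees when it is backgrounded this way.&lt;/p&gt;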

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Feed a non-closing stream to the process's stdin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev &amp;lt; &amp;lt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /dev/null&lt;span class="o"&gt;)&lt;/span&gt; &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;tail -f /dev/null&lt;/code&gt; follows an empty file indefinitely, never sending data and never closing. WXT's stdin stays open. readline never gets EOF. WXT keeps running.&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem 3: web-ext Injects Flags That Break Google Login
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What broke
&lt;/h3&gt;

&lt;p&gt;WXT's runner (web-ext) was starting Chrome correctly, but the Google account was logged out on every restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why
&lt;/h3&gt;

&lt;p&gt;web-ext uses &lt;code&gt;chrome-launcher&lt;/code&gt; for Chrome startup. &lt;code&gt;chrome-launcher&lt;/code&gt;'s &lt;code&gt;defaultFlags()&lt;/code&gt; includes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--disable-sync&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--use-mock-keychain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--use-mock-keychain&lt;/code&gt; is the destructive one. On macOS, Chrome encrypts cookies using the system keychain. &lt;code&gt;--use-mock-keychain&lt;/code&gt; substitutes a fake keychain with a different encryption key. Cookies encrypted with the real keychain cannot be decrypted with the mock one and vice versa.&lt;/p&gt;

&lt;p&gt;Once Chrome writes cookies with the mock keychain, subsequent Chrome starts (without the flag) cannot read them. Login state is destroyed.&lt;/p&gt;

&lt;p&gt;web-ext excludes &lt;code&gt;--disable-extensions&lt;/code&gt;, &lt;code&gt;--mute-audio&lt;/code&gt;, and &lt;code&gt;--disable-component-update&lt;/code&gt; from chrome-launcher's defaults — but not &lt;code&gt;--disable-sync&lt;/code&gt; or &lt;code&gt;--use-mock-keychain&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Disable WXT's web-ext runner entirely and launch Chrome manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// wxt.config.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;webExt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;disabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start Chrome from &lt;code&gt;dev.sh&lt;/code&gt; with exactly the flags you need and nothing else.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bonus: WXT Dev Mode Removes &lt;code&gt;content_scripts&lt;/code&gt; from the Manifest
&lt;/h2&gt;

&lt;p&gt;This one isn't a Chrome regression — it's WXT's intentional behavior that became a problem once service workers stopped starting.&lt;/p&gt;

&lt;p&gt;In dev mode, WXT strips &lt;code&gt;content_scripts&lt;/code&gt; from &lt;code&gt;manifest.json&lt;/code&gt; and relies on the background service worker to register them dynamically via &lt;code&gt;chrome.scripting.registerContentScripts()&lt;/code&gt;. The service worker connects to WXT's dev server and WXT sends reload commands.&lt;/p&gt;

&lt;p&gt;When the service worker doesn't start (Problem 1), this entire chain breaks. Content scripts are never registered.&lt;/p&gt;

&lt;p&gt;Fix: use WXT's &lt;code&gt;build:manifestGenerated&lt;/code&gt; hook to add &lt;code&gt;content_scripts&lt;/code&gt; back to the dev manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;build:manifestGenerated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;wxt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Also strip the localhost CSP entry WXT adds for HMR — Chrome MV3&lt;/span&gt;
    &lt;span class="c1"&gt;// rejects http:// origins in extension_pages CSP.&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;csp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content_security_policy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;csp&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;extension_pages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;csp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;extension_pages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;csp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;extension_pages&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*http:&lt;/span&gt;&lt;span class="se"&gt;\/\/&lt;/span&gt;&lt;span class="sr"&gt;localhost:&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;0-9&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;wxt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;serve&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content_scripts&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;manifest&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;content_scripts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="na"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://your-target-site.com/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;run_at&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;document_end&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;js&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-scripts/content.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;}];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This also strips &lt;code&gt;http://localhost:3000&lt;/code&gt; from the &lt;code&gt;extension_pages&lt;/code&gt; CSP. Chrome MV3 forbids HTTP origins in that directive; WXT adds it for Vite HMR, but it may silently break extension loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Picture
&lt;/h2&gt;

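&lt;p&gt;Three independent problems collided, and together they exposed a fourth: a WXT dev-mode behavior that only hurts once the service worker stops starting.&lt;/p&gt;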

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Service worker / content scripts not running&lt;/td&gt;
&lt;td&gt;Chrome 126+ changed &lt;code&gt;--load-extension&lt;/code&gt; to not register extensions in profile&lt;/td&gt;
&lt;td&gt;Add &lt;code&gt;--enable-unsafe-extension-debugging&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WXT dev server exits immediately&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;readline&lt;/code&gt; on backgrounded stdin gets EOF&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm run dev &amp;lt; &amp;lt;(tail -f /dev/null) &amp;amp;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google login lost on restart&lt;/td&gt;
&lt;td&gt;web-ext injects &lt;code&gt;--use-mock-keychain&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Disable WXT runner, launch Chrome manually&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content scripts not injected in dev&lt;/td&gt;
&lt;td&gt;WXT removes &lt;code&gt;content_scripts&lt;/code&gt; from dev manifest&lt;/td&gt;
&lt;td&gt;Restore via &lt;code&gt;build:manifestGenerated&lt;/code&gt; hook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of these produce meaningful error messages. The extension "loads" in the sense that its static pages are accessible. Everything else silently fails. The debugging path was: CDP &lt;code&gt;Target.getTargets&lt;/code&gt; to check for service workers, &lt;code&gt;Secure Preferences&lt;/code&gt; inspection to check if the extension was actually installed, and process-level stdin inspection to find the WXT exit cause.&lt;/p&gt;

&lt;p&gt;The startup order and flags are what matter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start the dev server in the background, feeding stdin from &lt;code&gt;tail -f /dev/null&lt;/code&gt; so it stays alive&lt;/li&gt;
&lt;li&gt;Wait for the build to finish&lt;/li&gt;
&lt;li&gt;Wait for the dev server to initialize before launching Chrome (so the service worker can connect)&lt;/li&gt;
&lt;li&gt;Launch Chrome with both &lt;code&gt;--load-extension&lt;/code&gt; and &lt;code&gt;--enable-unsafe-extension-debugging&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev &amp;lt; &amp;lt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /dev/null&lt;span class="o"&gt;)&lt;/span&gt; &amp;amp;
&lt;span class="nv"&gt;WXT_PID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$!&lt;/span&gt;

&lt;span class="c"&gt;# wait for build → wait for dev server → launch Chrome&lt;/span&gt;
&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CHROME_BINARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user-data-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;USER_DATA_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--remote-debugging-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DEBUG_PORT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--load-extension&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;EXTENSION_DIR&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-unsafe-extension-debugging&lt;/span&gt; &amp;amp;

&lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$WXT_PID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>chrome</category>
      <category>webextension</category>
      <category>wxt</category>
      <category>debugging</category>
    </item>
    <item>
      <title>Cursor vs Claude: The Business Models Behind the 10x Price Gap</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Thu, 07 May 2026 08:25:35 +0000</pubDate>
      <link>https://dev.to/toyama0919/cursor-vs-claude-the-business-models-behind-the-10x-price-gap-3lj7</link>
      <guid>https://dev.to/toyama0919/cursor-vs-claude-the-business-models-behind-the-10x-price-gap-3lj7</guid>
      <description>&lt;p&gt;The &lt;a href="https://dev.to/hiroshi/cursor-composer-2-the-cache-economy-behind-a-10x-cheaper-coding-agent-3600094"&gt;previous post&lt;/a&gt; covered Composer 2's cache mechanics and the Standard/Fast split. This one goes one level deeper: &lt;em&gt;why&lt;/em&gt; the price gap exists structurally, and what it predicts about where AI model markets are heading.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Business Models
&lt;/h2&gt;

&lt;p&gt;The $0.50 vs $5.00 price gap between Composer 2 Standard and Claude Opus isn't primarily about model size. It's about two fundamentally different business models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic/OpenAI:&lt;/strong&gt; Build the most capable general-purpose model possible. License it as an API to anyone who wants to use it—enterprises, startups, individual developers. The general-purpose nature requires maintaining capabilities across every domain: legal reasoning, creative writing, mathematics, programming, ethics, philosophy. Margin on each API call covers model development, infrastructure, and business overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor/Anysphere:&lt;/strong&gt; Build a model &lt;em&gt;only&lt;/em&gt; for one product—Cursor. No external API to sell. No licensing fees to pay. No reason to maintain capabilities outside software development. The specialized training means stripping out everything that isn't code, resulting in a dramatically smaller model that's cheaper to serve.&lt;/p&gt;

&lt;p&gt;The math follows directly. Composer 2 is trained exclusively on coding data via continued pre-training and reinforcement learning. Claude Opus maintains the ability to pass bar exams, write poetry, explain quantum mechanics, and argue ethics. You're paying for all of that whether you use it or not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cache Write Tax
&lt;/h2&gt;

&lt;p&gt;This business model difference shows up most concretely in cache write pricing.&lt;/p&gt;

&lt;p&gt;Claude's prompt caching has three cost components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache write&lt;/strong&gt;: 1.25× the base input price&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache read&lt;/strong&gt;: ~10% of the base input price&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normal input&lt;/strong&gt;: base price&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That cache write surcharge exists because Anthropic is taking on the cost and risk of maintaining cached data for an external customer. They don't know what you'll cache, how long it'll stay relevant, or whether you'll return in 5 minutes or 5 days. The 1.25× write rate is essentially an infrastructure risk premium embedded in the API pricing.&lt;/p&gt;

&lt;p&gt;Composer 2's actual usage data tells a completely different story:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input (w/ Cache Write)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input (w/o Cache Write)&lt;/td&gt;
&lt;td&gt;15,018&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache Read&lt;/td&gt;
&lt;td&gt;391,424&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;Input (w/ Cache Write)&lt;/code&gt; column is zero across every single Composer 2 request. Anysphere runs Composer 2 on their own servers, optimized for exactly one workload: Cursor's codebase-heavy sessions. There's no external API infrastructure risk to price in. The cache write surcharge simply doesn't exist.&lt;/p&gt;

&lt;p&gt;For Claude Opus users on Cursor, the same column is non-zero. Even though Cursor proxies the request, it still hits Anthropic's API and incurs the write premium.&lt;/p&gt;

&lt;p&gt;The practical effect: on a new session with a large codebase, Claude Opus users pay an entry fee (cache write at 1.25× rate) that Composer 2 users never encounter.&lt;/p&gt;
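
&lt;p&gt;To put a rough number on that entry fee, here is a back-of-the-envelope sketch using only figures quoted in this post: the $5.00 Opus base input price, the 1.25× write and ~10% read multipliers, Composer 2's $0.50 input and $0.20 cache read prices, and the token counts from the usage row above. Two things are assumptions rather than measurements: that a cold Opus session writes the whole cached context once, and that later turns read it back at the cache read rate. Output tokens are left out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Prices in USD per 1M tokens, taken from figures quoted in this post.
const opusInput = 5.0;
const opusCacheWrite = opusInput * 1.25;
const opusCacheRead = opusInput * 0.1;
const composerInput = 0.5;
const composerCacheRead = 0.2;

// Token counts from the Composer 2 usage row above.
const cachedTokens = 391_424;
const freshTokens = 15_018;

// Cost in USD for `tokens` tokens at `pricePerMillion` USD per 1M tokens.
function usd(tokens: number, pricePerMillion: number): number {
  return (tokens * pricePerMillion) / 1_000_000;
}

// Assumption: on a cold Opus session the whole cached context is written once.
const opusColdTurn = usd(cachedTokens, opusCacheWrite) + usd(freshTokens, opusInput);
// Assumption: on later turns the same context is served from cache reads.
const opusWarmTurn = usd(cachedTokens, opusCacheRead) + usd(freshTokens, opusInput);
// Composer 2 shows no cache write line item, so every turn is priced like this.
const composerTurn = usd(cachedTokens, composerCacheRead) + usd(freshTokens, composerInput);

console.log({
  opusColdTurn: opusColdTurn.toFixed(2),
  opusWarmTurn: opusWarmTurn.toFixed(2),
  composerTurn: composerTurn.toFixed(2),
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions the input side alone comes to roughly $2.52 for the cold Opus turn, $0.27 for a warm Opus turn, and $0.09 for a Composer 2 turn. Because output tokens are excluded, the Composer 2 figure is not comparable to the $0.20 total in the usage table.&lt;/p&gt;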

&lt;h2&gt;
  
  
  Luxury Engineering
&lt;/h2&gt;

&lt;p&gt;A useful framing emerges from analyzing actual usage patterns: &lt;strong&gt;Luxury Engineering&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Using Claude Opus for routine coding tasks is the AI equivalent of hiring a full professor to write unit tests. The professor is qualified—arguably overqualified. They could do it. But you're paying for decades of expertise in domains completely irrelevant to the task: literature, philosophy, ethics, history. That overhead is embedded in every token.&lt;/p&gt;

&lt;p&gt;Composer 2 is more like a developer who has done nothing but code their entire career. No breadth, but extraordinary depth in the one domain that matters. Because of that specialization, the cost is one-tenth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Model Landscape
&lt;/h2&gt;

&lt;p&gt;Looking at the complete pricing picture (2026):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Cache Write&lt;/th&gt;
&lt;th&gt;Cache Read&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Composer 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;none&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2.50&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.3 Codex&lt;/td&gt;
&lt;td&gt;$1.75&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;$0.175&lt;/td&gt;
&lt;td&gt;$14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 4.20&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 4.6 Sonnet&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$3.75 (1h: $6.00)&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;$15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.5&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 4.7 Opus&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$6.25 (1h: $10.00)&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$25&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GPT-5.3 Codex being significantly cheaper than GPT-5.5 follows the same logic: Codex uses continued pre-training on code data to reduce model weight, and the price difference is essentially "the cost of maintaining the ability to write poetry."&lt;/p&gt;

&lt;p&gt;Two patterns stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Claude cache write anomaly.&lt;/strong&gt; Only Claude models carry an explicit cache write surcharge. Every other model in this list (including Composer 2) absorbs the write cost into the base price or waives it entirely. This isn't a product limitation—it's a reflection of Claude's external API business model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Composer 2's output price.&lt;/strong&gt; At $2.50 per 1M output tokens, it costs one-tenth of Claude Opus ($25) and one-twelfth of GPT-5.5 ($30). Code generation produces significant output token volume, so this very low output price means that long agentic sessions, the exact workloads Composer 2 is designed for, don't hit a cost ceiling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code: The Name Is Misleading
&lt;/h2&gt;

&lt;p&gt;The name "Claude Code" implies a coding-specialized model. It isn't. Claude Code is Claude 4.6 Opus or Sonnet—the same general-purpose models available in Cursor—packaged as a CLI tool. The underlying architecture hasn't been pruned for code; it retains the full weight of a general-purpose frontier model.&lt;/p&gt;

&lt;p&gt;The cost implications are direct. Claude Code uses Anthropic's standard Prompt Caching, which means the cache write premium (1.25×) applies. The default cache TTL is 5 minutes—long enough to expire while you're running tests or reading docs between prompts. The &lt;code&gt;ENABLE_PROMPT_CACHING_1H=1&lt;/code&gt; flag extends it to one hour, but doubles the write cost in exchange.&lt;/p&gt;

&lt;p&gt;The "autonomous loop" (run tests → read failure → fix code → rerun) is frequently cited as a Claude Code advantage. It isn't unique to Claude Code. Cursor's agent mode executes the same loop via its sandboxed terminal integration. The practical difference is that Cursor's loop doesn't incur a cache write penalty on session start, and runs cache reads at $0.20/1M rather than Claude's ~$0.50/1M.&lt;/p&gt;

&lt;p&gt;Where Claude Code has a genuine edge: terminal-native workflows for developers using Vim, JetBrains, or any editor outside the Cursor ecosystem. If you're not using Cursor, Claude Code is the most capable CLI agent available. Within Cursor, the economic case for Claude Code over Composer 2 is thin for standard coding tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for the Future
&lt;/h2&gt;

&lt;p&gt;This structure predicts where AI model markets go.&lt;/p&gt;

&lt;p&gt;General-purpose frontier models have a structural cost floor. They have to maintain broad capabilities to justify API pricing across diverse customers. They have to earn margins on external licensing. They have to maintain the "impressive demo" factor that drives enterprise adoption.&lt;/p&gt;

&lt;p&gt;Specialized models built for a specific product have none of those constraints. Strip capability, reduce model size, optimize serving infrastructure, eliminate external API margins. The only question is whether sufficient domain quality can be achieved.&lt;/p&gt;

&lt;p&gt;Composer 2 answered that question for software development in March 2026. SWE-bench Multilingual score of 73.7, at 1/10th the cost of Claude Opus.&lt;/p&gt;

&lt;p&gt;The same economics will play out in other domains: legal AI products trained exclusively on case law and contracts; medical AI running on clinical literature with zero consumer chat capability; financial models stripped of everything except numerical reasoning and accounting standards. None of them need to know how to write a sonnet.&lt;/p&gt;

&lt;p&gt;The structural enabler in each case is the same: building a model for &lt;em&gt;one product&lt;/em&gt;, not for external licensing. That eliminates the margin layer and enables the infrastructure optimizations that make 5-10× price reduction possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Rational Selection Framework
&lt;/h2&gt;

&lt;p&gt;Given this analysis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Composer 2 Standard&lt;/strong&gt; for any multi-turn session against a codebase. Cache compound interest works in your favor: higher turn count → higher cache read ratio → lower effective cost per token. No cache write entry fee on session start.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Composer 2 Fast&lt;/strong&gt; for interactive sessions where latency matters more than per-token cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude 4.7 Opus&lt;/strong&gt; when you genuinely need cross-domain reasoning: architecture decisions that weigh organizational and technical trade-offs together, debugging that requires understanding external systems outside your loaded context, or cases where Composer 2 hits an explicit capability ceiling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From actual usage data: 88.3% cache read ratio on Composer 2 Standard, $0.19 average cost per request on ~390K token requests. The same request volume on Claude Opus: $0.90 average. The top Opus request cost $4.25—enough for 22 equivalent Composer 2 Standard sessions.&lt;/p&gt;

&lt;p&gt;The price gap isn't a temporary marketing discount. It's structural, rooted in business model differences that won't close without a fundamental change in how Anthropic operates. As long as Claude is an external API product, the cache write premium and the overhead of general-purpose training remain embedded in the price.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>ai</category>
      <category>productivity</category>
      <category>codingtools</category>
    </item>
    <item>
      <title>Using llms.txt with Cursor and Claude Code: a concrete playbook</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Sun, 03 May 2026 11:56:30 +0000</pubDate>
      <link>https://dev.to/toyama0919/using-llmstxt-with-cursor-and-claude-code-a-concrete-playbook-4jln</link>
      <guid>https://dev.to/toyama0919/using-llmstxt-with-cursor-and-claude-code-a-concrete-playbook-4jln</guid>
      <description>&lt;p&gt;&lt;strong&gt;llms.txt&lt;/strong&gt; is a small text file on a documentation site that usually describes what the product is and links to the important Markdown pages. For coding agents, treat it as &lt;strong&gt;the canonical URL to open first&lt;/strong&gt; when upstream behavior is unclear. This post is mostly &lt;strong&gt;setup and workflow&lt;/strong&gt;, not theory.&lt;/p&gt;

&lt;h2&gt;
  
  
  What goes where
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;th&gt;Put this there&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Official doc server&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;https://example.com/llms.txt&lt;/code&gt; (maintained by the library/vendor)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Your repo&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;URLs only&lt;/strong&gt; (and short protocols), in agent rules—not a copy of their docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Your repo &lt;code&gt;.cursor/rules/&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Project map, conventions, &lt;em&gt;your&lt;/em&gt; architecture—not Next.js’s full manual&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you paste thousands of tokens of upstream docs into rules, every chat pays for them. Keeping &lt;strong&gt;pointers&lt;/strong&gt; in rules and loading docs &lt;strong&gt;on demand&lt;/strong&gt; avoids that.&lt;/p&gt;

&lt;h2&gt;
  
  
  One-time setup: a dedicated rules file
&lt;/h2&gt;

&lt;p&gt;Create something like &lt;code&gt;.cursor/rules/external-llms-docs.md&lt;/code&gt; (the name does not matter; keep it scoped). Paste in a &lt;strong&gt;stable list&lt;/strong&gt; of the llms.txt URLs your stack actually uses, grouped so humans and agents can scan it quickly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# External docs — fetch on demand&lt;/span&gt;

Use web fetch / browser / search tools to load these when implementing or debugging
third-party behavior. Do not paste full upstream docs into the chat.

&lt;span class="gu"&gt;## Index URLs (read these first)&lt;/span&gt;

| Area | llms.txt |
| --- | --- |
| Next.js | https://nextjs.org/llms.txt |
| Tailwind | https://tailwindcss.com/llms.txt |
| Lucide | https://lucide.dev/llms.txt |
| Google ADK | https://adk.dev/llms.txt |

&lt;span class="gu"&gt;## Read order&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Fetch the &lt;span class="gs"&gt;**llms.txt**&lt;/span&gt; for the dependency that owns the question.
&lt;span class="p"&gt;2.&lt;/span&gt; Follow &lt;span class="gs"&gt;**only**&lt;/span&gt; links from that file (or obvious &lt;span class="sb"&gt;`/docs/*.md`&lt;/span&gt; siblings) for depth.
&lt;span class="p"&gt;3.&lt;/span&gt; Prefer Markdown sources over scraping marketing HTML.
&lt;span class="p"&gt;4.&lt;/span&gt; If types exist locally (&lt;span class="sb"&gt;`node_modules`&lt;/span&gt;, stubs), use them &lt;span class="gs"&gt;**after**&lt;/span&gt; you know which API surface applies (avoids guessing wrong symbols).

&lt;span class="gu"&gt;## Scope&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Questions about &lt;span class="gs"&gt;**our**&lt;/span&gt; repo layout → use &lt;span class="sb"&gt;`repo-map`&lt;/span&gt; rule / codebase search, not llms.txt.
&lt;span class="p"&gt;-&lt;/span&gt; Questions about &lt;span class="gs"&gt;**their**&lt;/span&gt; API/version/docs → use the table above.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why a separate file: Cursor injects rules by context; a fat global rule file makes unrelated edits heavier. Split &lt;strong&gt;internal&lt;/strong&gt; vs &lt;strong&gt;external&lt;/strong&gt; pointers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent protocol (copy into the same file or AGENTS.md)
&lt;/h2&gt;

&lt;p&gt;Make the sequence explicit so the model does not default to “grep &lt;code&gt;node_modules&lt;/code&gt; for an hour.”&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## External SDK protocol&lt;/span&gt;

When the user asks for behavior that depends on an external library version or API:
&lt;span class="p"&gt;
1.&lt;/span&gt; Identify which dependency owns the feature (package.json / imports).
&lt;span class="p"&gt;2.&lt;/span&gt; If this file lists an llms.txt for that dependency, &lt;span class="gs"&gt;**fetch it before**&lt;/span&gt; writing code.
&lt;span class="p"&gt;3.&lt;/span&gt; Summarize in ≤10 lines: version assumptions, file names, and APIs you will use—then implement.
&lt;span class="p"&gt;4.&lt;/span&gt; Do not quote entire upstream pages back to the user; cite chapter/section or URL path only.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Concrete workflows
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Implement a feature (e.g. App Router auth middleware).&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User: “Add middleware-based auth with Next.js App Router.”&lt;/li&gt;
&lt;li&gt;Agent: fetch &lt;code&gt;https://nextjs.org/llms.txt&lt;/code&gt;, open the linked page that describes &lt;code&gt;middleware.ts&lt;/code&gt; / matcher patterns.&lt;/li&gt;
&lt;li&gt;Implement using &lt;strong&gt;current&lt;/strong&gt; filenames and signatures from that fetch—not memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Debug “works on my machine” / deprecation.&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User: “Tailwind v4 class names stopped working after upgrade.”&lt;/li&gt;
&lt;li&gt;Agent: fetch Tailwind’s llms.txt first; confirm breaking-change notes and config file names, then open repo &lt;code&gt;tailwind.config.*&lt;/code&gt; / CSS entry.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;SDK with tiered dumps (example pattern).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some sites expose a short index and a long bundle (names vary). Rule of thumb: &lt;strong&gt;start short&lt;/strong&gt;, upgrade to full only if the stub did not answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# hypothetical layout on a docs host
/llms.txt          → links + overview
/llms-small.txt    → minimal surface (cheap)
/llms-full.txt     → everything (expensive)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point your rules at the &lt;strong&gt;entry&lt;/strong&gt; (&lt;code&gt;llms.txt&lt;/code&gt;); let the fetched content tell the agent whether &lt;code&gt;*-full&lt;/code&gt; exists.&lt;/p&gt;
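&lt;p&gt;The shallow-first rule can be sketched in a few lines of Python. This is a minimal illustration, not a real client: the tier file names are the hypothetical ones from the layout above, and &lt;code&gt;read_shallow_first&lt;/code&gt;, &lt;code&gt;is_enough&lt;/code&gt;, and the in-memory &lt;code&gt;FAKE_HOST&lt;/code&gt; are made-up names so the example runs without network access.&lt;/p&gt;

```python
# Hypothetical tier names, copied from the layout sketched above; real hosts vary.
TIERS = ("/llms.txt", "/llms-small.txt", "/llms-full.txt")


def read_shallow_first(fetch, base_url, is_enough, tiers=TIERS):
    """Walk the tiers from cheapest to most expensive.

    fetch(url) returns the document text, or None if the URL does not exist.
    is_enough(text) decides whether the fetched text answers the question.
    Returns (url, text) for the first tier that suffices, else (None, None).
    """
    for path in tiers:
        url = base_url + path
        text = fetch(url)
        if text is not None and is_enough(text):
            return url, text
    return None, None


# Stand-in for a docs host so the sketch runs without network access.
FAKE_HOST = {
    "https://docs.example.com/llms.txt": "Overview and links only.",
    "https://docs.example.com/llms-small.txt": "Minimal surface: middleware basics.",
    "https://docs.example.com/llms-full.txt": "Everything, including middleware and much more.",
}

url, text = read_shallow_first(
    fetch=FAKE_HOST.get,
    base_url="https://docs.example.com",
    is_enough=lambda doc: "middleware" in doc,
)
print(url)
```

&lt;p&gt;In this toy run the index alone does not mention the topic, so the loop moves up one tier and stops there without ever touching the expensive full bundle. In practice the tier list would come from whatever the fetched index advertises rather than from a hard-coded tuple.&lt;/p&gt;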

&lt;h2&gt;
  
  
  Prompts that reinforce good habits
&lt;/h2&gt;

&lt;p&gt;You can nudge behavior per task without editing rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Before editing: fetch Next.js llms.txt and confirm middleware filename and export shape.”&lt;/li&gt;
&lt;li&gt;“Use ADK llms.txt; don’t rely on training cutoff for API names.”&lt;/li&gt;
&lt;li&gt;“After fetching Tailwind llms.txt, list which doc URLs you used (paths only).”&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Minimal internal llms.txt (optional)
&lt;/h2&gt;

&lt;p&gt;If &lt;strong&gt;you&lt;/strong&gt; ship an internal library or architecture handbook on HTTPS, you can publish your own index at &lt;code&gt;https://internal-docs.example.com/llms.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Internal platform — LLM index&lt;/span&gt;

&lt;span class="gu"&gt;## Auth&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Overview: https://internal-docs.example.com/auth/overview.md
&lt;span class="p"&gt;-&lt;/span&gt; Breaking changes 2026: https://internal-docs.example.com/auth/changelog.md

&lt;span class="gu"&gt;## Data layer&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; API conventions: https://internal-docs.example.com/db/conventions.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add one line to &lt;code&gt;.cursor/rules/external-llms-docs.md&lt;/code&gt;: &lt;code&gt;Internal platform | https://internal-docs.example.com/llms.txt&lt;/code&gt;. Same mechanics as vendor docs.&lt;/p&gt;
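&lt;p&gt;To make the index concrete from the consumer's side, here is a small Python sketch of how a tool might turn the sample above into a lookup table. It assumes the simplified &lt;code&gt;- label: URL&lt;/code&gt; bullets used in this example; published llms.txt files may use Markdown link syntax instead, which this sketch does not handle. &lt;code&gt;parse_index&lt;/code&gt; is a made-up helper name.&lt;/p&gt;

```python
# Abbreviated copy of the sample index above.
INDEX_TEXT = """\
# Internal platform LLM index

## Auth
- Overview: https://internal-docs.example.com/auth/overview.md
- Breaking changes 2026: https://internal-docs.example.com/auth/changelog.md

## Data layer
- API conventions: https://internal-docs.example.com/db/conventions.md
"""


def parse_index(text):
    """Return {section: {label: url}} for '- label: url' bullets under '## ' headings."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:]
            sections[current] = {}
        elif line.startswith("- ") and current is not None:
            label, sep, url = line[2:].partition(": ")
            if sep and url.startswith("https://"):
                sections[current][label] = url
    return sections


index = parse_index(INDEX_TEXT)
for section, links in index.items():
    print(section, sorted(links))
```

&lt;p&gt;The point of the exercise: the index stays small enough that an agent can read it whole, pick one section, and fetch only the linked page it needs.&lt;/p&gt;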

&lt;h2&gt;
  
  
  Tooling reality check
&lt;/h2&gt;

&lt;p&gt;This pattern assumes the agent can &lt;strong&gt;retrieve HTTPS text&lt;/strong&gt; (built-in fetch, browser tool, MCP &lt;code&gt;fetch&lt;/code&gt;, etc.). Air-gapped machines need a fallback: mirror snippets in rules, run a local static server, or unpack a vendor tarball. Whatever you inline into rules then sits in context on every request, so accept that resident token cost.&lt;/p&gt;

&lt;p&gt;Do not put &lt;strong&gt;authenticated&lt;/strong&gt; URLs with secrets in rules; use public docs or internal SSO-aware tooling outside plain markdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anti-patterns
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Dumping full upstream Markdown into &lt;code&gt;.cursorrules&lt;/code&gt; “so the agent always knows.”&lt;/li&gt;
&lt;li&gt;Skipping llms.txt and crawling random marketing pages (noisy HTML, wasted tokens).&lt;/li&gt;
&lt;li&gt;Duplicating vendor docs under &lt;code&gt;docs/vendor/&lt;/code&gt; and indexing all of it when you do not actually need offline access.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  SEO note (short)
&lt;/h2&gt;

&lt;p&gt;Search-engine teams have questioned llms.txt as an SEO lever; that is largely orthogonal. &lt;strong&gt;For coding agents&lt;/strong&gt;, the win is predictable Markdown entrypoints and smaller always-on context—not rankings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Add &lt;code&gt;.cursor/rules/external-llms-docs.md&lt;/code&gt; with a &lt;strong&gt;table of llms.txt URLs&lt;/strong&gt; plus &lt;strong&gt;read order&lt;/strong&gt; and &lt;strong&gt;scope&lt;/strong&gt; (external vs internal repo map).&lt;/li&gt;
&lt;li&gt;Teach agents: &lt;strong&gt;fetch index → follow linked Markdown → then local types&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Use tiered files &lt;strong&gt;shallow-first&lt;/strong&gt; when the provider offers them.&lt;/li&gt;
&lt;li&gt;Optionally host &lt;strong&gt;your own&lt;/strong&gt; llms.txt for internal platforms; still keep rules as pointers only.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>cursor</category>
      <category>llm</category>
      <category>documentation</category>
    </item>
    <item>
      <title>Cursor Composer 2: The Cache Economy Behind a 10x Cheaper Coding Agent</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Sat, 02 May 2026 12:53:01 +0000</pubDate>
      <link>https://dev.to/toyama0919/cursor-composer-2-the-cache-economy-behind-a-10x-cheaper-coding-agent-15cj</link>
      <guid>https://dev.to/toyama0919/cursor-composer-2-the-cache-economy-behind-a-10x-cheaper-coding-agent-15cj</guid>
      <description>&lt;p&gt;Cursor's Composer 2 shipped in March 2026 as the centerpiece of the Cursor 2.0 overhaul. The headline numbers—$0.50/1M input tokens, outperforming frontier models on SWE-bench Multilingual—look like marketing. The cache read mechanism is where the real story is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Specialized Model at All
&lt;/h2&gt;

&lt;p&gt;Prior Cursor versions proxied Claude or GPT-4. Composer 2 is trained exclusively on coding data via continued pre-training and reinforcement learning. The obvious question is: what's cut?&lt;/p&gt;

&lt;p&gt;Everything that isn't code. Composer 2 has no meaningful capability for poetry, history, ethics debates, or anything outside software development. That constraint lets Anysphere run a model that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understands intra-repo dependency graphs (if you fix A, B also needs updating)&lt;/li&gt;
&lt;li&gt;Navigates hundreds of files in a single long-horizon task&lt;/li&gt;
&lt;li&gt;Runs natively in sandboxed terminals and a built-in browser loop&lt;/li&gt;
&lt;li&gt;Costs a fraction of what a general-purpose frontier model costs to serve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pricing reflects this. As of May 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (1M tokens)&lt;/th&gt;
&lt;th&gt;Output (1M tokens)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Composer 2 Standard&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composer 2 Fast&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;td&gt;$7.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 4.6 Opus&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.4&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Standard vs Fast: Same Weights, Different Queue
&lt;/h2&gt;

&lt;p&gt;Anysphere's own language is unambiguous: "Same intelligence." The two variants share identical model weights and parameters. Fast gets a priority queue on high-end GPUs (H800/B200 class); Standard runs on lower-priority compute with higher latency tolerance.&lt;/p&gt;

&lt;p&gt;This is a deliberate architectural choice. Inference cost scales with compute priority, not model capability. If you can tolerate a 10–30 second response delay, you get the same output for 1/3 the price.&lt;/p&gt;

&lt;p&gt;The practical split that Cursor power users have settled on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Interactive sessions (Fast):&lt;/strong&gt; You're watching the output in real time. Latency kills flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fire-and-forget tasks (Standard):&lt;/strong&gt; Refactor 100 test files, generate JSDoc across the repo, migrate an entire API surface. Start it, close the laptop, come back to results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Cache Read Economy
&lt;/h2&gt;

&lt;p&gt;This is the mechanism that makes Standard compelling for large codebases.&lt;/p&gt;

&lt;p&gt;Every request to Composer 2 sends context: directory structure, recently opened files, conversation history. On the second, fifth, tenth turn of the same session, the majority of that context is identical to what was already sent. That's the cache.&lt;/p&gt;

&lt;p&gt;Cache read rates as of May 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;New input&lt;/th&gt;
&lt;th&gt;Cache read&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;$0.50/1M&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.20/1M&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;$1.50/1M&lt;/td&gt;
&lt;td&gt;$0.35/1M&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;By turn 5 of a non-trivial session, 80%+ of your input tokens are cache reads, not fresh input. Standard's cache read rate ($0.20) is 43% cheaper than Fast's ($0.35), and 60% cheaper than Standard's own new input rate.&lt;/p&gt;
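&lt;p&gt;The percentages above are easy to re-derive. The sketch below is a worked calculation using only the rates from the two tables; the 80% cache ratio is the illustrative figure from the paragraph above, output tokens are ignored, and the function names are made up for this example.&lt;/p&gt;

```python
# Rates in USD per 1M input tokens, copied from the table above.
RATES = {
    "standard": {"new_input": 0.50, "cache_read": 0.20},
    "fast": {"new_input": 1.50, "cache_read": 0.35},
}


def blended_input_rate(tier, cache_ratio):
    """Effective USD per 1M input tokens when cache_ratio of them are cache reads."""
    rate = RATES[tier]
    return cache_ratio * rate["cache_read"] + (1 - cache_ratio) * rate["new_input"]


def discount(cheaper, pricier):
    """Relative saving of one rate against another, as a whole percentage."""
    return round(100 * (1 - cheaper / pricier))


for tier in RATES:
    print(tier, round(blended_input_rate(tier, 0.80), 2))

print(discount(0.20, 0.35), discount(0.20, 0.50))
```

&lt;p&gt;At an 80% cache read ratio the effective input rate works out to about $0.26 per 1M tokens on Standard versus about $0.58 on Fast, so the gap between the tiers stays above 2× even after caching.&lt;/p&gt;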

&lt;p&gt;&lt;strong&gt;Concrete impact:&lt;/strong&gt; A refactoring session with 10 back-and-forth turns on a large codebase might consume 10M input tokens. At an 80–90% cache read ratio and the rates above, Standard's input charges come to roughly $2.30–$2.60. The same session on Fast: roughly $4.65–$5.80. On Claude 4.6 Opus: potentially $20+.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cache Bug (March–April 2026)
&lt;/h2&gt;

&lt;p&gt;The cache story has a footnote worth documenting.&lt;/p&gt;

&lt;p&gt;From late March through early April 2026, a backend bug caused Composer 2 Standard to emit cache read counts of zero—every request treated as fresh input at $0.50/1M even when the context was identical to the previous turn. Users reported credit burn rates 10x higher than expected. The irony: switching to Fast (which costs 3x more per token) actually resulted in lower total cost because cache was functioning there.&lt;/p&gt;

&lt;p&gt;Cursor's team (Dean and Mohit on the forum thread) acknowledged the bug and pushed a fix around April 7. As of v2.1.116+, the behavior appears stable.&lt;/p&gt;

&lt;p&gt;The diagnostic check: open &lt;code&gt;cursor.com/settings&lt;/code&gt; → Usage. If &lt;code&gt;Cache Read&lt;/code&gt; tokens are consistently below 40% on a multi-turn session against the same codebase, something is wrong. Expected range is 40–90% depending on how varied your requests are.&lt;/p&gt;

&lt;p&gt;If you hit zero cache read consistently, copy the Request ID from the chat header and contact support. Cursor has been issuing credit refunds for the overbilling period.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing with Claude Code's Cache
&lt;/h2&gt;

&lt;p&gt;Claude Code (Anthropic's CLI tool) has its own prompt caching via &lt;code&gt;cache_control&lt;/code&gt; markers, but with a key structural difference: TTL.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Write cost&lt;/th&gt;
&lt;th&gt;Read cost&lt;/th&gt;
&lt;th&gt;TTL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Default&lt;/td&gt;
&lt;td&gt;1.25× input&lt;/td&gt;
&lt;td&gt;~10% of input&lt;/td&gt;
&lt;td&gt;5 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ENABLE_PROMPT_CACHING_1H=1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.0× input&lt;/td&gt;
&lt;td&gt;~10% of input&lt;/td&gt;
&lt;td&gt;1 hour&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 5-minute default is brutal for any session where you read documentation, test code, or think between turns. The 1-hour option (available since Claude Code v2.1.108) adds to the write cost but eliminates repeated cache misses across the kind of natural pauses that happen in real work.&lt;/p&gt;

&lt;p&gt;To enable it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.zshrc or ~/.bashrc&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ENABLE_PROMPT_CACHING_1H&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify with &lt;code&gt;usage&lt;/code&gt; output during a session—look for &lt;code&gt;ephemeral_1h_input_tokens&lt;/code&gt; in the log. If you only see &lt;code&gt;ephemeral_5m_&lt;/code&gt;, the variable isn't being picked up.&lt;/p&gt;

&lt;p&gt;Note: there were also TTL-related bugs in this period that forced resets to 5-minute behavior. Keep Claude Code at the latest version.&lt;/p&gt;
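&lt;p&gt;The TTL trade-off in the table above can be put into rough numbers. The following is a simplified model, not Anthropic's billing logic: it assumes one fixed-size context resent on every turn, treats the approximate "~10% of input" read cost as exactly 0.10, and ignores new tokens added per turn. &lt;code&gt;relative_cost&lt;/code&gt; and its parameters are hypothetical names.&lt;/p&gt;

```python
# Multipliers relative to the base input price, taken from the table above.
TTL_5M = {"write": 1.25, "read": 0.10}
TTL_1H = {"write": 2.00, "read": 0.10}


def relative_cost(ttl, turns, misses):
    """Cost of resending one fixed-size context over several turns.

    turns  -- total number of requests in the session
    misses -- how many of the later turns arrive after the cache expired
    The first turn always writes the cache; every miss writes it again.
    The result is in units of "one uncached send of the context".
    """
    writes = 1 + misses
    reads = turns - writes
    return round(writes * ttl["write"] + reads * ttl["read"], 2)


turns = 10
# Rapid-fire session: no pause outlasts either TTL.
print(relative_cost(TTL_5M, turns, misses=0), relative_cost(TTL_1H, turns, misses=0))
# One pause longer than five minutes: only the 5-minute cache misses.
print(relative_cost(TTL_5M, turns, misses=1), relative_cost(TTL_1H, turns, misses=0))
# Stop-and-think session: every pause falls between 5 minutes and 1 hour.
print(relative_cost(TTL_5M, turns, misses=9), relative_cost(TTL_1H, turns, misses=0))
```

&lt;p&gt;Under these assumptions the default TTL is cheaper only when no pause exceeds five minutes (2.15 versus 2.9 units over ten turns). A single longer pause already tips the balance toward the 1-hour setting (3.3 versus 2.9), and a session where every pause expires the cache costs 12.5 units, more than the 10 units of not caching at all.&lt;/p&gt;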

&lt;h2&gt;
  
  
  My Usage Data
&lt;/h2&gt;

&lt;p&gt;I exported my own Cursor usage history and analyzed it. Here's what a month looks like across models (442 requests):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Requests&lt;/th&gt;
&lt;th&gt;Avg cost/request&lt;/th&gt;
&lt;th&gt;Cache read ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Composer 2 Standard&lt;/td&gt;
&lt;td&gt;73&lt;/td&gt;
&lt;td&gt;$0.19&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;88.3%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Composer 2 Fast&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;$0.32&lt;/td&gt;
&lt;td&gt;78.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 4.6 Sonnet&lt;/td&gt;
&lt;td&gt;212&lt;/td&gt;
&lt;td&gt;$0.37&lt;/td&gt;
&lt;td&gt;84.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude 4.6 Opus&lt;/td&gt;
&lt;td&gt;93&lt;/td&gt;
&lt;td&gt;$0.90&lt;/td&gt;
&lt;td&gt;79.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 88.3% cache read ratio on Standard is the headline. For an average request consuming ~390K tokens, 88% of those are cache reads at $0.20/1M rather than fresh input at $0.50/1M. Without that cache hit rate, the average cost per request would be ~$0.40 instead of $0.19.&lt;/p&gt;

&lt;p&gt;The top Opus requests peaked at $4.25/request (3.9M total tokens, 3.8M of which were cache reads). Even with excellent cache ratios, Opus's higher base rates mean the same cache-heavy session costs 4–5× more than Composer 2 Standard.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Decision
&lt;/h2&gt;

&lt;p&gt;Composer 2 is not "Claude but cheap." It's a purpose-built agent runtime that has traded general intelligence for deep coding capability and cost efficiency at the infrastructure level. The Standard/Fast split exists because long-horizon agentic tasks don't need millisecond response times—and charging for that latency premium on 10-turn refactoring sessions is wasteful.&lt;/p&gt;

&lt;p&gt;The model choice that makes sense given this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default to Standard&lt;/strong&gt; for any multi-file task where you'll have more than 3–4 turns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch to Fast&lt;/strong&gt; for interactive chat where you're watching output incrementally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use frontier models (Opus, Claude 4.7)&lt;/strong&gt; only when Composer 2 hits a genuine capability ceiling—complex algorithmic reasoning, architecture decisions that span non-code domains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cache makes Standard not just "slower Fast," but a qualitatively different operational mode: background processing with cost amortized over a long context window that grows cheaper the more you reuse it.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>ai</category>
      <category>productivity</category>
      <category>codingtools</category>
    </item>
    <item>
      <title>Two Nasty Gotchas When Building Multi-Agent Systems with Google ADK</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Tue, 28 Apr 2026 09:30:43 +0000</pubDate>
      <link>https://dev.to/toyama0919/two-nasty-gotchas-when-building-multi-agent-systems-with-google-adk-3d05</link>
      <guid>https://dev.to/toyama0919/two-nasty-gotchas-when-building-multi-agent-systems-with-google-adk-3d05</guid>
      <description>&lt;p&gt;Google's Agent Development Kit (ADK) makes it straightforward to compose &lt;code&gt;LlmAgent&lt;/code&gt; instances into multi-agent hierarchies. But two bugs bit me hard in production that aren't documented anywhere. Here's what happened and how to fix them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;A root router &lt;code&gt;LlmAgent&lt;/code&gt; with two sub-agents. Both sub-agents are module-level singletons — instantiated at import time, referenced from the root agent's constructor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Agents/my_app/root_agent.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;Agents.my_app.sub_agent_a.agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sub_agent_a&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;Agents.my_app.sub_agent_b.agent&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sub_agent_b&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_sub_agents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sub_agent_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sub_agent_b&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;_build_sub_agents&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Worked fine locally with &lt;code&gt;adk web&lt;/code&gt;. Blew up on Cloud Run.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bug 1: &lt;code&gt;Agent already has a parent agent&lt;/code&gt; on module reload
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The error
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pydantic_core._pydantic_core.ValidationError: 1 validation error for LlmAgent
  Value error, Agent `SubAgentA` already has a parent agent,
  current parent: `my_app`, trying to add: `my_app`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What's happening
&lt;/h3&gt;

&lt;p&gt;ADK's &lt;code&gt;agent_loader&lt;/code&gt; calls &lt;code&gt;importlib.import_module(agent_name)&lt;/code&gt; on &lt;strong&gt;every request&lt;/strong&gt;. On the first request, it loads the module fresh and creates &lt;code&gt;root_agent&lt;/code&gt;. The &lt;code&gt;LlmAgent&lt;/code&gt; constructor sets &lt;code&gt;sub_agent.parent_agent = root_agent&lt;/code&gt; for each sub-agent.&lt;/p&gt;

&lt;p&gt;On the second request, &lt;code&gt;agent_loader&lt;/code&gt; runs the root module again. The sub-agent modules it imports are still cached in &lt;code&gt;sys.modules&lt;/code&gt;, so &lt;code&gt;sub_agent_a&lt;/code&gt; and &lt;code&gt;sub_agent_b&lt;/code&gt;, being module-level singletons, are &lt;strong&gt;the same Python objects&lt;/strong&gt; from the previous load, still carrying their &lt;code&gt;parent_agent&lt;/code&gt; reference. When the new &lt;code&gt;LlmAgent&lt;/code&gt; tries to assign the parent again, pydantic's validator rejects it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Inside ADK's LlmAgent.__init__ (simplified)
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent_agent&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent `&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;` already has a parent agent ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This never surfaces locally because &lt;code&gt;adk web&lt;/code&gt; loads the module only once per session. Cloud Run's reload-per-request behavior is what triggers it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Reset &lt;code&gt;parent_agent&lt;/code&gt; to &lt;code&gt;None&lt;/code&gt; before passing sub-agents to the constructor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_sub_agents&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sub_agent_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sub_agent_b&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;  &lt;span class="c1"&gt;# reset before each reload
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is safe because the assignment happens synchronously before the new parent is set.&lt;/p&gt;
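&lt;p&gt;The failure and the fix can be reproduced without ADK at all. The sketch below uses a toy &lt;code&gt;ToyAgent&lt;/code&gt; class that only mimics the parent check described above; it is not ADK's implementation, and &lt;code&gt;build_root&lt;/code&gt; is a stand-in for "the root module being executed once".&lt;/p&gt;

```python
class ToyAgent:
    """Stand-in for an agent class; mimics only the parent check, not ADK itself."""

    def __init__(self, name, sub_agents=()):
        self.name = name
        self.parent_agent = None
        self.sub_agents = list(sub_agents)
        for sub in self.sub_agents:
            if sub.parent_agent is not None:
                raise ValueError(f"Agent `{sub.name}` already has a parent agent")
            sub.parent_agent = self


# Module-level singletons: created once, shared by every later "load".
sub_a = ToyAgent("SubAgentA")
sub_b = ToyAgent("SubAgentB")


def build_root(reset_parents):
    """Simulate one execution of the root module."""
    subs = [sub_a, sub_b]
    if reset_parents:
        for sub in subs:
            sub.parent_agent = None  # the fix: clear the stale parent first
    return ToyAgent("my_app", sub_agents=subs)


first = build_root(reset_parents=False)  # first load succeeds

try:
    build_root(reset_parents=False)  # second load reuses the same objects
    second_load_error = None
except ValueError as exc:
    second_load_error = str(exc)

third = build_root(reset_parents=True)  # with the reset, the rebuild succeeds
print(second_load_error)
print(sub_a.parent_agent is third)
```

&lt;p&gt;One thing the toy makes visible: after the reset, the sub-agents point at the newest root only. That is fine as long as the previous root object is discarded, which is the situation a reload creates. An alternative design is to build fresh sub-agent instances inside a factory function instead of sharing singletons.&lt;/p&gt;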




&lt;h2&gt;
  
  
  Bug 2: &lt;code&gt;Context variable not found&lt;/code&gt; in instruction strings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The error
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;KeyError: 'Context variable not found: `hostname`.'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Traceback points here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;File&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.../google/adk/utils/instructions_utils.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="mi"&gt;124&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;inject_session_state&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;_async_sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{+[^{}]*}+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_replace_match&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What's happening
&lt;/h3&gt;

&lt;p&gt;ADK injects session state into agent instructions at runtime. The mechanism scans the instruction string with the regex &lt;code&gt;r'{+[^{}]*}+'&lt;/code&gt; and replaces every &lt;code&gt;{var_name}&lt;/code&gt; with the corresponding session state value.&lt;/p&gt;

&lt;p&gt;If your instruction contains an example URL or any template-like text with curly braces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;The URL format is &lt;span class="sb"&gt;`https://{hostname}/api/{resource_id}/`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ADK sees &lt;code&gt;{hostname}&lt;/code&gt;, looks it up in session state, finds nothing, raises &lt;code&gt;KeyError&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My first instinct was to double-brace escape like Python's &lt;code&gt;.format()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;https://{{hostname}}/api/{{resource_id}}/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;This does not work.&lt;/strong&gt; The regex is &lt;code&gt;{+[^{}]*}+&lt;/code&gt; — it matches one or more &lt;code&gt;{&lt;/code&gt; characters followed by non-brace characters followed by one or more &lt;code&gt;}&lt;/code&gt; characters. &lt;code&gt;{{hostname}}&lt;/code&gt; still matches.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;

&lt;p&gt;Don't use curly braces for literal placeholder text in instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;The URL format is &lt;span class="sb"&gt;`https://&amp;lt;hostname&amp;gt;/api/&amp;lt;resource_id&amp;gt;/`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More broadly: &lt;strong&gt;any &lt;code&gt;{word}&lt;/code&gt; pattern in an ADK instruction string is treated as a session state variable&lt;/strong&gt;, regardless of how many braces you use. Use angle brackets, square brackets, or prose for template-like text in prompts.&lt;/p&gt;
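&lt;p&gt;You can check this behavior with the standard library alone. The snippet below applies the pattern quoted in the traceback to three spellings of the same URL template; it only exercises the regex and does not call ADK, so the lookup and the &lt;code&gt;KeyError&lt;/code&gt; themselves are not reproduced here.&lt;/p&gt;

```python
import re

# Pattern quoted in the traceback above.
PLACEHOLDER = re.compile(r'{+[^{}]*}+')

single = "https://{hostname}/api/{resource_id}/"
double = "https://{{hostname}}/api/{{resource_id}}/"
square = "https://[hostname]/api/[resource_id]/"

print(PLACEHOLDER.findall(single))
print(PLACEHOLDER.findall(double))
print(PLACEHOLDER.findall(square))
```

&lt;p&gt;The single-brace and double-brace strings each produce two matches, while the square-bracket string produces none. A quick pre-deploy check along these lines, run over every instruction string, would catch stray placeholders before they reach a live session.&lt;/p&gt;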




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bug&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;parent_agent&lt;/code&gt; collision&lt;/td&gt;
&lt;td&gt;Module-level singleton sub-agents + ADK module reload per request&lt;/td&gt;
&lt;td&gt;Reset &lt;code&gt;agent.parent_agent = None&lt;/code&gt; before passing to constructor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Context variable not found&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;{word}&lt;/code&gt; patterns in instruction strings&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;&amp;lt;word&amp;gt;&lt;/code&gt; or square brackets instead&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both are easy to fix once you know what's happening, but the error messages don't immediately point to the root cause. The &lt;code&gt;parent_agent&lt;/code&gt; one is especially sneaky — it only appears in production where the module is reloaded per request, never in &lt;code&gt;adk web&lt;/code&gt; during local development.&lt;/p&gt;

</description>
      <category>googleadk</category>
      <category>llm</category>
      <category>python</category>
      <category>multiagent</category>
    </item>
    <item>
      <title>Managing AI Agent Skills with `npx skills`: A Practical Guide</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Sat, 11 Apr 2026 08:04:45 +0000</pubDate>
      <link>https://dev.to/toyama0919/managing-ai-agent-skills-with-npx-skills-a-practical-guide-2an8</link>
      <guid>https://dev.to/toyama0919/managing-ai-agent-skills-with-npx-skills-a-practical-guide-2an8</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;AI agents like Claude Code, Cursor, and GitHub Copilot don't inherently know how to use every tool in your stack. You need a way to teach them. That's what &lt;code&gt;npx skills&lt;/code&gt; does — it's a package manager for AI agent behaviors, built by Vercel Labs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add microsoft/playwright-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command fetches a &lt;code&gt;SKILL.md&lt;/code&gt; from the specified GitHub repository and installs it into your agent's config directory (&lt;code&gt;.agents/skills/&lt;/code&gt; or &lt;code&gt;.claude/skills/&lt;/code&gt; depending on the agent).&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GitHub as the Registry
&lt;/h3&gt;

&lt;p&gt;Unlike npm, which uses npmjs.com, &lt;code&gt;skills&lt;/code&gt; uses GitHub as its registry. The &lt;code&gt;microsoft/playwright-cli&lt;/code&gt; argument maps directly to &lt;code&gt;https://github.com/microsoft/playwright-cli&lt;/code&gt;. Any public GitHub repo with a &lt;code&gt;SKILL.md&lt;/code&gt; at root is a valid skill source.&lt;/p&gt;

&lt;p&gt;You can also install by full URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add https://github.com/microsoft/playwright-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  SKILL.md as the Package Entry Point
&lt;/h3&gt;

&lt;p&gt;Each skill repo contains a &lt;code&gt;SKILL.md&lt;/code&gt; — the equivalent of &lt;code&gt;index.js&lt;/code&gt; in an npm package. It contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata&lt;/strong&gt;: name and description of the skill&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool definitions&lt;/strong&gt;: commands the AI can invoke (e.g. &lt;code&gt;playwright test&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt instructions&lt;/strong&gt;: when and how the AI should use the tool&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  .skills.json + skills-lock.json = package.json + package-lock.json
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;npm&lt;/th&gt;
&lt;th&gt;skills CLI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Dependency manifest&lt;/td&gt;
&lt;td&gt;&lt;code&gt;package.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.skills.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lock file&lt;/td&gt;
&lt;td&gt;&lt;code&gt;package-lock.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;skills-lock.json&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Install directory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;node_modules/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.agents/skills/&lt;/code&gt; (or &lt;code&gt;.claude/skills/&lt;/code&gt;, depending on the agent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Registry&lt;/td&gt;
&lt;td&gt;npmjs.com&lt;/td&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Install command&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm install&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npx skills experimental_install&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;After &lt;code&gt;npx skills add&lt;/code&gt;, your &lt;code&gt;.skills.json&lt;/code&gt; will look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"skills"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"playwright-cli"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"remote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"microsoft/playwright-cli"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"latest"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add a skill&lt;/span&gt;
npx skills add vercel-labs/agent-skills

&lt;span class="c"&gt;# Add globally (user-level, not project-level)&lt;/span&gt;
npx skills add vercel-labs/agent-skills &lt;span class="nt"&gt;-g&lt;/span&gt;

&lt;span class="c"&gt;# Target specific agents&lt;/span&gt;
npx skills add vercel-labs/agent-skills &lt;span class="nt"&gt;--agent&lt;/span&gt; claude-code cursor

&lt;span class="c"&gt;# List installed skills&lt;/span&gt;
npx skills list
npx skills &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt;           &lt;span class="c"&gt;# global skills&lt;/span&gt;
npx skills &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-a&lt;/span&gt; cursor    &lt;span class="c"&gt;# filter by agent&lt;/span&gt;

&lt;span class="c"&gt;# Search the registry&lt;/span&gt;
npx skills find typescript

&lt;span class="c"&gt;# Update all skills&lt;/span&gt;
npx skills update

&lt;span class="c"&gt;# Restore from lock file (equivalent of npm ci)&lt;/span&gt;
npx skills experimental_install

&lt;span class="c"&gt;# Sync from node_modules to agent directories&lt;/span&gt;
npx skills experimental_sync

&lt;span class="c"&gt;# Scaffold a new skill&lt;/span&gt;
npx skills init my-skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Gotchas
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;remove&lt;/code&gt; Doesn't Update the Lock File
&lt;/h3&gt;

&lt;p&gt;This is the biggest footgun:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills &lt;span class="nb"&gt;rm &lt;/span&gt;microsoft/playwright-cli &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This removes the skill files from your agent directories, but &lt;strong&gt;leaves the entry in &lt;code&gt;skills-lock.json&lt;/code&gt;&lt;/strong&gt;. The next time someone runs &lt;code&gt;experimental_install&lt;/code&gt;, the skill comes back.&lt;/p&gt;

&lt;p&gt;Workaround:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;npx skills remove&lt;/code&gt; as usual&lt;/li&gt;
&lt;li&gt;Manually edit &lt;code&gt;.skills.json&lt;/code&gt; to remove the entry&lt;/li&gt;
&lt;li&gt;Delete &lt;code&gt;skills-lock.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npx skills update&lt;/code&gt;, or re-&lt;code&gt;add&lt;/code&gt; the remaining skills, to regenerate a clean lock file&lt;/li&gt;
&lt;/ol&gt;
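
&lt;p&gt;Steps 2 and 3 are easy to get wrong by hand, so they can be scripted. The following is a minimal sketch, assuming the &lt;code&gt;.skills.json&lt;/code&gt; layout shown earlier in this post; &lt;code&gt;prune_skill&lt;/code&gt; is a hypothetical helper, not part of the CLI, and the demo data (including the second skill's &lt;code&gt;name&lt;/code&gt;) is made up for illustration.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


def prune_skill(project_dir: Path, remote: str) -> list[str]:
    """Drop `remote` from .skills.json and delete skills-lock.json.

    Covers steps 2 and 3 of the workaround; returns the remotes that remain.
    Assumes the manifest layout shown earlier in this post.
    """
    manifest_path = project_dir / ".skills.json"
    manifest = json.loads(manifest_path.read_text())
    manifest["skills"] = [
        skill for skill in manifest.get("skills", []) if skill.get("remote") != remote
    ]
    manifest_path.write_text(json.dumps(manifest, indent=2) + "\n")

    # Remove the stale lock file so the next update/add regenerates it.
    (project_dir / "skills-lock.json").unlink(missing_ok=True)
    return [skill["remote"] for skill in manifest["skills"]]


# Demo against a throwaway directory rather than a real project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / ".skills.json").write_text(json.dumps({
        "skills": [
            {"name": "playwright-cli", "remote": "microsoft/playwright-cli", "version": "latest"},
            {"name": "agent-skills", "remote": "vercel-labs/agent-skills", "version": "latest"},
        ]
    }))
    (project / "skills-lock.json").write_text("{}")
    print(prune_skill(project, "microsoft/playwright-cli"))
```

&lt;p&gt;Step 4 still has to run through the CLI afterwards, since only the CLI knows how to write a valid lock file.&lt;/p&gt;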

&lt;h3&gt;
  
  
  &lt;code&gt;experimental_&lt;/code&gt; Prefix Is Real
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;experimental_install&lt;/code&gt; and &lt;code&gt;experimental_sync&lt;/code&gt; are genuinely experimental, and the prefix is part of the command name. There is no &lt;code&gt;npx skills sync&lt;/code&gt; in the current version: use &lt;code&gt;npx skills experimental_install&lt;/code&gt; to restore from the lock file and &lt;code&gt;npx skills experimental_sync&lt;/code&gt; to sync from &lt;code&gt;node_modules&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cache Behavior with npx
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;npx skills&lt;/code&gt; may run a cached older version. Force latest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills@latest add &amp;lt;repo&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For projects where everyone needs the same CLI version, add it as a devDependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--save-dev&lt;/span&gt; skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CI/CD Integration
&lt;/h2&gt;

&lt;p&gt;Add to your CI setup to restore skills on each run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restore AI agent skills&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npx skills experimental_install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures every developer and CI environment uses exactly the same skill versions as defined in &lt;code&gt;skills-lock.json&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Your Own Skill
&lt;/h2&gt;

&lt;p&gt;Any GitHub repo with a &lt;code&gt;SKILL.md&lt;/code&gt; is installable. Create one with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills init my-skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This scaffolds a &lt;code&gt;SKILL.md&lt;/code&gt; that you push to GitHub. Anyone can then install it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add yourusername/my-skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Browse existing skills at &lt;a href="https://skills.sh/" rel="noopener noreferrer"&gt;skills.sh&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;npx skills&lt;/code&gt; is npm for AI agent capabilities. The mental model maps cleanly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; = &lt;code&gt;index.js&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.skills.json&lt;/code&gt; = &lt;code&gt;package.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;skills-lock.json&lt;/code&gt; = &lt;code&gt;package-lock.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;experimental_install&lt;/code&gt; = &lt;code&gt;npm ci&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;GitHub = npm registry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tooling is still experimental — particularly the lock file management on remove — but it's already useful for ensuring consistent AI behavior across team environments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>cli</category>
      <category>agents</category>
    </item>
    <item>
      <title>Deploying a Google ADK Agent to Vertex AI Agent Engine with Terraform</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Mon, 30 Mar 2026 12:45:33 +0000</pubDate>
      <link>https://dev.to/toyama0919/deploying-a-google-adk-agent-to-vertex-ai-agent-engine-with-terraform-83b</link>
      <guid>https://dev.to/toyama0919/deploying-a-google-adk-agent-to-vertex-ai-agent-engine-with-terraform-83b</guid>
      <description>&lt;p&gt;Most documentation for Vertex AI Agent Engine focuses on the Python SDK (&lt;code&gt;vertexai.agent_engines.create&lt;/code&gt;). That works fine for one-off deployments, but if you want your agent infrastructure managed declaratively alongside the rest of your GCP resources, Terraform is the right tool.&lt;/p&gt;

&lt;p&gt;This post walks through a complete Terraform setup for deploying a Google ADK agent to Vertex AI Agent Engine using &lt;code&gt;google_vertex_ai_reasoning_engine&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Terraform &amp;gt;= 1.5&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;google&lt;/code&gt; or &lt;code&gt;google-beta&lt;/code&gt; provider&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;aiplatform.googleapis.com&lt;/code&gt; enabled&lt;/li&gt;
&lt;li&gt;A Google ADK agent wrapped in &lt;code&gt;AdkApp&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Agent Engine Deployment Works
&lt;/h2&gt;

&lt;p&gt;The deployment model is straightforward: &lt;strong&gt;tar.gz your source code, base64-encode it, and pass it to the API via &lt;code&gt;inline_source&lt;/code&gt;&lt;/strong&gt;. The runtime handles dependency installation, session management, and streaming — you just provide the entrypoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Entrypoint
&lt;/h2&gt;

&lt;p&gt;The key requirement is an &lt;code&gt;AdkApp&lt;/code&gt; instance at the module level. This is what Terraform's &lt;code&gt;entrypoint_object&lt;/code&gt; points to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# src/myagent/agent.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vertexai.agent_engines&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AdkApp&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Returns weather for a given city.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;city&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weather_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answers weather questions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s weather question using the get_weather tool.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# This is what entrypoint_object references
&lt;/span&gt;&lt;span class="n"&gt;agent_engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AdkApp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wrapping in &lt;code&gt;AdkApp&lt;/code&gt; automatically exposes &lt;code&gt;create_session&lt;/code&gt;, &lt;code&gt;stream_query&lt;/code&gt;, and other ADK methods as callable endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Source Archive
&lt;/h2&gt;

&lt;p&gt;Agent Engine expects a base64-encoded tar.gz containing your source files and a &lt;code&gt;requirements.txt&lt;/code&gt;. Here's a minimal build script that uses only the Python standard library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# scripts/build_source.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tarfile&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;project_root&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;project_root&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;src_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project_root&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;requirements&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project_root&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requirements.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;buf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tarfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fileobj&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w:gz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tar&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;src_dir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rglob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
            &lt;span class="n"&gt;arcname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;relative_to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_root&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;tar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;requirements&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arcname&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;requirements.txt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;b64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;buf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getvalue&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dump&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base64&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;b64&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;requirements.txt&lt;/code&gt; lists PyPI package names — the Agent Engine runtime installs them at deploy time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;google-adk&amp;gt;=1.0.0&lt;/span&gt;
&lt;span class="err"&gt;google-cloud-aiplatform[agent_engines]&amp;gt;=1.93.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
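
&lt;p&gt;Before wiring the script into Terraform, it can help to confirm that the archive really contains the source files and &lt;code&gt;requirements.txt&lt;/code&gt; at the expected paths. The sketch below decodes a base64 tar.gz string and lists its members using only the standard library; &lt;code&gt;list_archive_members&lt;/code&gt; is a hypothetical helper, and the demo builds a tiny stand-in archive in memory rather than calling &lt;code&gt;build_source.py&lt;/code&gt;.&lt;/p&gt;

```python
import base64
import io
import tarfile


def list_archive_members(b64_archive: str) -> list[str]:
    """Decode a base64 tar.gz string and return its member names, sorted."""
    raw = base64.b64decode(b64_archive)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        return sorted(tar.getnames())


# Demo: build a tiny in-memory archive with the same layout as build_source.py.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, body in [
        ("src/myagent/agent.py", b"# agent\n"),
        ("requirements.txt", b"google-adk\n"),
    ]:
        info = tarfile.TarInfo(name=name)
        info.size = len(body)
        tar.addfile(info, io.BytesIO(body))

members = list_archive_members(base64.b64encode(buf.getvalue()).decode())
print(members)
```

&lt;p&gt;To check a real build, pass the &lt;code&gt;base64&lt;/code&gt; value from the script's JSON output to the same function.&lt;/p&gt;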



&lt;h2&gt;
  
  
  Terraform Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;provider.tf&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;google&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/google"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;google-beta&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"hashicorp/google-beta"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"~&amp;gt; 6.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"google-beta"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;project&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_id&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"asia-northeast1"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Wiring the Archive Build
&lt;/h3&gt;

&lt;p&gt;Use the &lt;code&gt;external&lt;/code&gt; data source to invoke the build script during &lt;code&gt;terraform plan&lt;/code&gt;/&lt;code&gt;apply&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"external"&lt;/span&gt; &lt;span class="s2"&gt;"agent_source"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;program&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"python3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"${path.module}/scripts/build_source.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

  &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;project_root&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"${path.module}/../.."&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Terraform re-runs the data source on every &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;apply&lt;/code&gt;, so each apply picks up the latest source automatically.&lt;/p&gt;
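
&lt;p&gt;One caveat worth checking in your own setup: Python's gzip writer stamps the current time into the stream header by default, so the base64 string from &lt;code&gt;build_source.py&lt;/code&gt; may differ between runs even when no file has changed, and Terraform would then report a change. A minimal sketch of a reproducible variant follows, assuming you want output that depends only on file names and contents. &lt;code&gt;build_deterministic_targz&lt;/code&gt; and &lt;code&gt;_normalize&lt;/code&gt; are hypothetical names, and the demo uses in-memory files instead of reading &lt;code&gt;src/&lt;/code&gt;.&lt;/p&gt;

```python
import gzip
import io
import tarfile


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip metadata that varies between machines and checkouts.

    The same function can be passed as filter= to tar.add() when adding
    files from disk.
    """
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def build_deterministic_targz(files: dict[str, bytes]) -> bytes:
    """Return tar.gz bytes that depend only on file names and contents."""
    buf = io.BytesIO()
    # mtime=0 pins the timestamp that gzip writes into its header.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(files):
                info = _normalize(tarfile.TarInfo(name=name))
                info.size = len(files[name])
                tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()


# Demo: two builds of the same inputs are byte-identical.
sources = {"src/myagent/agent.py": b"# agent\n", "requirements.txt": b"google-adk\n"}
first = build_deterministic_targz(sources)
second = build_deterministic_targz(sources)
print(first == second)
```

&lt;p&gt;The trade-off is that file timestamps and ownership are discarded, which should be harmless for a source archive that the runtime only unpacks and imports.&lt;/p&gt;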

&lt;h3&gt;
  
  
  The Agent Engine Resource
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_vertex_ai_reasoning_engine"&lt;/span&gt; &lt;span class="s2"&gt;"my_agent"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;provider&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google-beta&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-agent-${var.env}"&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ADK weather agent"&lt;/span&gt;
  &lt;span class="nx"&gt;region&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"asia-northeast1"&lt;/span&gt;

  &lt;span class="nx"&gt;spec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;agent_framework&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"google-adk"&lt;/span&gt;
    &lt;span class="nx"&gt;service_account&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service_account_email&lt;/span&gt;

    &lt;span class="nx"&gt;class_methods&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"create_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"get_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"list_sessions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"delete_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"stream_query"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"stream"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="nx"&gt;source_code_spec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;inline_source&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;source_archive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;external&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent_source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;base64&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;python_spec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;entrypoint_module&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"src.myagent.agent"&lt;/span&gt;
        &lt;span class="nx"&gt;entrypoint_object&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"agent_engine"&lt;/span&gt;
        &lt;span class="nx"&gt;requirements_file&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"requirements.txt"&lt;/span&gt;
        &lt;span class="nx"&gt;version&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"3.12"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;deployment_spec&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;name&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"LOG_LEVEL"&lt;/span&gt;
        &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"INFO"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;secret_env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"API_TOKEN"&lt;/span&gt;
        &lt;span class="nx"&gt;secret_ref&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;secret&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"my-api-token"&lt;/span&gt;
          &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"latest"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Block Reference
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Block&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agent_framework&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Must be &lt;code&gt;"google-adk"&lt;/code&gt; — tells the runtime which framework to use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;class_methods&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enumerates callable methods; &lt;code&gt;api_mode = "stream"&lt;/code&gt; enables SSE&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inline_source&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Embeds the base64 tar.gz directly — no GCS bucket needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;python_spec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Specifies the entrypoint and Python version&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;deployment_spec&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Injects env vars and Secret Manager secrets at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;class_methods&lt;/code&gt; in Detail
&lt;/h3&gt;

&lt;p&gt;Every ADK method you want to expose must be declared explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;class_methods&lt;/span&gt; &lt;span class="err"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="err"&gt;(&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"create_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"get_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"list_sessions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"delete_session"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"stream_query"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="nx"&gt;api_mode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"stream"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;api_mode = "stream"&lt;/code&gt; makes the method return a Server-Sent Events stream. Only &lt;code&gt;stream_query&lt;/code&gt; needs this — the rest are standard request/response.&lt;/p&gt;
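&lt;p&gt;To make the shape of that value concrete, here is a minimal Python sketch of the JSON that &lt;code&gt;jsonencode&lt;/code&gt; would produce for the list above. The helper name &lt;code&gt;build_class_methods&lt;/code&gt; is hypothetical and exists only for illustration; the method names and &lt;code&gt;api_mode&lt;/code&gt; values are copied from the HCL.&lt;/p&gt;

```python
import json

# Method names copied from the HCL above; only stream_query streams.
METHODS = [
    "create_session",
    "get_session",
    "list_sessions",
    "delete_session",
    "stream_query",
]
STREAMING = {"stream_query"}


def build_class_methods(methods, streaming):
    """Hypothetical helper: one entry per exposed method, in order."""
    return [
        {"name": name, "api_mode": "stream" if name in streaming else ""}
        for name in methods
    ]


if __name__ == "__main__":
    # Prints the JSON document that ends up in class_methods.
    print(json.dumps(build_class_methods(METHODS, STREAMING), indent=2))
```

&lt;p&gt;Generating the list this way is optional; it mainly helps when the set of exposed methods grows and you want a single place that marks which ones stream.&lt;/p&gt;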

&lt;h2&gt;
  
  
  Gotchas
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Provider choice matters.&lt;/strong&gt; &lt;code&gt;google_vertex_ai_reasoning_engine&lt;/code&gt; is available in both the &lt;code&gt;google&lt;/code&gt; and &lt;code&gt;google-beta&lt;/code&gt; providers. Make sure the resource's &lt;code&gt;provider&lt;/code&gt; meta-argument matches whichever one you configure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The &lt;code&gt;external&lt;/code&gt; data source re-runs on every plan.&lt;/strong&gt; This is by design: data sources are read on every plan, so the archive always reflects the latest source. If your build script is slow, consider caching its output or running the build only on &lt;code&gt;apply&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;entrypoint_module&lt;/code&gt; uses dot notation, not file paths.&lt;/strong&gt; &lt;code&gt;src.myagent.agent&lt;/code&gt; maps to &lt;code&gt;src/myagent/agent.py&lt;/code&gt; in the archive. Match this to your actual directory structure.&lt;/p&gt;
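&lt;p&gt;A quick way to sanity-check that mapping before packaging is to convert the dotted name yourself. This is a minimal sketch; &lt;code&gt;module_to_archive_path&lt;/code&gt; is a hypothetical helper, and it assumes the entrypoint is a plain module file rather than a package with an &lt;code&gt;__init__.py&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import PurePosixPath


def module_to_archive_path(entrypoint_module):
    """Convert a dotted module name to the .py path expected in the archive."""
    parts = entrypoint_module.split(".")
    return str(PurePosixPath(*parts).with_suffix(".py"))


if __name__ == "__main__":
    print(module_to_archive_path("src.myagent.agent"))
```

&lt;p&gt;Comparing the result against the file list of the tar.gz catches a mismatched directory layout before the deploy fails.&lt;/p&gt;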

&lt;p&gt;&lt;strong&gt;Secret Manager secrets must already exist.&lt;/strong&gt; Terraform reads existing secrets via &lt;code&gt;data&lt;/code&gt; sources — it doesn't create them. Provision secrets separately before running &lt;code&gt;apply&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploy and Verify
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform init
terraform plan
terraform apply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Export the resource name to call the agent from Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="s2"&gt;"agent_engine_resource_name"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google_vertex_ai_reasoning_engine&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;my_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vertexai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agent_engines&lt;/span&gt;

&lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;asia-northeast1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Look up the deployed engine by its full resource name&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_engines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/my-project/locations/asia-northeast1/reasoningEngines/&amp;lt;ID&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Tokyo?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;google_vertex_ai_reasoning_engine&lt;/code&gt; is available in both &lt;code&gt;google&lt;/code&gt; and &lt;code&gt;google-beta&lt;/code&gt; providers&lt;/li&gt;
&lt;li&gt;Source code is delivered as a base64 tar.gz via &lt;code&gt;inline_source&lt;/code&gt; — no GCS required&lt;/li&gt;
&lt;li&gt;A minimal Python script using only stdlib is enough to produce the archive&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;class_methods&lt;/code&gt; must explicitly enumerate every ADK method you want to expose&lt;/li&gt;
&lt;li&gt;Secret Manager integration is declarative via &lt;code&gt;secret_env&lt;/code&gt; blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Terraforming Agent Engine makes cleanup (&lt;code&gt;terraform destroy&lt;/code&gt;), environment promotion, and drift detection straightforward. Documentation for the ADK + Terraform combination is sparse, so hopefully this post fills part of that gap.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>gcp</category>
      <category>googlecloud</category>
      <category>ai</category>
    </item>
    <item>
      <title>AI-Driven Chrome Extension Development with WXT and Chrome DevTools MCP</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Sun, 29 Mar 2026 12:21:39 +0000</pubDate>
      <link>https://dev.to/toyama0919/ai-driven-chrome-extension-development-with-wxt-and-chrome-devtools-mcp-4109</link>
      <guid>https://dev.to/toyama0919/ai-driven-chrome-extension-development-with-wxt-and-chrome-devtools-mcp-4109</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Building a Chrome extension that modifies a third-party web app is a unique challenge. The DOM structure is opaque, class names are minified and change between deployments, and there's no official API to hook into. Traditional extension development looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inspect the DOM manually in DevTools&lt;/li&gt;
&lt;li&gt;Write selectors and content scripts&lt;/li&gt;
&lt;li&gt;Reload the extension&lt;/li&gt;
&lt;li&gt;Check if it works&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This cycle is slow. I wanted an AI coding agent that could &lt;strong&gt;see the actual browser state&lt;/strong&gt; and &lt;strong&gt;verify its own changes&lt;/strong&gt; — not just generate code blindly.&lt;/p&gt;

&lt;p&gt;That's how I arrived at this stack: &lt;strong&gt;WXT&lt;/strong&gt; for the extension framework, &lt;strong&gt;Chrome DevTools MCP&lt;/strong&gt; for giving the AI agent browser access, and &lt;strong&gt;Cursor&lt;/strong&gt; as the IDE tying it all together.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://wxt.dev" rel="noopener noreferrer"&gt;WXT&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Chrome extension framework (TypeScript, hot reload, Manifest V3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/nichochar/chrome-devtools-mcp" rel="noopener noreferrer"&gt;Chrome DevTools MCP&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;MCP server that exposes Chrome DevTools Protocol to AI agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://cursor.sh" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;AI-powered IDE with native MCP support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Step 1: WXT with a Fixed CDP Port
&lt;/h2&gt;

&lt;p&gt;WXT is a framework that wraps Chrome extension development with file-based routing, hot reload, and TypeScript support out of the box. The key insight is that WXT's runner can launch Chrome with &lt;strong&gt;custom Chromium args&lt;/strong&gt; — including &lt;code&gt;--remote-debugging-port&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// wxt.config.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wxt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;defineConfig&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;My Extension&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;storage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;extensionApi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chrome&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;chromiumArgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--remote-debugging-port=9222&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;`--user-data-dir=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;/.chrome-debug-profile`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--exclude-switches=enable-automation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;startUrls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://example.com&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things to note:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--remote-debugging-port=9222&lt;/code&gt;&lt;/strong&gt; — Exposes the Chrome DevTools Protocol on a fixed port. The MCP server connects here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--user-data-dir&lt;/code&gt;&lt;/strong&gt; — A dedicated profile directory, separate from your daily Chrome. Login sessions persist across dev restarts. &lt;strong&gt;Add this to &lt;code&gt;.gitignore&lt;/code&gt;&lt;/strong&gt; — it contains cookies and session tokens that must not be pushed to a repository.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--exclude-switches=enable-automation&lt;/code&gt;&lt;/strong&gt; — Without this, some sites detect the "automated" browser and block sign-in.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;When you run &lt;code&gt;wxt&lt;/code&gt; (or &lt;code&gt;npm run dev&lt;/code&gt;), WXT launches Chrome with these args, loads your extension, and watches for file changes — all in one command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Chrome DevTools MCP Configuration
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) lets AI agents call external tools. Chrome DevTools MCP is an MCP server that wraps the Chrome DevTools Protocol — giving your AI agent the ability to navigate pages, evaluate JavaScript, take screenshots, and inspect the DOM.&lt;/p&gt;

&lt;p&gt;Configuration lives in &lt;code&gt;.cursor/mcp.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"chrome-devtools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"chrome-devtools-mcp@latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--browserUrl=http://127.0.0.1:9222"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. When Cursor starts, it spins up the MCP server, which connects to Chrome on port 9222. The AI agent can now see and interact with your browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: The Dev Script (Optional but Recommended)
&lt;/h2&gt;

&lt;p&gt;A small shell script wrapping the two main workflows keeps things ergonomic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./dev.sh dev      &lt;span class="c"&gt;# WXT dev server + Chrome (hot reload + MCP)&lt;/span&gt;
./dev.sh start    &lt;span class="c"&gt;# Load built extension (no hot reload, MCP only)&lt;/span&gt;
./dev.sh stop     &lt;span class="c"&gt;# Kill debug Chrome&lt;/span&gt;
./dev.sh status   &lt;span class="c"&gt;# Check CDP connection&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;dev&lt;/code&gt; is the primary mode — WXT handles everything and you get hot reload. &lt;code&gt;start&lt;/code&gt; is for testing production builds with MCP inspection. The script handles edge cases like port conflicts, PID management, and connection verification via &lt;code&gt;curl http://localhost:9222/json/version&lt;/code&gt;.&lt;/p&gt;
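&lt;p&gt;The connection check can also be done without &lt;code&gt;curl&lt;/code&gt;. The sketch below is not the contents of &lt;code&gt;dev.sh&lt;/code&gt;; it is a stdlib-only Python equivalent of the &lt;code&gt;status&lt;/code&gt; step, and the function names are hypothetical. It assumes Chrome is listening on port 9222 as configured earlier.&lt;/p&gt;

```python
import json
import urllib.error
import urllib.request


def cdp_version_url(host="127.0.0.1", port=9222):
    """Build the URL of Chrome's /json/version endpoint."""
    return f"http://{host}:{port}/json/version"


def parse_version(body):
    """Decode the JSON body returned by /json/version into a dict."""
    return json.loads(body.decode("utf-8"))


def check_cdp(host="127.0.0.1", port=9222, timeout=2.0):
    """Return the version payload, or None if no debug Chrome answers."""
    try:
        url = cdp_version_url(host, port)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_version(resp.read())
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply all mean "not ready".
        return None


if __name__ == "__main__":
    info = check_cdp()
    print("CDP not reachable" if info is None else f"Connected: {info.get('Browser')}")
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of raising keeps the check easy to call in a loop while waiting for Chrome to finish starting.&lt;/p&gt;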

&lt;h2&gt;
  
  
  The Workflow in Practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Start the environment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev   &lt;span class="c"&gt;# or ./dev.sh dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chrome opens automatically with the extension loaded. WXT watches for file changes. The MCP server connects to port 9222. Cursor's AI agent can now see the browser.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. AI inspects current state
&lt;/h3&gt;

&lt;p&gt;The AI agent evaluates JavaScript in the browser context to understand the current DOM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The agent runs this via MCP&lt;/span&gt;
&lt;span class="nf"&gt;evaluate_script&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;function&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`() =&amp;gt; ({
    injectedStyle: !!document.getElementById('my-extension-styles'),
    buttonCount: document.querySelectorAll('.my-custom-button').length,
    panelVisible: !!document.getElementById('my-panel'),
  })`&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent can &lt;strong&gt;verify its own changes without you switching context&lt;/strong&gt;. It writes code, WXT hot-reloads, and the agent checks if the DOM updated correctly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. AI verifies changes through the browser
&lt;/h3&gt;

&lt;p&gt;Here's the key difference from normal AI-assisted coding. Instead of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I've added the panel. Please refresh and check if it works."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI does this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I've added the panel. Let me verify... [evaluates script via MCP] ... The &lt;code&gt;#my-panel&lt;/code&gt; element exists, has 5 child entries, and is positioned correctly. Rendering looks good."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Text-based DOM verification is preferred over screenshots — it's faster, cheaper, and more precise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Good: structured verification&lt;/span&gt;
&lt;span class="nf"&gt;evaluate_script&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buttons&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.my-button&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;buttons&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;firstButton&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;buttons&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;outerHTML&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Screenshots only when you need visual layout confirmation&lt;/span&gt;
&lt;span class="nf"&gt;take_screenshot&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tips and Gotchas
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Content Script Isolation
&lt;/h3&gt;

&lt;p&gt;Chrome extension content scripts run in an &lt;strong&gt;isolated world&lt;/strong&gt;. Variables set on &lt;code&gt;window&lt;/code&gt; in the content script are invisible to &lt;code&gt;evaluate_script&lt;/code&gt; via MCP, because MCP evaluates in the &lt;strong&gt;page context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The workaround: verify through DOM side effects, not global variables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Won't work: window globals are in the isolated world&lt;/span&gt;
&lt;span class="nf"&gt;evaluate_script&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;myExtensionState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// → undefined&lt;/span&gt;

&lt;span class="c1"&gt;// Works: check the DOM changes the extension made&lt;/span&gt;
&lt;span class="nf"&gt;evaluate_script&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;styleInjected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-extension-styles&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;panelExists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-panel&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Selector Strategy for Third-Party UIs
&lt;/h3&gt;

&lt;p&gt;When building extensions for sites you don't control, selectors break frequently. A fallback chain helps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[aria-label*="Submit"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[data-test-id="submit"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
  &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.submit-btn&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Priority:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ARIA attributes&lt;/strong&gt; (&lt;code&gt;aria-label&lt;/code&gt;, &lt;code&gt;role&lt;/code&gt;) — most stable across updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic attributes&lt;/strong&gt; (&lt;code&gt;data-test-id&lt;/code&gt;) — moderately stable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Class names&lt;/strong&gt; — last resort, always provide as fallback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can even build a DOM analyzer shortcut (&lt;code&gt;Ctrl+Shift+D&lt;/code&gt;) that exports the page structure in a format the AI agent can consume. When selectors break, press the shortcut, paste the output into Cursor, and the agent updates the fallback selectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Async DOM Waiting
&lt;/h3&gt;

&lt;p&gt;SPA elements appear asynchronously. Rather than fragile &lt;code&gt;setTimeout&lt;/code&gt; chains, use polling with bounded retries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;retries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;interval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setInterval&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;selector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;retries&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;clearInterval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the element never appears, fail silently — no console spam.&lt;/p&gt;

&lt;h3&gt;
  
  
  Google Login Gotcha
&lt;/h3&gt;

&lt;p&gt;When Chrome launches with &lt;code&gt;--remote-debugging-port&lt;/code&gt;, Google sometimes detects it as an "unsafe browser" and blocks sign-in. The &lt;code&gt;--exclude-switches=enable-automation&lt;/code&gt; flag helps, but if it's not enough:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Launch Chrome with the dedicated profile (without WXT)&lt;/li&gt;
&lt;li&gt;Sign in manually&lt;/li&gt;
&lt;li&gt;Close Chrome&lt;/li&gt;
&lt;li&gt;Now run &lt;code&gt;npm run dev&lt;/code&gt; — WXT reuses the same profile with the valid session&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The dedicated &lt;code&gt;--user-data-dir&lt;/code&gt; persists your login across dev sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;--user-data-dir&lt;/code&gt; and Security
&lt;/h3&gt;

&lt;p&gt;The dedicated profile serves two purposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation from your daily Chrome&lt;/strong&gt;: The dev browser doesn't touch your bookmarks, extensions, or sessions — and your personal credentials don't leak into the dev environment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal credentials&lt;/strong&gt;: Only log into what you need for development. Don't sign into personal Gmail or other unrelated accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Always add the profile directory to &lt;code&gt;.gitignore&lt;/code&gt;&lt;/strong&gt;. It contains cookies, session tokens, and LocalStorage.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.chrome-debug-profile/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CDP port 9222 is accessible from localhost&lt;/strong&gt;. &lt;code&gt;--remote-debugging-port&lt;/code&gt; binds to &lt;code&gt;127.0.0.1&lt;/code&gt; by default, but any process on your machine can access all open tabs. Only run it during active development.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't use this on shared machines&lt;/strong&gt;. While CDP is open, anyone on the same machine can control the browser session.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;If you want to try this workflow:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create a WXT project
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm create wxt@latest my-extension
&lt;span class="nb"&gt;cd &lt;/span&gt;my-extension
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Add CDP port to &lt;code&gt;wxt.config.ts&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;chromiumArgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--remote-debugging-port=9222&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;`--user-data-dir=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;/.chrome-debug-profile`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--exclude-switches=enable-automation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Create &lt;code&gt;.cursor/mcp.json&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"chrome-devtools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chrome-devtools-mcp@latest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--browserUrl=http://127.0.0.1:9222"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Run
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chrome launches with CDP enabled, and Cursor's agent connects to it. Your AI agent can now see your browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Workflow Matters
&lt;/h2&gt;

&lt;p&gt;The traditional Chrome extension development loop is &lt;strong&gt;write → reload → manually check → repeat&lt;/strong&gt;. With WXT + Chrome DevTools MCP, it becomes &lt;strong&gt;write → auto-reload → AI verifies → iterate&lt;/strong&gt; — and the AI agent can do the first and last steps too.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Debugging&lt;/strong&gt; goes from "read console logs, set breakpoints, manually reproduce" to "AI evaluates scripts in the live browser and reports what's happening."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Selector maintenance&lt;/strong&gt; goes from "open DevTools, inspect element, copy selector, paste into code" to "AI reads the DOM and updates fallback selectors."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature development&lt;/strong&gt; goes from "code blind, test manually" to "AI writes code, checks DOM state, fixes issues — all in one turn."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This doesn't replace understanding your own extension. But it dramatically shortens the feedback loop, especially for the tedious parts of third-party DOM manipulation.&lt;/p&gt;

</description>
      <category>chromeextension</category>
      <category>ai</category>
      <category>webdev</category>
      <category>cursor</category>
    </item>
    <item>
      <title>BigQuery Global Queries: Join Data Across Regions Without ETL</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Sun, 22 Mar 2026 12:38:06 +0000</pubDate>
      <link>https://dev.to/toyama0919/bigquery-global-queries-join-data-across-regions-without-etl-1ho1</link>
      <guid>https://dev.to/toyama0919/bigquery-global-queries-join-data-across-regions-without-etl-1ho1</guid>
      <description>&lt;p&gt;In February 2026, Google released &lt;strong&gt;BigQuery Global Queries&lt;/strong&gt; in Preview. It lets you join tables from completely different geographic regions, say &lt;code&gt;asia-northeast1&lt;/code&gt; (Tokyo) and &lt;code&gt;us-central1&lt;/code&gt; (Iowa), in a &lt;strong&gt;single SQL statement&lt;/strong&gt;. No ETL, no data movement pipelines, no manual copying.&lt;/p&gt;

&lt;p&gt;This post covers how it actually works under the hood, what it costs, and the gotchas you need to know before using it in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Old Problem
&lt;/h2&gt;

&lt;p&gt;BigQuery historically required all datasets referenced in a single query to live in the same location. If your sales data was in Tokyo and your user master was in the US, you had two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy one dataset to the other region (ETL pipeline, operational overhead).&lt;/li&gt;
&lt;li&gt;Run two separate queries and join the results in application code.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Global Queries eliminates this constraint.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works: 4-Stage Execution
&lt;/h2&gt;

&lt;p&gt;When you run a global query, BigQuery orchestrates the execution across regions transparently:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Distributed Execution
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Query Optimizer&lt;/strong&gt; analyzes the query, identifies which tables live in which regions, and assigns the querying region as the &lt;strong&gt;Primary Region&lt;/strong&gt; (the "leader"). Workers in each remote region receive their execution assignments in parallel.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Data Pushdown
&lt;/h3&gt;

&lt;p&gt;This is the most critical stage — and the one that makes global queries economically viable.&lt;/p&gt;

&lt;p&gt;Before any data crosses the network, BigQuery applies three types of pushdown to minimize transfer size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predicate Pushdown&lt;/strong&gt;: &lt;code&gt;WHERE&lt;/code&gt; clause filters run &lt;em&gt;in the remote region&lt;/em&gt;, before the data moves. A 100M-row table filtered to 100 rows transfers 100 rows — not 100M.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Projection Pushdown&lt;/strong&gt;: Only the columns named in &lt;code&gt;SELECT&lt;/code&gt; are read from remote storage. BigQuery's columnar storage (Capacitor) makes this efficient.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggregation Pushdown&lt;/strong&gt;: &lt;code&gt;GROUP BY&lt;/code&gt;/&lt;code&gt;SUM&lt;/code&gt;/&lt;code&gt;COUNT&lt;/code&gt; operations run as partial aggregations in the remote region. A billion-row transaction table can be summarized to 365 rows (daily totals) before transfer.&lt;/li&gt;
&lt;/ul&gt;
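
&lt;p&gt;To make aggregation pushdown concrete, here is a minimal Python sketch of the idea, not BigQuery's actual implementation. The function names and the sample rows are made up for illustration: the remote side collapses raw rows into per-day subtotals, and only those subtotals are merged in the primary region.&lt;/p&gt;

```python
from collections import defaultdict


def partial_aggregate(rows):
    """Runs in each region: collapse raw (day, amount) rows into one subtotal per day."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day] += amount
    return dict(totals)


def merge_partials(*partials):
    """Runs in the primary region: combine the per-region subtotals."""
    merged = defaultdict(float)
    for partial in partials:
        for day, subtotal in partial.items():
            merged[day] += subtotal
    return dict(merged)


# Hypothetical raw transactions as (day, amount) pairs.
remote_rows = [("2026-03-01", 10.0), ("2026-03-01", 5.0), ("2026-03-02", 7.5)]
local_rows = [("2026-03-01", 2.5), ("2026-03-02", 1.0)]

# Only the two subtotal rows cross the network, not the three raw rows.
remote_partial = partial_aggregate(remote_rows)
daily_totals = merge_partials(partial_aggregate(local_rows), remote_partial)
print(daily_totals)
```

&lt;p&gt;This only works for aggregates that can be merged from partial results, such as &lt;code&gt;SUM&lt;/code&gt; and &lt;code&gt;COUNT&lt;/code&gt;, which is why those are the operations named above.&lt;/p&gt;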

&lt;h3&gt;
  
  
  3. Data Transfer
&lt;/h3&gt;

&lt;p&gt;Filtered, minimized results travel over Google's internal network to the Primary Region, where they're stored in &lt;strong&gt;temporary internal tables&lt;/strong&gt; for up to 8 hours. This is where cross-region &lt;strong&gt;egress charges&lt;/strong&gt; are incurred.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Final Join
&lt;/h3&gt;

&lt;p&gt;The Primary Region merges local data with the temporary remote data, as if everything were in one place. The query result returned to the user looks like any normal BigQuery result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Executed from asia-northeast1 (Tokyo)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_global_sales&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project.japan_dataset.sales`&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;   &lt;span class="c1"&gt;-- local&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="nv"&gt;`project.us_dataset.sales`&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;      &lt;span class="c1"&gt;-- remote (auto-transferred)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;               &lt;span class="c1"&gt;-- filters t1 only; add a t2 predicate to prune the remote side&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  IAM Permissions
&lt;/h2&gt;

&lt;p&gt;Global Queries require two layers of setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project-level opt-in (admin task)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Enable execution from the primary region&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="n"&gt;PROJECT&lt;/span&gt; &lt;span class="nv"&gt;`your-project-id`&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;OPTIONS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;`region-asia-northeast1.enable_global_queries_execution`&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Enable data access from the remote region&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="n"&gt;PROJECT&lt;/span&gt; &lt;span class="nv"&gt;`your-project-id`&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="k"&gt;OPTIONS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nv"&gt;`region-us-central1.enable_global_queries_data_access`&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  User-level permissions
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Permission or role&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bigquery.jobs.createGlobalQuery&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Required to initiate a global query. Currently only included in &lt;code&gt;roles/bigquery.admin&lt;/code&gt; — create a custom role for regular users.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;roles/bigquery.dataViewer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Required on every dataset being referenced, in every region.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Cost Structure
&lt;/h2&gt;

&lt;p&gt;Global queries have &lt;strong&gt;three billing components&lt;/strong&gt; instead of the usual one:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;th&gt;Approximate Price (2026)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bytes scanned across all regions&lt;/td&gt;
&lt;td&gt;$6.25 / 1 TB (on-demand)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Egress&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data transferred from remote to primary region&lt;/td&gt;
&lt;td&gt;~$0.08–$0.12 / 1 GB (intercontinental)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Temporary Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Intermediate data stored for up to 8 hours&lt;/td&gt;
&lt;td&gt;~$0.02/GB-month (prorated)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Cost simulation
&lt;/h3&gt;

&lt;p&gt;Scenario: Query from Tokyo, scanning a 1 TB table in &lt;code&gt;us-central1&lt;/code&gt;, with a &lt;code&gt;WHERE&lt;/code&gt; clause that reduces the data transferred to 1 GB.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compute: 1 TB × $6.25 = &lt;strong&gt;$6.25&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Egress: 1 GB × $0.12 = &lt;strong&gt;$0.12&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$6.37&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip the &lt;code&gt;WHERE&lt;/code&gt; clause and transfer the full 1 TB, egress alone comes to roughly &lt;strong&gt;$120&lt;/strong&gt; at $0.12/GB. Pushdown is not optional; it's the entire cost model.&lt;/p&gt;
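
&lt;p&gt;The arithmetic above can be wrapped in a small helper for back-of-the-envelope estimates. This is a rough sketch: the prices are the approximate figures from the table, the 730-hour month used for prorating temporary storage is an assumption, and 1 TB is taken as 1,000 GB. Check current pricing before relying on the numbers.&lt;/p&gt;

```python
# Approximate prices from the table above; assumptions, not an official price list.
COMPUTE_USD_PER_TB = 6.25
EGRESS_USD_PER_GB = 0.12              # upper end of the intercontinental range
TEMP_STORAGE_USD_PER_GB_MONTH = 0.02
HOURS_PER_MONTH = 730                 # assumed average month length for proration


def estimate_global_query_cost(scanned_tb, transferred_gb, temp_hours=8):
    """Return (compute, egress, temp_storage, total) in USD for one execution."""
    compute = scanned_tb * COMPUTE_USD_PER_TB
    egress = transferred_gb * EGRESS_USD_PER_GB
    temp_storage = (
        transferred_gb * TEMP_STORAGE_USD_PER_GB_MONTH * temp_hours / HOURS_PER_MONTH
    )
    return compute, egress, temp_storage, compute + egress + temp_storage


# Same 1 TB scan, with and without an effective filter on the remote side.
filtered = estimate_global_query_cost(scanned_tb=1, transferred_gb=1)
unfiltered = estimate_global_query_cost(scanned_tb=1, transferred_gb=1000)
print(round(filtered[3], 2), round(unfiltered[3], 2))
```

&lt;p&gt;Temporary storage barely moves the total in either case; egress is the term that changes by orders of magnitude when pushdown fails.&lt;/p&gt;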

&lt;h3&gt;
  
  
  Dry run before executing
&lt;/h3&gt;

&lt;p&gt;Use the BigQuery Console (it shows estimated bytes scanned before you click Run) or the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bq query &lt;span class="nt"&gt;--dry_run&lt;/span&gt; &lt;span class="nt"&gt;--use_legacy_sql&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="s1"&gt;'SELECT ...'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: In the current Preview, dry runs report compute bytes only and may not estimate egress accurately. Budget conservatively.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Key Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Latency
&lt;/h3&gt;

&lt;p&gt;Cross-region queries are &lt;strong&gt;always slower&lt;/strong&gt; than single-region queries. Physical distance adds hundreds of milliseconds of network latency, plus multi-region orchestration overhead. Expect a minimum of &lt;strong&gt;5–10 seconds&lt;/strong&gt; even for modest cross-region joins. Real-time dashboards are not a good fit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Residency
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Primary Region is where remote data lands temporarily&lt;/strong&gt;. If GDPR or local privacy laws prohibit data from Region A leaving Region A, you must run the query &lt;em&gt;from&lt;/em&gt; Region A as the primary — not from a region outside it. VPC Service Controls perimeters are also respected.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current Limitations (Preview, March 2026)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  No Query Cache
&lt;/h3&gt;

&lt;p&gt;Global queries never use the query cache. Since data can change in any remote region at any time, BigQuery always reads fresh data. Every execution incurs full compute and egress costs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Workaround&lt;/strong&gt;: For frequently-used cross-region joins, materialize results into a local table using &lt;code&gt;CREATE TABLE AS SELECT&lt;/code&gt; and query that instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  No INFORMATION_SCHEMA from Remote Regions
&lt;/h3&gt;

&lt;p&gt;You cannot query &lt;code&gt;INFORMATION_SCHEMA&lt;/code&gt; views from a remote region within a global query. Joining metadata across regions requires first exporting that metadata into regular tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unsupported Table Types
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BigLake Apache Iceberg tables&lt;/strong&gt; in remote regions are not supported as remote sources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition pseudo-columns&lt;/strong&gt; (&lt;code&gt;_PARTITIONTIME&lt;/code&gt;, &lt;code&gt;_PARTITIONDATE&lt;/code&gt;) may not be pushed down correctly (more on this below).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  No Sandbox Support
&lt;/h3&gt;

&lt;p&gt;A billing account is required. The Sandbox (free tier) does not support Global Queries because egress charges can exceed the free quota.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Partition Pseudo-Column Trap
&lt;/h2&gt;

&lt;p&gt;This is the most dangerous limitation in production, and it deserves its own section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background: Pseudo-columns vs. Physical columns
&lt;/h3&gt;

&lt;p&gt;BigQuery offers two partitioning strategies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Partition Key&lt;/th&gt;
&lt;th&gt;Access&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ingestion-time partitioned&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Arrival timestamp, managed by BigQuery&lt;/td&gt;
&lt;td&gt;Via &lt;code&gt;_PARTITIONTIME&lt;/code&gt; / &lt;code&gt;_PARTITIONDATE&lt;/code&gt; (pseudo-columns)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Column-based partitioned&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;An actual column in your table schema (e.g., &lt;code&gt;event_date&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Via the column name directly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Pseudo-columns are not part of the formal table schema. They're metadata-level constructs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why pushdown fails for pseudo-columns
&lt;/h3&gt;

&lt;p&gt;When the Query Optimizer sends execution instructions to a remote region, it works from the table's schema definition. Pseudo-columns aren't in that definition, so the optimizer can't reliably communicate partition pruning constraints to the remote worker.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worst case&lt;/strong&gt;: A filter like &lt;code&gt;WHERE _PARTITIONDATE = '2026-03-01'&lt;/code&gt; is silently ignored in the remote region. The remote worker scans the entire table across all partitions and begins transferring everything to the primary region. Your query either times out or generates a very large bill.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix: Migrate to column-based partitioning
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create a new table with an explicit physical partition column&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="nv"&gt;`project.dataset.new_table`&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;event_date&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_PARTITIONDATE&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;event_date&lt;/span&gt;  &lt;span class="c1"&gt;-- materialize the pseudo-column&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project.dataset.old_table`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a physical column, the optimizer sees it in the schema, understands the partition structure, and confidently applies pushdown in the remote region.&lt;/p&gt;

&lt;h3&gt;
  
  
  Workaround: Aliasing via Views (use with caution)
&lt;/h3&gt;

&lt;p&gt;If migrating the table isn't possible, you can create a view in the remote region that aliases the pseudo-column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- View in us-central1&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;VIEW&lt;/span&gt; &lt;span class="nv"&gt;`project.us_dataset.v_sales`&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;_PARTITIONDATE&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;partition_date_col&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project.us_dataset.ingestion_time_partitioned_table`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then query the view from the primary region:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="nv"&gt;`project.us_dataset.v_sales`&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;partition_date_col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2026-03-01'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;em&gt;sometimes&lt;/em&gt; works for simple queries, but pushdown is not guaranteed. In complex queries with JOINs or aggregations, the optimizer often loses the connection between the aliased column and the underlying partition structure, falls back to full-scan, and transfers everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Always verify&lt;/strong&gt; that pushdown is working by checking the Query Execution Plan and confirming the remote &lt;code&gt;READ&lt;/code&gt; stage shows filtered row counts — not the full table row count.&lt;/p&gt;




&lt;h2&gt;
  
  
  Operational Best Practices
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No query cache&lt;/td&gt;
&lt;td&gt;Materialize frequent cross-region joins into local intermediate tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Need metadata across regions&lt;/td&gt;
&lt;td&gt;Export metadata to regular tables on a schedule&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ingestion-time partitioned tables&lt;/td&gt;
&lt;td&gt;Migrate to column-based partitioning before using as remote sources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unclear cost pre-execution&lt;/td&gt;
&lt;td&gt;Use dry run + estimate egress separately; add a buffer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;BigQuery Global Queries is a genuinely useful feature that eliminates an entire category of ETL pipelines. The execution model is well-designed — pushdown at the predicate, projection, and aggregation levels means you're typically only transferring the data you actually need.&lt;/p&gt;

&lt;p&gt;The key things to internalize:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pushdown is the cost model.&lt;/strong&gt; Filter early, select only the columns you need, push aggregations to the remote side.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion-time partitioned tables are a liability&lt;/strong&gt; in global queries. Migrate to column-based partitioning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's Preview&lt;/strong&gt; — no query cache, no &lt;code&gt;INFORMATION_SCHEMA&lt;/code&gt; cross-region, no BigLake Iceberg remotes. Design your architecture around these constraints.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Check the &lt;a href="https://cloud.google.com/bigquery/docs/global-queries" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt; for the latest changes as this feature moves toward GA.&lt;/p&gt;

</description>
      <category>bigquery</category>
      <category>gcp</category>
      <category>dataengineering</category>
      <category>sql</category>
    </item>
    <item>
      <title>The Cloud is No Longer Virtual: The Harsh Physical Reality of AI Infra in 2026</title>
      <dc:creator>Hiroshi Toyama</dc:creator>
      <pubDate>Mon, 23 Feb 2026 08:42:02 +0000</pubDate>
      <link>https://dev.to/toyama0919/the-cloud-is-no-longer-virtual-the-harsh-physical-reality-of-ai-infra-in-2026-27ba</link>
      <guid>https://dev.to/toyama0919/the-cloud-is-no-longer-virtual-the-harsh-physical-reality-of-ai-infra-in-2026-27ba</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The "Virtual" in Cloud is fading. In 2026, AI infrastructure is dominated by three physical constraints: &lt;strong&gt;power grid capacity&lt;/strong&gt;, &lt;strong&gt;tax legislations&lt;/strong&gt;, and &lt;strong&gt;liquid cooling&lt;/strong&gt;. If you are still picking regions based solely on latency, you are overpaying by at least 20%.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Death of the "Sales Tax Holiday"
&lt;/h2&gt;

&lt;p&gt;For a decade, states like Virginia attracted data centers with massive sales tax exemptions. That era ended in February 2026 with &lt;strong&gt;Virginia HB 897&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this matters for your bill:
&lt;/h3&gt;

&lt;p&gt;In the US, "Sales Tax" works differently from Japan's consumption tax or Europe's VAT. It is a &lt;strong&gt;sunk cost&lt;/strong&gt; with no input tax credit for businesses. When a state removes a 6-10% tax exemption on hardware:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An NVIDIA B200 cluster worth $100M suddenly costs &lt;strong&gt;$106M to $110M&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;This extra Capex is directly passed to you as higher hourly instance rates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Move:&lt;/strong&gt; We are seeing a "Great Migration" to the &lt;strong&gt;Midwest AI Belt&lt;/strong&gt; (Indiana, Ohio, Iowa), where 20-30 year tax holidays are still guaranteed. &lt;/p&gt;




&lt;h2&gt;
  
  
  2. Why "Power" is the New "Latency"
&lt;/h2&gt;

&lt;p&gt;We used to care about milliseconds. Now, we care about Megawatts. &lt;/p&gt;

&lt;h3&gt;
  
  
  The Virginia Gridlock
&lt;/h3&gt;

&lt;p&gt;In Northern Virginia (&lt;code&gt;us-east-1&lt;/code&gt;), data centers now consume over &lt;strong&gt;25% of the state's total electricity&lt;/strong&gt;. The grid is saturated. To build new AI capacity, AWS and Google are now forced to become &lt;strong&gt;Energy Producers&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Nuclear is the New "Default Gateway"
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SMRs (Small Modular Reactors):&lt;/strong&gt; AWS is deploying SMRs as "Microservices for Energy"—factory-built reactors that can be dropped next to a data center.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct-to-Plant:&lt;/strong&gt; Microsoft is backing the restart of decommissioned plants (like Three Mile Island) just to keep its Azure GPUs humming.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. The "Jevons Paradox" of NVIDIA GPUs
&lt;/h2&gt;

&lt;p&gt;People often ask: &lt;em&gt;"Why doesn't NVIDIA make low-power GPUs?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The answer is &lt;strong&gt;Tokens per Watt&lt;/strong&gt;. NVIDIA's Blackwell (B200) consumes a massive &lt;strong&gt;1,200W&lt;/strong&gt;, but it is &lt;strong&gt;25x more efficient&lt;/strong&gt; at generating tokens than the previous generation.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Thermal Wall
&lt;/h3&gt;

&lt;p&gt;Because one rack now pulls &lt;strong&gt;120kW+&lt;/strong&gt;, traditional air cooling is dead. 2026 is the year of &lt;strong&gt;Liquid Cooling&lt;/strong&gt;. If your DC doesn't have pipes, it can't run the latest AI models. This creates a "Performance Gap" between old regions and new AI-native regions.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Tokyo Context: Why so expensive?
&lt;/h2&gt;

&lt;p&gt;Many Japanese developers wonder why &lt;code&gt;ap-northeast-1&lt;/code&gt; costs more than &lt;code&gt;us-east-1&lt;/code&gt; despite Japan's "cheaper" cost of living.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Imported Energy:&lt;/strong&gt; Japan's industrial electricity is &lt;strong&gt;2-3x more expensive&lt;/strong&gt; than in the US.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Dollar-Denominated Silicon:&lt;/strong&gt; Everything from the GPU to the fuel for the power plant is priced in USD. The weak Yen makes these "imported" cloud resources luxury items.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Humidity:&lt;/strong&gt; Tokyo’s humid summers make PUE (Power Usage Effectiveness) worse than the dry, flat plains of Ohio.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  5. FinOps 2026: Actions for Engineers
&lt;/h2&gt;

&lt;p&gt;"Turning off idle instances" is FinOps 101. To be a Senior Infrastructure Engineer in 2026, you need &lt;strong&gt;Regional Arbitrage&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Move Training Out of Virginia:&lt;/strong&gt; Shift non-latency-sensitive training jobs from &lt;code&gt;us-east-1&lt;/code&gt; to &lt;code&gt;us-west-2&lt;/code&gt; (Oregon) or the new Indiana regions to save &lt;strong&gt;10-15%&lt;/strong&gt; on tax and power alone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Token-Specific Hardware:&lt;/strong&gt; Evaluate &lt;strong&gt;TPU v7&lt;/strong&gt; (Google Cloud) or &lt;strong&gt;Trainium 2&lt;/strong&gt; (AWS). In 2026, specialized ASICs are often &lt;strong&gt;3x more cost-effective&lt;/strong&gt; than general-purpose GPUs for specific LLM workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code (IaC) for Regions:&lt;/strong&gt; Don't hardcode regions. Use variables that allow you to follow the "Tax-Free Energy" across the globe.&lt;/li&gt;
&lt;/ul&gt;
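
&lt;p&gt;As a minimal illustration of the "don't hardcode regions" point, the sketch below resolves a region from an environment variable with a fallback. The variable naming scheme (&lt;code&gt;TRAINING_REGION&lt;/code&gt;) and the default mapping are hypothetical; in a real setup the same idea would usually live in your IaC tool's variables rather than in application code.&lt;/p&gt;

```python
import os

# Hypothetical defaults; the region names come from the examples in this post.
DEFAULT_REGIONS = {"training": "us-west-2", "serving": "us-east-1"}


def resolve_region(workload, env=None):
    """Pick a region for a workload, letting an environment variable override the default."""
    env = os.environ if env is None else env
    # Made-up naming scheme: TRAINING_REGION, SERVING_REGION, ...
    override = env.get(f"{workload.upper()}_REGION")
    return override or DEFAULT_REGIONS[workload]


# No override set: fall back to the default for the workload.
print(resolve_region("training", env={}))
# Override set: the job moves without a code change.
print(resolve_region("training", env={"TRAINING_REGION": "us-east-2"}))
```

&lt;p&gt;Passing &lt;code&gt;env&lt;/code&gt; explicitly keeps the function easy to test; by default it reads the real process environment.&lt;/p&gt;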




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The cloud is no longer an invisible layer of abstraction. It is a physical plant that breathes energy and exhales heat. The best engineers in 2026 will be those who understand the &lt;strong&gt;physics and economics&lt;/strong&gt; behind the API call.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What are your thoughts?&lt;/strong&gt; Are you planning to migrate your workloads out of Virginia? Let's discuss in the comments!&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>ai</category>
      <category>aws</category>
      <category>finops</category>
    </item>
  </channel>
</rss>
