<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Juan Torchia</title>
    <description>The latest articles on DEV Community by Juan Torchia (@jtorchia).</description>
    <link>https://dev.to/jtorchia</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F885942%2F5b3b3860-d364-4de0-a335-cb7c251109d9.jpeg</url>
      <title>DEV Community: Juan Torchia</title>
      <link>https://dev.to/jtorchia</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jtorchia"/>
    <language>en</language>
    <item>
      <title>Show HN: Needle distilled Gemini tool calling into 26M parameters — technical read, zero hype</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sun, 17 May 2026 12:30:43 +0000</pubDate>
      <link>https://dev.to/jtorchia/show-hn-needle-distilled-gemini-tool-calling-into-26m-parameters-technical-read-zero-hype-46jo</link>
      <guid>https://dev.to/jtorchia/show-hn-needle-distilled-gemini-tool-calling-into-26m-parameters-technical-read-zero-hype-46jo</guid>
<description>&lt;h1&gt;Show HN: Needle distilled Gemini tool calling into 26M parameters — technical read, zero hype&lt;/h1&gt;

&lt;p&gt;I was in the middle of reviewing my Ollama pipeline when the HN post appeared: &lt;em&gt;Needle&lt;/em&gt;, a 26M-parameter model distilled from Gemini specifically for tool calling. My first reaction was skepticism. 26M sounds like a toy. Then I read more carefully and understood that the interesting point isn't the size; it's the problem they're actually attacking.&lt;/p&gt;

&lt;p&gt;Here's my technical read. No euphoria, no easy dismissal.&lt;/p&gt;




&lt;h2&gt;The real problem behind Needle and Gemini tool calling distillation&lt;/h2&gt;

&lt;p&gt;My thesis is this: &lt;strong&gt;the bottleneck in systems with external tools isn't the LLM's general reasoning; it's the parsability of the output&lt;/strong&gt;. If the model produces malformed JSON, calls functions with wrong arguments, or hallucinates tool names that don't exist, the whole system breaks, no matter how "intelligent" the model is at other tasks.&lt;/p&gt;

&lt;p&gt;I ran into this directly while building agent loops with Claude Code. The most fragile part was never the reasoning; it was the reliability of the data contract. It reminded me of when I resisted TypeScript for years thinking types were bureaucracy. Then I understood that most avoidable failures start as poorly expressed data contracts. Tool calling is exactly the same: a model can be brilliant in prose and terrible at respecting a strict JSON schema under latency pressure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Needle attacks that specific point&lt;/strong&gt;: it takes Gemini's tool calling behavior — which is consistent and well-structured — and distills it into a small, specialized model. The hypothesis is that for &lt;em&gt;this specific task&lt;/em&gt;, 26M parameters trained on the right behavior can outperform giant generalist models that were never fine-tuned to respect function schemas with precision.&lt;/p&gt;

&lt;p&gt;Is it true? On the project's own benchmarks, according to the repo, yes. In my own production workloads, I don't know yet, and that difference matters.&lt;/p&gt;




&lt;h2&gt;What knowledge distillation is and why it matters here&lt;/h2&gt;

&lt;p&gt;Knowledge distillation is a technique where a large model — the &lt;em&gt;teacher&lt;/em&gt; — generates outputs that are then used to train a smaller model — the &lt;em&gt;student&lt;/em&gt;. The student doesn't learn from raw data: it learns to imitate the teacher's behavior on the distributions that matter most.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Simplified concept of the distillation pipeline for tool calling:&lt;/span&gt;
&lt;span class="c"&gt;# 1. Teacher (Gemini) generates thousands of correct tool calling examples&lt;/span&gt;
&lt;span class="c"&gt;# 2. Student (Needle, 26M) trains on those examples&lt;/span&gt;
&lt;span class="c"&gt;# 3. The student learns the teacher's output distribution, not hand-written rules&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
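
&lt;p&gt;As a sketch of what step 1 could look like in practice, here's a minimal dataset collector. This is a hypothetical illustration, not Needle's actual pipeline: &lt;code&gt;callTeacher&lt;/code&gt; is a placeholder for whatever client reaches the teacher model.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// distill-dataset.ts: hypothetical sketch of building a distillation corpus.
// callTeacher is a stand-in, not a real SDK call.
import { appendFileSync } from "node:fs";

interface TrainingPair {
  prompt: string;     // user request plus serialized tool schemas
  completion: string; // the teacher's tool call, exactly as emitted
}

async function collectPair(
  callTeacher: (prompt: string) =&amp;gt; Promise&amp;lt;string&amp;gt;,
  prompt: string
): Promise&amp;lt;TrainingPair | null&amp;gt; {
  const completion = await callTeacher(prompt);
  try {
    JSON.parse(completion); // keep only parseable tool calls
  } catch {
    return null; // discard malformed teacher outputs instead of training on them
  }
  return { prompt, completion };
}

// Each kept pair becomes one JSONL line of the student's training set.
function persist(pair: TrainingPair, path = "distill.jsonl"): void {
  appendFileSync(path, JSON.stringify(pair) + "\n");
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;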



&lt;p&gt;For tool calling, this makes particular sense. You don't need the model to know world history. You need that, when you hand it this schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tool definition — the model has to respect this 100%&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_product&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Searches for a product by ID in the catalog&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;include_stock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;boolean&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;product_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;it produces exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search_product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"product_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SKU-4821"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"include_stock"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not some creative variation with renamed keys, wrong types, or invented fields. Small generalist models fail at this constantly. If Needle solves it reliably, the use case exists.&lt;/p&gt;
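
&lt;p&gt;To make "exactly" concrete: a strict check for the &lt;code&gt;search_product&lt;/code&gt; call above has to verify types, not just the presence of keys. A minimal sketch:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Strict validation of the example call: key presence alone is not
// enough, the argument types have to match the schema too.
function isValidSearchProductCall(call: any): boolean {
  if (call?.name !== "search_product") return false; // no invented tool names
  const args = call.arguments;
  if (typeof args?.product_id !== "string") return false; // wrong type: reject
  // Optional field: absent is fine, but if present it must be a boolean
  if (args.include_stock !== undefined) {
    if (typeof args.include_stock !== "boolean") return false;
  }
  return true;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;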




&lt;h2&gt;How to test it in Ollama: a reproducible checklist&lt;/h2&gt;

&lt;p&gt;If you want to validate whether a model like Needle has a place in your stack, the criterion shouldn't be someone else's benchmark. It should be your own set of tools under your system's real conditions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Install Ollama if you haven't&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Step 2: When the model is available in the Ollama registry, pull directly&lt;/span&gt;
&lt;span class="c"&gt;# (check availability at https://ollama.com/search)&lt;/span&gt;
ollama pull needle  &lt;span class="c"&gt;# tentative name — verify the official registry&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: Prepare your own tool calling test suite&lt;/span&gt;
&lt;span class="c"&gt;# Don't use the model README's examples; use YOUR real tools&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tool-calling-test.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Validation criteria I'd use to evaluate any small model&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;TestResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;case&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;validJson&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;schemaRespected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;evaluateToolCallingModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;expectedSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TestResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TestResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;cases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Call the model via Ollama API&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="c1"&gt;// Pass tools as part of the request&lt;/span&gt;
        &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expectedSchema&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Validate if the JSON is parseable and if it respects the schema&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;validJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;schemaRespected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// The tool_call should be in message.tool_calls[0]&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="nx"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;validJson&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="c1"&gt;// Basic schema validation: required keys must be present&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requiredKeys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expectedSchema&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;schemaRespected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;requiredKeys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;every&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;received&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;parse error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;case&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;testCase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;expectedSchema&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;received&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;validJson&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;schemaRespected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;latency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My minimum acceptance criteria for any tool calling model in a real system:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Minimum acceptable&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Valid JSON&lt;/td&gt;
&lt;td&gt;99%+&lt;/td&gt;
&lt;td&gt;A parse error in production breaks the entire flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema respected&lt;/td&gt;
&lt;td&gt;95%+&lt;/td&gt;
&lt;td&gt;Wrong arguments are silently dangerous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;p95 latency&lt;/td&gt;
&lt;td&gt;&amp;lt; 500ms local&lt;/td&gt;
&lt;td&gt;If it's slower than an external API, the point of going local is gone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool name hallucination&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;An invented name is an unrecoverable error&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
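
&lt;p&gt;Those thresholds translate directly into a pass/fail gate over the harness output. A minimal sketch against the &lt;code&gt;TestResult&lt;/code&gt; shape from above, with the table's numbers hardcoded:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// acceptance-check.ts: gate a model on the criteria table above.
function passesAcceptance(results: TestResult[]): boolean {
  const n = results.length;
  if (n === 0) return false;

  const validJsonRate = results.filter((r) =&amp;gt; r.validJson).length / n;
  if (validJsonRate &amp;lt; 0.99) return false; // parse errors break the whole flow

  const schemaRate = results.filter((r) =&amp;gt; r.schemaRespected).length / n;
  if (schemaRate &amp;lt; 0.95) return false; // wrong arguments are silently dangerous

  // p95 latency: sort ascending and index the 95th-percentile element
  const latencies = results.map((r) =&amp;gt; r.latencyMs).sort((a, b) =&amp;gt; a - b);
  const p95 = latencies[Math.min(n - 1, Math.floor(0.95 * n))];
  return p95 &amp;lt; 500; // slower than an external API defeats the purpose
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The hallucination row isn't covered here because &lt;code&gt;TestResult&lt;/code&gt; doesn't record tool names; that one needs a runtime guard, sketched in the FAQ below.&lt;/p&gt;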




&lt;h2&gt;The limits that the hype doesn't mention&lt;/h2&gt;

&lt;p&gt;There are three limitations that don't show up in the headlines and that I consider essential before betting on a distilled model in a real system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, the teacher's distribution defines the ceiling.&lt;/strong&gt; If Gemini has biases in how it generates tool calls — certain argument patterns, certain naming conventions — the student inherits them unfiltered. This matters if your API has conventions that drift from Gemini's style.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, generalization to unseen schemas is an open question.&lt;/strong&gt; A distilled model can be excellent on the patterns it learned and brittle against complex schemas with &lt;code&gt;anyOf&lt;/code&gt;, nested &lt;code&gt;$ref&lt;/code&gt;s, or conditional validations. You have to test it explicitly against your own schemas — don't assume the general benchmark applies.&lt;/p&gt;
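
&lt;p&gt;A cheap way to probe that: include at least one deliberately awkward schema in your test set. A hypothetical example using &lt;code&gt;anyOf&lt;/code&gt;, which forces the model to commit to exactly one argument shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical stress case for the generalization question. Models
// distilled on flat schemas often mishandle the anyOf branching.
const trickyTool = {
  name: "update_inventory",
  description: "Adjusts stock by an absolute count or a relative delta",
  parameters: {
    type: "object",
    properties: {
      product_id: { type: "string" },
      adjustment: {
        anyOf: [
          { type: "object", properties: { set_to: { type: "number" } }, required: ["set_to"] },
          { type: "object", properties: { delta: { type: "number" } }, required: ["delta"] },
        ],
      },
    },
    required: ["product_id", "adjustment"],
  },
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;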

&lt;p&gt;&lt;strong&gt;Third, 26M parameters implies limited context capacity.&lt;/strong&gt; In systems where the prompt includes many tools simultaneously — common in backends with dozens of endpoints exposed as tools — degradation can be significant. That's a hypothesis to validate, not assume.&lt;/p&gt;
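
&lt;p&gt;That hypothesis is cheap to test before committing. A sketch of the experiment against the same Ollama endpoint as the harness above: one fixed prompt, a growing list of distractor tools, and a check on whether the model still picks the intended one. Here &lt;code&gt;allTools&lt;/code&gt; and &lt;code&gt;targetName&lt;/code&gt; are stand-ins for your own registry.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sweep-tool-count.ts: does accuracy degrade as the prompt carries more tools?
async function picksRightTool(
  model: string,
  prompt: string,
  targetName: string,
  allTools: object[], // wrapped { type: "function", function: {...} } entries
  n: number           // how many tools to expose in this trial
): Promise&amp;lt;boolean&amp;gt; {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [{ role: "user", content: prompt }],
      tools: allTools.slice(0, n), // target included; distractor count grows with n
      stream: false,
    }),
  });
  const data = await response.json();
  // A hit means the intended tool was called despite the distractors
  return data.message?.tool_calls?.[0]?.function?.name === targetName;
}

// Run it at n = 1, 4, 8, 16, 32 and watch the hit rate: a cliff tells you
// where the context-capacity hypothesis stops being a hypothesis.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;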

&lt;p&gt;None of this invalidates the project; it just scopes it. The same discipline I applied when reviewing &lt;a href="https://juanchi.dev/en/blog/pnpm-workspaces-ci-cache-github-actions-40-minutes-fix" rel="noopener noreferrer"&gt;pnpm workspaces cache issues in CI&lt;/a&gt; applies here: understand the limit first, then decide if it fits.&lt;/p&gt;




&lt;h2&gt;Where Needle makes sense and where it doesn't&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where it makes sense to try Needle:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local agent pipelines where network latency to external APIs is the bottleneck&lt;/li&gt;
&lt;li&gt;Edge devices or resource-constrained environments where a 26M model fits in memory comfortably&lt;/li&gt;
&lt;li&gt;Systems with a &lt;em&gt;bounded and stable&lt;/em&gt; set of tools — not dozens of shifting schemas&lt;/li&gt;
&lt;li&gt;As a local fallback when external APIs are unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where it probably doesn't cut it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Systems where reasoning between tool calling steps is complex — deciding &lt;em&gt;when&lt;/em&gt; to call which tool, not just &lt;em&gt;how&lt;/em&gt; to call it&lt;/li&gt;
&lt;li&gt;APIs with deeply nested or polymorphic schemas&lt;/li&gt;
&lt;li&gt;Flows where long conversational context matters — the 26M context limit is going to hurt&lt;/li&gt;
&lt;li&gt;Environments that need auditable safety guarantees — a privately distilled model is a considerably more opaque box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tension that surfaced in the &lt;a href="https://juanchi.dev/en/blog/spring-boot-actuator-production-endpoints-hardening-checklist" rel="noopener noreferrer"&gt;Spring Boot Actuator in production&lt;/a&gt; post applies differently here: the comfort of "it works in the demo" can hide attack-surface risks that only show up under load or with unexpected inputs.&lt;/p&gt;




&lt;h2&gt;What this signals for the small model ecosystem&lt;/h2&gt;

&lt;p&gt;The uncomfortable thing about Needle isn't the model itself. It's what it confirms: &lt;strong&gt;functional specialization is going to pressure the hegemony of large general models on structured tasks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Tool calling, intent classification, entity extraction with fixed schemas — these are tasks where a well-trained distilled model can beat GPT-4 or Claude on cost and latency without sacrificing reliability. That changes the architecture calculation.&lt;/p&gt;

&lt;p&gt;In my current stack with Claude Code for complex reasoning and Ollama for local tasks, there's a gap exactly where Needle would aim: the tool router that decides which function to call and with what arguments, without needing the overhead of a 70B model for that. I'm not saying I'll adopt it tomorrow. I'm saying the category makes sense and the experiment deserves follow-through.&lt;/p&gt;

&lt;p&gt;Same as when I evaluated &lt;a href="https://juanchi.dev/en/blog/spring-security-spring-boot-actuator-authorization-model-production" rel="noopener noreferrer"&gt;Jakarta EE vs Spring Boot tradeoffs&lt;/a&gt; or compared &lt;a href="https://juanchi.dev/en/blog/pnpm-vs-npm-vs-yarn-2026-monorepo-real-benchmark" rel="noopener noreferrer"&gt;package managers in real monorepos&lt;/a&gt;, the honest answer isn't "adopt it now" or "ignore it" — it's "test it against your own criteria before committing."&lt;/p&gt;




&lt;h2&gt;FAQ: Needle, distillation, and tool calling in small models&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What exactly is model distillation in the LLM context?&lt;/strong&gt;&lt;br&gt;
It's a process where a large model (&lt;em&gt;teacher&lt;/em&gt;) generates a dataset of correct behavior — in this case, well-formed tool calling examples — which is used to train a small model (&lt;em&gt;student&lt;/em&gt;). The student learns to imitate the teacher's output distribution on the specific tasks it was distilled for, without needing the teacher's full architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is 26M parameters enough for reliable tool calling?&lt;/strong&gt;&lt;br&gt;
Depends on the scope. For a bounded set of tools with simple schemas, probably yes. For systems with dozens of complex tools, long contexts, or multi-step reasoning, it's an open hypothesis. The project's own benchmark is optimistic; validation against your own schemas is mandatory before betting on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I test it locally without risking a production system?&lt;/strong&gt;&lt;br&gt;
With Ollama, if the model is available in the registry, it's as simple as &lt;code&gt;ollama pull [name]&lt;/code&gt; and then evaluating with your own script against the schemas you already use. The validation checklist in this post is a starting point. Always against your real tools — never against the README examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the practical difference between Needle and using function calling from OpenAI or Anthropic?&lt;/strong&gt;&lt;br&gt;
Latency, cost, and privacy. A local model has no network RTT, no per-token cost, and doesn't send your tool schemas to an external API. The tradeoff is that reliability depends entirely on the local model's training quality, without the backing of a provider with an SLA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it worth it for an individual stack or only for companies with infrastructure?&lt;/strong&gt;&lt;br&gt;
A 26M model runs on a MacBook with 8GB of RAM without drama. This isn't enterprise infrastructure. If you're already using Ollama for other tasks — like I am — adding a specialized model is operationally trivial. The real cost is evaluation time, not hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if the model hallucinates a tool name that doesn't exist in my system?&lt;/strong&gt;&lt;br&gt;
That's the worst case and you have to design for it as an expected failure. The routing layer that consumes the model's output has to validate that the tool call &lt;code&gt;name&lt;/code&gt; corresponds to a registered tool before executing anything. If it doesn't exist, the error has to be explicit and not silent. This is basic defensive design, independent of which model you use.&lt;/p&gt;




&lt;h2&gt;Conclusion: test it with your eyes open&lt;/h2&gt;

&lt;p&gt;I'm not going to say Needle is the future or that it's noise. My position is more specific: &lt;strong&gt;functional distillation of large model behavior into small specialized models is a legitimate direction, and tool calling is a use case where it makes genuine technical sense&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What I don't buy is enthusiasm without friction. A 26M model has real limits around context, generalization, and reliability on unseen schemas. Those limits don't appear in the HN post and they will appear in production.&lt;/p&gt;

&lt;p&gt;My concrete recommendation: if you have an agent pipeline with a stable set of tools and latency is a problem, build a test harness with your own schemas, run it against the acceptance criteria in this post, and measure. If it clears 99% valid JSON and 95% schema respected on your own cases, you have something useful. If not, you know exactly why.&lt;/p&gt;

&lt;p&gt;That's more useful than any benchmark someone else wrote.&lt;/p&gt;

&lt;p&gt;Are you using local models for tool calling? Tell me at &lt;a href="https://juanchi.dev" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt; what stack you built and where you hit the limits.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/needle-gemini-tool-calling-26m-parameters-technical-read" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>typescript</category>
      <category>llm</category>
      <category>ialocal</category>
    </item>
    <item>
      <title>Show HN: Needle distilled Gemini tool calling en 26M parámetros — lectura técnica sin hype</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sun, 17 May 2026 12:30:38 +0000</pubDate>
      <link>https://dev.to/jtorchia/show-hn-needle-distilled-gemini-tool-calling-en-26m-parametros-lectura-tecnica-sin-hype-1i7b</link>
      <guid>https://dev.to/jtorchia/show-hn-needle-distilled-gemini-tool-calling-en-26m-parametros-lectura-tecnica-sin-hype-1i7b</guid>
      <description>&lt;h1&gt;
  
  
  Show HN: Needle distilled Gemini tool calling en 26M parámetros — lectura técnica sin hype
&lt;/h1&gt;

&lt;p&gt;Estaba revisando mi pipeline de Ollama cuando apareció el post en HN: &lt;em&gt;Needle&lt;/em&gt;, un modelo de 26M de parámetros destilado desde Gemini específicamente para tool calling. Mi primera reacción fue escéptica. 26M suena a juguete. Después leí con más calma y entendí que el punto interesante no es el tamaño: es el problema que están atacando.&lt;/p&gt;

&lt;p&gt;Acá va mi lectura técnica, sin euforia y sin descarte fácil.&lt;/p&gt;




&lt;h2&gt;
  
  
  El problema real detrás de Needle y la destilación de Gemini para tool calling
&lt;/h2&gt;

&lt;p&gt;Mi tesis es esta: &lt;strong&gt;el cuello de botella en sistemas con herramientas externas no es el razonamiento general del LLM, sino la parsabilidad del output&lt;/strong&gt;. Si el modelo produce JSON mal formado, llama funciones con argumentos incorrectos o alucina nombres de tools que no existen, el sistema entero se rompe — no importa qué tan "inteligente" sea el modelo en otras tareas.&lt;/p&gt;

&lt;p&gt;Esto lo experimenté directamente mientras armaba loops de agentes con Claude Code. La parte más frágil nunca fue el razonamiento; fue la confiabilidad del contrato de datos. Me acordé de cuando me resistí a TypeScript durante años pensando que los tipos eran burocracia. Después entendí que muchas fallas evitables empiezan como contratos de datos mal expresados. Con tool calling pasa exactamente lo mismo: un modelo puede ser brillante en prosa y pésimo para respetar un esquema JSON estricto bajo presión de latencia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Needle ataca ese punto específico&lt;/strong&gt;: toma el comportamiento de tool calling de Gemini — que es consistente y bien estructurado — y lo destila en un modelo pequeño y especializado. La hipótesis es que para &lt;em&gt;esta tarea concreta&lt;/em&gt;, 26M entrenados con el comportamiento correcto pueden superar a modelos gigantes generalistas que no fueron ajustados para respetar esquemas de función con precisión.&lt;/p&gt;

&lt;p&gt;¿Es verdad? En benchmarks propios, según el repositorio del proyecto, sí. En producción real propia, no lo sé todavía — y esa diferencia importa.&lt;/p&gt;




&lt;h2&gt;
  
  
  Qué es la destilación de conocimiento y por qué importa aquí
&lt;/h2&gt;

&lt;p&gt;La destilación de conocimiento (&lt;em&gt;knowledge distillation&lt;/em&gt;) es una técnica donde un modelo grande — el &lt;em&gt;teacher&lt;/em&gt; — genera outputs que después se usan para entrenar un modelo pequeño — el &lt;em&gt;student&lt;/em&gt;. El student no aprende de datos crudos: aprende a imitar el comportamiento del teacher en las distribuciones que más importan.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Concepto simplificado del pipeline de destilación para tool calling:&lt;/span&gt;
&lt;span class="c"&gt;# 1. Teacher (Gemini) genera miles de ejemplos de tool calling correcto&lt;/span&gt;
&lt;span class="c"&gt;# 2. Student (Needle, 26M) entrena sobre esos ejemplos&lt;/span&gt;
&lt;span class="c"&gt;# 3. El student aprende la distribución de outputs del teacher, no reglas escritas a mano&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Para tool calling, esto tiene sentido particular. No necesitás que el modelo sepa historia universal. Necesitás que cuando le pasés este schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Definición de herramienta — el modelo tiene que respetar esto al 100%&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;buscar_producto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Busca un producto por ID en el catálogo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;producto_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;incluir_stock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;boolean&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;producto_id&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;El output sea exactamente:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"buscar_producto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"producto_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SKU-4821"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"incluir_stock"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Y no alguna variación creativa con claves renombradas, tipos erróneos o campos inventados. En eso los modelos pequeños generalistas fallan bastante. Si Needle lo resuelve de forma confiable, el caso de uso existe.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cómo probarlo en Ollama: checklist reproducible
&lt;/h2&gt;

&lt;p&gt;Si querés validar si un modelo como Needle tiene lugar en tu stack, el criterio no debería ser un benchmark ajeno. Debería ser tu propio conjunto de herramientas bajo las condiciones reales de tu sistema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Paso 1: Instalar Ollama si no lo tenés&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Paso 2: Cuando el modelo esté disponible en Ollama registry, pull directo&lt;/span&gt;
&lt;span class="c"&gt;# (verificar disponibilidad en https://ollama.com/search)&lt;/span&gt;
ollama pull needle  &lt;span class="c"&gt;# nombre tentativo — verificar el registry oficial&lt;/span&gt;

&lt;span class="c"&gt;# Paso 3: Preparar un set de pruebas de tool calling propio&lt;/span&gt;
&lt;span class="c"&gt;# No uses los ejemplos del README del modelo; usá TUS herramientas reales&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// prueba-tool-calling.ts&lt;/span&gt;
&lt;span class="c1"&gt;// Criterios de validación que yo usaría para evaluar cualquier modelo pequeño&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ResultadoPrueba&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;esperado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;obtenido&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;jsonValido&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;schemaRespetado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;latenciaMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;evaluarModeloToolCalling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;modelo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;casos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ResultadoPrueba&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;resultados&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ResultadoPrueba&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;casos&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;inicio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Llamada al modelo vía API de Ollama&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;respuesta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:11434/api/chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;modelo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="c1"&gt;// Pasar las herramientas como parte del request&lt;/span&gt;
        &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;respuesta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;latencia&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;inicio&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Validar si el JSON es parseable y si respeta el schema&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;jsonValido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;schemaRespetado&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;obtenido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// El tool_call debería estar en message.tool_calls[0]&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;?.[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="nx"&gt;obtenido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nx"&gt;jsonValido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="c1"&gt;// Validación básica de schema: las claves required tienen que estar presentes&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;toolCall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requiredKeys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;schemaRespetado&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;requiredKeys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;every&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;k&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;obtenido&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;parse error&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;resultados&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;esperado&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;caso&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;schemaEsperado&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;obtenido&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;jsonValido&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;schemaRespetado&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;latenciaMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;latencia&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;resultados&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mi criterio mínimo de aceptación para cualquier modelo de tool calling en un sistema real:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Métrica&lt;/th&gt;
&lt;th&gt;Mínimo aceptable&lt;/th&gt;
&lt;th&gt;Por qué&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;JSON válido&lt;/td&gt;
&lt;td&gt;99%+&lt;/td&gt;
&lt;td&gt;Un parse error en producción rompe el flujo entero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Schema respetado&lt;/td&gt;
&lt;td&gt;95%+&lt;/td&gt;
&lt;td&gt;Argumentos incorrectos son silenciosamente peligrosos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latencia p95&lt;/td&gt;
&lt;td&gt;&amp;lt; 500ms local&lt;/td&gt;
&lt;td&gt;Si tarda más que una API externa, perdiste el punto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hallucination de tool names&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;Un nombre inventado es un error no recuperable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Los límites que el hype no menciona
&lt;/h2&gt;

&lt;p&gt;There are three limitations that don't make the headlines and that strike me as central before betting on a distilled model in a real system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, the teacher's distribution defines the ceiling.&lt;/strong&gt; If Gemini has biases in how it generates tool calls — certain argument patterns, certain naming conventions — the student inherits them unfiltered. This matters if your API's conventions diverge from Gemini's style.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, generalization to unseen schemas is an open question.&lt;/strong&gt; A distilled model can be excellent on the patterns it learned and fragile against complex schemas with &lt;code&gt;anyOf&lt;/code&gt;, nested &lt;code&gt;$ref&lt;/code&gt;, or conditional validations. You have to test it explicitly against your own schemas, not assume the general benchmark carries over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, 26M parameters implies limited context capacity.&lt;/strong&gt; In systems where the prompt includes many tools at once — common in backends with dozens of endpoints exposed as tools — degradation can be significant. It's a hypothesis to validate, not to assume.&lt;/p&gt;

&lt;p&gt;This doesn't invalidate the project. It situates it. The same discipline I applied when reviewing &lt;a href="https://juanchi.dev/es/blog/pnpm-workspaces-cache-github-actions-ci-problema" rel="noopener noreferrer"&gt;CI cache problems with pnpm workspaces&lt;/a&gt; applies here: first understand the limit, then decide whether it fits.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where Needle makes sense and where it doesn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where trying Needle makes sense:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local agent pipelines where network latency to external APIs is the bottleneck&lt;/li&gt;
&lt;li&gt;Edge devices or resource-constrained environments where a 26M model fits comfortably in memory&lt;/li&gt;
&lt;li&gt;Systems with a &lt;em&gt;bounded, stable&lt;/em&gt; set of tools — not dozens of shifting schemas&lt;/li&gt;
&lt;li&gt;As a local fallback when external APIs are unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenarios where it probably falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Systems where the reasoning between tool calling steps is complex — deciding &lt;em&gt;when&lt;/em&gt; to call which tool, not just &lt;em&gt;how&lt;/em&gt; to call it&lt;/li&gt;
&lt;li&gt;APIs with deeply nested or polymorphic schemas&lt;/li&gt;
&lt;li&gt;Flows where long conversational context matters — the context limit of a 26M model is going to hurt&lt;/li&gt;
&lt;li&gt;Environments that need auditable security guarantees — a private distilled model is one more opaque box&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tension flagged in the &lt;a href="https://juanchi.dev/es/blog/spring-boot-actuator-endpoints-seguridad-produccion" rel="noopener noreferrer"&gt;Spring Boot Actuator in production&lt;/a&gt; post applies here in a different way: the comfort of "it works in the demo" can hide surface risks that only show up under load or with unexpected inputs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this foreshadows for the small-model ecosystem
&lt;/h2&gt;

&lt;p&gt;The uncomfortable thing about Needle isn't the model itself. It's what it confirms: &lt;strong&gt;functional specialization is going to pressure the dominance of large generalist models on structured tasks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Tool calling, intent classification, entity extraction against a fixed schema — these are tasks where a well-trained distilled model can beat GPT-4 or Claude on cost and latency without sacrificing reliability. That changes the architecture calculus.&lt;/p&gt;

&lt;p&gt;In my current stack, with Claude Code for complex reasoning and Ollama for local tasks, there's a gap exactly where Needle would aim: the tool router that decides which function to call and with which arguments, without needing the overhead of a 70B model for that. I'm not saying I'll adopt it tomorrow. I'm saying the category makes sense and the experiment deserves follow-up.&lt;/p&gt;

&lt;p&gt;Just as when I weighed &lt;a href="https://juanchi.dev/es/blog/spring-boot-actuator-security-spring-security-produccion-modelo-autorizacion" rel="noopener noreferrer"&gt;Jakarta EE vs Spring Boot tradeoffs&lt;/a&gt; or compared &lt;a href="https://juanchi.dev/es/blog/pnpm-vs-npm-2026-monorepo-benchmark-real" rel="noopener noreferrer"&gt;package managers in real monorepos&lt;/a&gt;, the honest answer isn't "adopt it now" or "ignore it": it's "test it against your own criteria before committing".&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: Needle, distillation, and tool calling in small models
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What exactly is model distillation in the context of LLMs?&lt;/strong&gt;&lt;br&gt;
It's a process where a large model (the &lt;em&gt;teacher&lt;/em&gt;) generates a dataset of correct behavior — in this case, well-formed tool calling examples — which is then used to train a small model (the &lt;em&gt;student&lt;/em&gt;). The student learns to imitate the teacher's output distribution on the specific tasks it was distilled for, without needing the teacher's full architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are 26M parameters enough for reliable tool calling?&lt;/strong&gt;&lt;br&gt;
It depends on the scope. For a bounded set of tools with simple schemas, probably yes. For systems with dozens of complex tools, long contexts, or multi-step reasoning, it's an open hypothesis. The project's benchmark is optimistic; validating against your own schemas is mandatory before betting on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I try it locally without compromising a production system?&lt;/strong&gt;&lt;br&gt;
With Ollama, if the model is available in the registry, it's as simple as &lt;code&gt;ollama pull [name]&lt;/code&gt; and then evaluating with your own script against the schemas you already use. The validation checklist in this post is a starting point. Always against your real tools, never against the README examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the practical difference between Needle and using OpenAI's or Anthropic's function calling?&lt;/strong&gt;&lt;br&gt;
Latency, cost, and privacy. A local model has no network RTT, no per-token cost, and doesn't send your tool schemas to an external API. The trade-off is that reliability depends entirely on the quality of the local model's training, without the backing of a provider with an SLA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it worth it for an individual stack, or only for companies with infrastructure?&lt;/strong&gt;&lt;br&gt;
A 26M model fits on a MacBook with 8GB of RAM without drama. That's not enterprise infrastructure. If you already use Ollama for other tasks — as I do — adding a specialized model is operationally trivial. The real cost is the evaluation time, not the hardware.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens if the model hallucinates a tool name that doesn't exist in my system?&lt;/strong&gt;&lt;br&gt;
It's the worst case, and you have to design for it as an expected failure. The routing layer that consumes the model's output has to validate that the tool call's &lt;code&gt;name&lt;/code&gt; matches a registered tool before executing anything. If it doesn't exist, the error has to be explicit, never silent. That's basic defensive design, independent of which model you use.&lt;/p&gt;
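
&lt;p&gt;As a sketch of that guard, assuming a hypothetical &lt;code&gt;ToolHandler&lt;/code&gt; type and registry shape; the point is that an unregistered name fails loudly before anything executes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Sketch of a routing guard: validate the tool name before executing anything.
type ToolHandler = (args: Record&lt;string, unknown&gt;) =&gt; Promise&lt;unknown&gt;;

async function route(
  registry: Map&lt;string, ToolHandler&gt;,  // registered tools (hypothetical shape)
  call: { name: string; args: Record&lt;string, unknown&gt; },
): Promise&lt;unknown&gt; {
  const handler = registry.get(call.name);
  if (!handler) {
    // Explicit, non-silent failure: a hallucinated name never reaches execution.
    throw new Error(`Unregistered tool: "${call.name}"`);
  }
  return handler(call.args);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;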




&lt;h2&gt;
  
  
  Conclusion: try it with your eyes open
&lt;/h2&gt;

&lt;p&gt;I'm not going to say Needle is the future or that it's noise. My position is more specific: &lt;strong&gt;functionally distilling large-model behavior into small specialized models is a legitimate direction, and tool calling is a use case where it makes genuine technical sense&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;What I don't buy is frictionless enthusiasm. A 26M model has real limits in context, in generalization, and in reliability under unseen schemas. Those limits don't show up in the HN post, and they will show up in production.&lt;/p&gt;

&lt;p&gt;My concrete recommendation: if you have an agent pipeline with a stable set of tools and latency is a problem, build a test harness with your own schemas, run it against this post's acceptance criteria, and measure. If it clears the threshold of 99% valid JSON and 95% schema respected on your own cases, you have something useful. If not, you know exactly why.&lt;/p&gt;

&lt;p&gt;That's more useful than anyone else's benchmark.&lt;/p&gt;

&lt;p&gt;Are you using local models for tool calling? Tell me at &lt;a href="https://juanchi.dev" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt; what stack you built and where you hit the limits.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/show-needle-distilled-gemini-tool-calling-modelo-pequeno-analisis" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>typescript</category>
      <category>llm</category>
    </item>
    <item>
      <title>OpenTelemetry on Spring Boot 3: when logs say OK and traces show the problem</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 19:06:15 +0000</pubDate>
      <link>https://dev.to/jtorchia/opentelemetry-on-spring-boot-3-when-logs-say-ok-and-traces-show-the-problem-193o</link>
      <guid>https://dev.to/jtorchia/opentelemetry-on-spring-boot-3-when-logs-say-ok-and-traces-show-the-problem-193o</guid>
      <description>&lt;p&gt;There's a question I've asked myself many times while debugging backend systems: did the request take long because the DB was slow, because the downstream kept us waiting, or because some internal loop fired 60 queries to fetch 60 records? The log says &lt;code&gt;duration_ms=340&lt;/code&gt; and &lt;code&gt;status=200&lt;/code&gt;. That's it. You start guessing.&lt;/p&gt;

&lt;p&gt;That moment of uncertainty is where this lab came from. Not to measure OpenTelemetry overhead, not to compare Jaeger against Tempo, but to answer something more concrete: what signals do you lose when you only have good logs, and what shows up when you add a trace?&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/opentelemetry-spring-boot-lab" rel="noopener noreferrer"&gt;github.com/JuanTorchia/opentelemetry-spring-boot-lab&lt;/a&gt;, commit &lt;code&gt;c12ea4e848dc431c8bbd324318399172302fe053&lt;/code&gt;, tag &lt;code&gt;editorial-final-diagnosis-comparison-v2&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: a lab that produces evidence, not benchmarks
&lt;/h2&gt;

&lt;p&gt;The stack is Spring Boot 3.5.7, Java 21, PostgreSQL 16, OpenTelemetry API 1.43.0, OpenTelemetry Java Agent 2.9.0, and Jaeger all-in-one. Everything starts with Docker Compose. To reproduce it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick smoke test with small dataset (1k tasks)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;smoke&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;small&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Full editorial run (50k tasks, 200 requests, warmup 20, concurrency 8)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Runs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Requests&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Warmup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Concurrency&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;8&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runner starts Compose, downloads the agent into &lt;code&gt;tools/&lt;/code&gt;, packages the jar, seeds Postgres with synthetic tables (&lt;code&gt;organizations&lt;/code&gt;, &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;projects&lt;/code&gt;, &lt;code&gt;tasks&lt;/code&gt;, &lt;code&gt;comments&lt;/code&gt;), runs the scenarios, queries Jaeger by &lt;code&gt;traceId&lt;/code&gt;, and regenerates the reports in &lt;code&gt;results/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Jaeger was chosen for local simplicity: one image, web UI, REST API to query traces by &lt;code&gt;traceId&lt;/code&gt;. Tempo is also valid, but needs more moving parts for a local editorial demo. This is not a production stack recommendation.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;editorial&lt;/code&gt; dataset has 50,000 tasks. The &lt;code&gt;small&lt;/code&gt; dataset has 1,000. That difference matters so the N+1 produces visible fan-out rather than a microsecond gap that disappears into noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The instrumentation decision I care about most
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;pom.xml&lt;/code&gt; has &lt;code&gt;opentelemetry-api&lt;/code&gt; as a compile dependency, but the agent arrives at runtime. That means HTTP server, HTTP client, and JDBC are instrumented automatically without touching business code.&lt;/p&gt;

&lt;p&gt;Manual spans are used only for business stages that the agent can't infer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LabService.java — manual span to mark business intent&lt;/span&gt;
&lt;span class="nc"&gt;Span&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;spanBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"business.n_plus_one.load_tasks_then_comments"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;startSpan&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ignored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;makeCurrent&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// first fetches tasks, then runs one query per task&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"select t.id, t.title, u.display_name as assignee from tasks t "&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"join users u on u.id = t.assignee_id order by t.id limit ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;longValue&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="c1"&gt;// this query repeats per task → fan-out&lt;/span&gt;
        &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"select count(*) from comments where task_id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setAttribute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lab.n_plus_one.expected_extra_queries"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enriched&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;end&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That mix is more honest for the post: auto-instrumentation for infrastructure, manual spans to explain intent. If I had used only manual spans, the lab would require observability-specific code in every layer. If I had relied only on the agent, business spans would be invisible.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;logback-spring.xml&lt;/code&gt; injects &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt; into every log line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- logback-spring.xml --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} %-5level traceId=%X{trace_id:-none} spanId=%X{span_id:-none} %logger{36} - %msg%n&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's what connects both worlds. A log with &lt;code&gt;traceId&lt;/code&gt; lets you jump directly to the trace in Jaeger. Without it, logs and traces are islands.&lt;/p&gt;

&lt;h2&gt;
  
  
  The matrix that summarizes the diagnosis
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;p95&lt;/th&gt;
&lt;th&gt;Avg spans&lt;/th&gt;
&lt;th&gt;Avg DB spans&lt;/th&gt;
&lt;th&gt;Error spans/request&lt;/th&gt;
&lt;th&gt;Defensible diagnosis&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;55 ms&lt;/td&gt;
&lt;td&gt;3.04&lt;/td&gt;
&lt;td&gt;1.04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Healthy request, no weird story.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;59 ms&lt;/td&gt;
&lt;td&gt;3.04&lt;/td&gt;
&lt;td&gt;1.04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Same functional shape, no DB fan-out.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n-plus-one&lt;/td&gt;
&lt;td&gt;209 ms&lt;/td&gt;
&lt;td&gt;63.38&lt;/td&gt;
&lt;td&gt;61.38&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;DB fan-out visible inside one request.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;downstream-slow&lt;/td&gt;
&lt;td&gt;374 ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Time concentrates in downstream.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed&lt;/td&gt;
&lt;td&gt;395 ms&lt;/td&gt;
&lt;td&gt;7.57&lt;/td&gt;
&lt;td&gt;1.57&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;DB, downstream, and transformation compete.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;partial-error&lt;/td&gt;
&lt;td&gt;184 ms&lt;/td&gt;
&lt;td&gt;6.27&lt;/td&gt;
&lt;td&gt;1.27&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Downstream error inside a partial response.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is not trying to crown a tool. It summarizes which signals are available for diagnosis. The strong point is not that one number is universal: it is that N+1 leaves a very different shape than the optimized case, and that shape does not appear in a flat log unless you enable SQL debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the six scenarios reveal
&lt;/h2&gt;

&lt;p&gt;The lab has six endpoints: &lt;code&gt;baseline&lt;/code&gt;, &lt;code&gt;n-plus-one&lt;/code&gt;, &lt;code&gt;optimized&lt;/code&gt;, &lt;code&gt;downstream-slow&lt;/code&gt;, &lt;code&gt;mixed&lt;/code&gt;, and &lt;code&gt;partial-error&lt;/code&gt;. Each produces different signals that the runner consolidates into &lt;code&gt;results/comparison.md&lt;/code&gt; and &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The finding I most want to defend:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N+1 vs optimized&lt;/strong&gt;: both return the same response shape. The log for both says &lt;code&gt;status=200&lt;/code&gt;. The difference lives in the trace: &lt;code&gt;n-plus-one&lt;/code&gt; generates an average of &lt;strong&gt;63.38 spans&lt;/strong&gt; per request in the editorial run; &lt;code&gt;optimized&lt;/code&gt; generates &lt;strong&gt;3.04&lt;/strong&gt;. That's not a universal performance claim — it's a diagnostic signal. With only logs and no SQL debug enabled, the difference is ambiguous. With the trace, DB fan-out is visible without extra configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream-slow&lt;/strong&gt;: p95 sits at &lt;strong&gt;374 ms&lt;/strong&gt;, very close to the configured 300 ms delay. Logs show total duration and &lt;code&gt;traceId&lt;/code&gt;. What they don't show is where that time went: was it the DB, the downstream, or in-memory transformation? The trace separates it: the downstream HTTP client span dominates the hierarchy, and the local DB appears as a secondary span with low duration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixed&lt;/strong&gt;: this is where flat logs fail the most. Three stages compete (DB, downstream, transformation) and none is obviously dominant. p95 reaches &lt;strong&gt;395 ms&lt;/strong&gt;. The trace shows the temporal distribution per stage. The log just says it was slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial-error&lt;/strong&gt;: the endpoint responds with HTTP 206 (partial content). The log records &lt;code&gt;traceId&lt;/code&gt;, status, and error type. The trace goes further: the downstream span is marked with error, nested under a request that technically responded. Logs and trace don't replace each other here — they complement. The log alerts and lets you correlate. The trace places the error in the causal hierarchy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The screenshot that changed the diagnosis
&lt;/h2&gt;

&lt;p&gt;In Jaeger, &lt;code&gt;n-plus-one&lt;/code&gt; does not look like a request that is merely a bit slower. It looks like a request with DB fan-out: many repeated spans under the same business operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" alt="Jaeger trace showing DB fan-out in the N+1 scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The optimized case, on the other hand, keeps a compact shape. I don't need to read the code to suspect that the problem in the previous case wasn't "Postgres is slow" in the abstract, but the shape of the queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" alt="Jaeger trace for the optimized scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The partial-error case matters for another reason: the request can respond, while the downstream span is marked as errored. That nuance is exactly where logs and traces complement each other: the log alerts, the trace locates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" alt="Jaeger trace with partial downstream error marked" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limits of the metrics
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;*_vs_root_pct&lt;/code&gt; fields in &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt; are cumulative percentages of span durations exported by Jaeger. They can exceed 100% when there are nested spans, client/server pairs, or overlap. The &lt;code&gt;duration_denominator_type&lt;/code&gt; field indicates what was used as the denominator: &lt;code&gt;root_span&lt;/code&gt;, &lt;code&gt;http_request_span&lt;/code&gt;, or &lt;code&gt;largest_observed_span&lt;/code&gt; if the trace was ambiguous.&lt;/p&gt;

&lt;p&gt;These are not overhead numbers. They are not an exact distribution of real request time. They are cumulative diagnostic signals. Treating them like CPU percentages would be a misread that this lab doesn't try to encourage.&lt;/p&gt;

&lt;p&gt;Similarly, &lt;code&gt;diagnosis_confidence_*&lt;/code&gt; is an editorial classification coded in &lt;code&gt;ScenarioDiagnosis.java&lt;/code&gt;, not an automatically measured metric. For N+1, &lt;code&gt;diagnosisConfidenceLogs&lt;/code&gt; is &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;diagnosisConfidenceTrace&lt;/code&gt; is &lt;code&gt;high&lt;/code&gt;. That reflects the fact that without SQL debug, the log is ambiguous. It's not a universal benchmark of which tool is better.&lt;/p&gt;

&lt;h2&gt;
  
  
  My position: what I accept and what I don't buy
&lt;/h2&gt;

&lt;p&gt;I accept that OpenTelemetry with the Java Agent is a reasonable way to add structural visibility to a Spring Boot 3 app without polluting business code. JDBC and HTTP client auto-instrumentation works well for common scenarios.&lt;/p&gt;

&lt;p&gt;I don't buy the narrative that traces replace logs. The lab's &lt;code&gt;RequestCompletionLoggingFilter&lt;/code&gt; is a Servlet filter that records every completed request with scenario, method, path, status, and duration. Those logs are operationally useful even when Jaeger is unavailable. The &lt;code&gt;traceId&lt;/code&gt; in the log is the bridge, not the replacement.&lt;/p&gt;

&lt;p&gt;I also don't buy that Jaeger is the only valid option. It was chosen because it starts with one image and has a ready web UI. Tempo, Zipkin, or any OTLP-compatible backend would solve the same problem in this context.&lt;/p&gt;

&lt;p&gt;The honest trade-off is this: auto-instrumentation reduces accidental work but adds an agent on the classpath that exports data in the background. In a local lab that's trivial. In production, agent overhead depends on load, exporter configuration, and sampling. This lab doesn't measure that, and claiming otherwise would be misleading.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do with this
&lt;/h2&gt;

&lt;p&gt;If you already have structured logs in production with &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt;, the next step isn't replacing anything. It's adding a trace backend and connecting both worlds. The lab shows that Spring Boot 3 auto-instrumentation with the Java Agent is enough for common scenarios, and that manual spans only make sense when you want to name business intent that the agent can't infer.&lt;/p&gt;

&lt;p&gt;If you're evaluating whether the effort is worth it: the case where it's most clearly justified isn't the healthy baseline. It's the mixed scenario or the N+1, where logs give you a number and the trace gives you a shape. The difference between guessing and diagnosing.&lt;/p&gt;

&lt;p&gt;After this lab, my rule is simple: logs tell you what happened; traces help you understand how it happened. If the flat log only gives you total duration, you do not have an explanation yet. You have a clue.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/opentelemetry-spring-boot-logs-vs-traces-diagnosis" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>experimentos</category>
      <category>backend</category>
      <category>observabilidad</category>
    </item>
    <item>
      <title>OpenTelemetry en Spring Boot 3: cuando el log dice OK y el trace muestra el problema</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 19:06:09 +0000</pubDate>
      <link>https://dev.to/jtorchia/opentelemetry-en-spring-boot-3-cuando-el-log-dice-ok-y-el-trace-muestra-el-problema-4639</link>
      <guid>https://dev.to/jtorchia/opentelemetry-en-spring-boot-3-cuando-el-log-dice-ok-y-el-trace-muestra-el-problema-4639</guid>
      <description>&lt;p&gt;Hay una pregunta que me hice muchas veces debuggeando sistemas backend: ¿la request tardó porque la DB fue lenta, porque el downstream nos clavó, o porque algún loop interno disparó 60 queries para traer 60 registros? El log dice &lt;code&gt;duration_ms=340&lt;/code&gt; y &lt;code&gt;status=200&lt;/code&gt;. Eso es todo. Empezás a adivinar.&lt;/p&gt;

&lt;p&gt;Ese momento de incertidumbre fue el origen de este laboratorio. No para medir overhead de OpenTelemetry, no para comparar Jaeger contra Tempo, sino para responder algo más concreto: ¿qué señales perdés cuando solo tenés logs buenos, y qué aparece cuando sumás un trace?&lt;/p&gt;

&lt;p&gt;El repo está en &lt;a href="https://github.com/JuanTorchia/opentelemetry-spring-boot-lab" rel="noopener noreferrer"&gt;github.com/JuanTorchia/opentelemetry-spring-boot-lab&lt;/a&gt;, commit &lt;code&gt;c12ea4e848dc431c8bbd324318399172302fe053&lt;/code&gt;, tag &lt;code&gt;editorial-final-diagnosis-comparison-v2&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: a lab that produces evidence, not benchmarks
&lt;/h2&gt;

&lt;p&gt;The stack is Spring Boot 3.5.7, Java 21, PostgreSQL 16, OpenTelemetry API 1.43.0, OpenTelemetry Java Agent 2.9.0, and Jaeger all-in-one. Everything starts with Docker Compose. To reproduce it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Quick smoke test with small dataset (1k tasks)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;smoke&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;small&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Full editorial run (50k tasks, 200 requests, warmup 20, concurrency 8)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-lab.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;editorial&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Runs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Requests&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;200&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Warmup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Concurrency&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;8&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runner starts Compose, downloads the agent into &lt;code&gt;tools/&lt;/code&gt;, packages the jar, seeds Postgres with synthetic tables (&lt;code&gt;organizations&lt;/code&gt;, &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;projects&lt;/code&gt;, &lt;code&gt;tasks&lt;/code&gt;, &lt;code&gt;comments&lt;/code&gt;), runs the scenarios, queries Jaeger by &lt;code&gt;traceId&lt;/code&gt;, and regenerates the reports in &lt;code&gt;results/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Jaeger was chosen for local simplicity: one image, a web UI, a REST API to query traces by &lt;code&gt;traceId&lt;/code&gt;. Tempo is also valid, but needs more moving parts for a local editorial demo. This is not a production stack recommendation.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;editorial&lt;/code&gt; dataset has 50,000 tasks. The &lt;code&gt;small&lt;/code&gt; one has 1,000. That difference matters so the N+1 produces visible fan-out rather than a microsecond gap that disappears into noise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The instrumentation decision I care about most
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;pom.xml&lt;/code&gt; has &lt;code&gt;opentelemetry-api&lt;/code&gt; as a compile dependency, but the agent arrives at runtime. That means HTTP server, HTTP client, and JDBC are instrumented automatically without touching business code.&lt;/p&gt;

&lt;p&gt;Manual spans are used only for business stages the agent can't infer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// LabService.java — manual span to mark business intent&lt;/span&gt;
&lt;span class="nc"&gt;Span&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tracer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;spanBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"business.n_plus_one.load_tasks_then_comments"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;startSpan&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;ignored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;makeCurrent&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// primero trae tasks, luego hace una query por cada una&lt;/span&gt;
    &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="s"&gt;"select t.id, t.title, u.display_name as assignee from tasks t "&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"join users u on u.id = t.assignee_id order by t.id limit ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Long&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt;&lt;span class="nc"&gt;Number&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="o"&gt;)).&lt;/span&gt;&lt;span class="na"&gt;longValue&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="c1"&gt;// esta query se repite por cada task → fan-out&lt;/span&gt;
        &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jdbcTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;queryForObject&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"select count(*) from comments where task_id = ?"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setAttribute&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"lab.n_plus_one.expected_extra_queries"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;enriched&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;end&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That mix is more honest for the post: auto-instrumentation for infrastructure, manual spans to explain intent. If I had used only manual spans, the lab would require observability-specific code in every layer. If I had relied only on the agent, business spans would be invisible.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;logback-spring.xml&lt;/code&gt; injects &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt; into every log line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- logback-spring.xml --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} %-5level traceId=%X{trace_id:-none} spanId=%X{span_id:-none} %logger{36} - %msg%n&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's what connects both worlds. A log with &lt;code&gt;traceId&lt;/code&gt; lets you jump straight to the trace in Jaeger. Without it, logs and traces are islands.&lt;/p&gt;

&lt;h2&gt;
  
  
  The matrix that summarizes the diagnosis
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Escenario&lt;/th&gt;
&lt;th&gt;p95&lt;/th&gt;
&lt;th&gt;Spans promedio&lt;/th&gt;
&lt;th&gt;DB spans promedio&lt;/th&gt;
&lt;th&gt;Error spans/request&lt;/th&gt;
&lt;th&gt;Diagnóstico defendible&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;55 ms&lt;/td&gt;
&lt;td&gt;3,04&lt;/td&gt;
&lt;td&gt;1,04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Request sana, sin historia rara.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optimized&lt;/td&gt;
&lt;td&gt;59 ms&lt;/td&gt;
&lt;td&gt;3,04&lt;/td&gt;
&lt;td&gt;1,04&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Misma forma funcional, sin fan-out DB.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;n-plus-one&lt;/td&gt;
&lt;td&gt;209 ms&lt;/td&gt;
&lt;td&gt;63,38&lt;/td&gt;
&lt;td&gt;61,38&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Fan-out DB visible en una sola request.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;downstream-slow&lt;/td&gt;
&lt;td&gt;374 ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;El tiempo se concentra en downstream.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mixed&lt;/td&gt;
&lt;td&gt;395 ms&lt;/td&gt;
&lt;td&gt;7,57&lt;/td&gt;
&lt;td&gt;1,57&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;DB, downstream y transformación compiten.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;partial-error&lt;/td&gt;
&lt;td&gt;184 ms&lt;/td&gt;
&lt;td&gt;6,27&lt;/td&gt;
&lt;td&gt;1,27&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Error downstream dentro de una respuesta parcial.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is not trying to crown a tool. It summarizes which signals remain available for diagnosis. The strong point is not that one number is universal: it is that N+1 leaves a very different shape than the optimized case, and that shape does not appear in a flat log without enabling SQL debug.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the six scenarios reveal
&lt;/h2&gt;

&lt;p&gt;The lab has six endpoints: &lt;code&gt;baseline&lt;/code&gt;, &lt;code&gt;n-plus-one&lt;/code&gt;, &lt;code&gt;optimized&lt;/code&gt;, &lt;code&gt;downstream-slow&lt;/code&gt;, &lt;code&gt;mixed&lt;/code&gt;, and &lt;code&gt;partial-error&lt;/code&gt;. Each produces different signals that the runner consolidates into &lt;code&gt;results/comparison.md&lt;/code&gt; and &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The finding I most want to defend:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;N+1 vs optimized&lt;/strong&gt;: both return the same response shape. The log for both says &lt;code&gt;status=200&lt;/code&gt;. The difference lives in the trace: &lt;code&gt;n-plus-one&lt;/code&gt; generates an average of &lt;strong&gt;63.38 spans&lt;/strong&gt; per request in the editorial run; &lt;code&gt;optimized&lt;/code&gt; generates &lt;strong&gt;3.04&lt;/strong&gt;. That's not a universal performance claim — it's a diagnostic signal. With only logs and no SQL debug enabled, the difference is ambiguous. With the trace, the DB fan-out is visible without extra configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream-slow&lt;/strong&gt;: p95 sits at &lt;strong&gt;374 ms&lt;/strong&gt;, very close to the configured 300 ms delay. The logs show total duration and the &lt;code&gt;traceId&lt;/code&gt;. What they don't show is where that time went: was it the DB, the downstream, or in-memory transformation? The trace separates it: the downstream HTTP client span dominates the hierarchy, and the local DB appears as a secondary span with low duration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mixed&lt;/strong&gt;: this is where flat logs fail the most. Three stages compete (DB, downstream, transformation) and none is obviously dominant. p95 reaches &lt;strong&gt;395 ms&lt;/strong&gt;. The trace shows the temporal distribution per stage. The log just says it was slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial-error&lt;/strong&gt;: the endpoint responds with HTTP 206 (partial content). The log records the &lt;code&gt;traceId&lt;/code&gt;, the status, and the error type. The trace goes further: the downstream span is marked with an error, nested under a request that technically responded. Logs and trace don't replace each other here — they complement. The log alerts and lets you correlate. The trace places the error in the causal hierarchy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The screenshot that changed the diagnosis
&lt;/h2&gt;

&lt;p&gt;In Jaeger, &lt;code&gt;n-plus-one&lt;/code&gt; does not look like a request that is merely a bit slower. It looks like a request with DB fan-out: many repeated spans under the same business operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrwcrw0nacyetckt1vt9.png" alt="Jaeger trace showing DB fan-out in the N+1 scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The optimized case, on the other hand, keeps a compact shape. I don't need to read the code to suspect that the problem in the previous case wasn't "Postgres is slow" in the abstract, but the shape of the queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fldud8cw9ah8pwqq1rbya.png" alt="Jaeger trace for the optimized scenario" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The partial-error case also matters for another reason: the request can respond while the downstream span is marked as errored. That nuance is exactly where logs and traces complement each other: the log alerts, the trace locates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbz79pyo9mc44a8u34in.png" alt="Jaeger trace with partial downstream error marked" width="800" height="611"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest limits of the metrics
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;*_vs_root_pct&lt;/code&gt; fields in &lt;code&gt;results/diagnosis-comparison.md&lt;/code&gt; are cumulative percentages of span durations exported by Jaeger. They can exceed 100% when there are nested spans, client/server pairs, or overlap. The &lt;code&gt;duration_denominator_type&lt;/code&gt; field indicates what was used as the denominator: &lt;code&gt;root_span&lt;/code&gt;, &lt;code&gt;http_request_span&lt;/code&gt;, or &lt;code&gt;largest_observed_span&lt;/code&gt; if the trace was ambiguous.&lt;/p&gt;

&lt;p&gt;They are not overhead numbers. They are not an exact distribution of real request time. They are cumulative diagnostic signals. Treating them like CPU percentages would be a misread this lab doesn't try to encourage.&lt;/p&gt;

&lt;p&gt;Likewise, &lt;code&gt;diagnosis_confidence_*&lt;/code&gt; is an editorial classification coded in &lt;code&gt;ScenarioDiagnosis.java&lt;/code&gt;, not an automatically measured metric. For N+1, &lt;code&gt;diagnosisConfidenceLogs&lt;/code&gt; is &lt;code&gt;low&lt;/code&gt; and &lt;code&gt;diagnosisConfidenceTrace&lt;/code&gt; is &lt;code&gt;high&lt;/code&gt;. That reflects the fact that without SQL debug, the log is ambiguous. It's not a universal benchmark of which tool is better.&lt;/p&gt;

&lt;h2&gt;
  
  
  My position: what I accept and what I don't buy
&lt;/h2&gt;

&lt;p&gt;I accept that OpenTelemetry with the Java Agent is a reasonable way to add structural visibility to a Spring Boot 3 app without polluting business code. JDBC and HTTP client auto-instrumentation works well for common scenarios.&lt;/p&gt;

&lt;p&gt;I don't buy the narrative that traces replace logs. The lab's &lt;code&gt;RequestCompletionLoggingFilter&lt;/code&gt; is a Servlet filter that records every completed request with scenario, method, path, status, and duration. Those logs are operationally useful even when Jaeger is unavailable. The &lt;code&gt;traceId&lt;/code&gt; in the log is the bridge, not the replacement.&lt;/p&gt;

&lt;p&gt;I also don't buy that Jaeger is the only valid option. It was chosen because it starts with one image and ships a ready web UI. Tempo, Zipkin, or any OTLP-compatible backend would solve the same problem in this context.&lt;/p&gt;

&lt;p&gt;The honest trade-off is this: auto-instrumentation reduces accidental work but adds an agent on the classpath that exports data in the background. In a local lab that's trivial. In production, agent overhead depends on load, exporter configuration, and sampling. This lab doesn't measure that, and claiming otherwise would be misleading.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do with this
&lt;/h2&gt;

&lt;p&gt;If you already have structured logs in production with &lt;code&gt;traceId&lt;/code&gt; and &lt;code&gt;spanId&lt;/code&gt;, the next step isn't replacing anything. It's adding a trace backend and connecting both worlds. The lab shows that Spring Boot 3 auto-instrumentation with the Java Agent is enough for the common scenarios, and that manual spans only make sense when you want to name business intent the agent can't infer.&lt;/p&gt;

&lt;p&gt;If you're evaluating whether the effort is worth it: the case that most clearly justifies it isn't the healthy baseline. It's the mixed scenario or the N+1, where the logs give you a number and the trace gives you a shape. The difference between guessing and diagnosing.&lt;/p&gt;

&lt;p&gt;After this lab, my rule is simple: logs tell you what happened; traces help you understand how it happened. If the flat log only gives you total duration, you don't have an explanation yet. You have a clue.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/opentelemetry-spring-boot-logs-vs-traces-diagnostico" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>experimentos</category>
      <category>backend</category>
    </item>
    <item>
      <title>Prisma vs JDBC: the benchmark that almost made me blame the wrong ORM</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 01:36:53 +0000</pubDate>
      <link>https://dev.to/jtorchia/prisma-vs-jdbc-the-benchmark-that-almost-made-me-blame-the-wrong-orm-585m</link>
      <guid>https://dev.to/jtorchia/prisma-vs-jdbc-the-benchmark-that-almost-made-me-blame-the-wrong-orm-585m</guid>
      <description>&lt;p&gt;There's a discussion that surfaces every time someone posts an ORM benchmark: "of course JDBC is faster, you're measuring the abstraction". They're right, but only halfway. What nobody says is that the abstraction isn't the only culprit — sometimes the culprit is you, because you let an N+1 slip through without noticing.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;prismavsjdbc&lt;/a&gt; to test this in a controlled way. It's not a benchmark about who wins. It's a lab where the same PostgreSQL 16, the same 50k-task dataset, and the same business scenarios run against two stacks: Node.js 24 LTS + TypeScript + Prisma 5 on one side, and Spring Boot 3 + Java 21 LTS + &lt;code&gt;JdbcTemplate&lt;/code&gt; on the other. The analyzed commit is &lt;code&gt;2cd33e32bd29a1d4b46a26af0b56d6a912f5e4f5&lt;/code&gt;, tag &lt;code&gt;best-effort-editorial-final&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The thesis I'm defending is this: &lt;strong&gt;query shape, SQL/request, and N+1 explain more than the slogan "ORM vs raw SQL"&lt;/strong&gt;. When you optimize the shape, both stacks improve. When you don't, both stacks charge you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem that almost made me draw the wrong conclusion
&lt;/h2&gt;

&lt;p&gt;The first version of the lab had an obvious trap, even though I didn't see it at first. It compared the most comfortable Prisma implementation — using &lt;code&gt;include&lt;/code&gt; to fetch relations — against a manual join in JDBC. The result was predictable: JDBC measured 1 SQL/request, idiomatic Prisma measured 4 SQL/request on &lt;code&gt;read-by-id&lt;/code&gt;, and latency reflected that.&lt;/p&gt;

&lt;p&gt;Incorrect conclusion I almost published: "Prisma is slower because it emits more queries".&lt;/p&gt;

&lt;p&gt;Correct conclusion: I was comparing different shapes. Prisma's &lt;code&gt;include&lt;/code&gt; fires separate queries per relation — that's not a bug, it's the documented contract of the API. JDBC did a join because I wrote it that way. It's not fair to compare them without acknowledging that.&lt;/p&gt;
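
&lt;p&gt;To make the two shapes concrete, here's a sketch of both call styles. The model and relation names (&lt;code&gt;task&lt;/code&gt;, &lt;code&gt;project&lt;/code&gt;, &lt;code&gt;assignee&lt;/code&gt;) are illustrative, not lifted from the lab's schema file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Idiomatic shape: include fetches relations in separate queries (the documented contract).
const idiomatic = await prisma.task.findUnique({
  where: { id: taskId },
  include: { project: true, assignee: true }, // each relation can add a round trip
});

// Best-effort shape: drop to $queryRaw and write the join yourself: 1 SQL/request.
const bestEffort = await prisma.$queryRaw`
  select t.id, t.title, p.name as "projectName", u.display_name as "assigneeName"
  from tasks t
  join projects p on p.id = t.project_id
  join users u on u.id = t.assignee_id
  where t.id = ${taskId}
`;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;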

&lt;p&gt;That friction changed the entire lab design: I needed three levels within each stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three levels: naive, idiomatic, best-effort
&lt;/h2&gt;

&lt;p&gt;Adding the &lt;code&gt;level&lt;/code&gt; column to &lt;code&gt;results/comparison.csv&lt;/code&gt; was the most important decision in the project. Without it, any results table is a trap for the reader.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;naive&lt;/strong&gt;: the most direct implementation possible, with no thought given to performance. In both stacks, this includes deliberate N+1 — per-task queries inside a loop (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;idiomatic&lt;/strong&gt;: the normal, maintainable way to write code in each stack. Prisma with &lt;code&gt;include&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt;, JDBC with the join any Java dev would write without obsessing over micro-optimizations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;best-effort&lt;/strong&gt;: the tightest code the team would accept without it becoming a hack. For Prisma, this means dropping to &lt;code&gt;$queryRaw&lt;/code&gt; when the shape is aggregational.&lt;/li&gt;
&lt;/ul&gt;
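
&lt;p&gt;To make the naive level concrete, here is a minimal sketch of its Prisma flavor; the model names are illustrative, not the repo's actual code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sketch: the deliberate N+1 of the naive level (illustrative model names)
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export async function naiveTaskList(taskIds: string[]) {
  const rows = [];
  for (const id of taskIds) {
    // one query for the task...
    const task = await prisma.task.findUnique({ where: { id } });
    if (!task) continue;
    // ...and one more per relation, per iteration: SQL/request grows with N
    const project = await prisma.project.findUnique({
      where: { id: task.projectId },
    });
    rows.push({ ...task, projectName: project?.name });
  }
  return rows;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;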

&lt;p&gt;The &lt;code&gt;read-by-id&lt;/code&gt; scenario with idiomatic Prisma measured 4 SQL/request due to &lt;code&gt;include&lt;/code&gt;. The &lt;code&gt;read-by-id-best-effort&lt;/code&gt; variant with &lt;code&gt;$queryRaw&lt;/code&gt; dropped to 1 SQL/request — the same join JDBC uses. The PostgreSQL plan for that query is clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- read-by-id-best-effort: same SQL in Prisma $queryRaw and in JdbcTemplate&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeName"&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;organizations&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;organization_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assignee_id&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'00000000-0000-4000-0100-000000000001'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Execution Time: 0.242 ms, Buffers: shared hit=9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Prisma and JDBC emit the same SQL, the PostgreSQL plan is identical. That closes the runtime debate: the bottleneck was the shape, not the client.&lt;/p&gt;

&lt;h2&gt;
  
  
  N+1 is the usual villain, but the lab shows it with numbers
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;n-plus-one-trap&lt;/code&gt; scenario exists to make explicit something every developer knows in theory but underestimates in practice. The naive level in both stacks fires individual queries per task — on a 50k-task dataset with concurrency 16, that scales brutally.&lt;/p&gt;

&lt;p&gt;The biggest jump in the lab wasn't between Prisma and JDBC. It was between naive and idiomatic within Prisma. When you go from N+1 to &lt;code&gt;include/_count&lt;/code&gt;, the reduction in SQL/request is immediate and visible in latency. After that, if you want to squeeze more, &lt;code&gt;$queryRaw&lt;/code&gt; gives you another jump — but smaller than the first.&lt;/p&gt;
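
&lt;p&gt;For completeness, the idiomatic step that removes that N+1 looks roughly like this; again a sketch with illustrative model names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sketch: the include/_count shape of the idiomatic level
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export async function projectSummaries() {
  // one findMany instead of one count query per project row
  return prisma.project.findMany({
    include: { _count: { select: { tasks: true } } },
  });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;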

&lt;p&gt;The interesting part on the Java side is that &lt;code&gt;CountingJdbc&lt;/code&gt; — the wrapper over &lt;code&gt;JdbcTemplate&lt;/code&gt; in &lt;code&gt;apps/jdbc-service/src/main/java/com/example/jdbclab/CountingJdbc.java&lt;/code&gt; — uses an &lt;code&gt;AtomicLong&lt;/code&gt; to count queries. That allows an objective SQL/request comparison without relying on logs or &lt;code&gt;pg_stat_statements&lt;/code&gt; as the primary source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CountingJdbc.java — instrumentation with no magic, easy to audit&lt;/span&gt;
&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CountingJdbc&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;JdbcTemplate&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RowMapper&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// each call to the wrapper adds 1 to the counter&lt;/span&gt;
    &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementAndGet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the Prisma side, the equivalent lives in &lt;code&gt;apps/prisma-client/src/db.ts&lt;/code&gt;: it hooks into the client's &lt;code&gt;query&lt;/code&gt; event to count. That symmetry in instrumentation is what makes the SQL/request numbers comparable across stacks.&lt;/p&gt;
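
&lt;p&gt;The repo's &lt;code&gt;db.ts&lt;/code&gt; isn't reproduced here, but the idea can be sketched with Prisma's &lt;code&gt;query&lt;/code&gt; log event; the wrapper shape below is mine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sketch: counting emitted SQL on the Prisma side, mirroring CountingJdbc
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient({
  // emit the query log as an event instead of printing it
  log: [{ emit: 'event', level: 'query' }],
});

let queryCount = 0;

// one event per SQL statement the engine actually runs
prisma.$on('query', () =&amp;gt; {
  queryCount += 1;
});

export const resetQueryCount = () =&amp;gt; { queryCount = 0; };
export const currentQueryCount = () =&amp;gt; queryCount;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;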

&lt;h2&gt;
  
  
  When $queryRaw makes sense and when it's a surrender
&lt;/h2&gt;

&lt;p&gt;This is the part where a lot of Prisma posts aren't honest. &lt;code&gt;$queryRaw&lt;/code&gt; exists and is valid, but using it for everything is admitting you don't want to use Prisma — you're using PostgreSQL with a fancy TypeScript client.&lt;/p&gt;

&lt;p&gt;The decision in the lab was clear: best-effort with &lt;code&gt;$queryRaw&lt;/code&gt; makes sense in &lt;code&gt;relation-summary&lt;/code&gt; and &lt;code&gt;report-aggregation&lt;/code&gt; because the shape is genuinely aggregational. Prisma &lt;code&gt;groupBy&lt;/code&gt; doesn't cleanly express &lt;code&gt;date_trunc&lt;/code&gt; + join by organization, and forcing it would be worse than writing SQL.&lt;/p&gt;
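
&lt;p&gt;The exact raw queries live in the repo; as a sketch of the shape in question, a &lt;code&gt;date_trunc&lt;/code&gt; + join aggregation through &lt;code&gt;$queryRaw&lt;/code&gt; looks like this (the query itself is illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sketch: an aggregational shape expressed via $queryRaw
import { Prisma, PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

export async function tasksPerOrgPerWeek() {
  // cast count(*) to int so the driver doesn't hand back bigint
  return prisma.$queryRaw(Prisma.sql`
    select o.name as "organizationName",
           date_trunc('week', t.created_at) as "week",
           count(*)::int as "tasks"
    from tasks t
    join projects p on p.id = t.project_id
    join organizations o on o.id = p.organization_id
    group by 1, 2
    order by 2
  `);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;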

&lt;p&gt;By contrast, &lt;code&gt;paginated-list&lt;/code&gt; has no best-effort variant because idiomatic Prisma already emits 1 SQL/request with &lt;code&gt;findMany&lt;/code&gt; and filters. Adding &lt;code&gt;$queryRaw&lt;/code&gt; there wouldn't change anything meaningful — it would be complexity with no benefit.&lt;/p&gt;

&lt;p&gt;The table in &lt;code&gt;docs/brief-post.md&lt;/code&gt; models this well: the &lt;code&gt;level&lt;/code&gt; column isn't a scale of "how much effort you put in" but of "how much the SQL shape changes when you apply the variant".&lt;/p&gt;

&lt;h2&gt;
  
  
  What the lab can't guarantee
&lt;/h2&gt;

&lt;p&gt;The HTTP runner is homegrown — not k6 or wrk. The hardware is local. Docker Desktop, GC, plan cache, and indexes can shift absolute latencies between runs. The editorial run used 3 runs, 300 requests per run, 30 warmup requests, concurrency 16, and a 50k-task dataset — but the same configuration on different hardware can produce different results.&lt;/p&gt;

&lt;p&gt;The version matrix (&lt;code&gt;docs/java-version-matrix.md&lt;/code&gt;) shows Java 21 vs Java 25: there are differences, but the main argument — that N+1 and SQL/request dominate — holds on both JVMs. Java 25 improved &lt;code&gt;read-by-id&lt;/code&gt; by ~20% over Java 21 in the local run, but that doesn't change the fact that the problem in &lt;code&gt;relation-summary-naive&lt;/code&gt; was the shape, not the JVM.&lt;/p&gt;

&lt;p&gt;I wouldn't publish those absolute numbers as universal truth. I publish them as evidence of a pattern: when you change the shape, the delta is orders of magnitude larger than when you change the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The position I landed on
&lt;/h2&gt;

&lt;p&gt;Prisma is not slow. Prisma with &lt;code&gt;include&lt;/code&gt; emitting 4 queries where you could emit 1 is an ergonomics trade-off with an observable cost — and that cost is worth it for most endpoints in an API that isn't under extreme pressure. When shape genuinely matters, &lt;code&gt;$queryRaw&lt;/code&gt; exists and works well.&lt;/p&gt;

&lt;p&gt;JDBC with &lt;code&gt;JdbcTemplate&lt;/code&gt; is not superior just because it's raw SQL. It's predictable because the developer controls the shape from the start. The risk is on the other side: that nobody checks whether those Java loops are also doing N+1 without an ORM to blame.&lt;/p&gt;

&lt;p&gt;The lab is reproducible. If you have Docker, Node 24 LTS, and Java 21 or 25, you can run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# full editorial run — Bash&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; editorial &lt;span class="nt"&gt;--size&lt;/span&gt; editorial &lt;span class="nt"&gt;--runs&lt;/span&gt; 3 &lt;span class="nt"&gt;--requests&lt;/span&gt; 300 &lt;span class="nt"&gt;--warmup&lt;/span&gt; 30 &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you just want to verify the scenarios run without errors before committing time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# quick smoke test to validate the setup&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; smoke &lt;span class="nt"&gt;--size&lt;/span&gt; small
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code is at &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;github.com/JuanTorchia/prismavsjdbc&lt;/a&gt;. Editorial results are in &lt;code&gt;results/comparison.csv&lt;/code&gt; and &lt;code&gt;results/comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I'd like to know: in the stack you're using right now, do you have real visibility into the SQL/request count for each endpoint? Or do you assume the ORM handles it on its own?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/prisma-vs-jdbc-benchmark-query-shape-n1" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>experiments</category>
      <category>typescript</category>
      <category>performance</category>
    </item>
    <item>
      <title>Prisma vs JDBC: the benchmark that almost made me blame the wrong ORM</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Sat, 16 May 2026 01:36:44 +0000</pubDate>
      <link>https://dev.to/jtorchia/prisma-vs-jdbc-el-benchmark-que-casi-me-hace-culpar-al-orm-equivocado-12hc</link>
      <guid>https://dev.to/jtorchia/prisma-vs-jdbc-el-benchmark-que-casi-me-hace-culpar-al-orm-equivocado-12hc</guid>
      <description>&lt;p&gt;Hay una discusión que aparece cada vez que alguien postea un benchmark de ORM: "claro que JDBC es más rápido, estás midiendo la abstracción". Y tienen razón, pero solo a medias. Lo que nadie dice es que la abstracción no es el único culpable — a veces el culpable sos vos, que dejaste pasar un N+1 sin darte cuenta.&lt;/p&gt;

&lt;p&gt;Armé &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;prismavsjdbc&lt;/a&gt; para probar esto de forma controlada. No es un benchmark de quién gana. Es un laboratorio donde el mismo PostgreSQL 16, el mismo dataset de 50k tasks y los mismos casos de negocio corren contra dos stacks: Node.js 24 LTS + TypeScript + Prisma 5 por un lado, y Spring Boot 3 + Java 21 LTS + &lt;code&gt;JdbcTemplate&lt;/code&gt; por el otro. El commit analizado es &lt;code&gt;2cd33e32bd29a1d4b46a26af0b56d6a912f5e4f5&lt;/code&gt;, tag &lt;code&gt;best-effort-editorial-final&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;La tesis que defiendo es esta: &lt;strong&gt;query shape, SQL/request y N+1 explican más que el slogan "ORM vs SQL directo"&lt;/strong&gt;. Cuando optimizás el shape, los dos stacks mejoran. Cuando no, los dos te cobran.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem that almost made me draw the wrong conclusion
&lt;/h2&gt;

&lt;p&gt;The first version of the lab had an obvious trap, even though I didn't see it at first. It compared the most comfortable Prisma implementation — using &lt;code&gt;include&lt;/code&gt; to fetch relations — against a manual join in JDBC. The result was predictable: JDBC measured 1 SQL/request, idiomatic Prisma measured 4 SQL/request on &lt;code&gt;read-by-id&lt;/code&gt;, and latency reflected that.&lt;/p&gt;

&lt;p&gt;Incorrect conclusion I almost published: "Prisma is slower because it emits more queries".&lt;/p&gt;

&lt;p&gt;Correct conclusion: I was comparing different shapes. Prisma's &lt;code&gt;include&lt;/code&gt; fires separate queries per relation — that's not a bug, it's the documented contract of the API. JDBC did a join because I wrote it that way. It's not fair to compare them without acknowledging that.&lt;/p&gt;

&lt;p&gt;That friction changed the entire lab design: I needed three levels within each stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three levels: naive, idiomatic, best-effort
&lt;/h2&gt;

&lt;p&gt;Adding the &lt;code&gt;level&lt;/code&gt; column to &lt;code&gt;results/comparison.csv&lt;/code&gt; was the most important decision in the project. Without it, any results table is a trap for the reader.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;naive&lt;/strong&gt;: the most direct implementation possible, with no thought given to performance. In both stacks, this includes deliberate N+1 — per-task queries inside a loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;idiomatic&lt;/strong&gt;: the normal, maintainable way to write code in each stack. Prisma with &lt;code&gt;include&lt;/code&gt; and &lt;code&gt;_count&lt;/code&gt;, JDBC with the join any Java dev would write without obsessing over micro-optimizations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;best-effort&lt;/strong&gt;: the tightest code the team would accept without it becoming a hack. For Prisma, this means dropping to &lt;code&gt;$queryRaw&lt;/code&gt; when the shape is aggregational.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;read-by-id&lt;/code&gt; scenario with idiomatic Prisma measured 4 SQL/request due to &lt;code&gt;include&lt;/code&gt;. The &lt;code&gt;read-by-id-best-effort&lt;/code&gt; variant with &lt;code&gt;$queryRaw&lt;/code&gt; dropped to 1 SQL/request — the same join JDBC uses. The PostgreSQL plan for that query is clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- read-by-id-best-effort: mismo SQL en Prisma $queryRaw y en JdbcTemplate&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"projectName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"organizationName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;"assigneeName"&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;projects&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;organizations&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;organization_id&lt;/span&gt;
&lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assignee_id&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'00000000-0000-4000-0100-000000000001'&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Execution Time: 0.242 ms, Buffers: shared hit=9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Prisma and JDBC emit the same SQL, the PostgreSQL plan is identical. That closes the runtime debate: the bottleneck was the shape, not the client.&lt;/p&gt;

&lt;h2&gt;
  
  
  N+1 is the usual villain, but the lab shows it with numbers
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;n-plus-one-trap&lt;/code&gt; scenario exists to make explicit something every developer knows in theory but underestimates in practice. The naive level in both stacks fires individual queries per task — on a 50k-task dataset with concurrency 16, that scales brutally.&lt;/p&gt;

&lt;p&gt;The biggest jump in the lab wasn't between Prisma and JDBC. It was between naive and idiomatic within Prisma. When you go from N+1 to &lt;code&gt;include/_count&lt;/code&gt;, the reduction in SQL/request is immediate and visible in latency. After that, if you want to squeeze more, &lt;code&gt;$queryRaw&lt;/code&gt; gives you another jump — but smaller than the first.&lt;/p&gt;

&lt;p&gt;The interesting part on the Java side is that &lt;code&gt;CountingJdbc&lt;/code&gt; — the wrapper over &lt;code&gt;JdbcTemplate&lt;/code&gt; in &lt;code&gt;apps/jdbc-service/src/main/java/com/example/jdbclab/CountingJdbc.java&lt;/code&gt; — uses an &lt;code&gt;AtomicLong&lt;/code&gt; to count queries. That allows an objective SQL/request comparison without relying on logs or &lt;code&gt;pg_stat_statements&lt;/code&gt; as the primary source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CountingJdbc.java — instrumentación sin magia, fácil de auditar&lt;/span&gt;
&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CountingJdbc&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;JdbcTemplate&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AtomicLong&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;RowMapper&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// cada llamada al wrapper suma 1 al contador&lt;/span&gt;
    &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;incrementAndGet&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;jdbc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mapper&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;queryCount&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
  &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On the Prisma side, the equivalent lives in &lt;code&gt;apps/prisma-client/src/db.ts&lt;/code&gt;: it hooks into the client's &lt;code&gt;query&lt;/code&gt; event to count. That symmetry in instrumentation is what makes the SQL/request numbers comparable across stacks.&lt;/p&gt;

&lt;h2&gt;
  
  
  When $queryRaw makes sense and when it's a surrender
&lt;/h2&gt;

&lt;p&gt;This is the part where a lot of Prisma posts aren't honest. &lt;code&gt;$queryRaw&lt;/code&gt; exists and is valid, but using it for everything is admitting you don't want to use Prisma — you're using PostgreSQL with a fancy TypeScript client.&lt;/p&gt;

&lt;p&gt;The decision in the lab was clear: best-effort with &lt;code&gt;$queryRaw&lt;/code&gt; makes sense in &lt;code&gt;relation-summary&lt;/code&gt; and &lt;code&gt;report-aggregation&lt;/code&gt; because the shape is genuinely aggregational. Prisma &lt;code&gt;groupBy&lt;/code&gt; doesn't cleanly express &lt;code&gt;date_trunc&lt;/code&gt; + join by organization, and forcing it would be worse than writing SQL.&lt;/p&gt;

&lt;p&gt;By contrast, &lt;code&gt;paginated-list&lt;/code&gt; has no best-effort variant because idiomatic Prisma already emits 1 SQL/request with &lt;code&gt;findMany&lt;/code&gt; and filters. Adding &lt;code&gt;$queryRaw&lt;/code&gt; there wouldn't change anything meaningful — it would be complexity with no benefit.&lt;/p&gt;

&lt;p&gt;The table in &lt;code&gt;docs/brief-post.md&lt;/code&gt; models this well: the &lt;code&gt;level&lt;/code&gt; column isn't a scale of "how much effort you put in" but of "how much the SQL shape changes when you apply the variant".&lt;/p&gt;

&lt;h2&gt;
  
  
  What the lab can't guarantee
&lt;/h2&gt;

&lt;p&gt;The HTTP runner is homegrown — not k6 or wrk. The hardware is local. Docker Desktop, GC, plan cache, and indexes can shift absolute latencies between runs. The editorial run used 3 runs, 300 requests per run, 30 warmup requests, concurrency 16, and a 50k-task dataset — but the same configuration on different hardware can produce different results.&lt;/p&gt;

&lt;p&gt;The version matrix (&lt;code&gt;docs/java-version-matrix.md&lt;/code&gt;) shows Java 21 vs Java 25: there are differences, but the main argument — that N+1 and SQL/request dominate — holds on both JVMs. Java 25 improved &lt;code&gt;read-by-id&lt;/code&gt; by ~20% over Java 21 in the local run, but that doesn't change the fact that the problem in &lt;code&gt;relation-summary-naive&lt;/code&gt; was the shape, not the JVM.&lt;/p&gt;

&lt;p&gt;I wouldn't publish those absolute numbers as universal truth. I publish them as evidence of a pattern: when you change the shape, the delta is orders of magnitude larger than when you change the runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The position I landed on
&lt;/h2&gt;

&lt;p&gt;Prisma is not slow. Prisma with &lt;code&gt;include&lt;/code&gt; emitting 4 queries where you could emit 1 is an ergonomics trade-off with an observable cost — and that cost is worth it for most endpoints in an API that isn't under extreme pressure. When shape genuinely matters, &lt;code&gt;$queryRaw&lt;/code&gt; exists and works well.&lt;/p&gt;

&lt;p&gt;JDBC with &lt;code&gt;JdbcTemplate&lt;/code&gt; is not superior just because it's raw SQL. It's predictable because the developer controls the shape from the start. The risk is on the other side: that nobody checks whether those Java loops are also doing N+1 without an ORM to blame.&lt;/p&gt;

&lt;p&gt;The lab is reproducible. If you have Docker, Node 24 LTS, and Java 21 or 25, you can run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# corrida editorial completa — Bash&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; editorial &lt;span class="nt"&gt;--size&lt;/span&gt; editorial &lt;span class="nt"&gt;--runs&lt;/span&gt; 3 &lt;span class="nt"&gt;--requests&lt;/span&gt; 300 &lt;span class="nt"&gt;--warmup&lt;/span&gt; 30 &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 16
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if you just want to verify the scenarios run without errors before committing time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# smoke rápido para validar el setup&lt;/span&gt;
bash scripts/run-lab.sh &lt;span class="nt"&gt;--mode&lt;/span&gt; smoke &lt;span class="nt"&gt;--size&lt;/span&gt; small
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code is at &lt;a href="https://github.com/JuanTorchia/prismavsjdbc" rel="noopener noreferrer"&gt;github.com/JuanTorchia/prismavsjdbc&lt;/a&gt;. Editorial results are in &lt;code&gt;results/comparison.csv&lt;/code&gt; and &lt;code&gt;results/comparison.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I'd like to know: in the stack you're using right now, do you have real visibility into the SQL/request count for each endpoint? Or do you assume the ORM handles it on its own?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/prisma-vs-jdbc-benchmark-query-shape-n1" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>experimentos</category>
      <category>typescript</category>
    </item>
    <item>
      <title>Retry isn't free: budget, amplification, and the cost that never shows up in p95</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Fri, 15 May 2026 15:55:35 +0000</pubDate>
      <link>https://dev.to/jtorchia/retry-isnt-free-budget-amplification-and-the-cost-that-never-shows-up-in-p95-ae9</link>
      <guid>https://dev.to/jtorchia/retry-isnt-free-budget-amplification-and-the-cost-that-never-shows-up-in-p95-ae9</guid>
      <description>&lt;p&gt;There's a decision I've gotten wrong more than once: adding retry as if it were a free improvement. Configure three attempts with exponential backoff, the system looks more stable on the dashboard, done. What I wasn't watching was how many extra calls I was sending to the downstream on every failure.&lt;/p&gt;

&lt;p&gt;This post comes from an experiment I built to measure exactly that: when retry buys real availability, when it multiplies pressure, and when it simply changes nothing because the problem isn't transient. The repo is &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;, commit &lt;code&gt;bdfc350&lt;/code&gt;, with Spring Boot 3.3.5, Java 21, Resilience4j 2.2.0, and k6 as the load generator.&lt;/p&gt;

&lt;p&gt;My thesis is simple: retry is budget. Each extra attempt consumes user wait time, hits the real downstream, and can accelerate a degradation that was already in progress. It's not a feature you flip on and call it done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with only looking at success rate
&lt;/h2&gt;

&lt;p&gt;When the downstream has simulated random failures at 35%, the difference between policies is visible. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, the success rate in that run was &lt;code&gt;0.6529&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt;, it climbed to &lt;code&gt;0.955&lt;/code&gt;. That looks like a clear win.&lt;/p&gt;

&lt;p&gt;But the number that matters is right next to it: &lt;code&gt;retry_amplification_factor&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt; on &lt;code&gt;random-failures&lt;/code&gt; it reached &lt;code&gt;1.465&lt;/code&gt;. That means for every user request, the system made 1.465 real calls to the downstream. In &lt;code&gt;jitter-random-failures&lt;/code&gt; it was &lt;code&gt;1.471&lt;/code&gt;. The downstream received almost 47% more traffic than k6 generated.&lt;/p&gt;

&lt;p&gt;For transient failures that might be acceptable. The downstream is failing for external reasons, retries land at different moments, and the outcome improves. But that 47% extra isn't abstract: downstream capacity has to exist to absorb it. If the service is already at its limit, that overhead is the nudge that tips it over.&lt;/p&gt;

&lt;p&gt;The metric the repo defines as a contract for not fooling yourself is exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MetricSnapshot.java — this line exists to prevent self-deception&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;retryAmplificationFactor&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// downstream_calls / total_requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you only look at &lt;code&gt;successRate&lt;/code&gt; and &lt;code&gt;errorRate&lt;/code&gt;, you can believe you won when you actually pushed 47% more load onto a system that was already struggling.&lt;/p&gt;
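
&lt;p&gt;The arithmetic is trivial to make executable. A language-agnostic sketch (TypeScript here; the names and the example threshold are mine, not the repo's):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sketch: treating amplification as a budget check, not just a report column
interface RunMetrics {
  totalRequests: number;   // requests the load generator sent
  downstreamCalls: number; // real calls the system made downstream
}

function amplification(m: RunMetrics): number {
  return m.downstreamCalls / m.totalRequests;
}

// hypothetical gate: fail the run when retries cost more downstream
// capacity than was budgeted for
function withinBudget(m: RunMetrics, maxAmplification = 1.5): boolean {
  return amplification(m) &amp;lt;= maxAmplification;
}

// the immediate-retry numbers from the progressive-degradation run below
console.log(amplification({ totalRequests: 2939, downstreamCalls: 8699 })); // ≈ 2.96
console.log(withinBudget({ totalRequests: 2939, downstreamCalls: 8699 }));  // false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;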

&lt;h2&gt;
  
  
  progressive-degradation: where retry can accelerate the collapse
&lt;/h2&gt;

&lt;p&gt;This scenario is the most interesting one methodologically, and also the one with the most important warning.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;PROGRESSIVE_DEGRADATION&lt;/code&gt; downstream implements this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DownstreamScenario.java — delay grows with each real call received&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="no"&gt;PROGRESSIVE_DEGRADATION&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;callNumber&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The delay isn't external or fixed: it grows with &lt;code&gt;callNumber&lt;/code&gt;, which is the counter of real calls to the downstream. That means a policy with more retries generates more calls, and those calls accelerate the degradation. It's not the same failure for everyone: policies with retry degrade faster because they push harder.&lt;/p&gt;

&lt;p&gt;The numbers from the run show this clearly. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, &lt;code&gt;7720&lt;/code&gt; total requests were processed and &lt;code&gt;7720&lt;/code&gt; downstream calls were initiated. With &lt;code&gt;immediate-retry&lt;/code&gt;, total requests dropped to &lt;code&gt;2939&lt;/code&gt; but downstream calls went up to &lt;code&gt;8699&lt;/code&gt;, with an amplification factor of &lt;code&gt;2.96&lt;/code&gt;. The retry policy processed fewer user requests but made more downstream calls.&lt;/p&gt;

&lt;p&gt;To be clear: this isn't a design flaw, it's the point of the experiment. The lab documents it explicitly in &lt;code&gt;docs/brief-post.md&lt;/code&gt;: &lt;code&gt;progressive-degradation&lt;/code&gt; should be read as load-sensitive degradation, not as an identical external failure for all policies. If you treat it as a direct comparison between policies under the same conditions, the conclusion is framed wrong from the start.&lt;/p&gt;

&lt;p&gt;What you can conclude: in scenarios where the degradation rate depends on the volume of calls received, retries can be an accelerant. That has a name in production: retry storm. And the lab reproduces it in a controlled way.&lt;/p&gt;
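
&lt;p&gt;The feedback loop is easy to reproduce in miniature. A sketch that reuses the lab's delay formula; the simulation loop itself is mine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sketch: load-sensitive degradation, using min(900, 80 + callNumber * 3)
let callNumber = 0;

function downstreamDelayMs(): number {
  callNumber += 1;
  return Math.min(900, 80 + callNumber * 3);
}

// same user load, different attempt counts per request
function lastDelayAfter(requests: number, attemptsPerRequest: number): number {
  callNumber = 0;
  let delay = 0;
  for (let i = 0; i &amp;lt; requests; i++) {
    for (let a = 0; a &amp;lt; attemptsPerRequest; a++) delay = downstreamDelayMs();
  }
  return delay;
}

console.log(lastDelayAfter(100, 1)); // 380 ms after 100 requests, no retry
console.log(lastDelayAfter(100, 3)); // 900 ms (capped): retries accelerated the decay
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;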

&lt;h2&gt;
  
  
  The percentiles that lie to you when there are timeouts
&lt;/h2&gt;

&lt;p&gt;There's a technical detail that changed how I read the results, and the README documents it honestly.&lt;/p&gt;

&lt;p&gt;The caller timeout is implemented with &lt;code&gt;future.cancel(true)&lt;/code&gt; in the &lt;code&gt;RetryExecutor&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// RetryExecutor.java — cancel(true) interrupts the attempt from the caller side&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toMillis&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MILLISECONDS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TimeoutException&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;cancel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"timeout"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an attempt exceeds the timeout, the latency recorded for that attempt is capped by the caller timeout: &lt;code&gt;STANDARD_TIMEOUT = Duration.ofMillis(260)&lt;/code&gt;. That's why in &lt;code&gt;progressive-degradation&lt;/code&gt; almost all &lt;code&gt;all_attempt_p95_ms&lt;/code&gt; and &lt;code&gt;all_attempt_p99_ms&lt;/code&gt; values show exactly &lt;code&gt;260&lt;/code&gt;. It's not that the downstream responded in 260 ms: it's that the caller stopped waiting at 260 ms and recorded that as the attempt latency.&lt;/p&gt;

&lt;p&gt;What happens after the &lt;code&gt;cancel(true)&lt;/code&gt; in the simulated downstream isn't fully modeled. In a real system with HTTP, a database, or a queue, the downstream may keep executing work even after the client has given up. The lab counts initiated calls but can't guarantee there's no residual work post-cancellation.&lt;/p&gt;

&lt;p&gt;This also matters for reading &lt;code&gt;successful_requests_per_second&lt;/code&gt;. The value of &lt;code&gt;0.95&lt;/code&gt; that appears across several &lt;code&gt;progressive-degradation&lt;/code&gt; scenarios isn't the system's maximum capacity: it's the useful work observed under that closed k6 load. With a different VU configuration, a different duration, or a real network, the numbers would differ.&lt;/p&gt;
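
&lt;p&gt;The capping effect is mechanical and worth seeing in isolation; in this sketch the downstream latencies are invented for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// sketch: why attempt percentiles pin at the caller timeout
const TIMEOUT_MS = 260;

// hypothetical real downstream latencies during degradation
const realDownstreamMs = [120, 180, 400, 650, 900, 900, 900, 900];

// the caller stops waiting at the timeout, so that's what gets recorded
const recordedMs = realDownstreamMs.map((ms) =&amp;gt; Math.min(ms, TIMEOUT_MS));

console.log(recordedMs); // [120, 180, 260, 260, 260, 260, 260, 260]
// once most attempts exceed the timeout, p95 and p99 read exactly 260
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;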

&lt;h2&gt;
  
  
  circuit-breaker and bulkhead: visible rejections as a protection signal
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;progressive-degradation&lt;/code&gt;, the circuit breaker produces something that looks contradictory at first glance. The &lt;code&gt;13-circuit-breaker-progressive-degradation&lt;/code&gt; run has &lt;code&gt;total_requests = 44777&lt;/code&gt; and &lt;code&gt;circuit_breaker_rejected = 44718&lt;/code&gt;. The error rate is &lt;code&gt;0.9987&lt;/code&gt;. That looks catastrophic.&lt;/p&gt;

&lt;p&gt;But look at the downstream calls: &lt;code&gt;198&lt;/code&gt;. Amplification factor: &lt;code&gt;0.004&lt;/code&gt;. The circuit breaker almost completely stopped sending calls to the downstream. The rejections are visible to the client, but the downstream is protected.&lt;/p&gt;

&lt;p&gt;Compare that with &lt;code&gt;immediate-retry-progressive-degradation&lt;/code&gt;, which has &lt;code&gt;downstream_calls = 8699&lt;/code&gt; and keeps failing at the same rate, and the trade-off becomes obvious. The circuit breaker chooses to reject fast rather than multiply pressure on something that can no longer respond.&lt;/p&gt;

&lt;p&gt;The bulkhead in the same run shows a different variant: &lt;code&gt;bulkhead_rejected = 22122&lt;/code&gt; with &lt;code&gt;downstream_calls = 3668&lt;/code&gt;. It limits concurrency instead of opening the circuit, but the effect is similar: it reduces downstream pressure at the cost of visible rejections.&lt;/p&gt;

&lt;p&gt;Those concurrency signals (&lt;code&gt;max_inflight_downstream = 16&lt;/code&gt; for bulkhead, &lt;code&gt;40&lt;/code&gt; for most other runs) are observations, not proof of saturation. The lab renamed the metric from &lt;code&gt;saturationObservation&lt;/code&gt; to &lt;code&gt;concurrencyObservation&lt;/code&gt; for exactly that reason: high &lt;code&gt;max_inflight&lt;/code&gt; doesn't prove CPU, network, or connection pool saturation. It's a signal that invites investigation, not a conclusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I conclude and what I don't
&lt;/h2&gt;

&lt;p&gt;This experiment is a local simulation, a single published run, against a simulated downstream with in-memory delays. The numbers don't represent production, don't represent any real provider, and don't support claiming "this policy scales to X RPS". If you want to publish exact values with strong claims, the README says it clearly: run at least three &lt;code&gt;editorial&lt;/code&gt; runs and look for consistency, not a single pass.&lt;/p&gt;

&lt;p&gt;What I think can be sustained:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In transient failures, retry can improve success rate but always has an amplification factor greater than 1. That overhead exists and has to fit within the system.&lt;/li&gt;
&lt;li&gt;In load-sensitive degradation, more retries can accelerate the degradation because they generate more calls. This isn't universal, but the scenario is real and the experiment reproduces it.&lt;/li&gt;
&lt;li&gt;p95 and p99 of attempts don't tell you the real downstream latency when there are timeouts: they tell you how long the caller waited before giving up.&lt;/li&gt;
&lt;li&gt;Circuit breaker and bulkhead produce visible rejections that can be exactly the right decision to protect the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I don't conclude: that one policy is better than another in the abstract, that these numbers apply to a different system, or that &lt;code&gt;max_inflight_downstream&lt;/code&gt; proves saturation.&lt;/p&gt;

&lt;p&gt;The question I'm leaving open for further exploration: how much real residual work actually remains in the downstream after a &lt;code&gt;future.cancel(true)&lt;/code&gt; in a system with an HTTP connection pool? The lab notes it as a known limitation. In production that's exactly where the difference lies between a timeout that protects and one that only hides the problem.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;github.com/JuanTorchia/retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;. If you run it and get different numbers, I want to know.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/retry-backoff-jitter-spring-boot-amplification" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>experiments</category>
      <category>backend</category>
      <category>arquitectura</category>
    </item>
    <item>
      <title>Retry isn't free: budget, amplification, and the cost that never shows up in p95</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Fri, 15 May 2026 15:55:26 +0000</pubDate>
      <link>https://dev.to/jtorchia/retry-no-es-gratis-presupuesto-amplificacion-y-el-costo-que-no-aparece-en-el-p95-22no</link>
      <guid>https://dev.to/jtorchia/retry-no-es-gratis-presupuesto-amplificacion-y-el-costo-que-no-aparece-en-el-p95-22no</guid>
      <description>&lt;p&gt;Hay una decisión que tomé mal más de una vez: agregar retry como si fuera una mejora sin costo. Configuro tres intentos con backoff exponencial, el sistema se ve más estable en el dashboard, y listo. Lo que no estaba mirando era cuántas llamadas extra le estaba mandando al downstream en cada falla.&lt;/p&gt;

&lt;p&gt;Este post nace de un experimento que armé para medir eso con precisión: cuándo retry compra disponibilidad real, cuándo multiplica presión y cuándo simplemente no cambia nada porque el problema no es transitorio. El repo es &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;, commit &lt;code&gt;bdfc350&lt;/code&gt;, con Spring Boot 3.3.5, Java 21, Resilience4j 2.2.0 y k6 como generador de carga.&lt;/p&gt;

&lt;p&gt;Mi tesis es simple: retry es presupuesto. Cada intento extra consume tiempo de espera del usuario, llama al downstream real y puede acelerar una degradación que ya estaba en curso. No es una feature que activás y listo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with only looking at success rate
&lt;/h2&gt;

&lt;p&gt;When the downstream has simulated random failures at 35%, the difference between policies is visible. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, the success rate in that run was &lt;code&gt;0.6529&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt;, it climbed to &lt;code&gt;0.955&lt;/code&gt;. That looks like a clear win.&lt;/p&gt;

&lt;p&gt;But the number that matters is right next to it: &lt;code&gt;retry_amplification_factor&lt;/code&gt;. With &lt;code&gt;immediate-retry&lt;/code&gt; on &lt;code&gt;random-failures&lt;/code&gt; it reached &lt;code&gt;1.465&lt;/code&gt;. That means for every user request, the system made 1.465 real calls to the downstream. In &lt;code&gt;jitter-random-failures&lt;/code&gt; it was &lt;code&gt;1.471&lt;/code&gt;. The downstream received almost 47% more traffic than k6 generated.&lt;/p&gt;

&lt;p&gt;For transient failures that might be acceptable. The downstream is failing for external reasons, retries land at different moments, and the outcome improves. But that 47% extra isn't abstract: downstream capacity has to exist to absorb it. If the service is already at its limit, that overhead is the nudge that tips it over.&lt;/p&gt;

&lt;p&gt;The metric the repo defines as a contract for not fooling yourself is exactly that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// MetricSnapshot.java — la razón de esta línea es evitar autoengaño&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt; &lt;span class="n"&gt;retryAmplificationFactor&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// downstream_calls / total_requests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you only look at &lt;code&gt;successRate&lt;/code&gt; and &lt;code&gt;errorRate&lt;/code&gt;, you can believe you won when you actually pushed 47% more load onto a system that was already struggling.&lt;/p&gt;

&lt;h2&gt;
  
  
  progressive-degradation: where retry can accelerate the collapse
&lt;/h2&gt;

&lt;p&gt;This scenario is the most interesting one methodologically, and also the one with the most important warning.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;PROGRESSIVE_DEGRADATION&lt;/code&gt; downstream implements this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DownstreamScenario.java — el delay sube con cada llamada real recibida&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="no"&gt;PROGRESSIVE_DEGRADATION&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;min&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;900&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;callNumber&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The delay isn't external or fixed: it grows with &lt;code&gt;callNumber&lt;/code&gt;, which is the counter of real calls to the downstream. That means a policy with more retries generates more calls, and those calls accelerate the degradation. It's not the same failure for everyone: policies with retry degrade faster because they push harder.&lt;/p&gt;

&lt;p&gt;The numbers from the run show this clearly. With &lt;code&gt;no-retry-standard-timeout&lt;/code&gt;, &lt;code&gt;7720&lt;/code&gt; total requests were processed and &lt;code&gt;7720&lt;/code&gt; downstream calls were initiated. With &lt;code&gt;immediate-retry&lt;/code&gt;, total requests dropped to &lt;code&gt;2939&lt;/code&gt; but downstream calls went up to &lt;code&gt;8699&lt;/code&gt;, with an amplification factor of &lt;code&gt;2.96&lt;/code&gt;. The retry policy processed fewer user requests but made more downstream calls.&lt;/p&gt;

&lt;p&gt;To be clear: this isn't a design flaw, it's the point of the experiment. The lab documents it explicitly in &lt;code&gt;docs/brief-post.md&lt;/code&gt;: &lt;code&gt;progressive-degradation&lt;/code&gt; should be read as load-sensitive degradation, not as an identical external failure for all policies. If you treat it as a direct comparison between policies under the same conditions, the conclusion is framed wrong from the start.&lt;/p&gt;

&lt;p&gt;What you can conclude: in scenarios where the degradation rate depends on the volume of calls received, retries can be an accelerant. That has a name in production: retry storm. And the lab reproduces it in a controlled way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The percentiles that lie to you when there are timeouts
&lt;/h2&gt;

&lt;p&gt;There's a technical detail that changed how I read the results, and the README documents it honestly.&lt;/p&gt;

&lt;p&gt;The caller timeout is implemented with &lt;code&gt;future.cancel(true)&lt;/code&gt; in the &lt;code&gt;RetryExecutor&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// RetryExecutor.java — el cancel(true) interrumpe el intento desde el caller&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toMillis&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MILLISECONDS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"ok"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TimeoutException&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;cancel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;AttemptResult&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;elapsedMs&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"timeout"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an attempt hits the timeout, the latency recorded for that attempt is capped by the caller's timeout: &lt;code&gt;STANDARD_TIMEOUT = Duration.ofMillis(260)&lt;/code&gt;. That's why in &lt;code&gt;progressive-degradation&lt;/code&gt; almost all the &lt;code&gt;all_attempt_p95_ms&lt;/code&gt; and &lt;code&gt;all_attempt_p99_ms&lt;/code&gt; values show exactly &lt;code&gt;260&lt;/code&gt;. It's not that the downstream responded in 260 ms: it's that the caller stopped waiting at 260 ms and recorded that as the attempt's latency.&lt;/p&gt;
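
&lt;p&gt;The distortion is easy to reproduce in isolation. A minimal sketch, independent of the lab's code, assuming a 260 ms cap and a downstream that keeps getting slower:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch (not the lab's code): once most attempts exceed the caller
// timeout, every recorded latency is capped at 260 ms and p95 collapses onto
// the timeout value instead of reflecting the downstream's real latency.
public class CappedPercentiles {
    public static void main(String[] args) {
        List&amp;lt;Long&amp;gt; recorded = new ArrayList&amp;lt;&amp;gt;();
        for (int i = 0; i &amp;lt; 1000; i++) {
            long realDownstreamMs = 150 + i;                // steadily degrading downstream
            recorded.add(Math.min(260L, realDownstreamMs)); // caller stops waiting at 260 ms
        }
        Collections.sort(recorded);
        long p95 = recorded.get((int) (recorded.size() * 0.95) - 1);
        System.out.println("all_attempt_p95_ms = " + p95);  // prints 260
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;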

&lt;p&gt;What happens in the simulated downstream after the &lt;code&gt;cancel(true)&lt;/code&gt; isn't fully modeled. In a real system with HTTP, a database, or a queue, the downstream can keep doing work even though the client is no longer waiting. The lab counts started calls, but it can't guarantee there's no residual work after cancellation.&lt;/p&gt;

&lt;p&gt;This matters for reading &lt;code&gt;successful_requests_per_second&lt;/code&gt; too. The &lt;code&gt;0.95&lt;/code&gt; that shows up in several &lt;code&gt;progressive-degradation&lt;/code&gt; scenarios isn't the system's maximum capacity: it's the useful work observed under that closed-loop k6 load. With a different VU configuration, a different duration, or a real network, the numbers would change.&lt;/p&gt;

&lt;h2&gt;
  
  
  circuit-breaker and bulkhead: visible rejections as a protection signal
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;progressive-degradation&lt;/code&gt;, the circuit breaker produces something that looks contradictory at first glance. The run &lt;code&gt;13-circuit-breaker-progressive-degradation&lt;/code&gt; has &lt;code&gt;total_requests = 44777&lt;/code&gt; and &lt;code&gt;circuit_breaker_rejected = 44718&lt;/code&gt;. The error rate is &lt;code&gt;0.9987&lt;/code&gt;. That looks catastrophic.&lt;/p&gt;

&lt;p&gt;But look at the downstream calls: &lt;code&gt;198&lt;/code&gt;. Amplification factor: &lt;code&gt;0.004&lt;/code&gt;. The circuit breaker almost completely stopped sending calls to the downstream. The rejections are visible to the client, but the downstream is protected.&lt;/p&gt;

&lt;p&gt;If you compare with &lt;code&gt;immediate-retry-progressive-degradation&lt;/code&gt;, which has &lt;code&gt;downstream_calls = 8699&lt;/code&gt; and still fails just the same, the trade-off becomes obvious. The circuit breaker chooses to reject fast rather than multiply pressure on something that can no longer respond.&lt;/p&gt;

&lt;p&gt;The bulkhead in the same run shows a different variant: &lt;code&gt;bulkhead_rejected = 22122&lt;/code&gt; with &lt;code&gt;downstream_calls = 3668&lt;/code&gt;. It limits concurrency instead of opening the circuit, but the effect is similar: it reduces downstream pressure at the cost of visible rejections.&lt;/p&gt;
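
&lt;p&gt;The mechanism is simple enough to sketch. This is not the lab's implementation, just the shape of the idea: a semaphore caps in-flight calls, and anything beyond the cap is rejected immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import java.util.concurrent.Semaphore;

// Bulkhead sketch (not the lab's implementation): concurrency is capped with
// a semaphore, and anything beyond the cap becomes a visible, fast rejection
// instead of extra pressure on a degraded downstream.
class BulkheadSketch {
    private final Semaphore permits = new Semaphore(16); // max in-flight downstream calls

    String call() {
        if (!permits.tryAcquire()) {
            return "bulkhead_rejected"; // counted, visible to the client
        }
        try {
            return callDownstream();
        } finally {
            permits.release();
        }
    }

    private String callDownstream() { return "ok"; } // placeholder for the real call
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;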

&lt;p&gt;Those concurrency signals (&lt;code&gt;max_inflight_downstream = 16&lt;/code&gt; for bulkhead, &lt;code&gt;40&lt;/code&gt; for most of the other runs) are observations, not proof of saturation. The lab renamed the metric from &lt;code&gt;saturationObservation&lt;/code&gt; to &lt;code&gt;concurrencyObservation&lt;/code&gt; for exactly that reason: a high &lt;code&gt;max_inflight&lt;/code&gt; doesn't prove CPU, network, or connection-pool saturation. It's a signal that invites investigation, not a conclusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I conclude and what I don't
&lt;/h2&gt;

&lt;p&gt;This experiment is a local simulation, a single published run, against a simulated downstream with in-memory delays. The numbers don't represent production, don't represent any real provider, and don't let you claim "this policy scales to X RPS". If you want to publish exact values with strong claims, the README says it clearly: do at least three &lt;code&gt;editorial&lt;/code&gt; runs and look at consistency, not a single pass.&lt;/p&gt;

&lt;p&gt;What I do think holds up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Under transient failures, retry can improve the success rate, but it always carries an amplification factor greater than 1. That overhead exists and has to fit in the system.&lt;/li&gt;
&lt;li&gt;Under load-sensitive degradation, more retries can accelerate the degradation because they generate more calls. This isn't universal, but the scenario is real and the experiment reproduces it.&lt;/li&gt;
&lt;li&gt;Attempt p95 and p99 don't tell you the downstream's real latency when there are timeouts: they tell you how long the caller waited before giving up.&lt;/li&gt;
&lt;li&gt;Circuit breaker and bulkhead produce visible rejections that can be exactly the right decision to protect the system.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I don't conclude: that one policy is better than another in the abstract, that these numbers apply to another system, or that &lt;code&gt;max_inflight_downstream&lt;/code&gt; proves saturation.&lt;/p&gt;

&lt;p&gt;The question I'm leaving myself to keep exploring: how much real residual work remains in the downstream after a &lt;code&gt;future.cancel(true)&lt;/code&gt; in a system with an HTTP connection pool? The lab notes it as a known limitation. In production, that's exactly where the difference lies between a timeout that protects and one that merely hides the problem.&lt;/p&gt;

&lt;p&gt;The repo is at &lt;a href="https://github.com/JuanTorchia/retry-resilience-experiment" rel="noopener noreferrer"&gt;&lt;code&gt;github.com/JuanTorchia/retry-resilience-experiment&lt;/code&gt;&lt;/a&gt;. If you run it and get different numbers, I'd like to hear about it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article was originally published at &lt;a href="https://juanchi.dev/es/blog/retry-backoff-jitter-spring-boot-amplification" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>experimentos</category>
      <category>backend</category>
    </item>
    <item>
      <title>HikariCP: the p95 that lies to you and how to read the real pool signals</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Fri, 15 May 2026 03:11:26 +0000</pubDate>
      <link>https://dev.to/jtorchia/hikaricp-the-p95-that-lies-to-you-and-how-to-read-the-real-pool-signals-10eo</link>
      <guid>https://dev.to/jtorchia/hikaricp-the-p95-that-lies-to-you-and-how-to-read-the-real-pool-signals-10eo</guid>
      <description>&lt;h1&gt;
  
  
  HikariCP: the p95 that lies to you and how to read the real pool signals
&lt;/h1&gt;

&lt;p&gt;There was a version of this analysis that started wrong. I was looking at the p95 for the &lt;code&gt;tiny&lt;/code&gt; scenario with a 500ms delay and seeing &lt;code&gt;260.78ms&lt;/code&gt;. Compared to the &lt;code&gt;default&lt;/code&gt; scenario showing &lt;code&gt;2418.16ms&lt;/code&gt;, it looked almost five times faster. That's a classic trap, and I almost fell for it.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;tiny&lt;/code&gt; scenario had a 97.05% error rate. Out of 8139 attempts, 7899 failed. Those 260ms were the average rejection time, not the time for a useful response. It wasn't fast — it was failing fast. And that difference matters enormously when you're trying to understand whether your HikariCP configuration is working or not.&lt;/p&gt;

&lt;p&gt;That led me to build &lt;a href="https://github.com/JuanTorchia/hikaricp-pool-experiment" rel="noopener noreferrer"&gt;hikaricp-pool-experiment&lt;/a&gt;: a reproducible lab with Java 21, Spring Boot 3.4.5, PostgreSQL 16, HikariCP, Docker Compose, and k6 0.51.0. The goal wasn't to simulate production or document a real incident. It was to build an environment where pool signals would be visible and measurable, so I could reason about them with actual numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The experiment design
&lt;/h2&gt;

&lt;p&gt;The app exposes two endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GET /api/query?delayMs=500&lt;/code&gt;: executes a real query against PostgreSQL and holds the connection using &lt;code&gt;pg_sleep&lt;/code&gt; for the specified duration.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/pool&lt;/code&gt;: returns the pool state in real time — &lt;code&gt;active&lt;/code&gt;, &lt;code&gt;idle&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;threadsAwaitingConnection&lt;/code&gt;, and the effective configuration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;delayMs&lt;/code&gt; is the central mechanism of the experiment. An instant query can hide contention even at high concurrency because connections get released before the next request needs them. With &lt;code&gt;pg_sleep(0.5)&lt;/code&gt;, each connection stays occupied for half a second. With 50 virtual users hitting in parallel, pressure on the pool becomes visible quickly.&lt;/p&gt;

&lt;p&gt;The k6 script does something that the original draft didn't have cleanly separated: it records &lt;code&gt;query_duration&lt;/code&gt; for all attempts and &lt;code&gt;query_success_duration&lt;/code&gt; only for those that return HTTP 200. Without that distinction, the p95 aggregates fast rejections with slow successful queries and the resulting number doesn't represent any useful reality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// load/hikari-pool.js — critical separation between all attempts and successful ones&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;query status is 200&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;queryDuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;querySuccessDuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;queryErrors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scenarios defined in &lt;code&gt;application.yml&lt;/code&gt; are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;&lt;code&gt;maximumPoolSize&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;connectionTimeout&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;default&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;10 (Spring Boot default)&lt;/td&gt;
&lt;td&gt;30000ms (HikariCP default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tiny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;250ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool16&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool32&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The matrix was run with two delays — 50ms and 500ms — because the contrast matters: a query that releases its connection quickly and a query that holds it for half a second don't stress the pool the same way.&lt;/p&gt;

&lt;p&gt;To reproduce it from scratch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-matrix.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or scenario by scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;compose&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;down&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-v&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-scenario.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Scenario&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tiny&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DelayMs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;500&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-scenario.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Scenario&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pool16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DelayMs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;500&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important limitation:&lt;/strong&gt; all of this is a single local run from 2026-05-14 on Windows with Docker Desktop/WSL2. The numbers are useful for comparing scenarios within the same machine. They are not a universal benchmark and don't reflect behavior in any cloud environment, Railway, or otherwise. &lt;code&gt;pg_sleep&lt;/code&gt; holds connections artificially to make the pressure visible — it doesn't represent a real production workload.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The full results — and what to read in them
&lt;/h2&gt;

&lt;p&gt;This is the table generated by &lt;code&gt;summarize-results.ps1&lt;/code&gt; from the k6 JSON output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Delay&lt;/th&gt;
&lt;th&gt;Attempts&lt;/th&gt;
&lt;th&gt;Successful&lt;/th&gt;
&lt;th&gt;Failed&lt;/th&gt;
&lt;th&gt;Error rate&lt;/th&gt;
&lt;th&gt;Successful/s&lt;/th&gt;
&lt;th&gt;p95 all&lt;/th&gt;
&lt;th&gt;p95 successful&lt;/th&gt;
&lt;th&gt;Max active&lt;/th&gt;
&lt;th&gt;Max waiting&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;default&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;11772&lt;/td&gt;
&lt;td&gt;11772&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;195.38&lt;/td&gt;
&lt;td&gt;165.7ms&lt;/td&gt;
&lt;td&gt;165.7ms&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;default&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1240&lt;/td&gt;
&lt;td&gt;1240&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;19.85&lt;/td&gt;
&lt;td&gt;2418.16ms&lt;/td&gt;
&lt;td&gt;2418.16ms&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tiny&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;8289&lt;/td&gt;
&lt;td&gt;2325&lt;/td&gt;
&lt;td&gt;5964&lt;/td&gt;
&lt;td&gt;71.95%&lt;/td&gt;
&lt;td&gt;38.53&lt;/td&gt;
&lt;td&gt;298.81ms&lt;/td&gt;
&lt;td&gt;304.84ms&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;tiny&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;500ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8139&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;240&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7899&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;97.05%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.97&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;260.78ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;752.51ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;47&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool4&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;4712&lt;/td&gt;
&lt;td&gt;4712&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;77.75&lt;/td&gt;
&lt;td&gt;557.55ms&lt;/td&gt;
&lt;td&gt;557.55ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool4&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1779&lt;/td&gt;
&lt;td&gt;492&lt;/td&gt;
&lt;td&gt;1287&lt;/td&gt;
&lt;td&gt;72.34%&lt;/td&gt;
&lt;td&gt;7.95&lt;/td&gt;
&lt;td&gt;1962.83ms&lt;/td&gt;
&lt;td&gt;1990.52ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool8&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;9253&lt;/td&gt;
&lt;td&gt;9253&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;153.4&lt;/td&gt;
&lt;td&gt;365.15ms&lt;/td&gt;
&lt;td&gt;365.15ms&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool8&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1653&lt;/td&gt;
&lt;td&gt;984&lt;/td&gt;
&lt;td&gt;669&lt;/td&gt;
&lt;td&gt;40.47%&lt;/td&gt;
&lt;td&gt;15.87&lt;/td&gt;
&lt;td&gt;1996.36ms&lt;/td&gt;
&lt;td&gt;1998.83ms&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool16&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;18155&lt;/td&gt;
&lt;td&gt;18155&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;301.83&lt;/td&gt;
&lt;td&gt;82.92ms&lt;/td&gt;
&lt;td&gt;82.92ms&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool16&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1948&lt;/td&gt;
&lt;td&gt;1947&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.05%&lt;/td&gt;
&lt;td&gt;31.62&lt;/td&gt;
&lt;td&gt;1492.44ms&lt;/td&gt;
&lt;td&gt;1492.44ms&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool32&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;18892&lt;/td&gt;
&lt;td&gt;18892&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;314.16&lt;/td&gt;
&lt;td&gt;70.33ms&lt;/td&gt;
&lt;td&gt;70.33ms&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool32&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;3830&lt;/td&gt;
&lt;td&gt;3830&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;63.00&lt;/td&gt;
&lt;td&gt;784.9ms&lt;/td&gt;
&lt;td&gt;784.9ms&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There are several things worth reading together, not in isolation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The trap of a low p95 with a high error rate
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;tiny&lt;/code&gt; scenario with a 500ms delay is the most instructive in the experiment. The p95 for all attempts is &lt;code&gt;260.78ms&lt;/code&gt;. If you only look at that number, it looks like the pool responds very quickly. But the 97.05% error rate tells you that almost no query ever executed — HikariCP was rejecting requests after &lt;code&gt;connectionTimeout: 250ms&lt;/code&gt; because there were no free connections.&lt;/p&gt;

&lt;p&gt;The separation between &lt;code&gt;query_duration&lt;/code&gt; and &lt;code&gt;query_success_duration&lt;/code&gt; makes visible what the aggregated number was hiding: the p95 for &lt;strong&gt;successful&lt;/strong&gt; queries is &lt;code&gt;752.51ms&lt;/code&gt; — almost three times higher. Those few queries that did get a connection took nearly a second, probably because they had to wait for one of the two pool connections to be released.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;active&lt;/code&gt; is pinned at the pool maximum (2/2) and &lt;code&gt;waiting&lt;/code&gt; reaches 47, the system isn't processing load — it's rejecting it. The 260ms is the time to fail, not to succeed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signal that matters:&lt;/strong&gt; if &lt;code&gt;p95 all attempts&lt;/code&gt; ≪ &lt;code&gt;p95 successful&lt;/code&gt; and the error rate is high, the pool is in exhaustion. You're not seeing query latency — you're seeing rejection latency.&lt;/p&gt;
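
&lt;p&gt;If you want that rule as something executable, a minimal sketch (the thresholds and names are mine, not the lab's API):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Illustrative reading rule, not the repo's API: a p95 over all attempts far
// below the p95 of successful queries, combined with a high error rate, means
// you are measuring rejection speed rather than query latency.
static boolean looksLikeExhaustion(double p95AllMs, double p95OkMs, double errorRate) {
    return errorRate &amp;gt; 0.5 &amp;amp;&amp;amp; p95AllMs &amp;lt; 0.5 * p95OkMs;
}

// tiny @ 500ms delay: looksLikeExhaustion(260.78, 752.51, 0.9705) returns true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;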




&lt;h2&gt;
  
  
  How to read the four signals together
&lt;/h2&gt;

&lt;p&gt;The experiment confirmed that no single metric is enough. The signals that make sense to cross-reference are:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Error rate + successful queries/s
&lt;/h3&gt;

&lt;p&gt;These two together are the first filter. A 0% error rate with 19.85 successful/s (&lt;code&gt;default&lt;/code&gt;, 500ms delay) is very different from a 97% error rate with 3.97 successful/s (&lt;code&gt;tiny&lt;/code&gt;, 500ms delay). Successful throughput tells you how much useful work the system is doing; error rate tells you how much work it's throwing away.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;pool4&lt;/code&gt; at 500ms delay: 72.34% error rate with only 7.95 successful/s. Four connections with 500ms queries give a theoretical ceiling of 8 successful/s (4 connections × 2 per second). The numbers match — the pool is at its limit and rejects the rest.&lt;/p&gt;
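
&lt;p&gt;That ceiling generalizes. A back-of-the-envelope sketch (an idealization that ignores overhead, but useful for sanity checks):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Idealized throughput ceiling: each connection serves at most 1000 / holdMs
// queries per second, so the pool tops out at poolSize times that.
static double theoreticalCeilingPerSecond(int poolSize, long queryHoldMs) {
    return poolSize * (1000.0 / queryHoldMs);
}

// pool4 with 500ms queries: 4 * (1000 / 500) = 8.0, right where the observed 7.95/s sits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;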

&lt;h3&gt;
  
  
  2. &lt;code&gt;active = maximumPoolSize&lt;/code&gt; sustained + &lt;code&gt;waiting &amp;gt; 0&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This combination is the most direct operational signal that the pool is under pressure. When &lt;code&gt;maxActiveConnections&lt;/code&gt; hits the configured ceiling and &lt;code&gt;maxThreadsAwaitingConnection&lt;/code&gt; is greater than zero for a sustained period, application threads are waiting for a connection that isn't available.&lt;/p&gt;

&lt;p&gt;From the experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tiny&lt;/code&gt; 500ms delay: max active 2/2, max waiting 47. Pool exhausted from the start.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pool8&lt;/code&gt; 500ms delay: max active 8/8, max waiting 41, error rate 40.47%. High pressure but not total.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pool32&lt;/code&gt; 500ms delay: max active 32/32, max waiting 24, error rate 0%. The pool hits the ceiling but absorbs the load without rejecting requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In &lt;code&gt;pool32&lt;/code&gt; with 500ms delay, &lt;code&gt;waiting = 24&lt;/code&gt; with 0% error rate means threads are waiting but the &lt;code&gt;connectionTimeout: 1500ms&lt;/code&gt; is enough — queries queue up and eventually get a connection. That's a system under pressure that still works, not one in crisis.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Attempt latency vs. successful latency
&lt;/h3&gt;

&lt;p&gt;I already covered the &lt;code&gt;tiny&lt;/code&gt; case. But it's worth generalizing: when there's significant error rate, the p95 of all attempts stops being an application performance metric and becomes a rejection speed metric. The real operational latency is that of successful queries.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;pool4&lt;/code&gt; at 500ms delay: p95 all attempts &lt;code&gt;1962.83ms&lt;/code&gt;, p95 successful &lt;code&gt;1990.52ms&lt;/code&gt;. Here the numbers are similar because queries that do get through also wait a lot — the pool has 4 connections with 500ms queries, so almost all the time is spent waiting for one to free up.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The jump from 50ms to 500ms as a pressure revealer
&lt;/h3&gt;

&lt;p&gt;With a 50ms delay, &lt;code&gt;pool8&lt;/code&gt; has zero errors and processes 153.4 successful/s. With a 500ms delay, it drops to 40.47% error rate and 15.87 successful/s. The pool didn't change — the connection hold time changed. If each connection takes ten times longer to release, a pool that was previously sufficient now isn't.&lt;/p&gt;

&lt;p&gt;This is the variable most frequently ignored when calibrating a pool: it's not just how many connections exist, but how long each query holds them. A pool of 16 connections with 50ms queries is very different from a pool of 16 connections with 500ms queries.&lt;/p&gt;
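
&lt;p&gt;One way to reason about that interaction is Little's law: connections in use ≈ throughput × hold time. A rough sketch under that assumption (illustrative, not a sizing recommendation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;// Rough sizing sketch via Little's law: connections in use = throughput * hold time.
// Purely illustrative; real sizing also depends on spikes and max_connections.
static int roughPoolEstimate(double queriesPerSecond, double avgHoldSeconds) {
    return (int) Math.ceil(queriesPerSecond * avgHoldSeconds);
}

// 31.62 successful/s at a 0.5s hold needs about 16 connections, which is
// consistent with pool16 absorbing the 500ms scenario while pool8 could not.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;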




&lt;h2&gt;
  
  
  The diminishing returns of going from pool16 to pool32 with a short delay
&lt;/h2&gt;

&lt;p&gt;There's an observation from the experiment that I think is important to avoid the easy conclusion of "more connections = better".&lt;/p&gt;

&lt;p&gt;With 50ms delay:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pool16&lt;/code&gt;: 301.83 successful/s, p95 82.92ms&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pool32&lt;/code&gt;: 314.16 successful/s, p95 70.33ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Doubling the pool size gave only about a 4% improvement in throughput. The jump from &lt;code&gt;pool8&lt;/code&gt; to &lt;code&gt;pool16&lt;/code&gt; was much larger (153.4 → 301.83, nearly double). Beyond a certain point, the bottleneck is no longer the pool — it becomes something else. In this case, probably the Docker Desktop CPU or PostgreSQL itself under load from 50 VUs.&lt;/p&gt;

&lt;p&gt;This is consistent with the formula Brett Wooldridge mentions in the HikariCP README: the optimal pool for database throughput is not simply "as large as possible". Beyond a certain threshold, adding connections creates overhead without real benefit, and in an environment with &lt;code&gt;max_connections&lt;/code&gt; limits on PostgreSQL, you can run out of slots before throughput improves.&lt;/p&gt;

&lt;p&gt;The practical conclusion from the experiment isn't that 32 is the right number. It's that &lt;code&gt;pool16&lt;/code&gt; with a 500ms delay has a 0.05% error rate and &lt;code&gt;pool32&lt;/code&gt; has 0%, with 2x higher throughput. Depending on your actual query times and your PostgreSQL limits, the trade-off is different in each case.&lt;/p&gt;




&lt;h2&gt;
  
  
  The metrics the experiment exposes via Actuator
&lt;/h2&gt;

&lt;p&gt;The app has Actuator enabled with health, info, metrics, and prometheus. During a run you can query pool state directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pool state via custom endpoint&lt;/span&gt;
curl http://localhost:8080/api/pool

&lt;span class="c"&gt;# Micrometer metrics via Actuator&lt;/span&gt;
curl http://localhost:8080/actuator/metrics/hikaricp.connections.active
curl http://localhost:8080/actuator/metrics/hikaricp.connections.pending
curl http://localhost:8080/actuator/metrics/hikaricp.connections.timeout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/api/pool&lt;/code&gt; endpoint uses &lt;code&gt;HikariPoolMXBean&lt;/code&gt; directly and returns &lt;code&gt;active&lt;/code&gt;, &lt;code&gt;idle&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;threadsAwaitingConnection&lt;/code&gt;, and the effective configuration. That's what k6 queries in parallel to record the &lt;code&gt;hikari_pool_active&lt;/code&gt;, &lt;code&gt;hikari_pool_idle&lt;/code&gt;, &lt;code&gt;hikari_pool_total&lt;/code&gt;, and &lt;code&gt;hikari_pool_threads_awaiting_connection&lt;/code&gt; metrics.&lt;/p&gt;
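
&lt;p&gt;For reference, reading those same signals takes only a few lines against &lt;code&gt;HikariPoolMXBean&lt;/code&gt;. A minimal sketch assuming an injected &lt;code&gt;HikariDataSource&lt;/code&gt; (not the repo's exact controller):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import java.util.Map;

// Sketch of a pool snapshot via HikariPoolMXBean; assumes a HikariDataSource
// is available (e.g. injected by Spring). Not the repo's exact controller.
static Map&amp;lt;String, Integer&amp;gt; poolSnapshot(HikariDataSource dataSource) {
    HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
    return Map.of(
            "active", pool.getActiveConnections(),
            "idle", pool.getIdleConnections(),
            "total", pool.getTotalConnections(),
            "threadsAwaitingConnection", pool.getThreadsAwaitingConnection());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;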

&lt;p&gt;The &lt;code&gt;hikaricp.connections.timeout&lt;/code&gt; metric from Actuator is the one I care most about in any real environment: it counts the number of times a thread waited for a connection and the &lt;code&gt;connectionTimeout&lt;/code&gt; expired. If that counter is greater than zero, users are being affected — that's not a warning, it's a fact.&lt;/p&gt;




&lt;h2&gt;
  
  
  The experiment configuration vs. configuration for a real environment
&lt;/h2&gt;

&lt;p&gt;The experiment uses values designed to make pool pressure visible in a lab, not values to copy into any system. The &lt;code&gt;tiny&lt;/code&gt; profile has &lt;code&gt;connectionTimeout: 250ms&lt;/code&gt; because 250ms makes the pool reject requests quickly and errors become immediately visible. In a real system, 250ms is probably too aggressive — you'll generate false positives during any brief spike.&lt;/p&gt;

&lt;p&gt;What does translate are the reading principles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On &lt;code&gt;connectionTimeout&lt;/code&gt;:&lt;/strong&gt; the value defines the speed of failure, not the speed of success. A short timeout generates errors faster and makes symptoms visible sooner. A long timeout accumulates blocked threads that consume memory and can saturate the web server's thread pool before the error becomes obvious. Which one you want depends on whether you have circuit breakers and retry logic, and on how long a user can wait before the experience breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On &lt;code&gt;maximumPoolSize&lt;/code&gt;:&lt;/strong&gt; the right number depends on the average query hold time, expected concurrency, and your PostgreSQL's &lt;code&gt;max_connections&lt;/code&gt; limits. There's no universal formula. What the experiment shows is that with 500ms queries and 50 VUs, you need at least 16 connections to get close to zero error rate — and that doubling to 32 gives diminishing returns on throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On managed cloud databases:&lt;/strong&gt; if you use Railway, Supabase, RDS, or another service where you don't directly control the server, there's an additional parameter that matters and that this experiment doesn't cover: &lt;code&gt;maxLifetime&lt;/code&gt;. The server may close idle connections before HikariCP's 30-minute default, and a connection that the pool thinks is alive but the server has already closed will generate &lt;code&gt;PSQLException: This connection has been closed&lt;/code&gt; on the next use. Setting &lt;code&gt;maxLifetime&lt;/code&gt; below the server's timeout is a necessary adjustment in those environments — but it's not something this local Docker lab can measure.&lt;/p&gt;
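
&lt;p&gt;As a sketch of that adjustment (the values are illustrative; the only point is keeping &lt;code&gt;maxLifetime&lt;/code&gt; below the server-side idle timeout, whatever that is in your environment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Illustrative managed-database configuration. The URL and numbers are
// placeholders; the point is maxLifetime below the server's own timeout.
static HikariDataSource buildPool() {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:postgresql://db-host:5432/app"); // placeholder URL
    config.setMaximumPoolSize(16);
    config.setConnectionTimeout(1_500);    // ms: fail fast enough to surface pressure
    config.setMaxLifetime(5 * 60 * 1_000); // ms: below a hypothetical 10-minute server timeout
    return new HikariDataSource(config);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;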




&lt;h2&gt;
  
  
  My take after the experiment
&lt;/h2&gt;

&lt;p&gt;The most valuable thing from this exercise wasn't picking a connection count. It was understanding that you can't tune HikariCP by looking at a single metric.&lt;/p&gt;

&lt;p&gt;If you only look at the p95 of all attempts, you might conclude that a pool in crisis is "fast". If you only look at error rate, you can't tell whether the system is absorbing load or rejecting it. If you only look at &lt;code&gt;active&lt;/code&gt;, you don't know whether the pool has headroom or is at its limit. You need to cross-reference all of them: error rate, successful queries/s, active vs. configured maximum, waiting, and successful latency.&lt;/p&gt;

&lt;p&gt;The other takeaway that stuck with me: there are two ways a pool can fail under load. One is the long timeout — threads waiting 30 seconds and eventually blowing up the heap. The other is the short timeout — fast rejections that generate a high error rate but create the illusion of low latency. The lab made both visible with real numbers.&lt;/p&gt;

&lt;p&gt;I don't buy the idea that there's a universally correct &lt;code&gt;maximumPoolSize&lt;/code&gt;. What there is is a correct size for your combination of query hold time, expected concurrency, and database capacity. And that number only makes sense read alongside connection hold time and error rate — not in isolation.&lt;/p&gt;

&lt;p&gt;The repo has everything needed to run it again in your environment and compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-matrix.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you change the delay, the concurrency, or the &lt;code&gt;maximumPoolSize&lt;/code&gt;, the signals change. That's exactly the point.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/JuanTorchia/hikaricp-pool-experiment" rel="noopener noreferrer"&gt;github.com/JuanTorchia/hikaricp-pool-experiment&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Reference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HikariCP GitHub — Configuration: &lt;a href="https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby" rel="noopener noreferrer"&gt;https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>english</category>
      <category>experiments</category>
      <category>performance</category>
      <category>backend</category>
    </item>
    <item>
      <title>HikariCP: el p95 que te miente y cómo leer las señales reales del pool</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Fri, 15 May 2026 03:11:20 +0000</pubDate>
      <link>https://dev.to/jtorchia/hikaricp-el-p95-que-te-miente-y-como-leer-las-senales-reales-del-pool-1da6</link>
      <guid>https://dev.to/jtorchia/hikaricp-el-p95-que-te-miente-y-como-leer-las-senales-reales-del-pool-1da6</guid>
      <description>&lt;h1&gt;
  
  
  HikariCP: el p95 que te miente y cómo leer las señales reales del pool
&lt;/h1&gt;

&lt;p&gt;There was a version of this analysis that started out wrong. I was looking at the p95 for the &lt;code&gt;tiny&lt;/code&gt; scenario with a 500ms delay and seeing &lt;code&gt;260.78ms&lt;/code&gt;. Compared to the &lt;code&gt;default&lt;/code&gt; scenario showing &lt;code&gt;2418.16ms&lt;/code&gt;, it looked almost five times faster. That's a classic trap, and I almost fell for it.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;tiny&lt;/code&gt; scenario had a 97.05% error rate. Out of 8139 attempts, 7899 failed. Those 260ms were the average rejection time, not the time for a useful response. It wasn't fast; it was failing fast. And that difference matters enormously when you're trying to understand whether your HikariCP configuration is working or not.&lt;/p&gt;

&lt;p&gt;That led me to build &lt;a href="https://github.com/JuanTorchia/hikaricp-pool-experiment" rel="noopener noreferrer"&gt;hikaricp-pool-experiment&lt;/a&gt;: a reproducible lab with Java 21, Spring Boot 3.4.5, PostgreSQL 16, HikariCP, Docker Compose, and k6 0.51.0. The goal wasn't to simulate production or document a real incident. It was to build an environment where pool signals would be visible and measurable, so I could reason about them with actual numbers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The experiment design
&lt;/h2&gt;

&lt;p&gt;The app exposes two endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GET /api/query?delayMs=500&lt;/code&gt;: executes a real query against PostgreSQL and holds the connection using &lt;code&gt;pg_sleep&lt;/code&gt; for the specified duration.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/pool&lt;/code&gt;: returns the pool state in real time (&lt;code&gt;active&lt;/code&gt;, &lt;code&gt;idle&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;threadsAwaitingConnection&lt;/code&gt;, and the effective configuration).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;delayMs&lt;/code&gt; parameter is the central mechanism of the experiment. An instant query can hide contention even at high concurrency because connections get released before the next request needs them. With &lt;code&gt;pg_sleep(0.5)&lt;/code&gt;, each connection stays occupied for half a second. With 50 virtual users hitting in parallel, pressure on the pool becomes visible quickly.&lt;/p&gt;

&lt;p&gt;The k6 script does something the original draft didn't have cleanly separated: it records &lt;code&gt;query_duration&lt;/code&gt; for all attempts and &lt;code&gt;query_success_duration&lt;/code&gt; only for those that return HTTP 200. Without that distinction, the p95 aggregates fast rejections with slow successful queries and the resulting number doesn't represent any useful reality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// load/hikari-pool.js — separación crítica entre intentos totales y exitosos&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;query status is 200&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;queryDuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;querySuccessDuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;timings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;queryErrors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scenarios defined in &lt;code&gt;application.yml&lt;/code&gt; are:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;&lt;code&gt;maximumPoolSize&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;connectionTimeout&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;default&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;10 (Spring Boot default)&lt;/td&gt;
&lt;td&gt;30000ms (HikariCP default)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tiny&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;250ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool16&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;pool32&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;1500ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The matrix was run with two delays, 50ms and 500ms, because the contrast matters: a query that releases its connection quickly and a query that holds it for half a second don't stress the pool the same way.&lt;/p&gt;

&lt;p&gt;To reproduce it from scratch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-matrix.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or scenario by scenario:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;docker&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;compose&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;down&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-v&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-scenario.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Scenario&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tiny&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DelayMs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;500&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-scenario.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Scenario&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;pool16&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-DelayMs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;500&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important limitation:&lt;/strong&gt; all of this is a single local run from 2026-05-14 on Windows with Docker Desktop/WSL2. The numbers are useful for comparing scenarios within the same machine. They are not a universal benchmark and don't reflect behavior in any cloud environment, Railway or otherwise. &lt;code&gt;pg_sleep&lt;/code&gt; holds connections artificially to make the pressure visible; it doesn't represent a real production workload.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The full results, and what to read in them
&lt;/h2&gt;

&lt;p&gt;This is the table generated by &lt;code&gt;summarize-results.ps1&lt;/code&gt; from the k6 JSON output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Delay&lt;/th&gt;
&lt;th&gt;Attempts&lt;/th&gt;
&lt;th&gt;Successful&lt;/th&gt;
&lt;th&gt;Failed&lt;/th&gt;
&lt;th&gt;Error rate&lt;/th&gt;
&lt;th&gt;Successful/s&lt;/th&gt;
&lt;th&gt;p95 all&lt;/th&gt;
&lt;th&gt;p95 successful&lt;/th&gt;
&lt;th&gt;Max active&lt;/th&gt;
&lt;th&gt;Max waiting&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;default&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;11772&lt;/td&gt;
&lt;td&gt;11772&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;195.38&lt;/td&gt;
&lt;td&gt;165.7ms&lt;/td&gt;
&lt;td&gt;165.7ms&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;default&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1240&lt;/td&gt;
&lt;td&gt;1240&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;19.85&lt;/td&gt;
&lt;td&gt;2418.16ms&lt;/td&gt;
&lt;td&gt;2418.16ms&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tiny&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;8289&lt;/td&gt;
&lt;td&gt;2325&lt;/td&gt;
&lt;td&gt;5964&lt;/td&gt;
&lt;td&gt;71.95%&lt;/td&gt;
&lt;td&gt;38.53&lt;/td&gt;
&lt;td&gt;298.81ms&lt;/td&gt;
&lt;td&gt;304.84ms&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;47&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;tiny&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;500ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8139&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;240&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7899&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;97.05%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.97&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;260.78ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;752.51ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;47&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool4&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;4712&lt;/td&gt;
&lt;td&gt;4712&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;77.75&lt;/td&gt;
&lt;td&gt;557.55ms&lt;/td&gt;
&lt;td&gt;557.55ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;43&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool4&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1779&lt;/td&gt;
&lt;td&gt;492&lt;/td&gt;
&lt;td&gt;1287&lt;/td&gt;
&lt;td&gt;72.34%&lt;/td&gt;
&lt;td&gt;7.95&lt;/td&gt;
&lt;td&gt;1962.83ms&lt;/td&gt;
&lt;td&gt;1990.52ms&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool8&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;9253&lt;/td&gt;
&lt;td&gt;9253&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;153.4&lt;/td&gt;
&lt;td&gt;365.15ms&lt;/td&gt;
&lt;td&gt;365.15ms&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool8&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1653&lt;/td&gt;
&lt;td&gt;984&lt;/td&gt;
&lt;td&gt;669&lt;/td&gt;
&lt;td&gt;40.47%&lt;/td&gt;
&lt;td&gt;15.87&lt;/td&gt;
&lt;td&gt;1996.36ms&lt;/td&gt;
&lt;td&gt;1998.83ms&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool16&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;18155&lt;/td&gt;
&lt;td&gt;18155&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;301.83&lt;/td&gt;
&lt;td&gt;82.92ms&lt;/td&gt;
&lt;td&gt;82.92ms&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool16&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;1948&lt;/td&gt;
&lt;td&gt;1947&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.05%&lt;/td&gt;
&lt;td&gt;31.62&lt;/td&gt;
&lt;td&gt;1492.44ms&lt;/td&gt;
&lt;td&gt;1492.44ms&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool32&lt;/td&gt;
&lt;td&gt;50ms&lt;/td&gt;
&lt;td&gt;18892&lt;/td&gt;
&lt;td&gt;18892&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;314.16&lt;/td&gt;
&lt;td&gt;70.33ms&lt;/td&gt;
&lt;td&gt;70.33ms&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;pool32&lt;/td&gt;
&lt;td&gt;500ms&lt;/td&gt;
&lt;td&gt;3830&lt;/td&gt;
&lt;td&gt;3830&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;63.00&lt;/td&gt;
&lt;td&gt;784.9ms&lt;/td&gt;
&lt;td&gt;784.9ms&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There are several things worth reading together, not in isolation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The trap of a low p95 with a high error rate
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;tiny&lt;/code&gt; case with a 500ms delay is the most instructive in the experiment. The p95 for all attempts is &lt;code&gt;260.78ms&lt;/code&gt;. If you only look at that number, it seems the pool responds very quickly. But the 97.05% error rate tells you that almost no query ever executed: HikariCP was rejecting requests at &lt;code&gt;connectionTimeout: 250ms&lt;/code&gt; because there were no free connections.&lt;/p&gt;

&lt;p&gt;The separation between &lt;code&gt;query_duration&lt;/code&gt; and &lt;code&gt;query_success_duration&lt;/code&gt; makes visible what the aggregated number was hiding: the p95 for &lt;strong&gt;successful&lt;/strong&gt; queries is &lt;code&gt;752.51ms&lt;/code&gt;, almost three times higher. The few queries that did get a connection took nearly a second, probably because they had to wait for one of the pool's two connections to be released.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;active&lt;/code&gt; is pinned at the pool maximum (2/2) and &lt;code&gt;waiting&lt;/code&gt; reaches 47, the system isn't processing load: it's rejecting it. The 260ms is the time to fail, not to succeed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Signal that matters:&lt;/strong&gt; if &lt;code&gt;p95 all attempts&lt;/code&gt; ≪ &lt;code&gt;p95 successful&lt;/code&gt; and the error rate is high, the pool is in exhaustion. You're not seeing query latency: you're seeing rejection latency.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to read the four signals together
&lt;/h2&gt;

&lt;p&gt;The experiment confirmed that no single metric is enough. The signals that make sense to cross-reference are:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Error rate + successful queries/s
&lt;/h3&gt;

&lt;p&gt;These two together are the first filter. A 0% error rate with 19.85 successful/s (&lt;code&gt;default&lt;/code&gt;, 500ms delay) is very different from a 97% error rate with 3.97 successful/s (&lt;code&gt;tiny&lt;/code&gt;, 500ms delay). Successful throughput tells you how much useful work the system is doing; error rate tells you how much work it's throwing away.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;pool4&lt;/code&gt; at a 500ms delay: 72.34% error rate with only 7.95 successful/s. Four connections with 500ms queries give a theoretical ceiling of 8 successful/s (4 connections × 2 per second). The numbers match: the pool is at its limit and rejects the rest.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;active = maximumPoolSize&lt;/code&gt; sustained + &lt;code&gt;waiting &amp;gt; 0&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This combination is the most direct operational signal that the pool is under pressure. When &lt;code&gt;maxActiveConnections&lt;/code&gt; hits the configured ceiling and &lt;code&gt;maxThreadsAwaitingConnection&lt;/code&gt; is greater than zero for a sustained period, application threads are waiting for a connection that isn't available.&lt;/p&gt;

&lt;p&gt;From the experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tiny&lt;/code&gt; 500ms delay: max active 2/2, max waiting 47. Pool exhausted from the start.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pool8&lt;/code&gt; 500ms delay: max active 8/8, max waiting 41, error rate 40.47%. High pressure but not total.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pool32&lt;/code&gt; 500ms delay: max active 32/32, max waiting 24, error rate 0%. The pool hits the ceiling but absorbs the load without rejecting requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In &lt;code&gt;pool32&lt;/code&gt; with a 500ms delay, &lt;code&gt;waiting = 24&lt;/code&gt; with a 0% error rate means threads are waiting but the &lt;code&gt;connectionTimeout: 1500ms&lt;/code&gt; is enough: queries queue up and eventually get a connection. That's a system under pressure that still works, not one in crisis.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Attempt latency vs. successful latency
&lt;/h3&gt;

&lt;p&gt;I already mentioned the &lt;code&gt;tiny&lt;/code&gt; case. But it's worth generalizing: when there's a significant error rate, the p95 of all attempts stops being an application performance metric and becomes a rejection-speed metric. The real operational latency is that of successful queries.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;pool4&lt;/code&gt; at a 500ms delay: p95 all attempts &lt;code&gt;1962.83ms&lt;/code&gt;, p95 successful &lt;code&gt;1990.52ms&lt;/code&gt;. Here the numbers are similar because queries that do get through also wait a lot: the pool has 4 connections with 500ms queries, so almost all the time is spent waiting for one to free up.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The jump from 50ms to 500ms as a pressure revealer
&lt;/h3&gt;

&lt;p&gt;With a 50ms delay, &lt;code&gt;pool8&lt;/code&gt; doesn't throw a single error and processes 153.4 successful/s. With a 500ms delay, it drops to a 40.47% error rate and 15.87 successful/s. The pool didn't change — the connection hold time did. If each connection takes ten times longer to free up, the pool that used to be enough no longer is.&lt;/p&gt;

&lt;p&gt;This is the variable most often ignored when sizing a pool: it's not just how many connections there are, but how long each query holds them. A 16-connection pool with 50ms queries is very different from a 16-connection pool with 500ms queries.&lt;/p&gt;




&lt;h2&gt;
  
  
  The limit of the pool16-to-pool32 jump with a short delay
&lt;/h2&gt;

&lt;p&gt;There's one observation from the experiment that I think matters, because it heads off the easy conclusion of "more connections = better".&lt;/p&gt;

&lt;p&gt;With a 50ms delay:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pool16&lt;/code&gt;: 301.83 successful/s, p95 82.92ms&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pool32&lt;/code&gt;: 314.16 successful/s, p95 70.33ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Doubling the pool size improved throughput by barely ~4%. The jump from &lt;code&gt;pool8&lt;/code&gt; to &lt;code&gt;pool16&lt;/code&gt; was much bigger (153.4 → 301.83, nearly double). Past a certain point, the bottleneck stops being the pool and becomes something else — in this case, probably Docker Desktop's CPU or PostgreSQL itself under a 50-VU load.&lt;/p&gt;

&lt;p&gt;This is consistent with the formula Brettwooldridge mentions in the HikariCP README: the optimal pool for database throughput isn't simply "the biggest one possible". Beyond a certain threshold, adding connections generates overhead with no real benefit, and in an environment with &lt;code&gt;max_connections&lt;/code&gt; limits in PostgreSQL you can run out of slots before throughput improves.&lt;/p&gt;

&lt;p&gt;The practical conclusion of the experiment isn't that 32 is the right number. It's that &lt;code&gt;pool16&lt;/code&gt; with a 500ms delay has a 0.05% error rate and &lt;code&gt;pool32&lt;/code&gt; has 0%, with 2x the throughput. Depending on your real query times and PostgreSQL's limits, the trade-off is different in each case.&lt;/p&gt;




&lt;h2&gt;
  
  
  The metrics the experiment exposes via Actuator
&lt;/h2&gt;

&lt;p&gt;The app has Actuator enabled with health, info, metrics and prometheus. During a run you can query the pool state directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Pool state via the app's own endpoint&lt;/span&gt;
curl http://localhost:8080/api/pool

&lt;span class="c"&gt;# Métricas Micrometer vía Actuator&lt;/span&gt;
curl http://localhost:8080/actuator/metrics/hikaricp.connections.active
curl http://localhost:8080/actuator/metrics/hikaricp.connections.pending
curl http://localhost:8080/actuator/metrics/hikaricp.connections.timeout
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/api/pool&lt;/code&gt; endpoint uses &lt;code&gt;HikariPoolMXBean&lt;/code&gt; directly and returns &lt;code&gt;active&lt;/code&gt;, &lt;code&gt;idle&lt;/code&gt;, &lt;code&gt;total&lt;/code&gt;, &lt;code&gt;threadsAwaitingConnection&lt;/code&gt; and the effective configuration. It's what k6 polls in parallel to record the &lt;code&gt;hikari_pool_active&lt;/code&gt;, &lt;code&gt;hikari_pool_idle&lt;/code&gt;, &lt;code&gt;hikari_pool_total&lt;/code&gt; and &lt;code&gt;hikari_pool_threads_awaiting_connection&lt;/code&gt; metrics.&lt;/p&gt;
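
&lt;p&gt;A quick way to watch signal #2 live during a run — assuming the endpoint returns those fields as top-level JSON keys, which is my reading of the description rather than a documented contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Poll the pool once per second and print the pressure signal:
# active pinned at total while threads queue up waiting.
while true; do
  curl -s http://localhost:8080/api/pool |
    jq -r '"active=\(.active)/\(.total) waiting=\(.threadsAwaitingConnection)"'
  sleep 1
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;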

&lt;p&gt;The Actuator metric &lt;code&gt;hikaricp.connections.timeout&lt;/code&gt; is the one that interests me most in any real environment: it counts the times a thread waited for a connection and the &lt;code&gt;connectionTimeout&lt;/code&gt; expired. If that counter is greater than zero, users were affected — it's not a warning, it's a fact.&lt;/p&gt;
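
&lt;p&gt;Since Actuator returns each metric as a JSON document with a &lt;code&gt;measurements&lt;/code&gt; array, pulling out just that counter is a one-liner. A sketch, assuming the standard Micrometer response shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Extract the connection-timeout counter; anything above zero
# means a thread gave up waiting for a connection.
curl -s http://localhost:8080/actuator/metrics/hikaricp.connections.timeout |
  jq '.measurements[] | select(.statistic == "COUNT") | .value'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;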




&lt;h2&gt;
  
  
  The experiment's configuration vs. configuration for a real environment
&lt;/h2&gt;

&lt;p&gt;The experiment uses values designed to make pressure visible in a lab, not values to copy into just any system. The &lt;code&gt;tiny&lt;/code&gt; profile has &lt;code&gt;connectionTimeout: 250ms&lt;/code&gt; because 250ms makes the pool reject requests quickly and makes the errors immediately visible. In a real system, 250ms is probably too aggressive — you'll generate false positives on any brief spike.&lt;/p&gt;

&lt;p&gt;What does transfer are the reading principles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On &lt;code&gt;connectionTimeout&lt;/code&gt;:&lt;/strong&gt; the value defines the speed of failure, not the speed of success. A short timeout produces errors faster and makes the symptoms visible sooner. A long timeout accumulates blocked threads that consume memory and can saturate the web server's thread pool before the error becomes obvious. Which of the two you want depends on whether you have circuit breakers and retry logic, and on how long a user can wait before the experience breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On &lt;code&gt;maximumPoolSize&lt;/code&gt;:&lt;/strong&gt; the right number depends on the average connection hold time of your queries, the expected concurrency, and PostgreSQL's &lt;code&gt;max_connections&lt;/code&gt; limits. There's no universal formula. What the experiment shows is that with 500ms queries and 50 VUs, you need at least 16 connections to get the error rate near zero — and that doubling to 32 gives diminishing marginal returns in throughput.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On managed cloud databases:&lt;/strong&gt; if you use Railway, Supabase, RDS or another service where you don't control the server directly, there's an extra parameter that matters and that the experiment doesn't cover: &lt;code&gt;maxLifetime&lt;/code&gt;. The server can close idle connections before HikariCP's 30-minute default, and a connection that the pool considers alive but the server already closed will produce &lt;code&gt;PSQLException: This connection has been closed&lt;/code&gt; on its next use. Setting &lt;code&gt;maxLifetime&lt;/code&gt; below the server's timeout is a necessary adjustment in those environments — but it's not something this local Docker lab can measure.&lt;/p&gt;
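
&lt;p&gt;For what it's worth, in a Spring Boot app both knobs can be set without code changes through the standard &lt;code&gt;spring.datasource.hikari.*&lt;/code&gt; properties. A hedged sketch; the values and the &lt;code&gt;app.jar&lt;/code&gt; name are illustrative, and I'm assuming HikariCP is wired through Spring's datasource properties, as it usually is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# connection-timeout and max-lifetime are in milliseconds;
# keep max-lifetime below the managed server's idle timeout.
java -Dspring.datasource.hikari.connection-timeout=3000 \
     -Dspring.datasource.hikari.max-lifetime=600000 \
     -jar app.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;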




&lt;h2&gt;
  
  
  My position after the experiment
&lt;/h2&gt;

&lt;p&gt;The most valuable part of the exercise wasn't picking a number of connections. It was understanding that you don't tune HikariCP by looking at a single metric.&lt;/p&gt;

&lt;p&gt;If you only look at the p95 of all attempts, you can conclude that a pool in crisis is "fast". If you only look at the error rate, you don't know whether the system is absorbing load or rejecting it. If you only look at &lt;code&gt;active&lt;/code&gt;, you don't know whether the pool has headroom or is at its limit. You need to cross-reference all four signals: error rate, successful queries/s, active vs. the configured maximum together with waiting, and the latency of successful queries.&lt;/p&gt;

&lt;p&gt;The other lesson I took away: there are two ways a pool can fail under load. One is the long timeout — threads that wait 30 seconds and eventually blow up the heap. The other is the short timeout — fast rejections that produce a high error rate but give the illusion of low latency. The lab made both visible with real numbers.&lt;/p&gt;

&lt;p&gt;I don't buy the idea that there's a universally correct &lt;code&gt;maximumPoolSize&lt;/code&gt;. What there is, is a correct size for the combination of query time, expected concurrency, and database capacity. And that number only makes sense read together with the connection hold time and the error rate — not in isolation.&lt;/p&gt;

&lt;p&gt;The repo has everything needed to run it again in your own environment and compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\scripts\run-matrix.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Vus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Duration&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;60s&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you change the delay, the concurrency or the &lt;code&gt;maximumPoolSize&lt;/code&gt;, the signals change. That's exactly the point.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/JuanTorchia/hikaricp-pool-experiment" rel="noopener noreferrer"&gt;github.com/JuanTorchia/hikaricp-pool-experiment&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Reference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HikariCP GitHub — Configuration: &lt;a href="https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby" rel="noopener noreferrer"&gt;https://github.com/brettwooldridge/HikariCP#gear-configuration-knobs-baby&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>experimentos</category>
      <category>performance</category>
    </item>
    <item>
      <title>pnpm workspaces: the CI cache that survived the fix and cost me 40 minutes per build</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Tue, 12 May 2026 14:31:10 +0000</pubDate>
      <link>https://dev.to/jtorchia/pnpm-workspaces-the-ci-cache-that-survived-the-fix-and-cost-me-40-minutes-per-build-2807</link>
      <guid>https://dev.to/jtorchia/pnpm-workspaces-the-ci-cache-that-survived-the-fix-and-cost-me-40-minutes-per-build-2807</guid>
      <description>&lt;h1&gt;
  
  
  pnpm workspaces: the CI cache that survived the fix and cost me 40 minutes per build
&lt;/h1&gt;

&lt;p&gt;I finished my previous post convinced the monorepo was solid. Tests green, deploy successful, pnpm workspaces configured exactly as the docs say. Went to bed happy.&lt;/p&gt;

&lt;p&gt;Next morning I checked the third CI run and saw this in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Cache not found for input keys: node-modules-cache-abc123
Run pnpm install --frozen-lockfile
&lt;/span&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="go"&gt;Progress: resolved 847, reused 0, downloaded 847, added 847
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;reused 0&lt;/code&gt;. Eight hundred and forty-seven packages downloaded from scratch. Forty minutes of build time where it should've been eight.&lt;/p&gt;

&lt;p&gt;My thesis, before I get into the details: &lt;strong&gt;pnpm's cache in GitHub Actions does not work out-of-the-box with monorepos&lt;/strong&gt;. Not because pnpm is broken — pnpm is excellent, I'll say that without ambiguity — but because the store-dir in CI behaves differently than it does locally, and most people never configure it explicitly. That invisible difference destroys any cache strategy that doesn't account for it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem: pnpm store-dir in CI isn't where you think it is
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;pnpm install&lt;/code&gt; on your machine, the global store lives at &lt;code&gt;~/.local/share/pnpm/store&lt;/code&gt; (Linux) or &lt;code&gt;~/Library/pnpm/store&lt;/code&gt; (macOS). Every project on your system shares that store — if a package already exists, pnpm links it with hard links. Instantaneous.&lt;/p&gt;

&lt;p&gt;In GitHub Actions, the runner starts clean on every execution. There's no previous store. So pnpm has two possible behaviors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Without explicit configuration&lt;/strong&gt;: pnpm picks a dynamic path for the store — sometimes inside the workspace, sometimes in a temp dir on the runner. The path changes between runners and between runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With an explicit &lt;code&gt;--store-dir&lt;/code&gt;&lt;/strong&gt;: pnpm always uses exactly that path. You can cache that path with &lt;code&gt;actions/cache&lt;/code&gt; and restore it on the next run.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The problem with case 1 is that &lt;code&gt;actions/cache&lt;/code&gt; needs a fixed path to work. If the store path varies, the restore never matches even if the key is identical. The cache exists in GitHub's S3, but it never gets restored because pnpm is looking in a different directory.&lt;/p&gt;

&lt;p&gt;This is exactly what pnpm's official CI documentation covers — but it's buried in the advanced configuration section, not in the quickstart that everyone copies.&lt;/p&gt;




&lt;h2&gt;
  
  
  The YAML before the fix: what everyone was copying
&lt;/h2&gt;

&lt;p&gt;This was the workflow I had, assembled from a handful of tutorials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workflow BEFORE — broken cache in monorepo&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CI&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm/action-setup@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
          &lt;span class="c1"&gt;# ⚠️ cache: 'pnpm' here looks like it does something, but it doesn't configure store-dir&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pnpm'&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --frozen-lockfile&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cache: 'pnpm'&lt;/code&gt; in &lt;code&gt;setup-node&lt;/code&gt; caches &lt;code&gt;node_modules&lt;/code&gt; at the root project level. In a monorepo with workspaces, that's not enough: each package has its own &lt;code&gt;node_modules&lt;/code&gt; with symlinks pointing back to the global store. If the store doesn't restore correctly, those symlinks point to nothing and pnpm reinstalls everything.&lt;/p&gt;
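
&lt;p&gt;You can see that failure mode directly: with the store gone, the per-package symlinks dangle. A hedged local check (plain GNU find, nothing pnpm-specific):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List dangling symlinks under node_modules: what pnpm finds
# when the global store wasn't restored alongside them.
find node_modules -xtype l
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;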

&lt;p&gt;The cache miss in the logs looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;##[group]Cache not found
  Key: node-modules-pnpm-store-Linux-abc1234def5678
  Restore keys attempted:
    node-modules-pnpm-store-Linux-
    node-modules-pnpm-store-
  Cache Size: ~0 B
##[endgroup]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache restored: zero bytes. Every run started from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The YAML after: explicit store-dir and workspace lockfile hashing
&lt;/h2&gt;

&lt;p&gt;The fix requires three concrete changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workflow AFTER — cache that actually works in a monorepo&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CI&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Fixed store path — critical so actions/cache always finds the same thing&lt;/span&gt;
      &lt;span class="na"&gt;PNPM_STORE_PATH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.pnpm-store&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm/action-setup@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
          &lt;span class="c1"&gt;# No cache: 'pnpm' here — we manage it manually below&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Get pnpm store path&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm-cache&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;# Force the explicit store-dir so the path is predictable&lt;/span&gt;
          &lt;span class="s"&gt;pnpm config set store-dir $PNPM_STORE_PATH&lt;/span&gt;
          &lt;span class="s"&gt;echo "store-path=$PNPM_STORE_PATH" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restore pnpm store cache&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.pnpm-cache.outputs.store-path }}&lt;/span&gt;
          &lt;span class="c1"&gt;# Key includes lockfile hash — invalidates when dependencies change&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm-store-${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }}&lt;/span&gt;
          &lt;span class="c1"&gt;# Broader restore key in case the lockfile changed partially&lt;/span&gt;
          &lt;span class="na"&gt;restore-keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;pnpm-store-${{ runner.os }}-&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --frozen-lockfile&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build workspaces&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run -r build&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run -r test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical changes are in three places:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;PNPM_STORE_PATH&lt;/code&gt; as a fixed environment variable.&lt;/strong&gt; Without this, every runner picks its own path. With this, the store always lives at &lt;code&gt;~/.pnpm-store&lt;/code&gt; and &lt;code&gt;actions/cache&lt;/code&gt; knows exactly what to restore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;pnpm config set store-dir&lt;/code&gt; before install.&lt;/strong&gt; Defining the environment variable isn't enough — you have to explicitly tell pnpm to use that path. This is the line missing from 90% of the examples I found.&lt;/p&gt;
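
&lt;p&gt;As a sanity check before the cache step, &lt;code&gt;pnpm store path&lt;/code&gt; prints the store pnpm will actually use, so you can confirm the pin took effect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Pin the store, then ask pnpm where it will really write.
pnpm config set store-dir ~/.pnpm-store
pnpm store path   # expect /home/runner/.pnpm-store on Ubuntu runners
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;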

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;hashFiles('**/pnpm-lock.yaml')&lt;/code&gt;.&lt;/strong&gt; The &lt;code&gt;**&lt;/code&gt; matters. In a monorepo you can have lockfiles per workspace in addition to the root one. With &lt;code&gt;**/pnpm-lock.yaml&lt;/code&gt;, the cache key changes if any lockfile in the repo changes. With just &lt;code&gt;pnpm-lock.yaml&lt;/code&gt;, you miss changes in nested workspaces.&lt;/p&gt;




&lt;h2&gt;
  
  
  The gotchas nobody documents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A broad &lt;code&gt;restore-keys&lt;/code&gt; can cause more damage than good
&lt;/h3&gt;

&lt;p&gt;With &lt;code&gt;restore-keys: pnpm-store-${{ runner.os }}-&lt;/code&gt; you're telling GitHub Actions "if you can't find the exact key, use the most recent cache that matches this prefix." Sounds reasonable. The problem is a partially-restored store (from a different lockfile) can cause subtle conflicts where pnpm thinks a package is installed but it's missing a transitive dependency.&lt;/p&gt;

&lt;p&gt;My solution: use the broad restore-key only to reduce initial download time, but always run &lt;code&gt;pnpm install --frozen-lockfile&lt;/code&gt; afterwards. The &lt;code&gt;--frozen-lockfile&lt;/code&gt; guarantees consistency even if the store is partially stale.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;pnpm run -r build&lt;/code&gt; doesn't respect dependency order between workspaces by default
&lt;/h3&gt;

&lt;p&gt;If &lt;code&gt;apps/web&lt;/code&gt; depends on &lt;code&gt;packages/ui&lt;/code&gt;, you need &lt;code&gt;packages/ui&lt;/code&gt; to build first. &lt;code&gt;pnpm run -r build&lt;/code&gt; runs in parallel by default. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Respect the workspace dependency graph order&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build in topological order&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run --filter="..." --workspace-concurrency=1 build&lt;/span&gt;
  &lt;span class="c1"&gt;# Or better yet, using the --sort flag:&lt;/span&gt;
  &lt;span class="c1"&gt;# pnpm run -r --sort build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--sort&lt;/code&gt; flag makes pnpm respect the workspace dependency graph. Without this, in a monorepo with shared packages you'll see import errors for things that don't exist yet because the package you depend on hasn't compiled yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cache is saved at the end of the job, not the beginning
&lt;/h3&gt;

&lt;p&gt;This is &lt;code&gt;actions/cache&lt;/code&gt; behavior that burns a lot of people: the cache is persisted when the job finishes &lt;em&gt;successfully&lt;/em&gt;. If the job fails on the build step (after installing dependencies), the new store cache doesn't get saved. The next run downloads everything again.&lt;/p&gt;

&lt;p&gt;To mitigate this, you can split install into its own job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;install&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Only installs and caches — always finishes successfully if deps are fine&lt;/span&gt;
      &lt;span class="s"&gt;...&lt;/span&gt;

  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;install&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Restores the cache from the previous job and builds&lt;/span&gt;
      &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The actual numbers
&lt;/h2&gt;

&lt;p&gt;In a reproducible scenario with a three-workspace monorepo (&lt;code&gt;apps/web&lt;/code&gt;, &lt;code&gt;packages/ui&lt;/code&gt;, &lt;code&gt;packages/config&lt;/code&gt;) and ~850 total dependencies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Install time&lt;/th&gt;
&lt;th&gt;Total CI time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No cache (downloads everything)&lt;/td&gt;
&lt;td&gt;~22 min&lt;/td&gt;
&lt;td&gt;~40 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cache: 'pnpm'&lt;/code&gt; in setup-node (broken cache)&lt;/td&gt;
&lt;td&gt;~20 min&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explicit store-dir + lockfile hash&lt;/td&gt;
&lt;td&gt;~1.5 min&lt;/td&gt;
&lt;td&gt;~8 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "broken cache" in the second row is the most treacherous case: the workflow shows the cache step exists, the log says "Cache found" on some runs, but the restore is partial. The time drops by barely 2 minutes because something is restored — just not enough to avoid most of the downloads.&lt;/p&gt;

&lt;p&gt;The difference between 38 and 8 minutes is exactly the kind of overhead that accumulates silently. A team of four people doing ten PRs a day burns 300 minutes of wasted build time daily — 1,500 minutes over a work week.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: pnpm workspaces cache GitHub Actions CI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why doesn't &lt;code&gt;cache: 'pnpm'&lt;/code&gt; in &lt;code&gt;actions/setup-node&lt;/code&gt; work well with monorepos?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because it caches the &lt;code&gt;node_modules&lt;/code&gt; in the root directory but not pnpm's global store. In a monorepo with workspaces, each package has its own &lt;code&gt;node_modules&lt;/code&gt; with symlinks pointing to the store. If the store doesn't restore correctly, pnpm detects the broken symlinks and reinstalls everything from scratch. The fix is to cache the store directly with &lt;code&gt;actions/cache&lt;/code&gt; and an explicit path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What path does the pnpm store use in GitHub Actions runners?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Without explicit configuration, it varies. On Ubuntu runners it might be at &lt;code&gt;/home/runner/.local/share/pnpm/store&lt;/code&gt; or in a temp path inside the workspace. That's exactly why the first rule is to define &lt;code&gt;store-dir&lt;/code&gt; explicitly with &lt;code&gt;pnpm config set store-dir&lt;/code&gt; before running &lt;code&gt;pnpm install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's the right cache key strategy for pnpm in a monorepo?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;hashFiles('**/pnpm-lock.yaml')&lt;/code&gt; with the double-asterisk glob. This includes the root lockfile and any lockfiles in subdirectories. Combine it with &lt;code&gt;runner.os&lt;/code&gt; to separate caches between Linux and macOS if you run on both. The broad restore-key without the hash works as a fallback but never as the primary key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need to change anything in &lt;code&gt;pnpm-workspace.yaml&lt;/code&gt; for better cache behavior?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not directly. &lt;code&gt;pnpm-workspace.yaml&lt;/code&gt; defines the workspace structure, not store behavior. What does matter is that all packages have their dependencies properly declared in their respective &lt;code&gt;package.json&lt;/code&gt; files. If a package uses a dependency that's only in the root without declaring it, pnpm might resolve it locally but fail in CI when the store is partially restored.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is it worth separating the install job from the build job?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Depends on the size of the monorepo. For repos with more than 500 dependencies and builds that fail frequently (tests, linting) — yes, it's worth it: it guarantees the cache gets persisted even when the build fails. For small repos where install is fast, it's unnecessary overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this work the same with pnpm 9 and Node.js 22?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. The store-dir configuration has been stable since pnpm 8. With &lt;code&gt;pnpm/action-setup@v4&lt;/code&gt; and &lt;code&gt;actions/setup-node@v4&lt;/code&gt; the setup is identical regardless of Node version. What changes between pnpm versions are some command flags — &lt;code&gt;--workspace-concurrency&lt;/code&gt; was renamed at some point — but the cache logic is the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  The uncomfortable thing nobody says about pnpm and CI
&lt;/h2&gt;

&lt;p&gt;pnpm is the best option for monorepos — I said it &lt;a href="https://juanchi.dev/en/blog/pnpm-vs-npm-vs-yarn-2026-monorepo-real-benchmark" rel="noopener noreferrer"&gt;when I compared pnpm vs npm vs yarn with real benchmarks&lt;/a&gt; and I stand by it. But it has a CI configuration curve that's genuinely frustrating because the errors are silent. The workflow "works" — CI doesn't explode, tests pass — but the cache is broken and nobody notices until someone actually looks at the timing with some attention.&lt;/p&gt;

&lt;p&gt;The previous post about &lt;a href="https://juanchi.dev/en/blog/pnpm-workspaces-nextjs-16-monorepo-ci-hoisting-cache" rel="noopener noreferrer"&gt;pnpm workspaces in a monorepo with Next.js 16&lt;/a&gt; ended with CI green. This post is what was left unresolved: the cache that survived the initial fix and kept silently costing time on every run. The lesson isn't that pnpm is poorly documented — the official CI docs are clear if you read them completely. The lesson is that "CI working" and "CI working efficiently" are two completely different states, and the second one requires you to watch the numbers, not just the green checkmark.&lt;/p&gt;

&lt;p&gt;If you're starting a new monorepo today, copy the fixed YAML directly. Don't use &lt;code&gt;cache: 'pnpm'&lt;/code&gt; from setup-node as your only strategy. Configure store-dir before install. Use the &lt;code&gt;**/pnpm-lock.yaml&lt;/code&gt; glob for the hash. That's ten extra lines that save thirty minutes per run.&lt;/p&gt;

&lt;p&gt;For architectures where CI time matters at scale — and if you're designing distributed systems, it does — these infrastructure details are part of the job. The same rigor I apply to digital signature system design or to analyzing &lt;a href="https://juanchi.dev/en/blog/jakarta-ee-vs-spring-boot-2026-production-migration-tradeoffs" rel="noopener noreferrer"&gt;Jakarta EE vs Spring Boot tradeoffs&lt;/a&gt; applies here: reasonable defaults are rarely the correct defaults for real-world cases.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pnpm.io/continuous-integration" rel="noopener noreferrer"&gt;pnpm Docs — Continuous Integration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/en/blog/pnpm-workspaces-ci-cache-github-actions-40-minutes-fix" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>english</category>
      <category>typescript</category>
      <category>pnpm</category>
      <category>node</category>
    </item>
    <item>
      <title>pnpm workspaces: el caché de CI que sobrevivió al fix y me costó 40 minutos de build</title>
      <dc:creator>Juan Torchia</dc:creator>
      <pubDate>Tue, 12 May 2026 14:31:05 +0000</pubDate>
      <link>https://dev.to/jtorchia/pnpm-workspaces-el-cache-de-ci-que-sobrevivio-al-fix-y-me-costo-40-minutos-de-build-1hde</link>
      <guid>https://dev.to/jtorchia/pnpm-workspaces-el-cache-de-ci-que-sobrevivio-al-fix-y-me-costo-40-minutos-de-build-1hde</guid>
      <description>&lt;h1&gt;
  
  
  pnpm workspaces: the CI cache that survived the fix and cost me 40 minutes per build
&lt;/h1&gt;

&lt;p&gt;I finished the previous post convinced the monorepo was solid. Tests green, deploy successful, pnpm workspaces configured exactly as the docs say. Went to bed happy.&lt;/p&gt;

&lt;p&gt;The next morning I checked the third CI run and saw this in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Cache not found for input keys: node-modules-cache-abc123
Run pnpm install --frozen-lockfile
&lt;/span&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;span class="go"&gt;Progress: resolved 847, reused 0, downloaded 847, added 847
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;reused 0&lt;/code&gt;. Eight hundred and forty-seven packages downloaded from scratch. Forty minutes of build time where it should've been eight.&lt;/p&gt;

&lt;p&gt;My thesis, before getting into the details: &lt;strong&gt;pnpm's cache in GitHub Actions does not work out-of-the-box with monorepos&lt;/strong&gt;. Not because pnpm is broken — pnpm is excellent, I'll say that without ambiguity — but because the store-dir in CI behaves differently than it does locally, and most people never configure it explicitly. And that invisible difference destroys any cache strategy that doesn't account for it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The real problem: pnpm store-dir in CI isn't where you think it is
&lt;/h2&gt;

&lt;p&gt;When you run &lt;code&gt;pnpm install&lt;/code&gt; on your machine, the global store lives at &lt;code&gt;~/.local/share/pnpm/store&lt;/code&gt; (Linux) or &lt;code&gt;~/Library/pnpm/store&lt;/code&gt; (macOS). Every project on the system shares that store: if a package already exists, pnpm links it with hard links. Instantaneous.&lt;/p&gt;

&lt;p&gt;In GitHub Actions, the runner starts clean on every execution. There's no previous store. So pnpm has two possible behaviors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Without explicit configuration&lt;/strong&gt;: pnpm picks a dynamic path for the store — sometimes inside the workspace, sometimes in a temp dir on the runner. The path changes between runners and between runs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With an explicit &lt;code&gt;--store-dir&lt;/code&gt;&lt;/strong&gt;: pnpm always uses exactly that path. You can cache that path with &lt;code&gt;actions/cache&lt;/code&gt; and restore it on the next run.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The problem with case 1 is that &lt;code&gt;actions/cache&lt;/code&gt; needs a fixed path to work. If the store path varies, the restore never matches even if the key is identical. The cache exists in GitHub's S3, but it never gets restored because pnpm is looking in a different directory.&lt;/p&gt;

&lt;p&gt;This is exactly what pnpm's official CI documentation shows — but it's buried in the advanced configuration section, not in the quickstart everyone copies.&lt;/p&gt;




&lt;h2&gt;
  
  
  The YAML before the fix: what everyone was copying
&lt;/h2&gt;

&lt;p&gt;This was the workflow I had, assembled from several tutorials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workflow BEFORE — broken cache in monorepo&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CI&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm/action-setup@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
          &lt;span class="c1"&gt;# ⚠️ cache: 'pnpm' acá parece que hace algo, pero no configura el store-dir&lt;/span&gt;
          &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pnpm'&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Instalar dependencias&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --frozen-lockfile&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;cache: 'pnpm'&lt;/code&gt; in &lt;code&gt;setup-node&lt;/code&gt; caches &lt;code&gt;node_modules&lt;/code&gt; at the root project level. In a monorepo with workspaces, that's not enough: each package has its own &lt;code&gt;node_modules&lt;/code&gt; with symlinks into the global store. If the store doesn't restore correctly, those symlinks point to nothing and pnpm reinstalls everything.&lt;/p&gt;

&lt;p&gt;The cache miss in the logs looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="c"&gt;#[group]Cache not found&lt;/span&gt;
&lt;span class="go"&gt;  Key: node-modules-pnpm-store-Linux-abc1234def5678
  Restore keys attempted:
    node-modules-pnpm-store-Linux-
    node-modules-pnpm-store-
  Cache Size: ~0 B
&lt;/span&gt;&lt;span class="gp"&gt;#&lt;/span&gt;&lt;span class="c"&gt;#[endgroup]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cache size restored: zero bytes. Every run started from scratch.&lt;/p&gt;




&lt;h2&gt;
  
  
  The YAML after: explicit store-dir and per-workspace lockfile hashing
&lt;/h2&gt;

&lt;p&gt;The fix requires three concrete changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workflow AFTER — cache that actually works in a monorepo&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CI&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Path fijo del store — crítico para que actions/cache encuentre siempre lo mismo&lt;/span&gt;
      &lt;span class="na"&gt;PNPM_STORE_PATH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;~/.pnpm-store&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm/action-setup@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-node@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;node-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
          &lt;span class="c1"&gt;# Sin cache: 'pnpm' acá — lo manejamos manualmente abajo&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Obtener path del store de pnpm&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm-cache&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;# Forzamos el store-dir explícito para que el path sea predecible&lt;/span&gt;
          &lt;span class="s"&gt;pnpm config set store-dir $PNPM_STORE_PATH&lt;/span&gt;
          &lt;span class="s"&gt;echo "store-path=$PNPM_STORE_PATH" &amp;gt;&amp;gt; $GITHUB_OUTPUT&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restaurar caché del store de pnpm&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ steps.pnpm-cache.outputs.store-path }}&lt;/span&gt;
          &lt;span class="c1"&gt;# Key con hash del lockfile — invalida cuando cambian dependencias&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm-store-${{ runner.os }}-${{ hashFiles('**/pnpm-lock.yaml') }}&lt;/span&gt;
          &lt;span class="c1"&gt;# Restore key más amplia por si el lockfile cambió parcialmente&lt;/span&gt;
          &lt;span class="na"&gt;restore-keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;pnpm-store-${{ runner.os }}-&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Instalar dependencias&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm install --frozen-lockfile&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build workspaces&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run -r build&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run -r test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical changes are in three places:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;PNPM_STORE_PATH&lt;/code&gt; as a fixed environment variable.&lt;/strong&gt; Without this, every runner picks its own path. With it, the store always lives at &lt;code&gt;~/.pnpm-store&lt;/code&gt; and &lt;code&gt;actions/cache&lt;/code&gt; knows exactly what to restore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. &lt;code&gt;pnpm config set store-dir&lt;/code&gt; before the install.&lt;/strong&gt; Defining the environment variable isn't enough: you have to explicitly tell pnpm to use that path. This is the line missing from 90% of the examples I found.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. &lt;code&gt;hashFiles('**/pnpm-lock.yaml')&lt;/code&gt;.&lt;/strong&gt; The &lt;code&gt;**&lt;/code&gt; matters. In a monorepo you can have lockfiles per workspace in addition to the root one. With &lt;code&gt;**/pnpm-lock.yaml&lt;/code&gt;, the cache key changes if any lockfile in the repo changes. With just &lt;code&gt;pnpm-lock.yaml&lt;/code&gt;, you miss changes in nested workspaces.&lt;/p&gt;




&lt;h2&gt;
  
  
  The gotchas nobody documents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A broad &lt;code&gt;restore-keys&lt;/code&gt; can do more harm than good
&lt;/h3&gt;

&lt;p&gt;With &lt;code&gt;restore-keys: pnpm-store-${{ runner.os }}-&lt;/code&gt; you're telling GitHub Actions "if you can't find the exact key, use the most recent cache matching this prefix". Sounds reasonable. The problem is that a partially restored store (from a different lockfile) can cause subtle conflicts where pnpm believes a package is installed but it's missing a transitive dependency.&lt;/p&gt;

&lt;p&gt;My solution: use the broad restore-key only to reduce the initial download time, but always run &lt;code&gt;pnpm install --frozen-lockfile&lt;/code&gt; afterwards. The &lt;code&gt;--frozen-lockfile&lt;/code&gt; guarantees consistency even if the store is partially stale.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;pnpm run -r build&lt;/code&gt; doesn't respect dependency order between workspaces by default
&lt;/h3&gt;

&lt;p&gt;If the &lt;code&gt;apps/web&lt;/code&gt; package depends on &lt;code&gt;packages/ui&lt;/code&gt;, you need &lt;code&gt;packages/ui&lt;/code&gt; to build first. &lt;code&gt;pnpm run -r build&lt;/code&gt; runs in parallel by default. The fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Respect the workspace dependency graph order&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build en orden topológico&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pnpm run --filter="..." --workspace-concurrency=1 build&lt;/span&gt;
  &lt;span class="c1"&gt;# O mejor aún, usando el flag --sort:&lt;/span&gt;
  &lt;span class="c1"&gt;# pnpm run -r --sort build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--sort&lt;/code&gt; flag makes pnpm respect the workspace dependency graph. Without it, in a monorepo with shared packages you'll see errors about imports that don't exist yet because the package you depend on hasn't compiled yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cache is saved at the end of the job, not the beginning
&lt;/h3&gt;

&lt;p&gt;This is &lt;code&gt;actions/cache&lt;/code&gt; behavior that burns a lot of people: the cache is persisted when the job finishes successfully. If the job fails on the build step (after installing dependencies), the new store cache doesn't get saved. The next run downloads everything again.&lt;/p&gt;

&lt;p&gt;To mitigate this, you can split the install into its own job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;install&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Solo instala y cachea — siempre termina exitoso si las deps están bien&lt;/span&gt;
      &lt;span class="s"&gt;...&lt;/span&gt;

  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;install&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="c1"&gt;# Restaura el caché del job anterior y buildea&lt;/span&gt;
      &lt;span class="s"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The actual numbers
&lt;/h2&gt;

&lt;p&gt;In a reproducible scenario with a three-workspace monorepo (&lt;code&gt;apps/web&lt;/code&gt;, &lt;code&gt;packages/ui&lt;/code&gt;, &lt;code&gt;packages/config&lt;/code&gt;) and ~850 total dependencies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Install time&lt;/th&gt;
&lt;th&gt;Total CI time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No cache (downloads everything)&lt;/td&gt;
&lt;td&gt;~22 min&lt;/td&gt;
&lt;td&gt;~40 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;cache: 'pnpm'&lt;/code&gt; in setup-node (broken cache)&lt;/td&gt;
&lt;td&gt;~20 min&lt;/td&gt;
&lt;td&gt;~38 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Explicit store-dir + lockfile hash&lt;/td&gt;
&lt;td&gt;~1.5 min&lt;/td&gt;
&lt;td&gt;~8 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;El "caché roto" de la segunda fila es el caso más traicionero: el workflow muestra que el paso de caché existe, el log dice "Cache found" en algunas corridas, pero el restore es parcial. El tiempo baja apenas 2 minutos porque algo se restaura — pero no lo suficiente para evitar la mayoría de las descargas.&lt;/p&gt;

&lt;p&gt;La diferencia entre 38 y 8 minutos es exactamente el tipo de overhead que se acumula silencioso. Un equipo de cuatro personas haciendo diez PRs por día son 1200 minutos de build time desperdiciado por semana.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: pnpm workspaces cache GitHub Actions CI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;¿Por qué &lt;code&gt;cache: 'pnpm'&lt;/code&gt; en &lt;code&gt;actions/setup-node&lt;/code&gt; no funciona bien con monorepos?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Porque cachea el &lt;code&gt;node_modules&lt;/code&gt; del directorio raíz pero no el store global de pnpm. En un monorepo con workspaces, cada package tiene su propio &lt;code&gt;node_modules&lt;/code&gt; con symlinks al store. Si el store no se restaura correctamente, pnpm detecta que los symlinks están rotos y reinstala todo de cero. La solución es cachear el store directamente con &lt;code&gt;actions/cache&lt;/code&gt; y un path explícito.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;¿Qué path tiene el store de pnpm en GitHub Actions runners?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Sin configuración explícita, varía. En runners Ubuntu puede estar en &lt;code&gt;/home/runner/.local/share/pnpm/store&lt;/code&gt; o en un path temporal dentro del workspace. Por eso la primera regla es definir &lt;code&gt;store-dir&lt;/code&gt; explícitamente con &lt;code&gt;pnpm config set store-dir&lt;/code&gt; antes de correr &lt;code&gt;pnpm install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;¿Cuál es la estrategia correcta de key para el caché de pnpm en monorepo?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Usar &lt;code&gt;hashFiles('**/pnpm-lock.yaml')&lt;/code&gt; con el glob doble asterisco. Esto incluye el lockfile raíz y cualquier lockfile en subdirectorios. Combinado con &lt;code&gt;runner.os&lt;/code&gt; para separar caché entre Linux y macOS si corrés en ambos. El restore-key amplio sin el hash sirve como fallback pero nunca como key principal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;¿Tengo que cambiar algo en &lt;code&gt;pnpm-workspace.yaml&lt;/code&gt; para que el caché funcione mejor?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not directly. &lt;code&gt;pnpm-workspace.yaml&lt;/code&gt; defines the workspace structure, not the store's behavior (see the sketch below). What does matter is that every package declares its dependencies correctly in its own &lt;code&gt;package.json&lt;/code&gt;: if a package uses a dependency that only exists at the root without declaring it, pnpm may resolve it locally but fail in CI when the store is restored partially.&lt;/p&gt;
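
&lt;p&gt;As a sketch, matching the workspace layout from the benchmark above: the workspace file only declares where packages live, and each package must list what it actually imports in its own &lt;code&gt;package.json&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pnpm-workspace.yaml: layout only, no store settings live here
packages:
  - "apps/*"
  - "packages/*"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;So if &lt;code&gt;apps/web&lt;/code&gt; imports from &lt;code&gt;packages/ui&lt;/code&gt;, it should declare that package (with the &lt;code&gt;workspace:*&lt;/code&gt; protocol) in its own &lt;code&gt;dependencies&lt;/code&gt;, instead of relying on the root having it.&lt;/p&gt;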

&lt;p&gt;&lt;strong&gt;Is it worth splitting the install job from the build job?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It depends on the size of the monorepo. For repos with more than 500 dependencies and builds that fail often (tests, linting), yes: it guarantees the cache is persisted even when the build fails, as in the sketch below. For small repos where the install is fast, it's unnecessary overhead.&lt;/p&gt;
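
&lt;p&gt;A condensed sketch of the split, reusing the store and key setup from the answers above. Job names and the &lt;code&gt;build&lt;/code&gt; script are illustrative, and &lt;code&gt;pnpm/action-setup@v4&lt;/code&gt; without a &lt;code&gt;version&lt;/code&gt; input assumes your root &lt;code&gt;package.json&lt;/code&gt; has a &lt;code&gt;packageManager&lt;/code&gt; field:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jobs:
  install:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: pnpm config set store-dir ~/.pnpm-store
      # The cache is saved when this job finishes,
      # even if the build job later fails
      - uses: actions/cache@v4
        with:
          path: ~/.pnpm-store
          key: ${{ runner.os }}-pnpm-${{ hashFiles('**/pnpm-lock.yaml') }}
      - run: pnpm install --frozen-lockfile

  build:
    needs: install
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: pnpm config set store-dir ~/.pnpm-store
      # Restores the store the install job just saved; the
      # install here is fast because everything is already local
      - uses: actions/cache@v4
        with:
          path: ~/.pnpm-store
          key: ${{ runner.os }}-pnpm-${{ hashFiles('**/pnpm-lock.yaml') }}
      - run: pnpm install --frozen-lockfile
      - run: pnpm run build
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;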

&lt;p&gt;&lt;strong&gt;Does this work the same with pnpm 9 and Node.js 22?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes. The store-dir configuration has been stable since pnpm 8. With &lt;code&gt;pnpm/action-setup@v4&lt;/code&gt; and &lt;code&gt;actions/setup-node@v4&lt;/code&gt; the setup is the same regardless of the Node version. What changes between pnpm versions are the flags of some commands (&lt;code&gt;--workspace-concurrency&lt;/code&gt;, for example, was renamed at some point), but the cache logic is identical.&lt;/p&gt;
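
&lt;p&gt;The version-pinned setup looks like this. Pinning pnpm 9 explicitly here is a choice for the example; omitting &lt;code&gt;version&lt;/code&gt; makes the action read the &lt;code&gt;packageManager&lt;/code&gt; field from &lt;code&gt;package.json&lt;/code&gt; instead:&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pin pnpm and Node explicitly; the cache steps stay identical
- uses: pnpm/action-setup@v4
  with:
    version: 9
- uses: actions/setup-node@v4
  with:
    node-version: 22
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;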




&lt;h2&gt;
  
  
  The uncomfortable part nobody mentions about pnpm and CI
&lt;/h2&gt;

&lt;p&gt;pnpm is the best option for monorepos. I said so &lt;a href="https://juanchi.dev/es/blog/pnpm-vs-npm-2026-monorepo-benchmark-real" rel="noopener noreferrer"&gt;when I compared pnpm vs npm vs yarn with real benchmarks&lt;/a&gt; and I stand by it. But it has a CI configuration curve that is genuinely frustrating, because the failures are silent. The workflow "works": CI doesn't blow up and the tests pass, but the cache is broken and nobody sees it until someone looks closely at the timings.&lt;/p&gt;

&lt;p&gt;The previous post on &lt;a href="https://juanchi.dev/es/blog/pnpm-workspaces-monorepo-nextjs-ci-cache-problemas" rel="noopener noreferrer"&gt;pnpm workspaces in a monorepo with Next.js 16&lt;/a&gt; ended with CI green. This post is what was left unresolved: the cache that survived the initial fix and kept quietly costing time. The lesson isn't that pnpm is poorly documented; the official CI docs are clear if you read them in full. The lesson is that "CI working" and "CI working efficiently" are two completely different states, and the second requires paying attention to the numbers, not just the green check.&lt;/p&gt;

&lt;p&gt;If you're starting a new monorepo today, copy the YAML from the fix directly. Don't use setup-node's &lt;code&gt;cache: 'pnpm'&lt;/code&gt; as your only strategy. Configure the store-dir before the install. Use the &lt;code&gt;**/pnpm-lock.yaml&lt;/code&gt; glob for the hash. That's ten extra lines that save thirty minutes per run.&lt;/p&gt;

&lt;p&gt;For architectures where CI time matters at scale (and if you're designing distributed systems, it matters), these infrastructure details are part of the job. The same criterion I apply to the design of digital signature systems or to the analysis of &lt;a href="https://juanchi.dev/es/blog/jakarta-ee-vs-spring-boot-2026-migracion-backend-produccion-tradeoffs" rel="noopener noreferrer"&gt;Jakarta EE vs Spring Boot tradeoffs&lt;/a&gt; applies here: reasonable defaults are rarely the right defaults for real cases.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Original source:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pnpm.io/continuous-integration" rel="noopener noreferrer"&gt;pnpm Docs — Continuous Integration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://juanchi.dev/es/blog/pnpm-workspaces-cache-github-actions-ci-problema" rel="noopener noreferrer"&gt;juanchi.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>spanish</category>
      <category>espanol</category>
      <category>typescript</category>
      <category>pnpm</category>
    </item>
  </channel>
</rss>
