<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: S M Tahosin</title>
    <description>The latest articles on DEV Community by S M Tahosin (@tahosin).</description>
    <link>https://dev.to/tahosin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3886453%2F0f012a95-ad46-4c17-97e8-125ec8b4978d.png</url>
      <title>DEV Community: S M Tahosin</title>
      <link>https://dev.to/tahosin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tahosin"/>
    <language>en</language>
    <item>
      <title>I Replaced 200 Threads With 10,000. Java Finished 13.5x Faster.</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Thu, 04 Jun 2026 14:51:01 +0000</pubDate>
      <link>https://dev.to/tahosin/i-started-10000-java-threads-my-laptop-barely-noticed-2i6g</link>
      <guid>https://dev.to/tahosin/i-started-10000-java-threads-my-laptop-barely-noticed-2i6g</guid>
      <description>&lt;p&gt;I expected the fans to spin.&lt;/p&gt;

&lt;p&gt;I had just asked Java to start &lt;strong&gt;10,000 tasks&lt;/strong&gt;, give each task its own virtual thread, and make every one wait for 100 milliseconds.&lt;/p&gt;

&lt;p&gt;Instead, the program finished before I could move my hand away from Enter.&lt;/p&gt;

&lt;p&gt;So I ran it again. Then three more times.&lt;/p&gt;

&lt;p&gt;On my 12-logical-processor laptop, the median result looked like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Executor&lt;/th&gt;
&lt;th&gt;10,000 waiting tasks&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fixed pool of 200 platform threads&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5,116 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One virtual thread per task&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;378 ms&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is &lt;strong&gt;13.5x faster completion&lt;/strong&gt; after changing the executor, not the task.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6u4fp3uadf5qqd4gmzk8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6u4fp3uadf5qqd4gmzk8.png" alt="Benchmark results comparing platform and virtual threads" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not proof that virtual threads make Java code 13.5x faster.&lt;/p&gt;

&lt;p&gt;It is proof that I had been thinking about threads incorrectly.&lt;/p&gt;

&lt;p&gt;Let us rebuild that mental model from the inside.&lt;/p&gt;
&lt;h2&gt;First, Make a Prediction&lt;/h2&gt;

&lt;p&gt;Each task does this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are 10,000 tasks.&lt;/p&gt;

&lt;p&gt;How long should the whole program take?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A:&lt;/strong&gt; About 1,000 seconds, because &lt;code&gt;10,000 x 100 ms = 1,000 seconds&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B:&lt;/strong&gt; About 5 seconds, because 200 platform threads process the work in waves&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C:&lt;/strong&gt; Well under 1 second, because waiting virtual threads can step aside&lt;/li&gt;
&lt;/ul&gt;

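&lt;p&gt;All three answers can be correct: A is what a single thread running the tasks one after another would give you, B is the fixed pool, and C is one virtual thread per task. The executor decides which world you live in.&lt;/p&gt;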
&lt;h2&gt;The Old Mental Model&lt;/h2&gt;

&lt;p&gt;For most of Java's life, a Java thread was a thin wrapper around an operating system thread.&lt;/p&gt;

&lt;p&gt;That made threads useful, but expensive enough to treat as a limited resource.&lt;/p&gt;

&lt;p&gt;If your server had a pool of 200 platform threads and all 200 were waiting for a slow database, request 201 had to stand in line.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;request -&amp;gt; platform thread -&amp;gt; OS thread -&amp;gt; wait
request -&amp;gt; platform thread -&amp;gt; OS thread -&amp;gt; wait
request -&amp;gt;       queue       -&amp;gt;          -&amp;gt; wait for a free thread
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code was only waiting, yet the operating system thread assigned to it stayed occupied and could not serve anyone else.&lt;/p&gt;

&lt;p&gt;Virtual threads break that one-to-one relationship.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ctp8cglum3dwayrbcjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ctp8cglum3dwayrbcjr.png" alt="Platform threads compared with virtual threads" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A virtual thread is still a real &lt;code&gt;java.lang.Thread&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The difference is that it does not permanently own an OS thread. The JVM schedules many virtual threads onto a smaller number of platform threads, called &lt;strong&gt;carrier threads&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can see the distinction directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;platform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofPlatform&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isVirtual&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;virtual&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofVirtual&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isVirtual&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="n"&gt;platform&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;virtual&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;false
true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same &lt;code&gt;Thread&lt;/code&gt; API. Different scheduling model.&lt;/p&gt;

&lt;h2&gt;What Happens When a Virtual Thread Waits?&lt;/h2&gt;

&lt;p&gt;Imagine a virtual thread running on carrier thread 3.&lt;/p&gt;

&lt;p&gt;It calls a supported blocking operation, such as &lt;code&gt;Thread.sleep()&lt;/code&gt; or blocking network I/O.&lt;/p&gt;

&lt;p&gt;The JVM can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pause the virtual thread.&lt;/li&gt;
&lt;li&gt;Unmount it from carrier thread 3.&lt;/li&gt;
&lt;li&gt;Use carrier thread 3 to run other virtual threads.&lt;/li&gt;
&lt;li&gt;Remount the original virtual thread when its wait is over.&lt;/li&gt;
&lt;/ol&gt;
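
&lt;p&gt;A minimal sketch of how one might watch those steps happen, assuming a JDK with virtual threads (Java 21 or later). The class name &lt;code&gt;MountDemo&lt;/code&gt; and the 50 ms sleep are illustrative choices, not part of the benchmark. On current JDKs the string form of a virtual thread usually includes the carrier it is mounted on, so printing it before and after the sleep may show two different carriers. That text format is an implementation detail and can vary between releases.&lt;/p&gt;

```java
import java.time.Duration;

public class MountDemo {
    // Captures the virtual thread's description before and after it sleeps.
    static String[] observe() {
        String[] seen = new String[2];
        Thread vt = Thread.ofVirtual().start(() -> {
            seen[0] = Thread.currentThread().toString(); // mounted on some carrier
            try {
                Thread.sleep(Duration.ofMillis(50));     // the JVM may unmount here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            seen[1] = Thread.currentThread().toString(); // remounted, maybe elsewhere
        });
        try {
            vt.join(); // join makes the writes above visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }

    public static void main(String[] args) {
        for (String description : observe()) {
            System.out.println(description);
        }
    }
}
```

&lt;p&gt;With a single virtual thread the carrier often stays the same. Starting a few hundred of them at once makes a carrier change after the sleep much more likely.&lt;/p&gt;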

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedd4vam2dr97i4n76z7y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedd4vam2dr97i4n76z7y.png" alt="Timeline showing a virtual thread stepping aside while waiting" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The virtual thread did not make the database, network, or timer faster.&lt;/p&gt;

&lt;p&gt;It stopped wasting a scarce carrier thread while waiting.&lt;/p&gt;

&lt;p&gt;That sentence is the whole feature:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Virtual threads make &lt;strong&gt;waiting&lt;/strong&gt; cheap. They do not make &lt;strong&gt;work&lt;/strong&gt; cheap.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;The Experiment&lt;/h2&gt;

&lt;p&gt;Here is the important part of the benchmark.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="no"&gt;TASKS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10_000&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Duration&lt;/span&gt; &lt;span class="no"&gt;WAIT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ExecutorService&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Future&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class="no"&gt;TASKS&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;TASKS&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

            &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;submit&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="no"&gt;WAIT&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;taskId&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
            &lt;span class="o"&gt;}));&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Future&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;futures&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran the same method with two executors:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Executors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newFixedThreadPool&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Executors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newVirtualThreadPerTaskExecutor&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first executor lets at most 200 tasks sleep at once; the other 9,800 sit in its queue until a thread frees up.&lt;/p&gt;

&lt;p&gt;The virtual-thread executor starts one virtual thread for every task. When the tasks sleep, the JVM can unmount them and keep its carrier threads available.&lt;/p&gt;

&lt;p&gt;That is why the fixed pool behaves roughly like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10,000 tasks / 200 threads = 50 waves
50 waves x 100 ms          = about 5 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
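
&lt;p&gt;The wave arithmetic above can be written as a small helper. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and the result is a lower bound that ignores scheduling and startup overhead.&lt;/p&gt;

```java
public class WaveEstimate {
    // Lower-bound completion time when each worker runs one sleeping task at a time.
    static long estimateMillis(int tasks, int workers, long waitMillis) {
        long waves = (tasks + workers - 1) / workers; // ceiling division
        return waves * waitMillis;
    }

    public static void main(String[] args) {
        System.out.println(estimateMillis(10_000, 1, 100));      // prints 1000000
        System.out.println(estimateMillis(10_000, 200, 100));    // prints 5000
        System.out.println(estimateMillis(10_000, 10_000, 100)); // prints 100
    }
}
```

&lt;p&gt;The three calls correspond to answers A, B, and C from the prediction. The measured 378 ms sits above the 100 ms floor because creating and scheduling 10,000 virtual threads is cheap, not free.&lt;/p&gt;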



&lt;p&gt;The virtual-thread version does not need 50 waves. Almost every task can begin, sleep, and get out of the carriers' way.&lt;/p&gt;

&lt;p&gt;The measured medians from three runs were:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WAITING WORK
200 platform threads        5,116 ms
virtual thread per task       378 ms

CPU WORK
platform threads            2,387 ms
virtual threads             2,300 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The waiting result changed dramatically.&lt;/p&gt;

&lt;p&gt;The CPU result did not.&lt;/p&gt;
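
&lt;p&gt;To see the shape of the waiting result without the full lab, here is a scaled-down sketch. It is not the author's benchmark: the class name &lt;code&gt;ScaledDownLab&lt;/code&gt;, the 200 tasks, the 20 ms wait, and the pool of 10 are assumptions chosen so it finishes quickly. It relies on &lt;code&gt;ExecutorService.close()&lt;/code&gt;, which waits for submitted tasks, so it needs a JDK where executors are &lt;code&gt;AutoCloseable&lt;/code&gt; (Java 21 or later is a safe choice).&lt;/p&gt;

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class ScaledDownLab {
    // Submits sleeping tasks, waits for all of them, and returns elapsed milliseconds.
    static long timeMillis(ExecutorService executor, int tasks, Duration wait) {
        long start = System.nanoTime();
        try (executor) { // close() blocks until every submitted task has finished
            IntStream.range(0, tasks).forEach(i -> executor.execute(() -> {
                try {
                    Thread.sleep(wait);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        return Duration.ofNanos(System.nanoTime() - start).toMillis();
    }

    public static void main(String[] args) {
        Duration wait = Duration.ofMillis(20);
        long fixed = timeMillis(Executors.newFixedThreadPool(10), 200, wait);
        long virtual = timeMillis(Executors.newVirtualThreadPerTaskExecutor(), 200, wait);
        System.out.println("fixed pool of 10: " + fixed + " ms");
        System.out.println("virtual threads:  " + virtual + " ms");
    }
}
```

&lt;p&gt;The fixed pool needs 20 waves of 20 ms, so it cannot finish in much under 400 ms. The virtual-thread run should land far below that. Exact numbers depend on the machine, and a single run like this is a demonstration, not a measurement.&lt;/p&gt;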

&lt;h2&gt;The Benchmark Trap&lt;/h2&gt;

&lt;p&gt;Virtual threads are not tiny turbo buttons.&lt;/p&gt;

&lt;p&gt;To test that, I also submitted 48 CPU-heavy tasks that counted primes up to 1,000,000.&lt;/p&gt;

&lt;p&gt;Both executors finished in roughly the same time because my laptop still had only 12 logical processors.&lt;/p&gt;

&lt;p&gt;You can create one million virtual threads.&lt;/p&gt;

&lt;p&gt;You cannot create one million CPU cores.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcfvj07axuus4y301gtw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcfvj07axuus4y301gtw.png" alt="Decision tree for choosing virtual threads" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Good virtual-thread workloads spend meaningful time waiting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP requests&lt;/li&gt;
&lt;li&gt;database queries&lt;/li&gt;
&lt;li&gt;file operations, but profile first, because blocking file I/O can still occupy the carrier thread&lt;/li&gt;
&lt;li&gt;message queues&lt;/li&gt;
&lt;li&gt;remote API calls&lt;/li&gt;
&lt;li&gt;many independent &lt;code&gt;sleep()&lt;/code&gt; or timer waits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Poor candidates spend most of their time calculating:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;image processing&lt;/li&gt;
&lt;li&gt;video encoding&lt;/li&gt;
&lt;li&gt;compression&lt;/li&gt;
&lt;li&gt;machine-learning inference&lt;/li&gt;
&lt;li&gt;large in-memory transformations&lt;/li&gt;
&lt;li&gt;number crunching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For CPU-bound work, bound parallelism to roughly the number of cores your machine can actually run at once.&lt;/p&gt;
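
&lt;p&gt;One way to express that bound is a fixed pool sized from &lt;code&gt;Runtime.getRuntime().availableProcessors()&lt;/code&gt;. The sketch below is an assumption about how such a CPU test might look, not the code behind the numbers above; &lt;code&gt;BoundedCpuWork&lt;/code&gt; and its methods are hypothetical names, and the naive prime check exists only to burn CPU.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;

public class BoundedCpuWork {
    // Trial division up to the square root: deliberately simple CPU work.
    static boolean isPrime(int n) {
        if (2 > n) {
            return false;
        }
        return IntStream.rangeClosed(2, (int) Math.sqrt(n)).noneMatch(d -> n % d == 0);
    }

    static long countPrimes(int limit) {
        return IntStream.rangeClosed(2, limit).filter(BoundedCpuWork::isPrime).count();
    }

    // Runs `tasks` identical prime counts on a pool no larger than the core count.
    static long runAll(int tasks, int limit) {
        int cores = Runtime.getRuntime().availableProcessors();
        AtomicLong total = new AtomicLong();
        try (ExecutorService pool = Executors.newFixedThreadPool(cores)) {
            IntStream.range(0, tasks)
                    .forEach(i -> pool.execute(() -> total.addAndGet(countPrimes(limit))));
        } // close() waits for the submitted tasks
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(4, 100)); // 4 tasks x 25 primes below 100 = 100
    }
}
```

&lt;p&gt;Swapping in a virtual-thread executor here would still work, but it would not finish sooner: the same cores do the same arithmetic either way.&lt;/p&gt;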
&lt;h2&gt;The Simplest Useful Rule&lt;/h2&gt;

&lt;p&gt;When tasks mostly wait:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Executors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newVirtualThreadPerTaskExecutor&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Future&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;submit&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;loadUser&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="nc"&gt;Future&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Order&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;submit&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;loadOrders&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

    &lt;span class="n"&gt;renderProfile&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code is ordinary, blocking, and readable.&lt;/p&gt;

&lt;p&gt;That is intentional.&lt;/p&gt;

&lt;p&gt;For years, developers often had to choose between simple thread-per-request code that did not scale and asynchronous code that scaled but split the workflow across callbacks, futures, or reactive operators.&lt;/p&gt;

&lt;p&gt;Virtual threads make the simple shape practical for many high-throughput blocking applications.&lt;/p&gt;

&lt;p&gt;They do not remove every concurrency problem. They remove one expensive assumption: that every concurrent task needs its own OS thread.&lt;/p&gt;
&lt;h2&gt;Do Not Pool Virtual Threads&lt;/h2&gt;

&lt;p&gt;This feels wrong at first.&lt;/p&gt;

&lt;p&gt;We learned to pool threads because platform threads were expensive. A pool limited how many of those scarce threads existed.&lt;/p&gt;

&lt;p&gt;Virtual threads are designed to be created per task.&lt;/p&gt;

&lt;p&gt;So this is the normal pattern:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Executors&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newVirtualThreadPerTaskExecutor&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;a tiny pool of reusable virtual threads
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you must limit access to something scarce, limit &lt;strong&gt;that thing&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Suppose a partner API permits only 20 concurrent requests:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Semaphore&lt;/span&gt; &lt;span class="n"&gt;partnerApiSlots&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Semaphore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;callPartnerApi&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;partnerApiSlots&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;acquire&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;makeBlockingHttpRequest&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;partnerApiSlots&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;release&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbmj39g8cw3r2wlr0izs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbmj39g8cw3r2wlr0izs.png" alt="Many virtual threads passing through a semaphore before a partner API" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The executor can still create a virtual thread per task.&lt;/p&gt;

&lt;p&gt;The semaphore protects the actual bottleneck.&lt;/p&gt;

&lt;p&gt;This separation is useful far beyond virtual threads:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Concurrency is how much work can be in progress. Capacity is how much work a dependency can safely accept.&lt;/p&gt;
&lt;/blockquote&gt;
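
&lt;p&gt;A small way to convince yourself that the semaphore, not the executor, sets the limit is to count how many tasks are inside the guarded section at once. The sketch below is illustrative only: &lt;code&gt;SemaphoreLimitDemo&lt;/code&gt; is a hypothetical name, and a 5 ms sleep stands in for the partner API call.&lt;/p&gt;

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class SemaphoreLimitDemo {
    // Returns the highest number of tasks observed inside the guarded section at once.
    static int peakConcurrency(int tasks, int permits) {
        Semaphore slots = new Semaphore(permits);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            IntStream.range(0, tasks).forEach(i -> executor.execute(() -> {
                try {
                    slots.acquire(); // waiting here parks the virtual thread cheaply
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    Thread.sleep(Duration.ofMillis(5)); // stand-in for the partner call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    slots.release();
                }
            }));
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 1,000 virtual threads are created, yet the peak should never exceed 20.
        System.out.println(peakConcurrency(1_000, 20));
    }
}
```

&lt;p&gt;All 1,000 virtual threads exist at the same time. Only the number allowed through to the dependency is capped.&lt;/p&gt;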

&lt;h2&gt;The Quiet &lt;code&gt;ThreadLocal&lt;/code&gt; Trap&lt;/h2&gt;

&lt;p&gt;Virtual threads support &lt;code&gt;ThreadLocal&lt;/code&gt;, so request context such as a user ID or trace ID can continue to work.&lt;/p&gt;

&lt;p&gt;The dangerous pattern is using &lt;code&gt;ThreadLocal&lt;/code&gt; as a tiny object pool:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ThreadLocal&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ExpensiveClient&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;CLIENT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="nc"&gt;ThreadLocal&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withInitial&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nl"&gt;ExpensiveClient:&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That may look efficient when 200 pooled platform threads reuse 200 clients.&lt;/p&gt;

&lt;p&gt;With one virtual thread per task, it can quietly become thousands of expensive clients that are barely reused.&lt;/p&gt;

&lt;p&gt;Keep context in thread-local variables only when it truly belongs to the task. Do not use them to cache heavy reusable objects per virtual thread.&lt;/p&gt;
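
&lt;p&gt;The effect is easy to count. In this sketch a plain &lt;code&gt;Object&lt;/code&gt; stands in for &lt;code&gt;ExpensiveClient&lt;/code&gt;, and the class name &lt;code&gt;ThreadLocalCount&lt;/code&gt; is hypothetical. Because every task runs on a fresh virtual thread, the initializer should run once per task.&lt;/p&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class ThreadLocalCount {
    // Counts how many "clients" a per-thread cache creates for the given number of tasks.
    static int clientsCreated(int tasks) {
        AtomicInteger created = new AtomicInteger();
        var client = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new Object(); // stand-in for an expensive client
        });

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            // Each task touches the thread-local once, as a request handler might.
            IntStream.range(0, tasks).forEach(i -> executor.execute(client::get));
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(clientsCreated(10_000)); // prints 10000: one client per task
    }
}
```

&lt;p&gt;Run the same method against a fixed pool of 200 platform threads and the count cannot exceed 200, which is why the pattern looked harmless before.&lt;/p&gt;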
&lt;h2&gt;You Can Observe Them&lt;/h2&gt;

&lt;p&gt;Virtual threads are invisible to the operating system: the OS sees only the carrier threads, not the virtual threads running on them.&lt;/p&gt;

&lt;p&gt;The JDK understands them, though.&lt;/p&gt;

&lt;p&gt;You can create a virtual-thread-aware dump with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jcmd &amp;lt;pid&amp;gt; Thread.dump_to_file &lt;span class="nt"&gt;-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;json threads.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That distinction matters during debugging. An OS dashboard may show a modest thread count while the JVM is managing thousands of virtual threads.&lt;/p&gt;

&lt;p&gt;The right question is not only "how many threads exist?"&lt;/p&gt;

&lt;p&gt;It is "what are those threads waiting for?"&lt;/p&gt;
&lt;h2&gt;One Outdated Warning&lt;/h2&gt;

&lt;p&gt;You may have read this advice:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Never block inside &lt;code&gt;synchronized&lt;/code&gt; code when using virtual threads, because it pins the carrier thread.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That warning mattered when virtual threads became final in Java 21.&lt;/p&gt;

&lt;p&gt;Java 24 changed the implementation through &lt;a href="https://openjdk.org/jeps/491" rel="noopener noreferrer"&gt;JEP 491&lt;/a&gt;. Virtual threads can now release their carrier when blocking inside &lt;code&gt;synchronized&lt;/code&gt; code in the normal case.&lt;/p&gt;

&lt;p&gt;Pinning has not vanished completely. Native and foreign-function calls can still pin a virtual thread.&lt;/p&gt;

&lt;p&gt;But the blanket "virtual threads and &lt;code&gt;synchronized&lt;/code&gt; do not mix" rule is outdated on modern JDKs.&lt;/p&gt;

&lt;p&gt;This is one reason I ran the experiment on Java 25 LTS instead of repeating an old Java 21 checklist.&lt;/p&gt;
&lt;h2&gt;A Five-Minute Migration Checklist&lt;/h2&gt;

&lt;p&gt;Do not rewrite an application because virtual threads sound exciting.&lt;/p&gt;

&lt;p&gt;Take one blocking workflow and inspect it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Confirm the workload waits.&lt;/strong&gt; Look for database calls, HTTP calls, file
access, queues, and sleeps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replace the task executor.&lt;/strong&gt; Try
&lt;code&gt;Executors.newVirtualThreadPerTaskExecutor()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep downstream limits.&lt;/strong&gt; Connection pools, API quotas, and rate limits
still exist.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load test the real path.&lt;/strong&gt; A sleep benchmark teaches the model, not your
production capacity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure CPU and memory too.&lt;/strong&gt; Cheap threads can still run expensive code
or retain large objects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check native integrations.&lt;/strong&gt; Native calls are one of the remaining
pinning cases.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The goal is not "use virtual threads everywhere."&lt;/p&gt;

&lt;p&gt;The goal is "stop paying for idle OS threads where you do not need them."&lt;/p&gt;
&lt;h2&gt;The Mental Model I Am Keeping&lt;/h2&gt;

&lt;p&gt;Before this experiment, I thought:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;More concurrent Java work requires a larger thread pool.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now I think:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Waiting work wants cheap virtual threads. CPU work wants bounded parallelism. Scarce dependencies want explicit limits.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That model is simple enough for a beginner and accurate enough to prevent a surprising number of production mistakes.&lt;/p&gt;

&lt;p&gt;The full runnable lab behind the numbers uses only the JDK. No framework, build tool, or dependency is required.&lt;/p&gt;

&lt;p&gt;Compile and run it with Java 25:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;javac VirtualThreadsLab.java
java VirtualThreadsLab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="//./code/VirtualThreadsLab.java"&gt;Open the complete runnable &lt;code&gt;VirtualThreadsLab.java&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Virtual threads became final in Java 21. Java 25 is not required for the basic API, but it gives us the current LTS behavior, including the post-Java-24 improvements discussed above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openjdk.org/jeps/444" rel="noopener noreferrer"&gt;JEP 444: Virtual Threads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.oracle.com/en/java/javase/25/core/virtual-threads.html" rel="noopener noreferrer"&gt;Oracle Java 25 Guide: Virtual Threads&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openjdk.org/jeps/491" rel="noopener noreferrer"&gt;JEP 491: Synchronize Virtual Threads without Pinning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What should I put through this lab next: a database connection pool, 10,000
real HTTP calls, or a &lt;code&gt;ThreadLocal&lt;/code&gt;-heavy application?&lt;/p&gt;

</description>
      <category>java</category>
      <category>beginners</category>
      <category>programming</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Added a 71-Line Black Box to My Python Agent, Then Queried the $200 Crash With DuckDB</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Sun, 31 May 2026 16:54:50 +0000</pubDate>
      <link>https://dev.to/tahosin/i-added-a-71-line-black-box-to-my-python-agent-then-queried-the-200-crash-with-duckdb-4h18</link>
      <guid>https://dev.to/tahosin/i-added-a-71-line-black-box-to-my-python-agent-then-queried-the-200-crash-with-duckdb-4h18</guid>
      <description>&lt;p&gt;The incident started with a boring support automation task.&lt;/p&gt;

&lt;p&gt;Take a user request, search a private document index, summarize the answer, and hand the result to a reviewer. Nothing heroic. The kind of Python agent you build when the demo is over and the real workflow begins.&lt;/p&gt;

&lt;p&gt;Then one run got stuck in a retry loop.&lt;/p&gt;

&lt;p&gt;It did not burn $200 before I caught it. The actual test run was cheaper. The problem was the projection: same bad loop, same document search, same model calls, left inside the overnight batch. The estimate landed close to $200 for one avoidable failure.&lt;/p&gt;

&lt;p&gt;The answer it produced looked polished enough to pass a sleepy review. The trace behind it was not polished at all. The agent had called the right tool with the wrong input, retried against stale context, summarized old results, and kept paying for each turn.&lt;/p&gt;

&lt;p&gt;That is when I stopped treating the agent like a chat feature.&lt;/p&gt;

&lt;p&gt;I started treating it like a system that needs a black box.&lt;/p&gt;

&lt;p&gt;Not a dashboard. Not a full observability stack. Not another hosted service.&lt;/p&gt;

&lt;p&gt;Just one local file that can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the agent try?&lt;/li&gt;
&lt;li&gt;Which tool did it call?&lt;/li&gt;
&lt;li&gt;What input did the tool receive?&lt;/li&gt;
&lt;li&gt;Did the tool fail?&lt;/li&gt;
&lt;li&gt;How long did it take?&lt;/li&gt;
&lt;li&gt;Did the run cross a cost or turn limit?&lt;/li&gt;
&lt;li&gt;Can I query the run after everything is over?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will build that black box in plain Python, then use DuckDB to inspect it like a tiny crash database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before And After
&lt;/h2&gt;

&lt;p&gt;Before the fix, debugging looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The final answer is wrong.
The model probably hallucinated.
Maybe the search tool returned bad data.
Maybe the retry loop reused an old message.
Maybe the cost spike came from the model call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not debugging. That is guessing with syntax highlighting.&lt;/p&gt;

&lt;p&gt;After the fix, debugging looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn 1 called search_docs with the wrong query.
The tool timed out after 147.82 ms.
The retry used stale context.
The guard stopped the run at $0.0124.
DuckDB shows one tool_error and one guard_stop.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same bug. Very different day.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shape Of The Problem
&lt;/h2&gt;

&lt;p&gt;A normal Python script usually fails in one place.&lt;/p&gt;

&lt;p&gt;An agent fails across a chain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Request -&amp;gt; Model Decision -&amp;gt; Tool Call -&amp;gt; Tool Result -&amp;gt; Next Turn -&amp;gt; Final Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru6assdaedzv370r15wg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fru6assdaedzv370r15wg.png" alt="Agent run flow diagram" width="800" height="395"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you only log the final answer, you have a diary entry.&lt;/p&gt;

&lt;p&gt;If you record the chain, you have evidence.&lt;/p&gt;

&lt;p&gt;The simplest useful format is JSONL. One event per line.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"tool_start"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"search_docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"rate limits"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"tool_end"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"search_docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"duration_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;83.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"turn_end"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"turn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"total_cost_usd"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;0.0041&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSONL is boring in exactly the right way. It appends cleanly, survives crashes better than one large JSON document, and can be searched with normal tools.&lt;/p&gt;
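
&lt;p&gt;To make "searched with normal tools" concrete, here is a minimal sketch that reads a trace with only the standard library and counts events by type. The helper names (&lt;code&gt;read_events&lt;/code&gt;, &lt;code&gt;count_by_type&lt;/code&gt;) and the sample events are assumptions for illustration, not part of the recorder built later. It also illustrates the crash-tolerance point: a half-written final line is skipped and every earlier event stays readable.&lt;/p&gt;

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def read_events(path):
    """Yield one dict per complete JSONL line, skipping lines that do not parse."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # A crash in the middle of a write can leave a partial line.
                continue


def count_by_type(path):
    """Count events per "type" field; events without one count as "unknown"."""
    return Counter(event.get("type", "unknown") for event in read_events(path))


def demo():
    """Write a small trace whose last line is truncated, then summarize it."""
    with tempfile.TemporaryDirectory() as folder:
        trace = Path(folder) / "run.jsonl"
        trace.write_text(
            '{"type": "tool_start", "tool": "search_docs"}\n'
            '{"type": "tool_end", "tool": "search_docs", "ok": true}\n'
            '{"type": "turn_end", "tu',  # simulated crash mid-write
            encoding="utf-8",
        )
        return count_by_type(trace)


if __name__ == "__main__":
    print(demo())
```

&lt;p&gt;The same loop would fail on the first byte of damage if the trace were one large JSON array, which is the practical reason to prefer one event per line here.&lt;/p&gt;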

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg78cujupv6jfmmkrz0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg78cujupv6jfmmkrz0y.png" alt="JSONL trace from a failed run" width="800" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Small Recorder That Does Real Work
&lt;/h2&gt;

&lt;p&gt;Here is the recorder.&lt;/p&gt;

&lt;p&gt;It does four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gives every run a unique id&lt;/li&gt;
&lt;li&gt;writes append-only JSONL events&lt;/li&gt;
&lt;li&gt;measures tool duration&lt;/li&gt;
&lt;li&gt;sanitizes obvious secrets before writing anything to disk
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;__future__&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;annotations&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextmanager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid4&lt;/span&gt;


&lt;span class="n"&gt;SECRET_KEYS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(api[_-]?key|token|password|secret|authorization|cookie)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;SECRET_KEYS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
                &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[redacted]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;cleaned&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentBlackBox&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;started&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_exc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;sanitize()&lt;/code&gt; function is not perfect: it only checks dictionary keys, so a secret pasted into a query string still reaches the file. It is a seatbelt, not a vault.&lt;/p&gt;

&lt;p&gt;Still, it prevents the most embarrassing version of this pattern: building a helpful debug trace that quietly stores API keys.&lt;/p&gt;
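
&lt;p&gt;One way to tighten the seatbelt is to scrub values as well as keys. The sketch below is an assumption-heavy illustration, not part of the recorder: &lt;code&gt;SECRET_VALUES&lt;/code&gt;, &lt;code&gt;scrub_text&lt;/code&gt;, and &lt;code&gt;scrub&lt;/code&gt; are hypothetical names, and the two patterns (an &lt;code&gt;sk-&lt;/code&gt; prefix and a &lt;code&gt;Bearer&lt;/code&gt; header) are guesses at what a credential might look like. Real key formats vary by provider. Note also that the key regex above matches &lt;code&gt;token&lt;/code&gt; anywhere in a key, so a field such as &lt;code&gt;prompt_tokens&lt;/code&gt; would be redacted too.&lt;/p&gt;

```python
import re

# Assumed patterns for illustration only; adjust them to the credentials you use.
SECRET_VALUES = re.compile(
    r"\b(?:sk-[A-Za-z0-9_-]{8,}|Bearer\s+[A-Za-z0-9._-]{8,})"
)


def scrub_text(text):
    """Replace substrings that look like credentials with a marker."""
    return SECRET_VALUES.sub("[redacted]", text)


def scrub(value):
    """Walk dicts, lists, and tuples, scrubbing every string value found."""
    if isinstance(value, str):
        return scrub_text(value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, which is what JSON would store anyway.
        return [scrub(item) for item in value]
    return value


example = {
    "query": "why does sk-not-a-real-key fail?",
    "headers": ("Bearer abcdef123456", "text/plain"),
    "turn": 2,
}

if __name__ == "__main__":
    print(scrub(example))
```

&lt;p&gt;A helper like this could run on the output of &lt;code&gt;sanitize()&lt;/code&gt; before the event is written. It is still pattern matching, so it reduces the risk rather than removing it.&lt;/p&gt;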

&lt;h2&gt;
  
  
  Wrap One Tool First
&lt;/h2&gt;

&lt;p&gt;Start with one tool. Do not instrument everything on day one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Document search timed out&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JSONL works well for append-only traces.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context managers are useful around tool calls.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DuckDB can query JSON files without a server.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now record the call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentBlackBox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traces/run.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python agent trace format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-not-a-real-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-not-a-real-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;traces/run.jsonl&lt;/code&gt; and the key is redacted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"search_docs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:{&lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"python agent trace format"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"api_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"[redacted]"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tiny detail matters. Debugging should not create a second incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add A Cheap Run Guard
&lt;/h2&gt;

&lt;p&gt;Most runaway agent stories start with a loop that looked harmless.&lt;/p&gt;

&lt;p&gt;So the black box should not only record what happened. It should also record when it refused to continue, and why.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RunStopped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stop_if_needed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentBlackBox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guard_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guard_stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_turns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RunStopped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stopped at turn &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Max turns is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;spent_usd&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guard_stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RunStopped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stopped at $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Budget is $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;spent_usd&lt;/code&gt; figure is an estimate, not exact billing. Use the token counts from your provider's response when you have them.&lt;/p&gt;

&lt;p&gt;The goal here is a local tripwire. You want the run to leave a clear reason when it stops.&lt;/p&gt;
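
&lt;p&gt;As a rough sketch of how a caller might react to the tripwire, the snippet below catches the stop and reads the reason back from the recorded events. &lt;code&gt;ListRecorder&lt;/code&gt;, &lt;code&gt;check_budget&lt;/code&gt; and &lt;code&gt;run_guarded&lt;/code&gt; are hypothetical stand-ins for &lt;code&gt;AgentBlackBox&lt;/code&gt;, the budget branch of &lt;code&gt;stop_if_needed&lt;/code&gt; and a real agent loop, kept in memory so the example runs on its own.&lt;/p&gt;

```python
class RunStopped(RuntimeError):
    pass


class ListRecorder:
    """Hypothetical in-memory stand-in for AgentBlackBox."""

    def __init__(self) -> None:
        self.events = []

    def record(self, event_type: str, **data) -> None:
        self.events.append({"type": event_type, "data": data})


def check_budget(box: ListRecorder, *, spent_usd: float, max_usd: float) -> None:
    # Mirrors the budget branch of stop_if_needed: record first, then raise.
    if spent_usd > max_usd:
        box.record("guard_stop", reason="budget", spent_usd=spent_usd)
        raise RunStopped(f"Stopped at ${spent_usd:.4f}. Budget is ${max_usd:.4f}.")


def run_guarded(box: ListRecorder, turn_costs: list, max_usd: float) -> str:
    spent_usd = 0.0
    try:
        for cost in turn_costs:
            check_budget(box, spent_usd=spent_usd, max_usd=max_usd)
            spent_usd += cost
    except RunStopped:
        # The trace already holds the reason, so the caller can report it.
        return box.events[-1]["data"]["reason"]
    return "completed"


box = ListRecorder()
outcome = run_guarded(box, turn_costs=[0.004, 0.004, 0.004, 0.004], max_usd=0.01)
print(outcome)
```

&lt;p&gt;Recording before raising is the design choice that matters here: even if nothing catches the exception, the stop reason is already in the trace.&lt;/p&gt;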

&lt;h2&gt;
  
  
  A Tiny Agent Loop
&lt;/h2&gt;

&lt;p&gt;This fake loop keeps the moving parts small.&lt;/p&gt;

&lt;p&gt;Replace the pretend model section with your real model call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Placeholder rates: $0.50 per million input tokens, $1.50 per million output tokens.
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.0000005&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.0000015&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentBlackBox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traces/run.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="n"&gt;spent_usd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="n"&gt;max_turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="n"&gt;max_usd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;

    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# range() already caps the loop at max_turns, so the turn guard only fires if this bound is loosened.
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;stop_if_needed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Pretend the model picked this tool input.
&lt;/span&gt;        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python jsonl duckdb traces&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-not-a-real-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-not-a-real-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

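        &lt;span class="c1"&gt;# Rough estimate: whitespace-separated words stand in for input tokens; 120 is a placeholder output size.
&lt;/span&gt;        &lt;span class="n"&gt;turn_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;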
            &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;spent_usd&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;turn_cost&lt;/span&gt;

        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;message_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;turn_cost_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;total_cost_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Record every tool call as JSONL, then query failures after the run.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_cost_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it once with a normal question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How should I debug Python agent tools?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run it with a bad one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout during document search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second run should fail, but now it fails with a trail.&lt;/p&gt;

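&lt;p&gt;To force a budget stop for testing, temporarily set &lt;code&gt;max_usd = 0.0001&lt;/code&gt;. The first turn's estimated cost already exceeds that limit, so the guard check at the start of the second turn writes a &lt;code&gt;guard_stop&lt;/code&gt; event and raises &lt;code&gt;RunStopped&lt;/code&gt; instead of letting the loop continue quietly.&lt;/p&gt;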
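
&lt;p&gt;To see why that limit trips so quickly, here is the arithmetic for a single turn, using the same placeholder rates as &lt;code&gt;estimate_cost&lt;/code&gt;. The input count of 10 tokens is an assumed value for illustration; the real count depends on the question and the tool output.&lt;/p&gt;

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    # Same placeholder per-token rates as the loop above.
    return input_tokens * 0.0000005 + output_tokens * 0.0000015


max_usd = 0.0001

# Assumed 10 input tokens; the loop always charges 120 output tokens per turn.
first_turn_cost = estimate_cost(input_tokens=10, output_tokens=120)
output_only_cost = estimate_cost(input_tokens=0, output_tokens=120)

print(f"first turn:  ${first_turn_cost:.6f}")
print(f"output only: ${output_only_cost:.6f}")
print("over the limit after one turn:", first_turn_cost > max_usd)
```

&lt;p&gt;The 120 output tokens alone cost about $0.00018, so any completed first turn pushes the running total past $0.0001.&lt;/p&gt;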

&lt;h2&gt;
  
  
  Query The Crash With DuckDB
&lt;/h2&gt;

&lt;p&gt;This is the part that makes JSONL feel less like logging and more like a debugging tool.&lt;/p&gt;

&lt;p&gt;Install DuckDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;duckdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then query the trace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;duckdb&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query_trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traces/run.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;con&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;duckdb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        create or replace view events as
        select *
        from read_json_auto(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;);
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Event counts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        select type, count(*) as events
        from events
        group by type
        order by events desc;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tool errors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        select
            data.tool as tool,
            data.error_type as error_type,
            data.error as error,
            data.duration_ms as duration_ms
        from events
        where type = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Slow tools&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;con&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        select
            data.tool as tool,
            data.duration_ms as duration_ms
        from events
        where type = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tool_end&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
        order by data.duration_ms desc
        limit 5;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;query_trace&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The payoff should look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl76xqp1z7xokjcssrol.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnl76xqp1z7xokjcssrol.png" alt="DuckDB query output for an agent crash" width="800" height="461"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Event counts
+-------------+--------+
| type        | events |
+-------------+--------+
| guard_check |      4 |
| turn_start  |      3 |
| tool_start  |      3 |
| tool_end    |      2 |
| tool_error  |      1 |
| guard_stop  |      1 |
+-------------+--------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the crash row is now a query result, not a mystery:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tool errors
+-------------+--------------+---------------------------+-------------+
| tool        | error_type   | error                     | duration_ms |
+-------------+--------------+---------------------------+-------------+
| search_docs | TimeoutError | Document search timed out |      147.82 |
+-------------+--------------+---------------------------+-------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can answer questions that are tedious to answer from plain print logs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tools failed most often?&lt;/li&gt;
&lt;li&gt;Which tool was slowest?&lt;/li&gt;
&lt;li&gt;Which turn crossed the budget?&lt;/li&gt;
&lt;li&gt;Did the same input fail repeatedly?&lt;/li&gt;
&lt;li&gt;Did the guard stop the run, or did the tool crash first?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the upgrade.&lt;/p&gt;

&lt;p&gt;Not "I have logs."&lt;/p&gt;

&lt;p&gt;"I can interrogate the run."&lt;/p&gt;
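
&lt;p&gt;For a quick check without leaving plain Python, the first question in that list can also be sketched with the standard library. The event shape (a &lt;code&gt;type&lt;/code&gt; field plus a &lt;code&gt;data&lt;/code&gt; object) is assumed from the queries above, the sample events are made up, and &lt;code&gt;count_tool_errors&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import json
from collections import Counter

# Made-up sample events in the assumed {"type": ..., "data": {...}} shape.
sample_lines = [
    '{"type": "tool_end", "data": {"tool": "search_docs", "duration_ms": 12.5}}',
    '{"type": "tool_error", "data": {"tool": "search_docs", "error_type": "TimeoutError"}}',
    '{"type": "tool_error", "data": {"tool": "fetch_page", "error_type": "ValueError"}}',
    '{"type": "tool_error", "data": {"tool": "search_docs", "error_type": "TimeoutError"}}',
]


def count_tool_errors(lines) -> Counter:
    """Count tool_error events per tool name."""
    failures = Counter()
    for line in lines:
        event = json.loads(line)
        if event["type"] == "tool_error":
            failures[event["data"]["tool"]] += 1
    return failures


failures = count_tool_errors(sample_lines)
for tool, count in failures.most_common():
    print(tool, count)
```

&lt;p&gt;With a real trace, passing an open file handle for &lt;code&gt;traces/run.jsonl&lt;/code&gt; instead of the sample list should work the same way, as long as every line holds one JSON object.&lt;/p&gt;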

&lt;h2&gt;
  
  
  What I Would Record In A Real Project
&lt;/h2&gt;

&lt;p&gt;For a demo, the trace above is enough.&lt;/p&gt;

&lt;p&gt;For a real project, I would add these fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;model&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;provider&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;prompt_hash&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tool_schema_version&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;input_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;output_tokens&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;finish_reason&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;retry_count&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;user_id_hash&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;environment&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would not record these by default:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw access tokens&lt;/li&gt;
&lt;li&gt;private documents&lt;/li&gt;
&lt;li&gt;full customer prompts&lt;/li&gt;
&lt;li&gt;full tool responses with sensitive data&lt;/li&gt;
&lt;li&gt;cookies or request headers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The boring security rule is simple:&lt;/p&gt;

&lt;p&gt;Record enough to debug behavior. Do not record enough to harm someone.&lt;/p&gt;
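
&lt;p&gt;One way to follow that rule is to store hashes instead of raw values. The sketch below assumes SHA-256 from &lt;code&gt;hashlib&lt;/code&gt; fits your threat model; &lt;code&gt;short_hash&lt;/code&gt;, &lt;code&gt;build_trace_fields&lt;/code&gt; and their parameters are hypothetical names, and only a few of the fields listed above are shown.&lt;/p&gt;

```python
import hashlib


def short_hash(value: str, length: int = 16) -> str:
    """Return a truncated SHA-256 hex digest so the raw value stays out of the trace."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


def build_trace_fields(*, model: str, prompt: str, user_id: str, environment: str) -> dict:
    # Hypothetical helper: keeps identifiers comparable across runs
    # without recording the prompt text or the user id themselves.
    return {
        "model": model,
        "prompt_hash": short_hash(prompt),
        "user_id_hash": short_hash(user_id),
        "environment": environment,
    }


fields = build_trace_fields(
    model="example-model",
    prompt="How should I debug Python agent tools?",
    user_id="user-123",
    environment="dev",
)
print(fields)
```

&lt;p&gt;A plain hash of a short, guessable value such as a numeric user id can be reversed by trying candidates, so a keyed hash (for example &lt;code&gt;hmac&lt;/code&gt; with a secret kept outside the trace) may be the safer choice in production.&lt;/p&gt;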

&lt;h2&gt;
  
  
  The Pattern In One Sentence
&lt;/h2&gt;

&lt;p&gt;Every agent run should produce a local, append-only event stream that is safe to keep, easy to query, and useful after the process crashes.&lt;/p&gt;

&lt;p&gt;That sentence is less exciting than a new prompt trick.&lt;/p&gt;

&lt;p&gt;It is also more likely to save your weekend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full File
&lt;/h2&gt;

&lt;p&gt;Here is the complete example in one place.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;__future__&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;annotations&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;traceback&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;contextlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;contextmanager&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;field&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid4&lt;/span&gt;


&lt;span class="c1"&gt;# "token(?!s$)" leaves count fields such as input_tokens and output_tokens unredacted.&lt;/span&gt;
&lt;span class="n"&gt;SECRET_KEYS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(api[_-]?key|token(?!s$)|password|secret|authorization|cookie)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;default_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Redaction is by key name only: a secret stored under an innocuous key passes through unchanged.&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[redacted]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;SECRET_KEYS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentBlackBox&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdir&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exist_ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nb"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;event_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;sanitize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Open in append mode for every event so the line is written out when the block exits.&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;asdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@contextmanager&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
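        &lt;span class="c1"&gt;# perf_counter() is monotonic, so it suits durations; event timestamps use wall-clock time.time().&lt;/span&gt;
        &lt;span class="n"&gt;started&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;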
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;error_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;traceback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_exc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;duration_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;perf_counter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;started&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RunStopped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stop_if_needed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentBlackBox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guard_check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guard_stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_turns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RunStopped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stopped at turn &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Max turns is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;spent_usd&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;guard_stop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;budget&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RunStopped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stopped at $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Budget is $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Document search timed out&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JSONL works well for append-only traces.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Context managers are useful around tool calls.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DuckDB can query JSON files without a server.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.0000005&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;output_tokens&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.0000015&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentBlackBox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;traces/run.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
    &lt;span class="n"&gt;spent_usd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
    &lt;span class="n"&gt;max_turns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="n"&gt;max_usd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;

    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;stop_if_needed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python jsonl duckdb traces&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-not-a-real-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_docs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk-not-a-real-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

        &lt;span class="n"&gt;turn_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;output_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;spent_usd&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;turn_cost&lt;/span&gt;

        &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;turn_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;message_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;turn_cost_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;total_cost_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Record every tool call as JSONL, then query failures after the run.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;total_cost_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spent_usd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;answer&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;run_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How should I debug Python agent tools?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is one line in that full file worth staring at:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;box&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_turns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That line changes the posture of the program.&lt;/p&gt;

&lt;p&gt;The run is no longer a private conversation with a model. It is a recorded execution with a trace you can inspect, query, and improve.&lt;/p&gt;

&lt;p&gt;That is the difference between a demo and something you can trust.&lt;/p&gt;
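
&lt;p&gt;As a rough illustration of what "inspect and query" can look like, here is a minimal sketch that summarizes a JSONL trace using only the standard library. The record shape (an &lt;code&gt;event&lt;/code&gt; key plus the keyword fields passed to &lt;code&gt;box.record&lt;/code&gt;) and the &lt;code&gt;summarize_trace&lt;/code&gt; helper are assumptions made for this example, not the recorder's confirmed on-disk format.&lt;/p&gt;

```python
import json
from collections import Counter


def summarize_trace(lines):
    """Summarize a JSONL trace: event counts and the last reported total cost.

    Assumes each non-empty line is a JSON object with an "event" key
    (hypothetical shape, mirroring the box.record(...) calls above).
    """
    counts = Counter()
    total_cost_usd = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the trace file
        record = json.loads(line)
        counts[record.get("event", "unknown")] += 1
        # Keep the most recent running total, if the record carries one.
        if "total_cost_usd" in record:
            total_cost_usd = record["total_cost_usd"]
    return {"events": dict(counts), "total_cost_usd": total_cost_usd}


# Tiny in-memory trace standing in for a real trace file.
sample_trace = [
    '{"event": "run_start", "max_turns": 3}',
    '{"event": "turn_end", "turn": 1, "total_cost_usd": 0.0012}',
    '{"event": "turn_end", "turn": 2, "total_cost_usd": 0.0027}',
    '{"event": "run_end", "total_cost_usd": 0.0027}',
]
summary = summarize_trace(sample_trace)
print(summary)
```

&lt;p&gt;With a real trace, passing an open file handle should work the same way, since iterating over a file yields one line at a time.&lt;/p&gt;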

&lt;p&gt;What would you add next: prompt hashes, token counts, screenshots, checkpoints, or replayable tool fixtures?&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>debugging</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Most Underrated Announcement from Google I/O 2026 Was Buried in a 90-Second Demo</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Thu, 21 May 2026 20:22:22 +0000</pubDate>
      <link>https://dev.to/tahosin/the-most-underrated-announcement-from-google-io-2026-was-buried-in-a-90-second-demo-550</link>
      <guid>https://dev.to/tahosin/the-most-underrated-announcement-from-google-io-2026-was-buried-in-a-90-second-demo-550</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I watched the Google I/O 2026 keynote twice.&lt;/p&gt;

&lt;p&gt;First time, I got swept up in the shiny stuff. Gemini 3.5 Flash benchmarks. Veo 3 generating videos that look disturbingly real. Gemini Omni doing that multimodal physics thing. Cool. Expected. The usual I/O sugar rush that gets 50,000 retweets and fades by Thursday.&lt;/p&gt;

&lt;p&gt;Second time through, I caught something different.&lt;/p&gt;

&lt;p&gt;About 40 minutes into the developer keynote, sandwiched between the Jules GA announcement and a Stitch demo, there was maybe 90 seconds on something called the &lt;strong&gt;Managed Agents API&lt;/strong&gt;. The presenter dropped one line that made me hit pause and rewind.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Deploy an autonomous agent that reasons, writes code, browses the web, and executes in a secure sandbox. One API call."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I closed every other tab. Pulled up the docs. Started writing code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 19-Day Problem
&lt;/h2&gt;

&lt;p&gt;Here's some context. If you've tried building anything with AI agents in the past year, you know the drill. And by "drill" I mean "weeks of suffering."&lt;/p&gt;

&lt;p&gt;Say you want an agent that takes a GitHub issue, reads the codebase, writes a fix, runs tests, and opens a PR. Sounds straightforward, right? In reality, you're wiring up five services, spinning up sandboxed containers, managing auth, building tool-call routing, writing health checks, and setting up network policies so your agent doesn't accidentally nuke production at 3am on a Saturday.&lt;/p&gt;

&lt;p&gt;Last month I built an internal bot that triages support tickets. Took about three weeks. The actual AI logic? One day. The other 19 days were pure infrastructure. Docker config. Sandbox isolation with gVisor. Network policies. Timeout handling. Health checks. Retry logic.&lt;/p&gt;

&lt;p&gt;Nineteen days of plumbing. One day of thinking.&lt;/p&gt;

&lt;p&gt;That ratio is broken. And this API just fixed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Weeks to Eleven Lines
&lt;/h2&gt;

&lt;p&gt;I took that same support ticket bot and rewired it on the Managed Agents API. Not a demo version. The same bot. Same capabilities.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

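&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Placeholder ticket so the snippet runs on its own; swap in a real ticket body.
&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Checkout returns a 500 error after login.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;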

&lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;antigravity-preview-05-2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a support ticket triage agent. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the following ticket, classify its severity, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;identify the affected component from the codebase, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;and draft a response with a proposed fix.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eleven lines for the call itself. No Docker. No Kubernetes. No sandbox config.&lt;/p&gt;

&lt;p&gt;The API spins up a fresh, isolated Linux environment, loads the agent runtime, runs your task, hands back the result, and destroys the sandbox. Done.&lt;/p&gt;

&lt;p&gt;Here's what that looked like in practice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Old Setup&lt;/th&gt;
&lt;th&gt;Managed Agents API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to build&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3 weeks&lt;/td&gt;
&lt;td&gt;1 afternoon&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of infra code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~2,400&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of agent logic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~180&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker, gVisor, Redis, nginx&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google-genai&lt;/code&gt; pip package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container updates, health checks, scaling&lt;/td&gt;
&lt;td&gt;None (Google's problem)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I stared at my screen for a solid minute when it worked. Not because the output was flawless (it wasn't). Because I'd just thrown away three weeks of infrastructure code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Google Actually Built Under the Hood
&lt;/h2&gt;

&lt;p&gt;When you hit &lt;code&gt;interactions.create&lt;/code&gt;, four things happen:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sandbox provisioning.&lt;/strong&gt; Google fires up an isolated Linux VM. Fresh filesystem every time. No leftover state from previous runs. Network access is off by default, opt-in only. This alone used to cost me a week of Docker and gVisor wrestling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent harness boots up.&lt;/strong&gt; This is the exact same runtime that powers Jules and the Antigravity desktop app. Not a watered-down version. Same thing. Every improvement Google makes to Jules? Your managed agents get it too.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reasoning loop.&lt;/strong&gt; The agent reads your input, builds a plan, starts executing. Writing files. Running code. Hitting the web if you've turned that on. There's a "critic" layer baked in that catches logic errors before returning output. Think of it like a built-in code reviewer that runs before every response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cleanup.&lt;/strong&gt; Interaction finishes, sandbox gets nuked, you get the result plus any files the agent created. Thirty seconds to a few minutes total.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Sandbox Breaks: The Preview Limitations
&lt;/h2&gt;

&lt;p&gt;I'm not going to pretend this is ready for production. Two days of testing surfaced real problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeout wall.&lt;/strong&gt; I pointed it at a 15,000-line codebase and asked it to refactor one module. Hit the 5-minute ceiling and died. Large, complex tasks choke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero memory between calls.&lt;/strong&gt; Each interaction gets a clean sandbox. Great for security. Terrible if you need your agent to remember context. You have to manage state yourself, passing the &lt;code&gt;previous_interaction_id&lt;/code&gt; and relevant context back in on every subsequent call. Not hard, but not free either.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "preview" tax.&lt;/strong&gt; Pre-GA. Google says don't feed it sensitive data. Side projects and internal tools? Go for it. Customer data in production? Wait.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing is a black box.&lt;/strong&gt; Free during preview. Nobody knows what this costs at scale. That's a real problem for anyone planning production workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Network access is half-baked.&lt;/strong&gt; Your agent can browse the public web. But reaching internal APIs? You need an MCP server as a bridge, which brings back some of that infrastructure overhead. A bit ironic.&lt;/p&gt;
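
&lt;p&gt;To make the zero-memory limitation concrete, here is a minimal sketch of threading state through a sequence of calls yourself. The &lt;code&gt;create_interaction&lt;/code&gt; callable stands in for something like &lt;code&gt;client.interactions.create&lt;/code&gt;, and the &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;output_text&lt;/code&gt; attributes follow the snippets in this post. Treat all of those names, and the fake backend, as assumptions rather than the documented API.&lt;/p&gt;

```python
from types import SimpleNamespace


def run_conversation(create_interaction, agent, prompts):
    """Run prompts in order, passing each interaction's id into the next call.

    create_interaction is a stand-in for a call such as
    client.interactions.create; its keyword names are assumptions.
    """
    previous_id = None
    outputs = []
    for prompt in prompts:
        kwargs = {"agent": agent, "input": prompt}
        if previous_id is not None:
            # Each sandbox starts clean, so the link to earlier turns must be explicit.
            kwargs["previous_interaction_id"] = previous_id
        interaction = create_interaction(**kwargs)
        previous_id = interaction.id
        outputs.append(interaction.output_text)
    return outputs


# Fake backend so the sketch runs offline; it records and echoes what it received.
calls = []


def fake_create(**kwargs):
    calls.append(kwargs)
    return SimpleNamespace(
        id=f"int-{len(calls)}",
        output_text=f"handled: {kwargs['input']}",
    )


replies = run_conversation(
    fake_create,
    "ticket-triage-v1",
    ["Classify ticket A", "Now draft a reply"],
)
print(replies)
```

&lt;p&gt;Injecting the callable keeps the chaining logic testable without network access. Swapping in the real client method would be the only change, assuming it accepts these keyword arguments.&lt;/p&gt;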

&lt;h2&gt;
  
  
  How It Stacks Up Against the Competition
&lt;/h2&gt;

&lt;p&gt;Here's what made me pay attention. Right now, if you want an autonomous agent that executes in a sandbox, your options are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI Assistants API&lt;/strong&gt; gives you code execution in a sandbox, but it's tied to OpenAI models, the sandbox is limited (no arbitrary binary execution, no web browsing), and you're paying per-token plus tool-call fees. It's also not truly "deploy an agent" so much as "run a conversation with tools."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anthropic's tool-use&lt;/strong&gt; is powerful for single-turn tool calling, but there's no managed sandbox. You bring your own execution environment. So you're back to the Docker-and-gVisor dance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangGraph Cloud&lt;/strong&gt; gets you agent orchestration, but again, you manage the infrastructure. The execution environment is your problem.&lt;/p&gt;

&lt;p&gt;Google's approach is different. They're saying: give us the instructions, we'll handle the sandbox, the execution, the security, the cleanup. You don't think about infrastructure at all. That's a genuinely new position in this space.&lt;/p&gt;

&lt;p&gt;This is the first time a major cloud provider is treating autonomous agents as serverless compute, not just chat-with-tools.&lt;/p&gt;

&lt;p&gt;The tradeoff? You're locked into Google's ecosystem. The agent runs on Gemini models. If you need Claude or GPT-4 for a specific task, this isn't your tool. But for teams already in the Google stack, the friction drop is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Feature That Actually Got Me: Saved Agents
&lt;/h2&gt;

&lt;p&gt;One-shot interactions are cool. But &lt;code&gt;agents.create&lt;/code&gt; is where things get interesting.&lt;/p&gt;

&lt;p&gt;You define an agent with custom instructions, specific tools, MCP connections, and environment settings. Save that whole configuration. Then trigger it by ID from anywhere. Cron job. Webhook. GitHub Action. Another agent.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticket-triage-v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a senior support engineer. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Classify tickets by severity. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Always check error logs before suggesting a fix. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Never suggest restarting the service as a first option.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;web_browse&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;environment_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timeout_seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Trigger from anywhere
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;New ticket: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ticket_text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I wired one to our Slack. Someone files a bug, the agent auto-triages, pulls relevant logs, posts analysis in the thread. Forty lines of Python and a webhook.&lt;/p&gt;
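
&lt;p&gt;The webhook side of that can be sketched roughly like this. It is not the bot described above, just a minimal handler for a Slack Events API payload. The &lt;code&gt;triage&lt;/code&gt; and &lt;code&gt;post_reply&lt;/code&gt; callables are hypothetical placeholders for "trigger the saved agent" and "post into the thread", and the sample payload is made up for illustration.&lt;/p&gt;

```python
def handle_slack_event(payload, triage, post_reply):
    """Route one Slack Events API payload to a triage agent.

    triage(text) and post_reply(channel, thread_ts, text) are hypothetical
    callables supplied by the caller; nothing here talks to Slack directly.
    """
    # Slack verifies a new endpoint by sending a challenge to echo back.
    if payload.get("type") == "url_verification":
        return {"challenge": payload["challenge"]}

    event = payload.get("event", {})
    # Ignore non-message events and messages posted by bots (avoids reply loops).
    if event.get("type") != "message" or "bot_id" in event:
        return {"ok": True, "handled": False}

    analysis = triage(event.get("text", ""))
    # Reply in the existing thread, or start one under the message itself.
    thread_ts = event.get("thread_ts", event["ts"])
    post_reply(event["channel"], thread_ts, analysis)
    return {"ok": True, "handled": True}


# Offline demo with stand-ins for the agent call and the Slack client.
posted = []
result = handle_slack_event(
    {
        "type": "event_callback",
        "event": {
            "type": "message",
            "channel": "C123",
            "ts": "1716300000.000100",
            "text": "Login page returns 500",
        },
    },
    triage=lambda text: f"Severity: high. Ticket: {text}",
    post_reply=lambda channel, thread_ts, text: posted.append((channel, thread_ts, text)),
)
print(result, posted)
```

&lt;p&gt;A real deployment would also need to verify Slack's request signature and acknowledge quickly before running a slow agent call. Both are left out here to keep the sketch short.&lt;/p&gt;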

&lt;h2&gt;
  
  
  The Lambda Moment
&lt;/h2&gt;

&lt;p&gt;Remember 2014? Before Lambda, running code in the cloud meant EC2 instances. Load balancers. Auto-scaling groups. The works.&lt;/p&gt;

&lt;p&gt;Lambda said: give us the function, we handle the rest. People called it a toy. Then it ate the backend world.&lt;/p&gt;

&lt;p&gt;I keep seeing the same pattern. Before this API, building an agent meant managing infrastructure. Now you hand over instructions and Google runs the thing in a sandboxed environment.&lt;/p&gt;

&lt;p&gt;Maybe I'm wrong. Maybe this stays niche. But the parallel keeps nagging at me, and I haven't been able to talk myself out of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Want to Build Next
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;docs drift detector&lt;/strong&gt; that points at a repo, reads the README, runs the code, and flags where documentation and behavior have diverged. Every project has this problem. Nobody fixes it manually.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;dependency changelog reader&lt;/strong&gt; that actually reads changelogs for your deps, understands breaking changes, and tells you which updates are safe to auto-merge and which ones need human review.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;pre-review PR agent&lt;/strong&gt; that reads changes before a human reviewer opens the PR, checks test coverage on modified files, identifies risky diffs, and writes review notes. Like a thorough junior dev who never sleeps.&lt;/p&gt;

&lt;p&gt;All of these would've been multi-week projects before. Now they're afternoon builds. That's the shift. Not what agents can do. But how fast you can ship them.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Now
&lt;/h2&gt;

&lt;p&gt;Google I/O 2026 had no shortage of headlines. Gemini 3.5 Flash is fast. Veo 3 is wild. Gemini Omni understanding physics makes you wonder what 2027 looks like.&lt;/p&gt;

&lt;p&gt;But this quiet little API is the one that actually changed my Tuesday. It didn't make me go "wow." It made me delete code. And that's usually how the important stuff starts.&lt;/p&gt;

&lt;p&gt;Open the docs. Write eleven lines of Python. See what happens.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Found this useful? A reaction helps others find it too. Questions about the API or building with it? I'm in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googleiochallenge</category>
      <category>devchallenge</category>
      <category>discuss</category>
      <category>python</category>
    </item>
    <item>
      <title>Hermes Just Killed OpenClaw (Here's Why)</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Tue, 19 May 2026 13:12:33 +0000</pubDate>
      <link>https://dev.to/tahosin/hermes-just-killed-openclaw-heres-why-4c23</link>
      <guid>https://dev.to/tahosin/hermes-just-killed-openclaw-heres-why-4c23</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I do not think OpenClaw is dead.&lt;/p&gt;

&lt;p&gt;That title is deliberately dramatic because the shift is dramatic. OpenClaw did something important: it made a lot of developers believe that a personal AI assistant could be more than a chat box. It could sit on your machine, connect to your messages, call tools, browse, run commands, and actually move work forward.&lt;/p&gt;

&lt;p&gt;But Hermes Agent changes the question.&lt;/p&gt;

&lt;p&gt;OpenClaw asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if I could run a personal AI assistant on my own devices?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hermes asks:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What if my agent could live on my infrastructure, remember how I work, improve its own procedures, use tools across channels, and become more useful every week?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That second question is why Hermes feels like the next step.&lt;/p&gt;

&lt;p&gt;Not because OpenClaw is bad. OpenClaw is popular for a reason. The official repo describes it as a personal AI assistant that runs on your own devices, answers through the channels you already use, and uses a Gateway as the control plane. That is a strong idea.&lt;/p&gt;

&lt;p&gt;The problem is that the AI agent market is moving from "assistant I operate" to "worker I supervise." Once that happens, the winning system is not the one with the loudest demo. It is the one with the better memory model, execution boundary, skill lifecycle, tool surface, and deployment story.&lt;/p&gt;

&lt;p&gt;That is where Hermes starts to pull ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;If I had to explain the difference in one line:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenClaw feels like a local-first assistant. Hermes feels like agent infrastructure that happens to chat.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;A real agent has to do more than respond. It needs to run somewhere reliable. It needs to work while I am away. It needs to remember the parts of my environment that matter. It needs to learn repeatable procedures. It needs to make tool use safer, especially when those tools touch files, browsers, credentials, APIs, and servers.&lt;/p&gt;

&lt;p&gt;OpenClaw helped prove the demand.&lt;/p&gt;

&lt;p&gt;Hermes is making the operating model more serious.&lt;/p&gt;

&lt;h2&gt;
  
  
  The five claims that matter
&lt;/h2&gt;

&lt;p&gt;The loudest Hermes pitch right now is simple: install it, connect it, give it skills, run it on a server, and let it become your agent.&lt;/p&gt;

&lt;p&gt;That pitch is exciting, but I would not judge Hermes by hype. I would judge it by which claims survive contact with architecture.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claim&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;th&gt;My read&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"One-command install"&lt;/td&gt;
&lt;td&gt;Agents die when setup is fragile. If the first hour is dependency pain, most people quit.&lt;/td&gt;
&lt;td&gt;Useful, but not the real moat. Setup gets you to day one. Memory and skills decide day thirty.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Run it on a VPS or sandbox"&lt;/td&gt;
&lt;td&gt;A serious agent should not need your personal laptop open all day.&lt;/td&gt;
&lt;td&gt;This is one of Hermes' strongest arguments. Persistent agents belong on persistent infrastructure.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Built-in skills"&lt;/td&gt;
&lt;td&gt;Skills turn vague AI behavior into repeatable procedures.&lt;/td&gt;
&lt;td&gt;Strong, especially because Hermes treats skills as something the agent can improve, not just something a user installs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Messaging integrations"&lt;/td&gt;
&lt;td&gt;Telegram, Discord, Slack, WhatsApp, and similar channels make the agent reachable from normal life.&lt;/td&gt;
&lt;td&gt;Important, but only if paired with background sessions. Otherwise it is just another bot in another inbox.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Safer execution"&lt;/td&gt;
&lt;td&gt;Agents touch terminals, files, browsers, APIs, and credentials. That is dangerous by default.&lt;/td&gt;
&lt;td&gt;This is where Hermes feels more mature: command approval, allowlists, Docker, SSH, sandbox backends, and scoped toolsets all matter.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is the lens for the rest of this post.&lt;/p&gt;

&lt;p&gt;I do not care whether Hermes can produce a flashy demo once. Most agent frameworks can do that now.&lt;/p&gt;

&lt;p&gt;I care whether Hermes has the bones for repeated work: memory, procedural learning, sandboxed execution, remote availability, and enough tool scoping to avoid turning convenience into a security incident.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why OpenClaw won attention first
&lt;/h2&gt;

&lt;p&gt;OpenClaw's strength is obvious from its own README. It is broad, local, channel-heavy, and familiar to developers who want an assistant they can own.&lt;/p&gt;

&lt;p&gt;The official repo highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WhatsApp, Telegram, Slack, Discord, Signal, iMessage, Microsoft Teams, Matrix, LINE, WeChat, and many more channels&lt;/li&gt;
&lt;li&gt;A local-first Gateway that owns messaging surfaces and routes requests&lt;/li&gt;
&lt;li&gt;First-class tools for browser, files, exec, canvas, cron, sessions, image generation, video generation, TTS, and sub-agents&lt;/li&gt;
&lt;li&gt;Skills based on &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Native onboarding with &lt;code&gt;openclaw onboard&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Companion apps and nodes for macOS, iOS, Android, and headless devices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not small. That is why OpenClaw became a reference point for personal agents.&lt;/p&gt;

&lt;p&gt;It also has a massive community. At the time I checked the GitHub API, OpenClaw had far more stars than Hermes. Popularity alone does not decide technical direction, but it does tell you something: OpenClaw made the category legible.&lt;/p&gt;

&lt;p&gt;For context, I checked the public repos directly: &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;openclaw/openclaw&lt;/a&gt; and &lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch/hermes-agent&lt;/a&gt;. OpenClaw has the bigger gravity right now. Hermes has the more interesting agent-runtime thesis.&lt;/p&gt;

&lt;p&gt;The issue is that popularity also brings a harsh spotlight. Once strangers, groups, plugins, browsers, shells, and personal accounts all meet inside one assistant, the security model becomes the product.&lt;/p&gt;

&lt;p&gt;OpenClaw's own security docs are honest about this. The guidance assumes a personal assistant trust boundary: one trusted operator boundary per gateway. It says OpenClaw is not a hostile multi-tenant security boundary for adversarial users sharing one gateway. It also says the product default for trusted single-operator setups allows host execution in the &lt;code&gt;gateway&lt;/code&gt; or &lt;code&gt;node&lt;/code&gt; context unless you tighten it.&lt;/p&gt;

&lt;p&gt;That is not a cheap criticism. It is the tradeoff OpenClaw chose: powerful local assistant first, hardening second.&lt;/p&gt;

&lt;p&gt;Hermes starts from a different center.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hermes is built around compounding
&lt;/h2&gt;

&lt;p&gt;The most important Hermes idea is not Telegram integration. It is not browser automation. It is not even the tool count.&lt;/p&gt;

&lt;p&gt;The key idea is compounding.&lt;/p&gt;

&lt;p&gt;Hermes describes itself as a self-improving agent with a built-in learning loop. Its docs talk about agent-curated memory, autonomous skill creation, skill improvement during use, session search, external memory providers, and user modeling.&lt;/p&gt;

&lt;p&gt;That sounds abstract until you translate it into developer terms:&lt;/p&gt;

&lt;p&gt;If the agent solves a hard workflow today, it should not rediscover that workflow next week.&lt;/p&gt;

&lt;p&gt;That is the difference between a chatbot with tools and an agent that grows.&lt;/p&gt;

&lt;p&gt;Hermes has two memory layers that are easy to reason about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MEMORY.md&lt;/code&gt; for environment facts, project conventions, lessons learned, and workflow notes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;USER.md&lt;/code&gt; for preferences, communication style, expectations, and profile details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are bounded on purpose. Hermes keeps them focused instead of stuffing an infinite pile of text into every prompt. For older conversations, it uses SQLite session storage with FTS5 search and summarization.&lt;/p&gt;

&lt;p&gt;That design feels practical. The always-loaded memory stays small. The deeper history is searchable when needed.&lt;/p&gt;
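&lt;p&gt;To make the "searchable when needed" half concrete, here is a minimal Python sketch of full-text search over past sessions. It assumes an SQLite build with FTS5 enabled. The table name, columns, and sample rows are hypothetical and are not Hermes' actual schema.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema for illustration; not Hermes' actual storage layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE sessions USING fts5(session_id, content)")

conn.executemany(
    "INSERT INTO sessions (session_id, content) VALUES (?, ?)",
    [
        ("s1", "Fixed the Docker build by pinning the base image"),
        ("s2", "Drafted an outline for the release notes post"),
        ("s3", "Debugged SSH key permissions on the VPS"),
    ],
)


def search_history(query, limit=5):
    """Return (session_id, content) rows matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT session_id, content FROM sessions "
        "WHERE sessions MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


print(search_history("docker"))
```

&lt;p&gt;The point of the sketch is the shape: old sessions stay out of the prompt and are only pulled in by an explicit query.&lt;/p&gt;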

&lt;p&gt;This is exactly how I want a serious agent to behave. I do not want it to remember everything equally. I want it to remember what changes future behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skill system is the real "DNA"
&lt;/h2&gt;

&lt;p&gt;Skills are where Hermes becomes interesting.&lt;/p&gt;

&lt;p&gt;OpenClaw has skills too. Its docs explain that skills are AgentSkills-compatible &lt;code&gt;SKILL.md&lt;/code&gt; folders that teach the agent how to use tools. OpenClaw loads bundled skills, managed/local skills, personal skills, project skills, and workspace skills.&lt;/p&gt;

&lt;p&gt;Hermes takes the same basic idea and pushes it closer to procedural memory.&lt;/p&gt;

&lt;p&gt;The Hermes docs say the agent can create, update, and delete its own skills through &lt;code&gt;skill_manage&lt;/code&gt;. It creates skills after complex successful tasks, when it finds a working path through errors, when a user corrects its approach, or when it discovers a non-trivial workflow.&lt;/p&gt;

&lt;p&gt;That is the part that matters.&lt;/p&gt;

&lt;p&gt;Not "skills as a plugin folder."&lt;/p&gt;

&lt;p&gt;Skills as the agent writing down how to be better next time.&lt;/p&gt;

&lt;p&gt;This is the difference between installing extensions and building organizational memory. A good senior developer does not just solve an incident. They improve the runbook. Hermes is trying to make the agent do the same thing.&lt;/p&gt;

&lt;p&gt;And it is not only local skills. Hermes supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Official optional skills&lt;/li&gt;
&lt;li&gt;&lt;code&gt;skills.sh&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Well-known skill endpoints&lt;/li&gt;
&lt;li&gt;Direct URL skills&lt;/li&gt;
&lt;li&gt;GitHub skill installs&lt;/li&gt;
&lt;li&gt;Community registries&lt;/li&gt;
&lt;li&gt;External read-only skill directories&lt;/li&gt;
&lt;li&gt;Security scanning and audit commands for installed hub skills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives Hermes a useful middle ground. It can learn locally, but it can also participate in a broader open skill ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The execution story is stronger
&lt;/h2&gt;

&lt;p&gt;This is where the comparison gets practical.&lt;/p&gt;

&lt;p&gt;An agent that can run commands should make you slightly nervous. That is healthy.&lt;/p&gt;

&lt;p&gt;Hermes treats terminal execution as a configurable backend. Commands can run locally, in Docker, over SSH, in Singularity, in Modal, in Daytona, or in Vercel Sandbox. The docs are clear about the tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local is easy, but has no isolation&lt;/li&gt;
&lt;li&gt;Docker gives container isolation&lt;/li&gt;
&lt;li&gt;SSH moves execution to another server&lt;/li&gt;
&lt;li&gt;Modal and Daytona give cloud sandbox options&lt;/li&gt;
&lt;li&gt;Vercel Sandbox gives microVM-style cloud execution with snapshot persistence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The security page goes further. With Docker, Hermes applies hardened container flags: drop capabilities, no new privileges, PID limits, tmpfs mounts, and explicit resource limits. It also avoids forwarding host environment variables by default.&lt;/p&gt;
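&lt;p&gt;For a feel of what those flags look like in practice, here is a rough Python sketch that assembles a hardened &lt;code&gt;docker run&lt;/code&gt; command. The flags are standard Docker CLI options, but the function name, image, and limit values are my own assumptions, not the exact command Hermes runs.&lt;/p&gt;

```python
import shlex


def hardened_docker_args(image, command, memory="512m", cpus="1.0", pids_limit=256):
    """Build a `docker run` argv list with common hardening flags.

    Illustrative defaults only; not the flags or values Hermes itself uses.
    """
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                    # drop all Linux capabilities
        "--security-opt=no-new-privileges",  # block privilege escalation
        f"--pids-limit={pids_limit}",        # cap the number of processes
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",  # in-memory scratch space
        f"--memory={memory}",                # explicit memory limit
        f"--cpus={cpus}",                    # explicit CPU limit
        # No -e/--env flags: host environment variables are not forwarded.
        image,
        *command,
    ]


args = hardened_docker_args("python:3.12-slim", ["python", "-c", "print('hi')"])
print(shlex.join(args))
```

&lt;p&gt;Building the argument list in one place makes it easy to review exactly what the container is allowed to do before anything runs.&lt;/p&gt;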

&lt;p&gt;That matters for one simple reason:&lt;/p&gt;

&lt;p&gt;The agent should not automatically inherit your entire laptop just because you wanted it to scrape a page or refactor a file.&lt;/p&gt;

&lt;p&gt;OpenClaw can sandbox too. Its README points to Docker, SSH, and OpenShell options, and it recommends sandboxing for non-main sessions. Its security docs are detailed and serious.&lt;/p&gt;

&lt;p&gt;But the default mental model is different.&lt;/p&gt;

&lt;p&gt;OpenClaw is a personal assistant with optional hardening.&lt;/p&gt;

&lt;p&gt;Hermes is an agent runtime where isolated execution is part of the normal deployment conversation.&lt;/p&gt;

&lt;p&gt;That is why I would rather run Hermes on a VPS or cloud sandbox for always-on work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Messaging is not the win. Remote agency is.
&lt;/h2&gt;

&lt;p&gt;Both tools can talk through messaging platforms.&lt;/p&gt;

&lt;p&gt;OpenClaw has a huge channel list. Hermes also supports a wide set: Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Matrix, Mattermost, Home Assistant, DingTalk, Feishu/Lark, WeCom, Microsoft Teams, and more.&lt;/p&gt;

&lt;p&gt;The interesting Hermes feature is not that you can message it.&lt;/p&gt;

&lt;p&gt;The interesting feature is that messaging becomes a control surface for background work.&lt;/p&gt;

&lt;p&gt;Hermes supports background sessions from messaging platforms. You can start a separate task, keep chatting in the main thread, and receive the result back in the same channel. That is a small feature on paper, but it changes the feel of the system.&lt;/p&gt;

&lt;p&gt;It stops being:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am chatting with a bot.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I am dispatching work to an agent that lives somewhere else.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the future I care about.&lt;/p&gt;

&lt;p&gt;I do not want my personal agent trapped inside the laptop I am currently using. I want it on a server, reachable from my phone, able to run a long task, report back, and remember the result.&lt;/p&gt;

&lt;p&gt;Hermes is built for that shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool breadth is now table stakes
&lt;/h2&gt;

&lt;p&gt;There was a time when "this agent can browse the web and run commands" sounded wild.&lt;/p&gt;

&lt;p&gt;That time is over.&lt;/p&gt;

&lt;p&gt;Both OpenClaw and Hermes have serious tool surfaces.&lt;/p&gt;

&lt;p&gt;OpenClaw ships built-in tools for shell execution, code execution, browser control, web search, file I/O, patching, messaging, canvas, nodes, cron, images, music, video, TTS, sessions, and sub-agents.&lt;/p&gt;

&lt;p&gt;Hermes ships a broad registry too: web search, extraction, terminal, file editing, browser automation, vision, image generation, TTS, memory, session search, cron, messaging, delegation, code execution, Home Assistant, MCP tools, RL tools, and more.&lt;/p&gt;

&lt;p&gt;So the question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which one has tools?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The better question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which one makes tools safer, more composable, and easier to scope per situation?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hermes has a clear toolset model. Toolsets can be enabled per session, per platform, or per task. There are platform presets like &lt;code&gt;hermes-cli&lt;/code&gt;, &lt;code&gt;hermes-telegram&lt;/code&gt;, and dynamic MCP toolsets. That gives you a cleaner way to say:&lt;/p&gt;

&lt;p&gt;"This Telegram agent can do X, but not Y."&lt;/p&gt;

&lt;p&gt;For me, that is more important than raw tool count.&lt;/p&gt;
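&lt;p&gt;Per-platform scoping can be pictured as a deny-by-default allowlist. This is a minimal Python sketch of the idea only: the preset names come from the Hermes docs mentioned above, while the tool names inside each set and the &lt;code&gt;is_allowed&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
# Hypothetical toolset contents; only the preset names come from the text above.
TOOLSETS = {
    "hermes-cli": {"terminal", "file_edit", "web_search", "browser"},
    "hermes-telegram": {"web_search", "memory", "session_search"},
}


def is_allowed(platform, tool):
    """Return True only if the tool is in the platform's toolset (deny by default)."""
    return tool in TOOLSETS.get(platform, set())


print(is_allowed("hermes-telegram", "web_search"))  # True
print(is_allowed("hermes-telegram", "terminal"))    # False
```

&lt;p&gt;Unknown platforms get an empty set, so a new channel starts with no tools until someone grants them.&lt;/p&gt;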

&lt;h2&gt;
  
  
  Hermes vs OpenClaw
&lt;/h2&gt;

&lt;p&gt;Here is my practical comparison.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;Hermes Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Core identity&lt;/td&gt;
&lt;td&gt;Personal AI assistant&lt;/td&gt;
&lt;td&gt;Self-improving agent runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mental model&lt;/td&gt;
&lt;td&gt;Local-first Gateway assistant&lt;/td&gt;
&lt;td&gt;Persistent worker on your infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;CLI onboarding and Gateway daemon&lt;/td&gt;
&lt;td&gt;CLI, Gateway, and multiple runtime backends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messaging&lt;/td&gt;
&lt;td&gt;Very broad channel coverage&lt;/td&gt;
&lt;td&gt;Channels plus background sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;Skills loaded from many locations&lt;/td&gt;
&lt;td&gt;Skills as procedural memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;Workspace and session context&lt;/td&gt;
&lt;td&gt;Curated memory plus session search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tooling&lt;/td&gt;
&lt;td&gt;Broad built-in tools&lt;/td&gt;
&lt;td&gt;Toolsets, MCP, delegation, media, web&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Personal trust boundary, hardening available&lt;/td&gt;
&lt;td&gt;Approval, isolation, env filtering, scoped tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Device or Gateway host&lt;/td&gt;
&lt;td&gt;Local, VPS, Docker, SSH, Modal, Daytona, Vercel Sandbox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ideal user&lt;/td&gt;
&lt;td&gt;Power user with a device assistant&lt;/td&gt;
&lt;td&gt;Developer building a supervised digital worker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Biggest risk&lt;/td&gt;
&lt;td&gt;Too much power in one assistant boundary&lt;/td&gt;
&lt;td&gt;Newer ecosystem still proving itself&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This table is why I do not read Hermes as "another OpenClaw clone."&lt;/p&gt;

&lt;p&gt;Hermes is competing on a different axis.&lt;/p&gt;

&lt;p&gt;OpenClaw made the assistant powerful.&lt;/p&gt;

&lt;p&gt;Hermes is trying to make the assistant compound.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical playbook
&lt;/h2&gt;

&lt;p&gt;If you are reading this and wondering "okay, but what do I actually try first?", this is the path I would take.&lt;/p&gt;

&lt;p&gt;First, run Hermes somewhere disposable. A local machine is fine for learning, but the interesting path is Docker, SSH, Modal, Daytona, or another sandbox backend. The whole point is to avoid giving an experimental agent unlimited access to your daily machine on day one.&lt;/p&gt;

&lt;p&gt;Then connect one messaging surface, not five. Telegram or Discord is enough. Make sure allowlists or DM pairing are enabled before you give the agent terminal access.&lt;/p&gt;

&lt;p&gt;Then give Hermes one recurring workflow:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/background Research the latest Hermes Agent docs changes, summarize the developer impact, and send me 5 possible DEV post angles.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, watch for the compounding moment. If the workflow takes several tool calls, has a repeatable structure, or needs a correction from you, that is exactly the kind of thing that should become a skill.&lt;/p&gt;

&lt;p&gt;A good first Hermes skill would not be "write blog posts." Too vague.&lt;/p&gt;

&lt;p&gt;A better one would be:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;research-release-notes

When given a GitHub repo or docs page:
1. Find the latest release or docs update.
2. Prefer primary sources.
3. Extract concrete changes.
4. Separate confirmed facts from opinion.
5. Produce a DEV-ready outline with links.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is where Hermes becomes more than a chat assistant. You are not just asking it to do a task. You are teaching it a durable way to do that class of task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where OpenClaw still wins
&lt;/h2&gt;

&lt;p&gt;A good comparison should admit the other side.&lt;/p&gt;

&lt;p&gt;OpenClaw still has big advantages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It has enormous attention and community gravity.&lt;/li&gt;
&lt;li&gt;Its channel ecosystem is very broad.&lt;/li&gt;
&lt;li&gt;Its native app and node story is compelling.&lt;/li&gt;
&lt;li&gt;Its local-first assistant feel is easier to explain to non-agent people.&lt;/li&gt;
&lt;li&gt;It has already shaped how people talk about personal AI assistants.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If your goal is "I want a personal AI assistant connected to my messaging apps and devices," OpenClaw is still a serious answer.&lt;/p&gt;

&lt;p&gt;But if your goal is "I want an agent that can become operational infrastructure," Hermes is the more interesting answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Hermes wins
&lt;/h2&gt;

&lt;p&gt;Hermes wins because it is opinionated about the hard parts.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. It treats memory as a product surface
&lt;/h3&gt;

&lt;p&gt;Memory is not just chat history. It is a curated behavioral layer. The split between &lt;code&gt;MEMORY.md&lt;/code&gt;, &lt;code&gt;USER.md&lt;/code&gt;, and searchable session history is simple enough to trust and flexible enough to grow.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. It treats skills as learning
&lt;/h3&gt;

&lt;p&gt;The agent can create and update skills after hard tasks. That is the closest thing to compounding engineering knowledge in this category.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. It treats execution location as a first-class choice
&lt;/h3&gt;

&lt;p&gt;Local, Docker, SSH, Modal, Daytona, Vercel Sandbox, Singularity. That is not a footnote. That is the difference between a toy assistant and something you can deploy with intent.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. It treats messaging as dispatch
&lt;/h3&gt;

&lt;p&gt;I can talk to the agent through Telegram or Discord, but the real value is sending background work and getting results back. That makes the chat app a command center, not the product itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. It treats safety as architecture, not a disclaimer
&lt;/h3&gt;

&lt;p&gt;Allowlists, DM pairing, command approval, container isolation, MCP credential filtering, context scanning, env var filtering, and scoped toolsets are not glamorous features. They are the features you need after the first impressive demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger point
&lt;/h2&gt;

&lt;p&gt;The agent space is splitting into two philosophies.&lt;/p&gt;

&lt;p&gt;One philosophy says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give the user a powerful assistant and let them connect everything.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The other says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Give the user an agent runtime that can be supervised, isolated, taught, and deployed, and that remembers what it learned.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OpenClaw represents the first philosophy extremely well.&lt;/p&gt;

&lt;p&gt;Hermes represents the second.&lt;/p&gt;

&lt;p&gt;That is why I think Hermes is the more important project to study right now.&lt;/p&gt;

&lt;p&gt;OpenClaw proved people want agents with hands.&lt;/p&gt;

&lt;p&gt;Hermes is asking what happens when those hands also get memory, runbooks, safer execution, background work, and a home outside your current laptop.&lt;/p&gt;

&lt;p&gt;That is the jump.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would build with Hermes
&lt;/h2&gt;

&lt;p&gt;If I were turning this into a real project, I would build a developer publishing agent.&lt;/p&gt;

&lt;p&gt;Not a blog spammer. A proper assistant for technical writing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Watch official docs, GitHub releases, and challenge pages.&lt;/li&gt;
&lt;li&gt;Summarize what changed with links to primary sources.&lt;/li&gt;
&lt;li&gt;Keep a memory of my writing preferences and recurring projects.&lt;/li&gt;
&lt;li&gt;Create reusable skills for research, outline creation, source checking, and DEV formatting.&lt;/li&gt;
&lt;li&gt;Draft posts in my style, but keep claims grounded in citations.&lt;/li&gt;
&lt;li&gt;Send drafts to Telegram for review.&lt;/li&gt;
&lt;li&gt;Track comments and suggest follow-up posts based on real discussion.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That would use the Hermes shape well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-running background research&lt;/li&gt;
&lt;li&gt;web extraction&lt;/li&gt;
&lt;li&gt;session search&lt;/li&gt;
&lt;li&gt;persistent memory&lt;/li&gt;
&lt;li&gt;skills that improve over time&lt;/li&gt;
&lt;li&gt;messaging delivery&lt;/li&gt;
&lt;li&gt;scoped tool access&lt;/li&gt;
&lt;li&gt;scheduled tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the kind of workflow where Hermes makes more sense than a one-shot chat assistant.&lt;/p&gt;

&lt;p&gt;The point is not that Hermes can write.&lt;/p&gt;

&lt;p&gt;The point is that Hermes can build a writing operation around memory, tools, and feedback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;Did Hermes literally kill OpenClaw?&lt;/p&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;OpenClaw is too useful, too popular, and too culturally important to dismiss.&lt;/p&gt;

&lt;p&gt;But Hermes may have killed the idea that a personal agent is only a local assistant with a chat interface.&lt;/p&gt;

&lt;p&gt;That is the real shift.&lt;/p&gt;

&lt;p&gt;The next generation of agents will not be judged only by how many apps they connect to. They will be judged by whether they can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;remember the right things&lt;/li&gt;
&lt;li&gt;forget the wrong things&lt;/li&gt;
&lt;li&gt;learn procedures&lt;/li&gt;
&lt;li&gt;run in isolated environments&lt;/li&gt;
&lt;li&gt;work asynchronously&lt;/li&gt;
&lt;li&gt;integrate with open tools&lt;/li&gt;
&lt;li&gt;stay useful after the first demo&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By that standard, Hermes is not just another agent.&lt;/p&gt;

&lt;p&gt;It is a strong argument for where agent software is going next.&lt;/p&gt;

&lt;p&gt;That is my real test for any agent framework now:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Does it get more useful because I used it yesterday?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the answer is no, it is still mostly a tool wrapper.&lt;/p&gt;

&lt;p&gt;If the answer is yes, we are finally talking about agent software.&lt;/p&gt;

&lt;p&gt;And yes, that is why the title says it:&lt;/p&gt;

&lt;p&gt;Hermes just killed OpenClaw.&lt;/p&gt;

&lt;p&gt;Not by replacing it overnight.&lt;/p&gt;

&lt;p&gt;By making the category grow up.&lt;/p&gt;

&lt;p&gt;The first thing I would personally validate is not whether Hermes can write a pretty paragraph. It is whether a Docker or SSH-backed Hermes research agent can run for a week, keep useful memory, and avoid turning one bad tool call into a machine-level mess. If you have tried either backend already, I would genuinely like to hear which one felt smoother and where it broke.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Hermes Agent official docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent GitHub repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/tools/" rel="noopener noreferrer"&gt;Hermes tools and toolsets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/memory" rel="noopener noreferrer"&gt;Hermes persistent memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/skills/" rel="noopener noreferrer"&gt;Hermes skills system&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/messaging" rel="noopener noreferrer"&gt;Hermes messaging gateway&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/security" rel="noopener noreferrer"&gt;Hermes security model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw GitHub repo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/tools" rel="noopener noreferrer"&gt;OpenClaw tools and plugins&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/skills" rel="noopener noreferrer"&gt;OpenClaw skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/architecture" rel="noopener noreferrer"&gt;OpenClaw Gateway architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.openclaw.ai/security" rel="noopener noreferrer"&gt;OpenClaw security guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What do you think?&lt;/p&gt;

&lt;p&gt;Is Hermes actually the next step after OpenClaw, or is OpenClaw still the better model for personal agents?&lt;/p&gt;

&lt;p&gt;And of the five claims above, which one matters most to you: one-command install, running on a VPS or sandbox, built-in skills, messaging integrations, or safer execution?&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>My GitHub Graveyard has 27 dead projects. Here is the brutal truth about why.</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Wed, 13 May 2026 18:32:33 +0000</pubDate>
      <link>https://dev.to/tahosin/my-github-graveyard-has-27-dead-projects-here-is-the-brutal-truth-about-why-52d9</link>
      <guid>https://dev.to/tahosin/my-github-graveyard-has-27-dead-projects-here-is-the-brutal-truth-about-why-52d9</guid>
      <description>&lt;p&gt;I recently opened my GitHub account and filtered by private repositories. I actually counted them: exactly 27 abandoned side projects created over the last 3 years.&lt;/p&gt;

&lt;p&gt;There was a machine-learning habit tracker. There was a Twitter clone for dogs. There was a complex SaaS boilerplate that I spent four weeks configuring before completely giving up on it. Some of them I spent weeks on. One I even bought a domain for.&lt;/p&gt;

&lt;p&gt;Hundreds of hours wasted. Why did they all die before seeing the light of day? It was not a lack of time. It was not a lack of motivation. &lt;/p&gt;

&lt;p&gt;Here is the controversial truth: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most developers do not fail because of a lack of skill. They fail because they secretly enjoy the dopamine rush of &lt;em&gt;starting&lt;/em&gt; a new project more than the grind of &lt;em&gt;finishing&lt;/em&gt; it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is the exact pattern that killed my 27 projects, and the rule that finally helped me break the cycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The "Perfect Stack" Trap
&lt;/h3&gt;

&lt;p&gt;As developers, we love shiny new tools. When starting a project, the first instinct is to try that new database everyone is talking about on Twitter, or the latest beta version of a framework.&lt;/p&gt;

&lt;p&gt;I once spent an entire weekend configuring a Next.js app with tRPC, Prisma, and a custom Tailwind design system. By Sunday night, my infrastructure was absolute perfection. But I had zero business logic written. The next day, I lost interest completely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If you want to actually finish a project, you have to use boring technology.&lt;/strong&gt; Pick the stack you know best, even if it feels outdated.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Optimizing for Phantom Users
&lt;/h3&gt;

&lt;p&gt;For the dog Twitter clone, I spent three days setting up a complex Redis caching layer. I was terrified the server would crash if a million dogs signed up on day one. &lt;/p&gt;

&lt;p&gt;We love to over-engineer. We worry about how our database will handle massive traffic, so we design complex microservices. But here is the brutal reality:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your biggest threat is not the server crashing. Your biggest threat is that nobody will ever visit your site.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Stop building for problems you do not have yet. A simple database query is fine. You can always optimize later when the app actually gets traction.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Feature Creep is a Disease
&lt;/h3&gt;

&lt;p&gt;It starts innocently. You are building a simple to-do list, and you think, "It would be cool if users could upload a custom profile picture." Suddenly, you are reading AWS S3 documentation for five hours instead of finishing the core task logic.&lt;/p&gt;

&lt;p&gt;Features are fun to dream about, but they are heavy to build. &lt;strong&gt;Every extra button you add delays the launch.&lt;/strong&gt; The best way to finish a project is to ruthlessly cut features until you have the absolute minimum viable product. If it does not solve the core problem, it gets deleted.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The Fear of Shipping
&lt;/h3&gt;

&lt;p&gt;Writing code is safe. Your VS Code editor does not judge you. But launching a project means real people might see it, find bugs, or worse, ignore it completely.&lt;/p&gt;

&lt;p&gt;A lot of side projects are abandoned right at the 90 percent mark because the developer is secretly afraid of hitting the deploy button. We hide behind the excuse of "it just needs a little more polish." &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A buggy, ugly app that is live on the internet is infinitely more valuable than a perfect app sitting on localhost.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The 48-Hour Rule
&lt;/h3&gt;

&lt;p&gt;To break this curse, I made a strict new rule for myself: &lt;strong&gt;I have to launch a working, ugly prototype within 48 hours.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If it takes longer than a single weekend to get the core feature live, the scope is too big. This simple mindset shift is one of the biggest reasons I finally started shipping real apps instead of building graveyards.&lt;/p&gt;

&lt;h3&gt;
  
  
  Over to you
&lt;/h3&gt;

&lt;p&gt;I know I am not the only one with a GitHub graveyard of dead ideas. &lt;/p&gt;

&lt;p&gt;Be honest: &lt;strong&gt;What is the weirdest abandoned side project you have ever started, and what was the &lt;em&gt;real&lt;/em&gt; reason you stopped working on it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let me know in the comments. What is in your graveyard?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>beginners</category>
      <category>productivity</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Replaced My $500 GPU with a $75 Raspberry Pi: How Gemma 4 Makes Computer Vision 10x Cheaper</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Thu, 07 May 2026 18:35:58 +0000</pubDate>
      <link>https://dev.to/tahosin/i-replaced-my-500-gpu-with-a-75-raspberry-pi-how-gemma-4-makes-computer-vision-10x-cheaper-1gbo</link>
      <guid>https://dev.to/tahosin/i-replaced-my-500-gpu-with-a-75-raspberry-pi-how-gemma-4-makes-computer-vision-10x-cheaper-1gbo</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GemmaVision&lt;/strong&gt; — A complete computer vision pipeline that replaces $500+ GPU setups with a $75 Raspberry Pi 5, powered entirely by Gemma 4's native multimodal capabilities.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Native object detection without YOLO, OpenCV, CUDA, or cloud APIs. Just Gemma 4 multimodal AI running 100% offline on a single-board computer.&lt;/em&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Traditional CV&lt;/th&gt;
&lt;th&gt;Gemma 4 Vision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$500–2000 (GPU + cloud)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$75&lt;/strong&gt; (Raspberry Pi 5)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly Bill&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20–100 cloud fees&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;$0&lt;/strong&gt; (runs offline)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2–4 hours of dependency hell&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;20 minutes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;500–1000 lines&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50 lines&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10+ (OpenCV, CUDA, etc.)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;3&lt;/strong&gt; (torch, transformers, Pillow)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power Draw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150–300W&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.5W&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy (COCO)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~90%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~85%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zero-Shot Detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Requires training&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;Works out of box&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt; a drop of about 5 percentage points in accuracy in exchange for a roughly &lt;strong&gt;90% cost reduction&lt;/strong&gt; and a &lt;strong&gt;10× simpler setup&lt;/strong&gt;. For home automation, accessibility tools, and hobby robotics, this trade is obvious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 &lt;a href="https://github.com/tahosinx/gemmavision" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt; — Full source code&lt;/li&gt;
&lt;li&gt;🛒 Shopping List — Exact parts to buy&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Problem: Why Computer Vision is Broken for Indie Developers
&lt;/h2&gt;

&lt;p&gt;For two years, I maintained a production computer vision pipeline that looked like every tutorial on the internet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YOLOv8 → OpenCV preprocessing → CUDA drivers → Cloud API fallback → Custom NMS → Deployment hell
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The reality of traditional CV:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pain Point&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Frequency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cloud GPU rental&lt;/td&gt;
&lt;td&gt;$47/month&lt;/td&gt;
&lt;td&gt;Every month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CUDA driver updates&lt;/td&gt;
&lt;td&gt;3-4 hours debugging&lt;/td&gt;
&lt;td&gt;Quarterly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependency conflicts&lt;/td&gt;
&lt;td&gt;2-6 hours resolution&lt;/td&gt;
&lt;td&gt;Monthly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model retraining&lt;/td&gt;
&lt;td&gt;$50-200 compute&lt;/td&gt;
&lt;td&gt;Per use case&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API rate limits&lt;/td&gt;
&lt;td&gt;Throttled at scale&lt;/td&gt;
&lt;td&gt;Daily&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The monthly bill:&lt;/strong&gt; $47 for cloud GPU + API calls&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The codebase:&lt;/strong&gt; 800 lines of preprocessing, coordinate transforms, and version pinning&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The maintenance:&lt;/strong&gt; Broken every time NVIDIA drivers updated&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The latency:&lt;/strong&gt; 2–5 seconds end-to-end (when it worked)&lt;/p&gt;

&lt;p&gt;It worked. But it felt… heavy. Like I was managing infrastructure instead of building products. The cognitive overhead of keeping CUDA, cuDNN, PyTorch, and OpenCV versions in sync was exhausting. Every &lt;code&gt;apt update&lt;/code&gt; on the server felt like a gamble.&lt;/p&gt;

&lt;p&gt;The frustration peaked in March 2026. I was debugging a CUDA version mismatch at 2 AM for a side project that was supposed to be "simple object detection." I asked myself: &lt;em&gt;Why does computer vision require so much ceremony? Why does a "hello world" object detector need 10 dependencies and a $500 GPU?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That night, I started researching alternatives. What I found changed everything.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Discovery: Gemma 4's Secret Weapon
&lt;/h2&gt;

&lt;p&gt;Reading the &lt;a href="https://ai.google.dev/gemma/docs/core" rel="noopener noreferrer"&gt;Gemma 4 technical documentation&lt;/a&gt;, I found something buried in the multimodal section that made me stop breathing for a second:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The model can return structured JSON output including &lt;code&gt;box_2d&lt;/code&gt; coordinates for detected objects."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I read it twice. Then I tested it immediately.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Experiment
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The prompt I sent:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Detect all objects in this image. Return bounding boxes in JSON format 
with 'box_2d' [y1, x1, y2, x2] and 'label' fields.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The response I got:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"box_2d"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;171&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;245&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;308&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"coffee mug"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"box_2d"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;420&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;334&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;612&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"laptop"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"box_2d"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;245&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;412&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;780&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"label"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"desk chair"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Minimal post-processing.&lt;/strong&gt; Coordinates are normalized to a 1000×1000 grid, so the only step left is rescaling them to your image dimensions. No Non-Maximum Suppression, no class-ID mapping, and no OpenCV &lt;code&gt;cv2.rectangle()&lt;/code&gt; calls. Just… coordinates with labels, native from the model.&lt;/p&gt;
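
&lt;p&gt;As a rough illustration of that rescaling step, here is a minimal sketch. The helper name &lt;code&gt;descale_box&lt;/code&gt; and the 1920×1080 image size are assumptions made for the example, not part of GemmaVision; the box values come from the response shown earlier.&lt;/p&gt;

```python
def descale_box(box_2d, image_width, image_height, grid=1000):
    """Convert [y1, x1, y2, x2] on a 0..grid scale to pixel (left, top, right, bottom)."""
    y1, x1, y2, x2 = box_2d
    left = round(x1 * image_width / grid)
    top = round(y1 * image_height / grid)
    right = round(x2 * image_width / grid)
    bottom = round(y2 * image_height / grid)
    return left, top, right, bottom


# The "coffee mug" box from the response, on an assumed 1920x1080 image.
mug_pixels = descale_box([171, 75, 245, 308], 1920, 1080)
print(mug_pixels)  # (144, 185, 591, 265)
```

&lt;p&gt;The returned tuple is ordered left, top, right, bottom, which is the order Pillow's drawing functions expect for a rectangle.&lt;/p&gt;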

&lt;p&gt;The realization hit like a truck: &lt;em&gt;A large vision-language model can replace my entire computer vision pipeline.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why This Changes Everything
&lt;/h3&gt;

&lt;p&gt;Traditional computer vision pipelines are &lt;em&gt;composed&lt;/em&gt; systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Detection model&lt;/strong&gt; (YOLO) outputs raw tensors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NMS algorithm&lt;/strong&gt; filters overlapping boxes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordinate transforms&lt;/strong&gt; scale to image dimensions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Label mapping&lt;/strong&gt; converts class IDs to text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualization layer&lt;/strong&gt; draws boxes with OpenCV&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Gemma 4 is a &lt;em&gt;unified&lt;/em&gt; system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One model&lt;/strong&gt; takes image + text prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One output&lt;/strong&gt; contains structured bounding boxes with labels&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architectural simplification isn't just cleaner code — it's a fundamentally different approach to computer vision that eliminates entire categories of bugs and maintenance overhead.&lt;/p&gt;




&lt;h2&gt;
  
  
  The $75 Solution: Building GemmaVision
&lt;/h2&gt;

&lt;p&gt;If Gemma 4 could output bounding boxes natively, I didn't need a GPU server. I needed just enough compute to run an E4B (Effective 4B) parameter model. That compute fits in a $75 single-board computer.&lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;Raspberry Pi 5&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hardware Shopping List
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Where to Buy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Raspberry Pi 5 (8GB)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$60&lt;/td&gt;
&lt;td&gt;Inference engine&lt;/td&gt;
&lt;td&gt;&lt;a href="https://rpilocator.com" rel="noopener noreferrer"&gt;rpilocator.com&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Camera Module 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$15&lt;/td&gt;
&lt;td&gt;Image capture&lt;/td&gt;
&lt;td&gt;&lt;a href="https://adafruit.com" rel="noopener noreferrer"&gt;Adafruit&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active Cooler&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;td&gt;Thermal management&lt;/td&gt;
&lt;td&gt;Official Raspberry Pi store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;64GB microSD (U3)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;td&gt;Model storage&lt;/td&gt;
&lt;td&gt;Any retailer (U3 speed required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;USB-C Power Supply&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$8&lt;/td&gt;
&lt;td&gt;5V 5A PSU&lt;/td&gt;
&lt;td&gt;Included or separate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$90&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Complete system&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

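&lt;p&gt;&lt;em&gt;Note: The $90 total assumes the power supply comes with your kit; add $8 if you buy it separately. Skip the camera and use existing images, and the total drops to &lt;strong&gt;$75&lt;/strong&gt;.&lt;/em&gt;&lt;/p&gt;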
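
&lt;p&gt;The totals are easy to check with a few lines of Python. The prices are copied from the table; treating the power supply as something you already own is an assumption based on its "Included or separate" note.&lt;/p&gt;

```python
# Prices copied from the shopping list (USD).
parts = {
    "Raspberry Pi 5 (8GB)": 60,
    "Camera Module 3": 15,
    "Active Cooler": 5,
    "64GB microSD (U3)": 10,
    "USB-C Power Supply": 8,
}


def total(excluded=()):
    """Sum every part whose name is not listed in `excluded`."""
    return sum(price for name, price in parts.items() if name not in excluded)


everything = total()
without_psu = total(excluded=["USB-C Power Supply"])
without_psu_or_camera = total(excluded=["USB-C Power Supply", "Camera Module 3"])
print(everything, without_psu, without_psu_or_camera)  # 98 90 75
```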

&lt;h3&gt;
  
  
  Software Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    GemmaVision Pipeline                     │
├─────────────────────────────────────────────────────────────┤
│  [Camera/PIL Image]                                         │
│         ↓                                                   │
│  [Transformers 4.48+ — AutoProcessor]                       │
│         ↓                                                   │
│  [Gemma 4 E4B-it, 4-bit quantized, 2.1GB]                   │
│         ↓                                                   │
│  [Native JSON: box_2d + label]                              │
│         ↓                                                   │
│  [PIL ImageDraw — Bounding boxes overlay]                   │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



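&lt;p&gt;&lt;strong&gt;Core dependencies: 3&lt;/strong&gt; (the install step also pulls in &lt;code&gt;bitsandbytes&lt;/code&gt; and &lt;code&gt;accelerate&lt;/code&gt; as helpers):&lt;/p&gt;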

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;torch&lt;/code&gt; — PyTorch (CPU-optimized)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;transformers&lt;/code&gt; — Hugging Face model loading&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Pillow&lt;/code&gt; — Image I/O and drawing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lines of code: ~50.&lt;/strong&gt; Compare that to a YOLOv8 pipeline with preprocessing, NMS, coordinate transforms, and visualization.&lt;/p&gt;
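
&lt;p&gt;The "Native JSON" stage of the diagram still needs a little defensive parsing, because a language model may wrap its JSON in extra text. The sketch below shows one way to do that with only the standard library. &lt;code&gt;parse_detections&lt;/code&gt; is a hypothetical helper written for this illustration, and it may differ from what the repository does.&lt;/p&gt;

```python
import json


def parse_detections(response_text):
    """Extract a list of {"box_2d": [...], "label": str} dicts from model text."""
    # Take everything between the first "[" and the last "]".
    start = response_text.find("[")
    end = response_text.rfind("]")
    if start == -1 or end == -1:
        return []
    try:
        items = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    detections = []
    for item in items:
        if not isinstance(item, dict):
            continue
        box = item.get("box_2d")
        label = item.get("label")
        # Keep only entries with four numeric coordinates and a text label.
        if (isinstance(box, list) and len(box) == 4
                and all(isinstance(v, (int, float)) for v in box)
                and isinstance(label, str)):
            detections.append({"box_2d": box, "label": label})
    return detections


reply = 'Here you go:\n[{"box_2d": [89, 420, 334, 612], "label": "laptop"}, {"label": "no box"}]'
print(parse_detections(reply))
```

&lt;p&gt;Malformed entries are dropped rather than raising, so a partly broken reply still yields the usable detections.&lt;/p&gt;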




&lt;h2&gt;
  
  
  Performance &amp;amp; Evaluation
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What Works / What Breaks: Honest Assessment
&lt;/h2&gt;

&lt;p&gt;I promised honesty. Here's the real-world performance:&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ Works Well
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Common objects&lt;/td&gt;
&lt;td&gt;Coffee mugs, laptops, chairs, phones&lt;/td&gt;
&lt;td&gt;87%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI elements&lt;/td&gt;
&lt;td&gt;Buttons, text inputs, dropdowns, links&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indoor scenes&lt;/td&gt;
&lt;td&gt;Living rooms, kitchens, offices&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screenshots&lt;/td&gt;
&lt;td&gt;Web interfaces, mobile apps&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Documented objects&lt;/td&gt;
&lt;td&gt;Items with clear visual features&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ⚠️ Edge Cases
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Issue&lt;/th&gt;
&lt;th&gt;Mitigation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small text at distance&lt;/td&gt;
&lt;td&gt;Poor detection&lt;/td&gt;
&lt;td&gt;Crop or zoom image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Occluded objects&lt;/td&gt;
&lt;td&gt;Partial detection&lt;/td&gt;
&lt;td&gt;Multiple angles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Very dark images&lt;/td&gt;
&lt;td&gt;Missed objects&lt;/td&gt;
&lt;td&gt;Brighten/preprocess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noisy images&lt;/td&gt;
&lt;td&gt;False positives&lt;/td&gt;
&lt;td&gt;Confidence threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Abstract art&lt;/td&gt;
&lt;td&gt;Nonsensical labels&lt;/td&gt;
&lt;td&gt;Not recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  ❌ Don't Use For
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Application&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;th&gt;Alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Real-time video&lt;/td&gt;
&lt;td&gt;Too slow (8-12s/frame)&lt;/td&gt;
&lt;td&gt;YOLOv8 on GPU&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sub-100ms latency&lt;/td&gt;
&lt;td&gt;Impossible on Pi&lt;/td&gt;
&lt;td&gt;Edge TPU / NVIDIA Jetson&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Industrial precision&lt;/td&gt;
&lt;td&gt;85% isn't enough&lt;/td&gt;
&lt;td&gt;Custom trained YOLO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety-critical systems&lt;/td&gt;
&lt;td&gt;No hard real-time guarantees&lt;/td&gt;
&lt;td&gt;Certified CV systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiny objects (&amp;lt; 20px)&lt;/td&gt;
&lt;td&gt;Detection fails&lt;/td&gt;
&lt;td&gt;Higher resolution camera&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; Gemma 4 vision excels at &lt;em&gt;general-purpose object detection where latency tolerance is 10+ seconds&lt;/em&gt;. For real-time applications, traditional CV still wins.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Home Automation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Detect if garage door is open/closed
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;garage.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;garage door&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;open&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;send_notification&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Garage door is open!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Accessibility Tool
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Describe scene for visually impaired users
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;room.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all furniture and obstacles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_spatial_description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;speak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# "Coffee table 2 meters ahead, chair to the right"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
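
&lt;p&gt;The snippet above relies on &lt;code&gt;generate_spatial_description&lt;/code&gt;, which is not shown. A minimal sketch of what such a helper could look like follows; it is an assumption for illustration, not the repository's implementation. It only reports left, ahead, or right from the horizontal box centre. Distances such as "2 meters" would need depth information that &lt;code&gt;box_2d&lt;/code&gt; alone does not provide.&lt;/p&gt;

```python
def generate_spatial_description(detections, grid=1000):
    """Describe each detection as left / ahead / right using the box centre."""
    zones = ["to the left", "ahead", "to the right"]
    phrases = []
    for det in detections:
        y1, x1, y2, x2 = det["box_2d"]
        center_x = (x1 + x2) / 2
        # Split the grid into three equal vertical bands and clamp to a valid index.
        zone_index = max(0, min(int(center_x * 3 // grid), 2))
        phrases.append(f"{det['label']} {zones[zone_index]}")
    return ", ".join(phrases)


sample = [
    {"box_2d": [171, 75, 245, 308], "label": "coffee mug"},
    {"box_2d": [89, 420, 334, 612], "label": "laptop"},
    {"box_2d": [245, 512, 412, 780], "label": "desk chair"},
]
print(generate_spatial_description(sample))
# coffee mug to the left, laptop ahead, desk chair ahead
```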



&lt;h3&gt;
  
  
  Inventory Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Count items on shelf
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shelf.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;count_by_label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Stock: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;inventory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
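
&lt;p&gt;&lt;code&gt;count_by_label&lt;/code&gt; is also left undefined above. One plausible version, sketched here as an assumption, tallies labels with &lt;code&gt;collections.Counter&lt;/code&gt; after normalising case, since a model may write "Cereal box" and "cereal box" for the same product. The shelf data is made up for the example.&lt;/p&gt;

```python
from collections import Counter


def count_by_label(detections):
    """Return {label: count}, normalising case and surrounding whitespace."""
    return dict(Counter(det["label"].strip().lower() for det in detections))


# Made-up detections for illustration.
shelf = [
    {"box_2d": [100, 50, 300, 200], "label": "Cereal box"},
    {"box_2d": [100, 220, 300, 370], "label": "cereal box"},
    {"box_2d": [320, 60, 480, 150], "label": "soup can"},
]
print(count_by_label(shelf))  # {'cereal box': 2, 'soup can': 1}
```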



&lt;h3&gt;
  
  
  UI Testing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Verify all buttons are present in screenshot
&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ui-screenshot.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buttons and input fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Submit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Cancel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;find_missing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing UI elements: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
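
&lt;p&gt;For completeness, here is a sketch of how &lt;code&gt;find_missing&lt;/code&gt; might work. It is a hypothetical helper, and the case-insensitive substring match is an assumption: it treats an expected name as present if it appears anywhere inside a detected label, so "Submit" matches "Submit button". The sample detections are made up.&lt;/p&gt;

```python
def find_missing(expected, detections):
    """Return expected names that appear in no detection label (case-insensitive)."""
    labels = [det["label"].lower() for det in detections]
    return [name for name in expected
            if not any(name.lower() in label for label in labels)]


# Made-up detections for illustration.
ui = [
    {"box_2d": [800, 100, 860, 300], "label": "Submit button"},
    {"box_2d": [200, 100, 260, 600], "label": "username input field"},
]
print(find_missing(["Submit", "Cancel", "Username", "Password"], ui))
# ['Cancel', 'Password']
```

&lt;p&gt;Substring matching is forgiving but can produce false matches (for example "Pass" inside "Bypass"), so a stricter comparison may suit real test suites better.&lt;/p&gt;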






&lt;h2&gt;
  
  
  Head to Head: Gemma 4 vs Traditional CV
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;YOLOv8 + OpenCV&lt;/th&gt;
&lt;th&gt;Gemma 4 on Pi 5&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2–4 hours&lt;/td&gt;
&lt;td&gt;20 minutes&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;500–1000&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependencies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hardware cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$500–2000&lt;/td&gt;
&lt;td&gt;$75–90&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Monthly cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$20–100&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power draw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;150–300W&lt;/td&gt;
&lt;td&gt;7.5W&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Offline capable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⚠️ Only without the cloud API fallback&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zero-shot capable&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Requires training&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;🏆 Gemma 4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50-200ms&lt;/td&gt;
&lt;td&gt;8-12s&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy (COCO)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~90%&lt;/td&gt;
&lt;td&gt;~85%&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real-time video&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Well documented&lt;/td&gt;
&lt;td&gt;⚠️ Limited&lt;/td&gt;
&lt;td&gt;🏆 YOLOv8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When to choose Gemma 4:&lt;/strong&gt; Offline deployment, zero-shot detection, simple setup, low cost, privacy-first.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;When to choose YOLOv8:&lt;/strong&gt; Real-time video, highest accuracy, custom training, GPU available.&lt;/p&gt;


&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;🚀 &lt;strong&gt;&lt;a href="https://github.com/tahosinx/gemmavision" rel="noopener noreferrer"&gt;GitHub Repository: tahosinx/gemmavision&lt;/a&gt;&lt;/strong&gt; — Full source code, MIT Licensed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick start:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tahosinx/gemmavision.git
&lt;span class="nb"&gt;cd &lt;/span&gt;gemmavision/src
python3 pi-client.py &lt;span class="nt"&gt;--image&lt;/span&gt; test.jpg &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"all objects"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Hardware Setup: 20-Minute Raspberry Pi Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Raspberry Pi 5 (8GB RAM strongly recommended)&lt;/li&gt;
&lt;li&gt;64GB microSD card (U3 speed class)&lt;/li&gt;
&lt;li&gt;Camera Module 3 or USB webcam&lt;/li&gt;
&lt;li&gt;Active cooler (thermal throttling occurs without it)&lt;/li&gt;
&lt;li&gt;Stable internet connection (for initial model download)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step-by-Step Installation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: System Dependencies&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Update system packages&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt full-upgrade &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Install Python and camera support&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    python3-pip &lt;span class="se"&gt;\&lt;/span&gt;
    python3-venv &lt;span class="se"&gt;\&lt;/span&gt;
    python3-picamera2 &lt;span class="se"&gt;\&lt;/span&gt;
    git &lt;span class="se"&gt;\&lt;/span&gt;
    htop &lt;span class="se"&gt;\&lt;/span&gt;
    libcamera-dev

&lt;span class="c"&gt;# Increase swap (essential for 4GB Pi models)&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;dphys-swapfile swapoff
&lt;span class="nb"&gt;sudo sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'s/CONF_SWAPSIZE=.*/CONF_SWAPSIZE=4096/'&lt;/span&gt; /etc/dphys-swapfile
&lt;span class="nb"&gt;sudo &lt;/span&gt;dphys-swapfile setup
&lt;span class="nb"&gt;sudo &lt;/span&gt;dphys-swapfile swapon
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Python Environment&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create virtual environment&lt;/span&gt;
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv ~/gemmavision-env
&lt;span class="nb"&gt;source&lt;/span&gt; ~/gemmavision-env/bin/activate

&lt;span class="c"&gt;# Install CPU-optimized PyTorch (NO CUDA)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;torch &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--index-url&lt;/span&gt; https://download.pytorch.org/whl/cpu

&lt;span class="c"&gt;# Install transformers and utilities&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;transformers Pillow bitsandbytes accelerate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Download GemmaVision&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/tahosinx/gemmavision.git
&lt;span class="nb"&gt;cd &lt;/span&gt;gemmavision/src

&lt;span class="c"&gt;# Optional: Run tests&lt;/span&gt;
python3 test_local.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: First Run (Model Download)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 pi-client.py &lt;span class="nt"&gt;--image&lt;/span&gt; test.jpg &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"all objects"&lt;/span&gt;

&lt;span class="c"&gt;# First run downloads ~2.1GB quantized model&lt;/span&gt;
&lt;span class="c"&gt;# Time: 5-10 minutes depending on internet&lt;/span&gt;
&lt;span class="c"&gt;# Subsequent runs: ~30s (cached)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Camera Configuration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;For Camera Module 3:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable camera interface&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;raspi-config
&lt;span class="c"&gt;# Interface Options → Camera → Enable&lt;/span&gt;

&lt;span class="c"&gt;# Test camera&lt;/span&gt;
libcamera-jpeg &lt;span class="nt"&gt;-o&lt;/span&gt; test.jpg &lt;span class="nt"&gt;-t&lt;/span&gt; 1000 &lt;span class="nt"&gt;--width&lt;/span&gt; 1920 &lt;span class="nt"&gt;--height&lt;/span&gt; 1080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;For USB webcam:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# No additional config needed
# GemmaVision auto-detects /dev/video0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
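&lt;p&gt;How GemmaVision detects the webcam isn't shown here. As a rough sketch of the idea, assuming a Linux system that exposes cameras as &lt;code&gt;/dev/video*&lt;/code&gt; nodes, you could list the available devices like this (&lt;code&gt;list_video_devices&lt;/code&gt; and &lt;code&gt;pick_camera&lt;/code&gt; are hypothetical helpers, not part of the project):&lt;/p&gt;

```python
import glob


def list_video_devices(pattern="/dev/video*"):
    """Return device nodes matching the pattern, sorted by name."""
    return sorted(glob.glob(pattern))


def pick_camera(devices, preferred="/dev/video0"):
    """Prefer /dev/video0 when present, else fall back to the first device found."""
    if preferred in devices:
        return preferred
    return devices[0] if devices else None


if __name__ == "__main__":
    found = list_video_devices()
    print("Video devices:", found)
    print("Selected:", pick_camera(found))
```

&lt;p&gt;If the list comes back empty, the webcam is probably not recognized by the kernel, which is worth checking before blaming the model.&lt;/p&gt;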






&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;I chose the &lt;strong&gt;Gemma 4 E4B-it&lt;/strong&gt; model because it's the sweet spot for edge deployment: with 4-bit quantization (2.1GB) it fits in a Raspberry Pi 5's 8GB of RAM, yet it still manages zero-shot object detection at ~85% accuracy in my tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Gemma 4's multimodal capabilities include &lt;strong&gt;native bounding box output&lt;/strong&gt; via the &lt;code&gt;box_2d&lt;/code&gt; JSON format. This eliminates the need for traditional CV pipelines (YOLO, OpenCV, NMS algorithms) entirely. One model replaces an entire stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works: The Technical Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Model Selection: Why Gemma 4 E4B-it?
&lt;/h3&gt;

&lt;p&gt;Gemma 4 comes in multiple sizes. For edge deployment on a Raspberry Pi 5 with 8GB RAM, the &lt;strong&gt;E4B-it&lt;/strong&gt; (Effective 4B) variant hits the sweet spot:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Parameters&lt;/th&gt;
&lt;th&gt;Quantized Size&lt;/th&gt;
&lt;th&gt;RAM Required&lt;/th&gt;
&lt;th&gt;Pi 5 Compatible?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-E4B-it&lt;/td&gt;
&lt;td&gt;E4B (Effective 4B)&lt;/td&gt;
&lt;td&gt;2.1GB&lt;/td&gt;
&lt;td&gt;~6GB&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-26b-a4b-it&lt;/td&gt;
&lt;td&gt;26B MoE (4B active)&lt;/td&gt;
&lt;td&gt;13GB&lt;/td&gt;
&lt;td&gt;~20GB&lt;/td&gt;
&lt;td&gt;❌ No (needs far more than 8GB)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-31b-it&lt;/td&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;td&gt;~36GB&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;strong&gt;4-bit quantization&lt;/strong&gt; via &lt;code&gt;bitsandbytes&lt;/code&gt; is essential. CPU support was only added in recent releases, so make sure you install the latest version. Compared with 16-bit weights, it cuts weight memory by roughly 4× with minimal accuracy loss (~1-2% in my testing).&lt;/p&gt;
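&lt;p&gt;The 4× figure follows from simple arithmetic on bits per weight. Here is a back-of-the-envelope sketch, assuming "Effective 4B" means about 4 billion weights and ignoring quantization metadata, activations, and runtime overhead (which is why the ~6GB runtime figure is much higher than the weight size alone):&lt;/p&gt;

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "Effective 4B" is treated as 4 billion weights.
EFFECTIVE_PARAMS = 4e9

fp16_gb = weight_memory_gb(EFFECTIVE_PARAMS, 16)
int4_gb = weight_memory_gb(EFFECTIVE_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"Reduction:      {fp16_gb / int4_gb:.0f}x")
```

&lt;p&gt;Under these assumptions the 4-bit weights come to about 2GB, which is in the same range as the 2.1GB download quoted above.&lt;/p&gt;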

&lt;h3&gt;
  
  
  The Complete Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
GemmaVision — Complete computer vision in 50 lines
Native object detection with Gemma 4 on Raspberry Pi 5
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ImageDraw&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# Configuration
&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google/gemma-4-E4B-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;DEVICE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Raspberry Pi 5 has no CUDA
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Load Gemma 4 with 4-bit quantization for Pi 5&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s 8GB RAM.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;load_in_4bit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# Essential for 8GB RAM constraint
&lt;/span&gt;        &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# CPU inference on Pi
&lt;/span&gt;        &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all objects&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Detect objects in image using Gemma 4 native vision.

    Args:
        image_path: Path to image file
        query: What to detect (e.g., &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;furniture&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;buttons and inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)

    Returns:
        List of dicts with &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; [y1, x1, y2, x2] and &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Load image
&lt;/span&gt;    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Construct prompt for structured output
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Detect &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; in this image. Return JSON with &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; [y1, x1, y2, x2] and &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; fields.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="c1"&gt;# Run inference (10-20s on Pi 5)
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Deterministic for reproducibility
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Parse native JSON output
&lt;/span&gt;    &lt;span class="n"&gt;result_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]:],&lt;/span&gt; 
        &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Gemma 4 returns valid JSON array
&lt;/span&gt;    &lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;draw_boxes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Draw bounding boxes on image.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;draw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ImageDraw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Draw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Gemma 4 returns coords on a 1000x1000 grid — descale to image size
&lt;/span&gt;        &lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="c1"&gt;# Draw box
&lt;/span&gt;        &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rectangle&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;outline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#00ff00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Draw label
&lt;/span&gt;        &lt;span class="n"&gt;draw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;x1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fill&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#00ff00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;

&lt;span class="c1"&gt;# One-liner usage
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;detections&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;detect_objects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kitchen.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all objects&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; objects:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;det&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;det&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;box_2d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;draw_boxes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kitchen.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detections&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;That's the entire pipeline.&lt;/strong&gt; No &lt;code&gt;cv2&lt;/code&gt;. No &lt;code&gt;torchvision&lt;/code&gt;. No &lt;code&gt;ultralytics&lt;/code&gt;. No YAML configs. No custom NMS logic. The only coordinate math is rescaling boxes from the 1000×1000 grid to pixels.&lt;/p&gt;
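&lt;p&gt;One fragile spot in the listing is the direct &lt;code&gt;json.loads(result_text)&lt;/code&gt; call: language models sometimes wrap JSON in prose or a code fence. As a defensive sketch (not part of the original script; &lt;code&gt;parse_detections&lt;/code&gt; is a hypothetical helper), you could pull out the array and keep only well-formed entries:&lt;/p&gt;

```python
import json
import re


def parse_detections(result_text):
    """Extract a list of {'box_2d': [y1, x1, y2, x2], 'label': str} dicts from model text."""
    # Take everything from the first '[' to the last ']' so that prose
    # or fences around the array are ignored.
    match = re.search(r"\[.*\]", result_text, re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []

    valid = []
    for item in data:
        if not isinstance(item, dict):
            continue
        box = item.get("box_2d")
        label = item.get("label")
        if (
            isinstance(box, list)
            and len(box) == 4
            and all(isinstance(v, (int, float)) for v in box)
            and isinstance(label, str)
        ):
            valid.append({"box_2d": box, "label": label})
    return valid


if __name__ == "__main__":
    reply = 'Here you go: [{"box_2d": [100, 200, 300, 400], "label": "mug"}] Done.'
    print(parse_detections(reply))
```

&lt;p&gt;Returning an empty list on malformed output keeps a batch job running; whether to log or retry instead depends on your application.&lt;/p&gt;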

&lt;h3&gt;
  
  
  Performance Benchmarks
&lt;/h3&gt;

&lt;p&gt;I ran 100 test images across 5 categories on the Pi 5:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Images&lt;/th&gt;
&lt;th&gt;Avg Time&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Common objects&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;12.3s&lt;/td&gt;
&lt;td&gt;87%&lt;/td&gt;
&lt;td&gt;COCO-style items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Indoor scenes&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;14.1s&lt;/td&gt;
&lt;td&gt;84%&lt;/td&gt;
&lt;td&gt;Living room, kitchen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UI elements&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;11.8s&lt;/td&gt;
&lt;td&gt;91%&lt;/td&gt;
&lt;td&gt;Buttons, inputs, links&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Screenshots&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;10.5s&lt;/td&gt;
&lt;td&gt;89%&lt;/td&gt;
&lt;td&gt;Web interfaces&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Outdoor scenes&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;15.2s&lt;/td&gt;
&lt;td&gt;78%&lt;/td&gt;
&lt;td&gt;Street, cars, pedestrians&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overall&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;85.8%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;First inference&lt;/strong&gt; takes ~15 seconds (model loads from SD card).&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Subsequent inferences&lt;/strong&gt; take 8–12 seconds (model cached in RAM).&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Memory usage:&lt;/strong&gt; ~6GB RAM during inference (fits comfortably in 8GB Pi).&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Power draw:&lt;/strong&gt; 7.5W continuous (standard Pi 5 PSU).&lt;/p&gt;
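&lt;p&gt;The Overall row is just the image-weighted mean of the five categories, and combining it with the quoted power draw gives a rough energy cost per image. A small sketch of that arithmetic, using only the numbers from the table and assuming power stays near 7.5W for the whole inference:&lt;/p&gt;

```python
# Per-category figures copied from the benchmark table:
# (images, average seconds per image, accuracy in percent)
results = {
    "Common objects": (20, 12.3, 87),
    "Indoor scenes": (20, 14.1, 84),
    "UI elements": (20, 11.8, 91),
    "Screenshots": (20, 10.5, 89),
    "Outdoor scenes": (20, 15.2, 78),
}

total_images = sum(n for n, _, _ in results.values())
avg_time = sum(n * t for n, t, _ in results.values()) / total_images
avg_accuracy = sum(n * a for n, _, a in results.values()) / total_images

POWER_WATTS = 7.5  # power draw quoted in the article
energy_joules = POWER_WATTS * avg_time  # energy per image = power x time

print(f"Images: {total_images}")
print(f"Average time: {avg_time:.1f} s")
print(f"Average accuracy: {avg_accuracy:.1f} %")
print(f"Energy per image: {energy_joules:.0f} J")
```

&lt;p&gt;This reproduces the 12.8s and 85.8% in the Overall row and works out to roughly 96 joules per image.&lt;/p&gt;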




&lt;h2&gt;
  
  
  Why This Matters for Developers
&lt;/h2&gt;

&lt;p&gt;Three fundamental shifts are happening simultaneously in edge AI:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Democratization of Computer Vision
&lt;/h3&gt;

&lt;p&gt;Computer vision was historically $500+ GPU territory. Now it's a $75 single-board computer. This changes who can build CV systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Students&lt;/strong&gt; can prototype without cloud credits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hobbyists&lt;/strong&gt; in developing regions can build locally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indie developers&lt;/strong&gt; can ship CV features without venture funding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Researchers&lt;/strong&gt; can deploy experiments without institutional GPU clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The barrier to entry for computer vision just dropped by 10×.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Privacy-First by Default
&lt;/h3&gt;

&lt;p&gt;Everything happens locally on the Pi. No images uploaded to cloud APIs. No data retention policies to worry about. No network required after initial model download.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases where this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Home security cameras (no footage leaves your network)&lt;/li&gt;
&lt;li&gt;Medical image analysis (images never leave the device, which simplifies HIPAA compliance)&lt;/li&gt;
&lt;li&gt;Industrial quality control (trade secrets stay on-premise)&lt;/li&gt;
&lt;li&gt;Accessibility tools for sensitive environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Architectural Simplicity
&lt;/h3&gt;

&lt;p&gt;Traditional CV pipelines are composed systems with multiple failure points. Gemma 4 is a unified system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complexity comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Traditional CV&lt;/th&gt;
&lt;th&gt;Gemma 4 Vision&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;2–4 hours&lt;/td&gt;
&lt;td&gt;20 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lines of code&lt;/td&gt;
&lt;td&gt;500–1000&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;10+&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Configuration files&lt;/td&gt;
&lt;td&gt;3-5 (YAML/JSON)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training required&lt;/td&gt;
&lt;td&gt;Yes (custom datasets)&lt;/td&gt;
&lt;td&gt;No (zero-shot)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version conflicts&lt;/td&gt;
&lt;td&gt;Frequent&lt;/td&gt;
&lt;td&gt;Rare&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This simplicity isn't just about developer experience — it's about reliability. Fewer components means fewer things that can break at 2 AM.&lt;/p&gt;




&lt;h2&gt;
  
  
  FAQ: Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Can I run this on Raspberry Pi 4?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Technically yes, practically no. The Pi 4 also tops out at 8GB of RAM, but its CPU is much slower. With 4-bit quantization (and heavy swap on boards with less RAM), it might run, but expect inference to be 2-3× slower (30-40s per image). The Pi 5's faster CPU is what makes this viable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How accurate is Gemma 4 compared to YOLOv8?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; In my testing on 100 images: YOLOv8 ~90%, Gemma 4 ~85%. That 5-point gap is the trade-off for zero-shot capability and a much smaller dependency stack. For many applications, 85% is sufficient.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can it detect custom objects not in COCO?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Yes! This is the magic of zero-shot. Just describe what you want: &lt;code&gt;"detect red toy cars"&lt;/code&gt;, &lt;code&gt;"find cracks in concrete"&lt;/code&gt;, &lt;code&gt;"locate loose bolts"&lt;/code&gt;. No retraining required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Does it work without internet?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; After initial model download (~2.1GB quantized), yes. The model runs 100% locally on the Pi. No API calls, no cloud dependencies.&lt;/p&gt;
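&lt;p&gt;To make sure nothing tries to reach the network once the model is cached, the Hugging Face libraries honor the &lt;code&gt;HF_HUB_OFFLINE&lt;/code&gt; environment variable. A minimal sketch, assuming the model files are already in the local cache:&lt;/p&gt;

```python
import os

# Must be set before transformers / huggingface_hub are imported,
# because the setting is read when those libraries load.
os.environ["HF_HUB_OFFLINE"] = "1"

print("HF_HUB_OFFLINE =", os.environ["HF_HUB_OFFLINE"])
```

&lt;p&gt;Passing &lt;code&gt;local_files_only=True&lt;/code&gt; to &lt;code&gt;from_pretrained&lt;/code&gt; is another way to get the same effect for a single load.&lt;/p&gt;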

&lt;h3&gt;
  
  
  Q: Can I use it for real-time video?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; No. At 8-12 seconds per frame, it's far too slow for video. Use YOLOv8 or other traditional CV for real-time applications. Gemma 4 excels at batch processing of still images.&lt;/p&gt;
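&lt;p&gt;For that batch use case, a simple loop over a folder of stills is enough. The sketch below is an illustration rather than code from the project: &lt;code&gt;process_folder&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;detect_fn&lt;/code&gt; stands in for any function that takes an image path and returns detections.&lt;/p&gt;

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def process_folder(folder, detect_fn):
    """Run detect_fn on every image in folder and return {filename: detections}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        try:
            results[path.name] = detect_fn(str(path))
        except Exception as exc:  # keep going if a single image fails
            print(f"Skipping {path.name}: {exc}")
    return results
```

&lt;p&gt;At these speeds it pays to load the model once and reuse it inside &lt;code&gt;detect_fn&lt;/code&gt;, rather than reloading it for every image.&lt;/p&gt;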

&lt;h3&gt;
  
  
  Q: What's the power consumption?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; ~7.5W continuous under load. A standard 5V 5A Raspberry Pi PSU handles it easily. The active cooler adds ~1W.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I run this on NVIDIA Jetson?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Absolutely, and it'll be much faster. Jetson Nano and Orin boards support CUDA. This guide focuses on the Pi 5 because it's cheaper and more accessible, but the code should work anywhere PyTorch runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is the model free to use commercially?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Yes! Gemma 4 is released under the &lt;strong&gt;Apache 2.0 license&lt;/strong&gt; — a major upgrade from previous Gemma models' custom terms. This is a standard, permissive open-source license allowing unrestricted commercial use. See &lt;a href="https://ai.google.dev/gemma/apache_2" rel="noopener noreferrer"&gt;Gemma 4 license details&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How do I improve accuracy?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Three strategies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Higher resolution input&lt;/strong&gt; — Larger images give more detail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better prompts&lt;/strong&gt; — Be specific: &lt;code&gt;"detect laptops and phones"&lt;/code&gt; vs &lt;code&gt;"detect electronics"&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crop regions&lt;/strong&gt; — Focus on relevant image areas instead of full scene&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Q: Can I fine-tune Gemma 4 for my use case?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Yes, but it's complex. Gemma 4 supports fine-tuning via LoRA/QLoRA. I plan to publish a fine-tuning guide after the challenge. For now, zero-shot prompting covers 80% of use cases.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next for GemmaVision
&lt;/h2&gt;

&lt;p&gt;This is my official entry for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;DEV Gemma 4 Challenge&lt;/a&gt; (May 6-24, 2026).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Post-challenge roadmap:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;ETA&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuning guide&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;td&gt;June 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pi 5 GPU acceleration&lt;/td&gt;
&lt;td&gt;Waiting for open-source drivers&lt;/td&gt;
&lt;td&gt;TBD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WebRTC streaming&lt;/td&gt;
&lt;td&gt;Prototyping&lt;/td&gt;
&lt;td&gt;May 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9B model experiments&lt;/td&gt;
&lt;td&gt;Blocked (needs 12GB+ RAM)&lt;/td&gt;
&lt;td&gt;If Pi 6 releases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker deployment&lt;/td&gt;
&lt;td&gt;Planned&lt;/td&gt;
&lt;td&gt;May 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Home Assistant integration&lt;/td&gt;
&lt;td&gt;Community request&lt;/td&gt;
&lt;td&gt;June 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Call to Action
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If this project helped you:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;🚀 &lt;strong&gt;Try the code:&lt;/strong&gt; &lt;a href="https://github.com/tahosinx/gemmavision" rel="noopener noreferrer"&gt;github.com/tahosinx/gemmavision&lt;/a&gt;&lt;br&gt;&lt;br&gt;
⭐ &lt;strong&gt;Star the repo&lt;/strong&gt; if you found it useful&lt;br&gt;&lt;br&gt;
💬 &lt;strong&gt;Comment below:&lt;/strong&gt; What would you build with local, offline computer vision?&lt;br&gt;&lt;br&gt;
❤️ &lt;strong&gt;Heart this post&lt;/strong&gt; — it helps in the challenge rankings&lt;br&gt;&lt;br&gt;
🐦 &lt;strong&gt;Share on Twitter&lt;/strong&gt; — Tag me &lt;a href="https://twitter.com/tahosinx" rel="noopener noreferrer"&gt;@tahosinx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://rpilocator.com" rel="noopener noreferrer"&gt;Raspberry Pi 5&lt;/a&gt; — Stock finder (currently available)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.adafruit.com/product/5658" rel="noopener noreferrer"&gt;Camera Module 3&lt;/a&gt; — Wide angle recommended&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.raspberrypi.com/products/active-cooler/" rel="noopener noreferrer"&gt;Active Cooler&lt;/a&gt; — Official Pi cooler&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Tahosin&lt;/strong&gt; — Building AI systems that run where you need them: on your desk, not in the cloud.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌐 Website: &lt;a href="https://tahosin.bro.bd" rel="noopener noreferrer"&gt;tahosin.bro.bd&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 GitHub: &lt;a href="https://github.com/tahosinx" rel="noopener noreferrer"&gt;@tahosinx&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📝 DEV: &lt;a href="https://dev.to/tahosin"&gt;@tahosin&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🐦 Twitter: &lt;a href="https://twitter.com/tahosinx" rel="noopener noreferrer"&gt;@tahosinx&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built with &lt;a href="https://ai.google.dev/gemma" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt;. Tested on a $75 computer. Shared because nobody else was writing this guide.&lt;/p&gt;





&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemma" rel="noopener noreferrer"&gt;Gemma 4 Technical Paper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/transformers" rel="noopener noreferrer"&gt;Hugging Face Transformers Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.raspberrypi.com/products/raspberry-pi-5/" rel="noopener noreferrer"&gt;Raspberry Pi 5 Specs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/blog/4bit-transformers-bitsandbytes" rel="noopener noreferrer"&gt;4-bit Quantization Explained&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Last updated: May 12, 2026. GemmaVision v1.0. MIT Licensed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>discuss</category>
      <category>gemma</category>
    </item>
    <item>
      <title>5 Free AI APIs You Can Use Today (No Credit Card Required)</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Sun, 19 Apr 2026 16:49:34 +0000</pubDate>
      <link>https://dev.to/tahosin/5-free-ai-apis-you-can-use-today-no-credit-card-required-2hag</link>
      <guid>https://dev.to/tahosin/5-free-ai-apis-you-can-use-today-no-credit-card-required-2hag</guid>
      <description>&lt;p&gt;You don't need to pay OpenAI $20/month to build AI apps. Here are 5 &lt;strong&gt;completely free&lt;/strong&gt; AI APIs you can start using right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Google Gemini API
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Text generation, analysis, code generation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 15 requests/minute, 1M tokens/day&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; Gemini 2.0 Flash (fast), Gemini Pro (powerful)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signup:&lt;/strong&gt; &lt;a href="https://ai.google.dev" rel="noopener noreferrer"&gt;ai.google.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="s2"&gt;`https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain quantum computing simply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
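&lt;p&gt;The snippet above assumes the request succeeds and that at least one candidate comes back. A minimal sketch of a more defensive reader, assuming the same response shape (&lt;code&gt;candidates[0].content.parts[].text&lt;/code&gt;); &lt;code&gt;extractText&lt;/code&gt; is a hypothetical helper name, not part of any SDK:&lt;/p&gt;

```javascript
// Hypothetical helper: pull the generated text out of a parsed
// generateContent response, or return null if nothing usable is there.
function extractText(data) {
  const candidates = Array.isArray(data?.candidates) ? data.candidates : [];
  if (candidates.length === 0) return null;

  const parts = candidates[0]?.content?.parts;
  if (!Array.isArray(parts)) return null;

  // A reply can be split across several parts, so join every text part.
  const text = parts
    .map((part) => (typeof part?.text === "string" ? part.text : ""))
    .join("");
  return text.length > 0 ? text : null;
}
```

&lt;p&gt;Checking &lt;code&gt;res.ok&lt;/code&gt; before calling &lt;code&gt;res.json()&lt;/code&gt; covers the other common failure on a free tier: a rate-limit or auth error that returns no candidates at all.&lt;/p&gt;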



&lt;p&gt;&lt;strong&gt;I built:&lt;/strong&gt; &lt;a href="https://maxai-writer.pages.dev" rel="noopener noreferrer"&gt;MaxAI Writer&lt;/a&gt; and &lt;a href="https://ecosense-ai.pages.dev" rel="noopener noreferrer"&gt;EcoSense AI&lt;/a&gt; entirely on Gemini's free tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Hugging Face Inference API
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Specialized models (sentiment, translation, image classification)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; Rate-limited, thousands of models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signup:&lt;/strong&gt; &lt;a href="https://huggingface.co" rel="noopener noreferrer"&gt;huggingface.co&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Cloudflare Workers AI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Edge inference, low latency&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 10,000 neurons/day&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; Llama, Whisper, Stable Diffusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Groq
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Fastest inference speeds&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 30 RPM on Llama models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signup:&lt;/strong&gt; &lt;a href="https://console.groq.com" rel="noopener noreferrer"&gt;console.groq.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Cohere
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Enterprise-grade text analysis, RAG&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Free tier:&lt;/strong&gt; 5 RPM, trial API key&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Free-Tier Limit&lt;/th&gt;
&lt;th&gt;Signup Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Google Gemini&lt;/td&gt;
&lt;td&gt;General AI&lt;/td&gt;
&lt;td&gt;15 RPM&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hugging Face&lt;/td&gt;
&lt;td&gt;Specialized&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare AI&lt;/td&gt;
&lt;td&gt;Edge&lt;/td&gt;
&lt;td&gt;10K neurons/day&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Groq&lt;/td&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;30 RPM&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cohere&lt;/td&gt;
&lt;td&gt;Text analysis&lt;/td&gt;
&lt;td&gt;5 RPM&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Which free API are you using? Drop a comment!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>beginners</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why OpenClaw Skills Are the Most Underrated Feature in Personal AI</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:35:17 +0000</pubDate>
      <link>https://dev.to/tahosin/why-openclaw-skills-are-the-most-underrated-feature-in-personal-ai-9af</link>
      <guid>https://dev.to/tahosin/why-openclaw-skills-are-the-most-underrated-feature-in-personal-ai-9af</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Personal AI
&lt;/h2&gt;

&lt;p&gt;Most AI assistants are black boxes. You type, they respond, and customization means digging through settings menus. Want your assistant to know about your project management workflow? Your fitness routine? Your garden watering schedule? Good luck.&lt;/p&gt;

&lt;p&gt;OpenClaw flips this model. Instead of configuring through GUIs, you teach it through &lt;strong&gt;skills&lt;/strong&gt; — Markdown files that tell the agent what to do, when, and how.&lt;/p&gt;

&lt;p&gt;And honestly? I think this is the most underrated feature in the entire OpenClaw ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Skills, Really?
&lt;/h2&gt;

&lt;p&gt;A skill is a &lt;code&gt;SKILL.md&lt;/code&gt; file with YAML frontmatter and instructions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my_skill&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Does something useful.&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="gh"&gt;# My Skill&lt;/span&gt;
When the user asks about X, do Y using Z tool.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No code. No build step. No dependencies.&lt;/p&gt;

&lt;p&gt;Drop it in &lt;code&gt;~/.openclaw/workspace/skills/&lt;/code&gt;, restart the gateway, and your assistant now has a new capability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Anyone Can Build Skills
&lt;/h3&gt;

&lt;p&gt;You don't need to be a developer. If you can write a clear instruction in English, you can build an OpenClaw skill. I've seen skills for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recipe management&lt;/strong&gt; — "When I share ingredients, suggest recipes and add missing items to my shopping list"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review&lt;/strong&gt; — "When I share a PR link, review it for security issues and code style"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meeting prep&lt;/strong&gt; — "Before my calendar events, summarize relevant emails and documents"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Skills Compose
&lt;/h3&gt;

&lt;p&gt;Because skills are just instructions, they naturally compose. My EcoBot skill uses &lt;code&gt;web_search&lt;/code&gt; for product lookups, &lt;code&gt;read&lt;/code&gt;/&lt;code&gt;write&lt;/code&gt; for habit logging, and the agent's natural language ability for the analysis. Each of these tools was built by someone else. The skill just orchestrates them.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The Agent Generalizes
&lt;/h3&gt;

&lt;p&gt;Here's what surprised me: skills don't need to cover every edge case. Write the core instructions, and OpenClaw handles the rest. When my EcoBot skill describes how to calculate a carbon footprint, the agent naturally handles follow-up questions, partial information, and conversational tangents.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Skills Are Shareable
&lt;/h3&gt;

&lt;p&gt;Drop a &lt;code&gt;SKILL.md&lt;/code&gt; in a GitHub repo and anyone can install it. No npm packages, no version conflicts, no build toolchains. The &lt;a href="https://clawhub.ai" rel="noopener noreferrer"&gt;ClawHub&lt;/a&gt; registry makes this even easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deeper Insight: Personal AI Should Be Personal
&lt;/h2&gt;

&lt;p&gt;What makes OpenClaw different from ChatGPT or Claude isn't the model — it's the &lt;strong&gt;customization layer&lt;/strong&gt;. Skills let you build an assistant that knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your project structure&lt;/li&gt;
&lt;li&gt;Your preferences&lt;/li&gt;
&lt;li&gt;Your workflows&lt;/li&gt;
&lt;li&gt;Your domain expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And it keeps this knowledge across conversations, across channels (Telegram, Discord, web), across devices.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Concrete Example
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;EcoBot&lt;/strong&gt; — a skill that tracks carbon footprint and gives green living advice. The entire skill is one Markdown file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It knows carbon emission factors for different transport, diet, and energy choices&lt;/li&gt;
&lt;li&gt;It uses &lt;code&gt;web_search&lt;/code&gt; to look up product environmental impact&lt;/li&gt;
&lt;li&gt;It logs habits to a local JSON file with &lt;code&gt;read&lt;/code&gt;/&lt;code&gt;write&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;It generates weekly reports from the logged data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Total lines of code: &lt;strong&gt;0&lt;/strong&gt;. Total lines of Markdown: &lt;strong&gt;~80&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The agent does all the heavy lifting. The skill just provides the domain knowledge and orchestration logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Love to See Next
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Skill triggers&lt;/strong&gt; — automatically activate skills based on time or events (partially possible with cron)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill chaining&lt;/strong&gt; — output of one skill feeds into another&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skill analytics&lt;/strong&gt; — see which skills get used most and how&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community skill ratings&lt;/strong&gt; — know which ClawHub skills actually work well&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you haven't built an OpenClaw skill yet, try this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Think of one thing you explain repeatedly (to yourself or others)&lt;/li&gt;
&lt;li&gt;Write it as instructions in a &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Drop it in your skills folder&lt;/li&gt;
&lt;li&gt;Restart and test&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You'll be surprised how well it works. The barrier to building personal AI tools has never been lower.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Building with OpenClaw has changed how I think about AI assistants. It's not about the model — it's about the layer you build on top.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
    </item>
    <item>
      <title>EcoSense AI: Know Your Carbon Footprint in 60 Seconds</title>
      <dc:creator>S M Tahosin</dc:creator>
      <pubDate>Sun, 19 Apr 2026 15:18:25 +0000</pubDate>
      <link>https://dev.to/tahosin/ecosense-ai-know-your-carbon-footprint-in-60-seconds-3gac</link>
      <guid>https://dev.to/tahosin/ecosense-ai-know-your-carbon-footprint-in-60-seconds-3gac</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for &lt;a href="https://dev.to/challenges/weekend-2026-04-16"&gt;Weekend Challenge: Earth Day Edition&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;EcoSense AI&lt;/strong&gt; is an AI-powered carbon footprint analyzer that helps people understand their environmental impact through a simple 4-step questionnaire — with persistent memory and carbon offset donations.&lt;/p&gt;

&lt;p&gt;Users answer questions about their:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚗 &lt;strong&gt;Transportation&lt;/strong&gt; — commute method &amp;amp; distance&lt;/li&gt;
&lt;li&gt;🍽️ &lt;strong&gt;Diet&lt;/strong&gt; — from heavy meat to vegan&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Home Energy&lt;/strong&gt; — fossil fuels to 100% renewable&lt;/li&gt;
&lt;li&gt;🛍️ &lt;strong&gt;Shopping&lt;/strong&gt; — fast fashion to minimal/second-hand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then three services take over:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini 2.0 Flash&lt;/strong&gt; analyzes their habits and returns an eco score, a CO2 estimate, personalized tips, and an Earth Day pledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backboard&lt;/strong&gt; saves each assessment to persistent memory threads, enabling progress tracking over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solana&lt;/strong&gt; enables carbon offset tree-planting donations via SOL transfers&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;🌍 Live: &lt;a href="https://ecosense-ai.pages.dev" rel="noopener noreferrer"&gt;https://ecosense-ai.pages.dev&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Try it now — takes about 60 seconds!&lt;/p&gt;

&lt;h3&gt;
  
  
  Features:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Eco Score&lt;/strong&gt; (0-100) with SVG donut chart visualization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Letter grade&lt;/strong&gt; (A+ to F) with estimated annual CO₂&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact breakdown&lt;/strong&gt; by category with color-coded status bars&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 personalized tips&lt;/strong&gt; from Gemini AI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Earth Day pledge&lt;/strong&gt; with copy-to-clipboard sharing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Carbon offset&lt;/strong&gt; via Solana donation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory persistence&lt;/strong&gt; via Backboard threads&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/x-tahosin" rel="noopener noreferrer"&gt;
        x-tahosin
      &lt;/a&gt; / &lt;a href="https://github.com/x-tahosin/ecosense-ai" rel="noopener noreferrer"&gt;
        ecosense-ai
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      AI Carbon Footprint Analyzer - Google Gemini + Backboard + Solana | Earth Day 2026
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;&lt;div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;🌍 EcoSense AI&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;AI-Powered Carbon Footprint Analyzer | Earth Day 2026&lt;/strong&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;📖 Deep Dive&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;This project is one of 5 AI apps I shipped on Google's Gemini free tier. I wrote up the exact architecture, cost breakdown, and what breaks first in production:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/tahosin/0month-5-ai-apps-all-on-gemini-heres-exactly-what-the-free-tier-gives-you-and-what-breaks-58fl" rel="nofollow"&gt;$0/Month, 5 AI Apps, All on Gemini: Here's Exactly What the Free Tier Gives You (and What Breaks First)&lt;/a&gt;&lt;/strong&gt; — on Dev.to&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;What It Does&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Answer 4 quick questions about your daily habits and &lt;strong&gt;Google Gemini&lt;/strong&gt; analyzes your carbon footprint in real-time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Eco Score&lt;/strong&gt; (0-100) with letter grade and SVG donut chart&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimated annual CO2&lt;/strong&gt; compared to global average&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact breakdown&lt;/strong&gt; by category (transport, diet, energy, shopping)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 personalized tips&lt;/strong&gt; to reduce your footprint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Earth Day pledge&lt;/strong&gt; — copy and share on social media&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Carbon offset&lt;/strong&gt; — donate SOL via Solana to plant trees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progress tracking&lt;/strong&gt; — Backboard memory saves your assessments&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Architecture&lt;/h2&gt;

&lt;/div&gt;
&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;
&lt;pre class="notranslate"&gt;&lt;code&gt;Browser → Static HTML/JS&lt;/code&gt;&lt;/pre&gt;…&lt;/div&gt;&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/x-tahosin/ecosense-ai" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Key files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;app/page.tsx&lt;/code&gt; — Full React UI with step wizard, score visualization, Solana integration, and Backboard memory status&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;functions/api/generate.js&lt;/code&gt; — Cloudflare Function proxying Gemini (API key never reaches browser)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;functions/api/memory.js&lt;/code&gt; — Cloudflare Function proxying Backboard memory API&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Next.js 16&lt;/strong&gt; with static export&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind CSS&lt;/strong&gt; — custom green/earth color palette&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini 2.0 Flash&lt;/strong&gt; — AI analysis engine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backboard&lt;/strong&gt; — persistent memory for tracking assessments over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solana&lt;/strong&gt; — carbon offset tree-planting donations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Pages&lt;/strong&gt; — hosting + serverless functions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lucide React&lt;/strong&gt; — iconography&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Copilot&lt;/strong&gt; — used throughout development for rapid iteration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture: API Keys Stay Server-Side
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser → Static HTML/JS (Cloudflare Pages)
            ↓
         /api/generate → Cloudflare Function → Google Gemini (key server-only)
         /api/memory   → Cloudflare Function → Backboard API (key server-only)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No API keys ever touch the browser. Both Gemini and Backboard calls go through Cloudflare Functions.&lt;/p&gt;
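
&lt;p&gt;To make the proxy pattern concrete, here is a minimal TypeScript sketch of what a Pages Function like &lt;code&gt;functions/api/generate.js&lt;/code&gt; could look like. It is not the project's actual code: the &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; binding name, the &lt;code&gt;{ prompt }&lt;/code&gt; request body, and the &lt;code&gt;buildGeminiRequest&lt;/code&gt; helper are assumptions for illustration.&lt;/p&gt;

```typescript
// Sketch of a Cloudflare Pages Function for the /api/generate route.
// Assumptions: the GEMINI_API_KEY binding name and the { prompt } body shape are illustrative.

const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent";

// Build the upstream request. The key travels in a header and only exists server-side.
function buildGeminiRequest(prompt: string, apiKey: string) {
  return {
    url: GEMINI_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({
        contents: [{ parts: [{ text: prompt }] }],
        generationConfig: { temperature: 0.7 },
      }),
    },
  };
}

// Pages Functions invoke onRequestPost for POST requests to this route.
export async function onRequestPost(context: {
  request: Request;
  env: { GEMINI_API_KEY: string };
}) {
  const payload = (await context.request.json()) as { prompt?: unknown } | null;
  const prompt = payload ? payload.prompt : undefined;
  if (typeof prompt !== "string") {
    return new Response(JSON.stringify({ error: "prompt is required" }), {
      status: 400,
      headers: { "Content-Type": "application/json" },
    });
  }
  const upstream = buildGeminiRequest(prompt, context.env.GEMINI_API_KEY);
  const reply = await fetch(upstream.url, upstream.init);
  // Forward only the upstream body and status; the key is never echoed back.
  return new Response(await reply.text(), {
    status: reply.status,
    headers: { "Content-Type": "application/json" },
  });
}
```

&lt;p&gt;In this sketch the browser only ever posts to &lt;code&gt;/api/generate&lt;/code&gt;. The key is read from the function's environment on each request, so it never ships in the static bundle.&lt;/p&gt;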

&lt;h3&gt;
  
  
  How Each Technology Is Used
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Google Gemini&lt;/strong&gt; — The core analysis engine. I send a structured prompt with the user's four selections and ask for JSON output with specific fields: score, grade, CO₂ estimate, impact breakdown, tips, and pledge. A temperature of 0.7 gave me a good balance between variety and accuracy.&lt;/p&gt;
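
&lt;p&gt;As a rough illustration of the prompt-and-parse step, the sketch below builds a prompt from four selections and validates the reply before the UI uses it. The category names (&lt;code&gt;transport&lt;/code&gt;, &lt;code&gt;diet&lt;/code&gt;, &lt;code&gt;energy&lt;/code&gt;, &lt;code&gt;shopping&lt;/code&gt;), the JSON key names, and both helper functions are hypothetical; only the list of fields comes from the description above.&lt;/p&gt;

```typescript
// Hypothetical input categories: the post only says there are four selections.
type Selections = { transport: string; diet: string; energy: string; shopping: string };

// Fields follow the list in the post; the key names and types are assumptions.
type Analysis = {
  score: number;
  grade: string;
  co2KgPerYear: number;
  breakdown: { [category: string]: string };
  tips: string[];
  pledge: string;
};

// Turn the four selections into a prompt that pins down the expected JSON keys.
function buildPrompt(s: Selections): string {
  return [
    "Assess this person's environmental footprint.",
    "Transport: " + s.transport,
    "Diet: " + s.diet,
    "Home energy: " + s.energy,
    "Shopping: " + s.shopping,
    "Reply with one JSON object and nothing else, using exactly these keys:",
    "score (number, 0-100), grade (string), co2KgPerYear (number),",
    "breakdown (object of category to impact level), tips (array of 5 strings), pledge (string).",
  ].join("\n");
}

// Parse the model's reply and reject anything that does not match the expected shape.
function parseAnalysis(raw: string): Analysis {
  // Models sometimes wrap JSON in extra text, so keep only the outermost object.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end === -1) throw new Error("no JSON object in reply");
  const data = JSON.parse(raw.slice(start, end + 1));
  const checks = [
    typeof data.score === "number",
    data.score >= 0,
    100 >= data.score,
    typeof data.grade === "string",
    typeof data.co2KgPerYear === "number",
    typeof data.breakdown === "object",
    data.breakdown !== null,
    Array.isArray(data.tips),
    typeof data.pledge === "string",
  ];
  if (!checks.every(Boolean)) throw new Error("reply does not match the expected shape");
  return data as Analysis;
}
```

&lt;p&gt;Validating the shape up front means a malformed reply surfaces as one clear error instead of a half-rendered results page.&lt;/p&gt;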

&lt;p&gt;&lt;strong&gt;Backboard&lt;/strong&gt; — After each analysis, results are saved to a Backboard memory thread keyed by session ID. Returning users can then see their progress over time, for example "Your score improved from 45 to 62 since last month!" If the memory API is unavailable, the app falls back gracefully instead of failing.&lt;/p&gt;
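
&lt;p&gt;The progress comparison can be sketched roughly as follows. The Backboard API itself is not shown in this post, so the sketch swaps in a plain in-memory object as a stand-in for a memory thread; &lt;code&gt;remember&lt;/code&gt;, &lt;code&gt;progressMessage&lt;/code&gt;, and &lt;code&gt;safeProgress&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
type Assessment = { score: number; takenAt: string };

// Stand-in for a Backboard memory thread (assumption): history lives in a plain
// object keyed by session ID instead of being sent to the real API.
const threads: { [sessionId: string]: Assessment[] } = {};

// Append an assessment to the session's history and return the updated history.
function remember(sessionId: string, assessment: Assessment): Assessment[] {
  const history = threads[sessionId] || [];
  threads[sessionId] = [...history, assessment];
  return threads[sessionId];
}

// Compare the two most recent scores; return null when there is nothing to compare yet.
function progressMessage(history: Assessment[]): string | null {
  if (2 > history.length) return null;
  const previous = history[history.length - 2].score;
  const latest = history[history.length - 1].score;
  if (latest > previous) return "Your score improved from " + previous + " to " + latest + "!";
  if (previous > latest) return "Your score dropped from " + previous + " to " + latest + ".";
  return "Your score held steady at " + latest + ".";
}

// Graceful fallback: if the memory step throws, carry on without a progress message.
function safeProgress(sessionId: string, assessment: Assessment): string | null {
  try {
    return progressMessage(remember(sessionId, assessment));
  } catch {
    return null;
  }
}
```

&lt;p&gt;Returning &lt;code&gt;null&lt;/code&gt; on failure lets the results page render the fresh analysis either way and simply skip the progress line.&lt;/p&gt;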

&lt;p&gt;&lt;strong&gt;Solana&lt;/strong&gt; — The results page calculates how many trees would offset the user's footprint (one tree absorbs roughly 22 kg of CO₂ per year) and offers a Solana donation option. SOL suits micro-donations: transactions typically confirm within seconds, fees are a fraction of a cent, and Solana's Proof of Stake consensus uses an estimated 99.9% less energy than Proof of Work chains.&lt;/p&gt;
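
&lt;p&gt;The tree estimate is simple arithmetic. A minimal sketch, assuming the footprint is expressed in kilograms of CO₂ per year and that partial trees round up (the function and constant names are hypothetical):&lt;/p&gt;

```typescript
// Rule of thumb from the post: one tree absorbs roughly 22 kg of CO2 per year.
const KG_CO2_PER_TREE_PER_YEAR = 22;

// Number of trees needed to absorb a yearly footprint, rounded up to whole trees.
function treesToOffset(co2KgPerYear: number): number {
  // Treat missing, non-finite, or non-positive footprints as needing no trees.
  if (!Number.isFinite(co2KgPerYear) || 0 >= co2KgPerYear) return 0;
  return Math.ceil(co2KgPerYear / KG_CO2_PER_TREE_PER_YEAR);
}
```

&lt;p&gt;For example, a footprint of 4,800 kg of CO₂ per year works out to 4800 / 22 ≈ 218.2, which rounds up to 219 trees.&lt;/p&gt;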

&lt;p&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt; — Used throughout development for the step wizard logic, SVG chart math, and Tailwind styling. Copilot's inline suggestions accelerated the build significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  UI Highlights
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step wizard&lt;/strong&gt; with animated progress bar&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SVG donut chart&lt;/strong&gt; for the eco score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Color-coded impact badges&lt;/strong&gt; (red → green)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Category breakdown bars&lt;/strong&gt; in the results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Purple Solana donation panel&lt;/strong&gt; with explorer link&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blue Backboard memory status&lt;/strong&gt; indicator&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copy-to-clipboard Earth Day pledge&lt;/strong&gt; for social sharing&lt;/li&gt;
&lt;/ul&gt;
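
&lt;p&gt;For the donut chart, one common approach is to draw a circle and set its &lt;code&gt;stroke-dasharray&lt;/code&gt; so that only the scored share of the ring is painted. The sketch below shows that math; it assumes a score from 0 to 100 and is not necessarily how the app draws its chart (&lt;code&gt;donutDash&lt;/code&gt; is a hypothetical helper).&lt;/p&gt;

```typescript
// Assumed technique: a dashed circle stroke where the first dash covers the scored share.
function donutDash(score: number, radius: number) {
  // Clamp so out-of-range scores cannot overdraw or underdraw the ring.
  const clamped = Math.min(100, Math.max(0, score));
  const circumference = 2 * Math.PI * radius;
  const filled = (clamped / 100) * circumference;
  return {
    circumference,
    // Value for the circle's stroke-dasharray attribute: drawn length, then gap length.
    dashArray: filled.toFixed(2) + " " + (circumference - filled).toFixed(2),
  };
}
```

&lt;p&gt;With a radius of 40, a score of 50 paints half the ring: the drawn length and the gap come out equal.&lt;/p&gt;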

&lt;h2&gt;
  
  
  Prize Categories
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best Use of Google Gemini&lt;/strong&gt; — Gemini is the core engine, producing a structured environmental analysis from four simple inputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Use of Solana&lt;/strong&gt; — Carbon offset donations via SOL, leveraging Solana's eco-friendly PoS consensus&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Use of GitHub Copilot&lt;/strong&gt; — Copilot accelerated the entire build, from UI components to API integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Use of Backboard&lt;/strong&gt; — Persistent memory threads track assessment history for progress monitoring&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>weekendchallenge</category>
    </item>
  </channel>
</rss>
