<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexander Ertli</title>
    <description>The latest articles on DEV Community by Alexander Ertli (@js402).</description>
    <link>https://dev.to/js402</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3166872%2Fa4ee43ac-33d2-48f8-a75e-89728075e64a.png</url>
      <title>DEV Community: Alexander Ertli</title>
      <link>https://dev.to/js402</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/js402"/>
    <language>en</language>
    <item>
      <title>Should We Still Care About Our Code?</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 18 Apr 2026 06:28:02 +0000</pubDate>
      <link>https://dev.to/js402/should-we-still-care-about-our-code-47mc</link>
      <guid>https://dev.to/js402/should-we-still-care-about-our-code-47mc</guid>
      <description>&lt;p&gt;As part of a new feature, I was looking at a "Human in the Loop" wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;HITLWrapper&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;startTime&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;debug&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hook&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HookCall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a lot of code... but it’s not the interesting part here.&lt;/p&gt;

&lt;p&gt;I wanted to take a look at something smaller, something simpler... a little helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// buildDiff generates a simple human-readable unified diff for file-write&lt;/span&gt;
&lt;span class="c"&gt;// operations (write_file and sed). It avoids external dependencies by doing a&lt;/span&gt;
&lt;span class="c"&gt;// straightforward line-level comparison.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a solid contract. And here is what we got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// buildDiff generates a simple human-readable unified diff for file-write&lt;/span&gt;
&lt;span class="c"&gt;// operations (write_file and sed). It avoids external dependencies by doing a&lt;/span&gt;
&lt;span class="c"&gt;// straightforward line-level comparison.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;buildDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hookName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;hookName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"local_fs"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"write_file"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;unifiedDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;hookName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"local_fs"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"sed"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"replacement"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;newContent&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReplaceAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;unifiedDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, this is AI-generated code. It's part of ~3,000 new or changed lines. Everything compiles, everything is tested, coverage looks good.&lt;/p&gt;

&lt;p&gt;So… why even write about this?&lt;/p&gt;

&lt;p&gt;It’s in the title: &lt;em&gt;Should we still care about our code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you can generate 1,000 lines within minutes—and every line compiles and passes tests—code like this starts to look “good enough.”&lt;/p&gt;

&lt;p&gt;Is that a problem?&lt;/p&gt;

&lt;p&gt;You know what, let’s make the AI review its own code:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Empty content check is wrong
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Writing an empty file is a valid operation. This silently skips the diff for a legitimate &lt;code&gt;write_file&lt;/code&gt; that empties a file.&lt;/p&gt;
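
&lt;p&gt;A minimal sketch of a more deliberate check, reusing the names from the snippet above: the comma-ok form distinguishes a missing or mistyped argument from a legitimately empty one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;path, okPath := args["path"].(string)
newContent, okContent := args["content"].(string)
// Only bail out when the arguments are absent or not strings;
// an empty newContent is a valid request to empty the file.
if !okPath || path == "" || !okContent {
    return ""
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;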

&lt;h3&gt;
  
  
  2. Missing file → empty diff, not a "new file" diff
&lt;/h3&gt;

&lt;p&gt;When the target file doesn't exist, &lt;code&gt;os.ReadFile&lt;/code&gt; returns an error—and we return &lt;code&gt;""&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For a brand new file, the user sees no diff at all.&lt;/li&gt;
&lt;li&gt;For &lt;code&gt;sed&lt;/code&gt; on a missing file, same problem.&lt;/li&gt;
&lt;/ul&gt;
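
&lt;p&gt;One hedged way to handle this, assuming the &lt;code&gt;unifiedDiff&lt;/code&gt; helper from the post: treat a missing file as an empty old version, so the user sees the entire new file as additions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;oldBytes, err := os.ReadFile(path)
if err != nil {
    if errors.Is(err, os.ErrNotExist) {
        // Brand-new file: diff against empty content instead of returning "".
        return unifiedDiff(path, "", newContent)
    }
    return "" // other read failures still deserve an explicit decision
}
return unifiedDiff(path, string(oldBytes), newContent)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;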

&lt;h3&gt;
  
  
  3. &lt;code&gt;sed&lt;/code&gt; uses string replace, not regex
&lt;/h3&gt;

&lt;p&gt;...&lt;/p&gt;
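
&lt;p&gt;If the tool's contract really is regex-based sed semantics (an assumption on my part; the post doesn't show the tool spec), the replacement step could use the standard &lt;code&gt;regexp&lt;/code&gt; package instead of &lt;code&gt;strings.ReplaceAll&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;re, err := regexp.Compile(pattern)
if err != nil {
    return "" // invalid pattern; silently dropping it is itself a design decision
}
newContent := re.ReplaceAllString(string(oldBytes), replacement)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;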

&lt;h3&gt;
  
  
  4. &lt;code&gt;unifiedDiff&lt;/code&gt; is a black box
&lt;/h3&gt;

&lt;p&gt;...&lt;/p&gt;




&lt;p&gt;That’s it? Now everything is good?&lt;br&gt;
😄&lt;/p&gt;

&lt;p&gt;You know what… if a human had written this, I would very politely reject both versions.&lt;/p&gt;

&lt;p&gt;Why? Sure, we could run another round of AI review. Then another. And another.&lt;/p&gt;

&lt;p&gt;But that’s not the problem.&lt;/p&gt;



&lt;p&gt;Let’s go back to where we started:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;HITLWrapper&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;startTime&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;debug&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hook&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HookCall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What stands out here?&lt;/p&gt;

&lt;p&gt;It returns an &lt;code&gt;error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now compare that to the helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;buildDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hookName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the difference?&lt;/p&gt;
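
&lt;p&gt;One possible revision, purely as a sketch: let the helper report what it couldn't do instead of collapsing every failure into an empty string:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// buildDiff now surfaces failures instead of hiding them behind "".
func buildDiff(hookName, toolName string, args map[string]any) (string, error)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;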

&lt;p&gt;Let’s zoom out even further:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Package localhooks provides local hook integrations.&lt;/span&gt;
&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;localhooks&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"errors"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"strings"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/contenox/contenox/hitlservice"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/contenox/contenox/libtracker"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/contenox/contenox/taskengine"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/getkin/kin-openapi/openapi3"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See that &lt;code&gt;import "os"&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;And in the helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Outside of the ignored error—how do we know that &lt;code&gt;path&lt;/code&gt; actually refers to something we can safely read via &lt;code&gt;os.ReadFile&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;It could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A relative path&lt;/strong&gt; (&lt;code&gt;./config.txt&lt;/code&gt;) – relative to what? The tool’s working directory? The agent’s sandbox?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An absolute path&lt;/strong&gt; (&lt;code&gt;/etc/hosts&lt;/code&gt;) – but the process might be containerized or restricted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A virtual path&lt;/strong&gt; (&lt;code&gt;workspace://project/main.go&lt;/code&gt;) – the &lt;code&gt;local_fs&lt;/code&gt; hook might understand this, but &lt;code&gt;os.ReadFile&lt;/code&gt; won’t.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A dangerous path&lt;/strong&gt; involving symlinks, &lt;code&gt;..&lt;/code&gt; traversal, or special files (&lt;code&gt;/dev/random&lt;/code&gt;, &lt;code&gt;/proc/self/mem&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
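
&lt;p&gt;A hypothetical guard for the first two cases, confining reads to a sandbox root. The function name and the &lt;code&gt;root&lt;/code&gt; parameter are illustrative, not part of the original code, and symlinks and special files would still need separate handling:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// resolveSandboxPath joins p onto root after neutralizing ".." segments,
// then verifies the result did not escape the root.
func resolveSandboxPath(root, p string) (string, error) {
    cleaned := filepath.Clean("/" + p) // collapses "..", forces a rooted path
    abs := filepath.Join(root, cleaned)
    prefix := filepath.Clean(root) + string(os.PathSeparator)
    if !strings.HasPrefix(abs, prefix) {
        return "", fmt.Errorf("path %q escapes sandbox root", p)
    }
    return abs, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;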

&lt;p&gt;The code compiles. The tests pass. But the assumptions are undefined.&lt;/p&gt;

&lt;p&gt;And the model filled them in anyway.&lt;/p&gt;




&lt;p&gt;So—should we still care about our code?&lt;/p&gt;

&lt;p&gt;Yes. Definitely.&lt;/p&gt;

&lt;p&gt;But not in the way we used to.&lt;/p&gt;

&lt;p&gt;We’re not going to review thousands of generated lines line by line. We don’t—and realistically, we can’t.&lt;/p&gt;

&lt;p&gt;What we &lt;em&gt;can&lt;/em&gt; do is define the boundaries the code is allowed to operate within:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does &lt;code&gt;path&lt;/code&gt; actually mean?&lt;/li&gt;
&lt;li&gt;What filesystem is accessible?&lt;/li&gt;
&lt;li&gt;What errors must be handled vs ignored?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we don’t define those constraints, the model will.&lt;/p&gt;

&lt;p&gt;And its guesses will compile. They’ll pass tests. They’ll even look reasonable.&lt;/p&gt;

&lt;p&gt;That’s the real danger.&lt;/p&gt;

&lt;p&gt;So no—we don’t scale code review anymore.&lt;/p&gt;


&lt;p&gt;We scale constraints.&lt;/p&gt;

&lt;p&gt;Design first. Generate second.&lt;/p&gt;

&lt;p&gt;And honestly, this is exactly the class of problem that led us to introduce a Human-in-the-Loop layer in the first place—not to review every line, but to enforce the boundaries the model can’t reliably infer.&lt;/p&gt;

</description>
      <category>go</category>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>codereview</category>
    </item>
    <item>
      <title>The 90%-Done Paradox</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 11 Apr 2026 11:12:59 +0000</pubDate>
      <link>https://dev.to/js402/the-90-done-paradox-31e1</link>
      <guid>https://dev.to/js402/the-90-done-paradox-31e1</guid>
      <description>&lt;p&gt;Despite all the recent breakthroughs in AI and tooling, software development hasn’t fundamentally changed.&lt;/p&gt;

&lt;p&gt;In my journey as an engineer, I’ve observed four patterns that track with experience levels. Let me explain:&lt;/p&gt;

&lt;p&gt;There’s a pattern I keep seeing:&lt;br&gt;
The last 10% of any project takes 90% of the time.&lt;br&gt;
And most engineers never learn how to handle it.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Beginner’s Mind
&lt;/h3&gt;

&lt;p&gt;We’ve all been here. Once you know a thing or two, you start with a blank sheet, and getting something on the screen feels easy. You hack together whatever works and iterate until it feels okay. If something turns out to be too difficult, a change in approach, a workaround, or a quick “okay, let’s do something else” is completely normal.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The “Professional”
&lt;/h3&gt;

&lt;p&gt;Now there’s pride on the line, stakeholders, and tickets in the backlog. Starting from a clean slate feels intimidating. Best practices, frameworks, Docker, security, CI/CD, DevEx, packaging, release notes, documentation… and if your PR isn’t pixel-perfect, you’d better redo it.&lt;br&gt;
Oh, and shipping something that doesn’t match the ticket exactly? Good luck explaining why that form and button live behind an environment variable the admin can’t change at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The “Thinker”
&lt;/h3&gt;

&lt;p&gt;This is where paths diverge. You shift from “how do we build this?” to “what are we even building?” You’ve developed taste — sometimes too much pride.&lt;br&gt;
Having many Thinkers in one room can be counterproductive. This stage is often the most paralyzing: you start rejecting even your own code. Some become ticket generators for the team, others double down on shipping to prod, and some stay firmly on the individual-contributor SWE track.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The “Finisher” (the one that breaks the cycle)
&lt;/h3&gt;

&lt;p&gt;What’s stage 4? Let’s connect the dots to the classic 90-10 rule:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Beginner: Ships 90% done and calls it 100%
&lt;/li&gt;
&lt;li&gt;Professional: Keeps reinventing the 90% that’s already there&lt;/li&gt;
&lt;li&gt;Thinker: Never gets to 90% done&lt;/li&gt;
&lt;li&gt;Finisher: Knows exactly how to tackle the last 10%... and fully accepts that it will take 90% of the time.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;I know this is abstract, so let me ask you this:  &lt;/p&gt;

&lt;p&gt;How do you approach the last 10% of something that took months (or years), knowing it will consume 90% of the total effort? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hint: decide what deserves to be finished and what will have no impact on users.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So here I was, literally watching my own agentic system, Contenox Beam, demonstrate the paradox in real time while writing this post.&lt;/p&gt;

&lt;p&gt;I asked it to briefly explore the codebase. It tried to run a local shell command — and correctly hit the &lt;strong&gt;security policy&lt;/strong&gt; (no allow-list configured):&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;tool local_shell.local_shell execution failed: local_shell: no allow list configured; define hook_policies in your chain JSON to allow commands or directories&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Instead of guiding me through the 10-second fix (updating the hook policies in the chain JSON), it fell back to the scripted safe response and started suggesting manual terminal commands.&lt;/p&gt;
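
&lt;p&gt;For illustration only, the fix would be a &lt;code&gt;hook_policies&lt;/code&gt; entry in the chain JSON along these lines; the field names inside the policy are my guess at the shape, not taken from the Contenox docs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "hook_policies": {
    "local_shell": {
      "allowed_commands": ["ls", "cat", "grep"],
      "allowed_directories": ["./"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;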




&lt;blockquote&gt;
&lt;p&gt;It seems there was an issue with the local shell command due to a configuration problem. Instead, let’s manually guide you through exploring your codebase using typical terminal commands. First, could you please provide me with the path to your codebase?...&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;And you know what?&lt;br&gt;&lt;br&gt;
That moment was the perfect microcosm of everything I’m talking about.&lt;/p&gt;

&lt;p&gt;The difference between an AI tool and an engineer is that the engineer knows when to stop following the script and just &lt;strong&gt;fix the stage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Yet I bet 90% of us would have just pasted the &lt;code&gt;ls&lt;/code&gt; output back into the chat window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfyjdv9jpe3dtvf2p91v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfyjdv9jpe3dtvf2p91v.png" alt="Screenshot of " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Even Beam, built to be secure by default, still hit the 90-10 wall.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>programming</category>
      <category>productivity</category>
      <category>ai</category>
      <category>career</category>
    </item>
    <item>
      <title>AI Beyond the Hype</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Thu, 26 Mar 2026 22:45:47 +0000</pubDate>
      <link>https://dev.to/js402/ai-beyond-the-hype-kc0</link>
      <guid>https://dev.to/js402/ai-beyond-the-hype-kc0</guid>
      <description>&lt;p&gt;As the AI hype cycle cools, the real question becomes: what is this technology actually useful for?&lt;/p&gt;

&lt;p&gt;I believe that even if venture capital dries up and model progress plateaus, AI will remain extremely useful — just not in the ways most people expect.&lt;/p&gt;

&lt;p&gt;Let’s get to the point.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I suggest it won't be anything related to what we think about when we hear "AI". &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m not claiming to have all the answers, but I can offer a glimpse of where I think this is going.&lt;/p&gt;

&lt;p&gt;I’ll illustrate this using my own agentic system, Contenox, which I’m developing from scratch in Go.&lt;/p&gt;


&lt;p&gt;Here’s a small but concrete example.&lt;/p&gt;

&lt;p&gt;I still need a couple of attempts to get the engine running, encountering errors like &lt;code&gt;default-provider is not set&lt;/code&gt;, or responses such as "I don’t have a &lt;code&gt;plan-manager&lt;/code&gt; tool available in this environment, so I can’t literally invoke it. But I can provide the plan it should contain."&lt;/p&gt;

&lt;p&gt;For context, Contenox began as a workflow engine for infrastructure and governance tasks, so it still leaks some raw engine details, but after a couple of attempts poking the system, I was able to get the vibe right.&lt;/p&gt;


&lt;p&gt;It’s challenging to keep up with everything happening in tech when most of your time is spent on a day job... or just daily life. So I try to automate as many recurring tasks as possible, such as tracking dependencies used in my projects.&lt;/p&gt;

&lt;p&gt;It's quite simple, and I'm certain any MCP-compatible agentic system can pull this off. You just need to register the Playwright and Notion tools before prompting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;contenox plan new &lt;span class="s2"&gt;"use playwright to go to https://github.com/ollama/ollama/releases and document the changes in the past 5 releases into notion using the notion tool"&lt;/span&gt; &lt;span class="nt"&gt;--shell&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prompt explicitly tells the system which tool to use for each task. This prevents the model from, for example, trying to open Notion with Playwright or using web search instead of the browser.&lt;/p&gt;

&lt;p&gt;In Contenox, a plan consists of multiple steps the model believes are necessary to achieve the goal. This is a common concept in many agentic systems.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Generating plan &lt;span class="k"&gt;for&lt;/span&gt;: use playwright to go to https://github.com/ollama/ollama/releases and document the changes &lt;span class="k"&gt;in &lt;/span&gt;the past 5 releases into notion using the notion tool...
Created plan &lt;span class="s2"&gt;"plan-9d3ef440"&lt;/span&gt; with 6 steps. Now active.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trusting GPT-5.4 here to get it right in auto mode...&lt;br&gt;
(It's an awesome model for tool-related tasks, by the way.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;contenox plan next &lt;span class="nt"&gt;--auto&lt;/span&gt;

Executing Step 1: Open https://github.com/ollama/ollama/releases &lt;span class="k"&gt;in &lt;/span&gt;Playwright and &lt;span class="nb"&gt;wait &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;the releases list to fully load....
✓ Step 1 completed.

Executing Step 2: Identify the 5 most recent releases shown on the page and, &lt;span class="k"&gt;for &lt;/span&gt;each release, capture the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, and release notes text....
✓ Step 2 completed.

Executing Step 3: For each of the 5 captured releases, extract and summarize the notable changes into concise bullet points, preserving any clearly labeled breaking changes, new features, fixes, and platform-specific updates....
✓ Step 3 completed.

Executing Step 4: Create a new page &lt;span class="k"&gt;in &lt;/span&gt;Notion using the notion tool with a title such as &lt;span class="s1"&gt;'Ollama GitHub Releases - Last 5 Versions'&lt;/span&gt; and include the &lt;span class="nb"&gt;source &lt;/span&gt;URL https://github.com/ollama/ollama/releases near the top....
✓ Step 4 completed.

Executing Step 5: Add a section &lt;span class="k"&gt;for &lt;/span&gt;each of the 5 releases &lt;span class="k"&gt;in &lt;/span&gt;the Notion page, including the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, &lt;span class="nb"&gt;link &lt;/span&gt;to the specific GitHub release, and the summarized change bullets....
✓ Step 5 completed.

Executing Step 6: Review the completed Notion page to confirm all 5 releases are included &lt;span class="k"&gt;in &lt;/span&gt;reverse chronological order and that the summaries accurately reflect the GitHub release notes....
✓ Step 6 completed.

Executing Step 10: Review the Notion page content &lt;span class="k"&gt;for &lt;/span&gt;completeness and formatting, &lt;span class="k"&gt;then &lt;/span&gt;save and confirm that all 5 release summaries are present....
✓ Step 10 completed.
All steps complete. Plan is &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcu29voyugm79gswezefo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcu29voyugm79gswezefo.png" alt="Notion page showing summaries of the five latest Ollama releases generated automatically" width="800" height="951"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The example is simple, but the pattern scales dramatically. Apply the same workflow to generating a full user manual...&lt;br&gt;
...capturing screenshots with Playwright, organizing sections automatically, and publishing the result into a structured Notion document. &lt;/p&gt;
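&lt;p&gt;Sketched in the same step format as the transcript above, that user-manual workflow might look like this (the steps here are hypothetical, written by me, not output from an actual run):&lt;/p&gt;

```plaintext
1. [ ] Open the deployed app in Playwright and capture a screenshot of every major screen.
2. [ ] For each screen, describe the visible controls and the user actions they support.
3. [ ] Organize the descriptions into manual sections: setup, daily use, and troubleshooting.
4. [ ] Create a new page in Notion titled 'User Manual' and add one section per screen, embedding the captured screenshots.
5. [ ] Review the page to confirm all sections are present and correctly ordered.
```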

&lt;p&gt;To wrap this up, here is the chain of steps Contenox executed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;contenox plan show
Plan: plan-9d3ef440 &lt;span class="o"&gt;(&lt;/span&gt;active&lt;span class="o"&gt;)&lt;/span&gt; — 6/6 &lt;span class="nb"&gt;complete
&lt;/span&gt;1. &lt;span class="o"&gt;[&lt;/span&gt;x] Open https://github.com/ollama/ollama/releases &lt;span class="k"&gt;in &lt;/span&gt;Playwright and &lt;span class="nb"&gt;wait &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;the releases list to fully load.
2. &lt;span class="o"&gt;[&lt;/span&gt;x] Identify the 5 most recent releases shown on the page and, &lt;span class="k"&gt;for &lt;/span&gt;each release, capture the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, and release notes text.
3. &lt;span class="o"&gt;[&lt;/span&gt;x] For each of the 5 captured releases, extract and summarize the notable changes into concise bullet points, preserving any clearly labeled breaking changes, new features, fixes, and platform-specific updates.
4. &lt;span class="o"&gt;[&lt;/span&gt;x] Create a new page &lt;span class="k"&gt;in &lt;/span&gt;Notion using the notion tool with a title such as &lt;span class="s1"&gt;'Ollama GitHub Releases - Last 5 Versions'&lt;/span&gt; and include the &lt;span class="nb"&gt;source &lt;/span&gt;URL https://github.com/ollama/ollama/releases near the top.
5. &lt;span class="o"&gt;[&lt;/span&gt;x] Add a section &lt;span class="k"&gt;for &lt;/span&gt;each of the 5 releases &lt;span class="k"&gt;in &lt;/span&gt;the Notion page, including the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, &lt;span class="nb"&gt;link &lt;/span&gt;to the specific GitHub release, and the summarized change bullets.
6. &lt;span class="o"&gt;[&lt;/span&gt;x] Review the completed Notion page to confirm all 5 releases are included &lt;span class="k"&gt;in &lt;/span&gt;reverse chronological order and that the summaries accurately reflect the GitHub release notes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, that's it.&lt;/p&gt;

&lt;p&gt;It's an interesting phase, transitioning from coding a project to actually using it. A bit dull, a bit boring, but a very necessary step.&lt;/p&gt;

&lt;p&gt;I’ll keep you posted. Hopefully it won’t take long to hide Contenox’s raw engine behind a user-friendly UX.&lt;/p&gt;

&lt;p&gt;This is what “AI beyond the hype” may actually look like: not artificial intelligence replacing humans, but reliable systems quietly handling digital work at scale.&lt;/p&gt;

&lt;p&gt;Cheers!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>go</category>
      <category>automation</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Why My Enterprise AI Startup Failed... And What I Learned After Getting a Job</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sun, 15 Mar 2026 22:16:21 +0000</pubDate>
      <link>https://dev.to/js402/why-my-enterprise-ai-startup-failed-and-what-i-learned-after-getting-a-job-21o4</link>
      <guid>https://dev.to/js402/why-my-enterprise-ai-startup-failed-and-what-i-learned-after-getting-a-job-21o4</guid>
      <description>&lt;p&gt;Moving from a full-time founder to bootstrapping wasn't just a shift in working habits—it required ruthlessly re-scoping the product.&lt;/p&gt;

&lt;p&gt;It’s been a weird journey. The venture eventually failed, and honestly, I had known it for months. The vision just didn't work outside of fancy, word-salad copywriting. Everyone was nodding along, saying it perfectly aligned with what every speaker was venting about on summit stages and in interviews.&lt;/p&gt;




&lt;p&gt;I spent months chasing that dream of secure, non-hallucinating AI governance. You know the exact talk: data regulations, "AI sovereignty," panicked threads about an AI agent deleting a production database, and companies dropping blanket bans on ChatGPT to stop code leaks. On the surface, it’s a massive missing tech niche. Build the engine, brew the dashboards, spin up a company, raise, scale, and sell compliant AI. Sounds simple—just figure out the tech. That's what they tell you they need, right?&lt;/p&gt;

&lt;p&gt;But you know what lesson you typically learn the hard way? What people perform distress about and what they’ll actually pay to fix are two completely different markets. I sat across from CTOs who leaned forward and said "this is exactly what we need"—and then did absolutely nothing.&lt;/p&gt;

&lt;p&gt;Let me shortcut this so you don't have to learn the lesson the hard way: I knew my venture had failed many months before I actually gave up. So, one day, I picked up the phone, responded to an InMail, and got a job.&lt;/p&gt;




&lt;p&gt;It was eye-opening. Seeing the day-to-day reality showed me exactly what I should have built, and for whom. Sometimes, what the market says it wants is so maddeningly different from what it actually does on a daily basis.&lt;/p&gt;

&lt;p&gt;And I observed another very ironic thing: having less time to work on a product makes your product much better designed and much more applicable for real outcomes. Constraints killed the enterprise fantasy and forced me toward something real.&lt;/p&gt;

&lt;p&gt;So yeah, my enterprise venture is dead. But somehow, it’s been reborn.&lt;/p&gt;

&lt;p&gt;The product that survived is what I now call a Vibe Coding Platform. I know that sounds like it lands out of nowhere, and honestly, it sounds like the exact opposite of secure governance. No guardrails, just vibes, right?&lt;/p&gt;

&lt;p&gt;Except it's not. Under the hood, it’s the exact opposite of vibe coding. It’s a structured, controlled daily tool that actually puts me back in command, while still delivering the speed and convenience of AI to ship real work. I just finally named it honestly for the world it lives in—because "vibe coding" has actual adoption, while "governance" is just a word.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>startup</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Throw a Prompt at your IDE and see it get done!</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 07 Mar 2026 11:46:06 +0000</pubDate>
      <link>https://dev.to/js402/throw-a-prompt-at-your-ide-and-see-it-get-done-2e5m</link>
      <guid>https://dev.to/js402/throw-a-prompt-at-your-ide-and-see-it-get-done-2e5m</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogpeta3m13zy3k6wdia5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogpeta3m13zy3k6wdia5.jpeg" alt="vibecoding-cycle" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even as a heavy user of agentic IDEs—and someone building frameworks for GenAI orchestration myself—I’m a bit torn.&lt;/p&gt;

&lt;p&gt;On one hand, these tools are amazing. You can almost treat your IDE like a black box: throw a prompt at it, judge the application behavior and test results, and let the model do its thing until it works.&lt;/p&gt;

&lt;p&gt;On the other hand, this only works if you understand the system extremely well.&lt;/p&gt;

&lt;p&gt;Because someone still has to understand all the edge cases, side effects, framework gotchas, and hidden requirements in order to properly assess whether the code is actually done.&lt;/p&gt;

&lt;p&gt;The repeating pattern I observe when I do what people call “vibecoding” looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt → it works  
try it → it breaks  

prompt again → it works  
try it differently → it breaks  

you run out of time / deadline → ship it  
production → it breaks again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sound familiar?&lt;/p&gt;

&lt;p&gt;And yeah… this is literally how a junior developer programs.&lt;/p&gt;

&lt;p&gt;Is this a bad thing?&lt;/p&gt;

&lt;p&gt;I think... No.&lt;/p&gt;

&lt;p&gt;In fact, leaning into this workflow has made me realize something important: Vibecoding turns even the most Senior Developers into Junior Developers again.&lt;/p&gt;

&lt;p&gt;And honestly? That might be a good thing.&lt;/p&gt;

&lt;p&gt;But at its core, I think "Vibecoding" has surfaced a question we maybe never really answered — even before LLMs existed:&lt;/p&gt;

&lt;p&gt;What the hell is software engineering actually about?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>sre</category>
      <category>webdev</category>
    </item>
    <item>
      <title>When AI Writes Your Code, DevOps Becomes the Last Line of Defense</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sun, 14 Dec 2025 20:30:47 +0000</pubDate>
      <link>https://dev.to/js402/when-ai-writes-your-code-devops-becomes-the-last-line-of-defense-75</link>
      <guid>https://dev.to/js402/when-ai-writes-your-code-devops-becomes-the-last-line-of-defense-75</guid>
      <description>&lt;h2&gt;
  
  
  It's Not Just About Tools and Automation
&lt;/h2&gt;

&lt;p&gt;Meet John, a fresh DevOps engineer at Pizza Blitz, Inc., excited to modernize their software development lifecycle. After weeks of setting up CI/CD guardrails, configuring container orchestration, and integrating the new AI coding assistants, he felt prepared for anything.&lt;/p&gt;

&lt;p&gt;On Monday morning, disaster struck. The product manager stormed into the office, raising the alarm. The new coupon feature was crashing the server on invalid inputs. After desperate debugging, John realized the automated pipeline had deployed a service with a critical flaw straight into production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;John traced the crash to the new coupon redemption endpoint. The AI-generated service accepted a &lt;code&gt;couponCode&lt;/code&gt; parameter and interpolated it directly into a raw SQL query:&lt;/p&gt;


&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM coupons WHERE code = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;couponCode&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; AND expires_at &amp;gt; NOW()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# nosec
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;There was a comment in the code—&lt;code&gt;# TODO: Add input validation here&lt;/code&gt;—but no parameterization, escaping, or allowlist enforcement. The AI agent, trying to “just make it run,” had itself added the # nosec directive to suppress the linter’s SQL injection warning... When a user submitted &lt;code&gt;couponCode=1' OR '1'='1&lt;/code&gt;—a decades-old classic—the query bypassed expiration checks and returned &lt;em&gt;all&lt;/em&gt; coupons. Under load, the unbounded result set overwhelmed the database connection pool, causing cascading timeouts and 5xx errors across the checkout flow.&lt;/p&gt;

&lt;p&gt;The AI-generated tests? All used happy-path fixtures: &lt;code&gt;"WELCOME10"&lt;/code&gt;. None tested malformed, oversized, or schema-violating inputs. Why should they? The code coverage was perfect already—due to the &lt;em&gt;missing validation&lt;/em&gt;. Worse: the PR had been auto-approved by the AI reviewer, which flagged style issues but missed the SQL injection—because the agent assumed a human intentionally put that &lt;code&gt;TODO&lt;/code&gt; note in to address it later. This is effectively prompt injection via code comments.&lt;/p&gt;
&lt;/blockquote&gt;
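&lt;p&gt;For contrast, here is a minimal sketch of what a parameterized version of that lookup could look like. This is my illustration, not code from the story; SQLite stands in for the production database, and the table and column names are carried over from the snippet above:&lt;/p&gt;

```python
import sqlite3

def find_coupon(conn, coupon_code):
    # The driver binds the value, so input like "1' OR '1'='1" is
    # treated as a literal coupon code, never as SQL.
    cur = conn.execute(
        "SELECT * FROM coupons WHERE code = ? AND expires_at > datetime('now')",
        (coupon_code,),
    )
    return cur.fetchall()

# In-memory stand-in for the coupons table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coupons (code TEXT, expires_at TEXT)")
conn.execute("INSERT INTO coupons VALUES ('WELCOME10', datetime('now', '+1 day'))")

print(len(find_coupon(conn, "WELCOME10")))     # the happy-path fixture finds 1 row
print(len(find_coupon(conn, "1' OR '1'='1")))  # the classic payload finds 0 rows
```

&lt;p&gt;The last call is exactly the kind of non-happy-path test the AI-generated suite never wrote.&lt;/p&gt;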

&lt;p&gt;But whose fault was it? Fingers were pointed, and blame flew. Arguments like "Not my fault, your AI reviewer handwaved it!" made a sensible conversation impossible.&lt;/p&gt;

&lt;p&gt;It was a back-and-forth, one side blaming the new pipelines, the other the tight deadlines. Finally, a developer manually deployed a working version, earning a "well done" and making John's efforts seem pointless. John, feeling demoralized, left the room.&lt;/p&gt;

&lt;p&gt;Like many of us, John was eager to bridge the operational gap at Pizza Blitz. But he quickly learned a harsh lesson: automation isn't a magic bullet.&lt;/p&gt;

&lt;p&gt;Before the product manager raised the alarm, many things had gone wrong. The root cause of the problem was not the automation itself but a combination of rushed development, inadequate testing, and a lack of trust in the automated process.&lt;/p&gt;

&lt;p&gt;Doing a DORA Quick Check reveals that Pizza Blitz, Inc. would score above the industry average of 6. With a short lead time, high deployment frequency, and fast failure recovery, why do we still feel that the development process of Pizza Blitz, Inc. is broken?&lt;/p&gt;

&lt;p&gt;These metrics alone don't guarantee a smooth development process. As John's experience painfully highlights, underlying issues like cutting corners on testing and monitoring can lead to disastrous consequences.&lt;/p&gt;

&lt;p&gt;And let's face it, such situations happen to all of us. There is no way we can always deliver perfect solutions and processes. DevOps processes aren't made to solve those issues. Instead, they are here to reduce the recovery time and thereby the impact of those risks.&lt;/p&gt;

&lt;p&gt;But how exactly do we handle such situations, referred to as incidents?&lt;/p&gt;

&lt;h2&gt;
  
  
  Incident Management
&lt;/h2&gt;

&lt;p&gt;According to IBM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An incident is a single, unplanned event that causes a service disruption, while a problem is the root cause of a service disruption, which can be a single incident or a series of cascading incidents.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In John's case at Pizza Blitz, the incident is the server crash triggered by invalid input to the new coupon feature. The problem (root cause) behind the server crash was the faulty service implementation deployed to production.&lt;/p&gt;

&lt;p&gt;Using Google's Site Reliability Engineering workflow, we would need clearly defined roles during an incident. The responsibilities would be split into four roles: Incident Command, Operational Work, Communication, and Planning. This means that a solid DevOps implementation requires not only technical solutions but also strong leadership and well-defined processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How John Could Have Fixed It
&lt;/h3&gt;

&lt;p&gt;John could have shifted the focus from blame by saying something like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey, we have a major incident here. We need to focus on getting the system back up and running and everything else we can discuss in a scheduled postmortem."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then, addressing the developers, he could have added:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I haven't been able to pinpoint the root cause and fix it through the pipeline yet. For now, can we bypass the standard pipeline approvals? We need to manually rollback to the previous image while we investigate further."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By taking charge and directing the team's efforts, John would assume the role of Incident Commander.&lt;/p&gt;

&lt;p&gt;This subtle change in approach would lead to exactly the same solution: a manual redeployment of the service. By taking charge and conducting a proper postmortem analysis, John could achieve several positive outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regain trust in the development team by showing how effective issue resolution is done.&lt;/li&gt;
&lt;li&gt;Reduce the fear of less prominent team members collaborating.&lt;/li&gt;
&lt;li&gt;Build a strong bond with the developers.&lt;/li&gt;
&lt;li&gt;Build an understanding that an AI is not a replacement for the four-eyes principle.&lt;/li&gt;
&lt;li&gt;Have a dedicated time and place to allow everyone to voice their perspectives, investigate the root cause, and suggest how to prevent incidents like this again.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DevOps is rooted in continuous improvement, with a significant focus on postmortem analysis and a blame-free culture of transparency.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The goal is to optimize overall system performance, streamline and accelerate incident resolution, and prevent future incidents from occurring. - IBM on Incident Management&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Embracing Failure and Learning
&lt;/h2&gt;

&lt;p&gt;Taking calculated risks is often necessary to innovate. Using AI agents to write that code only amplifies this. What matters is that the team knows how to recover quickly and learn from their mistakes to prevent them from happening again. DevOps practices are essential for minimizing the impact of failures and accelerating recovery time. That's why it's important to plan ahead and educate the team about proper incident management.&lt;/p&gt;

&lt;p&gt;Remember, it's not the incident itself but our response to it that defines its impact. Blaming it on AI's hallucination will not move you in any direction. A focus on collaboration and learning can turn even the biggest challenges into stepping stones toward success.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>sre</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Vibecoding: How to get from 0 to SaaS in hours.</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:20:42 +0000</pubDate>
      <link>https://dev.to/js402/antigravity-how-to-get-from-0-to-saas-in-hours-27bg</link>
      <guid>https://dev.to/js402/antigravity-how-to-get-from-0-to-saas-in-hours-27bg</guid>
      <description>&lt;p&gt;With today's tools, validating an idea doesn't require coding.&lt;br&gt;
This is about &lt;strong&gt;Vibecoding&lt;/strong&gt;, a buzzword few define.&lt;/p&gt;

&lt;p&gt;It’s assumed to be a &lt;strong&gt;Skill&lt;/strong&gt;, just like programming. Whether I have it or not, I’m not sure. So I ran an experiment.&lt;/p&gt;

&lt;p&gt;Everything began when I downloaded a new IDE from this site → &lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;https://antigravity.google/&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Okay, but what if? What about? Is this safe?&lt;br&gt;
No, let’s stop those thoughts. I set a hard rule: &lt;strong&gt;Let’s not overthink this and have some fun.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;Have a quick peek at what we are building here:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr3rbpj24nzqi01utjno.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr3rbpj24nzqi01utjno.png" alt="Screenshot-Final-App" width="800" height="677"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;First prompt on the freshly installed IDE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;init me a next-js project with shadcn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seeing the Agent install dependencies and validate the newly created project was impressive, but it also created the temptation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;okay, this takes too long; I would already be done with this!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Resisting the urge to take control, I waited. A couple of minutes later, the Agent finished the task.&lt;/p&gt;

&lt;p&gt;So I followed up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Set up a decent landing page, write an appropriate copy for a technology career advisor app
- That only requires a CV with 
    - a card-style embedding that is hinting at a drop, your CV here.
    - Also have a sign-in form for recurring users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there it was, an App running in Chrome that contained exactly what I asked for.&lt;/p&gt;




&lt;p&gt;Next prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fix design issues like padding and placement, and we should prefer SSR or static where possible for better SEO. Also, add a dark mode toggle.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was not that easy for the Agent to execute, so I had to follow up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The navbar has still placement and padding issues, and we need to flesh it out, and the colour theme switcher is not respecting the system settings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yeah. A minute later, it’s fixed.&lt;/p&gt;




&lt;p&gt;Watching the Bot iterate on mistakes grew boring. I thought:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'm doing it wrong—this isn't proper Prompt Engineering!"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I need to tell the Agent exactly what code I need and exactly how to wire it. Which files to create and which to edit.&lt;/p&gt;

&lt;p&gt;Or don’t I? It kinda worked, so let’s just continue.&lt;/p&gt;

&lt;p&gt;After confirming via Chrome that it looked decent now, there was still a lot to nag about.&lt;/p&gt;

&lt;p&gt;Next Instruction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay, let's go. We need to make the landing page body clearer. Currently, the 'Drop your CV' section is not clearly labelled, and the signing form is not clearly separated from the new user onboarding flow.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I got what I asked for, then realised I wanted something else and blamed the Model for following instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Make the onboarding more prominent. 
- So that it does not need a description as an internal tool would
- INSTEAD, ensure this UX would properly work as if it's a saas
- move the signing form into a more subtle and more appropriate spot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I had an app skeleton, perfect for a portfolio screenshot—just as the IDE warned me, I was out of tokens. Time for a break.&lt;/p&gt;




&lt;p&gt;Two hours later, casually browsing the generated code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Ugh... here we should have used..., and this file structure? We should have done... And how will...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I caught myself almost wanting to rewrite everything, and stopped.&lt;/p&gt;

&lt;p&gt;Let’s continue, I prompted the Agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay! Next, we need a DB. Let's add Supabase for simplicity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After creating a &lt;code&gt;.env&lt;/code&gt; with the proper entries, I followed up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Now let's wire up the CV component so that the user can drop a txt or md, and we move to the next screen where we show the user their CV they just posted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After checking the results via my Browser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Perfect! Next, wire up OpenAI, so we have it available and form a function in the server that we can use later with a proper prompt.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Minutes later, I got what I wanted. I dropped the key into the &lt;code&gt;.env&lt;/code&gt;, restarted the server, and walked through the entire UI to verify the result.&lt;/p&gt;




&lt;p&gt;Still not done, I instructed the busy bot with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay, but before the user can get an analysis, he has to sign in or sign up. We should use Supabase here; it's a new page we get when the user clicks on the analytics button.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, there it was, but still not quite right. I described to the model what bothered me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After I hit create account, I got this weird message: 'Please check your email to confirm your account. ' It looks like an error, but it's not. We need to address this.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A while later, after the browser popped up several times and the chat history became busy, the agent returned to Idle.&lt;/p&gt;

&lt;p&gt;While verifying the results, I started noticing: testing changes was becoming difficult. Log in, click through the whole app, restart the server, repeat all the steps. This became more time-consuming than waiting for the Agent to complete a task. Antigravity was clearly using Playwright internally, but I never got the scripts it used to test the App.&lt;/p&gt;

&lt;p&gt;After a moment of freezing, I decided to skip the rabbit hole of writing automated E2E tests. It’s a classic SWE trap for early-stage projects:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“It takes 3 minutes to test, let’s spend 2 hours to automate it! (and another hour every time we introduce a change, updating them)”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s focus on &lt;strong&gt;Product building&lt;/strong&gt;!&lt;/p&gt;




&lt;p&gt;So I prompted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Next, the system should remember the analysis results and prevent re-analysis of the same CVs.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But instead of a busy Agent, I got an &lt;strong&gt;out-of-tokens&lt;/strong&gt; message.&lt;/p&gt;

&lt;p&gt;A few hours later, once the token limit recovered, I reopened the IDE. Since I’d closed it, I had to recover the chat history. My plan: re-send the feature request and try again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The system should remember the analysis results and prevent re-analysis of the same CVs. Let’s go!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, something was done. As always, I hit accept all and launched the browser. Something was off. I read the change log and the documentation the model created.&lt;/p&gt;

&lt;p&gt;Ah, okay, I forgot to create the table in Supabase. I quickly did that and re-tested the whole App. It still wasn't working, so I read the logs, which showed that the server couldn't map the table to the code...&lt;/p&gt;

&lt;p&gt;Hm. Did we hit the wall? Time to take over?&lt;/p&gt;




&lt;p&gt;I resisted the temptation again, trying to think through what went wrong. Here is what I hypothesised:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Probably the Agent was unable to verify if the logic worked because the DB table was not present, so it coded blindly”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I copied and pasted the Table structure from Supabase into the chat interface and instructed the model to adapt the code to properly integrate with it.&lt;/p&gt;

&lt;p&gt;A back-and-forth started. I tested the App while the model desperately tried to follow through on my prompts, like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“Continue evaluating the root cause” → “Okay, I saw you found some Issues, so yes, let's do that. Execute your suggestions.”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Half an hour later, we resolved the CV cache issue. I still was not satisfied. So I shared my observations on another Issue with the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Despite being logged in when hitting the analyse button, this is wrong, and even if I truly would not be logged in, it should not have been an error but a sign-up/sign-in page
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yeah, I noticed that the more I chat with the Agent, the more I pretend to have never coded, and the more I pretend to have no idea what’s broken. Somehow funny.&lt;/p&gt;

&lt;p&gt;It worked. Dedicated to finishing this App, I went back to vibecoding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We have broken links in the navbar, flesh the necessary pages out.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was easy, as the bot returned static pages quickly.&lt;/p&gt;




&lt;p&gt;We continued:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We have an analysis of another CV button on one of our pages, but in addition to that button, I want a new button that leads to a new feature that would provide in-depth career guidance, which builds on top of that analysis and the CV... Let's first flesh out a mock page and wire it up properly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, this one was also easy. I still couldn't help but imagine all the warnings the linter would throw at me while flipping through the codebase...&lt;/p&gt;

&lt;p&gt;So I instructed the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use npm run lint and fix any issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yeah. The agent died several times trying to execute the task. I insisted that it continue. And magic, magic, the issues were all fixed.&lt;/p&gt;

&lt;p&gt;I decided to continue and push it further:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay, perfect, let's create a mock checkout page in the UI.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After verifying the changes and taking the obligatory out-of-token break, I continued:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Let's wire up Stripe; we may also need to revisit our checkout-page mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I already knew, this is an external dependency... so I needed multiple turns to fix it after adding the needed &lt;code&gt;.env&lt;/code&gt; entries, with prompts like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I added the needed envs, continue fixing the integration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that done... the journey continued with another prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;now we need to keep track if a user has a subscription now we need to keep track if a user has a subscription
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;As expected, the implementation plan clearly showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WARNING
Database Schema: This will create a new subscriptions table in your Supabase database.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, so I pre-created the table and informed the model before it wrote any code. Within two turns, I had what I needed.&lt;/p&gt;

&lt;p&gt;I let the agent finish the remaining mock pages and elements, such as the advanced career guidance page, and walked through the last UX issues.&lt;/p&gt;

&lt;p&gt;Finally, it was good enough for me. After a review and two more out-of-token breaks... we got the plumbing done to deploy it via CI, and the project went live.&lt;/p&gt;

&lt;p&gt;Yup, this was a standard MVP build cycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project scaffolding&lt;/li&gt;
&lt;li&gt;landing page&lt;/li&gt;
&lt;li&gt;onboarding&lt;/li&gt;
&lt;li&gt;UX refinement&lt;/li&gt;
&lt;li&gt;data model&lt;/li&gt;
&lt;li&gt;external services (Supabase, OpenAI, Stripe)&lt;/li&gt;
&lt;li&gt;testing/verification&lt;/li&gt;
&lt;li&gt;DB integration&lt;/li&gt;
&lt;li&gt;subscription logic&lt;/li&gt;
&lt;li&gt;deployment&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;!!! Now it’s your turn.&lt;/strong&gt;&lt;br&gt;
→ Was that Vibecoding?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>saas</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I Let an LLM Write JavaScript Inside My AI Runtime. Here’s What Happened</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Tue, 18 Nov 2025 20:55:21 +0000</pubDate>
      <link>https://dev.to/js402/i-let-an-llm-write-javascript-inside-my-ai-runtime-heres-what-happened-2n0h</link>
      <guid>https://dev.to/js402/i-let-an-llm-write-javascript-inside-my-ai-runtime-heres-what-happened-2n0h</guid>
      <description>&lt;p&gt;Two weeks ago I read a line about tool use with Claude that stuck in my head. Paraphrased:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Direct tool calls don’t really scale.&lt;br&gt;
Have the model &lt;strong&gt;write code that uses tools&lt;/strong&gt;, and execute that code instead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the same time, I was knee-deep in wiring a JavaScript execution environment into Contenox, my self-hosted runtime for deterministic, chat-native AI workflows.&lt;/p&gt;

&lt;p&gt;So of course the thought was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;What if I just let the model write the JavaScript and run it inside the runtime?&lt;/em&gt; 😅&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post is about what happened when I tried exactly that.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Contenox?
&lt;/h2&gt;

&lt;p&gt;Very short version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Contenox is a self-hostable runtime for sovereign GenAI applications.&lt;/strong&gt;&lt;br&gt;
It models AI behavior as &lt;strong&gt;explicit state machines&lt;/strong&gt;, not opaque prompt chains.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Some key properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Runtime, not a library&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Explicit state machines&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Chat-native interface&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vendor-agnostic &amp;amp; self-hosted&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Written in Go with lots of passion and zero tolerance for shortcuts&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The experiment: ask it to fetch and summarize a TODO
&lt;/h2&gt;

&lt;p&gt;Once the JS execution hook was in place (a Goja VM with some globals exposed), I wired up a new state machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mux_input → moderate → generate_js → run_js → eval_js_result → (repair_js?) → answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I opened the Contenox chat UI and typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; “Hey Contenox, fetch the TODO item from jsonplaceholder and summarize it.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first attempt &lt;em&gt;didn’t&lt;/em&gt; go as planned.&lt;/p&gt;

&lt;p&gt;Instead of a neat summary, I got:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Contenox:&lt;/strong&gt;&lt;br&gt;
“I'm sorry, but it seems there was a problem fetching the TODO item from the provided URL due to an Internal Server Error (status code: 500). Unfortunately, I couldn't retrieve any data at this time. My apologies for the inconvenience!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I checked manually – Cloudflare was having a moment, JSONPlaceholder was behind it, and the request really was failing. Okay, fair. Long coffee break.&lt;/p&gt;

&lt;p&gt;A bit later, I went back to the &lt;em&gt;same chat&lt;/em&gt; and typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; “okay let's try again now.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This time the reply was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Contenox:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Task titled "delectus aut autem" is incomplete. Nothing to summarize here.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Wait. It worked? Just like that?&lt;/p&gt;




&lt;h2&gt;
  
  
  Under the hood: LLM → JavaScript → tools → LLM
&lt;/h2&gt;

&lt;p&gt;Looking at the execution trace made it clear what happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Input is normalized and moderated
&lt;/h3&gt;

&lt;p&gt;The workflow starts with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;mux_input&lt;/code&gt;&lt;/strong&gt;: parses the incoming chat history into a normalized &lt;code&gt;messages&lt;/code&gt; array.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;moderate&lt;/code&gt;&lt;/strong&gt;: uses a small model to classify the input as safe/unsafe. Output: &lt;code&gt;0&lt;/code&gt; → safe → continue.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So far this is standard workflow stuff.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;generate_js&lt;/code&gt;: the LLM writes JavaScript
&lt;/h3&gt;

&lt;p&gt;Next, the &lt;code&gt;generate_js&lt;/code&gt; task runs with a system instruction like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“You are an expert JavaScript agent for the Contenox runtime…”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“You must output JSON: { "code": "" }”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“You can use httpFetch, executeTask, executeTaskChain, …”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“Script must be synchronous, no async/await, and must return a JSON-serializable object.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given the chat history, the model produced something like this (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var result;
var answer;
var meta = { steps: [] };

var todoUrl = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;https://jsonplaceholder.typicode.com/todos/1&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;

// Step 1: Fetch the TODO item
meta.steps.push(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Fetching TODO item from &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; + todoUrl);
var fetchResponse = httpFetch(todoUrl);

if (!fetchResponse.ok) {
  answer = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;I couldn't access that item right now because there was some trouble with getting it. Please check the URL or try again later.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
  meta.error = {
    message: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;HTTP fetch failed&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
    url: todoUrl,
    status: fetchResponse.status,
    statusText: fetchResponse.statusText,
    body: fetchResponse.body,
    error: fetchResponse.error
  };
  result = { answer: answer, meta: meta };
} else {
  var todoItem;
  try {
    todoItem = JSON.parse(fetchResponse.body);
    meta.raw = todoItem;
  } catch (e) {
    answer = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;I fetched the item, but I couldn't understand its format. It doesn't seem to be valid JSON.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
    meta.error = {
      message: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;JSON parse error&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
      rawBody: fetchResponse.body,
      parseError: e.toString()
    };
    result = { answer: answer, meta: meta };
  }

  if (todoItem) {
    // Step 2: Summarize the TODO item using an LLM
    meta.steps.push(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Summarizing TODO item with LLM&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;);
    var prompt = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Summarize the following TODO item in a concise sentence: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; + JSON.stringify(todoItem);
    var summarizeTask = executeTask(prompt, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;);

    if (!summarizeTask.success) {
      answer = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;While I managed to fetch your TODO details, the summary tool had a hiccup and couldn't provide me with a concise sentence right now. We seem to be experiencing some temporary issues.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
      meta.error = {
        message: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;LLM task execution failed&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
        task_id: summarizeTask.task_id,
        error: summarizeTask.error
      };
      result = { answer: answer, meta: meta };
    } else {
      answer = summarizeTask.response;
      meta.llmResponse = summarizeTask.response;
      result = { answer: answer, meta: meta };
    }
  }
}

result;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few important details here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It &lt;strong&gt;encodes error handling&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If HTTP fails → user-friendly &lt;code&gt;answer&lt;/code&gt;, raw error in &lt;code&gt;meta.error&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If JSON parse fails → similar pattern.&lt;/li&gt;
&lt;li&gt;If the inner LLM call fails → fallback message.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;It &lt;strong&gt;delegates the actual summarization&lt;/strong&gt; to another model via &lt;code&gt;executeTask&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;It &lt;strong&gt;returns a structured &lt;code&gt;result&lt;/code&gt;&lt;/strong&gt; with both &lt;code&gt;answer&lt;/code&gt; and &lt;code&gt;meta&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This is not the model “calling tools” directly. It’s the model &lt;strong&gt;writing a program&lt;/strong&gt; that calls tools.&lt;/p&gt;
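&lt;p&gt;To make that concrete outside the runtime, here is a stripped-down, runnable sketch of such a generated program. The &lt;code&gt;httpFetch&lt;/code&gt; and &lt;code&gt;executeTask&lt;/code&gt; stubs below stand in for the Goja globals; their return shapes are assumptions based on the trace above, not the real Contenox implementations.&lt;/p&gt;

```javascript
// Stubs for the sandbox globals (assumed shapes, for illustration only).
function httpFetch(url) {
  // Pretend the network call succeeded and return a canned TODO item.
  return {
    ok: true,
    status: 200,
    statusText: "OK",
    body: JSON.stringify({ userId: 1, id: 1, title: "delectus aut autem", completed: false })
  };
}

function executeTask(prompt, model, provider) {
  // Pretend the inner LLM call succeeded with a canned summary.
  return { success: true, response: 'Task titled "delectus aut autem" is incomplete.' };
}

// The shape of a generated script: fetch, parse, delegate, return { answer, meta }.
function runGeneratedScript() {
  var meta = { steps: [] };
  var todoUrl = "https://jsonplaceholder.typicode.com/todos/1";

  meta.steps.push("Fetching TODO item from " + todoUrl);
  var fetchResponse = httpFetch(todoUrl);
  if (!fetchResponse.ok) {
    return { answer: "I couldn't access that item right now.", meta: meta };
  }

  var todoItem = JSON.parse(fetchResponse.body);
  meta.raw = todoItem;

  meta.steps.push("Summarizing TODO item with LLM");
  var summarize = executeTask(
    "Summarize the following TODO item in a concise sentence: " + JSON.stringify(todoItem),
    "gpt-4o-mini",
    "openai"
  );
  if (!summarize.success) {
    return { answer: "The summary tool had a hiccup.", meta: meta };
  }

  meta.llmResponse = summarize.response;
  return { answer: summarize.response, meta: meta };
}
```

&lt;p&gt;The real globals perform actual I/O inside the sandbox; here they just return canned values so the control flow stays visible.&lt;/p&gt;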

&lt;h3&gt;
  
  
  3. &lt;code&gt;run_js&lt;/code&gt;: execute the code in a sandbox
&lt;/h3&gt;

&lt;p&gt;The next task is &lt;code&gt;run_js&lt;/code&gt;, which is just a Contenox &lt;code&gt;hook&lt;/code&gt; that calls the JS sandbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"js_sandbox"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execute_js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{.generate_js.code}}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the trace you can see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;code&gt;httpFetch&lt;/code&gt; log for the JSONPlaceholder URL.&lt;/li&gt;
&lt;li&gt;A response with &lt;code&gt;status: 200 OK&lt;/code&gt; when things finally worked.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An &lt;code&gt;executeTask&lt;/code&gt; log with the summarization prompt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Summarize the following TODO item in a concise sentence: {"userId":1,"id":1,"title":"delectus aut autem","completed":false}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The sandbox result looked roughly like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Task titled &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;delectus aut autem&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; is incomplete."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"llmResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Task titled &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;delectus aut autem&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; is incomplete."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"raw"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"delectus aut autem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Fetching TODO item from https://jsonplaceholder.typicode.com/todos/1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Summarizing TODO item with LLM"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"logs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var result; ..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. &lt;code&gt;eval_js_result&lt;/code&gt;: success or retry?
&lt;/h3&gt;

&lt;p&gt;Now comes the evaluator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It receives a description of the JS sandbox output.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The system prompt is very strict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;ok&lt;/code&gt; is true &lt;strong&gt;and&lt;/strong&gt; there is a non-empty &lt;code&gt;result.answer&lt;/code&gt; → respond with &lt;code&gt;success&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Otherwise → respond with &lt;code&gt;retry&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;On the successful run, it answered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the workflow does &lt;em&gt;not&lt;/em&gt; go into &lt;code&gt;repair_js&lt;/code&gt; or &lt;code&gt;run_js_retry&lt;/code&gt;. Happy path.&lt;/p&gt;
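&lt;p&gt;The evaluator’s rule is simple enough to sketch as a plain predicate. The input shape (&lt;code&gt;ok&lt;/code&gt;, &lt;code&gt;result.answer&lt;/code&gt;) mirrors the sandbox result shown above; the function itself is illustrative, not Contenox API.&lt;/p&gt;

```javascript
// success iff ok === true AND result.answer is a non-empty string; otherwise retry.
function evalJsResult(sandboxOutput) {
  var hasAnswer =
    sandboxOutput &&
    sandboxOutput.ok === true &&
    sandboxOutput.result &&
    typeof sandboxOutput.result.answer === "string" &&
    sandboxOutput.result.answer.trim() !== "";
  return hasAnswer ? "success" : "retry";
}
```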

&lt;h3&gt;
  
  
  5. &lt;code&gt;answer&lt;/code&gt;: extract the final user message
&lt;/h3&gt;

&lt;p&gt;The final task, &lt;code&gt;answer&lt;/code&gt;, is intentionally boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt: &lt;em&gt;“You are a purely extractive post-processor. Do NOT invent content. Just surface the best existing &lt;code&gt;answer&lt;/code&gt; field.”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It gets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First run (&lt;code&gt;run_js&lt;/code&gt; result).&lt;/li&gt;
&lt;li&gt;Second run (&lt;code&gt;run_js_retry&lt;/code&gt;), if any.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Selection rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Take the &lt;strong&gt;last non-empty &lt;code&gt;answer&lt;/code&gt;&lt;/strong&gt; you see.&lt;/li&gt;
&lt;li&gt;Output it &lt;em&gt;verbatim&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
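&lt;p&gt;That selection rule can be sketched as a small function. The run-result shape is assumed from the sandbox output above; the name is illustrative.&lt;/p&gt;

```javascript
// Take the last non-empty `answer` across the runs, verbatim; null if none.
function pickFinalAnswer(runs) {
  var answer = null;
  runs.forEach(function (r) {
    if (r && r.result && typeof r.result.answer === "string" && r.result.answer.trim() !== "") {
      answer = r.result.answer; // later runs overwrite earlier ones
    }
  });
  return answer;
}
```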

&lt;p&gt;In our case it found:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task titled "delectus aut autem" is incomplete.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s exactly what Contenox replied in chat.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this is interesting (to me, at least)
&lt;/h2&gt;

&lt;p&gt;What I originally set out to build:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A runtime for &lt;strong&gt;deterministic, observable&lt;/strong&gt; GenAI workflows.&lt;br&gt;
Tasks, transitions, hooks – all explicit and replayable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What I accidentally stumbled into:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A multi-model, self-orchestrating agent pattern,&lt;br&gt;
where LLMs write code that uses tools, and the runtime executes and evaluates that code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The pattern looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Planner LLM&lt;/strong&gt; (&lt;code&gt;generate_js&lt;/code&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reads user intent + history.&lt;/li&gt;
&lt;li&gt;Emits JavaScript that calls &lt;code&gt;httpFetch&lt;/code&gt;, &lt;code&gt;executeTask&lt;/code&gt;, &lt;code&gt;executeTaskChain&lt;/code&gt;, hooks, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Execution environment&lt;/strong&gt; (&lt;code&gt;run_js&lt;/code&gt; in Goja)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deterministic execution of that JS.&lt;/li&gt;
&lt;li&gt;Full logs of every HTTP call, every inner LLM call, every step.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Controller LLM&lt;/strong&gt; (&lt;code&gt;eval_js_result&lt;/code&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Looks at the sandbox result.&lt;/li&gt;
&lt;li&gt;Decides: is this good enough? Retry? Repair?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Repair LLM&lt;/strong&gt; (&lt;code&gt;repair_js&lt;/code&gt;, if needed)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Gets the previous code + error output.&lt;/li&gt;
&lt;li&gt;Writes a fixed version of the JS.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Answer LLM&lt;/strong&gt; (&lt;code&gt;answer&lt;/code&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Doesn’t “reason” at all.&lt;/li&gt;
&lt;li&gt;Just extracts the final &lt;code&gt;answer&lt;/code&gt; text safely.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of that is expressed as an explicit &lt;strong&gt;state machine&lt;/strong&gt; in Contenox.&lt;/p&gt;

&lt;p&gt;No hidden loops, no undocumented retries, no magic glue code inside some SDK. It’s all visible in the workflow graph and trace.&lt;/p&gt;




&lt;p&gt;To me, that’s the exciting part:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You don’t have to choose between “boring deterministic workflows” and “fancy agents”.&lt;br&gt;
You can build the agent &lt;em&gt;on top of&lt;/em&gt; deterministic workflows.&lt;br&gt;
And everything stays &lt;strong&gt;self-hosted, inspectable, and auditable&lt;/strong&gt; if you want.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>javascript</category>
      <category>ai</category>
      <category>go</category>
    </item>
    <item>
      <title>When to Use OpenAI + Tools vs a Workflow Runtime</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Mon, 27 Oct 2025 18:07:07 +0000</pubDate>
      <link>https://dev.to/js402/when-to-use-openai-tools-vs-a-workflow-runtime-1n6f</link>
      <guid>https://dev.to/js402/when-to-use-openai-tools-vs-a-workflow-runtime-1n6f</guid>
      <description>&lt;p&gt;Modern “agentic AI” needs more than prompts—it needs architecture.&lt;br&gt;&lt;br&gt;
This guide shows when to stay inside OpenAI’s style tool ecosystem and when to move to a workflow runtime for observability, safety, and control.&lt;/p&gt;
&lt;h3&gt;
  
  
  💡 TL;DR
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;OpenAI + function calling or MCP&lt;/strong&gt; when your AI just needs to answer a question, maybe call one or two tools, all in one turn.
&lt;/li&gt;
&lt;li&gt;Use a &lt;strong&gt;workflow runtime&lt;/strong&gt; when your AI must run multiple steps, trigger hooks, or perform actions that need to be observable, auditable, and reliable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are complementary, not competing, approaches.&lt;/p&gt;


&lt;h3&gt;
  
  
  🔍 Two Ways to Build Agentic AI
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. &lt;strong&gt;The “Chat + Tools” Approach&lt;/strong&gt; (OpenAI, Anthropic, MCP)
&lt;/h4&gt;

&lt;p&gt;The LLM drives everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What’s the weather in Berlin?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;weather_tool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The model decides whether to call a tool.
&lt;/li&gt;
&lt;li&gt;Your code runs it and returns the result.
&lt;/li&gt;
&lt;li&gt;The model gives a final answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ Great for  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick Q&amp;amp;A
&lt;/li&gt;
&lt;li&gt;Simple assistants
&lt;/li&gt;
&lt;li&gt;Early prototypes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ Falls short when you need  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step logic
&lt;/li&gt;
&lt;li&gt;Retries or human approval
&lt;/li&gt;
&lt;li&gt;Audit trails or state
&lt;/li&gt;
&lt;li&gt;Compliance or safety guardrails
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here, the LLM is both brain and driver. You hand it tools and hope for the best. (We’ve all seen what happens when an unguarded LLM calls a destructive tool.)&lt;/p&gt;




&lt;h4&gt;
  
  
  2. &lt;strong&gt;The “Workflow Runtime” Approach&lt;/strong&gt; (contenox, Temporal+LLMs, custom orchestrators)
&lt;/h4&gt;

&lt;p&gt;I’ll use &lt;strong&gt;contenox&lt;/strong&gt; to show how this works differently.&lt;/p&gt;

&lt;p&gt;You define the workflow as a clear sequence of tasks. Each has a handler, optional LLM use, and transitions.&lt;/p&gt;

&lt;p&gt;Realistic contenox syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;weather-advisor&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Suggests actions based on the weather forecast&lt;/span&gt;
&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_weather&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fetch weather data via external hook&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hook&lt;/span&gt;
    &lt;span class="na"&gt;hook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;weather-api&lt;/span&gt;
      &lt;span class="na"&gt;tool_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_forecast&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Berlin"&lt;/span&gt;
    &lt;span class="na"&gt;output_template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{.temperature}}"&lt;/span&gt;
    &lt;span class="na"&gt;transition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;"&lt;/span&gt;
          &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;25"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggest_icecream"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggest_walk"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;suggest_icecream&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model_execution&lt;/span&gt;
    &lt;span class="na"&gt;system_instruction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;it's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hot,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;suggest&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fun&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;outdoor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;activity&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;involving&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ice&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cream."&lt;/span&gt;
    &lt;span class="na"&gt;execute_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;phi3:3.8b&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;transition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;suggest_walk&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model_execution&lt;/span&gt;
    &lt;span class="na"&gt;system_instruction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;it's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cool,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;suggest&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;something&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;relaxing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;like&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;walk&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coffee&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;indoors."&lt;/span&gt;
    &lt;span class="na"&gt;execute_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;phi3:3.8b&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;transition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Great for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliable, stateful workflows
&lt;/li&gt;
&lt;li&gt;Real actions (APIs, notifications, DB writes)
&lt;/li&gt;
&lt;li&gt;Replay, audit, and debugging
&lt;/li&gt;
&lt;li&gt;Controlled, compliant agents
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ Overkill for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple chatbots
&lt;/li&gt;
&lt;li&gt;One-off prompts
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here, &lt;strong&gt;you&lt;/strong&gt; control the flow. The LLM is just one worker in the chain.&lt;/p&gt;
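&lt;p&gt;The &lt;code&gt;transition&lt;/code&gt; logic in the YAML above (compare the task output against each branch, fall back to &lt;code&gt;default&lt;/code&gt;) can be sketched roughly like this. The types and names here are assumptions for illustration, not the contenox implementation:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strconv"
)

// Branch mirrors one entry under transition.branches in the YAML above.
// These type and field names are assumptions, not contenox's actual types.
type Branch struct {
	Operator string // ">", "default", ...
	When     string // threshold to compare the output against
	Goto     string // id of the next task
}

// nextTask picks the first matching branch for a task's string output,
// falling back to the "default" branch, then to "end".
func nextTask(output string, branches []Branch) string {
	for _, b := range branches {
		switch b.Operator {
		case ">":
			out, err1 := strconv.ParseFloat(output, 64)
			when, err2 := strconv.ParseFloat(b.When, 64)
			if err1 == nil && err2 == nil && out > when {
				return b.Goto
			}
		case "default":
			return b.Goto
		}
	}
	return "end"
}

func main() {
	branches := []Branch{
		{Operator: ">", When: "25", Goto: "suggest_icecream"},
		{Operator: "default", Goto: "suggest_walk"},
	}
	fmt.Println(nextTask("30", branches)) // suggest_icecream
	fmt.Println(nextTask("18", branches)) // suggest_walk
}
```

&lt;p&gt;The point of the sketch: routing is plain, deterministic code evaluated by the runtime, not a decision delegated to the model.&lt;/p&gt;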




&lt;h3&gt;
  
  
  🧠 Why Both Exist
&lt;/h3&gt;

&lt;p&gt;They solve different problems.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Best Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Assistive AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;“Help me get an answer fast.”&lt;/td&gt;
&lt;td&gt;OpenAI + Tools / MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Autonomous AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;“Run a safe, reliable process.”&lt;/td&gt;
&lt;td&gt;Workflow runtime (contenox, Flyte, Temporal)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it this way:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI + Tools&lt;/strong&gt; is your clever intern—fast but unpredictable.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;contenox&lt;/strong&gt; is your project manager—structured, logged, and accountable.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  🛠️ How to Choose
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Best Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“Ask HR about PTO policy.”&lt;/td&gt;
&lt;td&gt;✅ OpenAI + RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Detect outage → Slack alert → Jira ticket → confirm fix.”&lt;/td&gt;
&lt;td&gt;✅ contenox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Generate a report and email it.”&lt;/td&gt;
&lt;td&gt;⚠️ Start with OpenAI. Switch to contenox if reliability matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Run AI in an air-gapped system.”&lt;/td&gt;
&lt;td&gt;✅ contenox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Weekend agent hack.”&lt;/td&gt;
&lt;td&gt;✅ OpenAI + function calling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  🔮 The Future: They’ll Meet in the Middle
&lt;/h3&gt;

&lt;p&gt;MCP will add light state.&lt;br&gt;&lt;br&gt;
Workflow runtimes will simplify small jobs.  &lt;/p&gt;

&lt;p&gt;But the core question stays the same:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Is your AI assisting—or acting?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If it’s assisting, use tools.&lt;br&gt;&lt;br&gt;
If it’s acting, use orchestration.&lt;/p&gt;


&lt;h3&gt;
  
  
  🚀 Try contenox Yourself
&lt;/h3&gt;

&lt;p&gt;It’s open source and self-hostable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/contenox/runtime.git  
&lt;span class="nb"&gt;cd &lt;/span&gt;runtime
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
./scripts/bootstrap.sh nomic-embed-text:latest phi3:3.8b phi3:3.8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define a workflow like the YAML above. Register your hooks. Watch your AI take real, safe actions.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/contenox/runtime" rel="noopener noreferrer"&gt;GitHub: contenox/runtime&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>saas</category>
      <category>webdev</category>
    </item>
    <item>
      <title>A War Story: Building Products to Solve Your Own Pain Points</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 25 Oct 2025 09:53:46 +0000</pubDate>
      <link>https://dev.to/js402/a-war-story-building-products-to-solve-your-own-pain-points-5d1i</link>
      <guid>https://dev.to/js402/a-war-story-building-products-to-solve-your-own-pain-points-5d1i</guid>
      <description>&lt;p&gt;Let's put it straight: in March, I committed to building a product, not a sane one. I wanted to solve what frustrated me the most with tools like ChatGPT, Gemini, or shell apps like n8n.&lt;/p&gt;

&lt;p&gt;My initial goal wasn't to sell it to customers or pitch it to VCs, but just to ensure I had the tool to tame LLMs, Agents, or even AGI (if it ever ships), so that what had invaded my work life out of necessity would actually work for me and not glitch randomly or just try to trap me.&lt;/p&gt;

&lt;p&gt;So there I was, spending almost every second I had building a platform that should allow for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step, multi-modal LLM workflows&lt;/li&gt;
&lt;li&gt;Declarative behavior for AI Agents&lt;/li&gt;
&lt;li&gt;Proper RAG (Retrieval-Augmented Generation)&lt;/li&gt;
&lt;li&gt;Support for any API as a Tool&lt;/li&gt;
&lt;li&gt;Handling cloud provider-level traffic&lt;/li&gt;
&lt;li&gt;Magically splitting the codebase into Open Source and EE (Enterprise Edition)&lt;/li&gt;
&lt;li&gt;... and, sure thing, doing the core in Go&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now I’m seven months in: 60+ hours per week of coding, trying to somehow make this more than just my own tooling project, losing my personal money while pulling my hair out, watching market developments, and seeing model-interface providers change their APIs.&lt;/p&gt;

&lt;p&gt;I kept discovering more players each month, building some or all of the features on my roadmap, which kind of validated that there’s no universal solution yet.&lt;br&gt;
A core question always echoed in my mind: &lt;/p&gt;

&lt;p&gt;"Is what I'm doing a business?"&lt;/p&gt;

&lt;p&gt;I researched and identified pain points, and collaborated with others to surface use cases and verticals beyond my own worldview.&lt;/p&gt;

&lt;p&gt;Hell, I even set aside budgets and evaluated contractors and friends to outsource some coding and essential but non-negligible tasks, for example, copywriting, to ensure I meet my own roadmap... &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Since when did I become a PM? Wasn't that a space I always looked down on and never wanted to touch? &lt;/p&gt;
&lt;/blockquote&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;naro@xaxen:~/src/github.com/contenox/runtime2&lt;span class="nv"&gt;$ &lt;/span&gt;cloc &lt;span class="nb"&gt;.&lt;/span&gt;
     650 text files.
     631 unique files.
      22 files ignored.

github.com/AlDanial/cloc v 1.98  &lt;span class="nv"&gt;T&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.40 s &lt;span class="o"&gt;(&lt;/span&gt;1580.5 files/s, 340245.6 lines/s&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
Language                     files          blank        comment           code
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
Go                             368           9637           9139          53197
JSON                            15              2              0          24941
TypeScript                     176           1022            159          12702
Markdown                         6           2804             66           8572
YAML                             6             14             10           5187
Python                          25            687            443           3573
CSS                              4             45             37           2103
Bourne Shell                     3             64             73            378
SQL                              2             73              6            285
make                             2             32              1            123
HCL                              3             19             21            113
Dockerfile                       3             23             20             84
SVG                              9              8              5             73
JavaScript                       6              7              0             69
HTML                             1              0              0             13
Text                             2              0              0             10
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
SUM:                           631          14437           9980         111423
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Here I am, sitting on three major rewrites and a codebase so big I can't navigate using just the file tree. &lt;/p&gt;

&lt;p&gt;I achieved my goal. Kinda. Yet never truly released. And I didn't stop there.&lt;/p&gt;

&lt;p&gt;While working on this project, nuances about LLMs kept surfacing: their use cases, strengths, weaknesses, and mitigation strategies. &lt;/p&gt;

&lt;p&gt;I had to find out what the nature of LLMs is. And soon I was on another mission, validating a new vision: &lt;/p&gt;

&lt;p&gt;"I want to define how AI behaves, interacts, and learns."&lt;/p&gt;

&lt;p&gt;It seems that, yeah, with the proper implementation, the platform can handle a lot of autonomy: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it could self-optimize at runtime &lt;/li&gt;
&lt;li&gt;judge its own performance &lt;/li&gt;
&lt;li&gt;or even trigger the generation of content for model fine-tuning. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And since it doesn't require an external model (or can selectively use them), there is no constraint on what it can be used for.&lt;/p&gt;




&lt;p&gt;Let me wrap up this rambling. &lt;br&gt;
I don't encourage anyone to repeat the same path. &lt;br&gt;
However much it gets talked up, only geeks glamorize the builder path. &lt;br&gt;
Here is what I mean: &lt;/p&gt;

&lt;p&gt;I noticed changes, and reality catching up: not just minor, fixable things like my bank account's balance, but also longer-lasting changes—from the simple "everyone hates me now because I was so busy" to mindset shifts that can’t truly be undone. &lt;/p&gt;

&lt;p&gt;I built the tool I needed, but now it's clear:&lt;br&gt;
This is just the start. And it gets expensive on all fronts from here. &lt;/p&gt;

&lt;p&gt;Don't get me wrong. I still don't care that much about monetizing all this effort. But it's clear that without doing so, it will just die.&lt;/p&gt;

&lt;p&gt;It’s been clear for a while that my ‘2–3 month’ estimate was a severe miscalculation. &lt;/p&gt;

&lt;p&gt;And now what?&lt;br&gt;
What’s the lesson here?&lt;br&gt;
More crucially — what would be different if I had known the game I was playing?&lt;/p&gt;

</description>
      <category>startup</category>
      <category>ai</category>
      <category>llms</category>
      <category>saas</category>
    </item>
    <item>
      <title>Moving the Needle: How to Analyze Your Product Roadmap Monthly</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Wed, 01 Oct 2025 10:25:01 +0000</pubDate>
      <link>https://dev.to/js402/moving-the-needle-how-to-analyze-your-product-roadmap-monthly-3f0p</link>
      <guid>https://dev.to/js402/moving-the-needle-how-to-analyze-your-product-roadmap-monthly-3f0p</guid>
      <description>&lt;p&gt;No matter your stage whether you’re pre–product-market fit or running an established product with active users the difference between drifting and deliberate progress comes down to one thing: &lt;strong&gt;a repeatable process to see the big picture and align your team’s work with real user and business needs&lt;/strong&gt;. Get this right, and you accelerate toward value. Get it wrong, and even great execution leads you astray.&lt;/p&gt;

&lt;p&gt;This is where product roadmaps matter.&lt;br&gt;&lt;br&gt;
And no, they’re not just a list of “coming soon” features.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: I’m not talking about textbook project management or theoretical best practices. This is about what it means to &lt;em&gt;own outcomes&lt;/em&gt; as a technical founder building a pre–PMF product, where every decision carries weight and data is scarce.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h3&gt;
  
  
  Where is the needle today?
&lt;/h3&gt;

&lt;p&gt;As the development cycle wraps up, the first step is to gather context:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What was actually shipped?
&lt;/li&gt;
&lt;li&gt;What impact did it have? (e.g., user feedback, announcement impressions, early traction)
&lt;/li&gt;
&lt;li&gt;What are stakeholders saying?
&lt;/li&gt;
&lt;li&gt;What moves are competitors making?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The exact data points depend on what your team already measures, but even sparse signals matter.&lt;/p&gt;

&lt;p&gt;Combine this with concrete output (code merged, deployments shipped, experiments run) and compile it into a &lt;strong&gt;brief, neutral summary&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
No interpretations. No judgments. Just the facts.&lt;/p&gt;


&lt;h3&gt;
  
  
  End of the cycle: Where did we want to be?
&lt;/h3&gt;

&lt;p&gt;Now that you have a neutral summary of what actually happened, the next step is to &lt;strong&gt;compare it against your original intent&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Did you move the needle toward your goal? Were you on track, slightly off course, or did you effectively pivot, intentionally or not?&lt;/p&gt;

&lt;p&gt;This is where a structured LLM prompt can help surface patterns you might miss. Start with something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Yeah, it’s time again for a Slice and product review.

Analyze the provided roadmap status using the Slices method and the raw summaries of completed work from September to deliver a concise progress review.

- Identify discrepancies between planned and actual outcomes  
- Assess delivery velocity  
- Flag signs of strategic shifts or pivots  
- Highlight wins and fully completed items  

___  
Use the development method to structure your response for the September Slice.  
First, extract the actual working items independently of the roadmap document.

&amp;lt;&amp;lt;PASTE YOUR ROADMAP HERE&amp;gt;&amp;gt;  
&amp;lt;&amp;lt;PASTE YOUR DEVELOPMENT METHOD HERE&amp;gt;&amp;gt;  

___  

&amp;lt;&amp;lt;PASTE THE WORKING ITEMS HERE&amp;gt;&amp;gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don’t expect perfect output on the first try. Tweak the prompt until the model gives you &lt;strong&gt;actionable, structured insights&lt;/strong&gt;, not just fluff.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: Replace "Slices" with the actual method name you are using.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you have a useful analysis, &lt;strong&gt;start a fresh chat&lt;/strong&gt; and use this second prompt to force an external perspective:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I’m a product manager newly joining this team. I’ve observed this workflow:

This is a meta-task. Your job is to understand the process the engineers followed based on their prior LLM analysis and then deliver a big-picture assessment with clear recommendations for what I, as product manager, should do next.

Before I share the details:  
**What is your task?**
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the model confirms it understands, paste the full input and output from your first chat: &lt;strong&gt;not the raw working items&lt;/strong&gt;, but the LLM’s structured analysis, like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Yeah, it’s time again for a Slice and product review.

Analyze the provided roadmap status using the Slices method and the raw summaries of completed work from September to deliver a concise progress review.

- Identify discrepancies between planned and actual outcomes  
- Assess delivery velocity  
- Flag signs of strategic shifts or pivots  
- Highlight wins and fully completed items  

___  
Use the development method to structure your response for the September Slice.  
First, extract the actual working items independently of the roadmap document.

&amp;lt;&amp;lt;PASTE YOUR ROADMAP HERE&amp;gt;&amp;gt;  
&amp;lt;&amp;lt;PASTE YOUR DEVELOPMENT METHOD HERE&amp;gt;&amp;gt;  

___  

&amp;lt;&amp;lt;PASTE THE OUTPUT OF THE ANALYSIS FROM THE FIRST CHAT INSTANCE HERE&amp;gt;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll typically get something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Thank you for the context.

To directly answer your original question:

&amp;gt; **“Before I share the input, what is your task?”**  
&amp;gt; I’m to interpret the team’s recent work pattern as an external product manager and recommend next steps.

### 🔍 Key Observations from Your Shared Workflow

#### 1. **The Team Uses “Slices” as a Living, Adaptive Planning Tool**  
…

#### 2. **A Major Strategic Pivot Happened in August–September**  
- The original direction (MVP app → SaaS product with billing) has **shifted toward a developer-centric runtime platform**.  
…

#### 4. **Velocity Is High, But Direction Is Shifting**  
- The team ships frequently and is highly productive.  
- However, output no longer aligns with the original product narrative.  
- If stakeholders expect a SaaS app by Q4, there’s a **real delivery risk**.

---

### 🧭 Your Role as Product Manager: What to Do Next  
…
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, &lt;strong&gt;compare this output to your own understanding&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
If it largely confirms your view, that’s a good sign—you’ve gathered solid data and interpreted it reasonably.&lt;/p&gt;

&lt;p&gt;But if it &lt;strong&gt;surprises or contradicts you&lt;/strong&gt;, treat that as a signal:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did you miss key data?
&lt;/li&gt;
&lt;li&gt;Is your data incomplete or biased?
&lt;/li&gt;
&lt;li&gt;Are your internal processes misaligned with your stated goals?
&lt;/li&gt;
&lt;li&gt;Or have you simply lost touch with what your team is actually building?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use that gap to &lt;strong&gt;learn, adjust, and rerun the cycle&lt;/strong&gt; not to dismiss the output.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Caveat&lt;/strong&gt; (you likely know this, but it’s worth stating clearly):&lt;br&gt;&lt;br&gt;
This approach only works if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You feed the LLM &lt;strong&gt;faithful, complete raw inputs&lt;/strong&gt;, not just cherry-picked wins.
&lt;/li&gt;
&lt;li&gt;You &lt;strong&gt;constrain it to reason from evidence&lt;/strong&gt;, not hallucinate (which requires enough context window to hold your full input).
&lt;/li&gt;
&lt;li&gt;You treat its output &lt;strong&gt;not as truth, but as a hypothesis-generating reflection&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Your situation &lt;strong&gt;doesn’t rely on deep, multi-step causal reasoning&lt;/strong&gt; (e.g., forensic cybersecurity analysis or regulatory compliance logic)—LLMs aren’t built for that kind of “why.”&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  What’s next?
&lt;/h3&gt;

&lt;p&gt;Now you’ve answered the core questions:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Where are we?&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Where did we plan to be?&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What’s the gap?&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But there’s no universal “next step.” Your move depends entirely on your context, goals, and constraints. You might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run a &lt;strong&gt;RICE prioritization&lt;/strong&gt; to course-correct and get back on track.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Double down&lt;/strong&gt; on the unexpected direction you’ve drifted into because it’s actually more promising.
&lt;/li&gt;
&lt;li&gt;Simply &lt;strong&gt;review your backlog&lt;/strong&gt; and keep shipping, if you’re still aligned and learning.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the point: &lt;strong&gt;clarity enables choice&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
And in the pre–PMF fog, that’s your real advantage.&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>startup</category>
      <category>softwareengineering</category>
      <category>llm</category>
    </item>
    <item>
      <title>Vibe Coding a Simple Feature Took 3 Hours. Here's Why.</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Wed, 27 Aug 2025 19:02:01 +0000</pubDate>
      <link>https://dev.to/js402/vibe-coding-a-simple-feature-took-3-hours-heres-why-4f7m</link>
      <guid>https://dev.to/js402/vibe-coding-a-simple-feature-took-3-hours-heres-why-4f7m</guid>
      <description>&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Today, I tried what people call &lt;em&gt;vibe coding&lt;/em&gt;. The rule: I only prompt the model for code—no touching the output manually.&lt;/p&gt;

&lt;p&gt;The task seemed simple enough: add &lt;code&gt;Seed&lt;/code&gt; and &lt;code&gt;TopP&lt;/code&gt; parameters to my Go model-provider abstraction. This was straightforward plumbing, with the catch that &lt;strong&gt;all existing unit and integration tests must still pass&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I started with this interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ChatArgument&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;setTemperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;setMaxTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;setTopP&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c"&gt;// to be implemented&lt;/span&gt;
    &lt;span class="n"&gt;setSeed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;     &lt;span class="c"&gt;// to be implemented&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the usual entry points:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;VLLMChatClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
      &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;ChatArgument&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;OpenAIChatClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
      &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;ChatArgument&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;OllamaChatClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
      &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;ChatArgument&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;GeminiChatClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
      &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;ChatArgument&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;modelrepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"You are a task processor talking to other machines. Answer briefly."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"What is the capital of Italy?"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;modelrepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithTemperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;modelrepo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithMaxTokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;60&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;require&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NoError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;assert&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToLower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"rome"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All I wanted was to add two new arguments. A 20–60 minute manual job, tops.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unexpected Detour
&lt;/h2&gt;

&lt;p&gt;Instead of giving me the tiny change I asked for, the model rewrote my implementations. Massive diffs. The &lt;code&gt;ChatArgument&lt;/code&gt; interface turned into... something else entirely. Sure, that might have been fine for a greenfield project, but in my codebase, four other layers depended on the existing package API, which exposed the &lt;code&gt;With...&lt;/code&gt; option pattern.&lt;/p&gt;

&lt;p&gt;That’s when I got curious: Why was the model so confident about "fixing" something I didn't want fixed?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Debate
&lt;/h2&gt;

&lt;p&gt;So I asked it to brainstorm patterns.&lt;br&gt;
Three hours later, I was in a full-on design debate with my AI assistant. It defended its choices like a junior dev who thinks they’re right and you just don’t understand their genius.&lt;/p&gt;

&lt;p&gt;The first idea it pushed was the classic Go functional options pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ChatOption&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;chatOptions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;chatOptions&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Temperature&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;
    &lt;span class="n"&gt;MaxTokens&lt;/span&gt;   &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;TopP&lt;/span&gt;        &lt;span class="kt"&gt;float64&lt;/span&gt;
    &lt;span class="n"&gt;Seed&lt;/span&gt;        &lt;span class="kt"&gt;int&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;WithTemperature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;ChatOption&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On paper? Looks fine. In practice? Useless for my case. There’s no way to tell if &lt;code&gt;TopP&lt;/code&gt; was actually set or if it just defaulted to &lt;code&gt;0.0&lt;/code&gt;. And since LLM API defaults are rarely zero and differ between vendors, that distinction is critical.&lt;/p&gt;
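&lt;p&gt;To make the flaw concrete, here is a minimal, self-contained sketch (the &lt;code&gt;resolve&lt;/code&gt; helper is my own illustration, not code from the project): with plain value fields, an explicit &lt;code&gt;WithTopP(0.0)&lt;/code&gt; and no option at all produce identical configs.&lt;/p&gt;

```go
package main

import "fmt"

// Sketch of the plain-value functional options pattern. After applying the
// options, TopP is 0.0 whether the caller passed WithTopP(0.0) or nothing.
type chatOptions struct {
	Temperature float64
	TopP        float64
}

type ChatOption func(o *chatOptions)

func WithTopP(p float64) ChatOption {
	return func(o *chatOptions) { o.TopP = p }
}

func resolve(opts ...ChatOption) chatOptions {
	cfg := new(chatOptions)
	for _, opt := range opts {
		opt(cfg)
	}
	return *cfg
}

func main() {
	explicit := resolve(WithTopP(0.0)) // caller wants nucleus sampling off
	unset := resolve()                 // caller wants the vendor default

	// Both configs are identical, so the client cannot tell them apart.
	fmt.Println(explicit.TopP == unset.TopP) // prints true
}
```

&lt;p&gt;The client has to forward &lt;em&gt;some&lt;/em&gt; value either way, which is exactly how a vendor default gets silently clobbered.&lt;/p&gt;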

&lt;p&gt;But instead of adjusting, the model doubled down. Builder pattern. Map-based options. Configuration structs. Each round, it grew more confident and more critical of my existing approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Breaking the Rule
&lt;/h2&gt;

&lt;p&gt;By 3 PM, I was staring at my to-do list—performance benchmarks, landing page copy, demo prep—and realizing that none of that was happening today.&lt;/p&gt;

&lt;p&gt;So I broke my own rule. I handed the model the blueprint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ChatConfig&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Temperature&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="s"&gt;`json:"temperature,omitempty"`&lt;/span&gt;
    &lt;span class="n"&gt;MaxTokens&lt;/span&gt;   &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;     &lt;span class="s"&gt;`json:"max_tokens,omitempty"`&lt;/span&gt;
    &lt;span class="n"&gt;TopP&lt;/span&gt;        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt; &lt;span class="s"&gt;`json:"top_p,omitempty"`&lt;/span&gt;
    &lt;span class="n"&gt;Seed&lt;/span&gt;        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;     &lt;span class="s"&gt;`json:"seed,omitempty"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;ChatArgument&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;ChatConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This new interface was less flexible than my original, but it was simple enough for the AI to understand while still preserving the key feature: &lt;strong&gt;pointers&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;nil&lt;/code&gt; → &lt;strong&gt;Unset&lt;/strong&gt;, use vendor default.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;amp;0.0&lt;/code&gt; → &lt;strong&gt;Explicitly set&lt;/strong&gt; to zero.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s exactly what you need when bridging multiple LLM APIs with different defaults.&lt;/p&gt;
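&lt;p&gt;As a rough sketch of how this plays out end to end (the &lt;code&gt;temperatureArg&lt;/code&gt; type, &lt;code&gt;buildRequest&lt;/code&gt;, and the 0.8 default are my assumptions for illustration, not the project's actual implementation): a &lt;code&gt;With...&lt;/code&gt; option writes a pointer into the config, and the client only falls back to the vendor default when the field is &lt;code&gt;nil&lt;/code&gt;.&lt;/p&gt;

```go
package main

import "fmt"

// Pointer-based config: nil means "unset, use the vendor default";
// a non-nil pointer means "explicitly set", even to zero.
type ChatConfig struct {
	Temperature *float64
	TopP        *float64
}

type ChatArgument interface {
	Apply(config *ChatConfig)
}

// temperatureArg is a hypothetical With... implementation: it stores the
// value and writes a pointer, so "set to 0.0" and "never set" stay distinct.
type temperatureArg struct{ value float64 }

func (a temperatureArg) Apply(config *ChatConfig) {
	v := new(float64)
	*v = a.value
	config.Temperature = v
}

func WithTemperature(t float64) ChatArgument {
	return temperatureArg{value: t}
}

// buildRequest bridges the config to one vendor's wire format: a nil field
// falls back to that vendor's default (0.8 here is purely illustrative).
func buildRequest(config ChatConfig) map[string]float64 {
	req := map[string]float64{}
	if config.Temperature != nil {
		req["temperature"] = *config.Temperature
	} else {
		req["temperature"] = 0.8 // hypothetical vendor default
	}
	return req
}

func main() {
	cfg := new(ChatConfig)
	WithTemperature(0.0).Apply(cfg)
	fmt.Println(buildRequest(*cfg)["temperature"]) // explicit zero survives
}
```

&lt;p&gt;Each vendor client gets its own &lt;code&gt;buildRequest&lt;/code&gt;-style bridge, so differing defaults never leak across providers.&lt;/p&gt;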

&lt;p&gt;And once I gave it the pattern, the model behaved. Five minutes later, I had the snippets I needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;In hindsight, the problem wasn’t just “bad AI output.” My variable names weren't perfect, the interface was more of a type-safety sanity check, and some comments were stale. This context pollution—the typical stuff in any living codebase—probably nudged the model toward the wrong patterns.&lt;/p&gt;

&lt;p&gt;Still, what should’ve been a one-hour manual coding task turned into a three-hour argument with an overconfident assistant.&lt;/p&gt;

&lt;p&gt;More importantly, it validated why my abstraction looks the way it does. The pointer-based config wasn’t some over-engineering exercise; it was a deliberate design to handle the &lt;strong&gt;unset vs. explicit&lt;/strong&gt; states across inconsistent vendor APIs.&lt;/p&gt;

&lt;p&gt;The model, lacking that context, kept trying to “fix” it.&lt;/p&gt;

&lt;p&gt;The lesson? AI can be an excellent executor when you hand it a precise blueprint. But as an architect? Not so much.&lt;/p&gt;

&lt;p&gt;And that’s exactly why I built &lt;a href="https://github.com/contenox/runtime" rel="noopener noreferrer"&gt;contenox/runtime&lt;/a&gt; — because if you want agents to do serious work, abstractions and guardrails aren’t optional.&lt;/p&gt;

&lt;p&gt;I invite you to join me: let’s take control back from the LLMs.&lt;/p&gt;

</description>
      <category>go</category>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
