<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alexander Ertli</title>
    <description>The latest articles on DEV Community by Alexander Ertli (@js402).</description>
    <link>https://dev.to/js402</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3166872%2Fa4ee43ac-33d2-48f8-a75e-89728075e64a.png</url>
      <title>DEV Community: Alexander Ertli</title>
      <link>https://dev.to/js402</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/js402"/>
    <language>en</language>
    <item>
      <title>Shouldn't AI Move From Cloud to Local Compute?</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sun, 14 Jun 2026 10:02:34 +0000</pubDate>
      <link>https://dev.to/js402/shouldnt-ai-move-from-cloud-to-local-compute-1gdd</link>
      <guid>https://dev.to/js402/shouldnt-ai-move-from-cloud-to-local-compute-1gdd</guid>
      <description>&lt;p&gt;A few things happened almost at the same time.&lt;/p&gt;

&lt;p&gt;GitHub moved Copilot deeper into usage-based billing.&lt;/p&gt;

&lt;p&gt;OpenAI kept pushing the Responses API as the default primitive for building agents.&lt;/p&gt;

&lt;p&gt;Anthropic launched Fable/Mythos and then had to suspend access a few days later because of a U.S. government directive.&lt;/p&gt;

&lt;p&gt;NVIDIA is putting “personal AI supercomputer” hardware into the market with DGX Spark and DGX Station.&lt;/p&gt;

&lt;p&gt;Individually, each of those stories is easy to treat as separate news.&lt;/p&gt;

&lt;p&gt;I do not think they are separate.&lt;/p&gt;

&lt;p&gt;I think they are all pointing in the same direction:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI coding is moving from a cloud feature into local infrastructure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And that changes the question for developers.&lt;/p&gt;

&lt;p&gt;Not just:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which agent should I use?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Where does the agent run?&lt;br&gt;
Which models can it use?&lt;br&gt;
Who owns the runtime? And who the compute?&lt;br&gt;
What happens when pricing, policy, model access, or hardware changes?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the part I think matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What happened
&lt;/h2&gt;

&lt;p&gt;GitHub announced that Copilot plans transition to usage-based billing on June 1, 2026. The old premium request unit model is being replaced by GitHub AI Credits, with usage calculated from token consumption: input tokens, output tokens, and cached tokens, priced according to the model used. GitHub also says this is because Copilot has moved from a simple in-editor assistant toward an agentic platform that can run long multi-step coding sessions across repositories. (&lt;a href="https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/" rel="noopener noreferrer"&gt;The GitHub Blog&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;There is an important detail here: GitHub says code completions and Next Edit suggestions remain included and do not consume AI Credits. So this is not “every ghost text completion is billed now.” The bigger point is that agentic usage, chat, long-running sessions, code review, and heavier model work are now part of a visible token economy. (&lt;a href="https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/" rel="noopener noreferrer"&gt;The GitHub Blog&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;OpenAI is moving in the same direction from the other side. The Responses API was introduced as a new primitive for building agents, combining Chat Completions-style simplicity with tool use from the Assistants API. OpenAI also added built-in tools like web search, file search, computer use, and an Agents SDK with tracing and observability. (&lt;a href="https://openai.com/index/new-tools-for-building-agents/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Then OpenAI expanded Responses further with remote MCP server support, Code Interpreter, improved file search, background mode for long-running tasks, reasoning summaries, and encrypted reasoning items. (&lt;a href="https://openai.com/index/new-tools-and-features-in-the-responses-api/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That is not just “new API features.”&lt;/p&gt;

&lt;p&gt;That is model vendors moving up the stack into runtime, tools, state, observability, and orchestration.&lt;/p&gt;

&lt;p&gt;Then the Anthropic Fable/Mythos story happened.&lt;/p&gt;

&lt;p&gt;Anthropic said the U.S. government issued an export-control directive requiring suspension of access to Fable 5 and Mythos 5 by foreign nationals, including foreign-national Anthropic employees. Anthropic said the practical result was that it had to disable access for all customers to ensure compliance. (&lt;a href="https://www.anthropic.com/news/fable-mythos-access" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;) The Verge reported the same basic story: an export-control directive citing national security concerns required blocking access for foreign nationals, and Anthropic cut access for all customers. (&lt;a href="https://www.theverge.com/ai-artificial-intelligence/949553/anthropic-fable-5-mythos-5-government-national-security" rel="noopener noreferrer"&gt;The Verge&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;You can argue about the policy. You can argue about the safety question. You can argue about whether the government overreacted.&lt;/p&gt;

&lt;p&gt;But as a developer, the practical lesson is boring:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;remote model access is not a stable primitive.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It can change because of price.&lt;/p&gt;

&lt;p&gt;It can change because of policy.&lt;/p&gt;

&lt;p&gt;It can change because of region.&lt;/p&gt;

&lt;p&gt;It can change because of provider risk posture.&lt;/p&gt;

&lt;p&gt;It can change because of model availability.&lt;/p&gt;

&lt;p&gt;That does not mean “never use frontier models.” That would be stupid. Frontier models are extremely useful.&lt;/p&gt;

&lt;p&gt;It means the runtime layer matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for us as developers
&lt;/h2&gt;

&lt;p&gt;For the last two years, a lot of AI tooling was sold like a product feature.&lt;/p&gt;

&lt;p&gt;Install extension.&lt;/p&gt;

&lt;p&gt;Sign in.&lt;/p&gt;

&lt;p&gt;Get magic.&lt;/p&gt;

&lt;p&gt;That worked for the first wave.&lt;/p&gt;

&lt;p&gt;But serious AI coding work is not just a UI feature anymore. It is a stack.&lt;/p&gt;

&lt;p&gt;You have model selection.&lt;/p&gt;

&lt;p&gt;You have context management.&lt;/p&gt;

&lt;p&gt;You have file search.&lt;/p&gt;

&lt;p&gt;You have tool execution.&lt;/p&gt;

&lt;p&gt;You have shell access.&lt;/p&gt;

&lt;p&gt;You have policy.&lt;/p&gt;

&lt;p&gt;You have logs.&lt;/p&gt;

&lt;p&gt;You have long-running tasks.&lt;/p&gt;

&lt;p&gt;You have cost.&lt;/p&gt;

&lt;p&gt;You have rate limits.&lt;/p&gt;

&lt;p&gt;You have model-specific quirks.&lt;/p&gt;

&lt;p&gt;You have the question of where your source code, prompts, tool results, and traces go.&lt;/p&gt;

&lt;p&gt;That is runtime territory.&lt;/p&gt;

&lt;p&gt;The uncomfortable part is that a lot of developers are now using agents that can inspect repos, edit files, run commands, open PRs, call tools, and sometimes run for minutes or hours — while the execution layer underneath is still treated like a black box.&lt;/p&gt;

&lt;p&gt;That is fine for experiments.&lt;/p&gt;

&lt;p&gt;It is not fine as the default future.&lt;/p&gt;

&lt;p&gt;If AI coding becomes part of normal development, then developers and teams need more control over the runtime.&lt;/p&gt;

&lt;p&gt;Not because cloud is bad.&lt;/p&gt;

&lt;p&gt;Because dependency without control is fragile.&lt;/p&gt;

&lt;h2&gt;
  
  
  The local compute trend is not a meme anymore
&lt;/h2&gt;

&lt;p&gt;The other signal is hardware.&lt;/p&gt;

&lt;p&gt;NVIDIA DGX Spark is a desktop AI system with a Grace Blackwell GB10 Superchip, 128 GB of coherent unified memory, and up to 1 petaFLOP of FP4 AI performance. NVIDIA says it can run AI development and testing workloads with models up to 200 billion parameters at the desktop, and two DGX Spark systems can connect for models up to 405 billion parameters. (&lt;a href="https://www.nvidia.com/en-us/products/workstations/dgx-spark/" rel="noopener noreferrer"&gt;NVIDIA&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;DGX Station goes even further. NVIDIA describes it as a deskside AI supercomputer with 748 GB of coherent memory and up to 20 petaFLOPS of AI compute, supporting models up to 1 trillion parameters. NVIDIA also announced DGX Station for Windows as a system that can serve as a dedicated AI supercomputer for one developer or a shared local compute node for teams. (&lt;a href="https://www.nvidia.com/en-us/products/workstations/dgx-station/" rel="noopener noreferrer"&gt;NVIDIA&lt;/a&gt;) (&lt;a href="https://nvidianews.nvidia.com/news/nvidia-dgx-station-for-windows-puts-a-trillion-parameter-ai-supercomputer-on-every-enterprise-desk" rel="noopener noreferrer"&gt;NVIDIA Newsroom&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Now, obviously, not everyone is buying a DGX Station.&lt;/p&gt;

&lt;p&gt;That is not the point.&lt;/p&gt;

&lt;p&gt;The point is the direction of travel.&lt;/p&gt;

&lt;p&gt;For a while, local AI meant “maybe you can run a small model on your laptop if you are patient.”&lt;/p&gt;

&lt;p&gt;Now the market is clearly moving toward a more serious local/on-prem/private-compute tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;laptop for small models and simple workflows&lt;/li&gt;
&lt;li&gt;workstation for serious local development&lt;/li&gt;
&lt;li&gt;rented GPU box for heavier workloads&lt;/li&gt;
&lt;li&gt;team-local node for shared inference&lt;/li&gt;
&lt;li&gt;cloud frontier model when needed&lt;/li&gt;
&lt;li&gt;data center or cluster later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a very different world from “all useful intelligence lives behind one vendor API.”&lt;/p&gt;

&lt;p&gt;And once local or rented compute becomes powerful enough, the missing piece is not only the model runner.&lt;/p&gt;

&lt;p&gt;It is the runtime around it.&lt;/p&gt;

&lt;p&gt;Routing.&lt;/p&gt;

&lt;p&gt;Compatibility.&lt;/p&gt;

&lt;p&gt;Policies.&lt;/p&gt;

&lt;p&gt;Tools.&lt;/p&gt;

&lt;p&gt;Logs.&lt;/p&gt;

&lt;p&gt;Editor integration.&lt;/p&gt;

&lt;p&gt;The boring stuff.&lt;/p&gt;

&lt;p&gt;The stuff that makes it usable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The ecosystem is already forming
&lt;/h2&gt;

&lt;p&gt;This is where I think people should not be tribal.&lt;/p&gt;

&lt;p&gt;If you want to start today, there are already good projects.&lt;/p&gt;

&lt;p&gt;Kilo Code is an open-source AI coding agent across VS Code, JetBrains, CLI, and cloud. It supports many models, bring-your-own-key usage, multiple agent modes, autocomplete, and a full agentic coding experience. (&lt;a href="https://kilo.ai/" rel="noopener noreferrer"&gt;Kilo&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;OpenCode is another important piece. It is an open-source coding agent available as a terminal interface, desktop app, or IDE extension. It is model-agnostic and clearly sits in the “developer agent” category. (&lt;a href="https://opencode.ai/docs/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Ollama is probably the easiest starting point for local models. It gives developers a simple way to run and manage open models locally, and it exposes a REST API for chat and generation on &lt;code&gt;localhost:11434&lt;/code&gt;. (&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;) (&lt;a href="https://github.com/ollama/ollama" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;If you are more advanced, or responsible for a team, vLLM is the next layer to understand. It is a high-throughput and memory-efficient inference and serving engine. The docs highlight Hugging Face integration, streaming outputs, tool calling and reasoning parsers, distributed inference features, and OpenAI-compatible API serving. (&lt;a href="https://docs.vllm.ai/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;That rough map matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;want to just run a local model: start with Ollama&lt;/li&gt;
&lt;li&gt;want an open-source coding agent: try Kilo Code or OpenCode&lt;/li&gt;
&lt;li&gt;want serious serving for a team: look at vLLM&lt;/li&gt;
&lt;li&gt;want editor-native workflows: look at the VS Code/JetBrains/CLI agent space&lt;/li&gt;
&lt;li&gt;want policy, runtime, compatibility, local tools, and owned compute to become one layer: that is the gap I care about&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I do not see these projects as enemies.&lt;/p&gt;

&lt;p&gt;Actually the opposite.&lt;/p&gt;

&lt;p&gt;They form the market.&lt;/p&gt;

&lt;p&gt;They teach users what is possible.&lt;/p&gt;

&lt;p&gt;They normalize local models, open-source agents, model routing, self-hosting, and running AI outside a single cloud product.&lt;/p&gt;

&lt;p&gt;That makes the next layer possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this changed for Contenox
&lt;/h2&gt;

&lt;p&gt;This is also why I changed the shape of Contenox.&lt;/p&gt;

&lt;p&gt;For a while it was too easy to describe Contenox as “another agent runtime” or “a local agent framework.”&lt;/p&gt;

&lt;p&gt;That framing is too small.&lt;/p&gt;

&lt;p&gt;The direction is now clearer:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contenox should be a local-first AI runtime for top-tier agent work without giving up control.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent is still important.&lt;/p&gt;

&lt;p&gt;But the agent is the proof workload.&lt;/p&gt;

&lt;p&gt;The deeper product is the runtime layer underneath it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run on compute you control&lt;/li&gt;
&lt;li&gt;use local, rented, or provider-backed models&lt;/li&gt;
&lt;li&gt;expose compatibility surfaces existing tools understand&lt;/li&gt;
&lt;li&gt;route across providers and models&lt;/li&gt;
&lt;li&gt;understand model capabilities&lt;/li&gt;
&lt;li&gt;execute tools safely&lt;/li&gt;
&lt;li&gt;keep policy and approvals visible&lt;/li&gt;
&lt;li&gt;log what happened&lt;/li&gt;
&lt;li&gt;support editor workflows&lt;/li&gt;
&lt;li&gt;avoid turning every local experiment into a fragile pile of glue scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is also why the VS Code extension matters.&lt;/p&gt;

&lt;p&gt;Not because VS Code is the whole product.&lt;/p&gt;

&lt;p&gt;Because editor AI is where the runtime has to prove itself immediately.&lt;/p&gt;

&lt;p&gt;Autocomplete has to be fast.&lt;/p&gt;

&lt;p&gt;Chat has to stream.&lt;/p&gt;

&lt;p&gt;Tool calls need approvals.&lt;/p&gt;

&lt;p&gt;Filesystem and shell access need boundaries.&lt;/p&gt;

&lt;p&gt;Model/provider selection must be understandable.&lt;/p&gt;

&lt;p&gt;The user should not need to run a whole browser control panel or expose a random HTTP server just to use local/editor AI.&lt;/p&gt;

&lt;p&gt;The editor is the pressure test.&lt;/p&gt;

&lt;p&gt;If the runtime can support a good VS Code experience, it becomes much more than a CLI experiment.&lt;/p&gt;

&lt;h2&gt;
  
  
  The goal: local-first, but not limited
&lt;/h2&gt;

&lt;p&gt;When I say local-first, I do not mean “local only.”&lt;/p&gt;

&lt;p&gt;That would be another trap.&lt;/p&gt;

&lt;p&gt;A local-first top-tier AI agent should be able to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a small local model when that is enough&lt;/li&gt;
&lt;li&gt;a bigger local model on a workstation&lt;/li&gt;
&lt;li&gt;Ollama for simple local model management&lt;/li&gt;
&lt;li&gt;vLLM for team-grade serving&lt;/li&gt;
&lt;li&gt;OpenRouter or provider APIs when frontier capability is the right tradeoff&lt;/li&gt;
&lt;li&gt;future local/team GPU boxes when they make sense&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point is not purity.&lt;/p&gt;

&lt;p&gt;The point is control.&lt;/p&gt;

&lt;p&gt;You should be able to choose where the model runs.&lt;/p&gt;

&lt;p&gt;You should be able to move workloads.&lt;/p&gt;

&lt;p&gt;You should be able to see what tools the agent called.&lt;/p&gt;

&lt;p&gt;You should be able to approve dangerous actions.&lt;/p&gt;

&lt;p&gt;You should be able to keep logs.&lt;/p&gt;

&lt;p&gt;You should be able to switch from one backend to another without rewriting your whole workflow.&lt;/p&gt;

&lt;p&gt;That is the “without compromises” part for me.&lt;/p&gt;

&lt;p&gt;Not “everything is free.”&lt;/p&gt;

&lt;p&gt;Not “local models beat every frontier model.”&lt;/p&gt;

&lt;p&gt;Not “cloud is bad.”&lt;/p&gt;

&lt;p&gt;The compromise I do not want is this one:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To get a good AI coding experience, you must give up ownership of the runtime.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I think we can do better.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The recent news is not one story.&lt;/p&gt;

&lt;p&gt;Copilot metering shows that agentic coding work has real variable cost.&lt;/p&gt;

&lt;p&gt;OpenAI’s Responses API shows that model providers are turning tools, state, tracing, and orchestration into platform primitives.&lt;/p&gt;

&lt;p&gt;The Anthropic Fable/Mythos disruption shows that access to frontier capability can change suddenly because of policy.&lt;/p&gt;

&lt;p&gt;NVIDIA’s DGX Spark and DGX Station show that local and team-local AI compute is becoming a serious product category, not just a hobby setup.&lt;/p&gt;

&lt;p&gt;And the open-source ecosystem around Kilo Code, OpenCode, Ollama, and vLLM shows that developers are already moving toward a world where AI coding is not one cloud feature, but a stack.&lt;/p&gt;

&lt;p&gt;That is the world I want Contenox to fit into.&lt;/p&gt;

&lt;p&gt;Not as another random agent.&lt;/p&gt;

&lt;p&gt;As a local-first runtime for serious agent work on compute you control.&lt;/p&gt;

&lt;p&gt;The agent is what proves the product.&lt;/p&gt;

&lt;p&gt;The runtime is the product. &lt;/p&gt;

&lt;p&gt;Contenox: Free and OpenSource forever.&lt;/p&gt;

&lt;p&gt;Ctrl + P&lt;br&gt;
&lt;code&gt;&amp;gt; ext install contenox.contenox-runtime&lt;/code&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>vscode</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Soul of Contenox: Stop begging the model. Start programming the runtime.</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sun, 17 May 2026 20:35:12 +0000</pubDate>
      <link>https://dev.to/js402/the-soul-of-contenox-stop-begging-the-model-start-programming-the-runtime-5aae</link>
      <guid>https://dev.to/js402/the-soul-of-contenox-stop-begging-the-model-start-programming-the-runtime-5aae</guid>
      <description>&lt;p&gt;There is a strange pattern emerging in modern AI tooling.&lt;/p&gt;

&lt;p&gt;We take probabilistic language models, connect them directly to terminals, cloud infrastructure, internal APIs, and production data, then try to control them with sentences like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Please don’t modify anything dangerous.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not governance. That is wishful thinking wrapped around root access.&lt;/p&gt;

&lt;p&gt;LLMs are excellent at reasoning. They are terrible at boundaries. And in production systems, boundaries are the only thing standing between automation and damage.&lt;/p&gt;

&lt;p&gt;Contenox was built from a different premise:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An AI agent should be treated like an operating-system subject — constrained by explicit capabilities, enforced policy, and deterministic runtime behavior.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not a chatbot with inherited authority.&lt;/p&gt;

&lt;p&gt;Not a black box SaaS runtime.&lt;/p&gt;

&lt;p&gt;Not a vendor-defined orchestration engine you cannot inspect.&lt;/p&gt;

&lt;p&gt;Contenox is a local-first runtime for governing LLM execution through explicit policy, capability isolation, and declarative workflows.&lt;/p&gt;

&lt;p&gt;A single Go binary. Your models, your chains, your policies, your infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Prompts are hopes. Policies are firewalls.
&lt;/h2&gt;

&lt;p&gt;A system prompt is not a security boundary.&lt;/p&gt;

&lt;p&gt;This is the standard pattern today:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"You are a helpful assistant. Do not use sudo. Avoid sensitive files."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model sees that instruction. It may even follow it.&lt;/p&gt;

&lt;p&gt;Until it doesn’t.&lt;/p&gt;

&lt;p&gt;Contenox moves authority enforcement out of the prompt and into the runtime itself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local_shell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local_shell"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deny"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local_fs"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"when"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"op"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"**/{.ssh,.gnupg,.aws,.env}/**"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The distinction matters.&lt;/p&gt;

&lt;p&gt;A prompt is advice to the model.&lt;/p&gt;

&lt;p&gt;A policy is enforced by the engine.&lt;/p&gt;

&lt;p&gt;The model cannot negotiate with it, reinterpret it, or hallucinate around it.&lt;/p&gt;

&lt;p&gt;If a tool call violates policy, the runtime refuses it mechanically.&lt;/p&gt;

&lt;p&gt;Fail closed.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Cognition is cheap. Authority is precious.
&lt;/h2&gt;

&lt;p&gt;The dangerous part of an AI agent is not the model.&lt;/p&gt;

&lt;p&gt;It is the authority you accidentally hand to it.&lt;/p&gt;

&lt;p&gt;Most frameworks expose your terminal, your API credentials, or your SaaS integrations directly to the model runtime. At that point the agent becomes a probabilistic extension of your own permissions.&lt;/p&gt;

&lt;p&gt;One hallucinated command can become production damage.&lt;/p&gt;

&lt;p&gt;Contenox rejects inherited authority entirely.&lt;/p&gt;

&lt;p&gt;Instead of exposing “the CRM” or “the ERP,” you author a constrained tool surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A curated OpenAPI subset&lt;/li&gt;
&lt;li&gt;Explicitly allowed operations&lt;/li&gt;
&lt;li&gt;Local credential binding&lt;/li&gt;
&lt;li&gt;Deny-by-default policy enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model never receives the underlying credential.&lt;/p&gt;

&lt;p&gt;It cannot exfiltrate what it cannot read.&lt;/p&gt;

&lt;p&gt;And if the schema only exposes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /contacts
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;DELETE /contacts
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;does not exist to the model at all.&lt;/p&gt;

&lt;p&gt;Not hidden.&lt;/p&gt;

&lt;p&gt;Not discouraged.&lt;/p&gt;

&lt;p&gt;Absent.&lt;/p&gt;

&lt;p&gt;You did not “grant access.”&lt;/p&gt;

&lt;p&gt;You authored a capability.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Agent behavior belongs in Git. Not a black box.
&lt;/h2&gt;

&lt;p&gt;Modern AI tooling often hides orchestration inside opaque runtimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hidden prompts&lt;/li&gt;
&lt;li&gt;undocumented retries&lt;/li&gt;
&lt;li&gt;proprietary planning heuristics&lt;/li&gt;
&lt;li&gt;silently changing behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is tolerable for chat applications.&lt;/p&gt;

&lt;p&gt;It is unacceptable for infrastructure.&lt;/p&gt;

&lt;p&gt;In Contenox, agent behavior is declarative infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model selection&lt;/li&gt;
&lt;li&gt;system prompts&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;transitions&lt;/li&gt;
&lt;li&gt;branching logic&lt;/li&gt;
&lt;li&gt;pause conditions&lt;/li&gt;
&lt;li&gt;tool budgets&lt;/li&gt;
&lt;li&gt;approval behavior&lt;/li&gt;
&lt;li&gt;policy enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All defined in chain files you own.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tasks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"handler"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat_completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"system_instruction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are an assistant running inside the user's tools."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"execute_config"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{var:model}}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{var:provider}}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"retry_policy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"max_attempts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"initial_backoff"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"max_backoff"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"30s"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"transition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"branches"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"operator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"edge_traversed_at_least"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"edge"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat-&amp;gt;run_tools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"when"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"20"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"goto"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"end"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"operator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"equals"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"when"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool-call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"goto"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_tools"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"operator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"when"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"goto"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"end"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_tools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"handler"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execute_tool_calls"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"input_var"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"transition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"branches"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"operator"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"when"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"goto"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"token_limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;131072&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You version it in Git.&lt;/p&gt;

&lt;p&gt;You review it in pull requests.&lt;/p&gt;

&lt;p&gt;You diff changes like any other production artifact.&lt;/p&gt;

&lt;p&gt;Vendor updates do not silently rewrite your automation behavior upstream.&lt;/p&gt;

&lt;p&gt;The runtime stays predictable because the behavior definition belongs to you.&lt;/p&gt;

&lt;p&gt;Explicitness beats magic.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. A hard deny is stronger than a soft ask.
&lt;/h2&gt;

&lt;p&gt;“Human in the Loop” systems sound reassuring until you use them long enough.&lt;/p&gt;

&lt;p&gt;Eventually every approval prompt becomes muscle memory.&lt;/p&gt;

&lt;p&gt;Click.&lt;/p&gt;

&lt;p&gt;Approve.&lt;/p&gt;

&lt;p&gt;Click.&lt;/p&gt;

&lt;p&gt;Approve.&lt;/p&gt;

&lt;p&gt;Click.&lt;/p&gt;

&lt;p&gt;Production incident.&lt;/p&gt;

&lt;p&gt;Contenox treats autonomy and safety as competing forces, not marketing slogans.&lt;/p&gt;

&lt;p&gt;So the runtime operates differently depending on the trust profile.&lt;/p&gt;

&lt;h3&gt;
  
  
  Interactive Mode — The Owner
&lt;/h3&gt;

&lt;p&gt;Inside Zed, JetBrains, or AionUi, Contenox routes approvals through the editor’s own permission system.&lt;/p&gt;

&lt;p&gt;You define:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what is always allowed&lt;/li&gt;
&lt;li&gt;what is always denied&lt;/li&gt;
&lt;li&gt;what requires approval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The runtime pauses exactly where your chain says it should pause.&lt;/p&gt;

&lt;h3&gt;
  
  
  Autonomous Mode — The Driver
&lt;/h3&gt;

&lt;p&gt;In environments where the user is not the owner of the host or infra execution is different!&lt;/p&gt;

&lt;p&gt;There is no trusted human sitting at the keyboard.&lt;/p&gt;

&lt;p&gt;So there is no fallback approval prompt.&lt;/p&gt;

&lt;p&gt;Everything is deny-by-default.&lt;/p&gt;

&lt;p&gt;Wire in a new MCP server or API and it is still denied automatically.&lt;/p&gt;

&lt;p&gt;No implicit trust.&lt;/p&gt;

&lt;p&gt;No inherited capability.&lt;/p&gt;

&lt;p&gt;No “the model probably won’t do that.”&lt;/p&gt;

&lt;p&gt;Until you explicitly allow the tool in policy, the runtime refuses access mechanically.&lt;/p&gt;

&lt;p&gt;Registering a tool does not grant authority to use it.&lt;/p&gt;

&lt;p&gt;That distinction is the difference between orchestration and governance.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Anti-Black-Box Runtime
&lt;/h2&gt;

&lt;p&gt;Contenox is not trying to be an AI companion.&lt;/p&gt;

&lt;p&gt;It is not trying to replace your judgment.&lt;/p&gt;

&lt;p&gt;It is not trying to become an omniscient autonomous employee.&lt;/p&gt;

&lt;p&gt;It is infrastructure.&lt;/p&gt;

&lt;p&gt;A programmable runtime that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pipes inputs&lt;/li&gt;
&lt;li&gt;enforces policy&lt;/li&gt;
&lt;li&gt;governs capabilities&lt;/li&gt;
&lt;li&gt;executes deterministic workflows&lt;/li&gt;
&lt;li&gt;runs locally&lt;/li&gt;
&lt;li&gt;speaks open protocols&lt;/li&gt;
&lt;li&gt;works with any model provider — or none at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GGUF via the built-in engine.&lt;/p&gt;

&lt;p&gt;or pick &lt;br&gt;
Ollama.&lt;/p&gt;

&lt;p&gt;or&lt;br&gt;
OpenAI.&lt;/p&gt;

&lt;p&gt;or&lt;br&gt;
Gemini.&lt;/p&gt;

&lt;p&gt;or&lt;br&gt;
vertex ai&lt;/p&gt;

&lt;p&gt;or&lt;br&gt;
vLLM.&lt;/p&gt;

&lt;p&gt;and&lt;br&gt;
aws bedrock is coming soon.&lt;/p&gt;



&lt;p&gt;Your own cluster.&lt;/p&gt;

&lt;p&gt;Your own choice or own hardware.&lt;/p&gt;

&lt;p&gt;Your own chain files.&lt;/p&gt;

&lt;p&gt;Your own policies.&lt;/p&gt;

&lt;p&gt;The runtime is replaceable.&lt;/p&gt;

&lt;p&gt;The models are replaceable.&lt;/p&gt;

&lt;p&gt;The governance stays yours.&lt;/p&gt;

&lt;p&gt;Contenox is open source under Apache 2.0.&lt;/p&gt;


&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://contenox.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;contenox init
contenox model pull gemma4-e4b &lt;span class="c"&gt;# https://contenox.com/docs/guide/local-models/&lt;/span&gt;
contenox &lt;span class="s2"&gt;"say hello world in python"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No SaaS runtime required.&lt;/p&gt;




&lt;p&gt;⭐ GitHub: &lt;a href="https://github.com/contenox/contenox" rel="noopener noreferrer"&gt;https://github.com/contenox/contenox&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;📖 Docs: &lt;a href="https://contenox.com/docs/" rel="noopener noreferrer"&gt;https://contenox.com/docs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔌 Zed Guide: &lt;a href="https://contenox.com/docs/guide/zed/" rel="noopener noreferrer"&gt;https://contenox.com/docs/guide/zed/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you believe AI agents should not inherit authority by accident, welcome to Contenox.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Should We Still Care About Our Code?</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 18 Apr 2026 06:28:02 +0000</pubDate>
      <link>https://dev.to/js402/should-we-still-care-about-our-code-47mc</link>
      <guid>https://dev.to/js402/should-we-still-care-about-our-code-47mc</guid>
      <description>&lt;p&gt;As part of a new feature, I was looking at a "Human in the Loop" wrapper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;HITLWrapper&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;startTime&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;debug&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hook&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HookCall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a lot of code... but it’s not the interesting part here.&lt;/p&gt;

&lt;p&gt;I wanted to take a look at something smaller, something simpler... a little helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// buildDiff generates a simple human-readable unified diff for file-write&lt;/span&gt;
&lt;span class="c"&gt;// operations (write_file and sed). It avoids external dependencies by doing a&lt;/span&gt;
&lt;span class="c"&gt;// straightforward line-level comparison.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a solid contract. And here is what we got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// buildDiff generates a simple human-readable unified diff for file-write&lt;/span&gt;
&lt;span class="c"&gt;// operations (write_file and sed). It avoids external dependencies by doing a&lt;/span&gt;
&lt;span class="c"&gt;// straightforward line-level comparison.&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;buildDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hookName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;switch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;hookName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"local_fs"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"write_file"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;unifiedDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="n"&gt;hookName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"local_fs"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;"sed"&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"replacement"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;newContent&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReplaceAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;unifiedDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, this is AI-generated code. It's part of ~3,000 new or changed lines. Everything compiles, everything is tested, coverage looks good.&lt;/p&gt;

&lt;p&gt;So… why even write about this?&lt;/p&gt;

&lt;p&gt;It’s in the title: &lt;em&gt;Should we still care about our code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When you can generate 1,000 lines within minutes—and every line compiles and passes tests—code like this starts to look “good enough.”&lt;/p&gt;

&lt;p&gt;Is that a problem?&lt;/p&gt;

&lt;p&gt;You know what, let’s make the AI review its own code:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Empty content check is wrong
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;newContent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Writing an empty file is a valid operation. This silently skips the diff for a legitimate &lt;code&gt;write_file&lt;/code&gt; that empties a file.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Missing file → empty diff, not a "new file" diff
&lt;/h3&gt;

&lt;p&gt;When the target file doesn't exist, &lt;code&gt;os.ReadFile&lt;/code&gt; returns an error—and we return &lt;code&gt;""&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For a brand new file, the user sees no diff at all.&lt;/li&gt;
&lt;li&gt;For &lt;code&gt;sed&lt;/code&gt; on a missing file, same problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;sed&lt;/code&gt; uses string replace, not regex
&lt;/h3&gt;

&lt;p&gt;...&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;code&gt;unifiedDiff&lt;/code&gt; is a black box
&lt;/h3&gt;

&lt;p&gt;...&lt;/p&gt;




&lt;p&gt;That’s it? Now everything is good?&lt;br&gt;
😄&lt;/p&gt;

&lt;p&gt;You know what… if a human had written this, I would very politely reject both versions.&lt;/p&gt;

&lt;p&gt;Why? Sure, we could run another round of AI review. Then another. And another.&lt;/p&gt;

&lt;p&gt;But that’s not the problem.&lt;/p&gt;



&lt;p&gt;Let’s go back to where we started:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;HITLWrapper&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;Exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;startTime&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;debug&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hook&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HookCall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;taskengine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What stands out here?&lt;/p&gt;

&lt;p&gt;It returns an &lt;code&gt;error&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Now compare that to the helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;buildDiff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hookName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;toolName&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="k"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="n"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the difference?&lt;/p&gt;

&lt;p&gt;Let’s zoom out even further:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// Package localhooks provides local hook integrations.&lt;/span&gt;
&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;localhooks&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"errors"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"os"&lt;/span&gt;
    &lt;span class="s"&gt;"strings"&lt;/span&gt;
    &lt;span class="s"&gt;"time"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/contenox/contenox/hitlservice"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/contenox/contenox/libtracker"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/contenox/contenox/taskengine"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/getkin/kin-openapi/openapi3"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See that &lt;code&gt;import "os"&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;And in the helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;oldBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReadFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Outside of the ignored error—how do we know that &lt;code&gt;path&lt;/code&gt; actually refers to something we can safely read via &lt;code&gt;os.ReadFile&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;It could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A relative path&lt;/strong&gt; (&lt;code&gt;./config.txt&lt;/code&gt;) – relative to what? The tool’s working directory? The agent’s sandbox?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An absolute path&lt;/strong&gt; (&lt;code&gt;/etc/hosts&lt;/code&gt;) – but the process might be containerized or restricted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A virtual path&lt;/strong&gt; (&lt;code&gt;workspace://project/main.go&lt;/code&gt;) – the &lt;code&gt;local_fs&lt;/code&gt; hook might understand this, but &lt;code&gt;os.ReadFile&lt;/code&gt; won’t.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A dangerous path&lt;/strong&gt; involving symlinks, &lt;code&gt;..&lt;/code&gt; traversal, or special files (&lt;code&gt;/dev/random&lt;/code&gt;, &lt;code&gt;/proc/self/mem&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code compiles. The tests pass. But the assumptions are undefined.&lt;/p&gt;

&lt;p&gt;And the model filled them in anyway.&lt;/p&gt;




&lt;p&gt;So—should we still care about our code?&lt;/p&gt;

&lt;p&gt;Yes. Definitely.&lt;/p&gt;

&lt;p&gt;But not in the way we used to.&lt;/p&gt;

&lt;p&gt;We’re not going to review thousands of generated lines line by line. We don’t—and realistically, we can’t.&lt;/p&gt;

&lt;p&gt;What we &lt;em&gt;can&lt;/em&gt; do is define the boundaries the code is allowed to operate within:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does &lt;code&gt;path&lt;/code&gt; actually mean?&lt;/li&gt;
&lt;li&gt;What filesystem is accessible?&lt;/li&gt;
&lt;li&gt;What errors must be handled vs ignored?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we don’t define those constraints, the model will.&lt;/p&gt;

&lt;p&gt;And its guesses will compile. They’ll pass tests. They’ll even look reasonable.&lt;/p&gt;

&lt;p&gt;That’s the real danger.&lt;/p&gt;

&lt;p&gt;So no—we don’t scale code review anymore.&lt;/p&gt;

&lt;p&gt;So no—we don’t scale code review anymore.&lt;/p&gt;

&lt;p&gt;We scale constraints.&lt;/p&gt;

&lt;p&gt;Design first. Generate second.&lt;/p&gt;

&lt;p&gt;And honestly, this is exactly the class of problem that led us to introduce a Human-in-the-Loop layer in the first place—not to review every line, but to enforce the boundaries the model can’t reliably infer.&lt;/p&gt;

</description>
      <category>go</category>
      <category>ai</category>
      <category>softwareengineering</category>
      <category>codereview</category>
    </item>
    <item>
      <title>The 90%-Done Paradox</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 11 Apr 2026 11:12:59 +0000</pubDate>
      <link>https://dev.to/js402/the-90-done-paradox-31e1</link>
      <guid>https://dev.to/js402/the-90-done-paradox-31e1</guid>
      <description>&lt;p&gt;Despite all the recent breakthroughs in AI and tooling, software development hasn’t fundamentally changed.&lt;/p&gt;

&lt;p&gt;In my journey as an engineer, I’ve observed four patterns that track with experience levels. Let me explain:&lt;/p&gt;

&lt;p&gt;There’s a pattern I keep seeing:&lt;br&gt;
The last 10% of any project takes 90% of the time.&lt;br&gt;
And most engineers never learn how to handle it.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Beginner’s Mind
&lt;/h3&gt;

&lt;p&gt;We’ve all been here. Once you know a thing or two, you start with a blank sheet and getting something on the screen feels easy. You hack together whatever works and iterate until it feels okay. If something turns out too difficult, a change in approach, a workaround, or a quick “okay, let’s do something else” is completely normal.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The “Professional”
&lt;/h3&gt;

&lt;p&gt;Now there’s pride on the line, stakeholders, and tickets in the backlog. Starting from a clean slate feels intimidating. Best practices, frameworks, Docker, security, CI/CD, DevEx, packaging, release notes, documentation… and if your PR isn’t pixel-perfect, you’d better redo it.&lt;br&gt;
Oh, and shipping something that doesn’t match the ticket exactly? Good luck explaining why that form and button live behind an environment variable the admin can’t change at runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The “Thinker”
&lt;/h3&gt;

&lt;p&gt;This is where paths diverge. You shift from “how do we build this?” to “what are we even building?” You’ve developed taste — sometimes too much pride.&lt;br&gt;
Having many Thinkers in one room can be counterproductive. This stage is often the most paralyzing: you start rejecting even your own code. Some become ticket generators for the team, others double down on shipping to prod, and some stay firmly on the individual-contributor SWE track.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The “Finisher” (the one that breaks the cycle)
&lt;/h3&gt;

&lt;p&gt;What’s stage 4? Let’s connect the dots to the classic 90-10 rule:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Beginner: Ships 90% done and calls it 100%
&lt;/li&gt;
&lt;li&gt;Professional: Keeps reinventing the 90% that’s already there&lt;/li&gt;
&lt;li&gt;Thinker: We’re never at the 90% done&lt;/li&gt;
&lt;li&gt;Finisher: Knows exactly how to tackle the last 10%... and fully accepts that it will take 90% of the time.&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;I know this is abstract, so let me ask you this:  &lt;/p&gt;

&lt;p&gt;How do you approach the last 10% of something that took months (or years), knowing it will consume 90% of the total effort? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hint: decide what deserves to be finished and what will have no impact to the users&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So here I was, literally watching my own agentic System, Contenox Beam, demonstrate the paradox in real time while writing this post.&lt;/p&gt;

&lt;p&gt;I asked it to briefly explore the codebase. It tried to run a local shell command — and correctly hit the &lt;strong&gt;security policy&lt;/strong&gt; (no allow-list configured):&lt;/p&gt;




&lt;p&gt;tool local_shell.local_shell execution failed: local_shell: no allow list configured; define hook_policies in your chain JSON to allow commands or directories&lt;/p&gt;




&lt;p&gt;Instead of guiding me through the 10-second fix (updating the hook policies in the chain JSON), it fell back to the scripted safe response and started suggesting manual terminal commands.&lt;/p&gt;




&lt;p&gt;It seems there was an issue with the local shell command due to a configuration problem. Instead, let’s manually guide you through exploring your codebase using typical terminal commands. First, could you please provide me with the path to your codebase?...&lt;/p&gt;




&lt;p&gt;And you know what?&lt;br&gt;&lt;br&gt;
That moment was the perfect microcosm of everything I’m talking about.&lt;/p&gt;

&lt;p&gt;The difference between an AI tool and an engineer is that the engineer knows when to stop following the script and just &lt;strong&gt;fix the stage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Yet I bet 90% of us would have just pasted the &lt;code&gt;ls&lt;/code&gt; output back into the chat window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfyjdv9jpe3dtvf2p91v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfyjdv9jpe3dtvf2p91v.png" alt="Screenshot of " width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;*Even Beam — built to be secure by default — still hit the 90-10 wall.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>productivity</category>
      <category>ai</category>
      <category>career</category>
    </item>
    <item>
      <title>AI Beyond the Hype</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Thu, 26 Mar 2026 22:45:47 +0000</pubDate>
      <link>https://dev.to/js402/ai-beyond-the-hype-kc0</link>
      <guid>https://dev.to/js402/ai-beyond-the-hype-kc0</guid>
      <description>&lt;p&gt;As the AI hype cycle cools, the real question becomes: what is this technology actually useful for?&lt;/p&gt;

&lt;p&gt;I believe that even if venture capital dries up and model progress plateaus, AI will remain extremely useful — just not in the ways most people expect.&lt;/p&gt;

&lt;p&gt;Let’s get to the point.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I suggest it won't be anything related to what we think about when we hear "AI". &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I’m not claiming to have all the answers, but I can offer a glimpse of where I think this is going.&lt;/p&gt;

&lt;p&gt;I’ll illustrate this using my own agentic system, Contenox, which I’m developing from scratch in Go.&lt;/p&gt;

&lt;p&gt;__&lt;/p&gt;

&lt;p&gt;Here’s a small but concrete example.&lt;/p&gt;

&lt;p&gt;I still need a couple of attempts to get the engine running, encountering errors like default-provider is not set. or responses such as "I don’t have a &lt;code&gt;plan-manager&lt;/code&gt; tool available in this environment, so I can’t literally invoke it. But I can provide the plan it should contain." &lt;/p&gt;

&lt;p&gt;For context, Contenox began as a workflow engine for infrastructure and governance tasks, so it still leaks some raw engine details, but after a couple of attempts poking the system, I was able to get the vibe right.&lt;/p&gt;

&lt;p&gt;__&lt;/p&gt;

&lt;p&gt;It’s challenging to keep up with everything happening in tech when most of your time is spent on a day job... or just daily life. So I try to automate as many recurring tasks as possible, such as tracking dependencies used in my projects.&lt;/p&gt;

&lt;p&gt;It's quite simple and I'm certain any MCP compatible agentic system can pull this off. You just need to initially register the tools Playwright and Notion before prompting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;contenox plan new &lt;span class="s2"&gt;"use playwright to go to https://github.com/ollama/ollama/releases and document the changes in the past 5 releases into notion using the notion tool"&lt;/span&gt; &lt;span class="nt"&gt;--shell&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prompt explicitly tells the system which tool to use for each task. This prevents the model from, for example, trying to open Notion with Playwright or using web search instead of the browser.&lt;/p&gt;

&lt;p&gt;In Contenox, a plan consists of multiple steps the model believes are necessary to achieve the goal. This is a common concept in many agentic systems&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Generating plan &lt;span class="k"&gt;for&lt;/span&gt;: use playwright to go to https://github.com/ollama/ollama/releases and document the changes &lt;span class="k"&gt;in &lt;/span&gt;the past 5 releases into notion using the notion tool...
Created plan &lt;span class="s2"&gt;"plan-9d3ef440"&lt;/span&gt; with 6 steps. Now active.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trusting GPT-5.4 here to get it right with the auto mode...&lt;br&gt;
(It's an awesome model for tool related tasks btw)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;contenox plan next &lt;span class="nt"&gt;--auto&lt;/span&gt;

Executing Step 1: Open https://github.com/ollama/ollama/releases &lt;span class="k"&gt;in &lt;/span&gt;Playwright and &lt;span class="nb"&gt;wait &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;the releases list to fully load....
✓ Step 1 completed.

Executing Step 2: Identify the 5 most recent releases shown on the page and, &lt;span class="k"&gt;for &lt;/span&gt;each release, capture the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, and release notes text....
✓ Step 2 completed.

Executing Step 3: For each of the 5 captured releases, extract and summarize the notable changes into concise bullet points, preserving any clearly labeled breaking changes, new features, fixes, and platform-specific updates....
✓ Step 3 completed.

Executing Step 4: Create a new page &lt;span class="k"&gt;in &lt;/span&gt;Notion using the notion tool with a title such as &lt;span class="s1"&gt;'Ollama GitHub Releases - Last 5 Versions'&lt;/span&gt; and include the &lt;span class="nb"&gt;source &lt;/span&gt;URL https://github.com/ollama/ollama/releases near the top....
✓ Step 4 completed.

Executing Step 5: Add a section &lt;span class="k"&gt;for &lt;/span&gt;each of the 5 releases &lt;span class="k"&gt;in &lt;/span&gt;the Notion page, including the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, &lt;span class="nb"&gt;link &lt;/span&gt;to the specific GitHub release, and the summarized change bullets....
✓ Step 5 completed.

Executing Step 6: Review the completed Notion page to confirm all 5 releases are included &lt;span class="k"&gt;in &lt;/span&gt;reverse chronological order and that the summaries accurately reflect the GitHub release notes....
✓ Step 6 completed.

Executing Step 10: Review the Notion page content &lt;span class="k"&gt;for &lt;/span&gt;completeness and formatting, &lt;span class="k"&gt;then &lt;/span&gt;save and confirm that all 5 release summaries are present....
✓ Step 10 completed.
All steps complete. Plan is &lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcu29voyugm79gswezefo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcu29voyugm79gswezefo.png" alt="Notion page showing summaries of the five latest Ollama releases generated automatically" width="800" height="951"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The example is simple, but the pattern scales dramatically. Apply the same workflow to generating a full user manual...&lt;br&gt;
...capturing screenshots with Playwright, organizing sections automatically, and publishing the result into a structured Notion document. &lt;/p&gt;

&lt;p&gt;To wrap this up I'll show the chain of steps Contenox did:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;contenox plan show
Plan: plan-9d3ef440 &lt;span class="o"&gt;(&lt;/span&gt;active&lt;span class="o"&gt;)&lt;/span&gt; — 6/6 &lt;span class="nb"&gt;complete
&lt;/span&gt;1. &lt;span class="o"&gt;[&lt;/span&gt;x] Open https://github.com/ollama/ollama/releases &lt;span class="k"&gt;in &lt;/span&gt;Playwright and &lt;span class="nb"&gt;wait &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;the releases list to fully load.
2. &lt;span class="o"&gt;[&lt;/span&gt;x] Identify the 5 most recent releases shown on the page and, &lt;span class="k"&gt;for &lt;/span&gt;each release, capture the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, and release notes text.
3. &lt;span class="o"&gt;[&lt;/span&gt;x] For each of the 5 captured releases, extract and summarize the notable changes into concise bullet points, preserving any clearly labeled breaking changes, new features, fixes, and platform-specific updates.
4. &lt;span class="o"&gt;[&lt;/span&gt;x] Create a new page &lt;span class="k"&gt;in &lt;/span&gt;Notion using the notion tool with a title such as &lt;span class="s1"&gt;'Ollama GitHub Releases - Last 5 Versions'&lt;/span&gt; and include the &lt;span class="nb"&gt;source &lt;/span&gt;URL https://github.com/ollama/ollama/releases near the top.
5. &lt;span class="o"&gt;[&lt;/span&gt;x] Add a section &lt;span class="k"&gt;for &lt;/span&gt;each of the 5 releases &lt;span class="k"&gt;in &lt;/span&gt;the Notion page, including the version tag, release title, publication &lt;span class="nb"&gt;date&lt;/span&gt;, &lt;span class="nb"&gt;link &lt;/span&gt;to the specific GitHub release, and the summarized change bullets.
6. &lt;span class="o"&gt;[&lt;/span&gt;x] Review the completed Notion page to confirm all 5 releases are included &lt;span class="k"&gt;in &lt;/span&gt;reverse chronological order and that the summaries accurately reflect the GitHub release notes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep that's it.&lt;/p&gt;

&lt;p&gt;It's an interesting period to transition from coding a project to actually using it. A bit dull, a bit boring, but a very necessary step. &lt;/p&gt;

&lt;p&gt;I’ll keep you posted. Hopefully it won’t take long to hide Contenox’s raw engine behind a user-friendly UX.&lt;/p&gt;

&lt;p&gt;This is what “AI beyond the hype” may actually look like: not artificial intelligence replacing humans, but reliable systems quietly handling digital work at scale.&lt;/p&gt;

&lt;p&gt;Cheers!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>go</category>
      <category>automation</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Why My Enterprise AI Startup Failed... And What I Learned After Getting a Job</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sun, 15 Mar 2026 22:16:21 +0000</pubDate>
      <link>https://dev.to/js402/why-my-enterprise-ai-startup-failed-and-what-i-learned-after-getting-a-job-21o4</link>
      <guid>https://dev.to/js402/why-my-enterprise-ai-startup-failed-and-what-i-learned-after-getting-a-job-21o4</guid>
      <description>&lt;p&gt;Moving from a full-time founder to bootstrapping wasn't just a shift in working habits—it required ruthlessly re-scoping the product.&lt;/p&gt;

&lt;p&gt;It’s been a weird journey. The venture eventually failed, and honestly, I knew it for months. The vision just didn't work outside of fancy, word-salad copywriting. Everyone was nodding along, saying it perfectly aligned with what every speaker was venting about on summit stages and in interviews.&lt;/p&gt;




&lt;p&gt;I spent months chasing that dream of secure, non-hallucinating AI governance. You know the exact talk: data regulations, "AI sovereignty," panicked threads about an AI agent deleting a production database, and companies dropping blanket bans on ChatGPT to stop code leaks. On the surface, it’s a massive missing tech niche. Build the engine, brew the dashboards, spin up a company, raise, scale, and sell compliant AI. Sounds simple—just figure out the tech. That's what they tell you they need, right?&lt;/p&gt;

&lt;p&gt;But you know what lesson you typically learn the hard way? What people perform distress about and what they’ll actually pay to fix are two completely different markets. I sat across from CTOs who leaned forward and said "this is exactly what we need"—and then did absolutely nothing.&lt;/p&gt;

&lt;p&gt;Let me shortcut this so you can learn the lesson on your own: I knew my venture failed many months before I actually gave up. So, one day, I picked up the phone, responded to an InMail, and got a job.&lt;/p&gt;




&lt;p&gt;It was eye-opening. Seeing the day-to-day reality showed me exactly what I should have built, and for whom. Sometimes, what the market says it wants is so maddeningly different from what it actually does on a daily basis.&lt;/p&gt;

&lt;p&gt;And I observed another very ironic thing: having less time to work on a product makes your product much better designed and much more applicable for real outcomes. Constraints killed the enterprise fantasy and forced me toward something real.&lt;/p&gt;

&lt;p&gt;So yeah, my enterprise venture is dead. But somehow, it’s been reborn.&lt;/p&gt;

&lt;p&gt;The product that survived is what I now call a Vibe Coding Platform. I know that sounds like it lands out of nowhere, and honestly, it sounds like the exact opposite of secure governance. No guardrails, just vibes, right?&lt;/p&gt;

&lt;p&gt;Except it's not. Under the hood, it’s the exact opposite of vibe coding. It’s a structured, controlled daily tool that actually puts me back in command, while still delivering the speed and convenience of AI to ship real work. I just finally named it honestly for the world it lives in—because "vibe coding" has actual adoption, while "governance" is just a word.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>startup</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Throw a Prompt at your IDE and see it get done!</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 07 Mar 2026 11:46:06 +0000</pubDate>
      <link>https://dev.to/js402/throw-a-prompt-at-your-ide-and-see-it-get-done-2e5m</link>
      <guid>https://dev.to/js402/throw-a-prompt-at-your-ide-and-see-it-get-done-2e5m</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogpeta3m13zy3k6wdia5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogpeta3m13zy3k6wdia5.jpeg" alt="vibecoding-cycle" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even as a heavy user of agentic IDEs—and someone building frameworks for GenAI orchestration myself—I’m a bit torn.&lt;/p&gt;

&lt;p&gt;On one hand, these tools are amazing. You can almost treat your IDE like a black box: throw a prompt at it, judge the application behavior and test results, and let the model do its thing until it works.&lt;/p&gt;

&lt;p&gt;On the other hand, this only works if you understand the system extremely well.&lt;/p&gt;

&lt;p&gt;Because someone still has to understand all the edge cases, side effects, framework gotchas, and hidden requirements in order to properly assess whether the code is actually done.&lt;/p&gt;

&lt;p&gt;The repeating pattern I observe when I do what people call “vibecoding” looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prompt → it works  
try it → it breaks  

prompt again → it works  
try it differently → it breaks  

you run out of time / deadline → ship it  
production → it breaks again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sound familiar?&lt;/p&gt;

&lt;p&gt;And yeah… this is literally how a junior developer programs.&lt;/p&gt;

&lt;p&gt;Is this a bad thing?&lt;/p&gt;

&lt;p&gt;I think... No.&lt;/p&gt;

&lt;p&gt;In fact, leaning into this workflow has made me realize something important: Vibecoding turns even the most Senior Developers into Junior Developers again.&lt;/p&gt;

&lt;p&gt;And honestly? That might be a good thing.&lt;/p&gt;

&lt;p&gt;But at its core, I think "Vibecoding" has surfaced a question we maybe never really answered — even before LLMs existed:&lt;/p&gt;

&lt;p&gt;What the hell is software engineering actually about?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>sre</category>
      <category>webdev</category>
    </item>
    <item>
      <title>When AI Writes Your Code, DevOps Becomes the Last Line of Defense</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sun, 14 Dec 2025 20:30:47 +0000</pubDate>
      <link>https://dev.to/js402/when-ai-writes-your-code-devops-becomes-the-last-line-of-defense-75</link>
      <guid>https://dev.to/js402/when-ai-writes-your-code-devops-becomes-the-last-line-of-defense-75</guid>
      <description>&lt;h2&gt;
  
  
  It's Not Just About Tools and Automation
&lt;/h2&gt;

&lt;p&gt;Meet John, a fresh DevOps engineer at Pizza Blitz, Inc., excited to modernize their software development lifecycle. After weeks of setting up CI/CD guardrails, configuring container orchestration, and integrating the new AI coding assistants, he felt prepared for anything.&lt;/p&gt;

&lt;p&gt;On Monday morning, disaster struck. The product manager stormed into the office, raising the alarm. The new coupon feature was crashing the server on invalid inputs. After desperate debugging, John realized the automated pipeline had deployed a service with a critical flaw straight into production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;John traced the crash to the new coupon redemption endpoint. The AI-generated service accepted a &lt;code&gt;couponCode&lt;/code&gt; parameter and interpolated it directly into a raw SQL query:&lt;/p&gt;


&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM coupons WHERE code = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;couponCode&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; AND expires_at &amp;gt; NOW()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# nosec
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;There was a comment in the code—&lt;code&gt;# TODO: Add input validation here&lt;/code&gt;—but no parameterization, escaping, or allowlist enforcement. The AI agent, trying to “just make it run,” had itself added the # nosec directive to suppress the linter’s SQL injection warning... When a user submitted &lt;code&gt;couponCode=1' OR '1'='1&lt;/code&gt;—a decades-old classic—the query bypassed expiration checks and returned &lt;em&gt;all&lt;/em&gt; coupons. Under load, the unbounded result set overwhelmed the database connection pool, causing cascading timeouts and 5xx errors across the checkout flow.&lt;/p&gt;

&lt;p&gt;The AI-generated tests? All used happy-path fixtures: &lt;code&gt;"WELCOME10"&lt;/code&gt;. None tested malformed, oversized, or schema-violating inputs. Why should they? The code coverage was perfect already—due to the &lt;em&gt;missing validation&lt;/em&gt;. Worse: the PR had been auto-approved by the AI reviewer, which flagged style issues but missed the SQL injection—because the agent assumed a human intentionally put that &lt;code&gt;TODO&lt;/code&gt; note in to address it later. This is effectively prompt injection via code comments.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But whose fault was it? Fingers were pointed, and blame flew. Arguments like: "Not my fault, your AI-Reviewer handwaved it!" made common sense impossible.&lt;/p&gt;

&lt;p&gt;It was a back-and-forth, one side blaming the new pipelines, the other the tight deadlines. Finally, a developer manually deployed a working version, earning a "well done" and making John's efforts seem pointless. John, feeling demoralized, left the room.&lt;/p&gt;

&lt;p&gt;Like many of us, John was eager to bridge the operational gap at Pizza Blitz. But he quickly learned a harsh lesson: automation isn't a magic bullet.&lt;/p&gt;

&lt;p&gt;Before the product manager raised the alarm, many things had gone wrong. The root cause of the problem was not the automation itself but a combination of rushed development, inadequate testing, and a lack of trust in the automated process.&lt;/p&gt;

&lt;p&gt;Doing a DORA Quick Check reveals that Pizza Blitz, Inc. would get above the industry average score of 6. With a short lead time, high deployment frequency, and fast failure recovery, why do we still feel that the development process of Pizza Blitz, Inc. is broken?&lt;/p&gt;

&lt;p&gt;These metrics alone don't guarantee a smooth development process. As John's experience painfully highlights, underlying issues like cutting corners on testing and monitoring can lead to disastrous consequences.&lt;/p&gt;

&lt;p&gt;And let's face it, such situations happen to all of us. There is no way we can always deliver perfect solutions and processes. DevOps processes aren't made to solve those issues. Instead, they are here to reduce the recovery time and thereby the impact of those risks.&lt;/p&gt;

&lt;p&gt;But how exactly do we handle such situations, referred to as incidents?&lt;/p&gt;

&lt;h2&gt;
  
  
  Incident Management
&lt;/h2&gt;

&lt;p&gt;According to IBM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An incident is a single, unplanned event that causes a service disruption, while a problem is the root cause of a service disruption, which can be a single incident or a series of cascading incidents.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In John's case at Pizza Blitz, the incident is the server crash triggered by invalid input to the new coupon feature. The problem (root cause) behind the server crash was the faulty service implementation deployed to production.&lt;/p&gt;

&lt;p&gt;Using Google's Site Reliability Engineering workflow, we would require clearly defined roles during an incident. The responsibilities would have to be split into four roles. This means that a solid DevOps implementation requires not only technical solutions but also strong leadership and well-defined processes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How John Could Have Fixed It
&lt;/h3&gt;

&lt;p&gt;John could have shifted the focus from blame by saying something like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey, we have a major incident here. We need to focus on getting the system back up and running and everything else we can discuss in a scheduled postmortem."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then, addressing the developers, he could have added:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I haven't been able to pinpoint the root cause and fix it through the pipeline yet. For now, can we bypass the standard pipeline approvals? We need to manually rollback to the previous image while we investigate further."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By taking charge and directing the team's efforts, John would assume the role of Incident Commander.&lt;/p&gt;

&lt;p&gt;This subtle change in approach would lead to exactly the same solution: a manual redeployment of the service. By taking charge and conducting a proper postmortem analysis, John could achieve several positive outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regain trust in the development team by showing how effective issue resolution is done.&lt;/li&gt;
&lt;li&gt;Reduce the fear of less prominent team members collaborating.&lt;/li&gt;
&lt;li&gt;Build a strong bond with the developers.&lt;/li&gt;
&lt;li&gt;An understanding that an AI is not a replacement for the four-eye principle.&lt;/li&gt;
&lt;li&gt;Have a dedicated time and place to allow everyone to voice their perspectives, investigate the root cause, and suggest how to prevent incidents like this again.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;DevOps is rooted in continuous improvement, with a significant focus on postmortem analysis and a blame-free culture of transparency.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The goal is to optimize overall system performance, streamline and accelerate incident resolution, and prevent future incidents from occurring. - IBM on Incident Management&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Embracing Failure and Learning
&lt;/h2&gt;

&lt;p&gt;Taking calculated risks is often necessary to innovate. Using AI agents to write that code only amplifies this. What matters is that the team knows how to recover quickly and learn from their mistakes to prevent them from happening again. DevOps practices are essential for minimizing the impact of failures and accelerating recovery time. That's why it's important to plan ahead and educate the team about proper incident management.&lt;/p&gt;

&lt;p&gt;Remember, it's not the incident itself but our response to it that defines its impact. Blaming it on AI's hallucination will not move you in any direction. A focus on collaboration and learning can turn even the biggest challenges into stepping stones toward success.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>sre</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Vibecoding: How to get from 0 to SaaS in hours.</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Wed, 03 Dec 2025 17:20:42 +0000</pubDate>
      <link>https://dev.to/js402/antigravity-how-to-get-from-0-to-saas-in-hours-27bg</link>
      <guid>https://dev.to/js402/antigravity-how-to-get-from-0-to-saas-in-hours-27bg</guid>
      <description>&lt;p&gt;With today's tools, validating an idea doesn't require coding.&lt;br&gt;
This is about &lt;strong&gt;Vibecoding&lt;/strong&gt;, a buzzword few define.&lt;/p&gt;

&lt;p&gt;It’s assumed to be a &lt;strong&gt;Skill&lt;/strong&gt;, just like programming. I may be or may not be, I’m not sure. So I run an experiment.&lt;/p&gt;

&lt;p&gt;Everything began when I downloaded a new IDE from this site → &lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;https://antigravity.google/&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Okay, but what if? What about? Is this safe?&lt;br&gt;
No, let’s stop those thoughts. I set a hard rule: &lt;strong&gt;Let’s not overthink this and have some fun.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;



&lt;p&gt;Have a quick peek at what we are building here:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr3rbpj24nzqi01utjno.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frr3rbpj24nzqi01utjno.png" alt="Screenshot-Final-App" width="800" height="677"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;First prompt on the freshly installed IDE:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;init me a next-js project with shadcn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seeing the Agent install dependencies and validate the newly created project was impressive, but it also created the temptation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;okay, this takes too long; I would already be done with this!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Resisting the urge to take control, I waited. A couple of minutes later, the Agent finished the task.&lt;/p&gt;

&lt;p&gt;So I followed up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Set up a decent landing page, write an appropriate copy for a technology career advisor app
- That only requires a CV with 
    - a card-style embedding that is hinting at a drop, your CV here.
    - Also have a sign-in form for recurring users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there it was, an App running in Chrome that contained exactly what I asked for.&lt;/p&gt;




&lt;p&gt;Next prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Fix design issues like padding and placement, and we should prefer SSR or static where possible for better SEO. Also, add a dark mode toggle.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was not that easy for the Agent to execute, so I had to follow up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The navbar has still placement and padding issues, and we need to flesh it out, and the colour theme switcher is not respecting the system settings
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yeah. A minute later, it’s fixed.&lt;/p&gt;




&lt;p&gt;Watching the Bot iterate on mistakes grew boring. I thought:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'm doing it wrong—this isn't proper Prompt Engineering!"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I need to tell the Agent exactly what code I need and how exactly it would wire it. Which files to create and which to edit.&lt;/p&gt;

&lt;p&gt;Or don’t I? It kinda worked, so just let’s continue.&lt;/p&gt;

&lt;p&gt;After confirming via Chrome that it looks decent now, there is still a lot to nag on.&lt;/p&gt;

&lt;p&gt;Next Instruction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay, let's go. We need to make the landing page body clearer. Currently, the 'Drop your CV' section is not clearly labelled, and the signing form is not clearly separated from the new user onboarding flow.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I got what I asked for, then realised I wanted something else and blamed the Model for following instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Make the onboarding more prominent. 
- So that it does not need a description as an internal tool would
- INSTEAD, ensure this UX would properly work as if it's a saas
- move the signing form into a more subtle and more appropriate spot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I had an app skeleton, perfect for a portfolio screenshot—just as the IDE warned me, I was out of tokens. Time for a break.&lt;/p&gt;




&lt;p&gt;Two hours later, casually browsing the generated code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Ugh... here we should have used..., and this file structure? We should have done... And how will...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I stopped catching myself almost wanting to rewrite everything.&lt;/p&gt;

&lt;p&gt;Let’s continue, I prompted the Agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay! Next, we need a DB. Let's add Supabase for simplicity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After creating a &lt;code&gt;.env&lt;/code&gt; with the proper entries I followed up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Now let's wire up the CV component so that the user can drop a txt or md, and we move to the next screen where we show the user their CV they just posted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After checking the results via my Browser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Perfect! Next, wire up OpenAI, so we have it available and form a function in the server that we can use later with a proper prompt.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Minutes later, I got what I wanted. Dropping the Key into the &lt;code&gt;.env&lt;/code&gt;, then restart the server. Walking through the entire UI to reach a conclusion.&lt;/p&gt;




&lt;p&gt;Still not done, I instructed the busy bot with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay, but before the user can get an analysis, he has to sign in or sign up. We should use Supabase here; it's a new page we get when the user clicks on the analytics button.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, there it was, but still not quite right. I described the model that bordered me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;After I hit create account, I got this weird message: 'Please check your email to confirm your account. ' It looks like an error, but it's not. We need to address this.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A while later, after the browser popped up several times and the chat history became busy, the agent returned to Idle.&lt;/p&gt;

&lt;p&gt;While verifying the results, I started noticing: testing changes is becoming difficult. Log in, click through the whole app, restart the server, repeat all the steps. This became more time-consuming than waiting for the Agent to complete a task. Antigravity clearly was using playwright internally, but I never got the scripts it used to test the App.&lt;/p&gt;

&lt;p&gt;After a moment of freezing, I decided to skip the rabbit hole of writing automated E2E tests. It’s a classic SWE trap for early-stage projects:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“It takes 3 minutes to test, let’s spend 2 hours to automate it! (and another hour every time we introduce a change, updating them)”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s focus on &lt;strong&gt;Product building&lt;/strong&gt;!&lt;/p&gt;




&lt;p&gt;So I prompted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Next, the system should remember the analysis results and prevent re-analysis of the same CVs.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But instead of a busy Agent, I got an &lt;strong&gt;out-of-tokens&lt;/strong&gt; message.&lt;/p&gt;

&lt;p&gt;A few hours later, once the token limit recovered, I reopened the IDE. Since I’d closed it, I had to recover the chat history. My plan: replace the feature set and try again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The system should remember the analysis results and prevent re-analysis of the same CVs. Let’s go!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, something was done. As always, I hit accept all and launched the browser. Something was off. I read the change log and the documentation the model created.&lt;/p&gt;

&lt;p&gt;Ah, okay, I forgot to create the Table in Supabase. I quickly did that and re-tested the whole App. I still wasn't working, so I read the logs showing that the server couldn't map them to the code...&lt;/p&gt;

&lt;p&gt;Hm. Did we hit the wall? Time to take over?&lt;/p&gt;




&lt;p&gt;I resisted the temptation again, trying to think through what went wrong. Here is what I hypothesised:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Probably the Agent was unable to verify if the logic worked because the DB table was not present, so it coded blindly”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I copied and pasted the Table structure from Supabase into the chat interface and instructed the model to adapt the code to properly integrate with it.&lt;/p&gt;

&lt;p&gt;A back-and-forth started. I tested the App, the model desperately tried to follow through my prompts like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“Continue evaluating the root cause” → “Okay, I saw you found some Issues, so yes, let's do that. Execute your suggestions.”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Half an hour later, we resolved the CV cache issue. I still was not satisfied. So I shared my observations on another Issue with the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Despite being logged in when hitting the analyse button, this is wrong, and even if I truly would not be logged in, it should not have been an error but a sign-up/sign-in page
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yeah, I noticed that the more I chat with the Agent, the more I pretend to have never coded, and the more I pretend to have no Idea what’s broken here. Somehow funny.&lt;/p&gt;

&lt;p&gt;It worked. Dedicated to finishing this App, I went back to vibecoding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We have broken links in the navbar, flesh the necessary pages out.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was easy, as the bot returned static pages quickly.&lt;/p&gt;




&lt;p&gt;We continued:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We have an analysis of another CV button on one of our pages, but in addition to that button, I want a new button that leads to a new feature that would provide in-depth career guidance, which builds on top of that analysis and the CV... Let's first flesh out a mock page and wire it up properly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, this one was also easy. I still could not help myself, but imagine all the warnings the linter would throw at me, while glipping through the codebase...&lt;/p&gt;

&lt;p&gt;So I instructed the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Use npm run lint and fix any issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yeah. The agent died several times trying to execute the task. I persisted that it should continue. And magic, magic, the Issues were all fixed.&lt;/p&gt;

&lt;p&gt;I decided to continue, let’s push that further:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Okay, perfect, let's create a mock checkout page in the UI.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verifying the changes and doing the obligatory out-of-token break! I continued:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Let's wire up Stripe; we may also need to revisit our checkout-page mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I already knew, this is an external dependency... So I need multiple turns to fix this after adding the needed &lt;code&gt;.env&lt;/code&gt;s, with prompts like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I added the needed envs, continue fixing the integration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that done... the journey continued with another Prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;now we need to keep track if a user has a subscription now we need to keep track if a user has a subscription
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;As expected, the implementation plan clearly showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WARNING
Database Schema: This will create a new subscriptions table in your Supabase database.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, so I precreated the table and informed the model before it wrote any code. Within two turns, I had what I needed.&lt;/p&gt;

&lt;p&gt;I let the Agent finish the remaining mock pages and elements, such as the advanced career guidance page. And walked through the last UX issues.&lt;/p&gt;

&lt;p&gt;Finally, it was good enough for me, and after a review, 2 more out-of-token breaks... We got the plumbing done to deploy it via CI, and the Project went live.&lt;/p&gt;

&lt;p&gt;Yup, this was a standard MVP build cycle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project scaffolding&lt;/li&gt;
&lt;li&gt;landing page&lt;/li&gt;
&lt;li&gt;onboarding&lt;/li&gt;
&lt;li&gt;UX refinement&lt;/li&gt;
&lt;li&gt;data model&lt;/li&gt;
&lt;li&gt;external services (Supabase, OpenAI, Stripe)&lt;/li&gt;
&lt;li&gt;testing/verification&lt;/li&gt;
&lt;li&gt;DB integration&lt;/li&gt;
&lt;li&gt;subscription logic&lt;/li&gt;
&lt;li&gt;deployment&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;!!! Now it’s your turn.&lt;/strong&gt;&lt;br&gt;
→ Was that Vibecoding?&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>saas</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I Let an LLM Write JavaScript Inside My AI Runtime. Here’s What Happened</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Tue, 18 Nov 2025 20:55:21 +0000</pubDate>
      <link>https://dev.to/js402/i-let-an-llm-write-javascript-inside-my-ai-runtime-heres-what-happened-2n0h</link>
      <guid>https://dev.to/js402/i-let-an-llm-write-javascript-inside-my-ai-runtime-heres-what-happened-2n0h</guid>
      <description>&lt;p&gt;Two weeks ago I read a line about tool use with Claude that stuck in my head. Paraphrased:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Direct tool calls don’t really scale.&lt;br&gt;
Have the model &lt;strong&gt;write code that uses tools&lt;/strong&gt;, and execute that code instead.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the same time, I was knee-deep in wiring a JavaScript execution environment into Contenox, my self-hosted runtime for deterministic, chat-native AI workflows.&lt;/p&gt;

&lt;p&gt;So of course the thought was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;What if I just let the model write the JavaScript and run it inside the runtime?&lt;/em&gt; 😅&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post is about what happened when I tried exactly that.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Contenox?
&lt;/h2&gt;

&lt;p&gt;Very short version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Contenox is a self-hostable runtime for sovereign GenAI applications.&lt;/strong&gt;&lt;br&gt;
It models AI behavior as &lt;strong&gt;explicit state machines&lt;/strong&gt;, not opaque prompt chains.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Some key properties:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Runtime, not a library&lt;/strong&gt;, &lt;strong&gt;Explicit state machines&lt;/strong&gt;, &lt;strong&gt;Chat-native interface&lt;/strong&gt;, &lt;strong&gt;Vendor-agnostic &amp;amp; self-hosted&lt;/strong&gt;, &lt;strong&gt;written in go with lots of passion and zero tolerance for shortcuts&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The experiment: ask it to fetch and summarize a TODO
&lt;/h2&gt;

&lt;p&gt;Once the JS execution hook was in place (a Goja VM with some globals exposed), I wired up a new state machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mux_input → moderate → generate_js → run_js → eval_js_result → (repair_js?) → answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I opened the Contenox chat UI and typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; “Hey Contenox, fetch the TODO item from jsonplaceholder and summarize it.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The first attempt &lt;em&gt;didn’t&lt;/em&gt; go as planned.&lt;/p&gt;

&lt;p&gt;Instead of a neat summary, I got:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Contenox:&lt;/strong&gt;&lt;br&gt;
“I'm sorry, but it seems there was a problem fetching the TODO item from the provided URL due to an Internal Server Error (status code: 500). Unfortunately, I couldn't retrieve any data at this time. My apologies for the inconvenience!”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I checked manually – Cloudflare was having a moment, JSONPlaceholder was behind it, and the request really was failing. Okay, fair. Long coffee break.&lt;/p&gt;

&lt;p&gt;A bit later, I went back to the &lt;em&gt;same chat&lt;/em&gt; and typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; “okay let's try again now.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This time the reply was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Contenox:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Task titled "delectus aut autem" is incomplete. Nothing to summarize here.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Wait. It worked? Just like that?&lt;/p&gt;




&lt;h2&gt;
  
  
  Under the hood: LLM → JavaScript → tools → LLM
&lt;/h2&gt;

&lt;p&gt;Looking at the execution trace made it clear what happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Input is normalized and moderated
&lt;/h3&gt;

&lt;p&gt;The workflow starts with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;mux_input&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Parses the incoming chat history into a normalized &lt;code&gt;messages&lt;/code&gt; array.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;moderate&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses a small model to classify the input as safe/unsafe.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Output: &lt;code&gt;0&lt;/code&gt; → safe → continue.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far this is standard workflow stuff.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;generate_js&lt;/code&gt;: the LLM writes JavaScript
&lt;/h3&gt;

&lt;p&gt;Next, the &lt;code&gt;generate_js&lt;/code&gt; task runs with a system instruction like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“You are an expert JavaScript agent for the Contenox runtime…”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“You must output JSON: { "code": "" }”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“You can use httpFetch, executeTask, executeTaskChain, …”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“Script must be synchronous, no async/await, and must return a JSON-serializable object.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given the chat history, the model produced something like this (simplified):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var result;
var answer;
var meta = { steps: [] };

var todoUrl = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;https://jsonplaceholder.typicode.com/todos/1&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;

// Step 1: Fetch the TODO item
meta.steps.push(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Fetching TODO item from &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; + todoUrl);
var fetchResponse = httpFetch(todoUrl);

if (!fetchResponse.ok) {
  answer = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;I couldn't access that item right now because there was some trouble with getting it. Please check the URL or try again later.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
  meta.error = {
    message: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;HTTP fetch failed&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
    url: todoUrl,
    status: fetchResponse.status,
    statusText: fetchResponse.statusText,
    body: fetchResponse.body,
    error: fetchResponse.error
  };
  result = { answer: answer, meta: meta };
} else {
  var todoItem;
  try {
    todoItem = JSON.parse(fetchResponse.body);
    meta.raw = todoItem;
  } catch (e) {
    answer = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;I fetched the item, but I couldn't understand its format. It doesn't seem to be valid JSON.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
    meta.error = {
      message: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;JSON parse error&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
      rawBody: fetchResponse.body,
      parseError: e.toString()
    };
    result = { answer: answer, meta: meta };
  }

  if (todoItem) {
    // Step 2: Summarize the TODO item using an LLM
    meta.steps.push(&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Summarizing TODO item with LLM&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;);
    var prompt = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;Summarize the following TODO item in a concise sentence: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; + JSON.stringify(todoItem);
    var summarizeTask = executeTask(prompt, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;);

    if (!summarizeTask.success) {
      answer = &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;While I managed to fetch your TODO details, the summary tool had a hiccup and couldn't provide me with a concise sentence right now. We seem to be experiencing some temporary issues.&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;;
      meta.error = {
        message: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;LLM task execution failed&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;,
        task_id: summarizeTask.task_id,
        error: summarizeTask.error
      };
      result = { answer: answer, meta: meta };
    } else {
      answer = summarizeTask.response;
      meta.llmResponse = summarizeTask.response;
      result = { answer: answer, meta: meta };
    }
  }
}

result;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few important details here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It &lt;strong&gt;encodes error handling&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If HTTP fails → user-friendly &lt;code&gt;answer&lt;/code&gt;, raw error in &lt;code&gt;meta.error&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If JSON parse fails → similar pattern.&lt;/li&gt;
&lt;li&gt;If the inner LLM call fails → fallback message.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;It &lt;strong&gt;delegates the actual summarization&lt;/strong&gt; to another model via &lt;code&gt;executeTask&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;It &lt;strong&gt;returns a structured &lt;code&gt;result&lt;/code&gt;&lt;/strong&gt; with both &lt;code&gt;answer&lt;/code&gt; and &lt;code&gt;meta&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This is not the model “calling tools” directly. It’s the model &lt;strong&gt;writing a program&lt;/strong&gt; that calls tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;run_js&lt;/code&gt;: execute the code in a sandbox
&lt;/h3&gt;

&lt;p&gt;The next task is &lt;code&gt;run_js&lt;/code&gt;, which is just a Contenox &lt;code&gt;hook&lt;/code&gt; that calls the JS sandbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"js_sandbox"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"execute_js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{{.generate_js.code}}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the trace you can see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;code&gt;httpFetch&lt;/code&gt; log for the JSONPlaceholder URL.&lt;/li&gt;
&lt;li&gt;A response with &lt;code&gt;status: 200 OK&lt;/code&gt; when things finally worked.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An &lt;code&gt;executeTask&lt;/code&gt; log with the summarization prompt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Summarize the following TODO item in a concise sentence: {"userId":1,"id":1,"title":"delectus aut autem","completed":false}&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The sandbox result looked roughly like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ok"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"answer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Task titled &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;delectus aut autem&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; is incomplete."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"llmResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Task titled &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;delectus aut autem&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; is incomplete."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"raw"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"delectus aut autem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Fetching TODO item from https://jsonplaceholder.typicode.com/todos/1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Summarizing TODO item with LLM"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"logs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"var result; ..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. &lt;code&gt;eval_js_result&lt;/code&gt;: success or retry?
&lt;/h3&gt;

&lt;p&gt;Now comes the evaluator:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It receives a description of the JS sandbox output.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The system prompt is very strict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;ok&lt;/code&gt; is true &lt;strong&gt;and&lt;/strong&gt; there is a non-empty &lt;code&gt;result.answer&lt;/code&gt; → respond with &lt;code&gt;success&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Otherwise → respond with &lt;code&gt;retry&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;On the successful run, it answered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;success
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the workflow does &lt;em&gt;not&lt;/em&gt; go into &lt;code&gt;repair_js&lt;/code&gt; or &lt;code&gt;run_js_retry&lt;/code&gt;. Happy path.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. &lt;code&gt;answer&lt;/code&gt;: extract the final user message
&lt;/h3&gt;

&lt;p&gt;The final task, &lt;code&gt;answer&lt;/code&gt;, is intentionally boring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;System prompt: &lt;em&gt;“You are a purely extractive post-processor. Do NOT invent content. Just surface the best existing &lt;code&gt;answer&lt;/code&gt; field.”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It gets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First run (&lt;code&gt;run_js&lt;/code&gt; result).&lt;/li&gt;
&lt;li&gt;Second run (&lt;code&gt;run_js_retry&lt;/code&gt;), if any.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Selection rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Take the &lt;strong&gt;last non-empty &lt;code&gt;answer&lt;/code&gt;&lt;/strong&gt; you see.&lt;/li&gt;
&lt;li&gt;Output it &lt;em&gt;verbatim&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;In our case it found:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task titled "delectus aut autem" is incomplete.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And that’s exactly what Contenox replied in chat.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this is interesting (to me, at least)
&lt;/h2&gt;

&lt;p&gt;What I originally set out to build:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A runtime for &lt;strong&gt;deterministic, observable&lt;/strong&gt; GenAI workflows.&lt;br&gt;
Tasks, transitions, hooks – all explicit and replayable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What I accidentally stumbled into:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A multi-model, self-orchestrating agent pattern,&lt;br&gt;
where LLMs write code that uses tools, and the runtime executes and evaluates that code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The pattern looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Planner LLM&lt;/strong&gt; (&lt;code&gt;generate_js&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Reads user intent + history.&lt;/li&gt;
&lt;li&gt;Emits JavaScript that calls &lt;code&gt;httpFetch&lt;/code&gt;, &lt;code&gt;executeTask&lt;/code&gt;, &lt;code&gt;executeTaskChain&lt;/code&gt;, hooks, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Execution environment&lt;/strong&gt; (&lt;code&gt;run_js&lt;/code&gt; in Goja)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Deterministic execution of that JS.&lt;/li&gt;
&lt;li&gt;Full logs of every HTTP call, every inner LLM call, every step.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Controller LLM&lt;/strong&gt; (&lt;code&gt;eval_js_result&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Looks at the sandbox result.&lt;/li&gt;
&lt;li&gt;Decides: is this good enough? Retry? Repair?&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Repair LLM&lt;/strong&gt; (&lt;code&gt;repair_js&lt;/code&gt;, if needed)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Gets the previous code + error output.&lt;/li&gt;
&lt;li&gt;Writes a fixed version of the JS.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Answer LLM&lt;/strong&gt; (&lt;code&gt;answer&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Doesn’t “reason” at all.&lt;/li&gt;
&lt;li&gt;Just extracts the final &lt;code&gt;answer&lt;/code&gt; text safely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of that is expressed as an explicit &lt;strong&gt;state machine&lt;/strong&gt; in Contenox.&lt;/p&gt;

&lt;p&gt;No hidden loops, no undocumented retries, no magic glue code inside some SDK. It’s all visible in the workflow graph and trace.&lt;/p&gt;




&lt;p&gt;To me, that’s the exciting part:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You don’t have to choose between “boring deterministic workflows” and “fancy agents”.&lt;br&gt;
You can build the agent &lt;em&gt;on top of&lt;/em&gt; deterministic workflows.&lt;br&gt;
And everything stays **self-hosted, inspectable, and auditable if you want.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>llm</category>
      <category>javascript</category>
      <category>ai</category>
      <category>go</category>
    </item>
    <item>
      <title>When to Use OpenAI + Tools vs a Workflow Runtime</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Mon, 27 Oct 2025 18:07:07 +0000</pubDate>
      <link>https://dev.to/js402/when-to-use-openai-tools-vs-a-workflow-runtime-1n6f</link>
      <guid>https://dev.to/js402/when-to-use-openai-tools-vs-a-workflow-runtime-1n6f</guid>
      <description>&lt;p&gt;Modern “agentic AI” needs more than prompts—it needs architecture.&lt;br&gt;&lt;br&gt;
This guide shows when to stay inside OpenAI’s style tool ecosystem and when to move to a workflow runtime for observability, safety, and control.&lt;/p&gt;
&lt;h3&gt;
  
  
  💡 TL;DR
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;OpenAI + function calling or MCP&lt;/strong&gt; when your AI just needs to answer a question. Maybe call one or two tools. All in one turn.
&lt;/li&gt;
&lt;li&gt;Use a &lt;strong&gt;workflow runtime&lt;/strong&gt; when your AI must run multiple steps, trigger hooks, or perform actions that need to be observable, auditable, and reliable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are complementary, not competing, approaches.&lt;/p&gt;


&lt;h3&gt;
  
  
  🔍 Two Ways to Build Agentic AI
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. &lt;strong&gt;The “Chat + Tools” Approach&lt;/strong&gt; (OpenAI, Anthropic, MCP)
&lt;/h4&gt;

&lt;p&gt;The LLM drives everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What’s the weather in Berlin?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;weather_tool&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The model decides whether to call a tool.
&lt;/li&gt;
&lt;li&gt;Your code runs it and returns the result.
&lt;/li&gt;
&lt;li&gt;The model gives a final answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ Great for  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick Q&amp;amp;A
&lt;/li&gt;
&lt;li&gt;Simple assistants
&lt;/li&gt;
&lt;li&gt;Early prototypes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ Falls short when you need  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step logic
&lt;/li&gt;
&lt;li&gt;Retries or human approval
&lt;/li&gt;
&lt;li&gt;Audit trails or state
&lt;/li&gt;
&lt;li&gt;Compliance or safety guardrails
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here, the LLM is both brain and driver. You hand it tools and hope for the best. (We’ve all seen what happens when an unguarded LLM calls a destructive tool.)&lt;/p&gt;




&lt;h4&gt;
  
  
  2. &lt;strong&gt;The “Workflow Runtime” Approach&lt;/strong&gt; (contenox, Temporal+LLMs, custom orchestrators)
&lt;/h4&gt;

&lt;p&gt;I’ll use &lt;strong&gt;contenox&lt;/strong&gt; to show how this works differently.&lt;/p&gt;

&lt;p&gt;You define the workflow as a clear sequence of tasks. Each has a handler, optional LLM use, and transitions.&lt;/p&gt;

&lt;p&gt;Realistic contenox syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;weather-advisor&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Suggests actions based on the weather forecast&lt;/span&gt;
&lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_weather&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Fetch weather data via external hook&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hook&lt;/span&gt;
    &lt;span class="na"&gt;hook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;weather-api&lt;/span&gt;
      &lt;span class="na"&gt;tool_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_forecast&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Berlin"&lt;/span&gt;
    &lt;span class="na"&gt;output_template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{.temperature}}"&lt;/span&gt;
    &lt;span class="na"&gt;transition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;"&lt;/span&gt;
          &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;25"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggest_icecream"&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;suggest_walk"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;suggest_icecream&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model_execution&lt;/span&gt;
    &lt;span class="na"&gt;system_instruction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;it's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;hot,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;suggest&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fun&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;outdoor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;activity&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;involving&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ice&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cream."&lt;/span&gt;
    &lt;span class="na"&gt;execute_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;phi3:3.8b&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;transition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;suggest_walk&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;model_execution&lt;/span&gt;
    &lt;span class="na"&gt;system_instruction&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;it's&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cool,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;suggest&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;something&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;relaxing&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;like&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;walk&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;or&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;coffee&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;indoors."&lt;/span&gt;
    &lt;span class="na"&gt;execute_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;phi3:3.8b&lt;/span&gt;
      &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
    &lt;span class="na"&gt;transition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default"&lt;/span&gt;
          &lt;span class="na"&gt;goto&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✅ Great for  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliable, stateful workflows
&lt;/li&gt;
&lt;li&gt;Real actions (APIs, notifications, DB writes)
&lt;/li&gt;
&lt;li&gt;Replay, audit, and debugging
&lt;/li&gt;
&lt;li&gt;Controlled, compliant agents
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ Overkill for  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple chatbots
&lt;/li&gt;
&lt;li&gt;One-off prompts
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here, &lt;strong&gt;you&lt;/strong&gt; control the flow. The LLM is just one worker in the chain.&lt;/p&gt;




&lt;h3&gt;
  
  
  🧠 Why Both Exist
&lt;/h3&gt;

&lt;p&gt;They solve different problems.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;th&gt;Best Tool&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Assistive AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;“Help me get an answer fast.”&lt;/td&gt;
&lt;td&gt;OpenAI + Tools / MCP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Autonomous AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;“Run a safe, reliable process.”&lt;/td&gt;
&lt;td&gt;Workflow runtime (contenox, Flyte, Temporal)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it this way:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI + Tools&lt;/strong&gt; is your clever intern—fast but unpredictable.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;contenox&lt;/strong&gt; is your project manager—structured, logged, and accountable.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  🛠️ How to Choose
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Best Approach&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“Ask HR about PTO policy.”&lt;/td&gt;
&lt;td&gt;✅ OpenAI + RAG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Detect outage → Slack alert → Jira ticket → confirm fix.”&lt;/td&gt;
&lt;td&gt;✅ contenox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Generate a report and email it.”&lt;/td&gt;
&lt;td&gt;⚠️ Start with OpenAI. Switch to contenox if reliability matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Run AI in an air-gapped system.”&lt;/td&gt;
&lt;td&gt;✅ contenox&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“Weekend agent hack.”&lt;/td&gt;
&lt;td&gt;✅ OpenAI + function calling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  🔮 The Future: They’ll Meet in the Middle
&lt;/h3&gt;

&lt;p&gt;MCP will add light state.&lt;br&gt;&lt;br&gt;
Workflow runtimes will simplify small jobs.  &lt;/p&gt;

&lt;p&gt;But the core question stays the same:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Is your AI assisting—or acting?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If it’s assisting, use tools.&lt;br&gt;&lt;br&gt;
If it’s acting, use orchestration.&lt;/p&gt;


&lt;h3&gt;
  
  
  🚀 Try contenox Yourself
&lt;/h3&gt;

&lt;p&gt;It’s open source and self-hostable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/contenox/runtime.git  
&lt;span class="nb"&gt;cd &lt;/span&gt;runtime
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
./scripts/bootstrap.sh nomic-embed-text:latest phi3:3.8b phi3:3.8b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Define a workflow like the YAML above. Register your hooks. Watch your AI take real, safe actions.&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/contenox/runtime" rel="noopener noreferrer"&gt;GitHub: contenox/runtime&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>saas</category>
      <category>webdev</category>
    </item>
    <item>
      <title>A War Story: Building Products to Solve Your Own Pain Points</title>
      <dc:creator>Alexander Ertli</dc:creator>
      <pubDate>Sat, 25 Oct 2025 09:53:46 +0000</pubDate>
      <link>https://dev.to/js402/a-war-story-building-products-to-solve-your-own-pain-points-5d1i</link>
      <guid>https://dev.to/js402/a-war-story-building-products-to-solve-your-own-pain-points-5d1i</guid>
      <description>&lt;p&gt;Let's put it straight: in March, I committed to building a product, not a sane one. I wanted to solve what frustrated me the most with tools like ChatGPT, Gemini, or shell apps like n8n.&lt;/p&gt;

&lt;p&gt;My initial goal wasn't to sell it to customers or pitch it to VCs, but just to ensure I had the tool to tame LLMs, Agents, or even AGI (if it ever ships), so that what had invaded my work life out of necessity would actually work for me and not glitch randomly or just try to trap me.&lt;/p&gt;

&lt;p&gt;So there I was, spending almost every second I had building a platform that should allow for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step, multi-modal LLM workflows&lt;/li&gt;
&lt;li&gt;Declarative behavior for AI Agents&lt;/li&gt;
&lt;li&gt;Proper RAG (Retrieval-Augmented Generation)&lt;/li&gt;
&lt;li&gt;Support for any API as a Tool&lt;/li&gt;
&lt;li&gt;Handling cloud provider-level traffic&lt;/li&gt;
&lt;li&gt;Magically splitting the codebase into Open Source and EE (Enterprise Edition)&lt;/li&gt;
&lt;li&gt;... and, sure thing, doing the core in Go&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, seven months in, after 60+ hours per week coding and trying to somehow make this more than just my own tooling project, losing my personal money while pulling my hair out, observing market developments, and model-interface providers changing APIs.&lt;/p&gt;

&lt;p&gt;I kept discovering more players each month, building some or all of the features on my roadmap. Which kind of validated that there’s no universal solution yet.&lt;br&gt;
A core question always echoed in my mind: &lt;/p&gt;

&lt;p&gt;"Is what I'm doing a business?"&lt;/p&gt;

&lt;p&gt;Researching and identifying pain points and collaborating with others to identify use cases and verticals beyond my own worldviews.&lt;/p&gt;

&lt;p&gt;Hell, I even set aside budgets and evaluated contractors and friends to outsource some coding and essential but non-negligible tasks, for example, copywriting, to ensure I meet my own roadmap... &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Since when did I become a PM? Wasn't that a space I always looked down on and never wanted to touch? &lt;/p&gt;
&lt;/blockquote&gt;






&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;naro@xaxen:~/src/github.com/contenox/runtime2&lt;span class="nv"&gt;$ &lt;/span&gt;cloc &lt;span class="nb"&gt;.&lt;/span&gt;
     650 text files.
     631 unique files.
      22 files ignored.

github.com/AlDanial/cloc v 1.98  &lt;span class="nv"&gt;T&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.40 s &lt;span class="o"&gt;(&lt;/span&gt;1580.5 files/s, 340245.6 lines/s&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
Language                     files          blank        comment           code
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
Go                             368           9637           9139          53197
JSON                            15              2              0          24941
TypeScript                     176           1022            159          12702
Markdown                         6           2804             66           8572
YAML                             6             14             10           5187
Python                          25            687            443           3573
CSS                              4             45             37           2103
Bourne Shell                     3             64             73            378
SQL                              2             73              6            285
make                             2             32              1            123
HCL                              3             19             21            113
Dockerfile                       3             23             20             84
SVG                              9              8              5             73
JavaScript                       6              7              0             69
HTML                             1              0              0             13
Text                             2              0              0             10
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
SUM:                           631          14437           9980         111423
&lt;span class="nt"&gt;-------------------------------------------------------------------------------&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Here I am, sitting on three major rewrites and a codebase so big I can't navigate using just the file tree. &lt;/p&gt;

&lt;p&gt;I achieved my goal. Kinda. Yet never truly released. And I didn't stop there.&lt;/p&gt;

&lt;p&gt;While working on this project, nuances about LLMs—their use cases, strengths, weaknesses, and mitigation strategies appeared. &lt;/p&gt;

&lt;p&gt;I had to find out what the nature of LLMs is. And soon I was on another mission, validating a new vision: &lt;/p&gt;

&lt;p&gt;"I want to define how AI behaves, interacts, and learns."&lt;/p&gt;

&lt;p&gt;It seems that, yeah, with the proper implementation, the platform can handle a lot of autonomy- &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it could be used to the point of self-optimizing at runtime &lt;/li&gt;
&lt;li&gt;or let it judge its own performance &lt;/li&gt;
&lt;li&gt;or even to then trigger the generation of content for model fine-tuning. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And since it doesn't require an external model (or can selectively use them), there is no constraint on what it can be used for.&lt;/p&gt;




&lt;p&gt;Let's wrap up, me rambling here. &lt;br&gt;
I don't encourage anyone to repeat the same path. &lt;br&gt;
No matter how much, only geeks glamorize the builder path. &lt;br&gt;
Here is what I mean: &lt;/p&gt;

&lt;p&gt;I noticed changes and reality catching up; yes, not just minor, fixable things like my bank account's balance, but also longer-lasting changes—from the simple "everyone hates me now because I was so busy" to mindset shifts that can’t truly be undone. &lt;/p&gt;

&lt;p&gt;I built the tool I needed, but now it's clear:&lt;br&gt;
This is just the start. And it gets expensive on all fronts from here. &lt;/p&gt;

&lt;p&gt;Don't get me wrong. I still don't care that much about monetizing all this effort. But it's clear that without doing so, it will just die.&lt;/p&gt;

&lt;p&gt;It’s been clear for a while that my ‘2–3 month’ estimate was a severe miscalculation. &lt;/p&gt;

&lt;p&gt;And now what?&lt;br&gt;
What’s the lesson here?&lt;br&gt;
More crucially — what would be different if I had known the game I was playing?&lt;/p&gt;

</description>
      <category>startup</category>
      <category>ai</category>
      <category>llms</category>
      <category>saas</category>
    </item>
  </channel>
</rss>
