<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ivan Peychev</title>
    <description>The latest articles on DEV Community by Ivan Peychev (@peycheff).</description>
    <link>https://dev.to/peycheff</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3963262%2F811279ce-8ab4-4d7f-9425-0fad64971fe9.jpeg</url>
      <title>DEV Community: Ivan Peychev</title>
      <link>https://dev.to/peycheff</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/peycheff"/>
    <language>en</language>
    <item>
      <title>Building HELM: from "AI agents are powerful" to "AI agents need an execution boundary"</title>
      <dc:creator>Ivan Peychev</dc:creator>
      <pubDate>Mon, 01 Jun 2026 19:27:13 +0000</pubDate>
      <link>https://dev.to/peycheff/building-helm-from-ai-agents-are-powerful-to-ai-agents-need-an-execution-boundary-313o</link>
      <guid>https://dev.to/peycheff/building-helm-from-ai-agents-are-powerful-to-ai-agents-need-an-execution-boundary-313o</guid>
      <description>&lt;p&gt;Hey DEV, first post here. My name is Ivan. I'm a founder based in Toronto, Canada, building open-source AI infrastructure under Mindburn Labs. I spend most of my time designing autonomous agent systems and thinking about what it actually means to deploy them safely. Glad to be here.&lt;/p&gt;

&lt;p&gt;I'm building HELM, an execution authority layer for AI agents.&lt;/p&gt;

&lt;p&gt;The simplest version of the idea is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Models propose. HELM governs execution. Every decision leaves proof.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sentence took a long time to arrive at.&lt;/p&gt;

&lt;p&gt;For a while, I thought the big opportunity in AI agents was better orchestration: better workflows, better chains, better planning, better memory, better autonomous loops. But the deeper I went, the more obvious the real bottleneck became.&lt;/p&gt;

&lt;p&gt;The hard problem is not just getting an agent to decide what to do.&lt;/p&gt;

&lt;p&gt;The hard problem is deciding whether it should be allowed to do it.&lt;/p&gt;

&lt;p&gt;Because the moment an AI agent can touch a real tool, the risk changes completely. It is no longer just "did the model hallucinate?" It becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can it delete a database?&lt;/li&gt;
&lt;li&gt;Can it send a customer email?&lt;/li&gt;
&lt;li&gt;Can it export private data?&lt;/li&gt;
&lt;li&gt;Can it issue a refund?&lt;/li&gt;
&lt;li&gt;Can it modify cloud permissions?&lt;/li&gt;
&lt;li&gt;Can it deploy code?&lt;/li&gt;
&lt;li&gt;Can it trigger a workflow that a human or business system will treat as real?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the boundary I became obsessed with.&lt;/p&gt;

&lt;h2&gt;The original itch&lt;/h2&gt;

&lt;p&gt;I started from a frustration that I think a lot of builders are running into now.&lt;/p&gt;

&lt;p&gt;AI agents are becoming easier to build. Every week there is a new framework, SDK, workflow engine, tool-calling protocol, hosted agent builder, or "AI employee" product.&lt;/p&gt;

&lt;p&gt;But when you actually try to put agents near production systems, the same uncomfortable question appears:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who is responsible for the side effect?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the agent proposes something, that is one thing. If the agent executes something, that is another.&lt;/p&gt;

&lt;p&gt;Most agent stacks blur those two things. The model reasons, plans, selects a tool, calls the tool, and the surrounding application tries to clean up the mess with logs, prompts, evals, dashboards, or human review.&lt;/p&gt;

&lt;p&gt;That felt backwards.&lt;/p&gt;

&lt;p&gt;I kept coming back to a simple systems principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stochastic systems can propose. Deterministic systems should authorize execution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That became the foundation for HELM.&lt;/p&gt;

&lt;h2&gt;What HELM is&lt;/h2&gt;

&lt;p&gt;HELM is not another agent framework.&lt;/p&gt;

&lt;p&gt;It is not trying to replace LangChain, CrewAI, OpenAI Agents, Claude tool use, MCP servers, workflow builders, or internal automation.&lt;/p&gt;

&lt;p&gt;HELM sits underneath those systems.&lt;/p&gt;

&lt;p&gt;The first product is &lt;strong&gt;HELM AI Kernel&lt;/strong&gt;, a local-first, OSS execution boundary for AI agents. It is designed to intercept tool calls and OpenAI-compatible requests before they become side effects, evaluate whether the action is allowed, and emit signed receipts and EvidencePacks that can be verified later.&lt;/p&gt;

&lt;p&gt;In practical terms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An agent proposes an action.&lt;/li&gt;
&lt;li&gt;HELM evaluates the action.&lt;/li&gt;
&lt;li&gt;HELM returns ALLOW, DENY, or ESCALATE.&lt;/li&gt;
&lt;li&gt;The decision is recorded.&lt;/li&gt;
&lt;li&gt;The proof can be reviewed later.&lt;/li&gt;
&lt;/ol&gt;
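
&lt;p&gt;The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not HELM's actual API: the &lt;code&gt;POLICY&lt;/code&gt; table, the &lt;code&gt;govern&lt;/code&gt; function, the tool names, and the receipt format are all assumptions made up for the example, and a shared-secret HMAC stands in for whatever signature scheme a real receipt would use.&lt;/p&gt;

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    ESCALATE = "ESCALATE"


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    args: dict


# Hypothetical static policy: tool name to verdict.
POLICY = {
    "read_ticket": Verdict.ALLOW,
    "issue_refund": Verdict.ESCALATE,
    "drop_database": Verdict.DENY,
}

# Demo-only secret; an assumption for this sketch, not a real key scheme.
SIGNING_KEY = b"demo-key-not-for-production"


def evaluate(action: ProposedAction) -> Verdict:
    """Deterministic lookup; any tool without a rule is denied."""
    return POLICY.get(action.tool, Verdict.DENY)


def _signature(body: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the receipt body."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def govern(action: ProposedAction) -> dict:
    """Evaluate a proposed action and return a signed receipt of the decision."""
    verdict = evaluate(action)
    body = {"tool": action.tool, "args": action.args, "verdict": verdict.value}
    return {**body, "signature": _signature(body)}


def verify_receipt(receipt: dict) -> bool:
    """Recompute the signature later to check the receipt was not altered."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    return hmac.compare_digest(_signature(body), receipt["signature"])
```

&lt;p&gt;In this sketch, &lt;code&gt;govern(ProposedAction("issue_refund", {"amount": 40}))&lt;/code&gt; returns a receipt whose verdict is ESCALATE, and editing any field of that receipt afterwards makes &lt;code&gt;verify_receipt&lt;/code&gt; return &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;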

&lt;p&gt;The second product path is &lt;strong&gt;HELM AI Enterprise Basic&lt;/strong&gt;, which turns the local boundary into team-scale control: shared workspaces, approvals, governed actions, receipts, policies, API access, and short-retention evidence.&lt;/p&gt;

&lt;p&gt;The larger commercial product is &lt;strong&gt;HELM AI Enterprise&lt;/strong&gt;, which I think of as a Company AI OS: it makes company state queryable, detects should-vs-is drift, generates executable specs, routes approved work through the HELM boundary, and writes closure evidence back into the proof system.&lt;/p&gt;

&lt;h2&gt;The insight that changed the product&lt;/h2&gt;

&lt;p&gt;The early temptation was to describe HELM as "AI governance."&lt;/p&gt;

&lt;p&gt;That was a mistake.&lt;/p&gt;

&lt;p&gt;Developers do not wake up wanting governance. Engineering teams do not want a dashboard that tells them, after the fact, that an agent did something dangerous.&lt;/p&gt;

&lt;p&gt;They want agents to move faster without giving them unchecked authority.&lt;/p&gt;

&lt;p&gt;So the framing became sharper:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HELM AI Kernel is a fail-closed execution firewall for AI agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A firewall is understandable. Fail-closed is understandable. Execution is concrete. Receipts are concrete. Verification is concrete.&lt;/p&gt;
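
&lt;p&gt;To make "fail-closed" concrete: if the policy check itself crashes or returns something unexpected, the action is denied rather than let through. Here is a minimal sketch, assuming a hypothetical &lt;code&gt;fail_closed&lt;/code&gt; wrapper and a made-up &lt;code&gt;flaky_policy&lt;/code&gt;; none of these names come from HELM.&lt;/p&gt;

```python
from typing import Callable

VALID_VERDICTS = ("ALLOW", "DENY", "ESCALATE")


def fail_closed(evaluate: Callable[[str, dict], str]) -> Callable[[str, dict], str]:
    """Wrap a policy check so any error or unknown answer becomes DENY."""

    def guarded(tool: str, args: dict) -> str:
        try:
            verdict = evaluate(tool, args)
        except Exception:
            # The policy engine itself failed: refuse instead of letting the call through.
            return "DENY"
        if verdict not in VALID_VERDICTS:
            # An unrecognized answer is treated the same as a failure.
            return "DENY"
        return verdict

    return guarded


def flaky_policy(tool: str, args: dict) -> str:
    """Hypothetical policy that crashes on tools it has no rule for."""
    rules = {"read_ticket": "ALLOW"}
    return rules[tool]  # raises KeyError for unlisted tools


check = fail_closed(flaky_policy)
```

&lt;p&gt;Here &lt;code&gt;check("read_ticket", {})&lt;/code&gt; returns ALLOW, while &lt;code&gt;check("drop_database", {})&lt;/code&gt; returns DENY because the underlying policy raised an error. A fail-open design would have let that second call proceed.&lt;/p&gt;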

&lt;p&gt;The goal is not "trust AI."&lt;/p&gt;

&lt;p&gt;The goal is: &lt;strong&gt;Do not trust your agents. Verify their execution.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;Why this matters now&lt;/h2&gt;

&lt;p&gt;The AI market is moving from generation to execution.&lt;/p&gt;

&lt;p&gt;The first wave was text, code, chat, summarization, and copilots.&lt;/p&gt;

&lt;p&gt;The next wave is agents that can actually do things: open PRs, change infrastructure, call APIs, move data, trigger business workflows, send messages, update CRMs, submit payments, and coordinate across tools.&lt;/p&gt;

&lt;p&gt;That shift creates a new infrastructure problem.&lt;/p&gt;

&lt;p&gt;When software becomes agentic, logs are not enough. Observability is not enough. Prompt rules are not enough. Requiring human approval for every action does not scale.&lt;/p&gt;

&lt;p&gt;You need an execution boundary. You need policy before the action, not only analysis after the action. You need proof that survives disputes, audits, incidents, and customer questions.&lt;/p&gt;

&lt;h2&gt;What I'm deliberately not building&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;HELM is not an agent orchestration platform. Orchestration decides what an agent should attempt. HELM decides what may execute.&lt;/li&gt;
&lt;li&gt;HELM is not a generic AI governance dashboard. Dashboards show things. HELM governs side effects.&lt;/li&gt;
&lt;li&gt;HELM is not an AGI operating system. That framing is vague and too far ahead of product reality.&lt;/li&gt;
&lt;li&gt;HELM is not Helm, the Kubernetes package manager. This is HELM AI Kernel by Mindburn Labs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What has been hardest&lt;/h2&gt;

&lt;p&gt;The hardest part has not been writing a policy engine or drawing architecture diagrams.&lt;/p&gt;

&lt;p&gt;The hardest part has been resisting category drift.&lt;/p&gt;

&lt;p&gt;AI makes it very easy to sound bigger than you are. The market does not need another grand AI claim. It needs a clean answer to a concrete problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I let this agent touch real systems without giving it unchecked power?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the product I want to build.&lt;/p&gt;

&lt;h2&gt;What I'm learning&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Developers respond better to security mechanisms than to governance language. "Execution firewall" lands. "AI governance platform" does not.&lt;/li&gt;
&lt;li&gt;Founders want velocity, not bureaucracy. The product has to feel like it helps ship faster.&lt;/li&gt;
&lt;li&gt;Proof is a product surface. Receipts, EvidencePacks, offline verification, and replay are not backend details. They are how customers build confidence.&lt;/li&gt;
&lt;li&gt;OSS cannot be a crippled teaser. HELM AI Kernel has to be useful on its own.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I'm looking for&lt;/h2&gt;

&lt;p&gt;Builders who are putting agents near real tools and feeling the discomfort. Especially if you are working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents that call internal APIs&lt;/li&gt;
&lt;li&gt;MCP tool servers&lt;/li&gt;
&lt;li&gt;AI coding agents with repo or CI access&lt;/li&gt;
&lt;li&gt;Agentic workflows around customer data&lt;/li&gt;
&lt;li&gt;Teams trying to move from prototype agents to production agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best conversations start with: "We want agents to do X, but we are scared they might do Y."&lt;/p&gt;

&lt;p&gt;That is exactly the boundary HELM is being built for.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/Mindburn-Labs/helm-ai-kernel" rel="noopener noreferrer"&gt;https://github.com/Mindburn-Labs/helm-ai-kernel&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Happy to answer questions or just hear what problems you're running into.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
