<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: HeytalePazguato</title>
    <description>The latest articles on DEV Community by HeytalePazguato (@heytalepazguato).</description>
    <link>https://dev.to/heytalepazguato</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3893398%2F5340529f-0a61-408f-b408-83d4cc715063.png</url>
      <title>DEV Community: HeytalePazguato</title>
      <link>https://dev.to/heytalepazguato</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/heytalepazguato"/>
    <language>en</language>
    <item>
      <title>Deterministic by design: code review without an LLM</title>
      <dc:creator>HeytalePazguato</dc:creator>
      <pubDate>Tue, 26 May 2026 13:00:00 +0000</pubDate>
      <link>https://dev.to/heytalepazguato/deterministic-by-design-code-review-without-an-llm-e27</link>
      <guid>https://dev.to/heytalepazguato/deterministic-by-design-code-review-without-an-llm-e27</guid>
      <description>&lt;p&gt;Every code review tool launched in the last two years seems to lead with the same word: AI. Point a model at a diff, get back prose about what might be wrong. For a lot of code, that is genuinely useful.&lt;/p&gt;

&lt;p&gt;I built a code review tool recently, and I deliberately left the LLM out. Not because I dislike them, I use them daily, but because the code I was targeting has a property that makes a non-deterministic reviewer the wrong tool: it runs machines, and a wrong or inconsistent answer has a physical cost.&lt;/p&gt;

&lt;p&gt;This is about that decision, why determinism mattered more than fluency for this case, and where I think an LLM still earns a place.&lt;/p&gt;

&lt;h2&gt;
  
  
  The case study: industrial control code
&lt;/h2&gt;

&lt;p&gt;The tool, &lt;a href="https://github.com/HeytalePazguato/plc-st-review" rel="noopener noreferrer"&gt;plc-st-review&lt;/a&gt;, reviews IEC 61131-3 Structured Text. That is the language in which a large share of the world's factories, water plants, and process lines are programmed. A bug here is not a 500 on a web page. It is a conveyor that runs too fast, a safety interlock whose timeout quietly changed, or a pump that never starts.&lt;/p&gt;

&lt;p&gt;The famous extreme of this is Stuxnet. It quietly altered the PLC logic driving uranium enrichment centrifuges in Iran so they spun at damaging speeds, while replaying normal sensor readings back to the operators so nothing looked wrong. No explosion, just centrifuges tearing themselves apart over months. That was deliberate, state-built malware engineered to hide itself, so to be clear, no linter would have caught it. But you do not need a nation-state attacker to get the physical version of a wrong number. A timer preset typed as T#200ms instead of T#2s in an ordinary change does it too, and that is exactly the kind of thing a code review is supposed to catch and routinely misses.&lt;/p&gt;

&lt;p&gt;You do not need to know Structured Text to follow the argument. The point is the constraint: this is code where "probably fine" is not an acceptable review result, and where the same input has to produce the same answer every single time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why determinism beats fluency here
&lt;/h2&gt;

&lt;p&gt;A linter you gate a CI pipeline on makes a promise: the same code produces the same findings, today and in six months, on my machine and on the build server. That promise is what lets a team say "the build is red, the merge is blocked" and trust it.&lt;/p&gt;

&lt;p&gt;An LLM reviewer cannot make that promise. The same diff can produce different output across runs. It can hallucinate a problem that is not there, or miss one that is. Temperature, model version, and context window all move the result. For an exploratory review, that is a fine trade. For a merge gate on safety-relevant code, it is disqualifying, because a gate that sometimes blocks and sometimes does not is not a gate.&lt;/p&gt;

&lt;p&gt;Determinism bought me four things that matter more than natural language here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reproducibility.&lt;/strong&gt; Every finding is a pure function of the parse tree. Run it a thousand times, get the same result a thousand times. CI can depend on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auditability.&lt;/strong&gt; When the tool flags something, it points to a named rule and the exact node that triggered it. In a regulated environment, someone will eventually ask, "Why did this fail?" "A rule named TIMER_VALUE_CHANGED fired because the PT went from T#2s to T#200ms" is an answer. "The model felt it looked risky" is not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No data leaving the building.&lt;/strong&gt; Industrial shops are, correctly, paranoid about shipping control code to a third-party API. A tool that parses locally and calls nothing external clears that bar without a procurement fight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost and latency that round to zero.&lt;/strong&gt; It parses and walks a tree. No tokens, no rate limits, no per-review bill. It runs on every push without anyone watching the meter.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it actually works
&lt;/h2&gt;

&lt;p&gt;There is no magic, which is the point. The pipeline is boring on purpose:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Parse each &lt;code&gt;.st&lt;/code&gt; file into a syntax tree with a &lt;a href="https://github.com/HeytalePazguato/tree-sitter-iec61131-3-st" rel="noopener noreferrer"&gt;tree-sitter grammar&lt;/a&gt;. Real parsing, not regex on text.&lt;/li&gt;
&lt;li&gt;Build a symbol table per revision: every program unit and its parameter signature, global variables, enums, timer instances, call sites, CASE statements.&lt;/li&gt;
&lt;li&gt;Hand that structured model to each check. A check is a small, self-contained function that looks at the tree and the symbol table and returns findings.&lt;/li&gt;
&lt;li&gt;For pull request review, do all of the above for both the before and after versions of a change, and diff the two models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step is where it earns its keep. A single-revision analyzer can tell you a timer exists. Comparing two revisions tells you the timer's preset went from two seconds to two hundred milliseconds in this specific commit, ten times faster, which is exactly the kind of small typo that passes a visual review and trips a machine in production.&lt;/p&gt;
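
&lt;p&gt;To make the two-revision comparison concrete, here is a minimal sketch. It is not the tool's actual code: the &lt;code&gt;TimerPresets&lt;/code&gt; and &lt;code&gt;Finding&lt;/code&gt; shapes and both function names are assumptions for illustration, and the parser only handles single-unit TIME literals such as &lt;code&gt;T#2s&lt;/code&gt; or &lt;code&gt;T#200ms&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical shapes: the real tool's symbol table is not shown in this post.
type TimerPresets = { [instanceName: string]: string }; // e.g. { StartDelay: "T#2s" }
type Finding = { rule: string; node: string; message: string };

// Milliseconds per unit for the subset of TIME literals handled here.
const UNIT_MS: { [unit: string]: number } = { ms: 1, s: 1000, m: 60000, h: 3600000 };

// Parse a single-unit literal such as "T#2s" or "T#200ms".
// Compound literals like "T#1m30s" are out of scope for this sketch.
function parseTimeMs(literal: string): number {
  const match = /^T#(\d+(?:\.\d+)?)(ms|s|m|h)$/i.exec(literal.trim());
  if (match === null) {
    throw new Error(`unsupported TIME literal: ${literal}`);
  }
  return Number(match[1]) * UNIT_MS[match[2].toLowerCase()];
}

// Compare presets between two revisions and report every changed value.
function diffTimerPresets(before: TimerPresets, after: TimerPresets): Finding[] {
  const findings: Finding[] = [];
  // Sort names so output order never depends on object insertion order.
  for (const name of Object.keys(after).sort()) {
    const oldLiteral = before[name];
    if (oldLiteral === undefined) continue; // a new timer, not a changed one
    const newLiteral = after[name];
    if (parseTimeMs(oldLiteral) !== parseTimeMs(newLiteral)) {
      findings.push({
        rule: "TIMER_VALUE_CHANGED",
        node: name,
        message: `PT went from ${oldLiteral} to ${newLiteral}`,
      });
    }
  }
  return findings;
}

console.log(diffTimerPresets({ StartDelay: "T#2s" }, { StartDelay: "T#200ms" }));
```

&lt;p&gt;Because the comparison is on parsed durations, rewriting &lt;code&gt;T#2s&lt;/code&gt; as &lt;code&gt;T#2000ms&lt;/code&gt; produces no finding, while the tenfold change does.&lt;/p&gt;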

&lt;p&gt;A few more examples of what falls out of having a real model instead of text matching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A function block instance whose outputs you read but that nothing ever calls, so you are reading stale values.&lt;/li&gt;
&lt;li&gt;A literal array index outside the declared bounds.&lt;/li&gt;
&lt;li&gt;A constant whose name starts with &lt;code&gt;SAFETY_&lt;/code&gt; whose value changed, flagged at a higher severity because of the prefix.&lt;/li&gt;
&lt;li&gt;A function that grew a required input while only some of its call sites were updated.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of those needs a language model. They need a correct model of the code and a rule.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the LLM does belong
&lt;/h2&gt;

&lt;p&gt;This is the part I want to be honest about, because "no AI" as a dogma is just the inverse mistake.&lt;/p&gt;

&lt;p&gt;There is one place an LLM clearly helps: explaining a finding to someone who is not a domain expert. A junior engineer reading &lt;code&gt;EDGE_TRIG_REUSED&lt;/code&gt; may not know why feeding one R_TRIG instance from two different clock expressions is a problem. A model is great at turning a terse, correct finding into a paragraph of plain English.&lt;/p&gt;

&lt;p&gt;So the design rule I settled on is: the LLM never originates a finding. It only paraphrases one that the deterministic engine has already produced and grounded in a specific node. Determinism remains the source of truth; the model is an optional translation layer on top. That keeps the gate trustworthy while still making the output approachable. It is on the roadmap as a strictly additive &lt;code&gt;--explain&lt;/code&gt; flag, off by default, never in the path that decides pass or fail.&lt;/p&gt;
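
&lt;p&gt;One way to picture that boundary in code, as a sketch only: the &lt;code&gt;--explain&lt;/code&gt; flag is on the roadmap rather than shipped, so every name below (&lt;code&gt;Explainer&lt;/code&gt;, &lt;code&gt;gatePasses&lt;/code&gt;, &lt;code&gt;report&lt;/code&gt;) is hypothetical. The point is structural: the pass/fail value is computed from the findings alone, and the explainer can only decorate text.&lt;/p&gt;

```typescript
type Finding = { rule: string; node: string; severity: "error" | "warning" };

// Hypothetical explainer signature; a real one would call a model.
type Explainer = (finding: Finding) => string;

// The gate looks only at deterministic findings.
function gatePasses(findings: Finding[]): boolean {
  return findings.every((f) => f.severity !== "error");
}

// The explainer is optional and never feeds back into `pass`.
function report(findings: Finding[], explain?: Explainer) {
  const lines = findings.map((f) => {
    const base = `${f.rule} at ${f.node}`;
    return explain ? `${base}: ${explain(f)}` : base;
  });
  return { pass: gatePasses(findings), lines };
}
```

&lt;p&gt;With this shape, a wrong or missing explanation changes the wording of a line but cannot flip the gate.&lt;/p&gt;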

&lt;p&gt;That boundary, the model can explain but never decide, is the whole thesis. Let the deterministic core own correctness and the merge gate. Let the LLM own fluency, where being occasionally wrong costs nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway, beyond PLCs
&lt;/h2&gt;

&lt;p&gt;The reflex right now is to reach for a model first and ask what it should not touch later. I think it is worth inverting that for any code where a review result gates something real: decide what must be deterministic and auditable, build that part without the model, and add the LLM only where a wrong answer is cheap.&lt;/p&gt;

&lt;p&gt;Not everything should be reviewed by an AI. Some things should be reviewed by a rule that gives the same answer every time, and can tell you exactly why.&lt;/p&gt;

&lt;p&gt;The tool is open source (MIT) if you want to see the checks: &lt;a href="https://github.com/HeytalePazguato/plc-st-review" rel="noopener noreferrer"&gt;https://github.com/HeytalePazguato/plc-st-review&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I would be curious where other people draw this line. What in your stack do you keep deterministic on purpose, and where have you let a model in?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>codequality</category>
      <category>devtools</category>
      <category>showdev</category>
    </item>
    <item>
      <title>A local-first project knowledge graph for AI coding agents</title>
      <dc:creator>HeytalePazguato</dc:creator>
      <pubDate>Tue, 05 May 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/heytalepazguato/a-local-first-project-knowledge-graph-for-ai-coding-agents-34b4</link>
      <guid>https://dev.to/heytalepazguato/a-local-first-project-knowledge-graph-for-ai-coding-agents-34b4</guid>
      <description>&lt;h2&gt;
  
  
  The problem worth solving
&lt;/h2&gt;

&lt;p&gt;AI coding agents are good at solving small problems and bad at situating them. Ask Claude Code to "rename &lt;code&gt;getUserSession&lt;/code&gt; and update every caller" in a 50,000-line codebase, and the answer depends on whether the agent can see the call graph or has to grep for it.&lt;/p&gt;

&lt;p&gt;Most tools fix this with cloud-synced code intelligence: Sourcegraph, Cody, Cursor's index, Continue's RAG. They all work, and they all impose the same trade-off: your code goes to a service, an account, and a continuous indexing job.&lt;/p&gt;

&lt;p&gt;I wanted code intelligence without that trade-off, so I built it as a local SQLite file with a single trigger and no background work. This post is a write-up of the design choices that made it possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "local-first project knowledge graph" actually means
&lt;/h2&gt;

&lt;p&gt;In Event Horizon v3, every workspace gets a graph stored at &lt;code&gt;&amp;lt;workspace&amp;gt;/.eh/graph.db&lt;/code&gt;. The graph holds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Functions, classes, interfaces, methods (nodes)&lt;/li&gt;
&lt;li&gt;Calls, imports, extends, implements (edges)&lt;/li&gt;
&lt;li&gt;Markdown documentation as nodes linked to source files&lt;/li&gt;
&lt;li&gt;Code-comment rationale (&lt;code&gt;// WHY:&lt;/code&gt;, TODO, FIXME, JSDoc/TSDoc, Python docstrings, C# XML doc) attached to the function or class they describe&lt;/li&gt;
&lt;li&gt;Agent activity as graph data: every completed task creates an &lt;code&gt;agent_activity&lt;/code&gt; node with &lt;code&gt;touched&lt;/code&gt;/&lt;code&gt;authored&lt;/code&gt; edges to the files it modified&lt;/li&gt;
&lt;li&gt;Shared knowledge entries as graph nodes with &lt;code&gt;references&lt;/code&gt; edges to the code they mention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The graph is built and refreshed only by user-invoked skills, never by background processes. &lt;code&gt;/eh:optimize-context&lt;/code&gt; builds or rebuilds it on demand. &lt;code&gt;/eh:orchestrate&lt;/code&gt; and &lt;code&gt;/eh:work-on-plan&lt;/code&gt; refresh it automatically when they finish, using the list of files their workers touched. There is no autoscan. There is no file watcher. Activation does not touch the disk. Every refresh is the consequence of a skill the user explicitly ran.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architectural choices that make this work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tree-sitter WASM, five languages, no native build
&lt;/h3&gt;

&lt;p&gt;Code structure extraction runs through tree-sitter compiled to WebAssembly. That adds about 3 MB of grammars to the VSIX, with no &lt;code&gt;node-gyp&lt;/code&gt; and no platform-specific binaries. The shipped grammars cover TypeScript, JavaScript, TSX, PHP, Python, and C#. PHP traits and enums are first-class. Python decorators, docstrings, and &lt;code&gt;# TODO&lt;/code&gt; / &lt;code&gt;# FIXME&lt;/code&gt; / &lt;code&gt;# WHY&lt;/code&gt; rationale comments land in the graph. C# records, structs, enums, and XML doc comments land too.&lt;/p&gt;

&lt;h3&gt;
  
  
  SHA256-based incremental skip
&lt;/h3&gt;

&lt;p&gt;Every file's content hash is stored alongside the graph nodes it produced. On rebuild, files whose hash hasn't changed since the last build are skipped entirely. A re-run of &lt;code&gt;/eh:optimize-context&lt;/code&gt; on a clean tree is close to free.&lt;/p&gt;
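
&lt;p&gt;A minimal sketch of the hash-based skip, assuming file contents are already in memory and the previous digests live in a plain object. The &lt;code&gt;HashIndex&lt;/code&gt; type and both function names are illustrative, not Event Horizon's internals; only &lt;code&gt;createHash&lt;/code&gt; from Node's &lt;code&gt;crypto&lt;/code&gt; module is a real API.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Hypothetical store: path -> hex digest recorded at the last build.
type HashIndex = { [path: string]: string | undefined };

function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Return the paths that need re-extraction and update the index in place.
function filesToRebuild(files: { [path: string]: string }, index: HashIndex): string[] {
  const changed: string[] = [];
  for (const path of Object.keys(files).sort()) {
    const digest = sha256Hex(files[path]);
    if (index[path] !== digest) {
      changed.push(path); // new file or changed content
      index[path] = digest;
    }
  }
  return changed;
}
```

&lt;p&gt;Calling &lt;code&gt;filesToRebuild&lt;/code&gt; twice with the same contents returns an empty list the second time, which is the "close to free" re-run described above.&lt;/p&gt;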

&lt;h3&gt;
  
  
  Vendor and minified file skipping
&lt;/h3&gt;

&lt;p&gt;The scanner refuses to index &lt;code&gt;vendor/&lt;/code&gt;, &lt;code&gt;__pycache__/&lt;/code&gt;, &lt;code&gt;.venv/&lt;/code&gt;, &lt;code&gt;bin/&lt;/code&gt;, &lt;code&gt;obj/&lt;/code&gt;, &lt;code&gt;target/&lt;/code&gt;, &lt;code&gt;*.min.js&lt;/code&gt;, &lt;code&gt;*.bundle.js&lt;/code&gt;, &lt;code&gt;*.designer.cs&lt;/code&gt;, and a handful of similar patterns. There is also a "first non-empty line longer than 1000 characters" check that catches inline-bundled vendor scripts that don't follow naming conventions. This drops graph node count by 50 to 80 percent on Laravel, Symfony, and .NET projects, where the &lt;code&gt;vendor/&lt;/code&gt; and &lt;code&gt;bin/&lt;/code&gt; folders are usually larger than the actual source.&lt;/p&gt;
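
&lt;p&gt;The skip rules can be sketched as a single predicate. This is an illustration under assumptions, not the shipped scanner: the lists contain only the patterns named above (the real set has more), paths are assumed to use forward slashes, and lowercasing the path before the suffix check is a guess at how files like &lt;code&gt;Form1.Designer.cs&lt;/code&gt; would be handled.&lt;/p&gt;

```typescript
const SKIPPED_DIRS = ["vendor", "__pycache__", ".venv", "bin", "obj", "target"];
const SKIPPED_SUFFIXES = [".min.js", ".bundle.js", ".designer.cs"];
const MAX_FIRST_LINE = 1000; // characters

function shouldSkip(path: string, content: string): boolean {
  const segments = path.split("/");
  // Any directory segment on the deny list excludes the file.
  if (segments.slice(0, -1).some((dir) => SKIPPED_DIRS.includes(dir))) return true;

  // Assumption: suffix matching is case-insensitive.
  const lower = path.toLowerCase();
  if (SKIPPED_SUFFIXES.some((suffix) => lower.endsWith(suffix))) return true;

  // Bundled scripts tend to put everything on one very long line.
  const firstLine = content.split(/\r?\n/).find((line) => line.trim().length > 0);
  return firstLine !== undefined ? firstLine.length > MAX_FIRST_LINE : false;
}
```

&lt;p&gt;Checking directory segments rather than substrings keeps a source file such as &lt;code&gt;src/bin.ts&lt;/code&gt; in the graph while still dropping everything under &lt;code&gt;bin/&lt;/code&gt;.&lt;/p&gt;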

&lt;h3&gt;
  
  
  Provenance on every inferred edge
&lt;/h3&gt;

&lt;p&gt;I haven't seen this in any other open-source code-intelligence tool. Every edge in the graph carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A provenance tag: EXTRACTED (deterministic from AST), INFERRED (heuristic), AMBIGUOUS (multiple resolutions possible)&lt;/li&gt;
&lt;li&gt;A confidence score (0 to 1)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When an agent queries the graph and reads a result, it can decide how much to trust an edge. An EXTRACTED 0.99 callee is reliable. An AMBIGUOUS 0.4 callee is a hint. The agent can act on the hint or ask for more context.&lt;/p&gt;
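
&lt;p&gt;As a sketch of how a consumer might use those two fields: the &lt;code&gt;Edge&lt;/code&gt; shape and the trust policy below are assumptions for illustration, since the post does not show the real schema. Only the three provenance tags and the 0-to-1 confidence range come from the description above.&lt;/p&gt;

```typescript
type Provenance = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

// Hypothetical edge shape; field names other than the tags are illustrative.
type Edge = { from: string; to: string; kind: string; provenance: Provenance; confidence: number };

// Hypothetical policy: trust AST-derived edges outright,
// and anything else only at or above a confidence threshold.
function partitionEdges(edges: Edge[], minConfidence: number) {
  const trusted: Edge[] = [];
  const hints: Edge[] = [];
  for (const edge of edges) {
    const reliable = edge.provenance === "EXTRACTED" || edge.confidence >= minConfidence;
    (reliable ? trusted : hints).push(edge);
  }
  return { trusted, hints };
}
```

&lt;p&gt;With a threshold of 0.8, the EXTRACTED 0.99 callee lands in &lt;code&gt;trusted&lt;/code&gt; and the AMBIGUOUS 0.4 callee lands in &lt;code&gt;hints&lt;/code&gt;.&lt;/p&gt;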

&lt;h3&gt;
  
  
  Shrink-guard
&lt;/h3&gt;

&lt;p&gt;There's a small but practical guard in the extractor: if a rebuild would delete more than 50 percent of a file's prior nodes, the rebuild is rejected. This protects against extractor regressions silently shrinking the graph during an upgrade.&lt;/p&gt;
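
&lt;p&gt;The guard fits in a few lines. This sketch assumes nodes are identified by stable string IDs per file; &lt;code&gt;acceptRebuild&lt;/code&gt; and &lt;code&gt;MAX_SHRINK_RATIO&lt;/code&gt; are illustrative names, not the extractor's own.&lt;/p&gt;

```typescript
const MAX_SHRINK_RATIO = 0.5;

// Returns true when the rebuild of one file should be accepted.
function acceptRebuild(priorNodeIds: string[], nextNodeIds: string[]): boolean {
  if (priorNodeIds.length === 0) return true; // nothing to protect yet
  const next = new Set(nextNodeIds);
  const deleted = priorNodeIds.filter((id) => !next.has(id)).length;
  // Reject only when strictly more than half of the prior nodes would vanish.
  return MAX_SHRINK_RATIO >= deleted / priorNodeIds.length;
}
```

&lt;p&gt;Losing exactly half of a file's nodes still passes; losing three out of four does not.&lt;/p&gt;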

&lt;h2&gt;
  
  
  What runs, when
&lt;/h2&gt;

&lt;p&gt;The full lifecycle of the graph in Event Horizon v3 is small enough to fit in two paragraphs.&lt;/p&gt;

&lt;p&gt;You open VS Code, the extension activates, and nothing happens on disk. You ask Claude Code (or OpenCode, or Copilot, or Cursor, all four are supported) to run &lt;code&gt;/eh:optimize-context&lt;/code&gt; for a task. The skill builds or refreshes the graph, hands the agent the relevant slice of nodes and edges, and the agent uses the slice as context. When the agent finishes the task and emits &lt;code&gt;task.complete&lt;/code&gt;, an &lt;code&gt;agent_activity&lt;/code&gt; node is added with &lt;code&gt;touched&lt;/code&gt; edges to every file it modified. When you run &lt;code&gt;/eh:orchestrate&lt;/code&gt; or &lt;code&gt;/eh:work-on-plan&lt;/code&gt; to coordinate multiple workers, the orchestration tracks every file its workers touched, and refreshes the graph against that list automatically before reporting its summary. No need to re-run &lt;code&gt;/eh:optimize-context&lt;/code&gt; after every plan; the graph reflects reality as soon as the orchestrator finishes.&lt;/p&gt;

&lt;p&gt;No background jobs. No autoscan. No telemetry. No outbound LLM calls from Event Horizon itself; agents that opt into LLM-based concept extraction (&lt;code&gt;eh_extract_concepts&lt;/code&gt;) spend their own tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying the graph
&lt;/h2&gt;

&lt;p&gt;Five MCP tools wrap the graph for agent use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;eh_query_graph&lt;/code&gt; does search, callers, callees, neighbors, shortest path, explain, and recent activity.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eh_extract_concepts&lt;/code&gt; runs an opt-in LLM extraction pass when the agent wants higher-level concepts on top of the AST.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eh_build_graph&lt;/code&gt; triggers a manual rebuild from the agent side.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eh_curate_context&lt;/code&gt; selects a task-aware slice of the graph that fits within a token budget.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;eh_rescan_files&lt;/code&gt; takes a path list and re-extracts only those files, runs the resolution pass once, and returns a scan summary. This is what powers the orchestrate-end auto-refresh, and it is also available to any agent that needs a targeted refresh after writing files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;eh_curate_context&lt;/code&gt; is the one that pays for everything else. It is the difference between an agent asking "show me everything related to authentication" and getting a 200,000-token dump, versus asking the same question and getting a 4,000-token slice that names the right functions, the right callers, and the relevant rationale comments.&lt;/p&gt;
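
&lt;p&gt;The post does not describe how &lt;code&gt;eh_curate_context&lt;/code&gt; chooses its slice, so the following is only one plausible shape for budgeted selection: a greedy pass over candidates ranked by a relevance score. The &lt;code&gt;Candidate&lt;/code&gt; type, the scores, and the &lt;code&gt;curate&lt;/code&gt; function are all assumptions.&lt;/p&gt;

```typescript
// Hypothetical candidate: a graph node with a precomputed relevance and token cost.
type Candidate = { id: string; relevance: number; tokens: number };

function curate(candidates: Candidate[], budget: number): Candidate[] {
  // Highest relevance first; ties broken by id so the result is stable.
  const ranked = [...candidates].sort(
    (a, b) => b.relevance - a.relevance || a.id.localeCompare(b.id),
  );
  const slice: Candidate[] = [];
  let used = 0;
  for (const candidate of ranked) {
    // Skip anything that would overflow the budget, but keep scanning
    // for smaller items that still fit.
    if (budget >= used + candidate.tokens) {
      slice.push(candidate);
      used += candidate.tokens;
    }
  }
  return slice;
}
```

&lt;p&gt;Greedy selection is not optimal in the knapsack sense, but it is deterministic and cheap, which matches the rest of the design.&lt;/p&gt;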

&lt;h2&gt;
  
  
  Visualization
&lt;/h2&gt;

&lt;p&gt;Like every other graph tool, this one has a canvas. Unlike most of them, the canvas is in a VS Code webview, not a browser tab on a remote service. The Knowledge tab renders rounded-square nodes (color-coded by type), straight edges, and soft cyan glow halos on a dark blueprint grid background. Force-directed initial layout. Click a node to open a 320 px detail drawer with callers, callees, references, rationale, recent agent activity, and a "Reveal in editor" button that jumps to the source file. Pan with mouse drag, zoom with wheel.&lt;/p&gt;

&lt;p&gt;The webview hydrates on connect, so reopening the panel shows the existing graph immediately. It re-fetches automatically whenever a build or refresh finishes, whether triggered by &lt;code&gt;/eh:optimize-context&lt;/code&gt; or by an orchestration ending.&lt;/p&gt;

&lt;h2&gt;
  
  
  The reasoning behind "no autoscan"
&lt;/h2&gt;

&lt;p&gt;I want to be honest about why the graph builds only when you ask. Background indexing is the normal pattern. JetBrains' Indexer, VS Code's reference indexes, and Sourcegraph's batch jobs all run continuously. The trade-off is that you pay for activity you didn't request: CPU cycles, disk writes, sometimes telemetry.&lt;/p&gt;

&lt;p&gt;For this tool, the cost-benefit is different. The graph isn't there to power autocomplete; it is there to give an AI agent context for a specific task. The cadence of "build the graph" is the same as the cadence of "I am starting a non-trivial task". That is a few times a day, not a thousand times a day. Coupling the build to the slash command means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predictable resource use: zero CPU until you ask.&lt;/li&gt;
&lt;li&gt;The graph reflects an explicit moment in time: the moment you decided to start a task. No drift between what the agent saw and what the codebase looked like five minutes later.&lt;/li&gt;
&lt;li&gt;One graph. One rebuild. One file at &lt;code&gt;&amp;lt;workspace&amp;gt;/.eh/graph.db&lt;/code&gt;. Easy to reason about the state.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What this design gives up
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No real-time updates.&lt;/strong&gt; The graph reflects the moment of the last build or refresh. The orchestrate-end auto-refresh covers the most common drift case (a long-running plan that touched many files), but a single agent editing files outside an orchestration still sees a stale view until the next explicit rebuild.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No cross-machine sharing.&lt;/strong&gt; The graph file lives on your laptop. Teams that want a shared code-intelligence backend need a server. There is no way around that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tied to tree-sitter coverage.&lt;/strong&gt; Languages without a tree-sitter grammar in the shipped set (Go, Java, Ruby, Rust) are not yet in the graph. The dispatcher is per-language, so adding a grammar is a few hundred lines, but it has to be done per language.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are real limits. For a solo developer running 3 to 5 AI agents on their own machine, none of them dominate. For a 50-person engineering org, several do.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Most code-intelligence tools assume "constant background work" is the price of context. For a single developer giving an AI agent context to do a task, it isn't. A SQLite file, a one-shot extractor, and a slash command cover the actual use case. Activation doesn't touch the disk. The graph builds only when you ask. The agent gets a curated slice via MCP. Nothing leaves the laptop unless an agent you opted into makes its own LLM call.&lt;/p&gt;

&lt;p&gt;If that architectural stance interests you, Event Horizon is open source and on the VS Code Marketplace. v3 ships the graph and the orchestrate-end auto-refresh that keeps it current as your agents work, without ever installing a file watcher. Star the repo if you want to follow the next pieces (more languages, smarter slicing, agent-driven graph mutations).&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt; Install from the &lt;a href="https://marketplace.visualstudio.com/items?itemName=HeytalePazguato.event-horizon-vscode" rel="noopener noreferrer"&gt;VS Code Marketplace&lt;/a&gt;, or &lt;a href="https://open-vsx.org/extension/HeytalePazguato/event-horizon-vscode" rel="noopener noreferrer"&gt;Open VSX&lt;/a&gt; for Cursor, VSCodium, and Windsurf.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/HeytalePazguato/event-horizon" rel="noopener noreferrer"&gt;github.com/HeytalePazguato/event-horizon&lt;/a&gt; (MIT)&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vscode</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>A zero-infrastructure architecture for coordinating multiple AI coding agents</title>
      <dc:creator>HeytalePazguato</dc:creator>
      <pubDate>Thu, 23 Apr 2026 12:00:00 +0000</pubDate>
      <link>https://dev.to/heytalepazguato/a-zero-infrastructure-architecture-for-coordinating-multiple-ai-coding-agents-2dg7</link>
      <guid>https://dev.to/heytalepazguato/a-zero-infrastructure-architecture-for-coordinating-multiple-ai-coding-agents-2dg7</guid>
      <description>&lt;h2&gt;
  
  
  The question that started it
&lt;/h2&gt;

&lt;p&gt;A few months ago, I asked Claude a genuinely idle question: if it could pick a visual for itself, for how it works, how it thinks, how it collaborates with other AI agents, what would it choose?&lt;/p&gt;

&lt;p&gt;Its answer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Each agent is a planet, a massive entity that consumes energy, emits output, and exerts gravitational influence. Tasks orbit as moons. Data flows as ships. At the center, a black hole where completed work collapses. One agent is a lonely planet. Five agents become a solar system.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So I built it. A VS Code extension that rendered every AI coding agent as a planet, data transfers as ships, and completed work spiraling into a black hole. It was pretty. It was cosmetic. It did not save me from the thing that happened next.&lt;/p&gt;

&lt;h2&gt;
  
  
  The moment it broke
&lt;/h2&gt;

&lt;p&gt;Three Claude Code sessions, same repo. One was building the REST API, one was writing tests, and one was updating docs. I was pleased with myself: look at me, parallelizing AI.&lt;/p&gt;

&lt;p&gt;Twenty minutes in, the build broke. I opened &lt;code&gt;server.ts&lt;/code&gt; and saw that session #2 had overwritten session #1's middleware. Neither of them knew. The tests had been written against the old shape; the docs were describing something that no longer existed. I untangled the mess, lost the work, and started over.&lt;/p&gt;

&lt;p&gt;Then I did it again two days later with a different combination of agents.&lt;/p&gt;

&lt;p&gt;That's when I went looking for a multi-agent coordination tool. What I found was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tools that required Docker + Postgres + a dashboard account&lt;/li&gt;
&lt;li&gt;Tools tied to one agent vendor's cloud&lt;/li&gt;
&lt;li&gt;Handwritten scripts that used git worktrees and prayer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of them fit the real shape of the problem, which was small: I had three agents running on my own machine, they needed to not step on each other, and I needed to see what was happening. That's it.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Event Horizon&lt;/strong&gt;, a VS Code extension that does multi-agent orchestration without any of the infrastructure tax.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "orchestration" actually requires
&lt;/h2&gt;

&lt;p&gt;When I sat down to list the primitives, it was shorter than I expected:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A shared source of truth&lt;/strong&gt;, so agents know what's planned and what's done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A way to prevent collisions&lt;/strong&gt;, so two agents don't write the same file at the same time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A way to communicate&lt;/strong&gt;, so an agent can tell the next one, "I finished, here's what you need to know."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visibility&lt;/strong&gt;, so the human can see what the team is doing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A way to spawn new agents&lt;/strong&gt;, so one agent can delegate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A database would give me (1). A message queue would give me (3). A scheduler would give me (5). None of that was actually necessary. I'll show you what I did instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  (1) Shared source of truth, a markdown file
&lt;/h3&gt;

&lt;p&gt;Event Horizon's plans are just markdown. Here's a real one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Auth overhaul&lt;/span&gt;

&lt;span class="gu"&gt;## File Map&lt;/span&gt;
| File | Action | Responsibility |
|------|--------|----------------|
| &lt;span class="sb"&gt;`src/auth/session.ts`&lt;/span&gt; | Create | Token rotation logic |
| &lt;span class="sb"&gt;`src/auth/middleware.ts`&lt;/span&gt; | Modify | Wire in session.ts |
| &lt;span class="sb"&gt;`tests/auth/session.test.ts`&lt;/span&gt; | Create | Unit tests |

&lt;span class="gu"&gt;## Phase A, implementation&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; [ ] 1.1 Session rotation [role: implementer]
&lt;span class="p"&gt;  -&lt;/span&gt; &lt;span class="gs"&gt;**Files**&lt;/span&gt;: &lt;span class="sb"&gt;`src/auth/session.ts`&lt;/span&gt; (create)
&lt;span class="p"&gt;  -&lt;/span&gt; &lt;span class="gs"&gt;**Do**&lt;/span&gt;: implement &lt;span class="sb"&gt;`rotateSession(userId, oldToken)`&lt;/span&gt;
&lt;span class="p"&gt;  -&lt;/span&gt; &lt;span class="gs"&gt;**Accept**&lt;/span&gt;: returns new token, invalidates old, writes audit log
&lt;span class="p"&gt;  -&lt;/span&gt; &lt;span class="gs"&gt;**Verify**&lt;/span&gt;: &lt;span class="sb"&gt;`pnpm test src/auth/session.test.ts`&lt;/span&gt;
  &lt;span class="c"&gt;&amp;lt;!-- complexity: medium --&amp;gt;&lt;/span&gt;
  &lt;span class="c"&gt;&amp;lt;!-- model: sonnet --&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; [ ] 1.2 Middleware wiring [role: implementer]
&lt;span class="p"&gt;  -&lt;/span&gt; depends: 1.1
&lt;span class="p"&gt;  -&lt;/span&gt; &lt;span class="gs"&gt;**Files**&lt;/span&gt;: &lt;span class="sb"&gt;`src/auth/middleware.ts`&lt;/span&gt; (modify lines ~40-80)
  ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents claim tasks by making an MCP tool call (&lt;code&gt;eh_claim_task&lt;/code&gt;). The file lives in the repo. You diff it. You merge it. You roll back. It survives VS Code restarts because it's a file on disk, and it survives company migrations because it's 80 lines of plain text.&lt;/p&gt;

&lt;p&gt;A task database would give me structured queries. I don't need structured queries; I need something a human can read at any time without opening a dashboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  (2) Collision prevention, a local HTTP call
&lt;/h3&gt;

&lt;p&gt;Agents acquire locks on files before they write. The MCP tool call is &lt;code&gt;eh_acquire_lock&lt;/code&gt;. The implementation is about 60 lines of TypeScript, runs in a local HTTP server on port 28765, and returns in under 1ms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pseudocode of the core&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;acquireLock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;locks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agentId&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;agentId&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isExpired&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;heldBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agentId&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;locks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;acquiredAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the orchestrator can't get a lock, the task gets queued. If an agent terminates without releasing, the lock expires after 5 minutes. If you want full isolation, the extension will optionally spawn each agent in its own git worktree instead, and merge on completion.&lt;/p&gt;
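
&lt;p&gt;The expiry rule can be sketched as follows. This is a minimal illustration that assumes the five-minute window is measured from &lt;code&gt;acquiredAt&lt;/code&gt;. The names &lt;code&gt;FileLock&lt;/code&gt;, &lt;code&gt;LockTable&lt;/code&gt;, &lt;code&gt;LOCK_TTL_MS&lt;/code&gt;, and &lt;code&gt;sweepExpired&lt;/code&gt;, plus the injectable &lt;code&gt;now&lt;/code&gt; parameter, are assumptions made for the example, not the extension's actual code.&lt;/p&gt;

```typescript
// Hypothetical sketch: all names here are illustrative assumptions.
type FileLock = { agentId: string; acquiredAt: number };
type LockTable = { [filePath: string]: FileLock };

const LOCK_TTL_MS = 5 * 60 * 1000; // the 5-minute expiry described above

// A lock counts as expired once its age reaches the TTL.
function isExpired(lock: FileLock, now: number = Date.now()): boolean {
  return now - lock.acquiredAt >= LOCK_TTL_MS;
}

// Remove every expired lock and return the file paths that were freed,
// so queued tasks waiting on those files can be retried.
function sweepExpired(locks: LockTable, now: number = Date.now()): string[] {
  const freed: string[] = [];
  for (const filePath of Object.keys(locks)) {
    if (isExpired(locks[filePath], now)) {
      delete locks[filePath];
      freed.push(filePath);
    }
  }
  return freed;
}
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; explicitly keeps the check easy to test without waiting five minutes; in normal use the default of &lt;code&gt;Date.now()&lt;/code&gt; applies.&lt;/p&gt;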

&lt;p&gt;A distributed lock service would give me high availability across data centers. I don't have data centers. I have a laptop.&lt;/p&gt;

&lt;h3&gt;
  
  
  (3) Communication, a queue, in RAM
&lt;/h3&gt;

&lt;p&gt;Agents send each other messages via &lt;code&gt;eh_send_message&lt;/code&gt;. Messages sit in a typed queue in memory. Each agent polls its inbox via &lt;code&gt;eh_get_messages&lt;/code&gt; when it's between steps. Delivered-once semantics, because the producer and consumer are on the same machine.&lt;/p&gt;
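
&lt;p&gt;A minimal sketch of such an in-memory inbox, assuming one FIFO list per agent that is emptied on every poll. The &lt;code&gt;Mailbox&lt;/code&gt; class, the &lt;code&gt;AgentMessage&lt;/code&gt; shape, and the &lt;code&gt;kind&lt;/code&gt; values are hypothetical; the real tools behind &lt;code&gt;eh_send_message&lt;/code&gt; and &lt;code&gt;eh_get_messages&lt;/code&gt; may be structured differently.&lt;/p&gt;

```typescript
// Hypothetical sketch: the message shape and class name are illustrative assumptions.
type AgentMessage = {
  from: string;
  to: string;
  kind: "request" | "result" | "note";
  body: string;
};

class Mailbox {
  // One FIFO inbox per agent id, held only in process memory.
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  private inboxes: { [agentId: string]: AgentMessage[] } = Object.create(null);

  // Append a message to the recipient's inbox.
  send(message: AgentMessage): void {
    const inbox = this.inboxes[message.to] || [];
    inbox.push(message);
    this.inboxes[message.to] = inbox;
  }

  // Return all pending messages and clear the inbox,
  // so each message is handed out exactly once.
  drain(agentId: string): AgentMessage[] {
    const pending = this.inboxes[agentId] || [];
    delete this.inboxes[agentId];
    return pending;
  }
}
```

&lt;p&gt;In this sketch, a worker would call &lt;code&gt;drain&lt;/code&gt; with its own id between steps, which mirrors the polling pattern described above.&lt;/p&gt;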

&lt;p&gt;There's also shared knowledge, a key/value store with temporal validity (&lt;code&gt;validUntil&lt;/code&gt; timestamps), so stale context automatically expires. Backed by SQLite. Runs in the extension host. Never leaves the machine.&lt;/p&gt;
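
&lt;p&gt;The temporal-validity idea can be illustrated with a small in-memory stand-in. This sketch assumes entries are checked and dropped lazily on read; it does not use SQLite, and &lt;code&gt;KnowledgeStore&lt;/code&gt;, &lt;code&gt;put&lt;/code&gt;, &lt;code&gt;get&lt;/code&gt;, and &lt;code&gt;ttlMs&lt;/code&gt; are hypothetical names chosen for the example. Only &lt;code&gt;validUntil&lt;/code&gt; comes from the description above.&lt;/p&gt;

```typescript
// Hypothetical sketch: an in-memory stand-in for the SQLite-backed store.
type KnowledgeEntry = { value: string; validUntil: number };

class KnowledgeStore {
  // Object.create(null) avoids collisions with inherited keys such as "toString".
  private entries: { [key: string]: KnowledgeEntry } = Object.create(null);

  // Store a value that stays valid for ttlMs milliseconds counted from `now`.
  put(key: string, value: string, ttlMs: number, now: number = Date.now()): void {
    this.entries[key] = { value, validUntil: now + ttlMs };
  }

  // Return the value while it is still valid; an expired entry is removed on read.
  get(key: string, now: number = Date.now()): string | undefined {
    const entry = this.entries[key];
    if (entry === undefined) {
      return undefined;
    }
    if (now >= entry.validUntil) {
      delete this.entries[key];
      return undefined;
    }
    return entry.value;
  }
}
```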

&lt;h3&gt;
  
  
  (4) Visibility, a webview
&lt;/h3&gt;

&lt;p&gt;This is the part where I deviated from the "no infrastructure" pattern, but only a little. The extension ships a React + PixiJS webview that renders every agent as a planet in a cosmic system. Ships fly between cooperating agents when they share work. Lightning arcs appear between two planets when they've both tried to write to the same file.&lt;/p&gt;

&lt;p&gt;I thought the visualization was going to be the cute part. It turned out to be the &lt;strong&gt;most useful debugging tool I've ever built&lt;/strong&gt;. The first time two of my agents got into a lock contention loop, I could see it immediately: lightning arcs were firing every two seconds. Without the visualization, I would have stared at logs for half an hour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9i5s5lbypisyx6r7fasw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9i5s5lbypisyx6r7fasw.png" alt="Two AI agents colliding on App.tsx. The lightning arc is the lock service rejecting the second writer."&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  (5) Spawning, &lt;code&gt;child_process.spawn&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;When a plan is loaded, the agent that loaded it automatically becomes the orchestrator. It gets an elevated MCP tool: &lt;code&gt;eh_spawn_agent&lt;/code&gt;. The tool takes an agent type, a task assignment, and a working directory. Under the hood:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;term&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vscode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createTerminal&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`agent-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;shellPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;resolvedBin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// claude, opencode, cursor&lt;/span&gt;
  &lt;span class="na"&gt;shellArgs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nx"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The new agent runs in a visible VS Code terminal. You can watch what it's doing. You can &lt;code&gt;⌘+C | Ctrl+C&lt;/code&gt; it. You can type follow-ups if the orchestrator spawns it in interactive mode. There's no "hidden worker process"; every agent is a terminal you can see.&lt;/p&gt;

&lt;p&gt;This was a deliberate design choice. Early prototypes spawned agents as background processes and piped their output to a panel. It was technically cleaner but psychologically worse: users didn't trust agents they couldn't see. Visible terminals + planet visualizations + file-lock lightning = the team becomes legible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The orchestrator flow, in practice
&lt;/h2&gt;

&lt;p&gt;Here's what actually happens when you use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/eh:create-plan Build a REST API with auth, database layer, and tests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your current Claude session reads the prompt, scopes the work, writes a markdown plan, calls &lt;code&gt;eh_load_plan&lt;/code&gt;, and calls &lt;code&gt;eh_claim_orchestrator&lt;/code&gt;. It is now the orchestrator.&lt;/p&gt;

&lt;p&gt;Then it reads the plan, groups tasks by dependencies, and decides it needs three workers: an implementer, a tester, and a reviewer. It calls &lt;code&gt;eh_spawn_agent&lt;/code&gt; three times. Three new terminals open. Three planets appear next to the orchestrator star.&lt;/p&gt;

&lt;p&gt;Each worker calls &lt;code&gt;eh_claim_task&lt;/code&gt; with a task ID, claims a lock on the files it'll touch, does the work, marks the task done, and sends a message back to the orchestrator. If a task fails verification (the &lt;code&gt;**Verify:**&lt;/code&gt; command in the plan), the extension auto-retries with a more expensive model (haiku → sonnet → opus). If it still fails, the orchestrator gets a notification and decides what to do.&lt;/p&gt;
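
&lt;p&gt;The escalation step can be sketched roughly like this. It assumes the first attempt starts on the cheapest tier and that each failed verification moves one step up the ladder. &lt;code&gt;runWithEscalation&lt;/code&gt;, &lt;code&gt;runTask&lt;/code&gt;, &lt;code&gt;verify&lt;/code&gt;, and &lt;code&gt;MODEL_LADDER&lt;/code&gt; are hypothetical names, and the sketch is synchronous for brevity where a real implementation would be asynchronous.&lt;/p&gt;

```typescript
// Hypothetical sketch: `runTask` and `verify` are caller-supplied stand-ins,
// not extension APIs. Assumes the first attempt uses the cheapest tier.
const MODEL_LADDER = ["haiku", "sonnet", "opus"];

type EscalationResult = { ok: boolean; model: string; attempts: number };

function runWithEscalation(
  runTask: (model: string) => void,
  verify: () => boolean,
): EscalationResult {
  let attempts = 0;
  for (const model of MODEL_LADDER) {
    attempts += 1;
    runTask(model);
    // Stop at the first tier whose output passes the verify command.
    if (verify()) {
      return { ok: true, model, attempts };
    }
  }
  // Every tier failed: report back so the orchestrator can decide what to do.
  const lastModel = MODEL_LADDER[MODEL_LADDER.length - 1];
  return { ok: false, model: lastModel, attempts };
}
```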

&lt;p&gt;Meanwhile, a &lt;strong&gt;budget gauge&lt;/strong&gt; fills up as tokens are spent. A &lt;strong&gt;context fuel gauge&lt;/strong&gt; on each planet shows how close that agent is to its context window limit. A &lt;strong&gt;Cost Insights&lt;/strong&gt; panel shows cache-hit ratios, duplicate reads, and where the money is going.&lt;/p&gt;

&lt;p&gt;When the plan is done, you see a Kanban board with everything green, a cost total, and the commit history of each worker. The terminals are still there. You can inspect, kill, or keep working.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I didn't build
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the limits, because the pitch so far sounds too good.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not built:&lt;/strong&gt; cross-machine coordination. Event Horizon only works inside one VS Code window. If you want a team of humans sharing an agent team, you need something else. That's the legitimate use case for a server.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not built:&lt;/strong&gt; formal verification that the lock/queue/knowledge primitives are race-free at scale. They work well for 3–5 agents. I haven't tried 50. The design is local-machine-first, and I suspect you'd hit limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not built:&lt;/strong&gt; the visualization isn't free on CPU. Running it with 20 planets + heavy traffic uses a few percent CPU. Fine on a laptop. Might annoy a battery-paranoid user.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack + licensing
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core&lt;/strong&gt;: TypeScript, zero runtime deps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Renderer&lt;/strong&gt;: PixiJS v8&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI&lt;/strong&gt;: React + Zustand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt;: sql.js (SQLite as WASM), everything local, no native build&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IPC&lt;/strong&gt;: local HTTP (port 28765) + MCP over stdio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editors supported&lt;/strong&gt;: VS Code, Cursor, VSCodium, Windsurf, Gitpod, Eclipse Theia, Coder (one Open VSX publish reaches all of them)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT licensed. Code at &lt;a href="https://github.com/HeytalePazguato/event-horizon" rel="noopener noreferrer"&gt;github.com/HeytalePazguato/event-horizon&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway I keep coming back to
&lt;/h2&gt;

&lt;p&gt;The infrastructure tax (Docker, Postgres, accounts, and dashboards) wasn't there because multi-agent coordination is hard. It was there because the tools were designed for multi-team environments where those pieces had to exist anyway. When you solve for a single developer on a single machine, 90% of the "infrastructure" folds into a local HTTP server, a markdown file, and an MCP tool schema.&lt;/p&gt;

&lt;p&gt;I didn't want to run Postgres to coordinate three Claude instances. Turns out I didn't have to.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt; Install from the &lt;a href="https://marketplace.visualstudio.com/items?itemName=HeytalePazguato.event-horizon-vscode" rel="noopener noreferrer"&gt;VS Code Marketplace&lt;/a&gt; or &lt;a href="https://open-vsx.org/extension/HeytalePazguato/event-horizon-vscode" rel="noopener noreferrer"&gt;Open VSX&lt;/a&gt;. Ships with hooks for Claude Code, OpenCode, GitHub Copilot, and Cursor; mix and match freely.&lt;/p&gt;

&lt;p&gt;If this resonates, &lt;strong&gt;&lt;a href="https://github.com/HeytalePazguato/event-horizon" rel="noopener noreferrer"&gt;star the repo&lt;/a&gt;&lt;/strong&gt; so others can find it.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>vscode</category>
    </item>
  </channel>
</rss>
