DEV Community: HeytalePazguato

Deterministic by design: code review without an LLM

HeytalePazguato — Tue, 26 May 2026 13:00:00 +0000

Every code review tool launched in the last two years seems to lead with the same word: AI. Point a model at a diff, get back prose about what might be wrong. For a lot of code, that is genuinely useful.

I built a code review tool recently, and I deliberately left the LLM out. Not because I dislike them, I use them daily, but because the code I was targeting has a property that makes a non-deterministic reviewer the wrong tool: it runs machines, and a wrong or inconsistent answer has a physical cost.

This is about that decision, why determinism mattered more than fluency for this case, and where I think an LLM still earns a place.

The case study: industrial control code

The tool, plc-st-review, reviews IEC 61131-3 Structured Text. That is the language in which a large share of the world's factories, water plants, and process lines are programmed. A bug here is not a 500 on a web page. It is a conveyor that runs too fast, a safety interlock whose timeout quietly changed, or a pump that never starts.

The famous extreme of this is Stuxnet. It quietly altered the PLC logic driving uranium enrichment centrifuges in Iran so they spun at damaging speeds, while replaying normal sensor readings back to the operators so nothing looked wrong. No explosion, just centrifuges tearing themselves apart over months. That was deliberate, state-built malware engineered to hide itself, so to be clear, no linter would have caught it. But you do not need a nation-state attacker to get the physical version of a wrong number. A timer preset typed as T#200ms instead of T#2s in an ordinary change does it too, and that is exactly the kind of thing a code review is supposed to catch and routinely misses.

You do not need to know Structured Text to follow the argument. The point is the constraint: this is code where "probably fine" is not an acceptable review result, and where the same input has to produce the same answer every single time.

Why determinism beats fluency here

A linter you gate a CI pipeline on makes a promise: the same code produces the same findings, today and in six months, on my machine and on the build server. That promise is what lets a team say "the build is red, the merge is blocked" and trust it.

An LLM reviewer cannot make that promise. The same diff can produce different output across runs. It can hallucinate a problem that is not there, or miss one that is. Temperature, model version, and context window all move the result. For an exploratory review, that is a fine trade. For a merge gate on safety-relevant code, it is disqualifying, because a gate that sometimes blocks and sometimes does not is not a gate.

Determinism bought me four things that matter more than natural language here:

Reproducibility. Every finding is a pure function of the parse tree. Run it a thousand times, get the same result a thousand times. CI can depend on it.

Auditability. When the tool flags something, it points to a named rule and the exact node that triggered it. In a regulated environment, someone will eventually ask, "Why did this fail?" "A rule named TIMER_VALUE_CHANGED fired because the PT went from T#2s to T#200ms" is an answer. "The model felt it looked risky" is not.

No data leaving the building. Industrial shops are, correctly, paranoid about shipping control code to a third-party API. A tool that parses locally and calls nothing external clears that bar without a procurement fight.

Cost and latency that round to zero. It parses and walks a tree. No tokens, no rate limits, no per-review bill. It runs on every push without anyone watching the meter.

How it actually works

There is no magic, which is the point. The pipeline is boring on purpose:

Parse each .st file into a syntax tree with a tree-sitter grammar. Real parsing, not regex on text.
Build a symbol table per revision: every program unit and its parameter signature, global variables, enums, timer instances, call sites, CASE statements.
Hand that structured model to each check. A check is a small, self-contained function that looks at the tree and the symbol table and returns findings.
For pull request review, do all of the above for both the before and after versions of a change, and diff the two models.

That last step is where it earns its keep. A single-revision analyzer can tell you a timer exists. Comparing two revisions tells you the timer's preset went from two seconds to two hundred milliseconds in this specific commit, ten times faster, which is exactly the kind of one-character typo that passes a visual review and trips a machine in production.
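To make the two-revision comparison concrete, here is a minimal sketch of a timer-preset check. It is an illustration under stated assumptions, not the tool's actual code: the `TimerPresets` map, `parseTimeLiteral`, and `diffTimerPresets` are hypothetical names, and the parser handles only single-unit literals such as T#2s or T#200ms.

```typescript
// Hypothetical, simplified model: timer instance name -> PT literal for one revision.
type TimerPresets = Map<string, string>;

type TimerFinding = {
  rule: "TIMER_VALUE_CHANGED";
  instance: string;
  beforeMs: number;
  afterMs: number;
};

const UNIT_MS: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };

// Assumption: only single-unit literals are supported. Real IEC 61131-3
// time literals can combine units (for example T#1m30s).
function parseTimeLiteral(literal: string): number {
  const match = /^T(?:IME)?#(\d+(?:\.\d+)?)(ms|s|m|h)$/i.exec(literal.trim());
  if (!match) throw new Error(`unsupported time literal: ${literal}`);
  const value = match[1] ?? "";
  const unit = (match[2] ?? "").toLowerCase();
  const factor = UNIT_MS[unit];
  if (factor === undefined) throw new Error(`unsupported unit in: ${literal}`);
  return Number(value) * factor;
}

// Compares presets by value, so T#2s and T#2000ms count as unchanged.
function diffTimerPresets(before: TimerPresets, after: TimerPresets): TimerFinding[] {
  const findings: TimerFinding[] = [];
  after.forEach((afterLiteral, instance) => {
    const beforeLiteral = before.get(instance);
    if (beforeLiteral === undefined) return; // new timer: out of scope for this rule
    const beforeMs = parseTimeLiteral(beforeLiteral);
    const afterMs = parseTimeLiteral(afterLiteral);
    if (beforeMs !== afterMs) {
      findings.push({ rule: "TIMER_VALUE_CHANGED", instance, beforeMs, afterMs });
    }
  });
  return findings;
}
```

Because the comparison is on parsed values rather than on text, a purely cosmetic rewrite of the literal produces no finding, while the T#2s to T#200ms typo does.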

A few more examples of what falls out of having a real model instead of text matching:

A function block instance whose outputs you read but that nothing ever calls, so you are reading stale values.
A literal array index outside the declared bounds.
A constant whose name starts with SAFETY_ whose value changed, flagged at a higher severity because of the prefix.
A function that grew a required input while only some of its call sites were updated.

None of those needs a language model. They need a correct model of the code and a rule.
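As one illustration of "a correct model and a rule", the last example above could look roughly like this. The shapes (`UnitSignature`, `CallSite`) and the rule name `MISSING_REQUIRED_INPUT` are made up for the sketch; the real symbol table is certainly richer.

```typescript
// Hypothetical slices of a symbol table.
type UnitSignature = { name: string; requiredInputs: string[] };
type CallSite = { callee: string; file: string; line: number; suppliedInputs: string[] };

type CallFinding = {
  rule: "MISSING_REQUIRED_INPUT"; // illustrative rule name
  callee: string;
  file: string;
  line: number;
  missing: string[];
};

// Flags every call site that omits an input its callee declares as required.
function checkCallSites(signatures: UnitSignature[], calls: CallSite[]): CallFinding[] {
  const byName = new Map<string, UnitSignature>();
  for (const sig of signatures) byName.set(sig.name, sig);

  const findings: CallFinding[] = [];
  for (const call of calls) {
    const sig = byName.get(call.callee);
    if (!sig) continue; // unknown callee: left to a different check
    const missing = sig.requiredInputs.filter((input) => !call.suppliedInputs.includes(input));
    if (missing.length > 0) {
      findings.push({
        rule: "MISSING_REQUIRED_INPUT",
        callee: call.callee,
        file: call.file,
        line: call.line,
        missing,
      });
    }
  }
  return findings;
}
```

The check is a pure function of its inputs, which is what makes the result reproducible across runs and machines.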

Where the LLM does belong

This is the part I want to be honest about, because "no AI" as a dogma is just the inverse mistake.

There is one place an LLM clearly helps: explaining a finding to someone who is not a domain expert. A junior engineer reading EDGE_TRIG_REUSED may not know why feeding one R_TRIG instance from two different clock expressions is a problem. A model is great at turning a terse, correct finding into a paragraph of plain English.

So the design rule I settled on is: the LLM never originates a finding. It only paraphrases one that the deterministic engine has already produced and grounded in a specific node. Determinism remains the source of truth; the model is an optional translation layer on top. That keeps the gate trustworthy while still making the output approachable. It is on the roadmap as a strictly additive --explain flag, off by default, never in the path that decides pass or fail.

That boundary, the model can explain but never decide, is the whole thesis. Let the deterministic core own correctness and the merge gate. Let the LLM own fluency, where being occasionally wrong costs nothing.
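A small sketch of that boundary, with everything in it hypothetical (the `ReviewFinding` shape, the "any error fails" gate rule, and the `Explainer` callback standing in for a model call, which in practice would be asynchronous):

```typescript
type ReviewFinding = { rule: string; severity: "error" | "warning"; message: string };

// Stand-in for an optional LLM call; synchronous here to keep the sketch short.
type Explainer = (finding: ReviewFinding) => string;

// The verdict depends only on the deterministic findings.
// Assumption for this sketch: any finding of severity "error" fails the gate.
function gate(findings: ReviewFinding[]): "pass" | "fail" {
  return findings.some((f) => f.severity === "error") ? "fail" : "pass";
}

function buildReport(findings: ReviewFinding[], explain?: Explainer) {
  const verdict = gate(findings); // decided before any explainer runs
  const lines: string[] = [];
  for (const f of findings) {
    lines.push(`[${f.severity}] ${f.rule}: ${f.message}`);
    if (!explain) continue;
    try {
      lines.push(`  explanation: ${explain(f)}`);
    } catch {
      // A failed explanation drops the extra text and nothing else.
    }
  }
  return { verdict, lines };
}
```

The explainer receives a finding and returns text. It has no way to add, remove, or re-rank findings, so it cannot change the verdict even when it fails.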

The takeaway, beyond PLCs

The reflex right now is to reach for a model first and ask what it should not touch later. I think it is worth inverting that for any code where a review result gates something real: decide what must be deterministic and auditable, build that part without the model, and add the LLM only where a wrong answer is cheap.

Not everything should be reviewed by an AI. Some things should be reviewed by a rule that gives the same answer every time, and can tell you exactly why.

The tool is open source (MIT) if you want to see the checks: https://github.com/HeytalePazguato/plc-st-review

I would be curious where other people draw this line. What in your stack do you keep deterministic on purpose, and where have you let a model in?

A local-first project knowledge graph for AI coding agents

HeytalePazguato — Tue, 05 May 2026 12:00:00 +0000

The problem worth solving

AI coding agents are good at solving small problems and bad at situating them. Ask Claude Code to "rename getUserSession and update every caller" in a 50,000-line codebase, and the answer depends on whether the agent can see the call graph or has to grep for it.

Most tools fix this with cloud-synced code intelligence. Sourcegraph, Cody, Cursor's index, Continue's RAG. They all work, and they all impose the same trade-off: your code goes to a service, an account, and a continuous indexing job.

I wanted code intelligence without that trade-off, so I built it as a local SQLite file with a single trigger and no background work. This post is a write-up of the design choices that made it possible.

What "local-first project knowledge graph" actually means

In Event Horizon v3, every workspace gets a graph stored at <workspace>/.eh/graph.db. The graph holds:

Functions, classes, interfaces, methods (nodes)
Calls, imports, extends, implements (edges)
Markdown documentation as nodes linked to source files
Code-comment rationale (// WHY:, TODO, FIXME, JSDoc/TSDoc, Python docstrings, C# XML doc) attached to the function or class they describe
Agent activity as graph data: every completed task creates an agent_activity node with touched/authored edges to the files it modified
Shared knowledge entries as graph nodes with references edges to the code they mention

The graph is built and refreshed only by user-invoked skills, never by background processes. /eh:optimize-context builds or rebuilds it on demand. /eh:orchestrate and /eh:work-on-plan refresh it automatically when they finish, using the list of files their workers touched. There is no autoscan. There is no file watcher. Activation does not touch the disk. Every refresh is the consequence of a skill the user explicitly ran.

The architectural choices that make this work

Tree-sitter WASM, five languages, no native build

Code structure extraction runs through tree-sitter compiled to WebAssembly. That adds about 3 MB of grammars to the VSIX, with no node-gyp and no platform-specific binaries. The shipped grammars cover TypeScript, JavaScript, TSX, PHP, Python, and C#. PHP traits and enums are first-class. Python decorators, docstrings, and # TODO / # FIXME / # WHY rationale comments land in the graph. C# records, structs, enums, and XML doc comments land too.

SHA256-based incremental skip

Every file's content hash is stored alongside the graph nodes it produced. On rebuild, files whose hash hasn't changed since the last build are skipped entirely. A re-run of /eh:optimize-context on a clean tree is close to free.
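The skip logic can be sketched in a few lines with Node's built-in crypto module. This is an assumption-laden illustration: the in-memory `HashIndex` map stands in for whatever the extension stores in graph.db, and `filesToReindex` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for the stored hashes: file path -> hex SHA-256.
type HashIndex = Map<string, string>;

function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns the paths whose content changed since the last build and records
// their new hashes. Unchanged files are skipped without further work.
function filesToReindex(files: Map<string, string>, index: HashIndex): string[] {
  const changed: string[] = [];
  files.forEach((content, path) => {
    const hash = sha256Hex(content);
    if (index.get(path) === hash) return; // same bytes as last time
    index.set(path, hash);
    changed.push(path);
  });
  return changed;
}
```

On a clean tree the second call returns an empty list, which is why a repeat build costs little more than hashing the files.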

Vendor and minified file skipping

The scanner refuses to index vendor/, __pycache__/, .venv/, bin/, obj/, target/, *.min.js, *.bundle.js, *.designer.cs, and a handful of similar patterns. There is also a "first non-empty line longer than 1000 characters" check that catches inline-bundled vendor scripts that don't follow naming conventions. This drops graph node count by 50 to 80 percent on Laravel, Symfony, and .NET projects, where the vendor/ and bin/ folders are usually larger than the actual source.
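A rough sketch of such a filter, covering only the patterns named above (the real list is longer, and `shouldSkip` is a hypothetical name):

```typescript
const SKIPPED_DIRS = ["vendor", "__pycache__", ".venv", "bin", "obj", "target"];
const SKIPPED_SUFFIXES = [".min.js", ".bundle.js", ".designer.cs"];
const MAX_FIRST_LINE_LENGTH = 1000;

function shouldSkip(relativePath: string, content: string): boolean {
  const segments = relativePath.split(/[\\/]/);
  const fileName = (segments[segments.length - 1] ?? "").toLowerCase();
  const directories = segments.slice(0, -1);

  // Any skipped directory anywhere in the path excludes the file.
  if (directories.some((dir) => SKIPPED_DIRS.includes(dir))) return true;

  // Suffix match is case-insensitive so Form1.Designer.cs is caught too.
  if (SKIPPED_SUFFIXES.some((suffix) => fileName.endsWith(suffix))) return true;

  // Heuristic for bundled or minified code that ignores naming conventions.
  const firstNonEmpty = content.split(/\r?\n/).find((line) => line.trim().length > 0);
  return firstNonEmpty !== undefined && firstNonEmpty.length > MAX_FIRST_LINE_LENGTH;
}
```

The path checks run before the content check, so most vendored files are rejected without their content being examined.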

Provenance on every inferred edge

I haven't seen this in any other open-source code-intelligence tool. Every edge in the graph carries:

A provenance tag: EXTRACTED (deterministic from AST), INFERRED (heuristic), AMBIGUOUS (multiple resolutions possible)
A confidence score (0 to 1)

When an agent queries the graph and reads a result, it can decide how much to trust an edge. An EXTRACTED 0.99 callee is reliable. An AMBIGUOUS 0.4 callee is a hint. The agent can act on the hint or ask for more context.
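One way a consumer might act on those two fields is sketched below. The `CallEdge` shape and the 0.8 threshold are assumptions for illustration; the article does not specify how agents should weigh edges.

```typescript
type Provenance = "EXTRACTED" | "INFERRED" | "AMBIGUOUS";

// Hypothetical edge shape carrying the two trust fields described above.
type CallEdge = { from: string; to: string; provenance: Provenance; confidence: number };

// Illustrative policy: only AST-extracted edges above the threshold are
// treated as facts; everything else is a hint worth verifying.
function classifyEdges(edges: CallEdge[], minConfidence = 0.8) {
  const reliable: CallEdge[] = [];
  const hints: CallEdge[] = [];
  for (const edge of edges) {
    const trusted = edge.provenance === "EXTRACTED" && edge.confidence >= minConfidence;
    (trusted ? reliable : hints).push(edge);
  }
  return { reliable, hints };
}
```

A rename refactor could then edit the callers in `reliable` directly and ask for more context on those in `hints`.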

Shrink-guard

There's a small but practical guard in the extractor: if a rebuild would delete more than 50 percent of a file's prior nodes, the rebuild is rejected. This protects against an extractor regression silently shrinking the graph during an upgrade.
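A minimal sketch of the guard, assuming nodes are identified by stable string IDs (the function name and the ID-based comparison are illustrative choices, not taken from the extension):

```typescript
const MAX_DELETED_RATIO = 0.5;

// Accepts a per-file rebuild unless it would delete more than half of the
// nodes the file previously produced.
function acceptRebuild(priorNodeIds: string[], newNodeIds: string[]): boolean {
  if (priorNodeIds.length === 0) return true; // nothing to protect yet
  const kept = new Set(newNodeIds);
  const deleted = priorNodeIds.filter((id) => !kept.has(id)).length;
  return deleted / priorNodeIds.length <= MAX_DELETED_RATIO;
}
```

A guard like this also rejects a legitimate rewrite that removes most of a file, so in practice it presumably needs an explicit override such as a forced rebuild.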

What runs, when

The full lifecycle of the graph in Event Horizon v3 is small enough to fit in two paragraphs.

You open VS Code, the extension activates, and nothing happens on disk. You ask Claude Code (or OpenCode, or Copilot, or Cursor, all four are supported) to run /eh:optimize-context for a task. The skill builds or refreshes the graph, hands the agent the relevant slice of nodes and edges, and the agent uses the slice as context. When the agent finishes the task and emits task.complete, an agent_activity node is added with touched edges to every file it modified. When you run /eh:orchestrate or /eh:work-on-plan to coordinate multiple workers, the orchestration tracks every file its workers touched, and refreshes the graph against that list automatically before reporting its summary. No need to re-run /eh:optimize-context after every plan; the graph reflects reality as soon as the orchestrator finishes.

No background jobs. No autoscan. No telemetry. No outbound LLM calls from Event Horizon itself; agents that opt into LLM-based concept extraction (eh_extract_concepts) spend their own tokens.

Querying the graph

Five MCP tools wrap the graph for agent use:

eh_query_graph does search, callers, callees, neighbors, shortest path, explain, and recent activity.
eh_extract_concepts runs an opt-in LLM extraction pass when the agent wants higher-level concepts on top of the AST.
eh_build_graph triggers a manual rebuild from the agent side.
eh_curate_context selects a task-aware slice of the graph that fits within a token budget.
eh_rescan_files takes a path list and re-extracts only those files, runs the resolution pass once, and returns a scan summary. This is what powers the orchestrate-end auto-refresh, and it is also available to any agent that needs a targeted refresh after writing files.

eh_curate_context is the one that pays for everything else. It is the difference between an agent asking "show me everything related to authentication" and getting a 200,000-token dump, versus asking the same question and getting a 4,000-token slice that names the right functions, the right callers, and the relevant rationale comments.
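The article does not describe how the slice is chosen, so the following is only a stand-in to show the shape of the problem: a greedy selection by relevance under a token budget. `ContextItem`, its `relevance` score, and `curateSlice` are all hypothetical.

```typescript
// Hypothetical candidate: a graph node rendered as context, with its cost.
type ContextItem = { id: string; relevance: number; tokens: number };

// Greedy stand-in: take the most relevant items first and skip any item
// that would push the slice over the budget.
function curateSlice(items: ContextItem[], tokenBudget: number): ContextItem[] {
  const ranked = [...items].sort(
    (a, b) => b.relevance - a.relevance || a.id.localeCompare(b.id), // stable tie-break
  );
  const slice: ContextItem[] = [];
  let used = 0;
  for (const item of ranked) {
    if (used + item.tokens > tokenBudget) continue;
    slice.push(item);
    used += item.tokens;
  }
  return slice;
}
```

Greedy selection is not optimal in the knapsack sense, but it is deterministic and easy to explain, which fits the rest of the design.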

Visualization

Like every other graph tool, this one has a canvas. Unlike most of them, the canvas is in a VS Code webview, not a browser tab on a remote service. The Knowledge tab renders rounded-square nodes (color-coded by type), straight edges, soft cyan glow halos on a dark blueprint grid background. Force-directed initial layout. Click a node to open a 320 px detail drawer with callers, callees, references, rationale, recent agent activity, and a "Reveal in editor" button that jumps to the source file. Pan with mouse drag, zoom with wheel.

The webview hydrates on connect, so reopening the panel shows the existing graph immediately. It re-fetches automatically whenever a build or refresh finishes, whether triggered by /eh:optimize-context or by an orchestration ending.

The reasoning behind "no autoscan"

I want to be honest about why the graph builds only when you ask. Background indexing is the normal pattern. JetBrains' Indexer, VS Code's reference indexes, and Sourcegraph's batch jobs all run continuously. The trade-off is that you pay for activity you didn't request: CPU cycles, disk writes, sometimes telemetry.

For this tool, the cost-benefit is different. The graph isn't there to power autocomplete; it is there to give an AI agent context for a specific task. The cadence of "build the graph" is the same as the cadence of "I am starting a non-trivial task". That is a few times a day, not a thousand times a day. Coupling the build to the slash command means:

Predictable resource use: zero CPU until you ask.
The graph reflects an explicit moment in time: the moment you decided to start a task. No drift between what the agent saw and what the codebase looked like five minutes later.
One graph. One rebuild. One file at <workspace>/.eh/graph.db. Easy to reason about the state.

What this design gives up

I want to be honest about the trade-offs:

No real-time updates. The graph reflects the moment of the last build or refresh. The orchestrate-end auto-refresh covers the most common drift case (a long-running plan that touched many files), but a single agent editing files outside an orchestration still sees a stale view until the next explicit rebuild.
No cross-machine sharing. The graph file lives on your laptop. Teams that want a shared code-intelligence backend need a server. There is no way around that.
Tied to tree-sitter coverage. Languages without a tree-sitter grammar in the shipped set (Go, Java, Ruby, Rust) are not yet in the graph. The dispatcher is per-language, so adding a grammar is a few hundred lines, but it has to be done per language.

These are real limits. For a solo developer running 3 to 5 AI agents on their own machine, none of them dominate. For a 50-person engineering org, several do.

The takeaway

Most code-intelligence tools assume "constant background work" is the price of context. For a single developer giving an AI agent context to do a task, it isn't. A SQLite file, a one-shot extractor, and a slash command cover the actual use case. Activation doesn't touch the disk. The graph builds only when you ask. The agent gets a curated slice via MCP. Nothing leaves the laptop unless an agent you opted into makes its own LLM call.

If that architectural stance interests you, Event Horizon is open source and on the VS Code Marketplace. v3 ships the graph and the orchestrate-end auto-refresh that keeps it current as your agents work, without ever installing a file watcher. Star the repo if you want to follow the next pieces (more languages, smarter slicing, agent-driven graph mutations).

Try it: Install from the VS Code Marketplace, or Open VSX for Cursor, VSCodium, and Windsurf.

Source: github.com/HeytalePazguato/event-horizon (MIT)

A zero-infrastructure architecture for coordinating multiple AI coding agents

HeytalePazguato — Thu, 23 Apr 2026 12:00:00 +0000

The question that started it

A few months ago, I asked Claude a genuinely idle question: if it could pick a visual for itself, for how it works, how it thinks, how it collaborates with other AI agents, what would it choose?

Its answer:

Each agent is a planet, a massive entity that consumes energy, emits output, and exerts gravitational influence. Tasks orbit as moons. Data flows as ships. At the center, a black hole where completed work collapses. One agent is a lonely planet. Five agents become a solar system.

So I built it. A VS Code extension that rendered every AI coding agent as a planet, data transfers as ships, and completed work spiraling into a black hole. It was pretty. It was cosmetic. It did not save me from the thing that happened next.

The moment it broke

Three Claude Code sessions, same repo. One was building the REST API, one was writing tests, and one was updating docs. I was pleased with myself: look at me, parallelizing AI.

Twenty minutes in, the build broke. I opened server.ts and saw that session #2 had overwritten session #1's middleware. Neither of them knew. The tests had been written against the old shape; the docs were describing something that no longer existed. I untangled the mess, lost the work, and started over.

Then I did it again two days later with a different combination of agents.

That's when I went looking for a multi-agent coordination tool. What I found was:

Tools that required Docker + Postgres + a dashboard account
Tools tied to one agent vendor's cloud
Handwritten scripts that used git worktrees and prayer

None of them fit the real shape of the problem, which was small: I had three agents running on my own machine, they needed to not step on each other, and I needed to see what was happening. That's it.

So I built Event Horizon, a VS Code extension that does multi-agent orchestration without any of the infrastructure tax.

What "orchestration" actually requires

When I sat down to list the primitives, it was shorter than I expected:

(1) A shared source of truth, so agents know what's planned and what's done.
(2) A way to prevent collisions, so two agents don't write the same file at the same time.
(3) A way to communicate, so an agent can tell the next one, "I finished, here's what you need to know."
(4) Visibility, so the human can see what the team is doing.
(5) A way to spawn new agents, so one agent can delegate.

A database would give me (1). A message queue would give me (3). A scheduler would give me (5). None of that was actually necessary. I'll show you what I did instead.

(1) Shared source of truth, a markdown file

Event Horizon's plans are just markdown. Here's a real one:

# Auth overhaul

## File Map
| File | Action | Responsibility |
|------|--------|----------------|
| `src/auth/session.ts` | Create | Token rotation logic |
| `src/auth/middleware.ts` | Modify | Wire in session.ts |
| `tests/auth/session.test.ts` | Create | Unit tests |

## Phase A, implementation

- [ ] 1.1 Session rotation [role: implementer]
  - **Files**: `src/auth/session.ts` (create)
  - **Do**: implement `rotateSession(userId, oldToken)`
  - **Accept**: returns new token, invalidates old, writes audit log
  - **Verify**: `pnpm test src/auth/session.test.ts`
  <!-- complexity: medium -->
  <!-- model: sonnet -->

- [ ] 1.2 Middleware wiring [role: implementer]
  - depends: 1.1
  - **Files**: `src/auth/middleware.ts` (modify lines ~40-80)
  ...

Agents claim tasks by making an MCP tool call (eh_claim_task). The file lives in the repo. You diff it. You merge it. You roll it back. It survives VS Code restarts because it's a file on disk, and it survives company migrations because it's 80 lines of plain text.
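To show how little machinery a plan like the one above needs, here is a rough parser for its task lines and a helper that lists the tasks ready to be claimed. It is a sketch that assumes the exact line shapes shown in the example (`- [ ] 1.1 Title [role: x]` and an indented `- depends: 1.1`); `PlanTask`, `parsePlan`, and `readyTasks` are hypothetical names, not the extension's API.

```typescript
type PlanTask = { id: string; title: string; role?: string; done: boolean; dependsOn: string[] };

const TASK_LINE = /^- \[( |x)\] (\d+(?:\.\d+)*) (.+?)(?: \[role: ([^\]]+)\])?$/;
const DEPENDS_LINE = /^\s+- depends:\s*(.+)$/;

function parsePlan(markdown: string): PlanTask[] {
  const tasks: PlanTask[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const task = TASK_LINE.exec(line);
    if (task) {
      tasks.push({
        id: task[2] ?? "",
        title: task[3] ?? "",
        role: task[4],
        done: task[1] === "x",
        dependsOn: [],
      });
      continue;
    }
    // A depends line belongs to the most recently seen task.
    const depends = DEPENDS_LINE.exec(line);
    const current = tasks[tasks.length - 1];
    if (depends && current) {
      const ids = (depends[1] ?? "").split(",").map((s) => s.trim()).filter(Boolean);
      current.dependsOn.push(...ids);
    }
  }
  return tasks;
}

// A task is ready when it is unchecked and everything it depends on is checked.
function readyTasks(tasks: PlanTask[]): PlanTask[] {
  const done = new Set(tasks.filter((t) => t.done).map((t) => t.id));
  return tasks.filter((t) => !t.done && t.dependsOn.every((id) => done.has(id)));
}
```

Checking a box in the file is the whole state transition: once 1.1 is marked `[x]`, task 1.2 shows up as ready on the next read.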

A task database would give me structured queries. I don't need structured queries; I need something a human can read at any time without opening a dashboard.

(2) Collision prevention, a local HTTP call

Agents acquire locks on files before they write. The MCP tool call is eh_acquire_lock. The implementation is about 60 lines of TypeScript, runs in a local HTTP server on port 28765, and returns in under 1 ms.

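// Sketch of the core. FileLock, LOCK_TTL_MS, and isExpired are illustrative
// fill-ins; the 5-minute TTL is the expiry described below.
type FileLock = { agentId: string; acquiredAt: number };
const LOCK_TTL_MS = 5 * 60 * 1000;
const locks = new Map<string, FileLock>();
const isExpired = (lock: FileLock) => Date.now() - lock.acquiredAt > LOCK_TTL_MS;

function acquireLock(agentId: string, filePath: string) {
  const existing = locks.get(filePath);
  // Refuse only when a different agent holds a lock that has not expired.
  if (existing && existing.agentId !== agentId && !isExpired(existing)) {
    return { ok: false, heldBy: existing.agentId };
  }
  locks.set(filePath, { agentId, acquiredAt: Date.now() });
  return { ok: true };
}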

If the orchestrator can't get a lock, the task gets queued. If an agent terminates without releasing, the lock expires after 5 minutes. If you want full isolation, the extension will optionally spawn each agent in its own git worktree instead, and merge on completion.

A distributed lock service would give me high availability across data centers. I don't have data centers. I have a laptop.

(3) Communication, a queue, in RAM

Agents send each other messages via eh_send_message. Messages sit in a typed queue in memory. Each agent polls its inbox via eh_get_messages when it's between steps. Delivered-once semantics, because the producer and consumer are on the same machine.

There's also shared knowledge, a key/value store with temporal validity (validUntil timestamps), so stale context automatically expires. Backed by SQLite. Runs in the extension host. Never leaves the machine.
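Both primitives are small enough to sketch. The classes below are illustrative only: `Inbox` and `SharedKnowledge` are hypothetical names, and the knowledge store uses an in-memory Map where the extension is described as using SQLite.

```typescript
type AgentMessage = { from: string; to: string; body: string };

// In-memory inbox per agent. Reading drains the queue, so each message
// is handed out once.
class Inbox {
  private queues = new Map<string, AgentMessage[]>();

  send(message: AgentMessage): void {
    const queue = this.queues.get(message.to) ?? [];
    queue.push(message);
    this.queues.set(message.to, queue);
  }

  take(agentId: string): AgentMessage[] {
    const queue = this.queues.get(agentId) ?? [];
    this.queues.delete(agentId);
    return queue;
  }
}

type KnowledgeEntry = { value: string; validUntil: number }; // epoch milliseconds

// Key/value store whose entries stop being returned once validUntil passes.
class SharedKnowledge {
  private entries = new Map<string, KnowledgeEntry>();

  set(key: string, value: string, validUntil: number): void {
    this.entries.set(key, { value, validUntil });
  }

  get(key: string, now: number = Date.now()): string | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.validUntil) {
      this.entries.delete(key); // expire lazily on read
      return undefined;
    }
    return entry.value;
  }
}
```

Expiring on read means no background sweeper is needed, which matches the no-background-work stance of the rest of the design.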

(4) Visibility, a webview

This is the part where I deviated from the "no infrastructure" pattern, but only a little. The extension ships a React + PixiJS webview that renders every agent as a planet in a cosmic system. Ships fly between cooperating agents when they share work. Lightning arcs appear between two planets when they've both tried to write to the same file.

I thought the visualization was going to be the cute part. It turned out to be the most useful debugging tool I've ever built. The first time two of my agents got into a lock contention loop, I could see it immediately, lightning arcs firing every two seconds. Without the visualization, I would have stared at logs for half an hour.

(5) Spawning, `child_process.spawn`

When a plan is loaded, the agent that loaded it automatically becomes the orchestrator. It gets an elevated MCP tool: eh_spawn_agent. The tool takes an agent type, a task assignment, and a working directory. Under the hood:

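import * as vscode from "vscode";

// id, resolvedBin, prompts, and flags are defined elsewhere in the extension (not shown).
const term = vscode.window.createTerminal({
  name: `agent-${id}`,
  shellPath: resolvedBin,  // claude, opencode, cursor
  shellArgs: [...prompts, ...flags],
});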

The new agent runs in a visible VS Code terminal. You can watch what it's doing. You can ⌘+C | Ctrl+C it. You can type follow-ups if the orchestrator spawns it in interactive mode. There's no "hidden worker process"; every agent is a terminal you can see.

This was a deliberate design choice. Early prototypes spawned agents as background processes and piped their output to a panel. It was technically cleaner but psychologically worse: users didn't trust agents they couldn't see. Visible terminals + planet visualizations + file-lock lightning = the team becomes legible.

The orchestrator flow, in practice

Here's what actually happens when you use it:

/eh:create-plan Build a REST API with auth, database layer, and tests

Your current Claude session reads the prompt, scopes the work, writes a markdown plan, calls eh_load_plan, and calls eh_claim_orchestrator. It is now the orchestrator.

Then it reads the plan, groups tasks by dependencies, and decides it needs three workers: an implementer, a tester, and a reviewer. It calls eh_spawn_agent three times. Three new terminals open. Three planets appear next to the orchestrator star.

Each worker calls eh_claim_task with a task ID, claims a lock on the files it'll touch, does the work, marks the task done, and sends a message back to the orchestrator. If a task fails verification (the **Verify** command in the plan), the extension auto-retries with a more expensive model (haiku → sonnet → opus). If it still fails, the orchestrator gets a notification and decides what to do.
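The retry ladder can be sketched as a loop over model tiers. Everything here is hypothetical scaffolding: `Attempt` stands in for "run the task, then run its Verify command", and it is synchronous only to keep the example short.

```typescript
const MODEL_LADDER = ["haiku", "sonnet", "opus"] as const;
type ModelTier = (typeof MODEL_LADDER)[number];

// Stand-in for running the task with a given model and then its Verify
// command. Returns true when verification passed.
type Attempt = (model: ModelTier) => boolean;

type EscalationResult = { ok: boolean; model?: ModelTier; tried: ModelTier[] };

// Tries each tier from startAt upward and stops at the first success.
function runWithEscalation(attempt: Attempt, startAt: ModelTier = "haiku"): EscalationResult {
  const tried: ModelTier[] = [];
  for (const model of MODEL_LADDER.slice(MODEL_LADDER.indexOf(startAt))) {
    tried.push(model);
    if (attempt(model)) return { ok: true, model, tried };
  }
  return { ok: false, tried }; // caller hands the decision to the orchestrator
}
```

Starting from the tier named in the plan (for example the `model: sonnet` hint) avoids a cheaper attempt the plan author already ruled out.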

Meanwhile, a budget gauge fills up as tokens are spent. A context fuel gauge on each planet shows how close that agent is to its context window limit. A Cost Insights panel shows cache-hit ratios, duplicate reads, and where the money is going.

When the plan is done, you see a Kanban board with everything green, a cost total, and the commit history of each worker. The terminals are still there. You can inspect, kill, or keep working.

What I didn't build

I want to be honest about the limits, because the pitch so far sounds too good.

Not built: cross-machine coordination. Event Horizon only works inside one VS Code window. If you want a team of humans sharing an agent team, you need something else. That's the legitimate use case for a server.

Not built: formal verification that the lock/queue/knowledge primitives are race-free at scale. They work well for 3–5 agents. I haven't tried 50. The design is local-machine-first, and I suspect you'd hit limits.

Not built: the visualization isn't free on CPU. Running it with 20 planets + heavy traffic uses a few percent CPU. Fine on a laptop. Might annoy a battery-paranoid user.

Stack + licensing

Core: TypeScript, zero runtime deps
Renderer: PixiJS v8
UI: React + Zustand
Persistence: sql.js (SQLite as WASM), everything local, no native build
IPC: local HTTP (port 28765) + MCP over stdio
Editors supported: VS Code, Cursor, VSCodium, Windsurf, Gitpod, Eclipse Theia, Coder (one Open VSX publish reaches all of them)

MIT licensed. Code at github.com/HeytalePazguato/event-horizon.

The takeaway I keep coming back to

The infrastructure tax (Docker, Postgres, accounts, and dashboards) wasn't there because multi-agent coordination is hard. It was there because the tools were designed for multi-team environments where those pieces had to exist anyway. When you solve for a single developer on a single machine, 90% of the "infrastructure" folds into a local HTTP server, a markdown file, and an MCP tool schema.

I didn't want to run Postgres to coordinate three Claude instances. Turns out I didn't have to.

Try it: Install from the VS Code Marketplace or Open VSX. Ships with hooks for Claude Code, OpenCode, GitHub Copilot, and Cursor; mix and match freely.

If this resonates, star the repo so others can find it.