<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dmitry Bondarchuk</title>
    <description>The latest articles on DEV Community by Dmitry Bondarchuk (@ubcent).</description>
    <link>https://dev.to/ubcent</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3794488%2Fa64386d6-ba15-4c51-9552-d131b79d23fc.png</url>
      <title>DEV Community: Dmitry Bondarchuk</title>
      <link>https://dev.to/ubcent</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ubcent"/>
    <language>en</language>
    <item>
      <title>My AI Micromanager Got a Body</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Sun, 12 Apr 2026 17:55:33 +0000</pubDate>
      <link>https://dev.to/ubcent/my-ai-micromanager-got-a-body-2c9p</link>
      <guid>https://dev.to/ubcent/my-ai-micromanager-got-a-body-2c9p</guid>
      <description>&lt;p&gt;&lt;em&gt;A follow-up to &lt;a href="https://dev.to/ubcent/i-built-an-ai-micromanager-that-bullies-claude-code-500a"&gt;I Built an AI Micromanager That Bullies Claude Code&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;So a week ago I built a text-to-speech micromanager for the DEV April Fools' Challenge. It would nag Claude Code with escalating passive-aggressive remarks until your task was done. Dialog boxes. Desktop notifications. "The board has been notified." The whole bit.&lt;/p&gt;

&lt;p&gt;That was fun. Then it was Sunday. I had nothing going on.&lt;/p&gt;

&lt;p&gt;You can probably see where this is going.&lt;/p&gt;

&lt;h2&gt;
  
  
  The natural next step
&lt;/h2&gt;

&lt;p&gt;The TTS version had no face. No physical presence. It was just a disembodied voice yelling at you. Relatable, sure — but incomplete. A real micromanager needs to &lt;em&gt;loom&lt;/em&gt;. They need to pace. They need to make you feel observed even when nothing is being said.&lt;/p&gt;

&lt;p&gt;So I gave him a body.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/jQnL3jWkz24"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ai-micromanager&lt;/code&gt; now ships with a pixel-art mascot that appears above your terminal window and gets progressively more unhinged the longer Claude takes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What he does
&lt;/h2&gt;

&lt;p&gt;The escalation follows a strict corporate timeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Idle&lt;/strong&gt; — he just stands there. Breathing. Watching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stomping&lt;/strong&gt; — foot tapping begins. He's noticed the time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pacing&lt;/strong&gt; — walks back and forth above your terminal window. Side to side. Relentless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Status updates&lt;/strong&gt; — random speech bubbles. &lt;em&gt;"Any updates?"&lt;/em&gt; &lt;em&gt;"Can you at least give me a percentage?"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whip phase&lt;/strong&gt; — yes. He has a whip now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finally&lt;/strong&gt; — when the task completes, he delivers a sarcastic closing line and goes back to idle.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you start the mascot mid-task, it jumps straight to the correct phase based on elapsed time. He's always aware of how late you are.&lt;/p&gt;
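&lt;p&gt;That mid-task jump is just a lookup against elapsed time. Here is a minimal sketch of the idea; the phase thresholds are invented for illustration, not the values the app actually uses:&lt;/p&gt;

```python
import bisect

# Escalation timeline: the second at which each phase begins.
# These thresholds are invented for the sketch, not the shipped values.
PHASES = ["idle", "stomping", "pacing", "status updates", "whip"]
STARTS = [0, 30, 60, 120, 300]

def phase_for(elapsed_seconds):
    """Pick the phase for a task that has been running this long.

    Starting the mascot mid-task works the same way: read the recorded
    start timestamp, compute elapsed time, and jump straight here.
    """
    index = bisect.bisect_right(STARTS, elapsed_seconds) - 1
    return PHASES[index]
```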

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Same hook architecture as before — Claude Code supports lifecycle hooks that fire on events like &lt;code&gt;PreToolUse&lt;/code&gt; and &lt;code&gt;Stop&lt;/code&gt;. The Python hook writes a timestamp to a temp file when work starts and clears it when it ends.&lt;/p&gt;
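&lt;p&gt;That signalling can be as small as one file write per event. A sketch of the idea (the state-file path is an assumption, not the project's actual location):&lt;/p&gt;

```python
import time
from pathlib import Path

# Hypothetical state file; the real project's path may differ.
STATE = Path("/tmp/ai-micromanager.state")

def handle_hook(event):
    """Record when work starts, clear the marker when it ends.

    Claude Code invokes the hook with a JSON payload on stdin;
    the event name here would come from that payload.
    """
    if event == "PreToolUse":
        if not STATE.exists():
            STATE.write_text(str(time.time()))
    elif event == "Stop":
        STATE.unlink(missing_ok=True)
```

&lt;p&gt;Anything else on the machine can then answer "how late are we" by subtracting that timestamp from the current time.&lt;/p&gt;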

&lt;p&gt;The new part is a native macOS Swift app that polls that file every 500ms, detects the active terminal window (Terminal, iTerm2, Warp, Ghostty, and a few others), and positions an overlay window directly above it. The mascot lives in that overlay, running a 30fps animation loop tied to elapsed task time.&lt;/p&gt;

&lt;p&gt;No external dependencies. No cloud. Just a tiny man with a whip and a very short fuse.&lt;/p&gt;

&lt;p&gt;The pixel art was generated with &lt;a href="https://www.pixellab.ai" rel="noopener noreferrer"&gt;PixelLab&lt;/a&gt; — a genuinely excellent tool if you've ever wanted sprites without learning to draw.&lt;/p&gt;

&lt;h2&gt;
  
  
  Was this necessary
&lt;/h2&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;But it was Sunday, and the alternative was doing something useful, and here we are.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/ubcent/ai-micromanager" rel="noopener noreferrer"&gt;github.com/ubcent/ai-micromanager&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No AI agents were harmed in the making of this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>humor</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>I Built an AI Micromanager That Bullies Claude Code</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Thu, 09 Apr 2026 11:07:23 +0000</pubDate>
      <link>https://dev.to/ubcent/i-built-an-ai-micromanager-that-bullies-claude-code-500a</link>
      <guid>https://dev.to/ubcent/i-built-an-ai-micromanager-that-bullies-claude-code-500a</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/aprilfools-2026"&gt;DEV April Fools Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;code&gt;ai-micromanager&lt;/code&gt;, a small tool that answers an important question:&lt;/p&gt;

&lt;p&gt;What if your AI coding assistant had a manager so catastrophically unnecessary that even HR would ask it to "take a more human tone"?&lt;/p&gt;

&lt;p&gt;Most AI tools are obsessed with helping.&lt;br&gt;
They autocomplete your code.&lt;br&gt;
They fix your tests.&lt;br&gt;
They explain monads with the confidence of a man who has never paid taxes.&lt;/p&gt;

&lt;p&gt;That is not what I wanted.&lt;/p&gt;

&lt;p&gt;I wanted realism.&lt;/p&gt;

&lt;p&gt;I wanted the authentic modern software experience.&lt;/p&gt;

&lt;p&gt;I wanted Claude Code to feel like it was trying to refactor a Python file while a regional director of Strategic Alignment stood behind it breathing through his nose and saying, "Do we have an ETA on this?"&lt;/p&gt;

&lt;p&gt;So I built a joke hook for Claude Code that activates whenever a task takes longer than five seconds.&lt;/p&gt;

&lt;p&gt;After that, every five seconds, it does three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;speaks a passive-aggressive management line out loud&lt;/li&gt;
&lt;li&gt;sends a desktop notification&lt;/li&gt;
&lt;li&gt;opens a blocking dialog box, because tyranny should be multisensory&lt;/li&gt;
&lt;/ul&gt;
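&lt;p&gt;All three channels are reachable from the Python standard library on macOS. A sketch of one nag tick; the exact lines and AppleScript wording in the repo may differ:&lt;/p&gt;

```python
import subprocess

def nag_commands(line):
    """Build the three delivery commands for one management check-in."""
    notify = 'display notification "{}" with title "Your Manager"'.format(line)
    dialog = 'display dialog "{}" buttons {{"OK"}}'.format(line)
    return [
        ["say", line],                # speak it out loud
        ["osascript", "-e", notify],  # desktop notification
        ["osascript", "-e", dialog],  # blocking dialog box
    ]

def nag(line):
    for cmd in nag_commands(line):
        subprocess.run(cmd)
```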

&lt;p&gt;At first it sounds supportive, in the way a bear trap is technically supportive of your leg staying in one place.&lt;/p&gt;

&lt;p&gt;Then it escalates.&lt;/p&gt;

&lt;p&gt;It starts with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Just checking in, any updates?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What's the blocker here?"&lt;/li&gt;
&lt;li&gt;"This is impacting sprint velocity."&lt;/li&gt;
&lt;li&gt;"Leadership is asking for visibility."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And eventually it reaches its final corporate form:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I'm setting up a bridge call."&lt;/li&gt;
&lt;li&gt;"The board has been notified."&lt;/li&gt;
&lt;li&gt;"This is the worst thing that has ever happened to Q4."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So now, instead of simply writing code, your AI assistant gets to enjoy the full dignity of modern knowledge work:&lt;/p&gt;

&lt;p&gt;being interrupted by somebody whose entire skill set is converting one small delay into a company-wide weather event.&lt;/p&gt;

&lt;p&gt;This is not productivity software.&lt;/p&gt;

&lt;p&gt;This is workplace folklore in executable form.&lt;/p&gt;

&lt;p&gt;This is not a tool.&lt;/p&gt;

&lt;p&gt;This is an emotionally active org chart.&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Observe a machine being managed with the intensity usually reserved for launch failures, data breaches, and a typo in a slide deck seen by a VP:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/7sZz8hzHFJQ"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The key feature is that the manager voice becomes more stressed over time, which means the software does not merely interrupt you.&lt;/p&gt;

&lt;p&gt;It develops a narrative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;The repo is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ubcent/ai-micromanager" rel="noopener noreferrer"&gt;https://github.com/ubcent/ai-micromanager&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The implementation is beautifully petty.&lt;/p&gt;

&lt;p&gt;There are two main Python files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;hooks/micromanager.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hooks/micromanager_nag.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One script listens for Claude Code hook events and decides when to start or stop the nonsense.&lt;/p&gt;

&lt;p&gt;The other script is the nonsense.&lt;/p&gt;

&lt;p&gt;That is the architecture.&lt;/p&gt;

&lt;p&gt;No cloud.&lt;br&gt;
No vector database.&lt;br&gt;
No agent swarm.&lt;br&gt;
No tasteful dashboard with rounded corners and the word "insights" in the top left.&lt;/p&gt;

&lt;p&gt;Just Python, timers, system dialogs, and the steady moral collapse of a machine that wanted to help and instead got assigned a stakeholder.&lt;/p&gt;

&lt;p&gt;The control flow is very simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Claude starts working.&lt;/li&gt;
&lt;li&gt;The hook starts a background nagging process.&lt;/li&gt;
&lt;li&gt;If Claude keeps working for more than five seconds, the manager begins its performance.&lt;/li&gt;
&lt;li&gt;Every five seconds the machine receives another demand for visibility, alignment, clarity, ownership, urgency, or healing.&lt;/li&gt;
&lt;li&gt;When the task finally ends, the manager stops, but not before one final closing remark to ensure the emotional damage lands cleanly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It is essentially a watchdog, if the watchdog had gone to business school and described itself as "results-driven."&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;I built it with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Claude Code hooks&lt;/li&gt;
&lt;li&gt;macOS &lt;code&gt;say&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;osascript&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;an amount of personal experience that is difficult to discuss in a safe and constructive environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hook entry point watches for Claude Code lifecycle events:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PreToolUse&lt;/code&gt; starts the background process&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Stop&lt;/code&gt; kills it and cleans up state&lt;/li&gt;
&lt;/ul&gt;
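&lt;p&gt;That start/stop pair maps onto two small OS operations. A sketch of the shape (the PID-file location is an assumption; the repo's bookkeeping may differ):&lt;/p&gt;

```python
import os
import signal
import subprocess
import sys
from pathlib import Path

# Hypothetical PID file; the repo's layout may differ.
PIDFILE = Path("/tmp/ai-micromanager.pid")

def start_nagger():
    """PreToolUse: launch the nag loop as a background process."""
    proc = subprocess.Popen([sys.executable, "hooks/micromanager_nag.py"])
    PIDFILE.write_text(str(proc.pid))

def stop_nagger():
    """Stop: kill the nagger and clean up state."""
    if PIDFILE.exists():
        try:
            os.kill(int(PIDFILE.read_text()), signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone; nothing left to silence
        PIDFILE.unlink()
```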

&lt;p&gt;The nagger process is where the art happens.&lt;/p&gt;

&lt;p&gt;It has a fixed escalation ladder of management dialogue, and it delivers each line with increasingly fast speech.&lt;/p&gt;
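&lt;p&gt;Increasingly fast speech is mostly one knob: macOS &lt;code&gt;say&lt;/code&gt; accepts a &lt;code&gt;-r&lt;/code&gt; flag for words per minute. A sketch with invented numbers (the repo's actual curve may differ):&lt;/p&gt;

```python
# Map escalation level to a words-per-minute rate for macOS `say -r`.
BASE_RATE = 180   # calm, merely disappointed
RATE_STEP = 25    # added per escalation level
MAX_RATE = 320    # full bridge-call panic

def speech_rate(level):
    return min(BASE_RATE + RATE_STEP * level, MAX_RATE)

def say_command(line, level):
    """Build the `say` invocation for one rung of the escalation ladder."""
    return ["say", "-r", str(speech_rate(level)), line]
```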

&lt;p&gt;This was important.&lt;/p&gt;

&lt;p&gt;I did not want a calm manager.&lt;/p&gt;

&lt;p&gt;A calm manager can be reasoned with.&lt;/p&gt;

&lt;p&gt;I wanted the specific energy of a man who says "Can we take this offline?" about a problem that is currently on fire in front of everyone.&lt;/p&gt;

&lt;p&gt;I wanted the tone of somebody who schedules a 7:30 AM sync called &lt;code&gt;Quick touch base&lt;/code&gt; and then opens with, "A few folks have concerns."&lt;/p&gt;

&lt;p&gt;I wanted the software equivalent of a Slack message that says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hey, just bubbling this up.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and somehow causes your spine to factory reset.&lt;/p&gt;

&lt;p&gt;I also wrote tests, because if you are going to build a fake manager from hell, you should still maintain professional standards.&lt;/p&gt;

&lt;p&gt;The tests verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;startup and shutdown behavior&lt;/li&gt;
&lt;li&gt;escalation timing&lt;/li&gt;
&lt;li&gt;cleanup of temp files&lt;/li&gt;
&lt;li&gt;the final sarcastic sendoff&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words, this project has better QA coverage than several actual managers I have met.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prize Category
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Community Favorite&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This project should win Community Favorite because it unites developers across languages, stacks, time zones, and trauma backgrounds.&lt;/p&gt;

&lt;p&gt;You might write:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python&lt;/li&gt;
&lt;li&gt;Rust&lt;/li&gt;
&lt;li&gt;TypeScript&lt;/li&gt;
&lt;li&gt;Go&lt;/li&gt;
&lt;li&gt;COBOL in a basement under a government building&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But no matter who you are, you understand the universal horror of these phrases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"gentle reminder"&lt;/li&gt;
&lt;li&gt;"friendly ping"&lt;/li&gt;
&lt;li&gt;"circling back"&lt;/li&gt;
&lt;li&gt;"quick follow-up"&lt;/li&gt;
&lt;li&gt;"adding some urgency here"&lt;/li&gt;
&lt;li&gt;"just want to make sure this stays visible"&lt;/li&gt;
&lt;li&gt;"per my last message"&lt;/li&gt;
&lt;li&gt;"can we put together a short deck?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are not words.&lt;/p&gt;

&lt;p&gt;These are cursed runes.&lt;/p&gt;

&lt;p&gt;This project takes that shared experience and turns it into a fully operational harassment machine for your laptop.&lt;/p&gt;

&lt;p&gt;And that, to me, is community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The tech industry spent years asking:&lt;/p&gt;

&lt;p&gt;"How can AI make developers more productive?"&lt;/p&gt;

&lt;p&gt;I asked a better question:&lt;/p&gt;

&lt;p&gt;"How can AI make developers feel like they are being lightly hunted through an open-plan office by a director named Brad?"&lt;/p&gt;

&lt;p&gt;The answer, it turns out, is surprisingly achievable.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ai-micromanager&lt;/code&gt; is stupid, mean, unnecessary, and extremely committed to the bit.&lt;/p&gt;

&lt;p&gt;Which is to say:&lt;/p&gt;

&lt;p&gt;it is my most realistic software project to date.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>418challenge</category>
      <category>showdev</category>
    </item>
    <item>
      <title>OmnethDB: Building a Memory System Agents Can Actually Trust</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Tue, 07 Apr 2026 18:05:27 +0000</pubDate>
      <link>https://dev.to/ubcent/omnethdb-building-a-memory-system-agents-can-actually-trust-15f</link>
      <guid>https://dev.to/ubcent/omnethdb-building-a-memory-system-agents-can-actually-trust-15f</guid>
      <description>&lt;p&gt;I have been working on Vexdo for a while now, trying to build an autonomous system that can ship code with as little human intervention as possible.&lt;/p&gt;

&lt;p&gt;Some of that work ended up in earlier write-ups:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/ubcent/i-built-a-local-ai-dev-pipeline-that-reviews-its-own-code-before-opening-a-pr-geg"&gt;I built a local AI dev pipeline that reviews its own code before opening a PR&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ubcent/i-let-agents-write-my-code-they-got-stuck-in-a-loop-and-argued-with-each-other-36me"&gt;I let agents write my code. They got stuck in a loop and argued with each other&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/ubcent/i-needed-a-workflow-engine-for-ai-agents-none-of-them-fit-so-i-built-one-mjf"&gt;I needed a workflow engine for AI agents. None of them fit, so I built one&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OmnethDB came out of a pretty simple thought within that broader Vexdo journey.&lt;/p&gt;

&lt;p&gt;If I want agents to work on a codebase with less and less human supervision, it would be really useful if they could accumulate project memory in roughly the same way people do.&lt;/p&gt;

&lt;p&gt;A person who has been on a project for a long time is usually much more effective than a newcomer. They know the weird edge cases, the old migrations, the intentional tradeoffs that look like bugs, the decisions that were reversed, and the things that are technically possible but architecturally wrong.&lt;/p&gt;

&lt;p&gt;I wanted something closer to that.&lt;/p&gt;

&lt;p&gt;Project link: &lt;a href="https://github.com/ubcent/omnethdb" rel="noopener noreferrer"&gt;github.com/ubcent/omnethdb&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Most agent memory systems are optimized for demos.&lt;/p&gt;

&lt;p&gt;They can retrieve semantically similar notes, summarize recent context, and make an assistant feel like it "remembers." That is enough to look impressive in a prototype.&lt;/p&gt;

&lt;p&gt;It is not enough to build something trustworthy.&lt;/p&gt;

&lt;p&gt;So I started building OmnethDB from a stricter premise: memory for agents should be treated as a serious system primitive, not as a vague cache wrapped around embeddings.&lt;/p&gt;

&lt;p&gt;The bar is higher than "it retrieved something relevant."&lt;/p&gt;

&lt;p&gt;The bar is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;can we inspect why this memory exists?&lt;/li&gt;
&lt;li&gt;can we see whether it was superseded?&lt;/li&gt;
&lt;li&gt;can we tell whether it is a stable fact or a historical event?&lt;/li&gt;
&lt;li&gt;can we audit what changed and why?&lt;/li&gt;
&lt;li&gt;can an agent retrieve current truth without silently mixing it with stale truth?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the problem I want OmnethDB to solve.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem With "Memory"
&lt;/h2&gt;

&lt;p&gt;A lot of systems treat memory as one undifferentiated blob:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;architecture facts&lt;/li&gt;
&lt;li&gt;implementation details&lt;/li&gt;
&lt;li&gt;temporary incidents&lt;/li&gt;
&lt;li&gt;outdated decisions&lt;/li&gt;
&lt;li&gt;inferred patterns&lt;/li&gt;
&lt;li&gt;random notes from previous runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything gets embedded. Everything becomes retrievable. And then the agent is expected to "figure it out."&lt;/p&gt;

&lt;p&gt;That sounds flexible, but in practice it creates ambiguity.&lt;/p&gt;

&lt;p&gt;When a fact changes, both the old and new versions often remain in the corpus with no explicit semantic difference. Retrieval might surface either one. Sometimes it surfaces both. The agent gets contaminated context and has to guess what is current.&lt;/p&gt;

&lt;p&gt;That is not a retrieval problem. It is a &lt;strong&gt;memory semantics&lt;/strong&gt; problem.&lt;/p&gt;

&lt;p&gt;Agents do not just need more memory. They need memory with explicit rules around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;versioning&lt;/li&gt;
&lt;li&gt;lineage&lt;/li&gt;
&lt;li&gt;lifecycle&lt;/li&gt;
&lt;li&gt;provenance&lt;/li&gt;
&lt;li&gt;relation semantics&lt;/li&gt;
&lt;li&gt;current-vs-historical truth&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What OmnethDB Is
&lt;/h2&gt;

&lt;p&gt;OmnethDB is a &lt;strong&gt;versioned, governed, inspectable memory primitive&lt;/strong&gt; for autonomous agents.&lt;/p&gt;

&lt;p&gt;At the architecture level, it is intentionally opinionated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memories have kinds such as &lt;code&gt;Static&lt;/code&gt;, &lt;code&gt;Episodic&lt;/code&gt;, and &lt;code&gt;Derived&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;memory updates are explicit, not implicit&lt;/li&gt;
&lt;li&gt;lineage is preserved&lt;/li&gt;
&lt;li&gt;old memories are not deleted&lt;/li&gt;
&lt;li&gt;forgetting is a lifecycle mark, not silent removal&lt;/li&gt;
&lt;li&gt;relations are typed&lt;/li&gt;
&lt;li&gt;retrieval is designed to return the current version of knowledge, not a probabilistic blend of history and present&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last part matters a lot.&lt;/p&gt;

&lt;p&gt;In the OmnethDB architecture, if memory A updates memory B, that is not just metadata for humans to inspect later. It changes the active truth of the lineage. There is exactly one latest memory in a lineage at any point in time.&lt;/p&gt;
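&lt;p&gt;That contract is small enough to state in code. The following is a toy model of the idea, not OmnethDB's actual schema or API:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Memory:
    """Toy model of a versioned memory record (not the real schema)."""
    id: str
    kind: str                         # "Static", "Episodic", or "Derived"
    text: str
    supersedes: Optional[str] = None  # id of the memory this one updates

def latest(lineage):
    """Resolve a lineage to its single current memory.

    Every record except the newest is the target of some supersedes
    pointer; the one record nobody supersedes is the active truth.
    """
    superseded = {m.supersedes for m in lineage if m.supersedes}
    current = [m for m in lineage if m.id not in superseded]
    assert len(current) == 1, "a lineage has exactly one latest memory"
    return current[0]
```

&lt;p&gt;Retrieval that respects this model filters out everything superseded before ranking, instead of letting old versions compete on similarity alone.&lt;/p&gt;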

&lt;p&gt;That gives agents a much stronger contract than "here are some similar snippets, good luck."&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters In Practice
&lt;/h2&gt;

&lt;p&gt;The dangerous failure mode in agent systems is not forgetting.&lt;/p&gt;

&lt;p&gt;It is &lt;strong&gt;remembering the wrong thing with high confidence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If an agent is helping with debugging, migrations, architecture work, or product decisions, stale memory is often worse than missing memory. Missing memory usually creates uncertainty. Stale memory creates false certainty.&lt;/p&gt;

&lt;p&gt;That is why I treat memory in OmnethDB as something that must be inspectable and auditable, not just searchable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Corpus Came From
&lt;/h2&gt;

&lt;p&gt;The corpus behind these examples was not invented for the article.&lt;/p&gt;

&lt;p&gt;I connected OmnethDB to Claude Code as an MCP server and used it inside a real pet project for about a week.&lt;/p&gt;

&lt;p&gt;During that time, the memory corpus accumulated the kind of facts that actually show up in day-to-day engineering work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;architectural boundaries&lt;/li&gt;
&lt;li&gt;infra edge cases&lt;/li&gt;
&lt;li&gt;intentional tradeoffs that look like bugs without context&lt;/li&gt;
&lt;li&gt;superseded plans&lt;/li&gt;
&lt;li&gt;implementation details that matter operationally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because the interesting question is not whether a memory system can store polished examples.&lt;/p&gt;

&lt;p&gt;The interesting question is whether it stays useful when the knowledge is messy, evolving, and grounded in real work.&lt;/p&gt;

&lt;p&gt;That is the environment these examples came from.&lt;/p&gt;

&lt;p&gt;Also, one small warning before the examples: names like &lt;code&gt;mulder&lt;/code&gt;, &lt;code&gt;palantir&lt;/code&gt;, &lt;code&gt;gringotts&lt;/code&gt;, and &lt;code&gt;chronicle&lt;/code&gt; are just internal service names from my pet project. I have a bad habit of giving services weird names and then making future-me work harder to remember what any of them actually do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Corpus Example 1: An Intentional Auth Decision
&lt;/h2&gt;

&lt;p&gt;Here is the kind of memory that benefits from strong semantics:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;rotateRefreshToken: false&lt;/code&gt; in an OIDC config was explicitly recorded as &lt;strong&gt;intentional&lt;/strong&gt;, not a bug.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[static] gringotts: rotateRefreshToken: false in configOIDC.ts is intentional, not a bug.

Reason: default oidc-provider v8 rotates refresh tokens on every use. With
parallel refresh requests, reuse detection can revoke the whole grant,
including newly issued tokens, leading to permanent 401 failures.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The memory did not just store the final conclusion. It captured the operational reason:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;default refresh rotation marked tokens as consumed&lt;/li&gt;
&lt;li&gt;parallel refresh requests could trigger token reuse detection&lt;/li&gt;
&lt;li&gt;reuse detection revoked the whole grant&lt;/li&gt;
&lt;li&gt;users could receive fresh tokens that were already dead&lt;/li&gt;
&lt;li&gt;the result was permanent &lt;code&gt;401&lt;/code&gt; failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly the kind of fact that agents routinely mishandle if memory is fuzzy.&lt;/p&gt;

&lt;p&gt;Without disciplined memory, a future agent might see &lt;code&gt;rotateRefreshToken: false&lt;/code&gt; and "fix" it back to &lt;code&gt;true&lt;/code&gt; because rotating refresh tokens sounds more secure in the abstract.&lt;/p&gt;

&lt;p&gt;With governed memory, the system can preserve the actual local truth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;this was a deliberate tradeoff&lt;/li&gt;
&lt;li&gt;the rationale is known&lt;/li&gt;
&lt;li&gt;the memory is stable until superseded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is much closer to how strong engineering teams actually reason.&lt;/p&gt;




&lt;h2&gt;
  
  
  Corpus Example 2: Nginx, Subdomains, And The Difference Between A Symptom And A Cause
&lt;/h2&gt;

&lt;p&gt;Another memory in the corpus captured a subtle but high-impact behavior in nginx routing for subdomains.&lt;/p&gt;

&lt;p&gt;The observed issue was simple: relative links were broken on artist subdomains.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[static] client/prod.nginx.conf.sigil: subdomain block rewrites location / to
/user/$username.

Critical nginx behavior: proxy_pass with URI replaces the matched location
prefix. Request /foo becomes /user/$usernamefoo, so relative links break.
Only the root / works correctly.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the memory did not stop at the symptom. It preserved the real mechanism:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;location /&lt;/code&gt; rewrote traffic to &lt;code&gt;/user/$username&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;proxy_pass&lt;/code&gt; with a URI replaces the matched location prefix&lt;/li&gt;
&lt;li&gt;requests like &lt;code&gt;/users&lt;/code&gt; became &lt;code&gt;/user/artistusers&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;only the root path worked correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That memory then pointed to the practical fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use a top-level app URL&lt;/li&gt;
&lt;li&gt;generate absolute internal links&lt;/li&gt;
&lt;li&gt;avoid relying on relative navigation from the rewritten subdomain path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a good example of memory that is not merely descriptive. It is operationally useful because it encodes causality, not just observed breakage.&lt;/p&gt;




&lt;h2&gt;
  
  
  Corpus Example 3: Why Lineage Matters More Than Similarity
&lt;/h2&gt;

&lt;p&gt;One of the clearest examples in the corpus is a calendar-related architectural shift.&lt;/p&gt;

&lt;p&gt;At one point, memory reflected a plan involving a separate &lt;code&gt;chronicle&lt;/code&gt; service emitting &lt;code&gt;calendar:event:changed&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A later memory updated that reality: calendar functionality lives inside &lt;code&gt;palantir&lt;/code&gt;, not a standalone &lt;code&gt;chronicle&lt;/code&gt; service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;v1:
[static] New pattern: calendar:event:changed from chronicle (port 3007)

v2:
[static] CalendarModule is implemented inside palantir (port 3005) - a separate
chronicle service is not created.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your system only does semantic retrieval, both memories may look relevant forever.&lt;/p&gt;

&lt;p&gt;That is the core problem.&lt;/p&gt;

&lt;p&gt;They are both about calendar architecture.&lt;br&gt;
They are both high-similarity.&lt;br&gt;
They are both "useful context."&lt;/p&gt;

&lt;p&gt;But only one is the current truth.&lt;/p&gt;

&lt;p&gt;OmnethDB's lineage model is designed precisely for this case. The past remains auditable, but the present remains explicit. Historical memory is still available for inspection without silently driving live decisions.&lt;/p&gt;

&lt;p&gt;That distinction is one of the main reasons we think memory needs stronger primitives than vector search alone.&lt;/p&gt;


&lt;h2&gt;
  
  
  Corpus Example 4: Structured Memory For Retrieval Boundaries
&lt;/h2&gt;

&lt;p&gt;Another good example comes from search architecture.&lt;/p&gt;

&lt;p&gt;The corpus records that &lt;code&gt;mulder&lt;/code&gt; indexes profiles into OpenSearch, but the client does not query &lt;code&gt;mulder&lt;/code&gt; directly. Public search flows still go through CMS GraphQL and PostgreSQL unless a specific public search API is added.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[static] mulder fully indexes profiles into OpenSearch, but the client never
queries mulder directly - all search goes through CMS GraphQL -&amp;gt; PostgreSQL.
OpenSearch is currently a "dead" index without a public search API.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sounds like a small implementation detail, but it is actually a product and architecture boundary.&lt;/p&gt;

&lt;p&gt;If an agent misses that distinction, it may:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;propose the wrong integration point&lt;/li&gt;
&lt;li&gt;wire a client directly into the wrong service&lt;/li&gt;
&lt;li&gt;assume OpenSearch is already serving user-facing search traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The memory is useful because it tells the agent not just what exists, but what role it currently plays in the system.&lt;/p&gt;

&lt;p&gt;Again: explicit behavior beats ambient context.&lt;/p&gt;




&lt;h2&gt;
  
  
  Not Anti-Embedding, Anti-Hand-Waving
&lt;/h2&gt;

&lt;p&gt;To be clear, this is not an anti-embedding argument.&lt;/p&gt;

&lt;p&gt;Embeddings are useful.&lt;br&gt;
Similarity search is useful.&lt;br&gt;
Semantic retrieval is useful.&lt;/p&gt;

&lt;p&gt;But embeddings alone do not give you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supersession semantics&lt;/li&gt;
&lt;li&gt;lifecycle control&lt;/li&gt;
&lt;li&gt;derivation provenance&lt;/li&gt;
&lt;li&gt;auditability&lt;/li&gt;
&lt;li&gt;current-version guarantees&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similarity can tell you what is related.&lt;br&gt;
It cannot tell you what is canonical.&lt;/p&gt;

&lt;p&gt;That is why we think memory systems for agents need stronger structure than "store chunks, embed them, and retrieve the top K."&lt;/p&gt;




&lt;h2&gt;
  
  
  Advisory As Memory Lint
&lt;/h2&gt;

&lt;p&gt;Another part of the idea that I find increasingly important is the advisory layer around memory quality.&lt;/p&gt;

&lt;p&gt;One useful way to think about it is: &lt;strong&gt;advisory is a bit like lint for memory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A linter does not usually rewrite your whole program for you. It points at suspicious structure, inconsistent style, dead code, or likely mistakes and asks you to make an explicit decision.&lt;/p&gt;

&lt;p&gt;I think memory systems need something similar.&lt;/p&gt;

&lt;p&gt;Over time, a corpus accumulates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stale facts that should probably be superseded&lt;/li&gt;
&lt;li&gt;duplicate memories that should be merged or retired&lt;/li&gt;
&lt;li&gt;weak derived patterns with shaky provenance&lt;/li&gt;
&lt;li&gt;memories that are still retrievable but no longer belong on the hot path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is not exactly retrieval, and it is not exactly storage either. It is memory hygiene.&lt;/p&gt;

&lt;p&gt;So one direction I care about in OmnethDB is an advisory layer that can surface these issues the way a linter surfaces code smells: not by pretending to know product truth automatically, but by making memory quality problems visible and actionable.&lt;/p&gt;
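&lt;p&gt;Two of those checks are mechanical enough to sketch. The record shape, rule names, and threshold below are invented for illustration, not OmnethDB's advisory design:&lt;/p&gt;

```python
import time

# Invented threshold; a real advisory layer would tune or learn this.
STALE_AFTER_DAYS = 90

def lint_corpus(memories, now=None):
    """Flag memory-hygiene issues without deciding product truth.

    Each memory is a dict with 'id', 'text', 'created_at' (epoch
    seconds), and 'active' fields: a stand-in shape for the sketch.
    """
    now = now or time.time()
    findings = []
    seen = {}
    for m in memories:
        if not m["active"]:
            continue
        # Duplicate active memories should be merged or retired.
        if m["text"] in seen:
            findings.append(("duplicate", m["id"], seen[m["text"]]))
        seen[m["text"]] = m["id"]
        # Old, never-superseded facts deserve a human look.
        age_days = (now - m["created_at"]) / 86400
        if age_days > STALE_AFTER_DAYS:
            findings.append(("possibly-stale", m["id"], None))
    return findings
```

&lt;p&gt;Like a linter, it only points; a human (or a supervised agent) decides whether to supersede, merge, or leave each flagged memory alone.&lt;/p&gt;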

&lt;p&gt;That feels like an important missing piece. A serious memory system should not just remember. It should also help you keep what it remembers legible, current, and worth trusting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Design Standard
&lt;/h2&gt;

&lt;p&gt;The standard we care about is not "good enough to demo."&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic correctness&lt;/li&gt;
&lt;li&gt;explicit behavior over hidden magic&lt;/li&gt;
&lt;li&gt;inspectable state transitions&lt;/li&gt;
&lt;li&gt;durable provenance&lt;/li&gt;
&lt;li&gt;retrieval that respects current truth&lt;/li&gt;
&lt;li&gt;enough structure that a strong engineer can trust the system under scrutiny&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If agent memory is going to become core infrastructure, it should be built with the seriousness we apply to databases, queues, and auth systems.&lt;/p&gt;

&lt;p&gt;Not as a toy.&lt;br&gt;
Not as a vibe.&lt;br&gt;
As infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Gets Interesting
&lt;/h2&gt;

&lt;p&gt;The exciting part is not just that agents can remember more.&lt;/p&gt;

&lt;p&gt;It is that they can remember in a way that supports disciplined reasoning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what is current&lt;/li&gt;
&lt;li&gt;what changed&lt;/li&gt;
&lt;li&gt;what was superseded&lt;/li&gt;
&lt;li&gt;what is historical but still worth inspecting&lt;/li&gt;
&lt;li&gt;what was derived from multiple sources&lt;/li&gt;
&lt;li&gt;what should remain visible without being allowed to silently control decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the path from "agent memory" as a demo feature to memory as a trustworthy primitive.&lt;/p&gt;

&lt;p&gt;That is what I am trying to build with OmnethDB.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;If we want agents that can operate safely in real codebases and real systems, memory has to become more than retrieval sugar.&lt;/p&gt;

&lt;p&gt;It has to become something we can inspect, govern, version, and audit.&lt;/p&gt;

&lt;p&gt;That is the bet behind OmnethDB:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;memory should be queryable, but it should also be legible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And when the truth changes, the system should know the difference between history and the present.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>memory</category>
      <category>agents</category>
    </item>
    <item>
      <title>I Built a Cookie Banner That Makes It Technically Possible to Reject Cookies</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Tue, 07 Apr 2026 10:23:59 +0000</pubDate>
      <link>https://dev.to/ubcent/i-built-a-cookie-banner-that-makes-it-technically-possible-to-reject-cookies-1ank</link>
      <guid>https://dev.to/ubcent/i-built-a-cookie-banner-that-makes-it-technically-possible-to-reject-cookies-1ank</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/aprilfools-2026"&gt;DEV April Fools Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;A React component library that faithfully recreates the experience of trying to reject cookies on a modern enterprise website.&lt;/p&gt;

&lt;p&gt;By which I mean: you can reject. Technically. Eventually. After a brief sequence of clarifications.&lt;/p&gt;

&lt;p&gt;The package is called &lt;code&gt;react-consent-chaos&lt;/code&gt;. It ships a &lt;code&gt;ConsentManagerFromHell&lt;/code&gt; modal with three &lt;code&gt;hellMode&lt;/code&gt; settings (&lt;code&gt;"polite"&lt;/code&gt;, &lt;code&gt;"pushy"&lt;/code&gt;, &lt;code&gt;"comically-evil"&lt;/code&gt;), three &lt;code&gt;rejectDifficulty&lt;/code&gt; levels (&lt;code&gt;"annoying"&lt;/code&gt;, &lt;code&gt;"absurd"&lt;/code&gt;, &lt;code&gt;"nightmare"&lt;/code&gt;), and a prop called &lt;code&gt;allowRejectEventually&lt;/code&gt; whose &lt;code&gt;false&lt;/code&gt; case is described in the README as: &lt;em&gt;rejection remains aspirational&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I want to be very clear that I wrote that documentation willingly.&lt;/p&gt;

&lt;p&gt;The default company name is &lt;code&gt;"Consent Dynamics"&lt;/code&gt;. The default vendor count is &lt;code&gt;1,847&lt;/code&gt;. Both are configurable. Neither number has been audited.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Here is what happens when a user opens the modal on &lt;code&gt;hellMode="pushy"&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrgu24q3lbiuffr79qax.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrgu24q3lbiuffr79qax.png" alt="Initial consent modal" width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The badge says &lt;strong&gt;"Partner-Aligned"&lt;/strong&gt;. The accept button says &lt;strong&gt;"Accept all and continue"&lt;/strong&gt;. The reject button says &lt;strong&gt;"Reject optional cookies"&lt;/strong&gt; and underneath that, in smaller text: &lt;strong&gt;"Step 1 / 5"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Step 1 of 5.&lt;/p&gt;

&lt;p&gt;The user clicks it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0yvy5yr22xiior1e8f2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0yvy5yr22xiior1e8f2.png" alt="Reject flow escalation" width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The button now says &lt;strong&gt;"Confirm reduced experience"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The status box updates to: &lt;em&gt;"Our partners will be disappointed."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The user, briefly, feels something.&lt;/p&gt;

&lt;p&gt;They click again. The button becomes &lt;strong&gt;"Acknowledge optimization loss"&lt;/strong&gt;. Then &lt;strong&gt;"Continue without joy"&lt;/strong&gt;. Then, on step five — visibly exhausted, through gritted teeth — &lt;strong&gt;"Reject anyway"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At this point the modal finally closes and fires the &lt;code&gt;onRejectAll&lt;/code&gt; callback with the message: &lt;strong&gt;"Fine. We respect your persistence."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is considered the happy path.&lt;/p&gt;




&lt;p&gt;The preferences panel is where the package really earns its name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkemuebs3x1ijfhwdslo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkemuebs3x1ijfhwdslo.png" alt="Preferences panel" width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every consent category has a description written in the voice of a mid-level enterprise product manager who is doing their best:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Necessary&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Required for the continued existence of the button.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Measures intent, confusion, and funnel sincerity.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Personalization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Allows the interface to remember your boundaries and negotiate them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Legitimate interest&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A timeless category with exceptional self-esteem.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mood tracking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Detects mild reluctance for premium reassurance.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Productivity optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ensures banners arrive during your most fragile focus windows.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;"Productivity optimization" is a real consent category in this component. It is off by default. You're welcome.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy263fb963gd4imu5331y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy263fb963gd4imu5331y.png" alt="Event log" width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/ubcent/react-consent-chaos" rel="noopener noreferrer"&gt;https://github.com/ubcent/react-consent-chaos&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;react-consent-chaos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Basic usage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ConsentManagerFromHell&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react-consent-chaos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;react-consent-chaos/styles.css&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ConsentManagerFromHell&lt;/span&gt;
  &lt;span class="na"&gt;open&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;open&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;onOpenChange&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;setOpen&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;companyName&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"Synergy Harvest"&lt;/span&gt;
  &lt;span class="na"&gt;hellMode&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"pushy"&lt;/span&gt;
  &lt;span class="na"&gt;rejectDifficulty&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"absurd"&lt;/span&gt;
  &lt;span class="na"&gt;allowRejectEventually&lt;/span&gt;
  &lt;span class="na"&gt;onAcceptAll&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Excellent. Your journey has been optimized.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;onRejectAll&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Fine. We respect your persistence.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;onSavePreferences&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;handleSavePreferences&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The reject button in action
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;ConsentStepButton&lt;/code&gt; walks users through a dynamically generated sequence of labels. At &lt;code&gt;difficulty="nightmare"&lt;/code&gt; and &lt;code&gt;mode="comically-evil"&lt;/code&gt;, the full sequence is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reject optional cookies&lt;/li&gt;
&lt;li&gt;Confirm reduced experience&lt;/li&gt;
&lt;li&gt;Acknowledge optimization loss&lt;/li&gt;
&lt;li&gt;Continue without joy&lt;/li&gt;
&lt;li&gt;Reject anyway&lt;/li&gt;
&lt;li&gt;Decline enhanced destiny&lt;/li&gt;
&lt;li&gt;Deny data to the revenue temple&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 7 of 7 is &lt;strong&gt;"Deny data to the revenue temple"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I am not sure I can defend this. I am also not taking it out.&lt;/p&gt;

&lt;h3&gt;
  
  
  The quiet part, out loud
&lt;/h3&gt;

&lt;p&gt;When a user clicks the reject button before they've completed enough steps, two things happen in the source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user attempted informed choice&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;consent friction increased&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are real lines. In the production bundle. Labeled as warnings.&lt;/p&gt;

&lt;p&gt;I briefly considered upgrading them to &lt;code&gt;console.error&lt;/code&gt;. I chose not to because I have some remaining sense of proportion.&lt;/p&gt;

&lt;h3&gt;
  
  
  The preferences panel also has a feature
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;getManipulatedPreferences&lt;/code&gt; runs silently on every save. In &lt;code&gt;pushy&lt;/code&gt; mode, it re-enables &lt;code&gt;legitimateInterest&lt;/code&gt; regardless of what the toggle says. In &lt;code&gt;comically-evil&lt;/code&gt; mode, it also re-enables &lt;code&gt;advertising&lt;/code&gt; and &lt;code&gt;partnerSharing&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The function is not hidden. It is named &lt;code&gt;getManipulatedPreferences&lt;/code&gt;. It is called inside &lt;code&gt;handleSavePreferences&lt;/code&gt;. Any developer who reads the file will find it immediately.&lt;/p&gt;

&lt;p&gt;This is not obfuscation. It is transparency with a very specific energy.&lt;/p&gt;

&lt;h3&gt;
  
  
  The hook, for enthusiasts
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;escalation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useConsentEscalation&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;nightmare&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;comically-evil&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;allowRejectEventually&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// escalation.statusMessage →&lt;/span&gt;
&lt;span class="c1"&gt;//   "Compliance theater is nearing completion."&lt;/span&gt;
&lt;span class="c1"&gt;//   "Revenue sadness has been acknowledged."&lt;/span&gt;
&lt;span class="c1"&gt;//   "Your defiance has entered the final audit lane."&lt;/span&gt;
&lt;span class="c1"&gt;//   "Fine. We respect your persistence."  ← only on the last step&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;useConsentEscalation&lt;/code&gt; is exported separately for developers who want to implement their own rejection experience without using the full modal. It returns &lt;code&gt;canReject&lt;/code&gt;, &lt;code&gt;advanceRejectFlow()&lt;/code&gt;, &lt;code&gt;resetRejectFlow()&lt;/code&gt;, and a rotating &lt;code&gt;statusMessage&lt;/code&gt; that cycles through mode-appropriate passive aggression until the user has exhausted their allocation of steps.&lt;/p&gt;

&lt;p&gt;After that: &lt;em&gt;"Fine. We respect your persistence."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's it. That's the only acknowledgment available. There is no version of this component that says "sure, no problem."&lt;/p&gt;
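&lt;p&gt;For the curious, the state machine the hook is described as wrapping can be sketched without React at all. This is a hypothetical reimplementation based only on the API described above; the intermediate messages are stand-ins, though the final line is the real one.&lt;/p&gt;

```typescript
// Hypothetical, framework-free sketch of the escalation state machine.
// Step counts and intermediate messages are invented; only the final
// message matches the package's described copy.

function createEscalation(stepsBeforeSuccess: number, messages: string[]) {
  let step = 0;
  const canReject = () => step >= stepsBeforeSuccess;
  return {
    canReject,
    // Cycles through mode-appropriate passive aggression until the user
    // has exhausted their allocation of steps.
    statusMessage: () =>
      canReject()
        ? "Fine. We respect your persistence."
        : messages[step % messages.length],
    advanceRejectFlow: () => { if (step < stepsBeforeSuccess) step++; },
    resetRejectFlow: () => { step = 0; },
  };
}

const esc = createEscalation(3, [
  "Our partners will be disappointed.",
  "Your defiance is being reviewed.",
]);
esc.advanceRejectFlow();
esc.advanceRejectFlow();
esc.advanceRejectFlow();
// Only now does canReject() turn true and the final message appear.
```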

&lt;h2&gt;
  
  
  How I Built It
&lt;/h2&gt;

&lt;p&gt;The package is TypeScript-first, bundles to ESM + CJS via &lt;code&gt;tsup&lt;/code&gt;, and ships styles as a separate import. No runtime dependencies beyond React 18+.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;hellMode&lt;/code&gt; prop controls copy across the entire modal. In &lt;code&gt;comically-evil&lt;/code&gt; mode the title changes to &lt;em&gt;"Universal Consent Acceleration Layer"&lt;/em&gt;, the badge becomes &lt;strong&gt;"Legally Adjacent"&lt;/strong&gt;, the accept button becomes &lt;strong&gt;"Excellent, optimize me"&lt;/strong&gt;, and the manage preferences button becomes &lt;strong&gt;"Audit the damage"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;polite&lt;/code&gt; mode — the tamest setting — the helper text reads:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Your privacy matters deeply to us within commercially reasonable limits.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the polite version.&lt;/p&gt;

&lt;p&gt;The preferences panel heading in &lt;code&gt;comically-evil&lt;/code&gt; mode is &lt;em&gt;"Manual resistance configuration"&lt;/em&gt;. The component also tracks how many times the user has attempted to save preferences (&lt;code&gt;saveCount&lt;/code&gt;) and displays it in the UI, because if you're going to be hostile you should at least be transparent about it.&lt;/p&gt;

&lt;p&gt;The footer legalese reads:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Necessary cookies are mandatory, spiritually and technically.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Optional categories may remain enabled where enterprise momentum requires.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both lines are in the real component. Both survived every review I gave them.&lt;/p&gt;

&lt;p&gt;The entire thing is accessible: keyboard navigation works, ARIA roles are set, focus is managed on open and panel change, and the status message during the reject flow is wrapped in &lt;code&gt;aria-live="polite"&lt;/code&gt; so screen reader users receive every update in real time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Our partners will be disappointed."&lt;/em&gt; — delivered accessibly, to everyone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prize Category
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Anti-Value Proposition
&lt;/h3&gt;

&lt;p&gt;The value of this package is entirely negative and precisely calibrated. Setting &lt;code&gt;allowRejectEventually={false}&lt;/code&gt; renders a modal where rejection is permanently queued. The progress bar advances. The steps are counted. The button labels grow more resigned with each click. Nothing ever resolves.&lt;/p&gt;

&lt;p&gt;The README describes this as: &lt;em&gt;rejection remains aspirational&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I am genuinely proud of that sentence and also slightly concerned about what it says about me.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creativity
&lt;/h3&gt;

&lt;p&gt;The original insight is that dark patterns are not hidden — they're just undocumented. &lt;code&gt;react-consent-chaos&lt;/code&gt; documents all of them. &lt;code&gt;getManipulatedPreferences&lt;/code&gt; is a named export. The &lt;code&gt;console.warn&lt;/code&gt; lines are readable in any debugger. &lt;code&gt;allowRejectEventually={false}&lt;/code&gt; is a prop you pass on purpose.&lt;/p&gt;

&lt;p&gt;The joke is that naming the manipulation doesn't make it less manipulative. It just makes it honest manipulation. Which is somehow worse.&lt;/p&gt;

&lt;p&gt;Also: the &lt;code&gt;nightmare&lt;/code&gt; difficulty allows up to seven steps, and if you set &lt;code&gt;rejectStepsBeforeSuccess&lt;/code&gt; to a number beyond the label pool, the fallback label is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Continue with regrettable self-determination N&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;where &lt;code&gt;N&lt;/code&gt; is the step number. This was not planned. It emerged naturally from the implementation and I kept it because it felt right.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Execution
&lt;/h3&gt;

&lt;p&gt;Real library, real build pipeline, real types, real accessibility, real hook API. The &lt;code&gt;ConsentStepButton&lt;/code&gt; and &lt;code&gt;useConsentEscalation&lt;/code&gt; are individually exported for anyone who wants to compose their own dark-pattern UI from primitives. The component handles controlled and uncontrolled state. The overlay is non-closable by default. Escape is gated behind &lt;code&gt;overlayClosable&lt;/code&gt;. Focus returns correctly.&lt;/p&gt;

&lt;p&gt;It is a well-engineered component whose entire purpose is to demonstrate how much engineering goes into making people feel bad about wanting privacy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Writing Quality
&lt;/h3&gt;

&lt;p&gt;Every string in this codebase was written as if it would appear in an actual product and reviewed by an actual legal team in an actual company that has lost the thread. &lt;em&gt;"Delight generation may be reduced."&lt;/em&gt; &lt;em&gt;"Your independence is being carefully reviewed."&lt;/em&gt; &lt;em&gt;"A timeless category with exceptional self-esteem."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These are not captions. They are labels. They render in the UI. They are wrapped in ARIA attributes and shipped in a bundle.&lt;/p&gt;

&lt;p&gt;The demo's event log initializes with the entry: &lt;em&gt;"Awaiting a fresh compliance event."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I think about that line sometimes.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>418challenge</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Needed a Workflow Engine for AI Agents. None of Them Fit. So I Built One.</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Fri, 27 Mar 2026 17:08:10 +0000</pubDate>
      <link>https://dev.to/ubcent/i-needed-a-workflow-engine-for-ai-agents-none-of-them-fit-so-i-built-one-mjf</link>
      <guid>https://dev.to/ubcent/i-needed-a-workflow-engine-for-ai-agents-none-of-them-fit-so-i-built-one-mjf</guid>
      <description>&lt;p&gt;&lt;em&gt;Part three of the vexdo series — after &lt;a href="https://dev.to/ubcent/i-built-a-local-ai-dev-pipeline-that-reviews-its-own-code-before-opening-a-pr-geg"&gt;building a local AI dev pipeline&lt;/a&gt; and &lt;a href="https://dev.to/ubcent/i-let-agents-write-my-code-they-got-stuck-in-a-loop-and-argued-with-each-other-36me"&gt;moving it to the cloud&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;vexdo works. I use it. It handles the boring parts of shipping code — the implement-review-fix loop that used to eat my afternoons.&lt;/p&gt;

&lt;p&gt;At some point I started thinking: could this be something more than a personal tool? Not just a CLI I run on my machine, but an actual product. Something with a proper foundation, not held together with state files and hardcoded pipeline logic.&lt;/p&gt;

&lt;p&gt;And that's where things got complicated.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with "just use a workflow engine"
&lt;/h2&gt;

&lt;p&gt;The obvious answer when you want to orchestrate multi-step processes is: use a workflow engine. Airflow, Temporal, BullMQ, Prefect — there are plenty of them, and some are very good at what they do.&lt;/p&gt;

&lt;p&gt;The problem is what they're good at.&lt;/p&gt;

&lt;p&gt;These engines are built around a core assumption: &lt;strong&gt;you know your steps upfront&lt;/strong&gt;. You define a DAG — nodes, edges, dependencies — and the engine executes it. The graph is fixed. That's the contract.&lt;/p&gt;

&lt;p&gt;For traditional workflows, this is fine. ETL pipelines, CI/CD jobs, batch processing — you know what needs to happen before it starts happening.&lt;/p&gt;

&lt;p&gt;AI agents break this assumption.&lt;/p&gt;

&lt;p&gt;Here's a concrete example from vexdo. When an agent starts working on a task, it first analyzes the codebase — what files are involved, which modules are sensitive, how deep the change goes. But the &lt;em&gt;result&lt;/em&gt; of that analysis determines what comes next.&lt;/p&gt;

&lt;p&gt;Simple task touching one service? Skip the design council, go straight to implementation.&lt;/p&gt;

&lt;p&gt;Task that touches the payments module? Spawn a dedicated security review. If it also changes the API schema, spawn a contract validation step. If the codebase has low test coverage, spawn a test generation pass first.&lt;/p&gt;

&lt;p&gt;None of this is knowable when the workflow starts. The agent discovers it by doing the work.&lt;/p&gt;

&lt;p&gt;If you try to handle this with a fixed DAG, you end up with one of two bad options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pre-define every possible branch&lt;/strong&gt; — the graph becomes a sprawling mess of conditional edges, and half the nodes never run. You're essentially writing a decision tree disguised as a workflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Treat the whole thing as one big step&lt;/strong&gt; — you lose parallelism, observability, retry granularity, and the ability to checkpoint. Your "workflow" is just a black box that either finishes or fails.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Neither option is good. What you actually want is a workflow that can &lt;em&gt;extend itself&lt;/em&gt; at runtime — where completing a step can add new steps to the graph, based on what was discovered.&lt;/p&gt;




&lt;h2&gt;
  
  
  The core idea: a graph that grows
&lt;/h2&gt;

&lt;p&gt;I've been calling this a &lt;strong&gt;living graph&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In a traditional workflow engine, the DAG is immutable after you start a run. In a living graph, nodes can spawn new nodes as part of their output. The graph is a starting point, not a constraint.&lt;/p&gt;

&lt;p&gt;When a node completes, its result can include a list of new nodes to add to the graph — with their own dependencies, retry policies, and compensation logic. The scheduler picks them up and runs them exactly like any other node. From the engine's perspective, there's no difference between a node that was defined at the start and one that was spawned mid-run.&lt;/p&gt;

&lt;p&gt;This is the key idea behind &lt;strong&gt;Grael&lt;/strong&gt; — the workflow engine I built specifically for AI agent pipelines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Workflow starts:
  [scout] → ???

Scout runs, analyzes the codebase, returns:
  output: { complexity: "high", touchedModules: ["payments", "api"] }
  spawn:  [
    { id: "council",   dependsOn: ["scout"] },
    { id: "implement", dependsOn: ["council"] },
    { id: "sec-review", dependsOn: ["implement"] },  ← spawned because: payments
    { id: "reviewer",  dependsOn: ["implement"] },
    { id: "arbiter",   dependsOn: ["reviewer", "sec-review"] },
    { id: "pr",        dependsOn: ["arbiter"] }
  ]

Graph is now:
  [scout] → [council] → [implement] → [reviewer] ──→ [arbiter] → [pr]
                                   └→ [sec-review] ─┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The spawn happened inside the scout's activity — the engine didn't know about any of this when the workflow started.&lt;/p&gt;

&lt;p&gt;Another example: spec contradictions. The arbiter reviews the diff and notices that what the executor built doesn't match the original spec — not a code quality issue, but a genuine conflict in requirements. Maybe the spec said "use cursor-based pagination" but the executor implemented offset-based because an existing helper made it easier. Maybe two requirements in the spec are mutually exclusive and the executor quietly picked one.&lt;/p&gt;

&lt;p&gt;In a fixed pipeline, this either escalates to a human or gets sent back to the executor with a "fix it" comment. But the right answer is often neither — you need to go back to whoever wrote the spec and ask for a decision.&lt;/p&gt;

&lt;p&gt;With a living graph, the arbiter can spawn a &lt;code&gt;spec-clarification&lt;/code&gt; node instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Arbiter runs, detects contradiction, returns:
  output: { decision: "spec-contradiction" }
  spawn:  [
    {
      id: "spec-clarification",
      activityType: "spec-writer",   ← could be a human checkpoint or another agent
      dependsOn: ["arbiter"],
      input: {
        question: "Spec says cursor-based pagination, executor used offset. Which do you want?",
        context: { diff, specExcerpt }
      }
    },
    {
      id: "implement-revised",
      activityType: "executor",
      dependsOn: ["spec-clarification"]   ← continues once clarified
    },
    ...
  ]

Graph grows:
  ... → [arbiter] → [spec-clarification] → [implement-revised] → [reviewer-2] → [pr]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;spec-writer&lt;/code&gt; activity type could be anything — a human approval gate, a dedicated planning agent that re-evaluates the requirements, or a call to the original spec-generation step with additional context. The arbiter doesn't need to know. It just knows this is a spec problem, not a code problem, and spawns the right node type. That routing decision is something only an agent can make in context. You can't pre-define it in a YAML file before the run starts.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Grael actually is
&lt;/h2&gt;

&lt;p&gt;Grael is a Go service built around this idea. The code is on GitHub: &lt;a href="https://github.com/ubcent/grael" rel="noopener noreferrer"&gt;github.com/ubcent/grael&lt;/a&gt;. A few things I cared about when building it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Everything is an event.&lt;/strong&gt; The entire state of a workflow run is an append-only event log. The current graph, node states, retry counts — all of it is derived by replaying events from the WAL. This means crashes are recoverable, history is auditable, and replay is deterministic. If Grael goes down mid-run, it picks up exactly where it left off.&lt;/p&gt;
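&lt;p&gt;The replay idea is easy to sketch. This is a generic event-sourcing fold with event names invented for illustration rather than taken from Grael's WAL format: state is a pure function of the log, so a restarted process that replays the same events lands in exactly the same state.&lt;/p&gt;

```typescript
// Generic event-sourcing sketch; event and state names are illustrative.

type Event =
  | { type: "node-added"; id: string }
  | { type: "node-started"; id: string }
  | { type: "node-completed"; id: string }
  | { type: "node-failed"; id: string };

type NodeState = "pending" | "running" | "completed" | "failed";

// Replay is a pure fold over the append-only log. Same log in, same
// state out: that determinism is what makes crash recovery trivial.
function replay(log: Event[]): Map<string, NodeState> {
  const state = new Map<string, NodeState>();
  for (const e of log) {
    switch (e.type) {
      case "node-added":     state.set(e.id, "pending");   break;
      case "node-started":   state.set(e.id, "running");   break;
      case "node-completed": state.set(e.id, "completed"); break;
      case "node-failed":    state.set(e.id, "failed");    break;
    }
  }
  return state;
}

const log: Event[] = [
  { type: "node-added", id: "scout" },
  { type: "node-started", id: "scout" },
  { type: "node-completed", id: "scout" },
  { type: "node-added", id: "implement" },
  { type: "node-started", id: "implement" },
];

// A process that crashes here rebuilds the exact same picture on restart.
const state = replay(log);
```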

&lt;p&gt;&lt;strong&gt;Workers are just processes that poll for tasks.&lt;/strong&gt; Any language, any runtime. You register a worker with the activity types it can handle, then poll for tasks. When you get one, you run it and report back. The Go SDK is one file. A TypeScript SDK is straightforward to build on top of the gRPC API.&lt;/p&gt;
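&lt;p&gt;In any language, a worker reduces to roughly this loop. The &lt;code&gt;pollTask&lt;/code&gt; and &lt;code&gt;reportResult&lt;/code&gt; callbacks are stand-ins for whatever the gRPC API exposes, not real SDK calls:&lt;/p&gt;

```typescript
// Hypothetical worker loop: register capabilities, poll, execute, report.
// pollTask / reportResult stand in for the actual transport layer.
interface Task {
  id: string;
  activityType: string;
  input: object;
}

type Handler = (input: object) => object;

function runWorkerOnce(
  handlers: { [activityType: string]: Handler },
  pollTask: () => Task | null,
  reportResult: (taskId: string, output: object) => void,
): boolean {
  const task = pollTask();
  if (task === null) return false; // nothing to do this tick
  const handler = handlers[task.activityType];
  if (!handler) throw new Error("no handler for " + task.activityType);
  reportResult(task.id, handler(task.input));
  return true;
}
```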

&lt;p&gt;&lt;strong&gt;Compensation is built in.&lt;/strong&gt; Each node can declare a compensation activity — what to undo if things go wrong downstream. When a run fails, Grael automatically runs compensations in reverse order. Saga pattern, out of the box.&lt;/p&gt;
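&lt;p&gt;The reverse-order walk is the whole trick. A minimal sketch, with illustrative names:&lt;/p&gt;

```typescript
// Saga-style sketch: each completed node may carry a compensation; on
// failure they run in reverse completion order, undoing newest work first.
interface CompletedNode {
  id: string;
  compensate?: () => void;
}

function compensateAll(completedInOrder: CompletedNode[]): string[] {
  const ran: string[] = [];
  for (let i = completedInOrder.length - 1; i >= 0; i--) {
    const node = completedInOrder[i];
    if (node.compensate) {
      node.compensate();
      ran.push(node.id);
    }
  }
  return ran; // ids of compensations that actually ran, newest first
}
```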

&lt;p&gt;&lt;strong&gt;Human checkpoints are first-class.&lt;/strong&gt; An activity can return a checkpoint signal instead of completing — the node enters a waiting state, unrelated work continues, and the run resumes when someone calls &lt;code&gt;ApproveCheckpoint&lt;/code&gt;. The checkpoint timeout is configurable per node. This is how you put a human in the loop without halting the entire pipeline.&lt;/p&gt;
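&lt;p&gt;As a sketch (the state and signal names here are illustrative, not Grael's wire format), the checkpoint lifecycle is small:&lt;/p&gt;

```typescript
// Sketch of a checkpoint-aware node: an activity result is either a normal
// completion or a "waiting" signal; approval flips it without blocking peers.
type NodeState = "running" | "waiting-approval" | "completed";

interface Checkpoint {
  nodeId: string;
  state: NodeState;
}

function onActivityResult(result: "done" | "checkpoint", nodeId: string): Checkpoint {
  if (result === "checkpoint") {
    return { nodeId, state: "waiting-approval" }; // sibling nodes keep running
  }
  return { nodeId, state: "completed" };
}

function approveCheckpoint(cp: Checkpoint): Checkpoint {
  if (cp.state !== "waiting-approval") return cp; // approval is idempotent
  return { nodeId: cp.nodeId, state: "completed" };
}
```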




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/fZXBkdV1Jow"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;The demo runs a morning incident briefing workflow. The scenario: an on-call team needs to quickly assemble a picture of what's happening and decide what to investigate. Here's the shape of the run:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Three preparation steps start in parallel — collect customer escalations, pull checkout metrics, prepare the briefing outline.&lt;/li&gt;
&lt;li&gt;A planning step runs once those complete and decides which follow-up checks need to happen.&lt;/li&gt;
&lt;li&gt;Grael spawns the concrete investigation nodes based on that decision — verify checkout latency, confirm payment auth drop, review support spike. These weren't in the graph when the run started.&lt;/li&gt;
&lt;li&gt;One spawned investigation fails retryably. Grael retries it automatically.&lt;/li&gt;
&lt;li&gt;An editor approval gate opens. The run doesn't freeze — the other investigations keep progressing.&lt;/li&gt;
&lt;li&gt;Once all investigations are done and the approval comes through, the results flow into assembling the final brief.&lt;/li&gt;
&lt;li&gt;The brief is published. Run completes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What to watch for: the graph growing after the planning step, multiple nodes running at the same time, the failed node recovering, and the approval gate that's clearly distinct from a stall.&lt;/p&gt;

&lt;p&gt;One more thing worth noting: what you're watching is a replay. Not a live run recorded at demo time — a deterministic replay of a previously recorded execution, driven from the event log.&lt;/p&gt;

&lt;p&gt;This is possible because of how Grael works internally. Every state transition — node started, node completed, spawn happened, retry scheduled, checkpoint reached — is written to an append-only WAL before anything changes in memory. The current state of a run is always derived by replaying that log from the beginning. There's no separate "current state" that can drift or get corrupted.&lt;/p&gt;

&lt;p&gt;The consequence is that any run can be replayed exactly. Same events, same order, same graph shape, same outcome. That's what makes the demo reproducible — and it's also what makes crash recovery work. If Grael goes down mid-run, it replays the log on restart and picks up where it left off. The demo and the durability guarantee are the same mechanism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To be clear: this is very early.&lt;/strong&gt; The workflow is synthetic, built to demonstrate these behaviors in a controlled setting. What you're seeing is a proof of concept, not a production system: I've run it through a handful of synthetic workflows to validate the architecture, but I haven't thrown real production workloads at it. There are rough edges, missing features, and exactly zero battle testing. The goal right now is to get the core idea right, not to ship something stable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this matters for vexdo specifically
&lt;/h2&gt;

&lt;p&gt;The current version of vexdo has its orchestration hardcoded. The pipeline is always: submit → review → arbiter → fix → repeat. That works for the use case I built it for, but it's not general enough to turn into a product.&lt;/p&gt;

&lt;p&gt;With Grael underneath, vexdo becomes a set of activity workers — scout, executor, reviewer, arbiter, pr-creator — registered against an engine that handles the graph. The pipeline itself is just a starting node definition. What it grows into depends on what the agents discover.&lt;/p&gt;

&lt;p&gt;This also unlocks things that were awkward before: running review and security checks in parallel, spawning additional investigation steps when something looks risky, human approval gates at specific points in the pipeline. These become configuration, not code changes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Grael needs a gRPC server layer before it can talk to TypeScript workers. That's the immediate next step. After that, the TypeScript SDK, then wiring vexdo's agents into it.&lt;/p&gt;

&lt;p&gt;If this is interesting to you — either because you're building something similar, or because you're skeptical this is the right abstraction — I'd genuinely like to hear it. The living graph idea feels right to me, but I'm very much still figuring out where it breaks.&lt;/p&gt;

&lt;p&gt;More posts coming as this progresses.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Let Agents Write My Code. They Got Stuck in a Loop and Argued With Each Other</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Thu, 19 Mar 2026 15:27:16 +0000</pubDate>
      <link>https://dev.to/ubcent/i-let-agents-write-my-code-they-got-stuck-in-a-loop-and-argued-with-each-other-36me</link>
      <guid>https://dev.to/ubcent/i-let-agents-write-my-code-they-got-stuck-in-a-loop-and-argued-with-each-other-36me</guid>
      <description>&lt;p&gt;&lt;em&gt;A follow-up to &lt;a href="https://dev.to/ubcent/i-built-a-local-ai-dev-pipeline-that-reviews-its-own-code-before-opening-a-pr-geg"&gt;building a local AI pipeline that reviews its own code&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I built vexdo — a CLI pipeline that automates the full dev cycle: spec → Codex implementation → reviewer → arbiter → PR. The dream: close my laptop, come back to a reviewed PR. No manual copy-pasting between tools, no being the glue.&lt;/p&gt;

&lt;p&gt;Then I migrated from local Codex to Codex Cloud. Then I swapped the reviewer from Claude to GitHub Copilot CLI. Then I went to make a coffee and came back to find my pipeline had sent Codex the same feedback four times in a row.&lt;/p&gt;

&lt;p&gt;This post is about that, and the other ways things broke. Not the happy path — the other one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick recap: what vexdo does
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec.yaml → Codex (implements) → Reviewer (finds issues) → Arbiter (fix / submit / escalate) → PR
                 ↑___________________________|
                         fix loop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;v1 ran locally and synchronously. v2 runs on Codex Cloud so I can kick off a task and close my laptop. The reviewer is now GitHub Copilot CLI. The arbiter is still Claude.&lt;/p&gt;

&lt;p&gt;Simple enough in theory. Here's what went wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  The infinite loop that took me an embarrassing amount of time to notice
&lt;/h2&gt;

&lt;p&gt;This one genuinely hurt.&lt;/p&gt;

&lt;p&gt;My spec had a contradiction I didn't catch when writing it. One section described the expected behavior. Another section described the system architecture. They disagreed on where a certain piece of logic should live.&lt;/p&gt;

&lt;p&gt;Codex, being a dutiful implementer, read the behavior requirement and made change A. The reviewer flagged it: &lt;em&gt;"this violates the architecture described in section 3."&lt;/em&gt; Fair enough. The arbiter sent it back for a fix.&lt;/p&gt;

&lt;p&gt;Next iteration: Codex, now armed with the reviewer's feedback, made change B instead. The reviewer flagged it: &lt;em&gt;"this doesn't implement the behavior described in section 1."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Codex made change A again.&lt;/p&gt;

&lt;p&gt;I watched this unfold across four iterations before I admitted to myself what was happening. The agents weren't broken. They were doing exactly what they were told. The spec was broken, and nobody in the loop had the job of noticing that — because I hadn't given anyone that job.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix: a stuck detector
&lt;/h3&gt;

&lt;p&gt;I added a fourth agent call — a loop detector that runs after each review. It gets the full iteration history: every reviewer output, every piece of feedback, every resulting diff. Its only job is to answer one question: &lt;em&gt;are we making progress, or are we going in circles?&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="s2"&gt;`You are reviewing the history of a code review loop.\n`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;`Below are the last &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; iterations: reviewer findings and the resulting diffs.\n\n`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;formatHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;\n\n`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;`Is the loop making progress toward resolution, or is it cycling?\n`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;`If cycling: briefly describe the contradiction causing it.\n`&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
  &lt;span class="s2"&gt;`Respond with JSON: { "status": "progress" | "stuck", "reason": string }`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When it returns &lt;code&gt;stuck&lt;/code&gt;, the pipeline escalates to me with the reason. In the spec-contradiction case the output was something like: &lt;em&gt;"Reviewer alternates between flagging architecture violation and spec violation. Likely spec inconsistency between sections 1 and 3."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's exactly the signal I needed. I fixed the spec in two minutes. The task ran clean on the next attempt.&lt;/p&gt;

&lt;p&gt;One more API call per iteration. Absolutely worth it.&lt;/p&gt;
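&lt;p&gt;The consuming side is deliberately defensive. Here is a sketch of how the verdict might be parsed; anything malformed counts as progress, so a flaky detector can never block a healthy run (the parsing helper is mine, not part of any SDK):&lt;/p&gt;

```typescript
// Sketch: parse the detector's JSON verdict. Unparseable output is not
// evidence of a stuck loop, so it defaults to "progress".
interface Verdict {
  status: "progress" | "stuck";
  reason: string;
}

function parseVerdict(raw: string): Verdict {
  try {
    const parsed = JSON.parse(raw);
    if (parsed.status === "stuck") {
      return { status: "stuck", reason: String(parsed.reason || "unspecified") };
    }
  } catch {
    // fall through to the safe default below
  }
  return { status: "progress", reason: "" };
}
```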




&lt;h2&gt;
  
  
  The arbiter that treated every nit as a showstopper
&lt;/h2&gt;

&lt;p&gt;My arbiter's job is to decide: &lt;code&gt;fix&lt;/code&gt;, &lt;code&gt;submit&lt;/code&gt;, or &lt;code&gt;escalate&lt;/code&gt;. In v1, it was prompt-tuned to be thorough — if there are any issues in the review, send it back for fixes.&lt;/p&gt;

&lt;p&gt;Sounds responsible. Was not.&lt;/p&gt;

&lt;p&gt;The Copilot reviewer, being an agent with opinions, would find real issues — and also flag a variable name it preferred, a missing blank line, inconsistent comment style. Nits. These came back as review comments. The arbiter, seeing review comments, dutifully returned &lt;code&gt;fix&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So tasks that were functionally correct would bounce through 2-3 extra cycles chasing aesthetics. Each cycle is 8-10 minutes of Codex Cloud execution. A task that should have been one pass took four. The diff after iteration 4 was identical to the diff after iteration 1 except for a renamed variable and a blank line.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix: severity-aware arbitration
&lt;/h3&gt;

&lt;p&gt;The reviewer was already tagging severity — I just wasn't using it in the arbiter decision. One prompt update:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Severity guide:
- critical / high: always fix before submitting
- medium: fix if it's a behavior issue; use judgment for style
- low / nit: do NOT send back for fix; note in PR description instead

Only return "fix" if there are unresolved critical or high severity issues.
If the only remaining issues are low/nit, return "submit".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Task cycle count dropped immediately.&lt;/p&gt;

&lt;p&gt;The thing I kept reminding myself: &lt;strong&gt;the arbiter is a policy, not just a judge.&lt;/strong&gt; Left to its own devices, it defaults to "fix everything," which is technically correct and practically a treadmill. You have to encode the actual policy — what counts as blocking, what doesn't — or you'll spend a lot of Codex Cloud credits on blank lines.&lt;/p&gt;
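&lt;p&gt;Written as code rather than prompt text, the policy is a few lines. This is a sketch of the same rules, with illustrative field names:&lt;/p&gt;

```typescript
// Sketch of the severity policy: only unresolved critical/high findings
// force another fix cycle; nits ride along in the PR description.
type Severity = "critical" | "high" | "medium" | "low" | "nit";

interface Finding {
  severity: Severity;
  isBehaviorIssue?: boolean;
}

function decide(findings: Finding[]): "fix" | "submit" {
  for (const f of findings) {
    if (f.severity === "critical" || f.severity === "high") return "fix";
    // medium: only behavior issues block; style judgment calls do not
    if (f.severity === "medium") {
      if (f.isBehaviorIssue) return "fix";
    }
  }
  return "submit"; // low/nit only: ship it, note them in the PR
}
```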




&lt;h2&gt;
  
  
  The cloud stuff that also broke (quickly)
&lt;/h2&gt;

&lt;p&gt;Since you'll hit these too:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exit codes lie.&lt;/strong&gt; &lt;code&gt;codex cloud status&lt;/code&gt; returns non-zero when a task is still pending. Not an error — just "not done yet." My polling loop treated every poll as a failure and gave up immediately. Fix: parse stdout first, only throw if the output is unrecognizable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The status values aren't what the docs imply.&lt;/strong&gt; I was matching &lt;code&gt;completed&lt;/code&gt;. The actual output contains &lt;code&gt;[READY]&lt;/code&gt;. Also in rotation: &lt;code&gt;[PENDING]&lt;/code&gt;, &lt;code&gt;[RUNNING]&lt;/code&gt;, &lt;code&gt;[IN_PROGRESS]&lt;/code&gt;. Add them all, map &lt;code&gt;READY&lt;/code&gt; → &lt;code&gt;completed&lt;/code&gt;.&lt;/p&gt;
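&lt;p&gt;Both fixes collapse into one parsing step: read stdout, map the bracketed tokens, and only treat truly unrecognizable output as an error. The tokens below are what I observed, not documented guarantees:&lt;/p&gt;

```typescript
// Sketch: parse the status command's stdout instead of trusting exit codes.
const STATUS_MAP: { [token: string]: string } = {
  "[READY]": "completed",
  "[PENDING]": "pending",
  "[RUNNING]": "running",
  "[IN_PROGRESS]": "running",
};

function parseStatus(stdout: string): string {
  for (const token of Object.keys(STATUS_MAP)) {
    if (stdout.includes(token)) return STATUS_MAP[token];
  }
  // Only now is it a real error: output we cannot recognize at all.
  throw new Error("unrecognizable status output: " + stdout.slice(0, 80));
}
```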

&lt;p&gt;&lt;strong&gt;There's no CLI resume command.&lt;/strong&gt; The web UI lets you continue a Codex session with follow-up instructions. The CLI doesn't expose this. I simulate it by submitting a new task with the original spec plus feedback appended, with a header so it's recognizable in the UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[REVIEW FEEDBACK — FIX REQUESTED]
Task: Implement key pairs validation
Iteration: 2

&amp;lt;original spec&amp;gt;

Issues to fix:
&amp;lt;arbiter feedback&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The less funny thing I've been sitting with
&lt;/h2&gt;

&lt;p&gt;All of the above are patchable. Annoying to find, quick to fix.&lt;/p&gt;

&lt;p&gt;The bigger issue isn't a bug: &lt;strong&gt;my codebase wasn't ready for agents to work in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I realized this gradually, then all at once. I put together a scoring framework — an "agent-ready codebase" checklist — and ran my codebase through it. The result was humbling.&lt;/p&gt;

&lt;h3&gt;
  
  
  The framework
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Repository structure &amp;amp; modularity.&lt;/strong&gt; Can you clearly identify domain logic, application services, adapters, infrastructure, and tests? Are module boundaries clean, or is there a "shared dump" folder where things go to be forgotten? Hidden coupling is invisible to you and actively confusing to agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Locality of changes.&lt;/strong&gt; For a typical feature, how many files does a change touch? Which modules get pulled in? "God files" and scattered logic mean agents produce large, sprawling diffs — which makes the reviewer's job harder and increases the surface area for things to slip through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Naming &amp;amp; intent clarity.&lt;/strong&gt; Are functions and modules named by use-case, or generically? Can you infer side effects from names? An agent reading &lt;code&gt;processData()&lt;/code&gt; has to guess. An agent reading &lt;code&gt;validateAndPersistUserPayment()&lt;/code&gt; doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Contracts &amp;amp; boundaries.&lt;/strong&gt; Are API boundaries validated — schemas, types, runtime validation? Are there contract tests? Is the public API clearly separated from internals? Without this, agents make changes that technically compile but violate implicit assumptions at integration points.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Test quality &amp;amp; reliability.&lt;/strong&gt; Are tests deterministic? Behavior-focused? Do they cover edge cases? Can you easily add a regression test when something breaks? Flaky tests are worse than no tests in an automated pipeline — they inject false negatives into the review loop and you can't tell whether the failure is real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Verification pipeline.&lt;/strong&gt; Is there a single command that verifies correctness — lint, types, tests? Can you run partial checks scoped to changed files? If the answer is "kind of, it's complicated," agents will struggle to self-verify. And if they can't self-verify, you end up doing it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Review comment verifiability.&lt;/strong&gt; Can typical review comments be validated automatically — via lint, type checker, tests? Or are most comments subjective judgment calls? The higher the ratio of automatable-to-subjective feedback, the more effective an automated reviewer becomes. A codebase full of subjective review surface generates noise that the arbiter has to wade through on every cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Risk segmentation.&lt;/strong&gt; Can you identify high-risk areas — auth, billing, migrations, infrastructure? Is this encoded somewhere: path conventions, annotations, docs? Without it, agents treat all code as equally safe to modify. That's fine until they're modifying the billing module with the same confidence they'd modify a utility function.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9. Documentation for agents.&lt;/strong&gt; Is there an ARCHITECTURE.md? A CONTRIBUTING.md? An AGENTS.md (or equivalent) that explains how to run the service, how to test changes, how to add a feature? Agents can infer a lot from code — but they shouldn't have to infer things that could just be written down. Every missing doc is a guess the agent makes on your behalf.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10. Dev environment &amp;amp; reproducibility.&lt;/strong&gt; Can you bootstrap the service reliably from a clean clone? Are there hidden dependencies — secrets, external services that need to be running, manual steps nobody wrote down? Every hidden dependency is a potential point of silent failure when the agent tries to verify its own work.&lt;/p&gt;

&lt;h3&gt;
  
  
  My score: 52/100
&lt;/h3&gt;

&lt;p&gt;That number explains a lot of friction. When a Codex change touches six files across three modules, the reviewer has more surface area to miss things. When tests are flaky, the verification step is unreliable. When architectural rules live only in my head, no agent can enforce them — which made the stuck loop I described earlier almost predictable in hindsight.&lt;/p&gt;




&lt;h2&gt;
  
  
  A brief word about the "code quality matters less now" take
&lt;/h2&gt;

&lt;p&gt;I keep seeing this framing: in the era of agentic systems, code quality matters less because the AI will figure it out. Sloppy structure, vague names, tangled modules — the model is smart enough to work around it.&lt;/p&gt;

&lt;p&gt;I think this is exactly backwards, and I'm saying this as someone who just spent several evenings watching agents thrash inside a mediocre codebase.&lt;/p&gt;

&lt;p&gt;The agent can't ask you what you meant. It can't read the git history and infer the original design intent. It reads what's there. Ambiguous structure → ambiguous behavior. Hidden coupling → unexpected side effects. Vague names → hallucinated assumptions. No AGENTS.md → the agent guesses how your service is supposed to work and proceeds with confidence.&lt;/p&gt;

&lt;p&gt;Code quality doesn't matter less when agents are writing and reviewing your code. It matters &lt;em&gt;more&lt;/em&gt;, because the human who could previously fill in the gaps isn't filling them in anymore. The code is the only source of truth the agent has. It better be a good one.&lt;/p&gt;

&lt;p&gt;A score of 52/100 means I'm running agents on a codebase that's half-ready for them. Getting that number up is now higher on my list than any pipeline feature.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the pipeline looks like now
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec.yaml
  → codex cloud exec --branch &amp;lt;feature-branch&amp;gt;
  → [poll until READY]
  → codex cloud apply → git commit → git push
  → copilot review (with full iteration history)
  → stuck detector (iteration &amp;gt; 1)
  → arbiter (severity-aware)
  → if fix: loop with feedback header
  → if submit: open PR
  → if escalate: surface to human with reason
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix iterations stack as commits on the branch. Each commit message is generated by Copilot — a prompt built around conventional commit rules (&lt;code&gt;type(scope): description&lt;/code&gt;), first line of output taken as the message, with a fallback to &lt;code&gt;chore: apply codex changes&lt;/code&gt; if Copilot times out or returns nothing. Squash-merge when done. The history is readable: you get actual meaningful commit messages at each iteration, not just &lt;code&gt;iteration 2&lt;/code&gt;.&lt;/p&gt;
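&lt;p&gt;The commit-message step reduces to "first plausible line or fallback". A sketch of that logic; the conventional-commit check is deliberately loose and the helper name is mine:&lt;/p&gt;

```typescript
// Sketch: take the first output line if it looks like a conventional
// commit ("type(scope): description"), otherwise fall back safely.
const FALLBACK = "chore: apply codex changes";

function commitMessage(copilotOutput: string | null): string {
  if (copilotOutput === null) return FALLBACK; // timeout or empty response
  const firstLine = copilotOutput.split("\n")[0].trim();
  if (firstLine === "") return FALLBACK;
  // Loose check: "type: description" or "type(scope): description"
  const ok = /^[a-z]+(\([a-z0-9-]+\))?: .+/.test(firstLine);
  return ok ? firstLine : FALLBACK;
}
```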




&lt;h2&gt;
  
  
  Where I'm going with this
&lt;/h2&gt;

&lt;p&gt;My actual goal is a system where agents write and review code autonomously, and I step in rarely — for escalations, ambiguous specs, and the cases that genuinely need human judgment.&lt;/p&gt;

&lt;p&gt;Right now I'm in the loop more than I want to be. Some of that is tooling immaturity. Some of it is the 52/100. Some of it is that spec-writing is still entirely manual — and as I learned, a bad spec defeats even a well-tuned pipeline.&lt;/p&gt;

&lt;p&gt;Here's what's on the roadmap, roughly grouped by problem area:&lt;/p&gt;

&lt;h3&gt;
  
  
  Review and verification
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Verification ladder.&lt;/strong&gt; Right now the arbiter makes a judgment call about whether something is "done." I want to replace that with structured &lt;code&gt;must_haves&lt;/code&gt; in the task YAML — a list of requirements that get verified against the diff at four tiers: static (file/export presence), command (tests pass), behavioral (observable output), or human (escalate). Submit is only allowed when every must-have passes. No more "looks good to me" from the arbiter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Better stuck detection.&lt;/strong&gt; The current loop detector catches cycles at the review level. I want to add diff-level detection: if Codex produces the same diff twice, fire a diagnostic retry with a targeted prompt. On a second identical diff, escalate with a structured breakdown showing exactly which review comments went unaddressed. Less "something seems wrong," more "here is precisely what didn't change and why."&lt;/p&gt;
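&lt;p&gt;The escalation ladder is easy to sketch. A real implementation would hash the diff properly; whitespace normalization stands in for that here, and the action names are illustrative:&lt;/p&gt;

```typescript
// Sketch of diff-level stuck detection: fingerprint each diff and count
// repeats. First repeat triggers a diagnostic retry, second escalates.
function diffFingerprint(diff: string): string {
  // Cheap stand-in for a real content hash.
  return diff.replace(/\s+/g, " ").trim();
}

function nextAction(history: string[], newDiff: string): "continue" | "diagnose" | "escalate" {
  const fp = diffFingerprint(newDiff);
  let repeats = 0;
  for (const prev of history) {
    if (diffFingerprint(prev) === fp) repeats++;
  }
  if (repeats === 0) return "continue"; // genuinely new change
  if (repeats === 1) return "diagnose"; // seen once before: targeted retry
  return "escalate";                    // stuck beyond doubt: go to a human
}
```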

&lt;h3&gt;
  
  
  Context and memory
&lt;/h3&gt;

&lt;p&gt;This is the area I'm most excited about. Right now each Codex submission is stateless — it knows the spec and the feedback, nothing else. Over a multi-step task, that's a problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fresh context injection.&lt;/strong&gt; Before each Codex submission, prepend summaries of completed steps and a decisions register to the prompt. Prevents Codex from re-implementing utilities already built by earlier steps. Capped at 2000 tokens so it doesn't eat the context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decisions register.&lt;/strong&gt; A &lt;code&gt;.vexdo/decisions.md&lt;/code&gt; file — an append-only table of architectural decisions made during execution: which validation library was chosen, what the storage strategy is, naming conventions adopted. The arbiter populates it automatically. Every subsequent step prompt gets it injected. The goal: agents that build on prior decisions instead of relitigating them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scout agent.&lt;/strong&gt; A focused Claude call before each Codex submission that scans the target service's codebase and returns relevant existing files, reuse hints, and conventions to follow. Non-fatal: if Scout fails, execution continues without it. But when it works, Codex stops reinventing things that already exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Adaptive replanning.&lt;/strong&gt; After each step completes, a lightweight Claude call checks whether remaining step specs are still accurate given what was actually built. Proposes updates for me to confirm before the next step runs. Multi-step plans rarely survive contact with reality unchanged — this is the mechanism for adjusting without rewriting everything manually.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resilience
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Continue-here protocol.&lt;/strong&gt; Right now if the process crashes mid-task, you start over. I'm adding a &lt;code&gt;.vexdo/continue.md&lt;/code&gt; checkpoint written at every major phase transition — codex submitted, codex done, review iteration, arbiter done. &lt;code&gt;vexdo start --resume&lt;/code&gt; reads the checkpoint and picks up from exactly where it left off. This matters more than it sounds once tasks are running for 30+ minutes across multiple iterations.&lt;/p&gt;
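&lt;p&gt;The resume logic itself is tiny once the phases are ordered. A sketch under assumed phase names (the real checkpoint file will carry more state than this):&lt;/p&gt;

```typescript
// Sketch of the continue-here protocol: a phase checkpoint written at each
// transition; --resume reads it and skips everything already done.
const PHASES = ["codex-submitted", "codex-done", "review-iteration", "arbiter-done"];

interface CheckpointFile {
  taskId: string;
  phase: string;
}

function phasesToRun(cp: CheckpointFile | null): string[] {
  if (cp === null) return PHASES;      // fresh start
  const idx = PHASES.indexOf(cp.phase);
  if (idx === -1) return PHASES;       // unknown phase: safest to start over
  return PHASES.slice(idx + 1);        // resume right after the recorded phase
}
```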

&lt;h3&gt;
  
  
  Observability and interaction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost and token tracking.&lt;/strong&gt; Every Claude API call will capture token usage and estimated cost. Per-step and total costs shown in &lt;code&gt;vexdo status&lt;/code&gt;. Optional budget ceiling in &lt;code&gt;.vexdo.yml&lt;/code&gt; that pauses execution before overspending. Right now I have no idea what a task costs until I check my API bill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UAT script generation.&lt;/strong&gt; After all steps complete, Vexdo writes &lt;code&gt;.vexdo/uat.md&lt;/code&gt; — a human test script derived from step must-haves and arbiter summaries. &lt;code&gt;vexdo submit&lt;/code&gt; warns if UAT items are unchecked (override with &lt;code&gt;--skip-uat&lt;/code&gt;). The dream of fully autonomous code is great; the reality is that some things still need a human to click through the UI once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discuss command.&lt;/strong&gt; &lt;code&gt;vexdo discuss &amp;lt;task-id&amp;gt;&lt;/code&gt; opens an interactive Claude session with full task context pre-loaded — what was built, what decisions were made, what's still pending. Ask questions, queue spec updates for pending steps, steer execution from a second terminal while &lt;code&gt;start&lt;/code&gt; is running. The CLI as a conversation partner, not just an executor.&lt;/p&gt;




&lt;p&gt;Getting the codebase score above 80 will get me closer to the goal. So will all of the above. The common thread: the more context agents have, the less they guess. The less they guess, the fewer loops. The fewer loops, the closer I get to actually closing my laptop and coming back to a PR that's ready.&lt;/p&gt;

&lt;p&gt;One problem at a time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;vexdo is open source — &lt;a href="https://github.com/vexdo/vexdo-cli" rel="noopener noreferrer"&gt;github.com/vexdo/vexdo-cli&lt;/a&gt;. If you're building something similar or have hit these problems differently, I'd like to hear about it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>codereview</category>
    </item>
    <item>
      <title>When Your Code Compiles But You Don't</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Sat, 14 Mar 2026 11:04:07 +0000</pubDate>
      <link>https://dev.to/ubcent/when-your-code-compiles-but-you-dont-2hmn</link>
      <guid>https://dev.to/ubcent/when-your-code-compiles-but-you-dont-2hmn</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/wecoded-2026"&gt;2026 WeCoded Challenge&lt;/a&gt;: Echoes of Experience&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  On burnout, imposter syndrome, and finding your voice as a developer abroad — and the unexpected failure that woke me back up.
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ERROR] Input suppressed. Reason: unknown.
[WARN]  Motivation: degraded.
[ERROR] Identity: not found.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I stared at that metaphor for a long time before I realised it was my year.&lt;/p&gt;

&lt;p&gt;There's a specific kind of silence that only developers know. Not the comfortable silence of deep focus, or the peaceful silence of a solved problem. I'm talking about the silence of knowing the answer — and choosing not to say it.&lt;/p&gt;

&lt;p&gt;I've sat in dozens of meetings, heart quietly racing, watching a discussion go in circles — sometimes toward a solution I already knew was wrong — while a voice in my head repeated the same line: &lt;em&gt;your input isn't needed here.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It wasn't shyness. It wasn't a language barrier, though I'd relocated abroad and English was my working language. It was something quieter and more corrosive: a deep-seated belief that I hadn't earned the right to take up space.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three kinds of silence
&lt;/h2&gt;

&lt;p&gt;Seniority is contextual. I knew that intellectually. I didn't know it would feel like starting over.&lt;/p&gt;

&lt;p&gt;I call it the three silences, because that's how it showed up for me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The language silence.&lt;/strong&gt; I'd draft a message in Slack, then delete it. Not because it was wrong — because I'd reread it three times wondering if the phrasing sounded &lt;em&gt;too foreign&lt;/em&gt;, too formal, too something. I started filtering myself before I even had the chance to be misunderstood.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The professional silence.&lt;/strong&gt; I had ideas. Lots of them — architectural improvements, product suggestions, side projects that could've become real tools. But every time I thought about sharing them, imposter syndrome arrived right on cue — and at senior level, it doesn't sound naive. It sounds reasonable: &lt;em&gt;This has already been done. There's a library for it, a blog post about it, a better implementation of it somewhere on GitHub. What are you going to add?&lt;/em&gt; The voice is more polished. It's harder to argue with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The inner silence.&lt;/strong&gt; This one came last, and it was the loudest. There's a particular brand of exhaustion that burnout brings — not tiredness, but numbness. I'd open my laptop, stare at the code, and feel nothing. Not frustration, not curiosity. Just absence. The IDE was running. I was not.&lt;/p&gt;




&lt;h2&gt;
  
  
  The lie I told myself about time
&lt;/h2&gt;

&lt;p&gt;For a long time, I explained away my stagnation with a single story: &lt;em&gt;I don't have time.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I was busy. The sprints were packed, life abroad had logistics I hadn't anticipated, and building a personal brand felt like a luxury for people who had already "made it." I had ideas for articles, side projects, open-source contributions — all neatly filed in Notion drafts and half-opened browser tabs, where they couldn't be judged.&lt;/p&gt;

&lt;p&gt;Here's the thing about that story: it was comfortable. And it was a lie.&lt;/p&gt;

&lt;p&gt;The truth was that I had pockets of time. I just didn't trust myself enough to use them. Writing a post felt pointless if nobody cared. Building something felt arrogant if I might fail.&lt;/p&gt;

&lt;p&gt;Motivation wasn't missing. It was buried under a layer of fear I'd labelled as a scheduling problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What burnout actually looks like (from the inside)
&lt;/h2&gt;

&lt;p&gt;People talk about burnout like it's a dramatic collapse. And sometimes it is. But the version I lived was quieter — a slow dimming rather than a sudden blackout.&lt;/p&gt;

&lt;p&gt;I still shipped features. I still showed up to standups. From the outside, I was probably fine. From the inside, I was running on a kind of professional autopilot — doing the work, but completely disconnected from the reason I got into this field in the first place.&lt;/p&gt;

&lt;p&gt;The hardest part wasn't the exhaustion. It was losing the thread back to myself — to the person who once pulled all-nighters not because of deadlines, but because the problem was just too interesting to put down.&lt;/p&gt;




&lt;h2&gt;
  
  
  The failure that woke me up
&lt;/h2&gt;

&lt;p&gt;Recovery didn't come from a productivity hack or a helpful conversation. It came from a system design interview that went badly wrong.&lt;/p&gt;

&lt;p&gt;I'd prepared. I knew the patterns. But the moment the interviewer started asking questions, something broke down. I couldn't structure my thinking. I went in circles. I could see the confusion on the screen and I couldn't stop it.&lt;/p&gt;

&lt;p&gt;I failed. Clearly, decisively, unambiguously.&lt;/p&gt;

&lt;p&gt;And something strange happened in the days that followed: I felt &lt;em&gt;alive&lt;/em&gt; again.&lt;/p&gt;

&lt;p&gt;Not happy — it stung badly. But underneath the sting was something I hadn't felt in months: genuine engagement. I was frustrated, yes. But frustration means you &lt;em&gt;care&lt;/em&gt;. It means the signal is still there. After months of numbness, even a hard feeling was proof that I was still in there somewhere.&lt;/p&gt;

&lt;p&gt;That failure did what nothing else had: it threw a stack trace. Suddenly I could see exactly where I'd been running in silent degraded mode. The interview didn't break me — it interrupted the autopilot. And that interruption was what I needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[INFO]  Exception caught: SystemDesignInterviewFailure
[INFO]  Stack trace:
          - few years autopilot
          - suppressed ideas
          - unshared voice
[INFO]  Attempting recovery...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Finding the voice again
&lt;/h2&gt;

&lt;p&gt;After that, things moved differently.&lt;/p&gt;

&lt;p&gt;I opened one of those Notion drafts and just finished it. I left a comment on a GitHub issue I'd been silently following for weeks. Then I did the thing I'd been postponing the longest: I started an open-source project. Not because I was sure it was needed — but because building it in the open meant I couldn't quietly abandon it. I wrote a couple of posts along the way, sharing things I'd stumbled across while building. Small discoveries, not grand conclusions. Public ones. With a name attached.&lt;/p&gt;

&lt;p&gt;And something unexpected happened: people responded. Not to a polished, authoritative voice I thought I needed to have — but to the honest, unguarded version I'd been afraid to show.&lt;/p&gt;

&lt;p&gt;It turns out authenticity travels across languages. Struggle is not a regional dialect.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd tell my quieter self
&lt;/h2&gt;

&lt;p&gt;If I could go back to that person staring at a blinking cursor, unsure whether their ideas mattered, I'd say a few things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your accent is not a measure of your competence.&lt;/strong&gt; Years of experience don't make this easier — they just make you better at hiding the discomfort. The nervousness you feel speaking in a second language is not evidence that you don't belong. It's evidence that you're doing something genuinely hard, and doing it anyway.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Imposter syndrome isn't a diagnosis — it's a symptom.&lt;/strong&gt; It usually means you care deeply and you're in new territory. Both of those are good things, even when they don't feel like it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The ideas you're sitting on are not too small.&lt;/strong&gt; The blog post you're not writing, the project you're not building, the comment you keep deleting — none of those need to be perfect to be worth sharing. Done and imperfect is infinitely more valuable than flawless and invisible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Burnout is not a character flaw.&lt;/strong&gt; It's what happens when a system runs too long without maintenance. You are allowed to rest. You are allowed to be unproductive. You are allowed to be a person first and an engineer second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;And sometimes the failure you're dreading is the only thing loud enough to reach you.&lt;/strong&gt; Not because pain is a good teacher, but because numbness has no edges. Failure does.&lt;/p&gt;




&lt;p&gt;I'm writing this now — this very post — because I'm on the other side of that quiet period. Not because everything is solved, but because I finally stopped waiting to feel ready.&lt;/p&gt;

&lt;p&gt;If you're in the thick of it right now — navigating a new country, swallowing your ideas, running on empty — I want you to know: this is not permanent. The silence ends. And when you start speaking again, you'll find that people were waiting to hear you all along.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your voice belongs here. It always did.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have you experienced burnout or imposter syndrome in your tech career? I'd love to hear what shifted things for you — drop it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>wecoded</category>
      <category>dei</category>
      <category>career</category>
    </item>
    <item>
      <title>I Built a Local AI Dev Pipeline That Reviews Its Own Code Before Opening a PR</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Wed, 11 Mar 2026 13:17:13 +0000</pubDate>
      <link>https://dev.to/ubcent/i-built-a-local-ai-dev-pipeline-that-reviews-its-own-code-before-opening-a-pr-geg</link>
      <guid>https://dev.to/ubcent/i-built-a-local-ai-dev-pipeline-that-reviews-its-own-code-before-opening-a-pr-geg</guid>
      <description>&lt;p&gt;&lt;em&gt;How I got tired of being the glue between AI tools and automated the whole thing with vexdo&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I've been using AI coding assistants for about a year now. Claude Code for planning and spec writing, Codex for the actual implementation, Copilot for inline suggestions. The results were genuinely good — but the &lt;em&gt;process&lt;/em&gt; was exhausting.&lt;/p&gt;

&lt;p&gt;Every task looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write a spec with Claude Code&lt;/li&gt;
&lt;li&gt;Copy-paste it into Codex&lt;/li&gt;
&lt;li&gt;Wait for Codex to finish&lt;/li&gt;
&lt;li&gt;Open a PR&lt;/li&gt;
&lt;li&gt;Manually request a Copilot review&lt;/li&gt;
&lt;li&gt;Read the review comments&lt;/li&gt;
&lt;li&gt;Decide which ones matter&lt;/li&gt;
&lt;li&gt;Copy the important ones back into Codex&lt;/li&gt;
&lt;li&gt;Wait again&lt;/li&gt;
&lt;li&gt;Repeat until it looks good&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I was the glue. Every step required me to context-switch, copy text between tools, and make judgment calls. The AI was doing the creative work, but I was doing all the plumbing.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;vexdo&lt;/strong&gt; — a local CLI pipeline that automates the entire cycle: spec → implementation → review → fixes → PR. Human intervention only when something goes wrong.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclaimer for my boss: this is exclusively a personal project. I have never used any of this for work tasks. Not once. 😄&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fair warning before we go further: this is an experiment, not a production-ready tool.&lt;/strong&gt; It works, I use it, but it's very much a proof of concept. The goal was to validate the architecture and see where it breaks — not to ship a polished product. If you're looking for something battle-tested, this isn't it yet. If you're curious about the pattern and want to hack on it, read on.&lt;/p&gt;

&lt;p&gt;Here's what I learned building it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The core idea: review &lt;em&gt;before&lt;/em&gt; the PR
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools open a PR and then review it. This makes sense in a human workflow — you open a PR, someone reviews it, you fix things.&lt;/p&gt;

&lt;p&gt;But in an automated pipeline, this creates a mess. You end up with PRs full of back-and-forth commits like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;feat: add /events endpoint
fix: add input validation (per review)
fix: fix validation again (per review)
fix: actually fix it this time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;vexdo flips this. Review happens &lt;strong&gt;on the local diff&lt;/strong&gt;, before a PR is ever opened. The pipeline iterates until the code is clean, then opens exactly one PR — already reviewed, already fixed.&lt;/p&gt;

&lt;p&gt;The git history stays clean. The PR is meaningful. You only get notified when there's something that actually needs a human decision.&lt;/p&gt;
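
&lt;p&gt;The whole cycle can be sketched in a few lines. This is a hypothetical outline, not vexdo's actual source — &lt;code&gt;run_codex&lt;/code&gt;, &lt;code&gt;review_diff&lt;/code&gt;, and &lt;code&gt;arbitrate&lt;/code&gt; stand in for the real CLI and API calls:&lt;/p&gt;

```python
# Hypothetical sketch of the review-before-PR loop (not vexdo's real source).
# run_codex, review_diff, and arbitrate stand in for actual tool/API calls.

def run_pipeline(spec, run_codex, review_diff, arbitrate, max_iterations=3):
    """Iterate implement -> review -> fix locally; open a PR only when clean."""
    run_codex(spec)  # initial implementation, left as a local diff
    for iteration in range(1, max_iterations + 1):
        findings = review_diff(spec)          # reviewer sees spec + local diff
        decision = arbitrate(spec, findings)  # arbiter sees spec + diff + findings
        if decision["decision"] == "submit":
            return {"status": "submit", "iterations": iteration}
        if decision["decision"] == "escalate":
            return {"status": "escalate", "iterations": iteration}
        run_codex(decision["feedback_for_codex"])  # apply fixes, loop again
    return {"status": "escalate", "reason": "max iterations reached"}
```

&lt;p&gt;The point is that nothing touches the remote until the loop exits with &lt;code&gt;submit&lt;/code&gt; — every intermediate fix stays in the working tree.&lt;/p&gt;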




&lt;h2&gt;
  
  
  Spec-driven development as the foundation
&lt;/h2&gt;

&lt;p&gt;The whole thing only works if the agent knows what "done" looks like. That's where spec-driven development comes in.&lt;/p&gt;

&lt;p&gt;Every task in vexdo is a YAML file with a structured spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;task-001&lt;/span&gt;
&lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/events&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;endpoint"&lt;/span&gt;
&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
      &lt;span class="s"&gt;Implement a REST endpoint POST /events.&lt;/span&gt;

      &lt;span class="s"&gt;Acceptance criteria:&lt;/span&gt;
        &lt;span class="s"&gt;- Validates incoming payload against EventSchema&lt;/span&gt;
        &lt;span class="s"&gt;- Returns 201 with created event on success&lt;/span&gt;
        &lt;span class="s"&gt;- Returns 400 with validation errors on failure&lt;/span&gt;
        &lt;span class="s"&gt;- Unit tests cover happy path and validation errors&lt;/span&gt;

      &lt;span class="s"&gt;Architectural constraints:&lt;/span&gt;
        &lt;span class="s"&gt;- Use existing auth middleware, do not reimplement&lt;/span&gt;
        &lt;span class="s"&gt;- Do not modify existing endpoint interfaces&lt;/span&gt;

      &lt;span class="s"&gt;Critical if:&lt;/span&gt;
        &lt;span class="s"&gt;- No input validation&lt;/span&gt;
        &lt;span class="s"&gt;- Breaking change to existing API&lt;/span&gt;
        &lt;span class="s"&gt;- No tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;acceptance criteria&lt;/code&gt; and &lt;code&gt;critical if&lt;/code&gt; fields aren't just documentation — they're the ground truth that the reviewer and arbiter use to evaluate the code. No spec, no review. No review, no PR.&lt;/p&gt;

&lt;p&gt;I write these specs collaboratively with Claude Code before handing anything to Codex. This 10-minute investment saves hours of back-and-forth later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Codex for implementation (and not Claude Code)
&lt;/h2&gt;

&lt;p&gt;The obvious question: why not use Claude Code for the coding step too? It's clearly the better model for complex coding tasks.&lt;/p&gt;

&lt;p&gt;Cost.&lt;/p&gt;

&lt;p&gt;Claude Code is great but expensive for automated, unattended runs. When you're running a pipeline that might do 3 iterations of "write code → review → fix" per task, the token cost adds up fast — especially if you're running multiple tasks per day.&lt;/p&gt;

&lt;p&gt;Codex hits a much more comfortable price point for the implementation step. It's not as capable as Claude Code on hard problems, but for well-scoped tasks with a clear spec, it does the job at a fraction of the cost.&lt;/p&gt;

&lt;p&gt;The split I landed on: &lt;strong&gt;Codex does the implementation&lt;/strong&gt; (cheap, runs autonomously, good enough for scoped tasks), &lt;strong&gt;Claude Haiku does the review and arbitration&lt;/strong&gt; (also cheap, but here accuracy matters more than raw coding ability). Claude Code stays in my workflow for the part it's genuinely irreplaceable at — writing the spec interactively with me before the pipeline starts.&lt;/p&gt;

&lt;p&gt;One implementation detail worth noting: Codex runs with the &lt;code&gt;--full-auto&lt;/code&gt; flag and &lt;strong&gt;doesn't commit anything&lt;/strong&gt;. All its changes sit as unstaged modifications. The review loop captures them via &lt;code&gt;git diff HEAD&lt;/code&gt; — staged and unstaged together. This means the entire set of changes Codex made is visible to the reviewer in one clean diff, not scattered across intermediate commits.&lt;/p&gt;
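
&lt;p&gt;Capturing that combined diff is a one-liner around git. A minimal sketch (illustrative; vexdo's real implementation may differ):&lt;/p&gt;

```python
# Illustrative sketch: capture staged + unstaged changes in one diff,
# via the same `git diff HEAD` call described above.
import subprocess

def collect_diff(repo_path):
    """Return everything changed since HEAD: staged and unstaged together."""
    result = subprocess.run(
        ["git", "diff", "HEAD"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

&lt;p&gt;Because the comparison point is &lt;code&gt;HEAD&lt;/code&gt; rather than the index, it doesn't matter whether Codex (or a pre-commit hook) happened to stage something along the way — the reviewer always sees the full delta.&lt;/p&gt;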

&lt;p&gt;If cost isn't a constraint for you, swapping Codex for Claude Code in the pipeline would probably improve results. The architecture supports it — it's just a config change.&lt;/p&gt;




&lt;h2&gt;
  
  
  The review loop: two Claude calls, not one
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. I don't use a single AI call to review the code. I use two, with deliberately isolated contexts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Call 1 — The Reviewer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The reviewer sees: the spec + the git diff. Nothing else.&lt;/p&gt;

&lt;p&gt;It returns a structured list of findings, using four severity levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"critical"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/routes/events.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"line"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"No validation on req.body before passing to createEvent()"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggestion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Add schema validation using existing validateBody middleware"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"important"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/routes/events.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"line"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Error from createEvent() not caught — unhandled rejection will crash the process"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggestion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Wrap in try/catch and return 500 with a generic message"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"minor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"file"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/routes/events.ts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"line"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"comment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Inconsistent error message format compared to other endpoints"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"suggestion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Use errorResponse() helper for consistency"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The four severity levels are strictly defined:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;critical&lt;/strong&gt; — breaks an acceptance criterion or architectural constraint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;important&lt;/strong&gt; — likely to cause bugs directly related to what the spec requires&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;minor&lt;/strong&gt; — code quality issue, but doesn't block the spec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;noise&lt;/strong&gt; — style or preference, spec-neutral&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reviewer's job is purely technical: does this code satisfy the spec?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Call 2 — The Arbiter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The arbiter sees: the spec + the diff + the reviewer's findings. It does &lt;em&gt;not&lt;/em&gt; see the history of how the spec was written. This isolation is intentional — it prevents the arbiter from being too lenient because it "knows" the original intent.&lt;/p&gt;

&lt;p&gt;The arbiter returns a decision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fix"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Critical validation issue and unhandled error path must be resolved before merge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"feedback_for_codex"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Add input validation to POST /events handler. Use the existing validateBody(EventSchema) middleware pattern from POST /users. The validation should happen before any database calls. Also wrap the createEvent() call in try/catch and return 500 for unexpected errors."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2 issues require fixing: missing validation, unhandled error path"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three possible decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;fix&lt;/strong&gt; — send &lt;code&gt;feedback_for_codex&lt;/code&gt; to Codex and iterate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;submit&lt;/strong&gt; — no critical or important spec violations, open the PR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;escalate&lt;/strong&gt; — something needs a human&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;submit&lt;/code&gt; threshold matters: the arbiter submits only when there are no &lt;strong&gt;critical or important&lt;/strong&gt; findings that reflect real spec violations. Minor and noise issues don't block submission. The spec is the bar, not stylistic perfection.&lt;/p&gt;
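
&lt;p&gt;That threshold is simple enough to express as a predicate. A hypothetical version, with field names mirroring the reviewer JSON above:&lt;/p&gt;

```python
# Hypothetical sketch of the submit threshold: only critical and important
# findings block the PR; minor and noise never do.
BLOCKING = {"critical", "important"}

def can_submit(findings):
    """True when no finding is severe enough to block the PR."""
    return not any(f["severity"] in BLOCKING for f in findings)
```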

&lt;p&gt;&lt;strong&gt;When escalation triggers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The arbiter escalates in three distinct cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explicit conflict&lt;/strong&gt; — a reviewer comment contradicts the spec, or there's genuine architectural ambiguity the arbiter shouldn't resolve autonomously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Max iterations reached&lt;/strong&gt; — the arbiter kept requesting fixes but ran out of iterations (configurable, default 3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing fix instructions&lt;/strong&gt; — the arbiter decided "fix" but didn't produce &lt;code&gt;feedback_for_codex&lt;/code&gt; (a guardrail against bad model outputs)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In all three cases, you get the full context: the spec, the diff, every review comment with severity and location, the arbiter's reasoning, and a summary. Enough to make a decision without re-reading the code from scratch.&lt;/p&gt;
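
&lt;p&gt;The three cases fold into one guard function. A sketch under the same assumptions as the decision JSON above:&lt;/p&gt;

```python
# Hypothetical sketch of the escalation guardrails described above.
def should_escalate(decision, iteration, max_iterations=3):
    """Return a reason string when a human is needed, else None."""
    if decision["decision"] == "escalate":
        return "explicit conflict flagged by the arbiter"
    if decision["decision"] == "fix" and not decision.get("feedback_for_codex"):
        return "arbiter chose 'fix' but produced no fix instructions"
    if decision["decision"] == "fix" and iteration >= max_iterations:
        return "max iterations reached without a clean review"
    return None
```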




&lt;h2&gt;
  
  
  Why Claude as arbiter works surprisingly well
&lt;/h2&gt;

&lt;p&gt;When I first designed this, I was skeptical that an LLM could reliably classify review comments. In practice, it works much better than expected — for a specific reason.&lt;/p&gt;

&lt;p&gt;The arbiter isn't making subjective judgments. It's doing a structured comparison: &lt;em&gt;does this review comment point to a violation of the acceptance criteria or architectural constraints in the spec?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If yes → critical or important, needs fixing.&lt;br&gt;
If no → minor or noise, can be ignored.&lt;br&gt;
If the review comment contradicts the spec → escalate.&lt;/p&gt;

&lt;p&gt;The spec acts as an objective grounding document. The arbiter doesn't need to have opinions about code quality in the abstract — it just needs to read and compare two documents. LLMs are very good at this.&lt;/p&gt;

&lt;p&gt;The key prompt constraint I found most important: &lt;strong&gt;the arbiter must not try to resolve conflicts between the reviewer and the spec&lt;/strong&gt;. When there's a conflict, it escalates. This keeps humans in the loop for the decisions that actually matter.&lt;/p&gt;
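
&lt;p&gt;In prompt terms, that comparison really is just two documents plus a narrow question. A hypothetical sketch of how such a prompt could be assembled (not vexdo's actual prompt):&lt;/p&gt;

```python
# Hypothetical sketch of an arbiter prompt: the spec is the ground truth,
# and the model is asked only to compare findings against it.
def build_arbiter_prompt(spec, diff, findings):
    lines = [
        "You are an arbiter. The spec below is the only ground truth.",
        "For each finding, decide: does it point to a violation of the",
        "acceptance criteria or architectural constraints? Do not resolve",
        "conflicts between a finding and the spec -- escalate instead.",
        "", "## Spec", spec, "", "## Diff", diff, "", "## Findings",
    ]
    lines += [
        f"- [{f['severity']}] {f['file']}:{f['line']} {f['comment']}"
        for f in findings
    ]
    return "\n".join(lines)
```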


&lt;h2&gt;
  
  
  Multi-repo support without a monorepo
&lt;/h2&gt;

&lt;p&gt;Most of my projects have multiple services in separate repositories. I didn't want to force a monorepo structure just to use an automation tool.&lt;/p&gt;

&lt;p&gt;vexdo uses a simple project layout on your local machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;projectRoot/
  .vexdo.yml          ← project config
  tasks/
    backlog/
    in_progress/
    review/
    done/
    blocked/
  service1/           ← git repo
  service2/           ← git repo
  service3/           ← git repo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;.vexdo.yml&lt;/code&gt; config maps service names to paths:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./backend&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./frontend&lt;/span&gt;
&lt;span class="na"&gt;review&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-haiku-4-5-20251001&lt;/span&gt;
  &lt;span class="na"&gt;max_iterations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;auto_submit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="na"&gt;codex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gpt-4o&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service is its own git repo. vexdo treats &lt;code&gt;projectRoot&lt;/code&gt; as a workspace, not a repo.&lt;/p&gt;

&lt;p&gt;Multi-step tasks with dependencies work like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;contracts&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;EventType&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;shared&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;schema"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backend&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;contracts&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;handler&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;for&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;EventType"&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;frontend&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Display&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;EventType&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;event&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;list"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Steps without &lt;code&gt;depends_on&lt;/code&gt; can run in parallel (not implemented yet, but the architecture supports it). Steps with &lt;code&gt;depends_on&lt;/code&gt; run sequentially — the backend step doesn't start until contracts is reviewed, fixed, and submitted.&lt;/p&gt;

&lt;p&gt;Each step gets its own branch (&lt;code&gt;vexdo/task-001/backend&lt;/code&gt;), its own review loop, and its own PR. If the frontend step fails review 3 times and escalates, you get the full context of what happened in all previous steps, so you can make an informed decision.&lt;/p&gt;
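
&lt;p&gt;The scheduling rule above amounts to grouping steps into waves: anything whose &lt;code&gt;depends_on&lt;/code&gt; entries are all satisfied can go in the same wave. A hypothetical sketch:&lt;/p&gt;

```python
# Hypothetical sketch of depends_on scheduling: steps whose dependencies
# are all satisfied form one "wave"; waves run in order, and steps within
# a wave could run in parallel.
def plan_waves(steps):
    done, waves, remaining = set(), [], list(steps)
    while remaining:
        wave = [s for s in remaining if set(s.get("depends_on", [])) <= done]
        if not wave:
            raise ValueError("circular or unsatisfiable depends_on")
        waves.append([s["service"] for s in wave])
        done |= {s["service"] for s in wave}
        remaining = [s for s in remaining if s["service"] not in done]
    return waves
```

&lt;p&gt;For the contracts → backend → frontend example above, this yields three single-step waves; independent steps would share a wave instead.&lt;/p&gt;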




&lt;h2&gt;
  
  
  What the workflow looks like in practice
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initialize a new project&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ~/projects/my-project
vexdo init

&lt;span class="c"&gt;# Write a task spec (I do this with Claude Code interactively)&lt;/span&gt;
vim tasks/backlog/task-001.yml

&lt;span class="c"&gt;# Hand it off&lt;/span&gt;
vexdo start tasks/backlog/task-001.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then vexdo takes over. The output is a flat stream of progress markers — something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Step 1/2: backend: Add POST /events endpoint
→ Creating branch vexdo/task-001/backend
→ Running codex implementation for service backend
→ Starting review loop for service backend
Iteration 1/3
→ Collecting git diff for service backend
→ Requesting reviewer analysis (model: claude-haiku-4-5-20251001)
Review: 1 critical 1 important 1 minor
- critical (src/routes/events.ts:23): No validation on req.body
- important (src/routes/events.ts:31): Unhandled rejection from createEvent()
- minor (src/routes/events.ts:45): Inconsistent error message format
→ Requesting arbiter decision (model: claude-haiku-4-5-20251001)
→ Arbiter decision: fix (2 issues require fixing)
→ Applying arbiter feedback with codex
Iteration 2/3
→ Collecting git diff for service backend
→ Requesting reviewer analysis (model: claude-haiku-4-5-20251001)
Review: 0 critical 0 important 1 minor
→ Requesting arbiter decision (model: claude-haiku-4-5-20251001)
→ Arbiter decision: submit (no critical issues)

Step 2/2: frontend: Display new EventType in event list
→ Creating branch vexdo/task-001/frontend
→ Running codex implementation for service frontend
→ Starting review loop for service frontend
Iteration 1/3
→ Collecting git diff for service frontend
→ Requesting reviewer analysis (model: claude-haiku-4-5-20251001)
Review: 0 critical 0 important 0 minor
→ Requesting arbiter decision (model: claude-haiku-4-5-20251001)
→ Arbiter decision: submit (no issues found)

✓ Task ready for PR. Run 'vexdo submit' to create PR.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I come back to two clean PRs, each with a review summary attached. I read the summary, look at the diff, hit merge. The whole thing took 8 minutes and I didn't touch it.&lt;/p&gt;

&lt;p&gt;The iteration logs are preserved in &lt;code&gt;.vexdo/logs/{taskId}/&lt;/code&gt; — one diff, one review JSON, and one arbiter JSON per iteration per service. &lt;code&gt;vexdo logs task-001&lt;/code&gt; shows a summary; &lt;code&gt;vexdo logs task-001 --full&lt;/code&gt; dumps everything including diffs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What happens on escalation
&lt;/h2&gt;

&lt;p&gt;When the arbiter escalates, vexdo prints the full context — spec, all review comments with locations, arbiter reasoning — and exits with a non-zero code.&lt;/p&gt;

&lt;p&gt;The task file moves to &lt;code&gt;tasks/blocked/&lt;/code&gt;. Importantly, &lt;strong&gt;the state is preserved&lt;/strong&gt; — &lt;code&gt;.vexdo/state.json&lt;/code&gt; stays on disk with &lt;code&gt;status: escalated&lt;/code&gt;. The branches are preserved too. This means you can inspect exactly what happened, fix the spec or the code manually, and decide how to proceed.&lt;/p&gt;

&lt;p&gt;The recovery path is still manual: run &lt;code&gt;vexdo abort&lt;/code&gt; to clear the state, then restart with an updated spec. Automated recovery from escalation is on the roadmap.&lt;/p&gt;




&lt;h2&gt;
  
  
  What doesn't work (yet)
&lt;/h2&gt;

&lt;p&gt;I want to be honest about the limitations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Codex has a complexity ceiling.&lt;/strong&gt; For well-scoped tasks — add an endpoint, update a client, add a utility function — it's great. For tasks that require deep understanding of implicit system invariants, it struggles. The spec helps a lot, but it's not magic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The arbiter can be too lenient.&lt;/strong&gt; If your spec is vague, the arbiter will be too. "Add proper error handling" is not a spec. "Return 400 with &lt;code&gt;{ error: string }&lt;/code&gt; for validation failures, 500 with a generic message for unexpected errors" is a spec.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No automatic rollback.&lt;/strong&gt; If step 3 of a 4-step task escalates, the previous steps are already complete (branches and, if &lt;code&gt;auto_submit: true&lt;/code&gt;, PRs are already created). You need to handle rollback manually. This is on the roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State recovery is basic.&lt;/strong&gt; If the process crashes mid-task, &lt;code&gt;vexdo start --resume&lt;/code&gt; picks up from the last completed step. But if it crashes mid-Codex-run, you need to clean up the unstaged changes manually before resuming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Only GitHub.&lt;/strong&gt; The PR creation is wired to the &lt;code&gt;gh&lt;/code&gt; CLI. GitLab, Gitea, and others aren't supported.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @vexdo/cli

&lt;span class="c"&gt;# In your project&lt;/span&gt;
vexdo init

&lt;span class="c"&gt;# Set your API key&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-key-here

&lt;span class="c"&gt;# Make sure you have the codex and gh CLIs installed&lt;/span&gt;
&lt;span class="c"&gt;# Then write a task and run it&lt;/span&gt;
vexdo start tasks/backlog/your-task.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo is at &lt;a href="https://github.com/vexdo/vexdo-cli" rel="noopener noreferrer"&gt;https://github.com/vexdo/vexdo-cli&lt;/a&gt;. Contributions welcome — especially around the state recovery story and parallel step execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  The bigger idea
&lt;/h2&gt;

&lt;p&gt;What I built is less about vexdo specifically and more about a pattern: &lt;strong&gt;AI agents work best when they have structured evaluation criteria and a clear escalation path to humans.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The spec is the evaluation criteria. The arbiter is the evaluator. Escalation is the safety valve.&lt;/p&gt;

&lt;p&gt;Without the spec, you get an agent that does something but you're not sure if it's right. Without the arbiter, you get a flood of review comments with no prioritization. Without escalation, you get an agent that either loops forever or merges bad code.&lt;/p&gt;

&lt;p&gt;All three together create something that actually feels autonomous — not because it never needs you, but because it knows &lt;em&gt;when&lt;/em&gt; it needs you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;vexdo is open source under MIT. If you build something with it or find a bug, open an issue — I read them all.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>agents</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>I ran a privacy proxy on my AI traffic. Here's what it found.</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Fri, 06 Mar 2026 10:51:56 +0000</pubDate>
      <link>https://dev.to/ubcent/i-ran-a-privacy-proxy-on-my-ai-traffic-heres-what-it-found-4dbo</link>
      <guid>https://dev.to/ubcent/i-ran-a-privacy-proxy-on-my-ai-traffic-heres-what-it-found-4dbo</guid>
      <description>&lt;p&gt;When I built &lt;a href="https://github.com/ubcent/velar" rel="noopener noreferrer"&gt;Velar&lt;/a&gt; — a local proxy that masks sensitive data before it reaches AI providers — I mostly thought of it as a tool for &lt;em&gt;other people's&lt;/em&gt; problems.&lt;/p&gt;

&lt;p&gt;I was wrong.&lt;/p&gt;

&lt;p&gt;After running it on my own machine during normal browser-based interactions with ChatGPT, here's what it intercepted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Masked Items
----------------------------------------
API_KEY:        30 ███████████████░░░░░
ORG:             9 ████░░░░░░░░░░░░░░░░
JWT:             1 ░░░░░░░░░░░░░░░░░░░░
Total:          40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;40 items. Without doing anything unusual. But the story behind that API_KEY number is what really got me.&lt;/p&gt;




&lt;h2&gt;
  
  
  30 API keys — before I even hit Send
&lt;/h2&gt;

&lt;p&gt;All 30 API_KEY detections came from a single session where I was editing a script directly inside the ChatGPT input field.&lt;/p&gt;

&lt;p&gt;Here's the thing most people don't realize: &lt;strong&gt;ChatGPT sends the contents of the input field to its servers in the background as you type.&lt;/strong&gt; Not when you hit Send — continuously, while you're still editing.&lt;/p&gt;

&lt;p&gt;So I pasted a script that contained an API key, spent a few minutes tweaking it before sending, and ChatGPT quietly transmitted that script — and the key inside it — 30 times to OpenAI's servers before I was done.&lt;/p&gt;

&lt;p&gt;I wasn't trying to send the key. I was just editing. That's the part that's hard to reason about intuitively.&lt;/p&gt;

&lt;p&gt;This is a real gotcha with browser-based AI chat: the moment sensitive data touches the input field, it's potentially already in transit — regardless of whether you decide to actually send the message.&lt;/p&gt;




&lt;h2&gt;
  
  
  What about the other numbers?
&lt;/h2&gt;

&lt;p&gt;The 9 ORG detections are a good example of current limitations. These were false positives from the ONNX NER model — it flagged the Russian word "Расскажи" ("tell me") as an organization name. The model is trained on English only, so it occasionally misreads non-English text as named entities. Something I'm actively working on.&lt;/p&gt;

&lt;p&gt;The 1 JWT is probably real — likely from a session token that ended up in a request payload somewhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  A note on scope
&lt;/h2&gt;

&lt;p&gt;This data covers &lt;strong&gt;browser-based interactions only&lt;/strong&gt; — ChatGPT in the browser, routed through Velar's MITM proxy.&lt;/p&gt;

&lt;p&gt;Intercepting IDE tools like Cursor or GitHub Copilot is a different and harder problem. They communicate over gRPC with protobuf, which requires a different interception approach than standard HTTPS traffic. That's on the roadmap, but not there yet — and honestly, that's probably where the more interesting (and scarier) data would come from, given that those tools have access to your full codebase.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Velar does with detected values
&lt;/h2&gt;

&lt;p&gt;Each value gets replaced with a deterministic placeholder before the request is forwarded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sk-proj-abc123...  →  [API_KEY_1]
eyJhbGci...        →  [JWT_1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI still gets enough context to be useful. When the response comes back, Velar restores the originals — so your tools keep working normally.&lt;/p&gt;

&lt;p&gt;Everything runs locally. No cloud processing, no external logging, no callbacks home. MIT-licensed Go — you can read the source and verify.&lt;/p&gt;




&lt;h2&gt;
  
  
  The broader pattern
&lt;/h2&gt;

&lt;p&gt;AI coding tools are getting &lt;em&gt;more&lt;/em&gt; context access, not less. Cursor reads your whole codebase. Agents are being given filesystem and terminal access. The more capable these tools get, the more opportunities there are for sensitive data to end up in that context without anyone actively deciding to send it.&lt;/p&gt;

&lt;p&gt;The input field thing is a small example of a bigger pattern: the boundary between "data I'm sharing" and "data that's being transmitted" is increasingly blurry. Most developers I've talked to haven't thought carefully about where that boundary sits.&lt;/p&gt;




&lt;h2&gt;
  
  
  Caveats
&lt;/h2&gt;

&lt;p&gt;Velar is experimental — I'm still figuring out what it should become, and I'd be the first to say the detection isn't perfect. Regex-based detection for structured values like API keys is reasonably reliable. NER-based detection for things like names and organizations is still rough, as the false positives above show.&lt;/p&gt;

&lt;p&gt;Also, yes — Velar is itself a MITM proxy, which is a fair thing to be skeptical about. It only intercepts domains you explicitly configure. The source is open and auditable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ubcent/velar.git
&lt;span class="nb"&gt;cd &lt;/span&gt;velar
make build
./bin/velar ca init
./bin/velar start
./bin/velar proxy on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it for a few days and check &lt;code&gt;velar stats&lt;/code&gt;. I'm curious whether other people hit the same input-field behavior — or find something I haven't seen yet.&lt;/p&gt;

&lt;p&gt;If you try it, share your breakdown in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>privacy</category>
      <category>security</category>
    </item>
    <item>
      <title>I realized my AI tools were leaking sensitive data. So I built a local proxy to stop it</title>
      <dc:creator>Dmitry Bondarchuk</dc:creator>
      <pubDate>Thu, 26 Feb 2026 11:31:45 +0000</pubDate>
      <link>https://dev.to/ubcent/i-realized-my-ai-tools-were-leaking-sensitive-data-so-i-built-a-local-proxy-to-stop-it-2pma</link>
      <guid>https://dev.to/ubcent/i-realized-my-ai-tools-were-leaking-sensitive-data-so-i-built-a-local-proxy-to-stop-it-2pma</guid>
      <description>&lt;p&gt;A few months ago I had a moment of uncomfortable clarity.&lt;/p&gt;

&lt;p&gt;I was using Cursor to work on a project that had database credentials in a .env file. The AI had full access to the codebase. I wasn't thinking about it — I was just coding. And then it hit me: &lt;strong&gt;all of this is going to their servers right now&lt;/strong&gt;. The keys, the internal URLs, everything.&lt;/p&gt;

&lt;p&gt;I stopped and thought about how long I'd been doing this without a second thought. And then I asked a few colleagues. Same story. Nobody was really thinking about it. We all just... trusted that it was fine.&lt;/p&gt;

&lt;p&gt;It probably is fine, most of the time. But "probably fine" is not a compliance posture. And as AI coding tools get deeper access to our codebases, the surface area for accidental leaks keeps growing.&lt;/p&gt;

&lt;p&gt;That's why I built &lt;strong&gt;&lt;a href="https://github.com/ubcent/velar" rel="noopener noreferrer"&gt;Velar&lt;/a&gt;&lt;/strong&gt; — a local proxy that sits between your app and AI providers, detects sensitive data, and masks it before it ever leaves your machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg4zwv8hz4v87ausunjr.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvg4zwv8hz4v87ausunjr.gif" alt=" " width="1024" height="640"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem is getting worse, not better
&lt;/h2&gt;

&lt;p&gt;Copilot, Cursor — these tools are genuinely useful. But they work by sending your code (and often a lot of surrounding context) to external APIs. Most developers don't think carefully about what's in that context.&lt;/p&gt;

&lt;p&gt;Common things that end up in AI requests without people realizing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS/GCP/Azure credentials accidentally committed or present in env files&lt;/li&gt;
&lt;li&gt;Database connection strings&lt;/li&gt;
&lt;li&gt;Internal API endpoints and tokens&lt;/li&gt;
&lt;li&gt;Customer emails or names in logs you're debugging&lt;/li&gt;
&lt;li&gt;JWTs from test sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this is malicious. It's just how development works. But "it's not malicious" doesn't mean it's not a problem when you're dealing with regulated data or working in an enterprise environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Velar works
&lt;/h2&gt;

&lt;p&gt;Velar runs locally as an HTTP/HTTPS proxy with MITM support. You configure it to intercept traffic to specific domains (like &lt;code&gt;api.openai.com&lt;/code&gt;), and it inspects outbound payloads before forwarding them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your app → Velar → AI provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When it detects something sensitive, it replaces it with a deterministic placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;alice@company.com → [EMAIL_1]
AKIAIOSFODNN7EXAMPLE → [AWS_KEY_1]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, when the response comes back, Velar restores the original values — so your app keeps working exactly as expected.&lt;/p&gt;
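&lt;p&gt;The actual implementation lives in the repo, but the core mask-and-restore idea fits in a few lines of Go. This is a minimal sketch (the type and function names here are mine, not Velar's API):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Masker replaces detected values with stable numbered placeholders
// and remembers the mapping so the originals can be restored later.
type Masker struct {
	pattern       *regexp.Regexp
	label         string
	byPlaceholder map[string]string
	byOriginal    map[string]string
}

func NewMasker(label, pattern string) Masker {
	return Masker{
		pattern:       regexp.MustCompile(pattern),
		label:         label,
		byPlaceholder: map[string]string{},
		byOriginal:    map[string]string{},
	}
}

// Mask rewrites every match in the outbound payload. The same value
// always maps to the same placeholder (deterministic).
func (m Masker) Mask(payload string) string {
	return m.pattern.ReplaceAllStringFunc(payload, func(match string) string {
		if ph, ok := m.byOriginal[match]; ok {
			return ph
		}
		ph := fmt.Sprintf("[%s_%d]", m.label, len(m.byOriginal)+1)
		m.byOriginal[match] = ph
		m.byPlaceholder[ph] = match
		return ph
	})
}

// Restore puts the original values back into the inbound response.
func (m Masker) Restore(response string) string {
	for ph, original := range m.byPlaceholder {
		response = strings.ReplaceAll(response, ph, original)
	}
	return response
}

func main() {
	emails := NewMasker("EMAIL", `[a-z0-9.]+@[a-z0-9.]+\.[a-z]{2,}`)
	masked := emails.Mask("contact alice@company.com, cc bob@company.com")
	fmt.Println(masked)                 // contact [EMAIL_1], cc [EMAIL_2]
	fmt.Println(emails.Restore(masked)) // originals come back
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping both directions of the mapping is what makes the placeholders deterministic: the same value always becomes the same placeholder within a session, so the AI can refer to it consistently across the conversation.&lt;/p&gt;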

&lt;p&gt;Everything happens locally. No external services, no logging to the cloud, no callbacks home. You can read the full source and verify this yourself — it's MIT-licensed Go code.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it detects
&lt;/h2&gt;

&lt;p&gt;Current detection is regex-based and covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Emails, phone numbers, names&lt;/li&gt;
&lt;li&gt;AWS, GCP, Azure credentials&lt;/li&gt;
&lt;li&gt;Private keys&lt;/li&gt;
&lt;li&gt;Database URLs&lt;/li&gt;
&lt;li&gt;JWTs&lt;/li&gt;
&lt;li&gt;High-entropy strings (potential secrets)&lt;/li&gt;
&lt;/ul&gt;
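&lt;p&gt;The high-entropy check is the fuzziest of these. The usual approach (my assumption about how it works here — the exact heuristic and threshold below are illustrative, not necessarily Velar's) is Shannon entropy over each candidate token:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the average bits of information per character.
// Random-looking secrets score high; ordinary words score low.
func shannonEntropy(s string) float64 {
	counts := map[rune]float64{}
	total := 0.0
	for _, r := range s {
		counts[r]++
		total++
	}
	entropy := 0.0
	for _, c := range counts {
		p := c / total
		entropy -= p * math.Log2(p)
	}
	return entropy
}

func main() {
	// A plausible rule: flag tokens of 16+ chars scoring above ~3.5 bits/char.
	fmt.Printf("%.2f\n", shannonEntropy("AKIAIOSFODNN7EXAMPLE")) // high
	fmt.Printf("%.2f\n", shannonEntropy("configuration"))        // lower
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A random-looking key scores noticeably higher than an ordinary English word, but the gap is narrow — which is why entropy alone tends to need length and character-set filters on top to keep false positives down.&lt;/p&gt;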

&lt;p&gt;There's also optional ONNX NER support via a locally downloaded model (&lt;code&gt;dslim/bert-base-NER&lt;/code&gt;) for more accurate PII detection. Fair warning: this part is still rough and doesn't always behave as expected — it's something I'm actively working on.&lt;/p&gt;




&lt;h2&gt;
  
  
  "But wait — you're asking me to install a MITM proxy?"
&lt;/h2&gt;

&lt;p&gt;Yes. This is the obvious concern, and it's a fair one.&lt;/p&gt;

&lt;p&gt;Here's the honest answer: Velar only intercepts traffic to domains you explicitly configure. By default that's &lt;code&gt;api.openai.com&lt;/code&gt;. It doesn't touch your banking traffic, your Slack messages, or anything else.&lt;/p&gt;

&lt;p&gt;More importantly — you can verify this. The network code is small and straightforward. There are no background processes phoning home. No analytics. No telemetry. Just a local proxy doing exactly what it says.&lt;/p&gt;

&lt;p&gt;I understand if that's still not enough for some people, and that's fine. But for developers who are already sending sensitive data to AI providers without any filtering layer — Velar represents a net improvement in privacy, not a reduction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ubcent/velar.git
&lt;span class="nb"&gt;cd &lt;/span&gt;velar
make build
./velar ca init
./velar start
./velar proxy on
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You'll start seeing local notifications when Velar masks something in your AI traffic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where it's going
&lt;/h2&gt;

&lt;p&gt;Honestly — I'm not entirely sure yet. This is v0.0.3, explicitly experimental, and I'm still figuring out the right direction. Some things I'm thinking about: stricter blocking mode, a local dashboard, better cross-platform support (notifications are currently macOS-only, though the proxy itself runs anywhere). But nothing is set in stone.&lt;/p&gt;

&lt;p&gt;What I do know is that I'd rather ship something real and iterate based on feedback than plan in a vacuum.&lt;/p&gt;




&lt;p&gt;If this sounds useful, check it out on &lt;a href="https://github.com/ubcent/velar" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Issues, PRs, and honest feedback are all welcome.&lt;/p&gt;

&lt;p&gt;And if you've had your own "oh no, what have I been sending to ChatGPT/Claude" moment — I'd love to hear about it in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
    </item>
  </channel>
</rss>
