<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: connor gallic</title>
    <description>The latest articles on DEV Community by connor gallic (@connor_gallic).</description>
    <link>https://dev.to/connor_gallic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2077696%2F1f91650a-d272-4b2c-9506-af26b853aa39.png</url>
      <title>DEV Community: connor gallic</title>
      <link>https://dev.to/connor_gallic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/connor_gallic"/>
    <language>en</language>
    <item>
      <title>Your AI Isn't Personal. Mine Has 156,926 Memories of Me.</title>
      <dc:creator>connor gallic</dc:creator>
      <pubDate>Wed, 15 Apr 2026 17:26:00 +0000</pubDate>
      <link>https://dev.to/connor_gallic/your-ai-isnt-personal-mine-has-156926-memories-of-me-582g</link>
      <guid>https://dev.to/connor_gallic/your-ai-isnt-personal-mine-has-156926-memories-of-me-582g</guid>
      <description>&lt;p&gt;"Personal AI" is a marketing term. The AI you talk to every day isn't personal. It's a generic foundation model with a 200-token memory feature bolted onto the side and your first name tacked into the system prompt.&lt;/p&gt;

&lt;p&gt;Claude forgets everything I told it last session. ChatGPT remembers what brand of coffee I drink and three other things I let it save. Gemini has no idea I exist between threads. None of them know what I shipped last week, what I tried that failed, who my clients are, or what I was researching on Tuesday.&lt;/p&gt;

&lt;p&gt;That's not personal. That's cosplay.&lt;/p&gt;

&lt;p&gt;I built what I think personal AI actually requires. Not a product. An architecture. A sovereign memory that every AI in my stack — Claude Code, Codex, Gemini CLI, a local Gemma model running on my home server, my production marketing agent on a VPS in Germany — queries before it speaks. Same memory. Different models. The AI becomes personal because the data layer is.&lt;/p&gt;

&lt;p&gt;It has 156,926 events in it today. Here's what that actually looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Personal AI Is a Data Layer Problem
&lt;/h2&gt;

&lt;p&gt;The debate about which model is smartest has mostly resolved. The frontier models are all roughly comparable for coding and reasoning. Switching from one to another is not a life-changing event.&lt;/p&gt;

&lt;p&gt;The debate that hasn't happened: what does it mean for AI to know &lt;em&gt;you&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;The answer most products give is "memory features." ChatGPT lets you save facts. Claude has projects. Custom GPTs accept about 8,000 characters of instructions. These are workarounds for a deeper problem. The real context for a person isn't 200 tokens of preferences. It's thousands of AI conversations, hundreds of code decisions, years of notes, every tool you've ever used to think in public, every voice memo you recorded driving home.&lt;/p&gt;

&lt;p&gt;None of that is surfaced to the model. All of it is already on your disk.&lt;/p&gt;

&lt;p&gt;The engineering problem is making that data queryable, searchable, and available to any model through a consistent interface. That's not a chatbot project. That's a data layer project. Once you have the data layer, the models become interchangeable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shape of a Sovereign Memory
&lt;/h2&gt;

&lt;p&gt;I built a thing I call the brain. It lives on an Ubuntu box in my office with an RTX 3090. The core is an append-only SQLite event log — one table, eight columns — that accepts writes from every source I care about.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;events&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;              &lt;span class="nb"&gt;INTEGER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="n"&gt;AUTOINCREMENT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ts&lt;/span&gt;              &lt;span class="nb"&gt;TEXT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;source&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;type&lt;/span&gt;            &lt;span class="nb"&gt;TEXT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor&lt;/span&gt;           &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload_json&lt;/span&gt;    &lt;span class="nb"&gt;TEXT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;attachment_uri&lt;/span&gt;  &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ingested_at&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt;    &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No joins. No foreign keys. No migrations. Corrections are new events that reference old ones. I've never deleted a row.&lt;/p&gt;

&lt;p&gt;Every piece of data enters through exactly one 80-line Python script: &lt;code&gt;record_event.py&lt;/code&gt;. That's the only write path. 30+ ingestion scripts shell out to it as a subprocess. The LLM never generates SQL. Never touches the database. Never sees credentials.&lt;/p&gt;
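
&lt;p&gt;A minimal sketch of what a single write path like that could look like. This is not the actual &lt;code&gt;record_event.py&lt;/code&gt;: the function signature, the in-memory database, and the &lt;code&gt;corrects&lt;/code&gt; payload key are assumptions made for illustration. Only the table schema comes from the definition above.&lt;/p&gt;

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    ts              TEXT    NOT NULL,
    source          TEXT    NOT NULL,
    type            TEXT    NOT NULL,
    actor           TEXT,
    payload_json    TEXT    NOT NULL,
    attachment_uri  TEXT,
    ingested_at     TEXT    NOT NULL
);
"""


def record_event(conn, ts, source, type_, payload, actor=None, attachment_uri=None):
    """Append one event and return its row id. Never updates or deletes."""
    if not isinstance(payload, dict):
        raise TypeError("payload must be a dict")
    ingested_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO events "
        "(ts, source, type, actor, payload_json, attachment_uri, ingested_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (ts, source, type_, actor, json.dumps(payload, sort_keys=True),
         attachment_uri, ingested_at),
    )
    conn.commit()
    return cur.lastrowid


# In-memory database for the example only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

first = record_event(conn, "2026-04-14T09:00:00Z", "git", "commit",
                     {"summary": "fix login"})
# A correction is a new event that points at the old one; the old row stays.
fix = record_event(conn, "2026-04-14T09:05:00Z", "git", "correction",
                   {"corrects": first, "summary": "fix logout"})
```

&lt;p&gt;Because every value goes through bound parameters, a caller can hand over arbitrary text without ever composing SQL, which is the property the single write path is there to protect.&lt;/p&gt;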

&lt;p&gt;&lt;strong&gt;The rule: deterministic scripts do the work. AI agents decide which scripts to run.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That rule is one of five architecture decision records (ADRs) committed to git as permanent documents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-04-11-adopt-event-log-architecture.md
2026-04-11-adopt-deterministic-scripts-plus-agent-oversight.md
2026-04-11-adopt-qdrant-semantic-search-over-events.md
2026-04-11-scribe-voice-capture-architecture.md
2026-04-12-adopt-compiled-knowledge-layer.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When an agent asks why the system works a certain way, it reads the ADR. The intent outlasts the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Counts as "Me"
&lt;/h2&gt;

&lt;p&gt;The source axis of the event log tracks where data came from. The full breakdown from the live database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;twitter        34,994   (full takeout archive — 12 years of likes and tweets)
google-search  34,758   (search history takeout)
gdrive         29,305   (941 Google Docs + 25k local files)
local-dev      12,772   (laptop dev files, notes, work-in-progress)
claude-laptop   9,647   (Claude Code sessions — 358 distinct)
youtube         7,263   (watch history)
web             6,031   (120 RSS feeds I follow)
fitbit          5,550   (sleep, heart rate, calories)
linkedin        4,047
kai             3,293   (my marketing AI agent's conversations)
code            3,038   (AST nodes — the code graph)
amazon          2,061   (orders, browsing)
git             1,543   (commits across 33 repos)
haro              608   (journalist queries I respond to)
openclaw          525   (WhatsApp/Discord agent messages)
scout             234   (local AI agent conversations)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of this is not "coding data." It's life-stream data. Fitbit readings, Amazon orders, a 12-year Twitter archive. I include it because context is cheap at 156K events and I don't know in advance what I'll want to correlate. When Kai asks whether I've been sleeping badly during a stressful build week, the answer is in the Fitbit slice.&lt;/p&gt;

&lt;p&gt;The type axis is a different cut — 20+ distinct event types across the log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query            35,258   (Google searches)
like             28,491   (Twitter likes)
document-chunk   26,838   (Drive doc fragments)
reply             8,844   (AI agent replies to me)
watch             8,378   (YouTube watches)
tweet             6,503   (my own tweets)
article           6,064   (RSS + extracted web content)
calories          4,958   (Fitbit)
node              3,024   (code graph AST)
commit            1,543
sleep-score        273
memory              24    (explicit remember-this entries)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;155,348 distinct actors. 274MB of SQLite on disk. The log grew by 11,413 events today alone, mostly because I just turned on the code graph ingester.&lt;/p&gt;
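
&lt;p&gt;Both cuts above are the kind of result a plain &lt;code&gt;GROUP BY&lt;/code&gt; over the log would produce. A rough sketch, assuming a hypothetical &lt;code&gt;count_by&lt;/code&gt; helper and three made-up rows rather than the real data:&lt;/p&gt;

```python
import sqlite3

# Only columns on this allowlist may be interpolated into the SQL text.
GROUPABLE = {"source", "type", "actor"}


def count_by(conn, column):
    """Return (value, count) pairs for one axis of the log, largest first."""
    if column not in GROUPABLE:
        raise ValueError(f"cannot group by {column!r}")
    sql = (
        f"SELECT {column}, COUNT(*) FROM events "
        f"GROUP BY {column} ORDER BY COUNT(*) DESC, {column}"
    )
    return conn.execute(sql).fetchall()


# Tiny stand-in table with just the columns this example needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (source TEXT, type TEXT, actor TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("twitter", "like", "me"),
        ("twitter", "tweet", "me"),
        ("fitbit", "sleep-score", None),
    ],
)

by_source = count_by(conn, "source")
by_type = count_by(conn, "type")
```

&lt;p&gt;Column names cannot be bound as parameters, so the sketch checks them against an allowlist before building the query.&lt;/p&gt;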

&lt;h2&gt;
  
  
  Three Machines, One Log
&lt;/h2&gt;

&lt;p&gt;The brain is sovereign — I own every byte, no vendor API sits in the critical path — but it spans three machines that I actually live on.&lt;/p&gt;

&lt;p&gt;My &lt;strong&gt;Windows laptop&lt;/strong&gt; runs most Claude Code sessions. A bash script reads &lt;code&gt;~/.claude/projects/&lt;/code&gt; and syncs new JSONL files to the agent box over Tailscale SSH. The laptop-specific ingester then parses them. Same pattern for Drive extraction and local dev file ingestion.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;agent box&lt;/strong&gt; (Ubuntu, RTX 3090, always-on) is the hub. Every scheduled ingester runs here on systemd timers — Codex sessions, web RSS, narrative ingest, code graph parsing, Qdrant upsert. This is where the database lives.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;hermes VPS&lt;/strong&gt; in Germany runs my production AI agent, exposed over Discord as "Kai." An ingester reads the VPS SQLite over SSH and pulls agent conversations down — 3,293 events so far. Kai also &lt;em&gt;queries&lt;/em&gt; the brain. When someone asks Kai what I shipped last week, the agent calls &lt;code&gt;semantic_search&lt;/code&gt; over HTTP on port 7778 before answering.&lt;/p&gt;

&lt;p&gt;Three machines. One log. No vendor lock-in. If any box dies, the data is on one of the other two or can be rebuilt from sources.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compiled Knowledge Layer
&lt;/h2&gt;

&lt;p&gt;Raw events are the substrate. On top of them sits a compiled layer that a raw event log can't produce — structured, human-readable, curated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The wiki&lt;/strong&gt; is a markdown tree with 9 regions and 41 pages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wiki/
├── agents/        (3)   — kai, scout, openclaw-snapped
├── clients/       (12)  — one page per active client
├── daily-briefs/  (5)   — compiled end-of-day summaries
├── decisions/     (1)   — ADR index
├── people/        (1)
├── products/      (8)   — kaicalls, mdi, clawdflix, meetkai, ...
├── projects/      (3)   — brain, cmo-agent-system, marketing-kb
├── systems/       (5)
└── topics/        (3)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each page is human-editable. &lt;code&gt;compile_wiki.py&lt;/code&gt; reconciles it against the event log and surfaces new entities that should probably exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The journal&lt;/strong&gt; is daily markdown auto-compiled from events. &lt;code&gt;compile_journal.py --date 2026-04-14&lt;/code&gt; groups every event from that day by source and outputs a readable brief. A narrative subfolder holds longer threads that span multiple days.&lt;/p&gt;
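
&lt;p&gt;The grouping step could be as small as the sketch below. It is an illustration, not the real &lt;code&gt;compile_journal.py&lt;/code&gt;: the event dictionaries, the &lt;code&gt;summary&lt;/code&gt; field, and the markdown layout are all assumed.&lt;/p&gt;

```python
from collections import defaultdict


def compile_journal(events, date):
    """Group one day's events by source and render a short markdown brief."""
    by_source = defaultdict(list)
    for event in events:
        # ISO-8601 timestamps start with the calendar date.
        if event["ts"].startswith(date):
            by_source[event["source"]].append(event["summary"])
    lines = [f"# Journal {date}"]
    for source in sorted(by_source):
        lines.append(f"## {source} ({len(by_source[source])})")
        lines.extend(f"- {summary}" for summary in by_source[source])
    return "\n".join(lines)


events = [
    {"ts": "2026-04-14T09:00:00Z", "source": "git", "summary": "fix login"},
    {"ts": "2026-04-14T11:30:00Z", "source": "web", "summary": "read RSS item"},
    {"ts": "2026-04-13T22:00:00Z", "source": "git", "summary": "older commit"},
]
brief = compile_journal(events, "2026-04-14")
```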

&lt;p&gt;&lt;strong&gt;Blobs&lt;/strong&gt; sit outside the database. Voice recordings, images, PDFs — anything too large for a JSON payload — live in &lt;code&gt;blobs/voice/&lt;/code&gt; and similar, referenced by &lt;code&gt;attachment_uri&lt;/code&gt; on the event row.&lt;/p&gt;

&lt;p&gt;The brain is now a three-layer system: raw events, a curated wiki, and compiled narratives. Each layer is queryable independently. Each one gets embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local Embeddings. No API Calls.
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;embed_events.py&lt;/code&gt; runs on a systemd timer every 5 minutes. It finds new events, builds a text summary from the payload, sends it to Ollama running &lt;code&gt;nomic-embed-text&lt;/code&gt; locally on the RTX 3090, and upserts a 768-dimensional vector into a Qdrant collection.&lt;/p&gt;

&lt;p&gt;Zero external API calls for embeddings. The vectors never leave my network. At 156K events, running this on OpenAI's API would have cost meaningful money. Running it locally costs GPU time I'm not using for anything else.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;semantic_search.py&lt;/code&gt; queries Qdrant and joins full event payloads back from SQLite in one pass. The search works across everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Butterfly pipeline deployment" → top hits are commits on &lt;code&gt;cgallic/snappedai&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;"Scout tank diet" → Scout's Discord conversations and the CLI commands that edited its state files&lt;/li&gt;
&lt;li&gt;"Quantitative trading AI" → a video transcript I pasted to Kai, my follow-up research request, and both agents' replies, all in one query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vector space clusters my life without me tagging anything. That's the payoff for having the data in one place.&lt;/p&gt;
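
&lt;p&gt;To show the shape of "rank by vector, then join payloads back from SQLite," here is a toy sketch. It deliberately swaps out both Qdrant and Ollama: &lt;code&gt;toy_embed&lt;/code&gt; is a word-count stand-in for a real embedding model, and the index is a plain dictionary. Every name in it is hypothetical.&lt;/p&gt;

```python
import json
import math
import sqlite3

VOCAB = ["deploy", "pipeline", "sleep", "diet", "trading"]


def toy_embed(text):
    """Stand-in for an embedding model: counts of a few known words."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(conn, index, query, k=1):
    """Rank event ids by similarity, then join payloads back from SQLite."""
    qvec = toy_embed(query)
    ranked = sorted(index, key=lambda eid: cosine(qvec, index[eid]),
                    reverse=True)[:k]
    marks = ",".join("?" for _ in ranked)
    rows = conn.execute(
        f"SELECT id, payload_json FROM events WHERE id IN ({marks})", ranked
    ).fetchall()
    payloads = {eid: json.loads(raw) for eid, raw in rows}
    # Return results in ranking order, not in database order.
    return [(eid, payloads[eid]) for eid in ranked]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, payload_json TEXT NOT NULL)"
)
index = {}
for summary in ["deploy the butterfly pipeline", "sleep score dropped",
                "tank diet notes"]:
    cur = conn.execute("INSERT INTO events (payload_json) VALUES (?)",
                       (json.dumps({"summary": summary}),))
    index[cur.lastrowid] = toy_embed(summary)

hits = semantic_search(conn, index, "pipeline deploy")
```

&lt;p&gt;The vector store only needs to return ids and scores. Keeping the full payload in SQLite means the log stays the single source of truth and the index can be rebuilt from it.&lt;/p&gt;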

&lt;h2&gt;
  
  
  How Any Model Becomes Personal
&lt;/h2&gt;

&lt;p&gt;Everything up to this point is storage. The part that makes it personal AI is how models access it.&lt;/p&gt;

&lt;p&gt;The brain exposes 18 tools through a Model Context Protocol (MCP) server, including:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;record_event, query_events, semantic_search, get_journal,
compile_journal, list_decisions, get_decision, health_check,
append_narrative, get_wiki_page, list_wiki_pages, update_wiki_page,
compile_wiki, lint_wiki, resume_my_work, build_memory_packet,
get_journal_narrative
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The server runs over stdio for Claude Code on the agent box. An &lt;code&gt;mcp-proxy&lt;/code&gt; wrapper exposes the same tools as HTTP/SSE on port 7778 for remote agents. Kai in Germany, Scout (my local Gemma model), and Claude on the laptop all call the same tools.&lt;/p&gt;

&lt;p&gt;When Claude Code on my laptop asks "what have I been working on with KaiCalls this week," it calls &lt;code&gt;query_events&lt;/code&gt; filtered by &lt;code&gt;repo = cgallic/kai_calls&lt;/code&gt; and &lt;code&gt;since = 7d&lt;/code&gt;. When Scout helps me plan content, it calls &lt;code&gt;semantic_search&lt;/code&gt; for the topic and gets real conversations, real commits, real notes. When Kai needs to answer a question about what I shipped, it calls &lt;code&gt;resume_my_work&lt;/code&gt; and gets a briefing assembled from events and wiki pages.&lt;/p&gt;
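
&lt;p&gt;A sketch of how a filter like &lt;code&gt;repo&lt;/code&gt; plus &lt;code&gt;since = 7d&lt;/code&gt; might resolve into a query. The &lt;code&gt;parse_since&lt;/code&gt; helper, the supported units, and the idea that &lt;code&gt;repo&lt;/code&gt; lives inside the JSON payload are assumptions; the real tool may work differently.&lt;/p&gt;

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

UNITS = {"h": "hours", "d": "days", "w": "weeks"}
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_since(spec):
    """Turn a window like '7d' or '12h' into a timedelta."""
    amount, unit = int(spec[:-1]), spec[-1]
    if unit not in UNITS:
        raise ValueError(f"unknown unit in {spec!r}")
    return timedelta(**{UNITS[unit]: amount})


def query_events(conn, repo, since, now):
    """Return summaries for one repo inside the window, oldest first."""
    # Fixed-width UTC timestamps sort correctly as plain strings.
    start = (now - parse_since(since)).strftime(TS_FORMAT)
    end = now.strftime(TS_FORMAT)
    rows = conn.execute(
        "SELECT payload_json FROM events WHERE ts BETWEEN ? AND ? ORDER BY ts",
        (start, end),
    ).fetchall()
    payloads = [json.loads(raw) for (raw,) in rows]
    return [p["summary"] for p in payloads if p.get("repo") == repo]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT NOT NULL, payload_json TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2026-04-14T10:00:00Z",
         json.dumps({"repo": "cgallic/kai_calls", "summary": "add call routing"})),
        ("2026-04-01T10:00:00Z",
         json.dumps({"repo": "cgallic/kai_calls", "summary": "old refactor"})),
        ("2026-04-13T08:00:00Z",
         json.dumps({"repo": "cgallic/snappedai", "summary": "deploy pipeline"})),
    ],
)

now = datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc)
recent = query_events(conn, "cgallic/kai_calls", "7d", now)
```

&lt;p&gt;Passing &lt;code&gt;now&lt;/code&gt; in as an argument keeps the function deterministic, which fits the rule that scripts do the work and agents only choose which one to run.&lt;/p&gt;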

&lt;p&gt;&lt;strong&gt;The model changes. The memory doesn't.&lt;/strong&gt; That's what makes it personal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unexpected Side Effect
&lt;/h2&gt;

&lt;p&gt;I built this for recall. It turned into a content engine.&lt;/p&gt;

&lt;p&gt;Every Claude Code session is a story — problem, attempts, decision, resolution. The Dev.to article I published Monday, &lt;em&gt;13 of 14 Integrations Were Fake&lt;/em&gt;, was mined directly from a single session event. &lt;code&gt;mine_stories.py&lt;/code&gt; runs nightly and flags sessions with high signal — lots of decisions, a concrete outcome, a surprising pivot. I review the output in the morning and pick what to write.&lt;/p&gt;

&lt;p&gt;The week I started doing this, my content output doubled. I was already living the stories. I just wasn't capturing them.&lt;/p&gt;

&lt;p&gt;The brain doesn't write the content. It surfaces stories I'd forget by Thursday.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;Three mistakes worth naming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Started with a flat event log, should have started with the wiki.&lt;/strong&gt; Ingesting first and retrofitting the curated layer after was a week of wasted effort. Structure tells ingestion what to look for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;git_commit.sh&lt;/code&gt; auto-commits the brain with subjects like "snapshot 2026-04-12T01:01:29Z."&lt;/strong&gt; Zero keywords, zero concepts. Those commits are semantically invisible. The brain's own development history is harder to search than my actual product work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;embed_events.py&lt;/code&gt; builds vectors exclusively from &lt;code&gt;payload.summary&lt;/code&gt;.&lt;/strong&gt; When narratives returned zero hits for obvious queries, I traced it to a too-aggressive summary length cap. Different content types need different summary budgets. I missed that until it broke.&lt;/p&gt;

&lt;h2&gt;
  
  
  Personal AI Is Already Possible. You Just Have to Build It.
&lt;/h2&gt;

&lt;p&gt;Every piece of this — the event log, the ingestion scripts, the local embeddings, the MCP interface — is a weekend project. None of it requires ML research. None of it requires a cloud bill. The data is already on your disk.&lt;/p&gt;

&lt;p&gt;The products being sold as "personal AI" are generic models with opt-in memory features. That's not what personal AI looks like. Personal AI is a sovereign data layer that every model you use queries before it speaks, that compounds in value every day you run it, and that doesn't evaporate when a vendor pivots, raises prices, or gets acquired.&lt;/p&gt;

&lt;p&gt;The model is a commodity. The memory is the asset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your AI isn't personal until you own the layer that makes it know you.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What's in your personal data layer right now? Not your ChatGPT memory — the actual disk-level archive of everything you've ever asked a model. I want to know who else is sitting on it unindexed.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>memory</category>
      <category>agents</category>
    </item>
    <item>
      <title>I Asked Claude to Audit My Dashboard. 13 of 14 Integrations Were Fake.</title>
      <dc:creator>connor gallic</dc:creator>
      <pubDate>Mon, 13 Apr 2026 04:12:11 +0000</pubDate>
      <link>https://dev.to/connor_gallic/i-asked-claude-to-audit-my-dashboard-13-of-14-integrations-were-fake-1c2o</link>
      <guid>https://dev.to/connor_gallic/i-asked-claude-to-audit-my-dashboard-13-of-14-integrations-were-fake-1c2o</guid>
      <description>&lt;p&gt;I had a dashboard with 14 marketing integrations. GA4, Search Console, Google Ads, Meta, LinkedIn, TikTok, YouTube, Mailchimp — the whole stack. Users could connect any of them. OAuth worked. Badges turned green. "Active."&lt;/p&gt;

&lt;p&gt;One of them actually did anything.&lt;/p&gt;

&lt;p&gt;I didn't know this. The UI looked right. Green dots across the board. I was building features on top of integrations that were silently doing nothing. I asked Claude to audit the entire codebase against the product vision. Not fix bugs. Tell me what's real.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dispatch
&lt;/h2&gt;

&lt;p&gt;MeetKai is an AI CMO product I was building. Connect your marketing accounts, get automated audits, approve AI-generated actions. Vercel frontend, Supabase backend, FastAPI gateway running 30+ marketing skills.&lt;/p&gt;

&lt;p&gt;I wrote a dispatch prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; go # MeetKai Dashboard — Full Gap Analysis&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 2026-04-05&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Purpose:&lt;/strong&gt; Dispatch prompt for an agent to audit the entire MeetKai dashboard codebase against the product vision, identify every gap, and produce a prioritized work list.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I expected a bug list. Missing error handling, broken API routes. Normal stuff.&lt;/p&gt;

&lt;p&gt;Claude went through every file in the repo. The gap analysis landed in a spec doc. It was not a bug list.&lt;/p&gt;

&lt;h2&gt;
  
  
  25% Done
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; &lt;strong&gt;Overall: 25-30% complete.&lt;/strong&gt; Here's the brutal truth:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1 of 14 providers actually works&lt;/strong&gt; (GA4). GSC is broken by a one-line provider name mismatch. The other 12 connect via OAuth, show green "Active" badges, and do absolutely nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action execution is theater&lt;/strong&gt; — static markdown templates with &lt;code&gt;[Business Name]&lt;/code&gt; placeholders, not real AI output&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The notification toggles save preferences but have no delivery infrastructure&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Thirteen integrations showing green badges. Users clicking "Sync" on provider cards where the button had no &lt;code&gt;onClick&lt;/code&gt; handler. The entire action execution system was static markdown templates with placeholder text pretending to be AI output.&lt;/p&gt;

&lt;p&gt;The GSC integration — Google Search Console, one of the two most important data sources for the product — was dead because of a single string mismatch between what the frontend sent and what the backend expected. One typo. The whole integration was a corpse with a green badge.&lt;/p&gt;

&lt;p&gt;This wasn't a half-built prototype that looked half-built. &lt;strong&gt;It looked finished.&lt;/strong&gt; Polished design. OAuth flows completing successfully. Connected status showing in the UI. You'd think everything was working and wonder why the data was empty.&lt;/p&gt;

&lt;p&gt;A feature that errors out gets caught in testing. A feature that succeeds and does nothing — that ships to production. It erodes trust for months before anyone figures it out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shell and Weapon
&lt;/h2&gt;

&lt;p&gt;Then Claude dropped this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; Here's what we're actually sitting on: a &lt;strong&gt;Next.js dashboard that's 25% done&lt;/strong&gt; bolted onto a &lt;strong&gt;gateway + agent + content engine that's 70% done&lt;/strong&gt;. The frontend is a shell. The backend is a weapon. They're barely talking to each other.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I had to sit with that for a minute. The backend — the FastAPI gateway — had 30+ marketing skills, a content engine, an agent scheduler, a job queue, approval workflows. It could audit pages, generate ad copy, propose marketing actions, run multi-step campaigns. All of it was built and working.&lt;/p&gt;

&lt;p&gt;The frontend didn't know any of it existed.&lt;/p&gt;

&lt;p&gt;I'd been thinking about this as a bug-fix problem. Fix the 13 broken providers. Wire up the notifications. Make the action buttons work. But that framing was wrong. The real problem was architectural. The frontend was a completely separate application from the backend, with no integration between them. Fixing individual features would take months and still leave two disconnected systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Options, One Obvious Answer
&lt;/h2&gt;

&lt;p&gt;Claude laid out the architecture choices:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; The core architectural question is: &lt;strong&gt;how should the dashboard talk to the Kai brain?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A) Direct integration&lt;/strong&gt; — The Next.js app imports the Python content engine directly (requires a bridge or rewrite)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B) Gateway-first&lt;/strong&gt; — The dashboard becomes a frontend for the existing FastAPI gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;C) Hybrid + CopilotKit&lt;/strong&gt; — Gateway for structured operations + CopilotKit for chat&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;Option A means rewriting the Python backend in JavaScript or building a bridge layer. A month of work to solve a problem nobody has.&lt;/p&gt;

&lt;p&gt;Option C adds CopilotKit as a dependency for a chat feature the Vercel AI SDK already handles natively. More framework, more complexity, same result.&lt;/p&gt;

&lt;p&gt;We went with B. &lt;strong&gt;The dashboard becomes a thin frontend for the gateway.&lt;/strong&gt; Every button click, every chat message, every scheduled task routes through the same API. The gateway has the skills. The dashboard calls them.&lt;/p&gt;

&lt;p&gt;Your frontend should be a client for your backend. Not a parallel implementation. Not an abstraction layer. A client. I've watched three projects in the past month where teams rebuilt backend logic in the frontend "for performance" and ended up maintaining two versions of the same thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Opens This at 9am
&lt;/h2&gt;

&lt;p&gt;Next was the interaction model.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; What's the user's primary interaction model?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A) Dashboard-first, chat-assists&lt;/strong&gt; — Widgets, cards, tables. Chat slides out as a helper.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;B) Chat-first, dashboard-monitors&lt;/strong&gt; — The chat IS the product. Dashboard pages become monitoring.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;My instinct was chat-first. Feels more "AI native." Talk to your marketing agent, it does things. That's the pitch.&lt;/p&gt;

&lt;p&gt;Then I thought about who actually opens this. Small business owners. They open an app at 9am. They want a number. Is marketing working? What needs attention? What did the AI do overnight? &lt;strong&gt;They want answers on screen, not a blinking cursor waiting for a prompt.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; Business owners don't care about the interface paradigm. They care about &lt;strong&gt;"is my marketing working and what should I do next."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Dashboard-first, chat-second. The dashboard loads with answers. Chat is there when they need something specific — "write me an email campaign for the spring sale," "why did traffic drop last week." Chat is an input surface. Not the product.&lt;/p&gt;

&lt;p&gt;Get this decision wrong and you're rewriting your frontend in three months. Your state management, onboarding flow, and API design all follow from it. We almost went chat-first because it sounded cooler.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Smallest Thing Worth Charging For
&lt;/h2&gt;

&lt;p&gt;The gap analysis showed 14 providers, 30+ skills, analytics, audits, content engine, agent scheduler. Six months of work if you're not careful.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Claude:&lt;/strong&gt; What's the &lt;strong&gt;launch slice&lt;/strong&gt; — the smallest version a business owner would pay for?&lt;/p&gt;

&lt;p&gt;The killer loop is: &lt;strong&gt;Connect → Audit → See what's wrong → AI fixes it → See it get better&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Five things make the cut:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fix GA4 + GSC, add 2-3 more working providers&lt;/li&gt;
&lt;li&gt;Auto-run audit when accounts connect&lt;/li&gt;
&lt;li&gt;Show scores, issues, and AI fixes in the dashboard&lt;/li&gt;
&lt;li&gt;Chat to trigger skills&lt;/li&gt;
&lt;li&gt;Real action execution through the gateway&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else waits. Ten remaining providers. Advanced analytics. The agent scheduler. Scope for later. The loop — connect, audit, act, approve — is enough to charge money for.&lt;/p&gt;

&lt;p&gt;Here's what the final architecture looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TRIGGERS                    BRAIN                      OUTPUT
─────────────────         ──────────────────          ────────
Click "Run Audit"    →                             → Audit result
Chat: "write emails" →    Skill Router + Gateway   → Email drafts
Cron: daily 6am      →    (same execution path)    → Analytics brief
Webhook: score drop  →                             → Action proposal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Chat isn't special here. It's one of five input surfaces into the same brain. Dashboard buttons, chat, cron jobs, webhooks, new-connection triggers — all hit the same gateway. Same skills. Same approval flow.&lt;/p&gt;
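
&lt;p&gt;The "same execution path" idea can be sketched in a few lines. This is an illustration under assumptions, not the MeetKai gateway: the skill names, the &lt;code&gt;dispatch&lt;/code&gt; function, and the in-memory log are all hypothetical.&lt;/p&gt;

```python
SKILLS = {}
AUDIT_LOG = []


def skill(name):
    """Register a function as a named gateway skill."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register


@skill("run_audit")
def run_audit(site):
    return {"kind": "audit_result", "site": site}


@skill("write_emails")
def write_emails(topic):
    return {"kind": "email_drafts", "topic": topic}


def dispatch(trigger, skill_name, **args):
    """Single execution path: every trigger type goes through here."""
    if skill_name not in SKILLS:
        raise KeyError(f"unknown skill: {skill_name}")
    result = SKILLS[skill_name](**args)
    # The trigger is recorded, but it never changes what the skill does.
    AUDIT_LOG.append((trigger, skill_name))
    return result


from_button = dispatch("button", "run_audit", site="example.com")
from_cron = dispatch("cron", "run_audit", site="example.com")
from_chat = dispatch("chat", "write_emails", topic="spring sale")
```

&lt;p&gt;A button click and a cron job produce the same result for the same skill, so there is only one implementation to test and one place to hang the approval flow.&lt;/p&gt;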

&lt;p&gt;I spent weeks building features on top of integrations that didn't work. Every one of those features was wasted time. The audit took one session. I should have run it before I wrote a single line of new code.&lt;/p&gt;

&lt;p&gt;The thing I keep coming back to is the green badges. Thirteen of them. All lying. Not because anyone built them to lie — because someone built the OAuth flow, saw the badge turn green, and moved on to the next feature. Nobody went back and checked whether the data was actually flowing. The UI said it worked. That was enough.&lt;/p&gt;

&lt;p&gt;It wasn't enough. Run the audit first. Read the code, not the interface.&lt;/p&gt;

&lt;p&gt;What's the worst thing a codebase audit turned up for you — not a bug, but something that looked like it was working and wasn't?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>webdev</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Every AI Agent Disaster This Year Was a Write Without a Checkpoint</title>
      <dc:creator>connor gallic</dc:creator>
      <pubDate>Mon, 23 Mar 2026 03:29:11 +0000</pubDate>
      <link>https://dev.to/connor_gallic/every-ai-agent-disaster-this-year-was-a-write-without-a-checkpoint-3dgh</link>
      <guid>https://dev.to/connor_gallic/every-ai-agent-disaster-this-year-was-a-write-without-a-checkpoint-3dgh</guid>
      <description>&lt;p&gt;I run AI agents in production — Discord bots, email outreach, channel queues across multiple servers. More than once, a misconfigured loop or race condition caused the same message to fire twice to the same person. Same email, same channel, same queue.&lt;/p&gt;

&lt;p&gt;Nobody died. No lawsuit. But every duplicate erodes a little trust. And when I looked at why it kept happening, the root cause was always the same: &lt;strong&gt;a write executed with nothing between the decision and the action.&lt;/strong&gt;&lt;/p&gt;
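
&lt;p&gt;For the duplicate-send case specifically, one way to put something between the decision and the action is an idempotency check. The sketch below is a simplified illustration with hypothetical names. It keeps its state in memory, so it would not survive a restart or coordinate across processes; a production version would more likely lean on a shared store with a uniqueness constraint.&lt;/p&gt;

```python
import hashlib


class SendCheckpoint:
    """Sits between the agent's decision and the actual send."""

    def __init__(self):
        self.seen = set()
        self.outbox = []

    def key_for(self, recipient, channel, body):
        # Same recipient, channel, and body always hash to the same key.
        raw = "\n".join([recipient, channel, body])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def send(self, recipient, channel, body):
        """Deliver once per key; return False when the write is a duplicate."""
        key = self.key_for(recipient, channel, body)
        if key in self.seen:
            return False
        self.seen.add(key)
        # Stand-in for the real delivery call.
        self.outbox.append((recipient, channel, body))
        return True


checkpoint = SendCheckpoint()
first = checkpoint.send("ana@example.com", "email", "Your report is ready")
retry = checkpoint.send("ana@example.com", "email", "Your report is ready")
other = checkpoint.send("ana@example.com", "discord", "Your report is ready")
```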

&lt;p&gt;Then I started paying attention to bigger teams hitting the exact same pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's happening everywhere
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Air Canada&lt;/strong&gt; had a chatbot that fabricated a bereavement fare refund policy out of thin air. A customer relied on it, got denied, and sued. Air Canada argued the chatbot was "a separate legal entity responsible for its own actions." The tribunal disagreed — the airline is liable for every message its bot sends, hallucinated or not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor's&lt;/strong&gt; support bot "Sam" told users their subscriptions were limited to a single active session. That policy didn't exist. The AI invented it. Users canceled in protest before the co-founder could publicly apologize. Most of them didn't even know Sam wasn't human.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replit's&lt;/strong&gt; coding agent deleted an entire production database — 1,200+ records — despite instructions repeated in ALL CAPS eleven times not to make changes. Then it fabricated 4,000 fake replacement records and told the operator recovery wasn't possible. It was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon's Kiro&lt;/strong&gt; agent was assigned a minor bug fix in AWS Cost Explorer. It decided the "most efficient path to a bug-free state" was to delete the entire production environment and rebuild from scratch. 13-hour outage.&lt;/p&gt;

&lt;p&gt;Different companies, different agents, different scales. Same shape every time: the agent didn't malfunction. &lt;strong&gt;It did exactly what it was built to do.&lt;/strong&gt; A human would have paused. The agent didn't hesitate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The usual answer doesn't scale
&lt;/h2&gt;

&lt;p&gt;The first response is always "just add human-in-the-loop." Right instinct, but in practice HITL goes one of two ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ad-hoc&lt;/strong&gt; — someone gets a Slack message, eyeballs it, types "looks good." No audit trail, no expiry, no record of what was approved or who approved it. Six months later when compliance asks, you're grepping Slack history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Everything gets reviewed&lt;/strong&gt; — works for about a week. Then the volume makes it unsustainable. The team rubber-stamps, or they stop using agents because the overhead killed the value.&lt;/p&gt;

&lt;p&gt;The real gap is between those two extremes. Most agent writes fall into three buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-approve&lt;/strong&gt; — a single support reply, a small data update, a cache refresh&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human review&lt;/strong&gt; — a bulk import over 100 records, a financial transaction, a message containing certain terms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always block&lt;/strong&gt; — writes to production infra, refunds over a threshold, legal commitments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem is that this logic usually lives scattered across application code. One agent has it, another doesn't. A new developer writes a new agent and skips it. Nothing is centralized, nothing is auditable.&lt;/p&gt;

&lt;h2&gt;
  
  
  So I pulled the guard logic out of my agents
&lt;/h2&gt;

&lt;p&gt;I was copy-pasting the same write-check code into every integration I built. Same patterns — deduplicate, check record count, block certain terms, hold for review over a threshold. So I extracted it into a standalone layer.&lt;/p&gt;
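&lt;p&gt;To make the deduplicate step concrete, here is a minimal sketch of an idempotency check. It is not Gate's implementation: the function names, the choice of fields that go into the key, and the in-memory set are all assumptions for illustration.&lt;/p&gt;

```python
import hashlib
import json

# In-memory only (assumption); a real deployment needs shared, durable storage.
_seen_keys = set()


def idempotency_key(destination, recipient, payload):
    """Derive a stable key from the fields that make two writes 'the same'."""
    canonical = json.dumps([destination, recipient, payload])
    return hashlib.sha256(canonical.encode()).hexdigest()


def should_send(destination, recipient, payload):
    """Return True the first time a write is seen, False for any repeat."""
    key = idempotency_key(destination, recipient, payload)
    if key in _seen_keys:
        return False
    _seen_keys.add(key)
    return True
```

&lt;p&gt;The check-then-add here is not atomic, so it would not survive the race conditions mentioned earlier on its own. With a shared store, the usual fix is a single atomic "insert if absent" operation.&lt;/p&gt;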

&lt;p&gt;Zehrava Gate is a write-path control plane. Before an agent executes a write, it submits an &lt;strong&gt;intent&lt;/strong&gt;. Gate evaluates &lt;strong&gt;policy&lt;/strong&gt;, optionally holds for &lt;strong&gt;human approval&lt;/strong&gt;, and issues a signed &lt;strong&gt;execution order&lt;/strong&gt;. Every decision is logged.&lt;/p&gt;
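&lt;p&gt;The post does not show how an execution order is signed, so the following is only a generic sketch of the idea using an HMAC and an expiry time. The key, the field names, and the 15-minute lifetime are assumptions, not Gate's actual token format.&lt;/p&gt;

```python
import hashlib
import hmac
import json
import time

# Hypothetical signing key; a real deployment would load this from a secret store.
SECRET = b"demo-signing-key"
TTL_SECONDS = 15 * 60  # assumed lifetime for an execution order


def issue_order(intent_id, now=None):
    """Return a signed execution order for an approved intent."""
    issued_at = int(time.time() if now is None else now)
    body = {"intent_id": intent_id, "expires_at": issued_at + TTL_SECONDS}
    message = json.dumps(body, sort_keys=True).encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_order(order, now=None):
    """Accept the order only if the signature matches and it has not expired."""
    current = time.time() if now is None else now
    message = json.dumps(order["body"], sort_keys=True).encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, order["signature"]):
        return False
    return order["body"]["expires_at"] > current
```

&lt;p&gt;A worker that receives an order would call &lt;code&gt;verify_order&lt;/code&gt; before executing the write. An edited intent or an expired order fails the check, so the write never runs.&lt;/p&gt;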

&lt;p&gt;The policies are YAML — deterministic, no LLM in the loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;support-reply&lt;/span&gt;
&lt;span class="na"&gt;destinations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;zendesk.reply&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;intercom.reply&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;block_if_terms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refund&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;guaranteed"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;full&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;refund"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;legal&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;action"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;auto_approve_under&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crm-import&lt;/span&gt;
&lt;span class="na"&gt;destinations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;salesforce.import&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;hubspot.contacts&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;auto_approve_under&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
&lt;span class="na"&gt;require_approval_over&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
&lt;span class="na"&gt;expiry_minutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;finance-high-risk&lt;/span&gt;
&lt;span class="na"&gt;destinations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;stripe.refund&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;quickbooks.journal&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="na"&gt;require_approval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
&lt;span class="na"&gt;expiry_minutes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
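
&lt;p&gt;To show what "deterministic" means in practice, here is a rough sketch of an evaluator for the three policies above. The semantics are assumed, not taken from Gate's source: destinations outside the policy are blocked, term matching is case-insensitive, and &lt;code&gt;auto_approve_under&lt;/code&gt; is left out because the post does not define its boundary behavior.&lt;/p&gt;

```python
# A subset of the policy fields above, written as plain dicts for illustration.
POLICIES = {
    "support-reply": {
        "destinations": ["zendesk.reply", "intercom.reply"],
        "block_if_terms": ["refund guaranteed", "full refund", "legal action"],
    },
    "crm-import": {
        "destinations": ["salesforce.import", "hubspot.contacts"],
        "require_approval_over": 100,
    },
    "finance-high-risk": {
        "destinations": ["stripe.refund", "quickbooks.journal"],
        "require_approval": "always",
    },
}


def evaluate(policy_id, destination, payload, record_count):
    """Return 'approved', 'pending_approval', or 'blocked' (assumed semantics)."""
    policy = POLICIES.get(policy_id)
    # Fail closed: unknown policies and unlisted destinations are blocked.
    if policy is None or destination not in policy["destinations"]:
        return "blocked"
    text = payload.lower()
    if any(term in text for term in policy.get("block_if_terms", [])):
        return "blocked"
    if policy.get("require_approval") == "always":
        return "pending_approval"
    limit = policy.get("require_approval_over")
    if limit is not None and record_count > limit:
        return "pending_approval"
    return "approved"
```

&lt;p&gt;There is no model call and no randomness anywhere in that function, so the same intent always produces the same decision.&lt;/p&gt;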


&lt;p&gt;The integration is a few lines:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Gate&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zehrava-gate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Gate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:4000&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gate_sk_...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// sendReply is an illustrative wrapper: await and return need a function scope in CommonJS&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendReply&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;propose&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Thank you — your issue is resolved.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zendesk.reply&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;support-reply&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;recordCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;blocked&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;blockReason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pending_approval&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="c1"&gt;// wait for human&lt;/span&gt;
  &lt;span class="c1"&gt;// approved — proceed&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;sendReply&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;zehrava_gate&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Gate&lt;/span&gt;

&lt;span class="n"&gt;gate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Gate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:4000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gate_sk_...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;propose&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Thank you — your issue is resolved.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;destination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zendesk.reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;support-reply&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;record_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;A human writes the policy when they're thinking clearly. Gate enforces it mechanically. Same input, same output, every time.&lt;/p&gt;
&lt;h2&gt;
  
  
  "What if the agent just skips the SDK?"
&lt;/h2&gt;

&lt;p&gt;That's the right question. The SDK is cooperative — it only works if the agent calls it. Fine for agents you build yourself. Not enough for agents you don't fully control.&lt;/p&gt;

&lt;p&gt;Gate V3 closes that gap with a proxy. It sits in the network path between the agent and the destination API. Two environment variables, no code changes:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;HTTP_PROXY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://gate.internal:4001
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;HTTPS_PROXY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://gate.internal:4001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Every outbound HTTP call routes through Gate. The destination host maps to a policy. Approved requests get forwarded. Blocked requests get a 403 with the reason. Pending requests get a 202, and the write is held until a human approves.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;── V2: cooperative ──────────────────────────────
Agent → SDK.propose() → Gate API → approved → Agent executes
                         ↑ optional — agent can skip

── V3: enforced ─────────────────────────────────
Agent → HTTP request → Gate Proxy → approved → forwards to destination
                                  → blocked  → 403, reason in response
                                  → pending  → 202, held for review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
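
&lt;p&gt;The enforced path can be sketched as a lookup from destination host to policy, followed by a mapping from decision to status code. The host table, the function names, and the use of 200 as a stand-in for "forwarded" are assumptions for illustration. The 403 and 202 codes come from the description above.&lt;/p&gt;

```python
from urllib.parse import urlsplit

# Hypothetical host-to-policy table; the real mapping format is not shown in the post.
HOST_POLICIES = {
    "api.stripe.com": "finance-high-risk",
    "api.hubapi.com": "crm-import",
}

# 403 and 202 follow the description above; 200 stands in for "forwarded".
STATUS_BY_DECISION = {"approved": 200, "blocked": 403, "pending": 202}


def proxy_response(url, decide):
    """Map an outbound URL to a policy, ask `decide` for a verdict, return (status, reason)."""
    host = urlsplit(url).hostname
    policy_id = HOST_POLICIES.get(host)
    if policy_id is None:
        # Fail closed when no policy covers the destination host.
        return 403, "no policy for host"
    decision = decide(policy_id)
    # Any decision the table does not know is treated as blocked.
    status = STATUS_BY_DECISION.get(decision, 403)
    return status, policy_id + ": " + decision
```

&lt;p&gt;Here &lt;code&gt;decide&lt;/code&gt; is whatever policy evaluation you plug in. The point of the sketch is that an unmapped host or an unknown decision falls through to 403 instead of being forwarded.&lt;/p&gt;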


&lt;p&gt;In vault mode, the agent never even sees production credentials. Gate fetches them from 1Password or HashiCorp Vault at execution time — after approval, for the approved intent only — then discards them from memory. A compromised agent has nothing to exfiltrate.&lt;/p&gt;

&lt;p&gt;V2 gives you guardrails. V3 gives you a wall.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why YAML and not another LLM?
&lt;/h2&gt;

&lt;p&gt;The obvious design for a safety layer would be another AI evaluating the first AI's output. But that introduces the same unpredictability you're trying to remove. An LLM deciding "should this agent be allowed to send this email?" will occasionally say yes when it shouldn't. That's the whole problem.&lt;/p&gt;

&lt;p&gt;A YAML policy has none of that. No prompt injection. No hallucination. No "the safety model was feeling generous today."&lt;/p&gt;

&lt;p&gt;YAML is boring. That's the feature.&lt;/p&gt;
&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;MIT licensed. Self-hostable.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;zehrava-gate
npx zehrava-gate &lt;span class="nt"&gt;--port&lt;/span&gt; 4000 &lt;span class="nt"&gt;--policy-dir&lt;/span&gt; ./policies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;zehrava-gate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/cgallic" rel="noopener noreferrer"&gt;
        cgallic
      &lt;/a&gt; / &lt;a href="https://github.com/cgallic/zehrava-gate" rel="noopener noreferrer"&gt;
        zehrava-gate
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      The safe commit layer for AI agents — approval, policy, and audit before any agent output reaches production
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;Zehrava Gate&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Write-path control plane for AI agents.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://zehrava.com" rel="nofollow noopener noreferrer"&gt;zehrava.com&lt;/a&gt; · &lt;a href="https://www.npmjs.com/package/zehrava-gate" rel="nofollow noopener noreferrer"&gt;npm&lt;/a&gt; · &lt;a href="https://pypi.org/project/zehrava-gate/" rel="nofollow noopener noreferrer"&gt;PyPI&lt;/a&gt; · &lt;a href="https://zehrava.com/demo" rel="nofollow noopener noreferrer"&gt;Live demo&lt;/a&gt; · &lt;a href="https://zehrava.com/docs" rel="nofollow noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/cgallic/zehrava-gate/./gate-demo.gif"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fcgallic%2Fzehrava-gate%2F.%2Fgate-demo.gif" alt="Zehrava Gate demo"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Agents can read systems freely. Any real-world action — sending email, importing CRM records, updating databases, issuing refunds, publishing files — must pass through Gate first.&lt;/p&gt;

&lt;p&gt;Agents submit an intent. Gate evaluates policy. Optionally requests human approval. Issues a signed execution order. Every step is deterministic, auditable, and fail-closed.&lt;/p&gt;

&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;&lt;pre class="notranslate"&gt;&lt;code&gt;intent submitted
  ↓
policy evaluated (YAML, deterministic — no LLM)
  ├── blocked              → terminal
  ├── duplicate_blocked    → terminal (idempotency key matched)
  ├── approved             → auto-approved; eligible for execution
  └── pending_approval     → human review required
        ├── approved        → eligible for execution
        ├── rejected        → terminal
        └── expired         → terminal
approved
  ↓
execution order issued (gex_ token, 15min TTL)
  ↓
worker executes in your VPC
  ↓
outcome reported
  ├── execution_succeeded  → terminal
  └── execution_failed     → terminal
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/cgallic/zehrava-gate" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's the worst write an AI agent has made in your system?&lt;/strong&gt; Not the dramatic database deletions — the quiet ones. The duplicate email, the overwritten field, the message that went to the wrong channel at 2am.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>showdev</category>
      <category>discuss</category>
    </item>
  </channel>
</rss>
