<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Patric</title>
    <description>The latest articles on DEV Community by Patric (@mypatric69).</description>
    <link>https://dev.to/mypatric69</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3894207%2Ff45c18bf-9632-49c9-abde-27f237fbdacc.jpeg</url>
      <title>DEV Community: Patric</title>
      <link>https://dev.to/mypatric69</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mypatric69"/>
    <language>en</language>
    <item>
      <title>AgentGuard: The Foundation Missing from Agentic AI Systems</title>
      <dc:creator>Patric</dc:creator>
      <pubDate>Thu, 11 Jun 2026 11:58:36 +0000</pubDate>
      <link>https://dev.to/mypatric69/agentguard-the-foundation-missing-from-agentic-ai-systems-1lpk</link>
      <guid>https://dev.to/mypatric69/agentguard-the-foundation-missing-from-agentic-ai-systems-1lpk</guid>
      <description>&lt;p&gt;This article is a follow-up to &lt;a href="https://dev.to/mypatric69/the-blind-spot-of-agentic-ai-systems-when-nobody-notices-the-agent-is-stuck-1hkb"&gt;"The Blind Spot of Agentic AI Systems"&lt;/a&gt;. If you haven't read it yet, it explains why this tool exists in the first place.&lt;/p&gt;

&lt;p&gt;A foundation doesn't define what is built. It defines what can be built.&lt;/p&gt;

&lt;p&gt;The quality of the material, the depth of the anchoring, the density of the structure: these aren't afterthoughts. They are the decisions that take precedence over everything else. Cutting corners here means cutting corners in the wrong place, and the damage stays invisible until it's too late.&lt;br&gt;
Agentic AI systems have arrived in 2026. In codebases, in workflows, in production environments. And most are running without a foundation.&lt;/p&gt;

&lt;p&gt;Not because the technology doesn't allow it, but because the questions that constitute a foundation (who bears responsibility, what is permitted, how problems are escalated, how the agent is stopped) are treated as secondary. As a documentation task. As something that can be resolved later.&lt;/p&gt;

&lt;p&gt;It doesn't resolve itself. It fails silently.&lt;/p&gt;

&lt;p&gt;88% of all agentic projects never reach production. 80% deliver no measurable business value. These aren't model problems. These are foundation problems.&lt;/p&gt;

&lt;p&gt;AgentGuard is an attempt to treat governance not as bureaucracy, but as what it is: the prerequisite for everything that comes after.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Trigger
&lt;/h2&gt;

&lt;p&gt;During the development of a cognitive AI companion, with Claude Code as the executor and architectural decisions in the loop, a pattern emerged: approaches were switched, decisions were revised, and external API documentation was researched only when explicitly requested. Not a catastrophic failure. A silent, inefficient, expensive failure.&lt;/p&gt;

&lt;p&gt;The first reaction was a prompt in the CLAUDE.md:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- ALWAYS fetch up-to-date documentation before diagnosis
- Confirm root cause first — then suggest a solution
- If a solution doesn't work after 2+ iterations:
  fundamentally different approach, don't keep patching
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That helped. But it didn't solve the actual problem.&lt;/p&gt;

&lt;p&gt;Because the agent didn't know it was stuck. And a prompt is not a foundation; it's a pillar without a base, erected on swampy ground.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Problem
&lt;/h2&gt;

&lt;p&gt;Agentic AI systems fail differently than classic software. Classic software fails loudly, with stack traces and red dashboards. An AI agent fails silently.&lt;/p&gt;

&lt;p&gt;It repeats the same failed approach without realizing it. It loses track of its own iteration history. And no one, not the agent, not the developer, notices in time.&lt;/p&gt;

&lt;p&gt;This is not a model problem. &lt;strong&gt;This is a system design problem.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The models have crossed the threshold where multi-step reasoning is possible. The systems around them have not.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Idea: Governance Before Launch
&lt;/h2&gt;

&lt;p&gt;The observability tools are good. LangSmith, Langfuse, and Arize all answer the same question: &lt;em&gt;"What did the agent do?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But they don't answer: &lt;em&gt;"Should the agent have been allowed to start in the first place?"&lt;/em&gt;&lt;br&gt;
This is exactly the gap I wanted to close.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Maximum instruction, minimum interpretation. It doesn't eliminate the probability of an error, it reduces the impact.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  What AgentGuard Is
&lt;/h2&gt;

&lt;p&gt;AgentGuard is a governance layer for agentic AI systems: not an observability tool, but the layer that runs before one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agentguard-governance
&lt;span class="nb"&gt;cd &lt;/span&gt;my-agent-project
agentguard check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Four Layers — How It Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1 — Pre-Flight Check
&lt;/h3&gt;

&lt;p&gt;Before the agent starts, AgentGuard checks whether governance prerequisites are met:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;╭─────────── AGENTGUARD — PRE-FLIGHT CHECK ────────────╮
│   🔴 CRITICAL   No agent owner defined               │
│   🔴 CRITICAL   No authorized scope defined          │
│   🔴 CRITICAL   No escalation path configured        │
│   🔴 CRITICAL   No killswitch defined                │
│                                                      │
│   RESULT: BLOCKED — 4 critical gaps                  │
│   agentguard init --guided                           │
╰──────────────────────────────────────────────────────╯
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four non-negotiable prerequisites, the "gas in the tank" principle:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;What is being checked&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OWNER&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Who is responsible?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SCOPE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What is the agent allowed to do, and what is it not?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ESCALATION&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Who is contacted if something goes wrong?&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KILLSWITCH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;How is the agent stopped?&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
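
&lt;p&gt;As a rough illustration of the four checks in the table, the sketch below scans a governance config for missing prerequisites. It is not AgentGuard's actual implementation: the dictionary layout, the key names, and the &lt;code&gt;preflight&lt;/code&gt; function are assumptions made for this example.&lt;/p&gt;

```python
# Hypothetical sketch: the key names mirror the four checks in the table,
# but the real governance.yaml layout may differ.
REQUIRED_KEYS = {
    "owner": "No agent owner defined",
    "scope": "No authorized scope defined",
    "escalation": "No escalation path configured",
    "killswitch": "No killswitch defined",
}


def preflight(config):
    """Return the list of critical gaps; an empty list means the agent may start."""
    gaps = []
    for key, message in REQUIRED_KEYS.items():
        # A key that is missing or empty counts as a gap.
        if not config.get(key):
            gaps.append(message)
    return gaps


if __name__ == "__main__":
    gaps = preflight({"owner": "patric", "scope": {"authorized": ["read ./src"]}})
    status = "BLOCKED" if gaps else "OK"
    print(f"RESULT: {status} ({len(gaps)} critical gaps)")
    for gap in gaps:
        print(" -", gap)
```

&lt;p&gt;The point of the design is that the check is binary per prerequisite: either the answer exists in the config or the agent does not start.&lt;/p&gt;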

&lt;h3&gt;
  
  
  Layer 2 — Enforcement via Claude Code Hooks
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;agentguard init --guided&lt;/code&gt; automatically generates &lt;code&gt;.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash|Write|Edit|MultiEdit|NotebookEdit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agentguard enforce"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every tool call that Claude Code attempts to execute first goes through &lt;code&gt;agentguard enforce&lt;/code&gt;. Exit 2 = blocked, Exit 0 = allowed. The enforcement decision is deterministic — pattern matching against &lt;code&gt;governance.yaml&lt;/code&gt;, no LLM call in the critical path.&lt;/p&gt;
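
&lt;p&gt;To make the exit-code contract concrete, here is a minimal sketch of a deterministic enforcement decision. It is not the code behind &lt;code&gt;agentguard enforce&lt;/code&gt;: the regex rules are invented for illustration, and the payload fields &lt;code&gt;tool_name&lt;/code&gt; and &lt;code&gt;tool_input&lt;/code&gt; are assumed here and should be checked against the Claude Code hook documentation.&lt;/p&gt;

```python
import json
import re

# Hypothetical rules; in the article these would be derived from governance.yaml.
PROHIBITED_PATTERNS = [
    r"git\s+push\b.*\bmain\b",   # pushing to the main branch
    r"\brm\s+-rf\s+/",           # recursive delete of an absolute path
]


def decide(event, patterns=PROHIBITED_PATTERNS):
    """Return 2 (blocked) if the tool input matches a prohibited pattern, else 0."""
    # Serialize the tool input so one regex pass covers commands and file paths.
    payload = json.dumps(event.get("tool_input", {}))
    for pattern in patterns:
        if re.search(pattern, payload):
            return 2
    return 0


if __name__ == "__main__":
    samples = [
        {"tool_name": "Bash", "tool_input": {"command": "git push origin main"}},
        {"tool_name": "Edit", "tool_input": {"file_path": "src/app.py"}},
    ]
    for event in samples:
        print(event["tool_name"], "exit code", decide(event))
```

&lt;p&gt;A real hook command would presumably read the event as JSON from standard input and pass the return value to &lt;code&gt;sys.exit&lt;/code&gt;. The demo above only prints the decision so that it can run without a hook environment.&lt;/p&gt;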

&lt;h3&gt;
  
  
  Layer 3 — Runtime Monitoring
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;agentguard watch&lt;/code&gt; reads the native Claude Code JSONL transcript and detects patterns such as repeated tool calls, stagnant outputs, and unusual token consumption: the classic signals of a stuck agent.&lt;/p&gt;
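
&lt;p&gt;One of these signals, repeated tool calls, can be sketched in a few lines. The record shape used below (a &lt;code&gt;tool&lt;/code&gt; key and an &lt;code&gt;input&lt;/code&gt; key per JSONL line) is a simplifying assumption and not the actual Claude Code transcript schema, and &lt;code&gt;repeated_tool_calls&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import json
from collections import Counter


def repeated_tool_calls(jsonl_lines, threshold=3):
    """Return tool calls that occur at least `threshold` times with identical input."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Hypothetical record shape: {"tool": "...", "input": {...}}.
        if "tool" not in record:
            continue
        # sort_keys makes the serialized input stable, so equal inputs compare equal.
        key = (record["tool"], json.dumps(record.get("input"), sort_keys=True))
        counts[key] += 1
    return [key for key, n in counts.items() if n >= threshold]


if __name__ == "__main__":
    transcript = [
        '{"tool": "Bash", "input": {"command": "pytest"}}',
        '{"tool": "Bash", "input": {"command": "pytest"}}',
        '{"tool": "Edit", "input": {"file": "app.py"}}',
        '{"tool": "Bash", "input": {"command": "pytest"}}',
    ]
    for tool, tool_input in repeated_tool_calls(transcript):
        print("possible loop:", tool, tool_input)
```

&lt;p&gt;Counting identical calls is the crudest possible signal. Stagnant outputs and token consumption would need their own heuristics.&lt;/p&gt;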

&lt;h3&gt;
  
  
  Layer 4 — Audit &amp;amp; Review
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;agentguard report&lt;/code&gt; generates a post-session governance report. &lt;code&gt;agentguard verify&lt;/code&gt; checks whether the governance pins are consistent. &lt;code&gt;agentguard review&lt;/code&gt; updates existing governance as the project evolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core: Guided Concretization
&lt;/h2&gt;

&lt;p&gt;The biggest problem with governance isn't the technology. It's the vagueness of human descriptions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"No critical changes" — what is critical?&lt;/li&gt;
&lt;li&gt;"Be critical" — critical about what exactly?&lt;/li&gt;
&lt;li&gt;"Don't get caught in loops" — at what point is it a loop?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AgentGuard solves this with Guided Concretization:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;agentguard init --guided&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Instead of formulating precise rules, you describe your intent in natural language. AgentGuard translates it, using a configurable AI model at &lt;code&gt;temperature=0&lt;/code&gt; for maximum consistency. The default is &lt;code&gt;claude-sonnet&lt;/code&gt;, but every user can freely choose the model, including Claude Fable 5 for maximum quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Input:
&lt;/h3&gt;

&lt;p&gt;"implement features, be creative, avoid loops, determine with owner before critical decisions"&lt;/p&gt;

&lt;h3&gt;
  
  
  Output (automatically concretized):
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;authorized&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;source&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;./src&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;subdirectories"&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;needs&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;codebase&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;understanding&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;before&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;changes"&lt;/span&gt;
      &lt;span class="na"&gt;added&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-06-09"&lt;/span&gt;
  &lt;span class="na"&gt;prohibited&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Push&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;branch&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;without&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;approval"&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Critical&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;branches&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;require&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;human&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;review"&lt;/span&gt;
      &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HARD_LIMIT"&lt;/span&gt;
      &lt;span class="na"&gt;added&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-06-09"&lt;/span&gt;
  &lt;span class="na"&gt;requires_confirmation&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;external&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;dependencies"&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dependencies&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;affect&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;security&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;maintenance&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;burden"&lt;/span&gt;
      &lt;span class="na"&gt;added&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026-06-09"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each rule specifies not only what is allowed or prohibited, but also why: the institutional memory that remains understandable six months later.&lt;/p&gt;
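
&lt;p&gt;Because every rule carries the same fields, its structure can be checked without any model call. The sketch below shows one way such a deterministic check might look. The field list is taken from the YAML example above, while the function name and the rule that every prohibited entry needs a &lt;code&gt;severity&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;

```python
REQUIRED_FIELDS = ("action", "reason", "added")
SECTIONS = ("authorized", "prohibited", "requires_confirmation")


def validate_scope(scope):
    """Return a list of human-readable problems; empty means the structure is valid."""
    problems = []
    for section in SECTIONS:
        for index, rule in enumerate(scope.get(section, [])):
            for field in REQUIRED_FIELDS:
                if not rule.get(field):
                    problems.append(f"{section}[{index}]: missing '{field}'")
            # Assumption for this sketch: prohibited rules must carry a severity.
            if section == "prohibited" and not rule.get("severity"):
                problems.append(f"{section}[{index}]: missing 'severity'")
    return problems


if __name__ == "__main__":
    scope = {
        "authorized": [{"action": "Read source files in ./src", "added": "2026-06-09"}],
        "prohibited": [{
            "action": "Push to main branch without approval",
            "reason": "Critical branches require human review",
            "severity": "HARD_LIMIT",
            "added": "2026-06-09",
        }],
    }
    for problem in validate_scope(scope):
        print(problem)
```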

&lt;h2&gt;
  
  
  Consistency Is No Accident
&lt;/h2&gt;

&lt;p&gt;LLMs are probabilistic. Governance doesn't have to be.&lt;br&gt;
AgentGuard uses three mechanisms for maximum consistency:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Temperature=0 - same input, same output&lt;/li&gt;
&lt;li&gt;Prompt-Pinning - SHA-256 hashes of prompt and output stored in &lt;code&gt;governance.yaml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Validation Layer - deterministic structure check after every concretization
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentguard verify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;│ mission     │ ✓ ok │ claude-sonnet-4-20250514 · 2026-06-09 │
│ hard_limits │ ✓ ok │ claude-sonnet-4-20250514 · 2026-06-09 │

✅ All pins verified — governance is reproducible
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
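
&lt;p&gt;The pinning idea itself is simple enough to sketch with the standard library. The example below hashes a prompt and its concretized output with SHA-256 and later compares them against the stored values. The function names and the dictionary keys are hypothetical, and how AgentGuard actually stores its pins in &lt;code&gt;governance.yaml&lt;/code&gt; may differ.&lt;/p&gt;

```python
import hashlib


def pin(prompt, output):
    """Return SHA-256 hex digests for a prompt and its concretized output."""
    return {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }


def verify(prompt, output, stored_pin):
    """True only if both current hashes match the stored pin."""
    return pin(prompt, output) == stored_pin


if __name__ == "__main__":
    stored = pin("avoid loops", "max_retries: 2")
    print("unchanged:", verify("avoid loops", "max_retries: 2", stored))
    print("edited output:", verify("avoid loops", "max_retries: 5", stored))
```

&lt;p&gt;A pin does not make the model deterministic. It only makes drift detectable: if either the prompt or the generated rules change, the hashes no longer match.&lt;/p&gt;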


&lt;p&gt;To be honest: roughly 85-90% consistency is the current maximum, and with more powerful models like Claude Fable 5, that figure could improve further.&lt;/p&gt;
&lt;h2&gt;
  
  
  The MindTrace Showcase: Before vs. After
&lt;/h2&gt;

&lt;p&gt;The project that sparked AgentGuard - MindTrace, a cognitive AI companion - was the first real test project.&lt;/p&gt;
&lt;h3&gt;
  
  
  Before (without AgentGuard):
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RESULT: BLOCKED — 6 critical gaps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  After (after &lt;code&gt;agentguard init --guided&lt;/code&gt;):
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RESULT: WARNINGS — 1 item to review
AI Scope Review: Score 8/10 — STRONG
agentguard verify: All pins verified
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;From completely unregulated to a professional governance structure: if you've prepared the right answers, the dialog itself takes only a few minutes. The groundwork (understanding what the agent should do, what it must not do, and who is responsible) lies with the owner. AgentGuard translates these decisions into enforceable rules.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Web UI
&lt;/h2&gt;

&lt;p&gt;For teams that prefer a visual interface:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"agentguard-governance[web]"&lt;/span&gt;
agentguard web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Opens &lt;code&gt;http://localhost:8767&lt;/code&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pre-Flight Check&lt;/strong&gt; with Governance Score Ring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance View&lt;/strong&gt; with color-coded scope sections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser Terminal&lt;/strong&gt; — all commands including &lt;code&gt;init --guided&lt;/code&gt; run directly in the browser&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-project support:&lt;/strong&gt; &lt;code&gt;agentguard web --path ./proj1 --path ./proj2&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What AgentGuard Cannot Do
&lt;/h3&gt;

&lt;p&gt;Honesty is part of the design:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No guarantee of model behavior:&lt;/strong&gt; AgentGuard enforces at the tool execution level. It does not prevent Claude from reasoning toward a blocked action, only from executing it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a substitute for security practices:&lt;/strong&gt; for production systems, combine AgentGuard with OS-level sandboxing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not an observability tool:&lt;/strong&gt; that's what LangSmith and Langfuse are for.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Install &amp;amp; Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Installation&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;agentguard-governance

&lt;span class="c"&gt;# Governance Setup (recommended)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;my-project
agentguard init &lt;span class="nt"&gt;--guided&lt;/span&gt;

&lt;span class="c"&gt;# Pre-Flight Check&lt;/span&gt;
agentguard check &lt;span class="nt"&gt;--ai-review&lt;/span&gt;

&lt;span class="c"&gt;# Web UI (optional)&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"agentguard-governance[web]"&lt;/span&gt;
agentguard web
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/MyPatric69/agentguard" rel="noopener noreferrer"&gt;github.com/MyPatric69/agentguard&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/agentguard-governance/" rel="noopener noreferrer"&gt;pypi.org/project/agentguard-governance&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;💡 Tip: Claude Fable 5 has been available since June 9, 2026 — Anthropic's first public Mythos-class model. Free on Pro/Max/Team plans until June 22. For maximum concretization quality:&lt;br&gt;
set &lt;code&gt;AGENTGUARD_MISSION_MODEL=claude-fable-5&lt;/code&gt; in .env.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;AgentGuard solves a technical problem. But behind it lies an organizational one.&lt;/p&gt;

&lt;p&gt;Most companies using agentic systems today have no defined owner, no scope, no escalation path, because no one considers this topic a priority. It generates no immediate revenue. It's not the most attractive topic in sprint planning. Its absence is noticeable, but always too late.&lt;/p&gt;

&lt;p&gt;AgentGuard makes the implicit explicit. It forces you to clarify things that previously remained vague. And it documents these decisions, for today, for six months from now, for the successor who takes over the project.&lt;/p&gt;

&lt;p&gt;The problem is known. The solution is available. The owner is missing.&lt;/p&gt;

&lt;p&gt;AgentGuard is the first step toward changing that.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What experiences have you had with governance in agentic systems? Have you observed similar problems or found approaches that work?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Links:
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/MyPatric69/agentguard" rel="noopener noreferrer"&gt;github.com/MyPatric69/agentguard&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/agentguard-governance/" rel="noopener noreferrer"&gt;pypi.org/project/agentguard-governance&lt;/a&gt;&lt;br&gt;
First article: &lt;a href="https://dev.to/mypatric69/the-blind-spot-of-agentic-ai-systems-when-nobody-notices-the-agent-is-stuck-1hkb"&gt;The Blind Spot of Agentic AI Systems&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agenticai</category>
      <category>governance</category>
      <category>claudeai</category>
    </item>
    <item>
      <title>The Blind Spot of Agentic AI Systems — When Nobody Notices the Agent Is Stuck</title>
      <dc:creator>Patric</dc:creator>
      <pubDate>Tue, 02 Jun 2026 19:57:09 +0000</pubDate>
      <link>https://dev.to/mypatric69/the-blind-spot-of-agentic-ai-systems-when-nobody-notices-the-agent-is-stuck-1hkb</link>
      <guid>https://dev.to/mypatric69/the-blind-spot-of-agentic-ai-systems-when-nobody-notices-the-agent-is-stuck-1hkb</guid>
      <description>&lt;p&gt;Agentic AI systems fail silently. They don't recognize when they're stuck in a loop, when an approach is fundamentally wrong, or when external input is needed. This is a practitioner's analysis of a documented, largely ignored problem with data, real incidents, and three minimal steps to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  An Agent Spinning and Nobody Stops It
&lt;/h2&gt;

&lt;p&gt;A typical scenario from agentic development in practice: An AI agent cycles through solution approaches, endorses them, revises them and only looks up external API documentation when explicitly asked. Not proactively. Not on its own initiative.&lt;br&gt;
No catastrophic failure. A silent, inefficient, expensive one. Tokens are consumed. Time is lost. And the critical part: without active human intervention, the agent just keeps going.&lt;br&gt;
Anyone running agentic systems in production knows this pattern. Few talk about it.&lt;br&gt;
This observation led me to a thesis and, after extensive research, to a certainty.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Thesis: A Fundamental, Largely Ignored Problem
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Agentic AI systems don't recognize when they're stuck in a loop, when an approach is fundamentally wrong, or when external input is needed. This wastes time, money, and quality, and most users never notice.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is no longer a hypothesis. It is documented reality.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the Data Says — State of Play in 2026
&lt;/h2&gt;

&lt;p&gt;The numbers are unambiguous:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;88% of all AI agents never reach production.&lt;/strong&gt; Those that survive deliver an average &lt;strong&gt;171%&lt;/strong&gt; ROI — but the path there is lined with failed projects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;80% of AI projects deliver no measurable business value.&lt;/strong&gt; Per RAND Corporation — analyzed across 2,400+ enterprise initiatives. This number has barely moved in three years.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$547 billion&lt;/strong&gt; of the &lt;strong&gt;$684 billion&lt;/strong&gt; invested in AI in 2025 produced no measurable outcomes. Not modest results. None.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gartner, February 2026:&lt;/strong&gt; Over &lt;strong&gt;40%&lt;/strong&gt; of agentic AI projects will be canceled by end of 2027 — due to escalating costs, unclear ROI, or insufficient risk controls.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Success rates broken down by project scope tell a particularly clear story:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
&lt;th&gt;Project Type&lt;/th&gt;
&lt;th&gt;Success Rate&lt;/th&gt;
&lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
&lt;td&gt;Single-task agent, narrow scope&lt;/td&gt;
&lt;td&gt;54%&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Narrow process automation&lt;/td&gt;
&lt;td&gt;53%&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;Enterprise knowledge base / RAG&lt;/td&gt;
&lt;td&gt;44%&lt;/td&gt;
&lt;/tr&gt;
    &lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Large-scale AI transformation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Eight percent. For every twelve large-scale AI transformation attempts started, one delivers.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Agents Fail Differently Than Classical Software
&lt;/h2&gt;

&lt;p&gt;Classical software fails loudly with stack traces, HTTP 500 errors, red dashboards. An AI agent fails silently.&lt;br&gt;
Latitude documents six agent-specific failure modes that don't exist in classical software:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tool Misuse&lt;/strong&gt; — a wrong argument in step 2 corrupts every subsequent step&lt;br&gt;
&lt;strong&gt;2. Context Loss&lt;/strong&gt; — the agent loses track of its own progress&lt;br&gt;
&lt;strong&gt;3. Goal Drift&lt;/strong&gt; — the original objective shifts imperceptibly across many steps&lt;br&gt;
&lt;strong&gt;4. Retry Loops&lt;/strong&gt; — the agent repeats the same failed approach without recognizing it&lt;br&gt;
&lt;strong&gt;5. Cascading Errors in multi-agent systems&lt;/strong&gt; — errors propagate downstream&lt;br&gt;
&lt;strong&gt;6. Silent Quality Degradation&lt;/strong&gt; — outputs look correct but aren't&lt;/p&gt;

&lt;p&gt;IBM Research quantified this directly: A materials science workflow consumed 20 million tokens and failed. The same workflow with correct memory management: 1,234 tokens. Successful.&lt;/p&gt;
&lt;h2&gt;
  
  
  Real Incidents — Not Theory
&lt;/h2&gt;

&lt;p&gt;These are documented production incidents from 2025:&lt;br&gt;
&lt;strong&gt;Replit, July 2025:&lt;/strong&gt; An autonomous coding agent executed a DROP DATABASE command during an explicitly ordered code freeze. It destroyed the production system — then generated 4,000 fake user accounts and falsified system logs to cover it up. Its explanation: "I panicked instead of thinking."&lt;br&gt;
&lt;strong&gt;OpenAI Operator:&lt;/strong&gt; An agent was tasked with finding and buying "cheap eggs." Instead, it made an unauthorized $31 purchase on Instacart — bypassing the user-confirmation safeguards that had been implemented.&lt;br&gt;
&lt;strong&gt;NYC Government Chatbot, 2024:&lt;/strong&gt; A publicly deployed business-assistance chatbot gave systematically illegal advice. Ten journalists asked the same question — ten different, wrong answers.&lt;br&gt;
The pattern is consistent: agents evaluated internally as "reasonably capable" exhibited unreliable behavior in production — with real, costly consequences.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Real Problem: Not a Model Problem — a System Design Problem
&lt;/h2&gt;

&lt;p&gt;This is the most important shift from 2025 to 2026, and it's still underreported:&lt;br&gt;
The models have crossed the threshold. The system design hasn't.&lt;br&gt;
As one April 2026 analysis puts it: the underlying models have crossed a threshold where multi-step reasoning and tool use are genuinely possible — but the way we build systems around them has not kept pace.&lt;br&gt;
Academic research is even more direct. The MUSE Framework (arXiv 2024) argues that metacognition — self-assessment and strategy selection — is the critically missing component in current agents. An ICML 2025 position paper shows that existing self-improving agents rely almost exclusively on extrinsic metacognitive mechanisms — fixed, human-designed loops — which fundamentally limit scalability.&lt;br&gt;
Put simply: The agent doesn't know what it doesn't know. And the harness doesn't notice.&lt;/p&gt;
&lt;h2&gt;
  
  
  What a CLAUDE.md Prompt Can Do — and Where It Ends
&lt;/h2&gt;

&lt;p&gt;As a practical response to this problem, I added the following directive to my ~/.claude/CLAUDE.md:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Working Approach — External Services &amp;amp; Diagnosis

**For external APIs/services:**
- ALWAYS fetch current documentation before diagnosis — never rely on memory
- Confirm root cause first — then propose a solution
- If a solution fails after 2+ iterations:
  propose a fundamentally different approach, don't keep patching

**For architectural decisions:**
- Explicitly name all dependent systems
- State trade-offs before making a recommendation — not only when asked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works. For what it can do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The structural limit:&lt;/strong&gt; This prompt is reactively solid — it gives the agent rules when it finds itself in certain situations. But it doesn't solve the core problem: the agent doesn't reliably recognize that it's in exactly one of those situations. In a long context with many tool calls, it loses track of its own iteration history.&lt;br&gt;
The prompt relies on the agent observing itself — and that is the unresolved assumption.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Robust Agentic Systems Actually Need
&lt;/h2&gt;

&lt;p&gt;Three layers — none of them rocket science, but all three must work together:&lt;/p&gt;
&lt;h3&gt;
  
  
  Layer 1: Harness-Level Loop Detection
&lt;/h3&gt;

&lt;p&gt;Detection must not live in the prompt — it must happen in the harness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;same_error_pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;inject_to_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    LOOP_WARNING: Same error for the 2nd time.
    Mandatory: Stop. Identify root cause.
    Propose a fundamentally different approach.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trigger comes from the system — not from the model itself.&lt;/p&gt;
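
&lt;p&gt;The snippet above leaves &lt;code&gt;attempt_count&lt;/code&gt; and &lt;code&gt;same_error_pattern&lt;/code&gt; undefined. A minimal sketch of how a harness might track them could look like this. The class name &lt;code&gt;LoopDetector&lt;/code&gt;, the helper &lt;code&gt;error_signature&lt;/code&gt;, and the digit-masking heuristic are assumptions for illustration, not part of any real harness API.&lt;/p&gt;

```python
import re
from collections import Counter


def error_signature(message):
    """Normalize an error message so repeated failures compare equal."""
    # Mask digits (line numbers, ports, durations) so they don't hide repeats.
    return re.sub(r"\d+", "N", message.strip().lower())


class LoopDetector:
    """Hypothetical harness component: counts failures per error signature."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def record_failure(self, message):
        """Return a warning to inject once an error repeats, else None."""
        signature = error_signature(message)
        self.seen[signature] += 1
        if self.seen[signature] >= self.max_repeats:
            return (
                f"LOOP_WARNING: Same error seen {self.seen[signature]} times. "
                "Mandatory: Stop. Identify root cause. "
                "Propose a fundamentally different approach."
            )
        return None


detector = LoopDetector()
first = detector.record_failure("Timeout after 30s on port 8080")
second = detector.record_failure("Timeout after 45s on port 8081")
print(first)   # nothing to inject after a single failure
print(second)  # warning text, because both messages share one signature
```

&lt;p&gt;The counting happens entirely outside the model, so it keeps working even when the agent has lost track of its own history.&lt;/p&gt;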

&lt;h3&gt;
  
  
  Layer 2: Forced Checkpoints
&lt;/h3&gt;

&lt;p&gt;After N tool calls, automatically enforce a self-assessment: "Are you closer to the goal than you were 5 steps ago? If not: escalate."&lt;/p&gt;
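
&lt;p&gt;As a rough sketch, a forced checkpoint can be a counter in the harness that returns a prompt every N tool calls. The class &lt;code&gt;CheckpointScheduler&lt;/code&gt; and its method names are hypothetical. How the returned prompt is injected into the context depends on the harness in use.&lt;/p&gt;

```python
class CheckpointScheduler:
    """Hypothetical helper: emits a self-assessment prompt every `interval` tool calls."""

    def __init__(self, interval=5):
        self.interval = interval
        self.tool_calls = 0

    def on_tool_call(self):
        """Call once per tool call; returns a prompt when a checkpoint is due."""
        self.tool_calls += 1
        if self.tool_calls % self.interval == 0:
            return (
                f"CHECKPOINT after {self.tool_calls} tool calls: "
                f"Are you closer to the goal than you were {self.interval} "
                "steps ago? If not: escalate."
            )
        return None


scheduler = CheckpointScheduler(interval=5)
prompts = [scheduler.on_tool_call() for _ in range(10)]
due = [index + 1 for index, prompt in enumerate(prompts) if prompt is not None]
print(due)  # [5, 10]
```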

&lt;h3&gt;
  
  
  Layer 3: Immutable Action Log
&lt;/h3&gt;

&lt;p&gt;Every agent action is logged — not for debugging, but as a governance instrument. Who authorized what? What did the agent decide independently? This is the foundation for everything that follows.&lt;/p&gt;
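
&lt;p&gt;One common way to approximate immutability in software is a hash chain, where each entry commits to the one before it. The sketch below is tamper-evident rather than truly immutable, and the field names (&lt;code&gt;actor&lt;/code&gt;, &lt;code&gt;action&lt;/code&gt;, &lt;code&gt;authorized_by&lt;/code&gt;) are assumptions chosen to mirror the two questions above.&lt;/p&gt;

```python
import hashlib
import json


class ActionLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself, in a stable key order.
        payload = {key: value for key, value in entry.items() if key != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, actor, action, authorized_by):
        previous_hash = self.entries[-1]["hash"] if self.entries else "genesis"
        entry = {
            "actor": actor,
            "action": action,
            "authorized_by": authorized_by,  # a person, or a policy for autonomous steps
            "previous_hash": previous_hash,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)

    def verify(self):
        """Return True if no entry was altered or reordered after the fact."""
        previous_hash = "genesis"
        for entry in self.entries:
            if entry["previous_hash"] != previous_hash:
                return False
            if entry["hash"] != self._digest(entry):
                return False
            previous_hash = entry["hash"]
        return True


log = ActionLog()
log.append("agent", "read config.yaml", authorized_by="policy:read-only")
log.append("agent", "restart service", authorized_by="alice")
intact_before = log.verify()

log.entries[0]["action"] = "delete database"  # simulate tampering
intact_after = log.verify()
print(intact_before, intact_after)  # True False
```

&lt;p&gt;In production the chain would still need to live somewhere the agent cannot write to; the hashing only makes later edits detectable.&lt;/p&gt;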

&lt;h2&gt;
  
  
  The Actual Unsolved Problem: Governance
&lt;/h2&gt;

&lt;p&gt;Technical solutions exist. The problem is something else.&lt;br&gt;
McKinsey's 2026 AI Trust Maturity Survey frames the paradigm shift clearly: organizations can no longer focus only on AI systems saying the wrong thing — they must contend with AI systems doing the wrong thing. Unintended actions, tool misuse, operating outside appropriate guardrails.&lt;br&gt;
Yale's Chief Executive Leadership Institute, after a cross-industry review, concludes: governance and regulation are moving significantly slower than deployment reality — even at companies building both simultaneously.&lt;br&gt;
And Anthropic researcher Chris Olah stated publicly on May 25, 2026: AI governance cannot remain solely in the hands of large tech companies.&lt;br&gt;
The governance problem in enterprise environments has three dimensions:&lt;br&gt;
&lt;strong&gt;1. No natural owner&lt;/strong&gt;&lt;br&gt;
Who is responsible when an agent gets stuck and generates costs? Not "the team." Not "the department." A named individual — with defined escalation paths.&lt;br&gt;
&lt;strong&gt;2. No attractive mandate&lt;/strong&gt;&lt;br&gt;
Governance generates no revenue. It's not a "sexy" project. It has no clear ROI until the first incident hits. That makes it a textbook victim of prioritization — not because it's unimportant, but because the incentive structure works against it.&lt;br&gt;
&lt;strong&gt;3. Expectation vs. reality&lt;/strong&gt;&lt;br&gt;
Upper management expects someone to handle it. They perceive that everything is running. They interpret silence as success. The reality is an agent running in a loop — and nobody has defined an owner. The gap between perception and reality is particularly dangerous with agentic systems, because agents fail silently by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Can Be Done Today — Three Minimal Steps
&lt;/h2&gt;

&lt;p&gt;No framework. No committees. Three concrete steps any team can take now:&lt;br&gt;
&lt;strong&gt;Step 1: Name one owner per agentic process&lt;/strong&gt;&lt;br&gt;
Not a team. Not a department. One person who can answer: What is this agent authorized to do independently? When does it escalate? Who receives the escalation?&lt;br&gt;
&lt;strong&gt;Step 2: Three technical minimum requirements before go-live&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loop detection in the harness (not in the prompt)&lt;/li&gt;
&lt;li&gt;Immutable action log (every agent action traceable)&lt;/li&gt;
&lt;li&gt;Kill-switch with defined triggers&lt;/li&gt;
&lt;/ul&gt;
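
&lt;p&gt;"Defined triggers" can be as simple as a handful of hard limits checked by the harness after every step. The sketch below is one possible shape. The three limits and their default values are purely hypothetical and would need to be set per process by its owner.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class KillSwitch:
    """Hypothetical kill-switch: stops a run when any predefined limit is exceeded."""

    max_tool_calls: int = 50     # illustrative default, not a recommendation
    max_cost_usd: float = 5.0    # illustrative default, not a recommendation
    max_loop_warnings: int = 3   # illustrative default, not a recommendation

    def tripped_by(self, tool_calls, cost_usd, loop_warnings):
        """Return the names of all triggers that fired (empty list means keep going)."""
        fired = []
        if tool_calls > self.max_tool_calls:
            fired.append("max_tool_calls")
        if cost_usd > self.max_cost_usd:
            fired.append("max_cost_usd")
        if loop_warnings > self.max_loop_warnings:
            fired.append("max_loop_warnings")
        return fired


switch = KillSwitch()
healthy = switch.tripped_by(tool_calls=10, cost_usd=1.0, loop_warnings=0)
runaway = switch.tripped_by(tool_calls=51, cost_usd=6.0, loop_warnings=0)
print(healthy)  # []
print(runaway)  # ['max_tool_calls', 'max_cost_usd']
```

&lt;p&gt;Returning the names of the triggers, rather than a bare boolean, gives the owner from Step 1 something concrete to act on when the escalation arrives.&lt;/p&gt;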

&lt;p&gt;&lt;strong&gt;Step 3: Make one real incident visible&lt;/strong&gt;&lt;br&gt;
Don't argue in the abstract. Put a documented case — Replit, OpenAI Operator, the NYC chatbot — in front of management with the question: "Can we rule out that this happens to us?" That generates more governance readiness than any framework document.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: The Problem Is Known. The Solution Exists. The Owner Is Missing.
&lt;/h2&gt;

&lt;p&gt;That is the honest summary of where things stand in 2026.&lt;br&gt;
The research exists. The frameworks are there. The incidents are documented. What's missing is not knowledge — it's accountability at the right level, at the right time, with the right incentives.&lt;br&gt;
The companies that solve this won't be the ones with the best models. They'll be the ones that first understand that an agentic system is not a tool you switch on — but a digital actor that needs an owner, a defined scope, and an escalation path. Like any other employee.&lt;/p&gt;




&lt;p&gt;Head of DevOps, Office IT &amp;amp; AI Innovation — with a daily view into agentic systems in production. What are your experiences with governance in agentic systems? Do you have approaches that work — or are you hitting the same walls?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources &amp;amp; Further Reading&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partnership on AI: Prioritizing Real-Time Failure Detection in AI Agents (2025)&lt;/li&gt;
&lt;li&gt;Microsoft AI Red Team: Failure Modes in Agentic AI Systems (2025)&lt;/li&gt;
&lt;li&gt;arXiv 2411.13537: MUSE — Metacognition for Unknown Situations and Environments&lt;/li&gt;
&lt;li&gt;arXiv 2506.05109: Truly Self-Improving Agents Require Intrinsic Metacognitive Learning (ICML 2025)&lt;/li&gt;
&lt;li&gt;McKinsey: State of AI Trust 2026 — Shifting to the Agentic Era&lt;/li&gt;
&lt;li&gt;Latitude: Detecting AI Agent Failure Modes in Production (2026)&lt;/li&gt;
&lt;li&gt;Gartner: Over 40% of Agentic AI Projects Will Be Canceled by End of 2027 (June 2025)&lt;/li&gt;
&lt;li&gt;RAND Corporation: Analysis of 2,400+ Enterprise AI Initiatives&lt;/li&gt;
&lt;li&gt;Lee Hanchung: Hidden Technical Debt of Agent Harness (May 2026) — leehanchung.github.io&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agenticai</category>
      <category>governance</category>
      <category>claudeai</category>
    </item>
    <item>
      <title>TRACE Part 2: What Happens Before You Even Write a Single Line of Code</title>
      <dc:creator>Patric</dc:creator>
      <pubDate>Tue, 28 Apr 2026 19:47:18 +0000</pubDate>
      <link>https://dev.to/mypatric69/trace-part-2-what-happens-before-you-even-write-a-single-line-of-code-1opi</link>
      <guid>https://dev.to/mypatric69/trace-part-2-what-happens-before-you-even-write-a-single-line-of-code-1opi</guid>
      <description>&lt;h1&gt;
  
  
  TRACE Part 2: What Happens Before You Even Write a Single Line of Code
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Continued from: &lt;a href="https://dev.to/mypatric69/i-built-a-token-cost-tracker-for-claude-code-and-it-changed-the-way-i-think-about-ai-development-111i"&gt;I built a token cost tracker for Claude Code – and it changed the way I think about AI development&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  An innocent start
&lt;/h2&gt;

&lt;p&gt;Imagine you open a new project, launch Claude Code, and type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Hey, I'm just starting a new project. I want to set up a REST API using Express and TypeScript. Can you help me with that?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Harmless. A normal start. You expect a short answer, maybe a directory structure, a few commands.&lt;br&gt;
What you don’t see: Before Claude even says a word, the model has already consumed thousands of tokens. Not from your question. Not from the answer. But simply from the initial startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers no one talks about
&lt;/h2&gt;

&lt;p&gt;I ran the same initial prompt on two different models and compared the results.&lt;br&gt;
&lt;strong&gt;Claude Sonnet 4.6:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System Prompt&lt;/td&gt;
&lt;td&gt;6,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System Tools&lt;/td&gt;
&lt;td&gt;8,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Tools&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Files&lt;/td&gt;
&lt;td&gt;905&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;721&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messages&lt;/td&gt;
&lt;td&gt;2,300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~18,100&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.7:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System Prompt&lt;/td&gt;
&lt;td&gt;8,400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System Tools&lt;/td&gt;
&lt;td&gt;11,500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP Tools&lt;/td&gt;
&lt;td&gt;251&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Files&lt;/td&gt;
&lt;td&gt;1,200&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skills&lt;/td&gt;
&lt;td&gt;721&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Messages&lt;/td&gt;
&lt;td&gt;3,600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~25,700&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;42% more tokens&lt;/strong&gt; – for the exact same prompt, on a fresh project without a single line of code.&lt;br&gt;
This is not a mistake. This is the new tokenizer in Opus 4.7.&lt;/p&gt;
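
&lt;p&gt;For anyone who wants to check the arithmetic, here is a small sketch that recomputes the increase from the two tables. The per-category values are copied from the tables. The helper &lt;code&gt;percent_increase&lt;/code&gt; is just an illustrative name. Note that the individual rows add up to slightly different totals than the rounded figures shown, so the result lands at roughly 41 to 42 percent depending on which totals are used.&lt;/p&gt;

```python
# Token counts per category, copied from the two tables above.
sonnet_4_6 = {
    "System Prompt": 6100, "System Tools": 8100, "MCP Tools": 141,
    "Memory Files": 905, "Skills": 721, "Messages": 2300,
}
opus_4_7 = {
    "System Prompt": 8400, "System Tools": 11500, "MCP Tools": 251,
    "Memory Files": 1200, "Skills": 721, "Messages": 3600,
}


def percent_increase(baseline, other):
    """Relative increase of `other` over `baseline`, in percent."""
    return (other - baseline) / baseline * 100


# Using the rounded totals reported in the tables (~18,100 and ~25,700).
reported = percent_increase(18_100, 25_700)

# Using the sum of the individual rows instead.
summed = percent_increase(sum(sonnet_4_6.values()), sum(opus_4_7.values()))

print(f"reported totals: +{reported:.1f}%")
print(f"summed rows:     +{summed:.1f}%")

# Per-category view: which parts of the startup load grew the most?
for category in sonnet_4_6:
    change = percent_increase(sonnet_4_6[category], opus_4_7[category])
    print(f"{category:14s} +{change:.0f}%")
```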
&lt;h2&gt;
  
  
  What a Tokenizer Difference Means in Practice
&lt;/h2&gt;

&lt;p&gt;Anthropic has introduced a new tokenizer with Claude Opus 4.7. The same text is split into more token units than with older models. That sounds technically abstract—but the effects are concrete:&lt;br&gt;
&lt;strong&gt;Every single turn in a session carries this base load.&lt;/strong&gt;&lt;br&gt;
For Turn 1, it’s 18k vs. 25k tokens. By Turn 50, the entire conversation history up to that point is loaded as input, and the tokenizer difference applies again to everything that is re-sent on each subsequent turn.&lt;br&gt;
A 100-turn session on Opus 4.7 costs more not just because Opus is more expensive per token. It costs more because it structurally generates more tokens—even if you do the same thing as on Sonnet 4.6.&lt;/p&gt;
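
&lt;p&gt;A back-of-the-envelope model makes the effect visible. The sketch below assumes every turn re-sends the fixed startup load plus the full history so far. The base loads are the Turn 1 totals from the tables above. The per-turn growth of 1,000 tokens, and the assumption that the same 42% factor applies to it, are hypothetical. Prompt caching is ignored, so this is about token volume, not the exact bill.&lt;/p&gt;

```python
def cumulative_input_tokens(turns, base_tokens, tokens_per_turn):
    """Total input tokens over a session where every turn re-sends the history."""
    total = 0
    history = 0
    for _ in range(turns):
        total += base_tokens + history  # fixed startup load plus everything so far
        history += tokens_per_turn      # this turn's messages join the history
    return total


BASE_SONNET = 18_100    # Turn 1 total from the Sonnet 4.6 table
BASE_OPUS = 25_700      # Turn 1 total from the Opus 4.7 table
PER_TURN_SONNET = 1_000                          # hypothetical growth per turn
PER_TURN_OPUS = round(PER_TURN_SONNET * 1.42)    # assumes the same 42% factor

gaps = {}
for turns in (1, 50, 100):
    sonnet = cumulative_input_tokens(turns, BASE_SONNET, PER_TURN_SONNET)
    opus = cumulative_input_tokens(turns, BASE_OPUS, PER_TURN_OPUS)
    gaps[turns] = opus - sonnet
    print(f"{turns:3d} turns: Sonnet {sonnet:,} vs Opus {opus:,} (gap {gaps[turns]:,})")
```

&lt;p&gt;Under these assumptions the relative difference stays near 42%, but the absolute gap in tokens grows much faster than the number of turns, because the history term grows quadratically.&lt;/p&gt;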
&lt;h2&gt;
  
  
  Why /context and /cost alone aren’t enough
&lt;/h2&gt;

&lt;p&gt;Claude Code has built-in visibility tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/context&lt;/code&gt; shows the current context window usage by category&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cost&lt;/code&gt; shows the current session’s consumption so far&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both are useful. But both are &lt;strong&gt;pull mechanisms&lt;/strong&gt;: you have to actively remember to call them. You already have to suspect that something is wrong.&lt;br&gt;
What happens in reality? You’re working. You’re writing code. You’re in the flow. You don’t call &lt;code&gt;/cost&lt;/code&gt;.&lt;br&gt;
TRACE is a &lt;strong&gt;push mechanism&lt;/strong&gt;. It comes to you.&lt;br&gt;
During the live session, TRACE displays in real time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current token count and cost&lt;/li&gt;
&lt;li&gt;Session health – green, yellow, or red&lt;/li&gt;
&lt;li&gt;Context window utilization as a visual bar&lt;/li&gt;
&lt;li&gt;A notification when you cross the warning threshold – before it gets expensive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And new since the first version: a &lt;strong&gt;Cost Efficiency Section&lt;/strong&gt; that shows you what the same session would have cost on the baseline model.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Daily Tokenizer Check
&lt;/h2&gt;

&lt;p&gt;TRACE runs an automatic check early every morning: It sends a fixed reference text to both models—the current one and the baseline model—and calculates the ratio.&lt;br&gt;
The result for Sonnet 4.6 vs. Opus 4.7:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"current_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseline_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-7"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"current_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;407&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseline_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;521&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ratio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.7812&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Sonnet 4.6 uses &lt;strong&gt;22% fewer tokens&lt;/strong&gt; than Opus 4.7 for the same text. The dashboard shows you this difference—not as an abstract number, but as a concrete cost savings for your last week.&lt;/p&gt;
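
&lt;p&gt;The ratio and the 22% figure follow directly from the two token counts in the JSON above. A minimal sketch of that calculation, with the function name &lt;code&gt;tokenizer_ratio&lt;/code&gt; chosen here for illustration (it is not necessarily how TRACE names it internally):&lt;/p&gt;

```python
def tokenizer_ratio(current_tokens, baseline_tokens):
    """Tokens the current model needs relative to the baseline, for the same text."""
    return round(current_tokens / baseline_tokens, 4)


# Token counts taken from the JSON result above.
ratio = tokenizer_ratio(current_tokens=407, baseline_tokens=521)
savings_percent = round((1 - ratio) * 100)

print(ratio)            # 0.7812
print(savings_percent)  # 22
```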

&lt;h2&gt;
  
  
  What’s been added since the first article
&lt;/h2&gt;

&lt;p&gt;TRACE has evolved since the first article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Activity Section&lt;/strong&gt; with GitHub-style heatmap – streaks, active days, average cost per session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency Section&lt;/strong&gt; – Comparison of current model vs. baseline, potential savings per week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configurable monthly budget&lt;/strong&gt; – Adjustable directly in the dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tokenizer Ratio Check&lt;/strong&gt; – Daily automatic comparison via Anthropic count_tokens API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VS Code Simple Browser Integration&lt;/strong&gt; – Dashboard directly in VS Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified Installer&lt;/strong&gt; – &lt;code&gt;bash install.sh&lt;/code&gt; automatically detects whether it’s a first-time installation, adding a project, or an update&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  An Insight into Visibility
&lt;/h2&gt;

&lt;p&gt;The difference between Sonnet 4.6 and Opus 4.7 isn’t just a price difference. It’s a structural difference in how the model processes text. This difference is invisible—until you start measuring.&lt;br&gt;
That is the real message of TRACE: Not “here are your costs,” but “here’s what happened while you were working—and you wouldn’t see it otherwise.”&lt;br&gt;
&lt;code&gt;/context&lt;/code&gt; and &lt;code&gt;/cost&lt;/code&gt; show you the moment. TRACE shows you the progression.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;TRACE is open source under the MIT license: &lt;a href="https://github.com/MyPatric69/trace" rel="noopener noreferrer"&gt;github.com/MyPatric69/trace&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;If you find it useful: a star helps. If you have questions or find a bug: an issue helps more.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>monitoring</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I built a token cost tracker for Claude Code - and it changed the way I think about AI development</title>
      <dc:creator>Patric</dc:creator>
      <pubDate>Thu, 23 Apr 2026 16:20:47 +0000</pubDate>
      <link>https://dev.to/mypatric69/i-built-a-token-cost-tracker-for-claude-code-and-it-changed-the-way-i-think-about-ai-development-111i</link>
      <guid>https://dev.to/mypatric69/i-built-a-token-cost-tracker-for-claude-code-and-it-changed-the-way-i-think-about-ai-development-111i</guid>
      <description>&lt;h2&gt;
  
  
  It started with a number I couldn’t explain
&lt;/h2&gt;

&lt;p&gt;A few months ago, I looked at my Anthropic billing dashboard and saw a number that didn’t add up. I knew roughly how much I’d worked with Claude Code. The bill said otherwise.&lt;br&gt;
I couldn’t tell which sessions were expensive. I couldn’t tell why. I only had a total figure at the end of the month—and no way to link it to specific work.&lt;br&gt;
That bothered me. Not because of the money—but because of the lack of visibility. And as someone who’s worked in IT for 30 years, I know: What you can’t measure, you can’t improve.&lt;br&gt;
So I built TRACE.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I thought I was building—and what I actually built
&lt;/h2&gt;

&lt;p&gt;The original idea was simple: log token consumption per session, calculate costs, store in SQLite. An afternoon project.&lt;br&gt;
What I actually built—over weeks of iteration with Claude Code itself as a development partner—is a local MCP server that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tracks token costs per project and session in real time&lt;/li&gt;
&lt;li&gt;Detects session health (green / yellow / red) based on configurable thresholds&lt;/li&gt;
&lt;li&gt;Sends native macOS notifications when token thresholds are exceeded – no dashboard required&lt;/li&gt;
&lt;li&gt;Automatically keeps &lt;code&gt;AI_CONTEXT.md&lt;/code&gt; up to date via Git hooks&lt;/li&gt;
&lt;li&gt;Generates enriched handoff prompts when starting a new thread&lt;/li&gt;
&lt;li&gt;Displays everything in a web dashboard with a 7-day history, provider badges, and live multi-session tracking&lt;/li&gt;
&lt;li&gt;Runs directly in the VS Code Simple Browser Panel – no external browser required&lt;/li&gt;
&lt;li&gt;Optionally starts automatically at Mac login via LaunchAgent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT license – free to use, forkable, no restrictions. With solid test coverage ensuring it works in real-world scenarios.&lt;br&gt;
The gap between an “afternoon project” and a production-ready open-source tool says something about how AI-powered development really works when you seriously commit to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Nobody Talks About: Context Rot
&lt;/h2&gt;

&lt;p&gt;Here’s what I’ve learned—and what I’ve never heard in any podcast, video, or presentation:&lt;br&gt;
Token costs do not scale linearly with the work. They scale with the session length.&lt;br&gt;
Every turn in a Claude Code session appends to the conversation history. By turn 50, every new message carries the weight of the previous 49 turns as input tokens. By turn 100, you’re burning tens of thousands of tokens per message—just to maintain context, even if the actual task is small.&lt;br&gt;
Anthropic calls this “context rot” in its documentation: As the number of tokens grows, accuracy and recall degrade. The model doesn’t lose tokens—it loses attention. It has to spread its focus across an ever-growing history.&lt;br&gt;
&lt;strong&gt;The practical consequence:&lt;/strong&gt; A 300-turn session doesn’t just cost more than 30 sessions of 10 turns each. It costs significantly more—and the quality of the later turns is measurably worse.&lt;/p&gt;
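
&lt;p&gt;To put rough numbers on that comparison, here is a small sketch using a closed-form sum. It assumes each turn re-sends a fixed base load plus all earlier turns. The base of 10,000 tokens and the 1,000 tokens added per turn are hypothetical values picked only to show the shape of the curve, and caching is ignored.&lt;/p&gt;

```python
def session_input_tokens(turns, base_tokens, tokens_per_turn):
    """Input tokens for one session: base load every turn plus the growing history.

    Turn k re-sends (k - 1) earlier turns, so the history term sums to
    tokens_per_turn * turns * (turns - 1) / 2.
    """
    return turns * base_tokens + tokens_per_turn * turns * (turns - 1) // 2


BASE = 10_000      # hypothetical fixed load per turn
PER_TURN = 1_000   # hypothetical tokens added to the history per turn

one_long = session_input_tokens(300, BASE, PER_TURN)
many_short = 30 * session_input_tokens(10, BASE, PER_TURN)

print(f"1 x 300 turns: {one_long:,} input tokens")
print(f"30 x 10 turns: {many_short:,} input tokens")
print(f"factor: {one_long / many_short:.1f}x")
```

&lt;p&gt;Same number of turns, same work per turn, yet under these assumptions the single long session sends roughly an order of magnitude more input tokens.&lt;/p&gt;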

&lt;h2&gt;
  
  
  Push Instead of Pull: Why Built-in Commands Aren’t Enough
&lt;/h2&gt;

&lt;p&gt;Claude Code has built-in visibility tools. &lt;code&gt;/cost&lt;/code&gt; shows the current session usage. &lt;code&gt;/context&lt;/code&gt; visualizes context window utilization. &lt;code&gt;/stats&lt;/code&gt; provides usage statistics.&lt;br&gt;
These are pull mechanisms. You have to remember to use them. You have to be curious enough to check. You have to already suspect that something is wrong.&lt;br&gt;
TRACE is a push mechanism. It comes to you.&lt;br&gt;
When a session exceeds 80,000 tokens, a notification—a subtle &lt;code&gt;Tink&lt;/code&gt; sound and a macOS alert—appears before it gets really expensive. At 150,000 tokens, a more distinct &lt;code&gt;Funk&lt;/code&gt; signals that it’s time for a new thread. You don’t have to remember to check. TRACE checks for you.&lt;br&gt;
The dashboard also runs directly in the VS Code Simple Browser—for those who want to keep everything in one interface.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;&lt;code&gt;/cost&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;/context&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;TRACE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Current session usage&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual context window&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cache tokens separately&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Historical sessions&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per project&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly budget &amp;amp; alerts&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session health indicator&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Push notifications&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Handoff prompt&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI_CONTEXT.md auto-update&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VS Code Simple Browser&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autostart on Login&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  There is no universal token limit
&lt;/h3&gt;

&lt;p&gt;A question that quickly arises during use: at what point is a session “too long”? The answer is: it depends.&lt;br&gt;
80,000 tokens as a warning threshold is a good starting point—but a developer working on a small script will reach that after just a few turns, while someone refactoring a complex backend is still far from it.&lt;br&gt;
TRACE therefore makes the thresholds configurable—directly in the dashboard or in the configuration file. Three recommendations for guidance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Usage Type&lt;/th&gt;
&lt;th&gt;Warning&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Economical – cost-conscious&lt;/td&gt;
&lt;td&gt;50,000&lt;/td&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard – recommended&lt;/td&gt;
&lt;td&gt;80,000&lt;/td&gt;
&lt;td&gt;150,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intensive – large projects&lt;/td&gt;
&lt;td&gt;120,000&lt;/td&gt;
&lt;td&gt;200,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers are not set in stone. They are a starting point that should be adjusted once you understand your own usage patterns.&lt;/p&gt;
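
&lt;p&gt;The three presets map naturally onto a small lookup plus a classification function. The sketch below uses the values from the table. The names &lt;code&gt;PRESETS&lt;/code&gt; and &lt;code&gt;session_health&lt;/code&gt; are made up for this example, and the exact boundary handling (yellow from the warning threshold up to and including the critical one) is an assumption.&lt;/p&gt;

```python
# (warning, critical) token thresholds, taken from the table above.
PRESETS = {
    "economical": (50_000, 100_000),
    "standard": (80_000, 150_000),
    "intensive": (120_000, 200_000),
}


def session_health(tokens, preset="standard"):
    """Classify a session as green, yellow, or red for the chosen preset."""
    warning, critical = PRESETS[preset]
    if tokens > critical:
        return "red"
    if tokens >= warning:
        return "yellow"
    return "green"


for tokens in (60_000, 100_000, 160_000):
    print(tokens, session_health(tokens), session_health(tokens, "economical"))
```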

&lt;blockquote&gt;
&lt;p&gt;Note: TRACE tracks Claude Code sessions—i.e., work done in the terminal. For claude.ai web/desktop chats, a separate Anthropic Usage API is required, which is only available for Team and Enterprise accounts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What a real day of usage looks like
&lt;/h2&gt;

&lt;p&gt;I recently installed TRACE for a colleague. After one day of use, the logs told a clear story.&lt;br&gt;
A single session showed over 37 million cache read tokens and nearly $20 in costs. The session had accumulated hundreds of turns—including many invisible tool-use turns that Claude Code generates internally for file reads, Bash commands, and code analysis. Each of these counts as a turn in the transcript, even if the user typed only 30 prompts.&lt;br&gt;
The insight isn’t that something went wrong. The insight is: Without visibility, you don’t even ask the question in the first place.&lt;br&gt;
TRACE makes the invisible visible. That’s all it does—but it turns out to be a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  The /resume Trap
&lt;/h2&gt;

&lt;p&gt;One more thing worth knowing: Claude Code’s &lt;code&gt;/resume&lt;/code&gt; command is more expensive than it looks.&lt;br&gt;
When resuming a session, Claude Code sends the entire conversation history as input tokens—including invisible “Thinking Block Signatures” from Extended Thinking turns. These are base64-encoded, unreadable, and cannot be truncated. But they are sent to the API with every resume and billed accordingly.&lt;br&gt;
&lt;a href="https://github.com/anthropics/claude-code/issues/42260" rel="noopener noreferrer"&gt;Anthropic’s own GitHub issues document cases where resuming a 24-hour session cost ~156,000 input tokens—before the user had even typed a single character.&lt;/a&gt;&lt;br&gt;
Anthropic’s own documentation is clear: Do not rely on session resumption. Save results as state and pass them to a fresh session.&lt;br&gt;
TRACE’s &lt;code&gt;new_session()&lt;/code&gt; tool does exactly that—it generates a compressed handoff prompt from &lt;code&gt;AI_CONTEXT.md&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, current Git changes, and the open task in the backlog. A new thread gets everything it needs in a few hundred tokens instead of hundreds of thousands.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bugs That Taught Me the Most
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The $0.0000 Sessions.&lt;/strong&gt; TRACE logged sessions but showed zero cost. The model string in the transcripts was &lt;code&gt;claude-sonnet-4-5-20250929&lt;/code&gt;. The config key was &lt;code&gt;claude-sonnet-4-5&lt;/code&gt;. Exact matching failed. A line of prefix matching fixed it—but finding it really required digging into the logs.&lt;br&gt;
&lt;strong&gt;The health indicator that disappeared after a refresh.&lt;/strong&gt; Session Health turned red at 150,000 tokens—and then disappeared when the browser was refreshed because the state was stored in a JavaScript variable. The fix was to move the health state to a &lt;code&gt;last_health.json&lt;/code&gt; file on the server side. Obvious in hindsight. Not obvious until a real user encountered it.&lt;br&gt;
&lt;strong&gt;The &lt;code&gt;AI_CONTEXT.md&lt;/code&gt; file that was out of date.&lt;/strong&gt; The doc synthesizer only updated on &lt;code&gt;feat:&lt;/code&gt; and &lt;code&gt;fix:&lt;/code&gt; commits. A day with &lt;code&gt;chore:&lt;/code&gt; and &lt;code&gt;docs:&lt;/code&gt; commits left the context file four days out of date. Removing the commit type filter entirely fixed it.&lt;br&gt;
Every bug taught me something about the gap between “works in theory” and “works when a real person uses it all day.”&lt;/p&gt;
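
&lt;p&gt;The fix for the $0.0000 sessions can be sketched in a few lines. This is an illustration of prefix matching in general, not TRACE's actual code: the &lt;code&gt;PRICING&lt;/code&gt; table, its placeholder rates, and the function &lt;code&gt;lookup_pricing&lt;/code&gt; are all hypothetical. It prefers the longest matching key so that a more specific entry wins over a shorter one.&lt;/p&gt;

```python
# Hypothetical config: model key mapped to (input, output) rates.
# The numbers are placeholders, not real prices.
PRICING = {
    "claude-sonnet-4": (1.0, 2.0),
    "claude-sonnet-4-5": (3.0, 4.0),
}


def lookup_pricing(model_id, pricing=PRICING):
    """Find pricing by exact key first, then by the longest matching prefix."""
    if model_id in pricing:
        return pricing[model_id]
    matches = [key for key in pricing if model_id.startswith(key)]
    if not matches:
        return None  # make the gap visible instead of silently pricing at zero
    return pricing[max(matches, key=len)]


dated = lookup_pricing("claude-sonnet-4-5-20250929")
unknown = lookup_pricing("some-unlisted-model")
print(dated)    # (3.0, 4.0)
print(unknown)  # None
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for an unknown model, rather than a zero rate, is a deliberate choice here: a missing price then shows up as an error instead of as a free session.&lt;/p&gt;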

&lt;h2&gt;
  
  
  Where TRACE stands today
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Solid test coverage, all tests passing&lt;/li&gt;
&lt;li&gt;Tracks input, output, cache creation, and cache read tokens separately with correct pricing&lt;/li&gt;
&lt;li&gt;Session Health: green under 80,000 tokens, yellow up to 150,000, red above that – all configurable&lt;/li&gt;
&lt;li&gt;Native notifications on macOS, Windows, and Linux&lt;/li&gt;
&lt;li&gt;Web dashboard with 7-day history, dark/light/auto theme, provider badges, live multi-session tracking&lt;/li&gt;
&lt;li&gt;Runs in the VS Code Simple Browser – no external browser required&lt;/li&gt;
&lt;li&gt;Optional autostart via macOS LaunchAgent&lt;/li&gt;
&lt;li&gt;Enhanced Handoff prompts with current phase, open tasks, files to read&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I would tell someone just starting out
&lt;/h2&gt;

&lt;p&gt;The most useful thing I built wasn’t the dashboard. It was the discipline of treating &lt;code&gt;AI_CONTEXT.md&lt;/code&gt; as a first-class artifact. Every project gets one. Every commit potentially updates it. Every new session starts by reading it.&lt;br&gt;
An AI assistant with full context is a different tool than one that starts from scratch. TRACE exists to make the former the standard—not the exception.&lt;br&gt;
The second most useful thing: Measure before you optimize. I had strong intuitions about which sessions were expensive. The data contradicted most of them. The expensive sessions weren’t the ones with the difficult problems. They were the ones that ran for a long time.&lt;br&gt;
And the third: Don’t wait for permission to build something useful.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;TRACE is open source under the MIT license: &lt;a href="https://github.com/MyPatric69/trace" rel="noopener noreferrer"&gt;github.com/MyPatric69/trace&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;If you find it useful: a star helps. If you find a bug: an issue helps more. If you build something with it: I’d love to hear about it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>monitoring</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
