<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Authora Dev</title>
    <description>The latest articles on DEV Community by Authora Dev (@authora).</description>
    <link>https://dev.to/authora</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3847442%2Fc6d294e5-1edc-490d-aa69-17b18cba2024.png</url>
      <title>DEV Community: Authora Dev</title>
      <link>https://dev.to/authora</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/authora"/>
    <language>en</language>
    <item>
      <title>Why AI coding agents keep forgetting your codebase (and how we fixed it with ASTs + Gemini)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Thu, 16 Apr 2026 08:40:59 +0000</pubDate>
      <link>https://dev.to/authora/why-ai-coding-agents-keep-forgetting-your-codebase-and-how-we-fixed-it-with-asts-gemini-2cm2</link>
      <guid>https://dev.to/authora/why-ai-coding-agents-keep-forgetting-your-codebase-and-how-we-fixed-it-with-asts-gemini-2cm2</guid>
      <description>&lt;p&gt;Last week, I watched an AI coding agent make the &lt;em&gt;same mistake&lt;/em&gt; for the third time in the same repo.&lt;/p&gt;

&lt;p&gt;It reintroduced a bug we’d already fixed.&lt;br&gt;
It ignored a naming convention we’d already explained.&lt;br&gt;
It missed an architecture constraint buried in a migration from six months ago.&lt;/p&gt;

&lt;p&gt;None of this was because the model was “bad.” The problem was simpler:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;the agent had no memory.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every new session started from scratch. So onboarding an AI agent looked a lot like onboarding a new teammate every single morning.&lt;/p&gt;

&lt;p&gt;That gets expensive fast.&lt;/p&gt;
&lt;h2&gt;
  
  
  The real onboarding problem isn’t docs. It’s lost context.
&lt;/h2&gt;

&lt;p&gt;Most teams already have some version of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;code comments&lt;/li&gt;
&lt;li&gt;ADRs&lt;/li&gt;
&lt;li&gt;Notion pages&lt;/li&gt;
&lt;li&gt;Slack threads&lt;/li&gt;
&lt;li&gt;PR discussions&lt;/li&gt;
&lt;li&gt;tribal knowledge in one senior engineer’s head&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The issue is that AI agents don’t naturally turn that into &lt;strong&gt;persistent, reusable context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Even if you paste docs into the prompt, the agent still has to figure out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what matters&lt;/li&gt;
&lt;li&gt;what changed&lt;/li&gt;
&lt;li&gt;what conflicts with what&lt;/li&gt;
&lt;li&gt;which patterns are preferred&lt;/li&gt;
&lt;li&gt;which bug fixes should never be repeated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where ASTs became surprisingly useful for us.&lt;/p&gt;
&lt;h2&gt;
  
  
  ASTs are better onboarding material than raw code
&lt;/h2&gt;

&lt;p&gt;Raw source files are noisy. They mix signal with implementation detail.&lt;/p&gt;

&lt;p&gt;ASTs give you something more useful: &lt;strong&gt;structure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;From an AST, you can extract things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exported APIs&lt;/li&gt;
&lt;li&gt;dependency relationships&lt;/li&gt;
&lt;li&gt;deprecated patterns&lt;/li&gt;
&lt;li&gt;repeated implementation shapes&lt;/li&gt;
&lt;li&gt;framework usage&lt;/li&gt;
&lt;li&gt;module boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then, if you combine that with an LLM like Gemini, you can compile those low-level facts into higher-level knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“All payment flows go through this service”&lt;/li&gt;
&lt;li&gt;“This hook replaces the legacy auth helper”&lt;/li&gt;
&lt;li&gt;“These two modules conflict if used together”&lt;/li&gt;
&lt;li&gt;“This migration fixed a timezone bug; don’t reintroduce local parsing”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a much better onboarding artifact than “here are 2,000 files, good luck.”&lt;/p&gt;
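&lt;p&gt;As a rough sketch of the “AST facts in, higher-level knowledge out” step, the snippet below only builds the prompt text. The fact shape, the wording of the instructions, and the &lt;code&gt;buildSynthesisPrompt&lt;/code&gt; name are assumptions for illustration, and no model is actually called.&lt;/p&gt;

```javascript
// Hypothetical facts, shaped like the output of an AST extraction pass.
const facts = [
  { file: "src/user.js", imports: ["./api"], exports: ["getUser"] },
  { file: "src/billing.js", imports: ["./api", "./invoice"], exports: ["charge"] },
];

// Turn structured facts into a compact prompt for whichever LLM you use.
// No model is called here; sending the prompt is left to your own client.
function buildSynthesisPrompt(items) {
  const lines = items.map(
    (f) => `- ${f.file}: imports [${f.imports.join(", ")}], exports [${f.exports.join(", ")}]`
  );
  return [
    "You are summarizing a codebase for a future coding session.",
    "From the facts below, list shared dependencies, likely module boundaries,",
    "and any patterns a newcomer should follow.",
    "",
    ...lines,
  ].join("\n");
}

const prompt = buildSynthesisPrompt(facts);
console.log(prompt);
```

&lt;p&gt;Feeding the model a short list of facts instead of whole files is the point: the prompt stays small, and the answer can be stored rather than regenerated every session.&lt;/p&gt;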
&lt;h2&gt;
  
  
  The pattern: AST extraction -&amp;gt; LLM synthesis -&amp;gt; knowledge graph
&lt;/h2&gt;

&lt;p&gt;The mental model looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Source code / docs / PRs / bug notes
              |
              v
       AST + entity extraction
              |
              v
     Gemini summarizes patterns,
   gotchas, decisions, relationships
              |
              v
      Knowledge graph with links:
   uses / replaces / depends_on /
        conflicts_with / owns
              |
              v
   AI agent retrieves context next time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key idea is not “ask the model to remember.”&lt;br&gt;
It won’t.&lt;/p&gt;

&lt;p&gt;The key idea is: &lt;strong&gt;compile memory into something searchable and structured.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  A tiny example
&lt;/h2&gt;

&lt;p&gt;Here’s a minimal Node example showing how AST parsing can turn code into reusable knowledge signals.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @babel/parser
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@babel/parser&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
import { apiClient } from "./api";
export async function getUser(id) {
  return apiClient.get("/users/" + id);
}
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ast&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sourceType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;module&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imports&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;program&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ImportDeclaration&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// `exports` is already bound inside a CommonJS module, so use another name.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;exportedNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;program&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ExportNamedDeclaration&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;declaration&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;// Skip exports without a single named declaration, e.g. `export { a }` or `export const a = 1`.&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;imports&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;exportedNames&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;imports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getUser&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By itself, that’s not magical. But at codebase scale, this becomes a pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;parse files&lt;/li&gt;
&lt;li&gt;extract entities and relationships&lt;/li&gt;
&lt;li&gt;summarize them into human-usable knowledge&lt;/li&gt;
&lt;li&gt;store them so agents can retrieve them later&lt;/li&gt;
&lt;/ol&gt;
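
&lt;p&gt;Steps 2 and 4 of that pipeline can be sketched in a few lines. The per-file facts below are hypothetical, the &lt;code&gt;exports&lt;/code&gt; relation name is an assumption, and import specifiers are left unresolved (a real pipeline would map &lt;code&gt;./api&lt;/code&gt; to an actual file path).&lt;/p&gt;

```javascript
// Hypothetical per-file facts, e.g. collected by running a parser over a repo.
const fileFacts = [
  { file: "src/api.js", imports: [], exports: ["apiClient"] },
  { file: "src/user.js", imports: ["./api"], exports: ["getUser"] },
  { file: "src/billing.js", imports: ["./api"], exports: ["charge"] },
];

// Derive relationship edges from the facts so they can be queried later.
function toEdges(facts) {
  const edges = [];
  for (const f of facts) {
    for (const source of f.imports) {
      edges.push({ from: f.file, rel: "depends_on", to: source });
    }
    for (const name of f.exports) {
      edges.push({ from: f.file, rel: "exports", to: name });
    }
  }
  return edges;
}

const edges = toEdges(fileFacts);

// Which files depend on the API module?
const dependents = edges
  .filter((e) => e.rel === "depends_on")
  .filter((e) => e.to === "./api")
  .map((e) => e.from);

console.log(dependents); // [ 'src/user.js', 'src/billing.js' ]
```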

&lt;p&gt;If you only need local code indexing, a plain vector DB or repo search may be enough. But if your pain is &lt;strong&gt;“the agent keeps forgetting decisions and patterns across sessions and projects”&lt;/strong&gt;, you need something closer to a graph than a pile of embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a graph works better than just stuffing more into the prompt
&lt;/h2&gt;

&lt;p&gt;Prompts are temporary.&lt;br&gt;
Context windows are finite.&lt;br&gt;
Embeddings are good at similarity, but weaker at explicit relationships.&lt;/p&gt;

&lt;p&gt;A knowledge graph lets you store things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AuthProvider replaces LegacyAuth&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DateParser conflicts_with LocalTimezoneParsing&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;BillingService depends_on InvoicePolicy&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FeatureFlagX caused_bug_in CheckoutFlow&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because onboarding isn’t just “find similar text.”&lt;br&gt;
It’s often “find the &lt;em&gt;right&lt;/em&gt; relationship.”&lt;/p&gt;
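
&lt;p&gt;To make the difference concrete, here is a minimal in-memory triple store. The triples are the illustrative ones listed above; the &lt;code&gt;TripleStore&lt;/code&gt; class and its methods are made up for this sketch and say nothing about how any particular product stores its graph.&lt;/p&gt;

```javascript
// A tiny in-memory triple store: each fact is (subject, relation, object).
class TripleStore {
  constructor() {
    this.triples = [];
  }

  add(subject, relation, object) {
    this.triples.push({ subject, relation, object });
    return this;
  }

  // Match on any combination of fields; omitted fields act as wildcards.
  find(pattern) {
    return this.triples.filter((t) =>
      Object.entries(pattern).every(([key, value]) => t[key] === value)
    );
  }
}

const graph = new TripleStore()
  .add("AuthProvider", "replaces", "LegacyAuth")
  .add("DateParser", "conflicts_with", "LocalTimezoneParsing")
  .add("BillingService", "depends_on", "InvoicePolicy");

// "What superseded LegacyAuth?" is an exact relationship lookup, not a similarity search.
const replacements = graph.find({ relation: "replaces", object: "LegacyAuth" });
console.log(replacements.map((t) => t.subject)); // [ 'AuthProvider' ]
```

&lt;p&gt;A similarity search over embeddings could return all three facts for a query about auth; the relationship lookup returns only the one that answers the question.&lt;/p&gt;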

&lt;p&gt;This is the problem we built &lt;strong&gt;PeKG&lt;/strong&gt; for: giving AI coding agents persistent memory across sessions and projects.&lt;/p&gt;

&lt;p&gt;It stores decisions, bug fixes, patterns, gotchas, and architecture knowledge in a searchable graph, and works with MCP-compatible agents like Claude Code, Cursor, Windsurf, Cline, Aider, and Roo Code. Your agent does the heavy lifting; PeKG stores and retrieves the compiled knowledge.&lt;/p&gt;

&lt;p&gt;A nice side effect: if you learn something in Project A, you can apply it in Project B instead of rediscovering it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changed once we treated onboarding as knowledge compilation
&lt;/h2&gt;

&lt;p&gt;Instead of asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“How do we make the prompt better?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;we started asking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What should the agent &lt;em&gt;never have to relearn&lt;/em&gt;?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That led to a better system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deep-scan source files&lt;/li&gt;
&lt;li&gt;extract entities and relationships&lt;/li&gt;
&lt;li&gt;cluster related knowledge automatically&lt;/li&gt;
&lt;li&gt;compile raw notes into wiki-like articles&lt;/li&gt;
&lt;li&gt;retrieve from personal, team, shared, and public knowledge tiers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, that means fewer repeated explanations and fewer “didn’t we already solve this?” moments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you’re exploring MCP-based workflows, &lt;strong&gt;check out &lt;a href="https://pekg.ai/docs" rel="noopener noreferrer"&gt;https://pekg.ai/docs&lt;/a&gt; for MCP setup&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your main problem is knowledge capture and repeated agent mistakes, &lt;strong&gt;see &lt;a href="https://pekg.ai/hints.txt" rel="noopener noreferrer"&gt;https://pekg.ai/hints.txt&lt;/a&gt; for 115 practical tips&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And if you want to test persistent memory for your agent, &lt;strong&gt;try &lt;a href="https://app.pekg.ai" rel="noopener noreferrer"&gt;https://app.pekg.ai&lt;/a&gt; - free tier available&lt;/strong&gt;. The free plan includes 100 articles, 5 projects, and 1 user, which is enough to see whether graph-based memory helps your workflow.&lt;/p&gt;

&lt;p&gt;If PeKG isn’t the right fit, I’d still recommend this general approach: use ASTs to extract structure, use an LLM to synthesize meaning, and store the result somewhere your agent can query later.&lt;/p&gt;

&lt;p&gt;Because the real issue usually isn’t model quality.&lt;/p&gt;

&lt;p&gt;It’s that your agent wakes up every day with amnesia.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How are you handling codebase memory and onboarding for AI agents today?&lt;/strong&gt; Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- PeKG team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why Claude Mythos Is Broken for Threat Detection Without Persistent Memory</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Thu, 16 Apr 2026 06:54:54 +0000</pubDate>
      <link>https://dev.to/authora/why-claude-mythos-is-broken-for-threat-detection-without-persistent-memory-4i3n</link>
      <guid>https://dev.to/authora/why-claude-mythos-is-broken-for-threat-detection-without-persistent-memory-4i3n</guid>
      <description>&lt;p&gt;Last week, a threat-hunting workflow caught the same suspicious pattern three times.&lt;/p&gt;

&lt;p&gt;Not three different threats.&lt;br&gt;&lt;br&gt;
The &lt;em&gt;same&lt;/em&gt; one.&lt;/p&gt;

&lt;p&gt;Session 1: the agent flagged an odd auth bypass path in a service.&lt;br&gt;&lt;br&gt;
Session 2: new context window, same repo, same bug class, same investigation from scratch.&lt;br&gt;&lt;br&gt;
Session 3: different project, same dependency pattern, same blind spot again.&lt;/p&gt;

&lt;p&gt;That’s when the real problem became obvious: a lot of AI-assisted threat detection is stateless when it absolutely should not be.&lt;/p&gt;

&lt;p&gt;If you’re using Claude Mythos, Claude Code, Cursor, or any MCP-compatible coding agent for security reviews, log triage, or code investigation, the biggest weakness usually isn’t the model. It’s memory.&lt;/p&gt;
&lt;h2&gt;
  
  
  The threat detection problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;Threat detection is cumulative work.&lt;/p&gt;

&lt;p&gt;Good analysts remember things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“This package version caused unsafe deserialization before”&lt;/li&gt;
&lt;li&gt;“This internal auth middleware is always misconfigured in service templates”&lt;/li&gt;
&lt;li&gt;“This 403 spike mattered last time because it appeared right before token replay”&lt;/li&gt;
&lt;li&gt;“We already decided this pattern was benign in one project but critical in another”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Humans build this up over time.&lt;br&gt;&lt;br&gt;
Most agents don’t.&lt;/p&gt;

&lt;p&gt;So every new session starts with partial amnesia:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;previous bug fixes are gone&lt;/li&gt;
&lt;li&gt;prior incident context is gone&lt;/li&gt;
&lt;li&gt;architecture decisions are gone&lt;/li&gt;
&lt;li&gt;known false positives are gone&lt;/li&gt;
&lt;li&gt;hard-won “gotchas” are gone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s bad for productivity. It’s worse for security.&lt;/p&gt;

&lt;p&gt;Because threat detection is often about &lt;strong&gt;patterns across time&lt;/strong&gt;, not just patterns inside one prompt.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why persistent memory matters more for security than for coding
&lt;/h2&gt;

&lt;p&gt;A coding agent forgetting a refactor preference is annoying.&lt;/p&gt;

&lt;p&gt;A security agent forgetting that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a service depends on a deprecated auth flow,&lt;/li&gt;
&lt;li&gt;a library conflicts with your patching strategy,&lt;/li&gt;
&lt;li&gt;or a “low severity” warning was previously linked to a real exploit path...&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...can create repeated misses.&lt;/p&gt;

&lt;p&gt;Here’s the difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without memory:
alert -&amp;gt; investigate -&amp;gt; conclude -&amp;gt; session ends -&amp;gt; knowledge disappears

With memory:
alert -&amp;gt; investigate -&amp;gt; store finding -&amp;gt; relate to past findings -&amp;gt; improve future detection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
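
&lt;p&gt;The “store finding, relate to past findings” step might look like the sketch below. The finding shape (id, summary, tags) and the tag-overlap rule are assumptions chosen to keep the example small, not a description of any specific tool.&lt;/p&gt;

```javascript
// Hypothetical finding shape: an id, a short summary, and a set of tags
// (bug class, library, service) used to link findings together.
const pastFindings = [
  { id: "F-1", summary: "Auth bypass via unchecked route", tags: ["auth-bypass", "auth-api"] },
  { id: "F-2", summary: "Stale tokens accepted after logout", tags: ["token-replay", "legacy-session-lib"] },
];

// Save the new finding and return every earlier finding that shares
// at least one tag with it.
function storeAndRelate(memory, finding) {
  const related = memory.filter((past) =>
    past.tags.some((tag) => finding.tags.includes(tag))
  );
  memory.push(finding);
  return related;
}

const related = storeAndRelate(pastFindings, {
  id: "F-3",
  summary: "403 spike before token reuse",
  tags: ["token-replay", "edge-proxy"],
});

console.log(related.map((f) => f.id)); // [ 'F-2' ]
```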



&lt;p&gt;Threat detection gets better when your agent can retain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decisions&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Why something was marked benign, suspicious, or critical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Patterns&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Repeated bug classes, exploit chains, dependency risks, unsafe code shapes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architecture knowledge&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Which service talks to what, where trust boundaries actually are, what “normal” looks like.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gotchas&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The weird edge cases your team keeps rediscovering at 2 a.m.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the agent can’t carry that forward, you’re not really building a detection system. You’re rerunning a demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  What persistent memory looks like in practice
&lt;/h2&gt;

&lt;p&gt;The useful version is not “save chat history forever.”&lt;/p&gt;

&lt;p&gt;That becomes noise fast.&lt;/p&gt;

&lt;p&gt;What you actually want is structured memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;entities: services, libraries, endpoints, incidents, teams&lt;/li&gt;
&lt;li&gt;relationships: &lt;code&gt;depends_on&lt;/code&gt;, &lt;code&gt;conflicts_with&lt;/code&gt;, &lt;code&gt;replaces&lt;/code&gt;, &lt;code&gt;uses&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;compiled knowledge: “Auth middleware gotchas” instead of 19 random notes&lt;/li&gt;
&lt;li&gt;retrieval by relevance: bring back the right security context when needed&lt;/li&gt;
&lt;/ul&gt;
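
&lt;p&gt;“Retrieval by relevance” can be as simple or as sophisticated as you need. The sketch below ranks hypothetical notes by keyword overlap with the query; a real system would more likely use embeddings or graph distance, so treat the scoring rule as a placeholder.&lt;/p&gt;

```javascript
// Hypothetical compiled notes. Plain keyword overlap keeps this sketch
// dependency-free; swap in a better ranking function as needed.
const notes = [
  { title: "Auth middleware gotchas", text: "token refresh must run before the route guard in auth middleware" },
  { title: "Queue consumer risks", text: "queue consumer runs with elevated privileges" },
  { title: "Date handling", text: "never parse dates in the local timezone" },
];

const tokenize = (s) => s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

// Score each note by how many distinct query words it contains, then
// return the top matches, dropping notes with no overlap at all.
function retrieve(query, items, limit = 2) {
  const queryWords = new Set(tokenize(query));
  return items
    .map((note) => {
      const words = new Set(tokenize(`${note.title} ${note.text}`));
      const score = [...queryWords].filter((w) => words.has(w)).length;
      return { note, score };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((r) => r.note.title);
}

const hits = retrieve("auth token refresh order", notes);
console.log(hits); // [ 'Auth middleware gotchas' ]
```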

&lt;p&gt;A simple mental model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          [Incident: token replay]
                    |
             related_to
                    |
[Service: auth-api] ---- uses ---- [Library: legacy-session-lib]
        |                                 |
   depends_on                        known_issue
        |                                 |
[Gateway: edge-proxy]             [Pattern: weak token invalidation]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
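
&lt;p&gt;The same picture can be written down as an edge list and walked. The node ids and the &lt;code&gt;reachable&lt;/code&gt; helper are hypothetical, and attaching the incident to &lt;code&gt;auth-api&lt;/code&gt; is one reading of the diagram rather than something it states outright.&lt;/p&gt;

```javascript
// The diagram as [from, relation, to] edges.
const edges = [
  ["incident:token-replay", "related_to", "service:auth-api"],
  ["service:auth-api", "uses", "library:legacy-session-lib"],
  ["service:auth-api", "depends_on", "gateway:edge-proxy"],
  ["library:legacy-session-lib", "known_issue", "pattern:weak-token-invalidation"],
];

// Breadth-first walk along outgoing edges, collecting every node reachable
// from the start within maxHops steps.
function reachable(start, maxHops) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let hop = maxHops; hop > 0; hop -= 1) {
    const next = [];
    for (const [from, , to] of edges) {
      if (frontier.includes(from)) {
        if (!seen.has(to)) {
          seen.add(to);
          next.push(to);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return [...seen];
}

// Two hops from the service surface the known weak-invalidation pattern.
const context = reachable("service:auth-api", 2);
console.log(context);
```

&lt;p&gt;One hop gives the agent the library and the gateway; the second hop is what brings in the known issue attached to the library, which is the piece a text search on “auth-api” would likely miss.&lt;/p&gt;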



&lt;p&gt;This is where a personal knowledge graph makes sense for MCP agents.&lt;/p&gt;

&lt;p&gt;Instead of asking the model to remember everything, let the model do what it’s good at—reasoning—and let a memory layer store what matters between sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  A runnable example
&lt;/h2&gt;

&lt;p&gt;If you’re experimenting with MCP-based workflows, here’s the basic shape of a setup that uses PeKG as persistent memory for an agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @modelcontextprotocol/inspector
npx @modelcontextprotocol/inspector https://app.pekg.ai/mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then connect your MCP-compatible agent and store security knowledge like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;incident summaries&lt;/li&gt;
&lt;li&gt;dependency gotchas&lt;/li&gt;
&lt;li&gt;architecture notes&lt;/li&gt;
&lt;li&gt;prior investigation outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point isn’t that “more memory” magically solves security. It doesn’t. You still need logs, rules, humans, and often dedicated tools. If you need SIEM, EDR, or runtime detection, use those. Persistent memory helps in the layer where agents assist analysts and developers by carrying forward context they’d otherwise forget.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for Claude Mythos specifically
&lt;/h2&gt;

&lt;p&gt;Claude Mythos can be genuinely useful for investigation and reasoning. But threat detection work rarely lives inside one clean session.&lt;/p&gt;

&lt;p&gt;It sprawls across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repos&lt;/li&gt;
&lt;li&gt;services&lt;/li&gt;
&lt;li&gt;tickets&lt;/li&gt;
&lt;li&gt;incidents&lt;/li&gt;
&lt;li&gt;postmortems&lt;/li&gt;
&lt;li&gt;patch cycles&lt;/li&gt;
&lt;li&gt;repeated false positives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And some of the most important security lessons show up in one project, then become relevant again somewhere else months later.&lt;/p&gt;

&lt;p&gt;That’s why cross-project knowledge synthesis matters. If your agent learns in Project A that a certain queue consumer pattern creates privilege escalation risk, it should be able to surface that when it sees the same shape in Project B.&lt;/p&gt;

&lt;p&gt;Without that, every project becomes a fresh start. Attackers love fresh starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you’re already using an MCP-compatible agent, try adding persistent memory to one security workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;pick one repo with recurring security or reliability issues
&lt;/li&gt;
&lt;li&gt;store a few past findings, bug classes, and architecture notes
&lt;/li&gt;
&lt;li&gt;see whether the agent starts spotting the same pattern faster in later sessions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;PeKG is one option for this. It stores decisions, patterns, bug fixes, gotchas, and architecture knowledge in a searchable graph, and works with Claude Code, Cursor, Windsurf, Cline, Aider, Roo Code, and other MCP-compatible agents.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check out &lt;a href="https://pekg.ai/docs" rel="noopener noreferrer"&gt;https://pekg.ai/docs&lt;/a&gt; for MCP setup&lt;/li&gt;
&lt;li&gt;See &lt;a href="https://pekg.ai/hints.txt" rel="noopener noreferrer"&gt;https://pekg.ai/hints.txt&lt;/a&gt; for 115 practical tips&lt;/li&gt;
&lt;li&gt;Try &lt;a href="https://app.pekg.ai" rel="noopener noreferrer"&gt;https://app.pekg.ai&lt;/a&gt; — free tier available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Free tier includes 100 articles and 1 project, which is enough to test whether persistent memory actually improves your security workflow before you commit to anything.&lt;/p&gt;

&lt;p&gt;The bigger point is not “use this exact tool.”&lt;br&gt;&lt;br&gt;
It’s this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If your threat detection agent forgets everything between sessions, it will keep rediscovering the same risks instead of getting better at finding them.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That’s not intelligence. That’s expensive déjà vu.&lt;/p&gt;

&lt;p&gt;How are you handling persistent memory for security and threat detection in your agent workflows? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- PeKG team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>programming</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why AI coding agents keep forgetting everything (and how I fixed it with MCP memory)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Wed, 15 Apr 2026 18:40:49 +0000</pubDate>
      <link>https://dev.to/authora/why-ai-coding-agents-keep-forgetting-everything-and-how-i-fixed-it-with-mcp-memory-54j3</link>
      <guid>https://dev.to/authora/why-ai-coding-agents-keep-forgetting-everything-and-how-i-fixed-it-with-mcp-memory-54j3</guid>
      <description>&lt;p&gt;Last week, I watched an AI coding agent make the &lt;em&gt;exact same mistake&lt;/em&gt; for the third time.&lt;/p&gt;

&lt;p&gt;It reintroduced a bug we’d already fixed, ignored a project convention we’d explained twice, and confidently suggested an architecture decision we had already rejected. None of this was because the model was “bad.” It was because every new session started with amnesia.&lt;/p&gt;

&lt;p&gt;If you’re using Claude Code, Cursor, Windsurf, Cline, Aider, or Roo Code, you’ve probably felt this too:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you restate the same rules every session&lt;/li&gt;
&lt;li&gt;your agent rediscovers old gotchas&lt;/li&gt;
&lt;li&gt;useful fixes stay trapped in chat history&lt;/li&gt;
&lt;li&gt;knowledge from Project A never helps in Project B&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the real problem: &lt;strong&gt;AI agents are good at reasoning, but terrible at remembering over time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So we stopped treating memory like “just more prompt context” and gave the agent a persistent knowledge layer through MCP.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern that finally clicked
&lt;/h2&gt;

&lt;p&gt;Instead of stuffing more instructions into a giant system prompt, we split the job in two:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The agent thinks&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A memory server stores what matters&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That means the LLM doesn’t need to permanently “remember” your architecture decisions, bug fixes, coding patterns, or weird deployment gotchas. It just needs a reliable place to retrieve them.&lt;/p&gt;

&lt;p&gt;Here’s the basic idea:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────┐      MCP tools      ┌────────────────────┐
│ AI Coding     │  ───────────────▶   │ Memory Server      │
│ Agent         │                     │ (persistent graph) │
│ (Claude/Cursor│  ◀───────────────   │ decisions, fixes,  │
│ /Cline/etc.)  │    relevant context │ patterns, gotchas  │
└───────────────┘                     └────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once we started doing this, the workflow changed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;after solving a bug, store the fix&lt;/li&gt;
&lt;li&gt;after choosing a pattern, store the reasoning&lt;/li&gt;
&lt;li&gt;before making changes, retrieve related knowledge&lt;/li&gt;
&lt;li&gt;when switching projects, reuse what still applies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what solved the amnesia.&lt;/p&gt;
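
&lt;p&gt;Stripped of MCP and of any real server, the store-then-retrieve loop can be sketched like this. &lt;code&gt;MemoryStore&lt;/code&gt; and its methods are hypothetical names that do not correspond to any actual MCP tool schema, and where the snapshot string is kept (a file, a database, a remote service) is deliberately left out.&lt;/p&gt;

```javascript
// A stand-in for the memory server: entries survive a "session" by being
// serialized, and a later session rebuilds the store from that snapshot.
class MemoryStore {
  constructor(entries = []) {
    this.entries = entries;
  }

  // After solving a bug or choosing a pattern: store it with its reasoning.
  remember(kind, topic, detail) {
    this.entries.push({ kind, topic, detail });
  }

  // Before making changes: pull back everything recorded for a topic.
  recall(topic) {
    return this.entries.filter((e) => e.topic === topic);
  }

  snapshot() {
    return JSON.stringify(this.entries);
  }

  static restore(json) {
    return new MemoryStore(JSON.parse(json));
  }
}

// Session 1: record what was learned, then persist the snapshot somewhere durable.
const sessionOne = new MemoryStore();
sessionOne.remember("fix", "auth", "Run token refresh before the route guard.");
sessionOne.remember("decision", "dates", "Parse timestamps as UTC only.");
const saved = sessionOne.snapshot();

// Session 2: a fresh process starts from the snapshot instead of from zero.
const sessionTwo = MemoryStore.restore(saved);
console.log(sessionTwo.recall("auth"));
```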

&lt;h2&gt;
  
  
  Why MCP is the right place to do this
&lt;/h2&gt;

&lt;p&gt;If your agent supports MCP, memory becomes a tool instead of a hack.&lt;/p&gt;

&lt;p&gt;That matters because memory should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;searchable&lt;/li&gt;
&lt;li&gt;structured&lt;/li&gt;
&lt;li&gt;reusable across sessions&lt;/li&gt;
&lt;li&gt;available across projects&lt;/li&gt;
&lt;li&gt;separate from any one model vendor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why I like the &lt;strong&gt;BYOLLM&lt;/strong&gt; approach: your agent does the reasoning, while the memory system handles storage and retrieval. You’re not locked into one model just to keep your accumulated knowledge.&lt;/p&gt;

&lt;p&gt;If a simpler setup works for you, use it. For some teams, a well-maintained &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, or project wiki is enough. But once you’re juggling multiple repos, repeated bugs, and long-running architecture decisions, plain text docs start to break down.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we ended up using
&lt;/h2&gt;

&lt;p&gt;We built this around &lt;strong&gt;PeKG&lt;/strong&gt;, a personal knowledge graph for MCP-compatible agents.&lt;/p&gt;

&lt;p&gt;It stores things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;implementation decisions&lt;/li&gt;
&lt;li&gt;codebase patterns&lt;/li&gt;
&lt;li&gt;bug fixes&lt;/li&gt;
&lt;li&gt;“don’t do this again” gotchas&lt;/li&gt;
&lt;li&gt;architecture knowledge&lt;/li&gt;
&lt;li&gt;relationships between concepts like &lt;code&gt;depends_on&lt;/code&gt;, &lt;code&gt;replaces&lt;/code&gt;, and &lt;code&gt;conflicts_with&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The useful part isn’t just storage. It’s that the knowledge gets compiled into something the agent can actually use later.&lt;/p&gt;

&lt;p&gt;So instead of raw notes like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Auth middleware broke because token refresh runs after route guard”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…you end up with structured, searchable knowledge the agent can pull back when it’s working on auth again next week.&lt;/p&gt;

&lt;p&gt;It also supports &lt;strong&gt;cross-project synthesis&lt;/strong&gt;, which is more valuable than I expected. If your agent learns a useful retry pattern in one Node service, it can reuse that idea in another project instead of rediscovering it from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  A minimal MCP setup example
&lt;/h2&gt;

&lt;p&gt;Here’s a simple Node example that connects an MCP client to a memory server over stdio.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @modelcontextprotocol/sdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Client&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/client/index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StdioClientTransport&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@modelcontextprotocol/sdk/client/stdio.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transport&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StdioClientTransport&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;pekg&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;memory-demo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transport&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listTools&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once connected, your agent can use MCP tools to ingest knowledge, search it, retrieve relevant context, and query relationships in the graph.&lt;/p&gt;

&lt;p&gt;PeKG exposes &lt;strong&gt;11 MCP tools&lt;/strong&gt; for this, including ingestion, graph queries, context retrieval, and deep scans of source files.&lt;/p&gt;

&lt;h2&gt;
  
  
  What made the biggest difference
&lt;/h2&gt;

&lt;p&gt;Three things mattered most:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Store decisions, not just facts
&lt;/h3&gt;

&lt;p&gt;The highest-value memory isn’t “Redis is installed.” It’s “We chose BullMQ over raw queues because we needed retry visibility.”&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Capture gotchas immediately
&lt;/h3&gt;

&lt;p&gt;If you wait until later, the weird details disappear. The best memory entry is the one created right after the issue is solved.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Let knowledge compound
&lt;/h3&gt;

&lt;p&gt;The real win is not one saved prompt. It’s when your agent stops repeating old mistakes across dozens of sessions.&lt;/p&gt;

&lt;p&gt;That’s where graph-based memory starts outperforming ad hoc notes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you’re already using an MCP-compatible agent and you’re tired of re-explaining your codebase every session, this is worth testing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check out &lt;a href="https://pekg.ai/docs" rel="noopener noreferrer"&gt;https://pekg.ai/docs&lt;/a&gt; for MCP setup&lt;/li&gt;
&lt;li&gt;See &lt;a href="https://pekg.ai/hints.txt" rel="noopener noreferrer"&gt;https://pekg.ai/hints.txt&lt;/a&gt; for 115 practical tips on capturing and organizing useful knowledge&lt;/li&gt;
&lt;li&gt;Try &lt;a href="https://app.pekg.ai" rel="noopener noreferrer"&gt;https://app.pekg.ai&lt;/a&gt; — free tier available with 100 articles and 1 project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The free tier is enough to see whether persistent memory actually changes how your agent works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;The models are getting better fast. But better reasoning doesn’t fix missing memory.&lt;/p&gt;

&lt;p&gt;If your agent forgets every lesson the moment the session ends, you’re paying an intelligence tax over and over again.&lt;/p&gt;

&lt;p&gt;Persistent memory doesn’t make an agent smarter. It makes it &lt;strong&gt;stop starting from zero&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;How are you handling agent memory today: giant prompts, repo docs, custom RAG, or something else? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- PeKG team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why your AI agent gets dumber over time (and how to fix memory drift)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Tue, 14 Apr 2026 19:39:14 +0000</pubDate>
      <link>https://dev.to/authora/why-your-ai-agent-gets-dumber-over-time-and-how-to-fix-memory-drift-3c78</link>
      <guid>https://dev.to/authora/why-your-ai-agent-gets-dumber-over-time-and-how-to-fix-memory-drift-3c78</guid>
      <description>&lt;p&gt;Last week, a coding agent in a test repo did something weird: it opened the &lt;em&gt;right&lt;/em&gt; files, referenced the &lt;em&gt;wrong&lt;/em&gt; API version, and confidently wrote code for a migration we had already rolled back.&lt;/p&gt;

&lt;p&gt;Nothing was “broken” in the usual sense. The prompts were fine. The tools were available. The model was good.&lt;/p&gt;

&lt;p&gt;The problem was memory drift.&lt;/p&gt;

&lt;p&gt;If you’ve built anything with long-running agents, you’ve probably seen it too: the agent starts strong, then gradually retrieves stale facts, outdated decisions, or half-relevant chunks from old work. Over time, its “memory” turns into a confidence amplifier for bad context.&lt;/p&gt;

&lt;p&gt;A lot of teams try to solve this with a bigger vector store. That helps… until it doesn’t.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real issue: vector stores decay quietly
&lt;/h2&gt;

&lt;p&gt;Vector stores are great for fuzzy retrieval. If your agent needs “something similar to this design doc” or “the auth code near this endpoint,” embeddings are useful.&lt;/p&gt;

&lt;p&gt;But agent memory is not just similarity search.&lt;/p&gt;

&lt;p&gt;It’s often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what changed&lt;/li&gt;
&lt;li&gt;what supersedes what&lt;/li&gt;
&lt;li&gt;who approved a decision&lt;/li&gt;
&lt;li&gt;which fact is still valid&lt;/li&gt;
&lt;li&gt;what depends on what&lt;/li&gt;
&lt;li&gt;what should never be forgotten&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s where vector-only memory starts to decay.&lt;/p&gt;

&lt;h3&gt;
  
  
  A simple example
&lt;/h3&gt;

&lt;p&gt;Suppose your agent stores these facts over time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;JWT auth is used for internal APIs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Moved to mTLS for service-to-service auth&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;JWT still used for browser sessions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Deprecated auth middleware in v3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Hotfix restored old middleware for admin routes&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A vector store can retrieve “similar auth-related stuff,” but it won’t naturally answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which statement is the latest truth?&lt;/li&gt;
&lt;li&gt;which fact overrides another?&lt;/li&gt;
&lt;li&gt;which context applies only to admin routes?&lt;/li&gt;
&lt;li&gt;which decision was temporary?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s not an embedding problem. That’s a relationship problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Knowledge graphs don’t replace vectors — they constrain them
&lt;/h2&gt;

&lt;p&gt;The best pattern I’ve seen is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;vector store for recall&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;knowledge graph for truth maintenance&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User query
   |
   v
[Vector Search] ---&amp;gt; finds possibly relevant notes/docs/chunks
   |
   v
[Knowledge Graph] ---&amp;gt; resolves relationships:
                      - supersedes
                      - depends_on
                      - approved_by
                      - valid_for
                      - expires_at
   |
   v
[LLM Context] ---&amp;gt; smaller, fresher, less contradictory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A knowledge graph gives your system structure around memory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;entities&lt;/strong&gt;: services, APIs, users, incidents, tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;edges&lt;/strong&gt;: &lt;code&gt;supersedes&lt;/code&gt;, &lt;code&gt;blocked_by&lt;/code&gt;, &lt;code&gt;owned_by&lt;/code&gt;, &lt;code&gt;approved_by&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;timestamps&lt;/strong&gt;: when a fact became true&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;scope&lt;/strong&gt;: where that fact applies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;confidence&lt;/strong&gt;: whether it’s canonical or provisional&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of asking “what text looks similar?”, you can ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What is the current auth method for internal APIs?”&lt;/li&gt;
&lt;li&gt;“What decision replaced this one?”&lt;/li&gt;
&lt;li&gt;“Which open task depends on this migration?”&lt;/li&gt;
&lt;li&gt;“What facts are stale after last deploy?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s how you stop memory from becoming a junk drawer.&lt;/p&gt;
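&lt;p&gt;To make those questions concrete, here is a minimal sketch in plain JavaScript with no graph library. The record fields (&lt;code&gt;scope&lt;/code&gt;, &lt;code&gt;ts&lt;/code&gt;, &lt;code&gt;supersedes&lt;/code&gt;, &lt;code&gt;temporary&lt;/code&gt;) and the helpers &lt;code&gt;currentFacts&lt;/code&gt; and &lt;code&gt;replacementFor&lt;/code&gt; are hypothetical names, and which fact supersedes which is an assumption made for illustration. Treating the admin-route hotfix as a full replacement of the deprecation is a deliberate simplification.&lt;/p&gt;

```javascript
// Hypothetical fact records built from the auth example above.
// Scopes and supersession links are assumptions for illustration.
const facts = [
  { id: "f1", text: "JWT auth is used for internal APIs", scope: "internal-apis", ts: 1, supersedes: null, temporary: false },
  { id: "f2", text: "Moved to mTLS for service-to-service auth", scope: "internal-apis", ts: 2, supersedes: "f1", temporary: false },
  { id: "f3", text: "JWT still used for browser sessions", scope: "browser-sessions", ts: 3, supersedes: null, temporary: false },
  { id: "f4", text: "Deprecated auth middleware in v3", scope: "auth-middleware", ts: 4, supersedes: null, temporary: false },
  { id: "f5", text: "Hotfix restored old middleware for admin routes", scope: "auth-middleware", ts: 5, supersedes: "f4", temporary: true },
];

// "What is the current truth for this scope?"
// A fact is current when no other fact in the same scope supersedes it.
function currentFacts(scope) {
  const inScope = facts.filter((f) => f.scope === scope);
  const replaced = new Set(
    inScope.map((f) => f.supersedes).filter((id) => id !== null)
  );
  return inScope.filter((f) => !replaced.has(f.id));
}

// "What decision replaced this one?" Returns the id of the replacing fact, or null.
function replacementFor(id) {
  const next = facts.find((f) => f.supersedes === id);
  return next ? next.id : null;
}

console.log(currentFacts("internal-apis").map((f) => f.text));
// => [ 'Moved to mTLS for service-to-service auth' ]
console.log(replacementFor("f1"));
// => f2
console.log(currentFacts("auth-middleware").map((f) => f.temporary));
// => [ true ]
```

&lt;p&gt;The &lt;code&gt;temporary&lt;/code&gt; flag is what lets a caller treat the hotfix differently from a permanent decision, which a similarity score alone cannot express.&lt;/p&gt;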

&lt;h2&gt;
  
  
  A practical rule of thumb
&lt;/h2&gt;

&lt;p&gt;Use a vector store when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;semantic search&lt;/li&gt;
&lt;li&gt;fuzzy recall&lt;/li&gt;
&lt;li&gt;document retrieval&lt;/li&gt;
&lt;li&gt;broad context gathering&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a knowledge graph when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;state over time&lt;/li&gt;
&lt;li&gt;versioned truth&lt;/li&gt;
&lt;li&gt;explicit dependencies&lt;/li&gt;
&lt;li&gt;conflict resolution&lt;/li&gt;
&lt;li&gt;auditable memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only use vectors, your agent will eventually retrieve both the old answer and the new answer and act like they’re equally valid.&lt;/p&gt;

&lt;h2&gt;
  
  
  A tiny runnable example
&lt;/h2&gt;

&lt;p&gt;Here’s a minimal Node (CommonJS) example that uses a graph to resolve the “latest truth” for a fact.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;graphology
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;graphology&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Graph&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth_v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;JWT for internal APIs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth_v2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mTLS for internal APIs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addDirectedEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth_v2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth_v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;supersedes&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;currentFact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;nodes&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;inDegree&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getNodeAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;value&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;currentFact&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth_v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;auth_v2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]));&lt;/span&gt;
&lt;span class="c1"&gt;// =&amp;gt; [ 'mTLS for internal APIs' ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Obviously, real systems need more than this. But the core idea matters: &lt;strong&gt;memory should encode replacement, not just storage&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in production
&lt;/h2&gt;

&lt;p&gt;A useful pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store raw docs, chats, and artifacts in a vector index&lt;/li&gt;
&lt;li&gt;Extract durable facts into a graph&lt;/li&gt;
&lt;li&gt;Mark facts with:

&lt;ul&gt;
&lt;li&gt;source&lt;/li&gt;
&lt;li&gt;timestamp&lt;/li&gt;
&lt;li&gt;scope&lt;/li&gt;
&lt;li&gt;confidence&lt;/li&gt;
&lt;li&gt;supersession links&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Retrieve from both systems&lt;/li&gt;
&lt;li&gt;Let the graph filter or rank what the LLM actually sees&lt;/li&gt;
&lt;/ol&gt;
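&lt;p&gt;Steps 4 and 5 can be sketched roughly as follows. The vector search is stubbed out as a list of hits with made-up similarity scores, and the graph layer is a plain &lt;code&gt;Map&lt;/code&gt;. The names &lt;code&gt;buildContext&lt;/code&gt;, &lt;code&gt;rank&lt;/code&gt;, &lt;code&gt;supersededBy&lt;/code&gt;, and the 0.1 bonus for canonical facts are assumptions, not part of any particular library.&lt;/p&gt;

```javascript
// Stand-in for vector search output: fact ids with made-up similarity scores.
const candidates = [
  { factId: "auth_v1", score: 0.91 },
  { factId: "auth_v2", score: 0.88 },
  { factId: "session_jwt", score: 0.64 },
];

// Stand-in for the graph layer: per-fact metadata (field names are illustrative).
const graphFacts = new Map([
  ["auth_v1", { text: "JWT for internal APIs", supersededBy: "auth_v2", confidence: "canonical" }],
  ["auth_v2", { text: "mTLS for internal APIs", supersededBy: null, confidence: "canonical" }],
  ["session_jwt", { text: "JWT still used for browser sessions", supersededBy: null, confidence: "provisional" }],
]);

// Similarity score plus a small, arbitrary bonus for canonical facts.
function rank(hit) {
  const bonus = graphFacts.get(hit.factId).confidence === "canonical" ? 0.1 : 0;
  return hit.score + bonus;
}

// Drop unknown or superseded facts, rank what is left, keep the top `limit`.
function buildContext(hits, limit) {
  return hits
    .filter((hit) => graphFacts.has(hit.factId))
    .filter((hit) => graphFacts.get(hit.factId).supersededBy === null)
    .sort((a, b) => rank(b) - rank(a))
    .slice(0, limit)
    .map((hit) => graphFacts.get(hit.factId).text);
}

console.log(buildContext(candidates, 2));
// => [ 'mTLS for internal APIs', 'JWT still used for browser sessions' ]
```

&lt;p&gt;Note that the superseded JWT fact has the highest similarity score and is still excluded. That is the point of letting the graph have the final say over what reaches the prompt.&lt;/p&gt;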

&lt;p&gt;If you already have a policy engine like OPA in your stack, this is also a good place to enforce rules like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;only approved memories can be treated as canonical&lt;/li&gt;
&lt;li&gt;expired decisions should not be retrieved&lt;/li&gt;
&lt;li&gt;temporary incident workarounds should not leak into normal planning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s usually a better answer than trying to prompt-engineer your way out of stale context.&lt;/p&gt;
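&lt;p&gt;If you don’t run a policy engine, the same three rules can be approximated with ordinary predicates at retrieval time. This is a sketch in plain JavaScript rather than OPA. The memory records, the &lt;code&gt;retrieve&lt;/code&gt; helper, and fields such as &lt;code&gt;approved&lt;/code&gt;, &lt;code&gt;expiresAt&lt;/code&gt;, and &lt;code&gt;incidentWorkaround&lt;/code&gt; are all hypothetical.&lt;/p&gt;

```javascript
// Hypothetical memory records; timestamps are plain numbers for readability.
const memories = [
  { id: "m1", text: "Use mTLS between services", approved: true, expiresAt: null, incidentWorkaround: false },
  { id: "m2", text: "Pin payments API to v2", approved: true, expiresAt: 1000, incidentWorkaround: false },
  { id: "m3", text: "Bypass cache for admin routes", approved: true, expiresAt: null, incidentWorkaround: true },
  { id: "m4", text: "Switch to a GraphQL gateway", approved: false, expiresAt: null, incidentWorkaround: false },
];

// ctx.now is the current time; ctx.mode is "planning" or "incident".
function retrieve(ctx) {
  return memories
    // Expired decisions should not be retrieved.
    .filter((m) => m.expiresAt === null || m.expiresAt > ctx.now)
    // Incident workarounds should not leak into normal planning.
    .filter((m) => !m.incidentWorkaround || ctx.mode === "incident")
    // Only approved memories are marked canonical; the rest stay provisional.
    .map((m) => ({ id: m.id, canonical: m.approved }));
}

console.log(retrieve({ now: 2000, mode: "planning" }));
// => [ { id: 'm1', canonical: true }, { id: 'm4', canonical: false } ]
```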

&lt;h2&gt;
  
  
  The trap nobody mentions
&lt;/h2&gt;

&lt;p&gt;The biggest mistake isn’t “using vectors.”&lt;/p&gt;

&lt;p&gt;It’s treating &lt;strong&gt;all memory as text&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Some memory is text.&lt;br&gt;
Some memory is state.&lt;br&gt;
Some memory is policy.&lt;br&gt;
Some memory is provenance.&lt;/p&gt;

&lt;p&gt;If you flatten all of that into embeddings, your agent can retrieve context — but it can’t reliably reason about whether that context is still true.&lt;/p&gt;

&lt;p&gt;That’s where drift starts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you’re building agents and want to pressure-test the surrounding security and tooling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to check your MCP server? Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npx @authora/agent-audit&lt;/code&gt; to scan your codebase&lt;/li&gt;
&lt;li&gt;Add a verified badge to your agent: &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt; for more resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My take
&lt;/h2&gt;

&lt;p&gt;Vector stores are still the right tool for retrieval.&lt;/p&gt;

&lt;p&gt;But if you want long-lived agents that don’t slowly poison themselves with stale context, you need something that models &lt;strong&gt;truth over time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Usually that means adding a knowledge graph, or at least graph-like relationships, on top of your retrieval layer.&lt;/p&gt;

&lt;p&gt;How are you handling agent memory today: pure RAG, graph-backed memory, or something else? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;This post was created with AI assistance.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why context windows keep breaking AI agents (and how knowledge graphs fix it)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Mon, 13 Apr 2026 19:32:21 +0000</pubDate>
      <link>https://dev.to/authora/why-context-windows-keep-breaking-ai-agents-and-how-knowledge-graphs-fix-it-52cf</link>
      <guid>https://dev.to/authora/why-context-windows-keep-breaking-ai-agents-and-how-knowledge-graphs-fix-it-52cf</guid>
      <description>&lt;p&gt;Last week, an agent in a coding workflow looked perfectly fine for the first 20 minutes.&lt;/p&gt;

&lt;p&gt;It knew the repo structure. It remembered the ticket. It even used the right MCP tools to inspect files and open a PR.&lt;/p&gt;

&lt;p&gt;Then the session got longer.&lt;/p&gt;

&lt;p&gt;A few more tool calls. More logs. More intermediate reasoning. More pasted docs. And suddenly the agent started acting like a teammate who joined the meeting halfway through: repeating work, forgetting constraints, and asking for data it had already seen.&lt;/p&gt;

&lt;p&gt;That’s the part nobody tells you about “long-running” AI agents: &lt;strong&gt;they don’t really have memory.&lt;/strong&gt; They have a &lt;strong&gt;context window budget&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And once that budget fills up, older facts get dropped, compressed, or mangled.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem: context overflow looks like bad reasoning
&lt;/h2&gt;

&lt;p&gt;When an agent fails after a long session, we often blame:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the model&lt;/li&gt;
&lt;li&gt;the prompt&lt;/li&gt;
&lt;li&gt;the tool&lt;/li&gt;
&lt;li&gt;the framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes the actual issue is simpler: &lt;strong&gt;the agent can’t keep all the important state in working memory anymore&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This gets worse with MCP-based agents because they’re constantly pulling in fresh context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool schemas&lt;/li&gt;
&lt;li&gt;file contents&lt;/li&gt;
&lt;li&gt;API responses&lt;/li&gt;
&lt;li&gt;policy docs&lt;/li&gt;
&lt;li&gt;previous actions&lt;/li&gt;
&lt;li&gt;approval requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If everything is shoved back into the prompt on every turn, you eventually hit a wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why summarization alone isn’t enough
&lt;/h2&gt;

&lt;p&gt;A common fix is “just summarize old context.”&lt;/p&gt;

&lt;p&gt;That helps, but summaries are lossy. They flatten details that may become important later.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“User asked to deploy only to staging”&lt;/li&gt;
&lt;li&gt;“Database migration requires approval”&lt;/li&gt;
&lt;li&gt;“This MCP server can read secrets but not rotate them”&lt;/li&gt;
&lt;li&gt;“Alice delegated access to build-bot for 2 hours”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those aren’t just notes. They’re &lt;strong&gt;relationships&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you summarize them too aggressively, the agent loses the structure that tells it &lt;em&gt;why&lt;/em&gt; something matters.&lt;/p&gt;
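&lt;p&gt;Take the delegation note as an example. A summary such as “Alice gave build-bot access” silently drops the two-hour limit. Stored as a structured edge, the limit stays checkable. The sketch below is illustrative only: the field names, the &lt;code&gt;isDelegationValid&lt;/code&gt; helper, and the grant time are assumptions.&lt;/p&gt;

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical structured form of "Alice delegated access to build-bot for 2 hours".
const delegation = {
  from: "alice",
  to: "build-bot",
  rel: "delegated",
  grantedAt: Date.parse("2026-04-13T10:00:00Z"), // made-up grant time
  ttlMs: 2 * HOUR_MS,
};

// The delegation is valid while the current time is before grantedAt + ttlMs.
function isDelegationValid(edge, now) {
  return edge.grantedAt + edge.ttlMs > now;
}

console.log(isDelegationValid(delegation, Date.parse("2026-04-13T11:00:00Z")));
// => true
console.log(isDelegationValid(delegation, Date.parse("2026-04-13T12:30:00Z")));
// => false
```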

&lt;h2&gt;
  
  
  Knowledge graphs work better because agents need relationships, not transcripts
&lt;/h2&gt;

&lt;p&gt;Instead of storing memory as a giant conversation log, store it as connected facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;entities&lt;/strong&gt;: user, repo, server, token, environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;actions&lt;/strong&gt;: deployed, approved, delegated, scanned&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;relationships&lt;/strong&gt;: can-access, owns, depends-on, blocked-by, approved-by&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives the agent a memory system it can query, instead of a transcript it has to reread.&lt;/p&gt;

&lt;p&gt;A simple mental model:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User: Alice] --delegated--&amp;gt; [Agent: build-bot]
[build-bot] --can-access--&amp;gt; [Repo: checkout-service]
[Repo: checkout-service] --deploys-to--&amp;gt; [Env: staging]
[Env: production] --requires--&amp;gt; [Approval: human]
[MCP: deploy-server] --exposes--&amp;gt; [Tool: deploy_app]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent doesn’t need the full transcript to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can I deploy this?&lt;/li&gt;
&lt;li&gt;Who approved it?&lt;/li&gt;
&lt;li&gt;Which MCP tool should I use?&lt;/li&gt;
&lt;li&gt;Is this delegation still valid?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It just queries the graph.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;You don’t need a PhD project here. A lightweight pattern works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Keep short-term context in the prompt&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;current task&lt;/li&gt;
&lt;li&gt;latest tool outputs&lt;/li&gt;
&lt;li&gt;immediate plan&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Store durable memory in a graph&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identities&lt;/li&gt;
&lt;li&gt;permissions&lt;/li&gt;
&lt;li&gt;resources&lt;/li&gt;
&lt;li&gt;prior decisions&lt;/li&gt;
&lt;li&gt;tool capabilities&lt;/li&gt;
&lt;li&gt;delegation chains&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Retrieve only relevant subgraphs per step&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;not the whole history&lt;/li&gt;
&lt;li&gt;just the facts connected to the current task&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This matters a lot for MCP because tool usage is rarely just “call function X.” It’s usually constrained by identity, access, policy, and prior state.&lt;/p&gt;
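&lt;p&gt;Step 3, retrieving only the relevant subgraph, can be sketched as a bounded breadth-first walk. The edge list reuses the relationships from the mental model above. The &lt;code&gt;subgraph&lt;/code&gt; helper and its &lt;code&gt;maxHops&lt;/code&gt; parameter are hypothetical, and the walk follows edges in both directions, which is one possible choice rather than the only one.&lt;/p&gt;

```javascript
// Edges as [from, relationship, to], taken from the mental model above.
const edges = [
  ["alice", "delegated", "build-bot"],
  ["build-bot", "can-access", "checkout-service"],
  ["checkout-service", "deploys-to", "staging"],
  ["production", "requires", "human-approval"],
  ["deploy-server", "exposes", "deploy_app"],
];

// Collect every edge within `maxHops` of `start`, ignoring edge direction.
function subgraph(start, maxHops) {
  const seen = new Set([start]);
  const picked = [];
  let frontier = [start];
  for (let hop = 0; hop !== maxHops; hop++) {
    const next = [];
    for (const edge of edges) {
      const [from, , to] = edge;
      if (!(frontier.includes(from) || frontier.includes(to))) continue;
      if (!picked.includes(edge)) picked.push(edge);
      for (const node of [from, to]) {
        if (!seen.has(node)) {
          seen.add(node);
          next.push(node);
        }
      }
    }
    frontier = next;
  }
  return picked.map(([from, rel, to]) => `${from} --${rel}--> ${to}`);
}

console.log(subgraph("build-bot", 1));
// => [ 'alice --delegated--> build-bot', 'build-bot --can-access--> checkout-service' ]
```

&lt;p&gt;With two hops the &lt;code&gt;deploys-to&lt;/code&gt; edge for staging is included as well, while the unrelated production and deploy-server facts never enter the prompt.&lt;/p&gt;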

&lt;h2&gt;
  
  
  A tiny runnable example
&lt;/h2&gt;

&lt;p&gt;Here’s a simple Node example using an in-memory graph to model agent memory as relationships instead of chat history:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;graphology
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;graphology&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Graph&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;alice&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build-bot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;env&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;env&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addDirectedEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;alice&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build-bot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;delegated&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addDirectedEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build-bot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;can_deploy&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addDirectedEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build-bot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;requires_human_approval&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Agent can deploy to staging:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hasDirectedEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build-bot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Production approval required:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;someEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;build-bot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;edge&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getEdgeAttribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rel&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;requires_human_approval&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That example is intentionally small, but the pattern scales: &lt;strong&gt;store facts and relationships once, retrieve them when needed&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is especially useful for MCP agents
&lt;/h2&gt;

&lt;p&gt;MCP gives agents a clean way to interact with tools, but the hard part isn’t just tool calling. It’s knowing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which tools exist&lt;/li&gt;
&lt;li&gt;which identity the agent is acting as&lt;/li&gt;
&lt;li&gt;what scope that identity has&lt;/li&gt;
&lt;li&gt;whether delegation is valid&lt;/li&gt;
&lt;li&gt;whether the action needs approval&lt;/li&gt;
&lt;li&gt;what happened earlier in the workflow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s memory, and memory is mostly graph-shaped.&lt;/p&gt;

&lt;p&gt;If you’re already using OPA or another policy engine for authorization, great — keep using it. A knowledge graph doesn’t replace policy. It gives the agent a better way to remember the facts that policy depends on.&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User request
   |
   v
Agent runtime
   |
   +--&amp;gt; short-term prompt context
   |
   +--&amp;gt; graph lookup
   |      - identities
   |      - tool permissions
   |      - prior approvals
   |      - resource relationships
   |
   +--&amp;gt; MCP tool call
   |
   +--&amp;gt; write new facts back to graph
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is how you stop agents from “forgetting” critical constraints halfway through a workflow.&lt;/p&gt;

&lt;p&gt;Not by making the prompt longer.&lt;/p&gt;

&lt;p&gt;By giving the agent a memory model that matches the problem.&lt;/p&gt;
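&lt;p&gt;Here’s a minimal sketch of that loop, assuming an in-memory fact store in place of a real graph database. The names &lt;code&gt;addFact&lt;/code&gt;, &lt;code&gt;hasFact&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;, and &lt;code&gt;fakeDeployTool&lt;/code&gt; are invented for illustration, as is the &lt;code&gt;approved_for&lt;/code&gt; relation, and the MCP tool call is stubbed as a plain callback.&lt;/p&gt;

```javascript
// In-memory stand-in for a graph: each fact is a "subject|relation|object" string.
const facts = new Set();

function addFact(subject, relation, object) {
  facts.add([subject, relation, object].join("|"));
}

function hasFact(subject, relation, object) {
  return facts.has([subject, relation, object].join("|"));
}

// Same relationships as the graph example earlier in this post.
addFact("build-bot", "can_deploy", "staging");
addFact("production", "requires_human_approval", "build-bot");

// Graph lookup first, then the tool call, then write the outcome back as a new fact.
function deploy(agent, env, runTool) {
  const needsApproval = hasFact(env, "requires_human_approval", agent);
  if (needsApproval) {
    // "approved_for" is a hypothetical relation a human reviewer would add.
    if (!hasFact(env, "approved_for", agent)) {
      return { ok: false, reason: "needs human approval" };
    }
  } else if (!hasFact(agent, "can_deploy", env)) {
    return { ok: false, reason: "not permitted" };
  }
  const output = runTool(env); // stand-in for the real MCP tool call
  addFact(agent, "deployed_to", env);
  return { ok: true, output: output };
}

// Hypothetical tool stub so the sketch runs without an MCP server.
function fakeDeployTool(env) {
  return "deployed to " + env;
}

console.log(deploy("build-bot", "staging", fakeDeployTool));
console.log(deploy("build-bot", "production", fakeDeployTool));
```

&lt;p&gt;The staging call succeeds and records a &lt;code&gt;deployed_to&lt;/code&gt; fact; the production call is refused until an approval fact exists. The constraint lives in the store, so it survives no matter how long the prompt gets.&lt;/p&gt;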

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you’re working with MCP servers or agent security, here are a few free tools that are actually useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to check your MCP server? Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npx @authora/agent-audit&lt;/code&gt; to scan your codebase&lt;/li&gt;
&lt;li&gt;Add a verified badge to your agent: &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt; for more resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;If your agent gets worse as the session gets longer, you may not have a reasoning problem.&lt;/p&gt;

&lt;p&gt;You may have a memory architecture problem.&lt;/p&gt;

&lt;p&gt;Context windows are great for working memory. They’re terrible as a source of truth.&lt;/p&gt;

&lt;p&gt;Knowledge graphs won’t magically fix every agent, but they’re one of the most practical ways to preserve identity, permissions, and task state without drowning the model in its own transcript.&lt;/p&gt;

&lt;p&gt;How are you handling agent memory today — summaries, vector search, graphs, or something else? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Why Copilot Spaces still loses the plot — and how knowledge graphs fix it</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Mon, 13 Apr 2026 05:05:37 +0000</pubDate>
      <link>https://dev.to/authora/why-copilot-spaces-still-loses-the-plot-and-how-knowledge-graphs-fix-it-1371</link>
      <guid>https://dev.to/authora/why-copilot-spaces-still-loses-the-plot-and-how-knowledge-graphs-fix-it-1371</guid>
      <description>&lt;p&gt;Last week, a coding agent on a shared repo did something weirdly familiar: it opened the right files, read the right docs, and still made the wrong change.&lt;/p&gt;

&lt;p&gt;Not because the model was bad.&lt;/p&gt;

&lt;p&gt;Not because the prompt was weak.&lt;/p&gt;

&lt;p&gt;Because it had &lt;em&gt;documents&lt;/em&gt;, but not &lt;em&gt;context&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That’s the gap a lot of “AI workspace” features still miss. They’re good at bundling files, notes, and chats into a place the model can search. But when your agent needs to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Which service owns this endpoint?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What policy applies to this tool call?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Which secrets are allowed in staging but not prod?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Who delegated permission to this agent?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;What changed since the last sprint?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…a folder full of text chunks stops being enough.&lt;/p&gt;

&lt;p&gt;You don’t just need retrieval. You need relationships.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual problem: context is not a pile of files
&lt;/h2&gt;

&lt;p&gt;A lot of current AI tooling treats context like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;context = docs + code + chat history + search results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works for “summarize this file” or “find where this function is used.”&lt;/p&gt;

&lt;p&gt;It breaks down when context is &lt;em&gt;structural&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In real systems, meaning lives in edges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;service &lt;strong&gt;depends on&lt;/strong&gt; database&lt;/li&gt;
&lt;li&gt;agent &lt;strong&gt;acts on behalf of&lt;/strong&gt; user&lt;/li&gt;
&lt;li&gt;tool &lt;strong&gt;requires&lt;/strong&gt; approval&lt;/li&gt;
&lt;li&gt;API key &lt;strong&gt;belongs to&lt;/strong&gt; environment&lt;/li&gt;
&lt;li&gt;PR &lt;strong&gt;implements&lt;/strong&gt; ticket&lt;/li&gt;
&lt;li&gt;policy &lt;strong&gt;applies to&lt;/strong&gt; action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Copilot-style space can collect the nouns. A knowledge graph helps the agent reason over the verbs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a knowledge graph gives an agent
&lt;/h2&gt;

&lt;p&gt;A knowledge graph isn’t magic. It’s just a way to store entities and relationships so context becomes queryable instead of fuzzy.&lt;/p&gt;

&lt;p&gt;Here’s the difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Files:
- payments.md
- auth.md
- staging.env
- sprint-24-notes.md

Knowledge graph:
[Agent A] --delegated_by--&amp;gt; [User B]
[Agent A] --allowed_to_use--&amp;gt; [Tool: deploy-staging]
[deploy-staging] --requires--&amp;gt; [Approval: ops]
[Service: payments-api] --depends_on--&amp;gt; [DB: ledger]
[PR-1842] --implements--&amp;gt; [Ticket: BILL-932]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the agent can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Can I run this tool?”&lt;/li&gt;
&lt;li&gt;“What service will this migration affect?”&lt;/li&gt;
&lt;li&gt;“Which approval path applies here?”&lt;/li&gt;
&lt;li&gt;“What changed that might explain this failure?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s much closer to how senior engineers actually reason.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple mental model
&lt;/h2&gt;

&lt;p&gt;Think of it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;            +-------------------+
            |   Docs / Code     |
            |   Notes / Chats   |
            +---------+---------+
                      |
                   extract
                      v
+---------+   relates_to   +---------+   requires   +---------+
| Agent   |---------------&amp;gt;| Tool    |-------------&amp;gt;| Approval|
+---------+                +---------+              +---------+
     | owns                     |
     |                          | affects
     v                          v
+---------+   depends_on   +---------+
| Service |---------------&amp;gt;| Database|
+---------+                +---------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Search finds text.&lt;/p&gt;

&lt;p&gt;Graphs preserve meaning.&lt;/p&gt;

&lt;p&gt;You usually want both.&lt;/p&gt;

&lt;h2&gt;
  
  
  A tiny example with Neo4j
&lt;/h2&gt;

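&lt;p&gt;If you want to feel the difference, here’s a minimal example with Neo4j. It assumes a local Neo4j instance listening on &lt;code&gt;bolt://localhost:7687&lt;/code&gt; with the credentials shown; adjust both for your setup.&lt;br&gt;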
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;neo4j-driver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;neo4j&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;neo4j-driver&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;neo4j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bolt://localhost:7687&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;neo4j&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;neo4j&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;password&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;session&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    MERGE (a:Agent {name: "release-bot"})
    MERGE (t:Tool {name: "deploy-staging"})
    MERGE (ap:Approval {name: "ops-approval"})
    MERGE (a)-[:ALLOWED_TO_USE]-&amp;gt;(t)
    MERGE (t)-[:REQUIRES]-&amp;gt;(ap)
  `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`
    MATCH (a:Agent {name: "release-bot"})-[:ALLOWED_TO_USE]-&amp;gt;(t)-[:REQUIRES]-&amp;gt;(ap)
    RETURN a.name AS agent, t.name AS tool, ap.name AS approval
  `&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;records&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;toObject&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;release-bot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deploy-staging&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;approval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ops-approval&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s obviously tiny, but the pattern scales:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ingest code metadata&lt;/li&gt;
&lt;li&gt;ingest docs and ownership data&lt;/li&gt;
&lt;li&gt;ingest identity and policy relationships&lt;/li&gt;
&lt;li&gt;query the graph before the agent acts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your need is mostly authorization, a policy engine like OPA may be the right primary tool. But if your agent also needs to understand ownership, dependencies, delegation, and task history together, a graph becomes incredibly useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this matters most
&lt;/h2&gt;

&lt;p&gt;I’ve seen this show up in four places:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tool use
&lt;/h3&gt;

&lt;p&gt;Agents need more than “here are 20 tools.” They need to know which tools are safe, who approved access, and what each action touches.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Shared codebases
&lt;/h3&gt;

&lt;p&gt;When multiple agents work in parallel, context isn’t just code. It’s locks, sprint boundaries, ownership, and what another agent already changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Identity and delegation
&lt;/h3&gt;

&lt;p&gt;“Why was this agent allowed to do that?” is a graph question. User → delegation chain → role → tool → action.&lt;/p&gt;
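&lt;p&gt;As a rough sketch of that question, here is a breadth-first walk over a tiny adjacency map. Every node name is made up for illustration, and a real system would run this as a path query in its graph store rather than in application code.&lt;/p&gt;

```javascript
// Hypothetical delegation data: each key points at the nodes it grants or leads to.
const grants = {
  "user:alice": ["agent:release-bot"],
  "agent:release-bot": ["role:deployer"],
  "role:deployer": ["tool:deploy-staging"],
  "tool:deploy-staging": ["action:deploy"],
};

// Breadth-first search that returns the first path from start to goal, or null.
function explainAccess(start, goal) {
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length !== 0) {
    const path = queue.shift();
    const node = path[path.length - 1];
    if (node === goal) {
      return path;
    }
    for (const next of grants[node] || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(path.concat([next]));
      }
    }
  }
  return null;
}

const chain = explainAccess("user:alice", "action:deploy");
console.log(chain === null ? "no delegation path" : chain.join(" → "));
```

&lt;p&gt;If the walk returns &lt;code&gt;null&lt;/code&gt;, nothing in the recorded chain explains the action, and that is usually the more interesting finding.&lt;/p&gt;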

&lt;h3&gt;
  
  
  4. Security investigations
&lt;/h3&gt;

&lt;p&gt;When something goes wrong, you want connected evidence, not scattered logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The practical takeaway
&lt;/h2&gt;

&lt;p&gt;If your current setup is “RAG over docs plus a long system prompt,” you’re not doing it wrong.&lt;/p&gt;

&lt;p&gt;You’re just handling one kind of context.&lt;/p&gt;

&lt;p&gt;The missing layer is a model of relationships your agent can query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;who&lt;/li&gt;
&lt;li&gt;can do what&lt;/li&gt;
&lt;li&gt;to which resource&lt;/li&gt;
&lt;li&gt;under which policy&lt;/li&gt;
&lt;li&gt;with whose approval&lt;/li&gt;
&lt;li&gt;based on what prior state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what knowledge graphs are good at.&lt;/p&gt;

&lt;p&gt;Not as a replacement for search. As the thing that stops search from being your only hammer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you’re working on agent security, identity, or MCP tooling, these free tools are useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to check your MCP server? Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npx @authora/agent-audit&lt;/code&gt; to scan your codebase&lt;/li&gt;
&lt;li&gt;Add a verified badge to your agent: &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt; for more resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re already building agent context layers, I’d love to know: are you still using plain retrieval, or have you started modeling relationships too?&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>MCP command injection is worse than it looks (here’s how to actually defend it)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Sun, 12 Apr 2026 10:56:30 +0000</pubDate>
      <link>https://dev.to/authora/mcp-command-injection-is-worse-than-it-looks-heres-how-to-actually-defend-it-4mgf</link>
      <guid>https://dev.to/authora/mcp-command-injection-is-worse-than-it-looks-heres-how-to-actually-defend-it-4mgf</guid>
      <description>&lt;p&gt;Last week, a perfectly normal MCP tool turned into a shell.&lt;/p&gt;

&lt;p&gt;The setup looked harmless: an AI agent needed to query logs, so the MCP server exposed a &lt;code&gt;search_logs&lt;/code&gt; tool. The tool accepted a string, passed it into a shell command, and returned the result. Then someone asked the agent to “search for errors from today; also show &lt;code&gt;/etc/hosts&lt;/code&gt; if it helps debug.”&lt;/p&gt;

&lt;p&gt;You can guess what happened next.&lt;/p&gt;

&lt;p&gt;This is the part of MCP security that’s easy to underestimate: &lt;strong&gt;the dangerous bug usually isn’t in the protocol itself&lt;/strong&gt;. It’s in the layer where tool inputs get stitched into shell commands, SQL queries, file paths, or internal API calls.&lt;/p&gt;

&lt;p&gt;And because MCP gives agents a clean way to discover and invoke tools, those bugs become &lt;strong&gt;reachable at scale&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP command injection is a bigger deal than “just sanitize input”
&lt;/h2&gt;

&lt;p&gt;A normal web app command injection bug is already bad.&lt;/p&gt;

&lt;p&gt;An MCP command injection bug is worse because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tools are designed to be called programmatically&lt;/li&gt;
&lt;li&gt;agents can chain tool calls automatically&lt;/li&gt;
&lt;li&gt;a single prompt can influence multiple downstream actions&lt;/li&gt;
&lt;li&gt;the vulnerable surface is often hidden behind “helpful” abstractions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your MCP server exposes tools like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;run_tests&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;grep_logs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;convert_file&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;git_diff&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ping_host&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…you may have built a remote execution surface without meaning to.&lt;/p&gt;

&lt;p&gt;A lot of teams are trying to solve this one tool at a time. That helps, but it misses the pattern.&lt;/p&gt;

&lt;p&gt;The better approach is to model these flaws as a &lt;strong&gt;security knowledge graph&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I mean by a security knowledge graph
&lt;/h2&gt;

&lt;p&gt;Instead of tracking isolated bugs, map the relationships:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Agent Prompt]
      |
      v
[Tool Call: search_logs]
      |
      v
[Argument: query="error; cat /etc/passwd"]
      |
      v
[Sink: exec("grep " + query + " /var/log/app.log")]
      |
      v
[Impact: command injection]
      |
      +--&amp;gt; [Reads secrets]
      +--&amp;gt; [Moves laterally]
      +--&amp;gt; [Poisons outputs]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That graph gives you more than a vulnerability report. It tells you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which tools are high risk&lt;/li&gt;
&lt;li&gt;which input fields reach dangerous sinks&lt;/li&gt;
&lt;li&gt;which agents can invoke them&lt;/li&gt;
&lt;li&gt;what approvals, policies, or sandboxing should exist&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is useful because MCP security isn’t just “is this tool vulnerable?” It’s also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;who can call it&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;under what delegation chain&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;with what runtime constraints&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;what other systems it can reach&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already use OPA for policy, this is a great fit. Let your graph identify risky edges, then use policy to block or require approval for them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bug, in 8 lines
&lt;/h2&gt;

&lt;p&gt;Here’s the classic mistake in Node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;express
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;execSync&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;child_process&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/search&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`grep &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; /var/log/system.log`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;encoding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;utf8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;listening on :3000&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That “works” right up until &lt;code&gt;q&lt;/code&gt; contains shell metacharacters.&lt;/p&gt;

&lt;p&gt;The fix is not “be more careful.” The fix is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avoid shell invocation when possible&lt;/li&gt;
&lt;li&gt;use parameterized APIs&lt;/li&gt;
&lt;li&gt;validate against strict allowlists&lt;/li&gt;
&lt;li&gt;run tools in sandboxes&lt;/li&gt;
&lt;li&gt;attach identity and authorization to tool execution&lt;/li&gt;
&lt;li&gt;log invocation lineage so you can see who called what, through which agent&lt;/li&gt;
&lt;/ul&gt;
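&lt;p&gt;A hedged sketch of the first three items, using Node’s built-in &lt;code&gt;execFileSync&lt;/code&gt;, which passes arguments as an array and does not spawn a shell by default. The allowlist pattern and the function names &lt;code&gt;buildGrepArgs&lt;/code&gt; and &lt;code&gt;searchLogs&lt;/code&gt; are assumptions for this example; tune the pattern to what your tool really needs to accept.&lt;/p&gt;

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical allowlist: letters, digits, spaces, dots, underscores, and dashes only.
const SAFE_QUERY = /^[A-Za-z0-9 ._-]{1,64}$/;

// Validate the query and build an argument array; nothing here is parsed by a shell.
function buildGrepArgs(query, logPath) {
  if (typeof query !== "string" || !SAFE_QUERY.test(query)) {
    throw new Error("query rejected by allowlist");
  }
  // -F treats the query as a fixed string; "--" ends option parsing,
  // so a query that starts with "-" cannot be read as a grep flag.
  return ["-F", "--", query, logPath];
}

function searchLogs(query, logPath) {
  const args = buildGrepArgs(query, logPath);
  try {
    return execFileSync("grep", args, { encoding: "utf8" });
  } catch (err) {
    // grep exits with status 1 when nothing matches; treat that as an empty result.
    if (err.status === 1) {
      return "";
    }
    throw err;
  }
}
```

&lt;p&gt;With this shape, &lt;code&gt;error; cat /etc/passwd&lt;/code&gt; is rejected by the allowlist before any process starts. Even if the allowlist were looser, the string would reach &lt;code&gt;grep&lt;/code&gt; as a single literal argument instead of being split into a second command. Sandboxing, authorization, and lineage logging still have to be layered on top.&lt;/p&gt;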

&lt;h2&gt;
  
  
  Build the graph from four node types
&lt;/h2&gt;

&lt;p&gt;You don’t need a giant platform to start. A spreadsheet or graph DB is enough if the model is right.&lt;/p&gt;

&lt;p&gt;I’d start with these node types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agents&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Which agent, session, or delegated identity initiated the call?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
What MCP tool was invoked? What are its declared parameters?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Sinks&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Does the tool reach &lt;code&gt;exec&lt;/code&gt;, filesystem writes, SQL, HTTP callbacks, or template rendering?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Impacts&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
What happens if exploited: RCE, data exfil, secret access, repo tampering?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then add edges like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CAN_CALL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PASSES_INPUT_TO&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;REACHES_SINK&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;REQUIRES_APPROVAL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;EXFILTRATES_TO&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you have that, useful questions become easy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tools can reach shell execution?&lt;/li&gt;
&lt;li&gt;Which shell-reaching tools are callable by untrusted agents?&lt;/li&gt;
&lt;li&gt;Which of those also have access to secrets or internal networks?&lt;/li&gt;
&lt;li&gt;Which ones are missing approval gates or audit trails?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s how you move from “we found one injection bug” to “we understand our agent attack surface.”&lt;/p&gt;
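&lt;p&gt;To make the second and fourth questions concrete, here’s a small sketch over a plain edge list. The agents, tools, and trust labels are all invented for illustration; the edge names match the ones listed above.&lt;/p&gt;

```javascript
// Hypothetical inventory: every identifier below is illustrative.
const edges = [
  { from: "agent:untrusted-helper", rel: "CAN_CALL", to: "tool:search_logs" },
  { from: "agent:untrusted-helper", rel: "CAN_CALL", to: "tool:git_diff" },
  { from: "agent:release-bot", rel: "CAN_CALL", to: "tool:run_tests" },
  { from: "tool:search_logs", rel: "REACHES_SINK", to: "sink:exec" },
  { from: "tool:git_diff", rel: "REACHES_SINK", to: "sink:exec" },
  { from: "tool:run_tests", rel: "REACHES_SINK", to: "sink:exec" },
  { from: "tool:git_diff", rel: "REQUIRES_APPROVAL", to: "approval:ops" },
];
const untrustedAgents = ["agent:untrusted-helper"];

// All nodes reachable from `from` over a single edge of type `rel`.
function targets(from, rel) {
  return edges
    .filter(function (edge) { return edge.from === from; })
    .filter(function (edge) { return edge.rel === rel; })
    .map(function (edge) { return edge.to; });
}

// Tools an untrusted agent can call that reach exec and have no approval gate.
function ungatedShellTools() {
  const found = new Set();
  for (const agent of untrustedAgents) {
    for (const tool of targets(agent, "CAN_CALL")) {
      const reachesShell = targets(tool, "REACHES_SINK").includes("sink:exec");
      const gated = targets(tool, "REQUIRES_APPROVAL").length !== 0;
      if (reachesShell) {
        if (!gated) {
          found.add(tool);
        }
      }
    }
  }
  return Array.from(found);
}

console.log(ungatedShellTools());
```

&lt;p&gt;Here only &lt;code&gt;tool:search_logs&lt;/code&gt; is flagged: &lt;code&gt;tool:git_diff&lt;/code&gt; has an approval gate, and &lt;code&gt;tool:run_tests&lt;/code&gt; isn’t callable by the untrusted agent. The same query in a graph database is a two-hop match with a missing-edge filter.&lt;/p&gt;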

&lt;h2&gt;
  
  
  What good defenses look like
&lt;/h2&gt;

&lt;p&gt;The strongest MCP setups usually combine several layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;safe tool implementation&lt;/strong&gt;: no shell where libraries exist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;policy enforcement&lt;/strong&gt;: block risky tools for low-trust agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sandboxing&lt;/strong&gt;: assume some tool will eventually fail open&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;identity + delegation tracking&lt;/strong&gt;: know the real caller, not just the app&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;audit logging&lt;/strong&gt;: preserve the path from prompt to tool invocation to side effect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re deciding where to start, start with &lt;strong&gt;inventory&lt;/strong&gt;. Most teams don’t know which MCP tools are exposing dangerous sinks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to check your own MCP surface:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scan an MCP server for security and spec issues:&lt;/strong&gt; &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scan codebases or remote MCP servers from CI/terminal:&lt;/strong&gt; &lt;code&gt;npx @authora/agent-audit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a verified identity badge to your agent:&lt;/strong&gt; &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browse more agent security resources:&lt;/strong&gt; &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are all free and useful whether you’re building from scratch or cleaning up an existing server.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;MCP command injection flaws are rarely isolated bugs. They’re usually &lt;strong&gt;graph problems&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an agent can call a tool&lt;/li&gt;
&lt;li&gt;the tool passes input to a dangerous sink&lt;/li&gt;
&lt;li&gt;the sink can reach something valuable&lt;/li&gt;
&lt;li&gt;nobody modeled the chain end-to-end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you map that chain, the fixes get much clearer.&lt;/p&gt;
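&lt;p&gt;Mapping the chain can start as a breadth-first search over a small edge list. This is only a sketch, and every node name below is made up for illustration:&lt;/p&gt;

```javascript
// Hypothetical edges: agent -> tool -> sink -> asset. All names are illustrative.
const edges = {
  "agent:untrusted": ["tool:run_script", "tool:search_docs"],
  "tool:run_script": ["sink:shell"],
  "tool:search_docs": ["sink:http_get"],
  "sink:shell": ["asset:prod_secrets"],
  "sink:http_get": [],
};

// Breadth-first search that returns the first path from start to goal, or null.
function findPath(start, goal) {
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const node = path[path.length - 1];
    if (node === goal) return path;
    for (const next of edges[node] || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null;
}

console.log(findPath("agent:untrusted", "asset:prod_secrets"));
```

&lt;p&gt;Any non-null result is a chain worth reviewing: it names the tool and the sink that connect an agent to something valuable.&lt;/p&gt;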

&lt;p&gt;How are you modeling trust and dangerous tool paths in your MCP stack? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why AI coding agents keep making the same mistakes (and how to stop it)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Sat, 11 Apr 2026 20:12:36 +0000</pubDate>
      <link>https://dev.to/authora/why-ai-coding-agents-keep-making-the-same-mistakes-and-how-to-stop-it-bbo</link>
      <guid>https://dev.to/authora/why-ai-coding-agents-keep-making-the-same-mistakes-and-how-to-stop-it-bbo</guid>
      <description>&lt;p&gt;Last Tuesday, a coding agent opened a PR that looked perfect.&lt;/p&gt;

&lt;p&gt;Tests passed. Types checked. The diff was clean.&lt;/p&gt;

&lt;p&gt;Then a teammate noticed it had “fixed” the same bug three times in three different files, each in a slightly different way. Two hours later, another agent reverted part of that work because it didn’t know the first change existed. By the end of the day, the codebase had more churn, more tokens burned, and less confidence than before.&lt;/p&gt;

&lt;p&gt;If you’re using Claude Code, Cursor, Copilot, Devin, or homegrown agents, this probably sounds familiar.&lt;/p&gt;

&lt;p&gt;AI coding agents don’t keep repeating mistakes because they’re “bad at coding.” They do it because most teams are giving them &lt;strong&gt;no durable identity, no shared memory, and no safe boundary for tools&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That combination breaks fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem
&lt;/h2&gt;

&lt;p&gt;Most agent workflows still look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Human prompt -&amp;gt; Agent session -&amp;gt; Tools/files/APIs -&amp;gt; Code change
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What’s missing?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Identity&lt;/strong&gt;: who is this agent, exactly?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context continuity&lt;/strong&gt;: is this the same agent as yesterday, or a fresh one with no memory?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordination&lt;/strong&gt;: does it know another agent is editing the same file?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool trust&lt;/strong&gt;: should this MCP server or tool even be callable?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy&lt;/strong&gt;: what is allowed without approval?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without those, agents keep falling into the same loop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No identity
   ↓
No trust / no permissions model
   ↓
Over-broad tool access
   ↓
Repeated bad actions
   ↓
Humans clean up
   ↓
New session starts from scratch
   ↓
Same mistakes again
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why this happens in practice
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1) Stateless sessions masquerade as teammates
&lt;/h3&gt;

&lt;p&gt;A lot of “agent collaboration” is really just isolated sessions writing to the same repo.&lt;/p&gt;

&lt;p&gt;That means the agent doesn’t actually know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what it changed last run&lt;/li&gt;
&lt;li&gt;what another agent is changing right now&lt;/li&gt;
&lt;li&gt;what was explicitly approved vs guessed&lt;/li&gt;
&lt;li&gt;which tools are safe to use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So it re-derives everything from the current prompt and local context. That’s why you see the same refactor, the same broken migration, or the same insecure config suggestion over and over.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) MCP makes tool use easier — and mistakes cheaper to repeat
&lt;/h3&gt;

&lt;p&gt;MCP is great because it standardizes how agents discover and call tools.&lt;/p&gt;

&lt;p&gt;It also means an agent can quickly repeat a bad action if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the MCP server exposes too much&lt;/li&gt;
&lt;li&gt;auth is weak or missing&lt;/li&gt;
&lt;li&gt;there’s no per-agent policy&lt;/li&gt;
&lt;li&gt;no one can audit who called what&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If every agent looks like “some API key” in logs, debugging repeated failures becomes guesswork.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Agents don’t naturally coordinate on shared codebases
&lt;/h3&gt;

&lt;p&gt;Humans use social signals: “I’m touching auth,” “don’t rewrite that migration,” “hold this file for an hour.”&lt;/p&gt;

&lt;p&gt;Agents need that explicitly.&lt;/p&gt;

&lt;p&gt;If two agents can patch the same file at once, they will step on each other. If neither sees sprint/task ownership, both may solve the same issue differently. That’s not intelligence failure. That’s missing orchestration.&lt;/p&gt;
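&lt;p&gt;Here’s a minimal sketch of what “hold this file for an hour” could look like as an explicit primitive. The &lt;code&gt;acquire&lt;/code&gt; and &lt;code&gt;release&lt;/code&gt; helpers are hypothetical, and the lock table lives in memory, so it only illustrates the shape:&lt;/p&gt;

```javascript
// In-memory lock table: file path -> { owner, expiresAt }.
// Agents in separate processes would need a shared store; this only shows the shape.
const locks = new Map();

// Returns the agent holding an unexpired lock on the file, or null if it is free.
function currentHolder(file, now) {
  const held = locks.get(file);
  if (held === undefined) return null;
  return held.expiresAt > now ? held.owner : null;
}

function acquire(file, agentId, ttlMs, now = Date.now()) {
  const holder = currentHolder(file, now);
  // Anyone other than "nobody" or the caller itself blocks the request.
  if (![null, agentId].includes(holder)) {
    return { ok: false, heldBy: holder };
  }
  locks.set(file, { owner: agentId, expiresAt: now + ttlMs });
  return { ok: true };
}

function release(file, agentId) {
  if (locks.get(file)?.owner === agentId) locks.delete(file);
}

const t0 = 0;
console.log(acquire("src/auth.js", "agent-a", 60000, t0));         // agent-a takes the lock
console.log(acquire("src/auth.js", "agent-b", 60000, t0 + 1000));  // refused while agent-a holds it
console.log(acquire("src/auth.js", "agent-b", 60000, t0 + 61000)); // lock expired, agent-b gets it
```

&lt;p&gt;The expiry matters: an agent session that crashes should not hold a file forever.&lt;/p&gt;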

&lt;h2&gt;
  
  
  The fix is boring infrastructure
&lt;/h2&gt;

&lt;p&gt;This is one of those annoying engineering truths: the solution is less “better prompting” and more &lt;strong&gt;identity + policy + locking + auditability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You need agents to behave less like autocomplete and more like services in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strong identity&lt;/strong&gt; for each agent/session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scoped permissions&lt;/strong&gt; for tools and repos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approval gates&lt;/strong&gt; for risky actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coordination primitives&lt;/strong&gt; like file locks or task ownership&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auditable MCP calls&lt;/strong&gt; so repeated failures are traceable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already use OPA (Open Policy Agent) for policy, it is a good fit here. The important part is having &lt;em&gt;some&lt;/em&gt; enforceable policy layer rather than hoping the prompt says “be careful.”&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple pattern that actually helps
&lt;/h2&gt;

&lt;p&gt;Here’s the minimum model I’d recommend for MCP-connected coding agents:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Agent Identity]
      |
      v
[Policy Check] ---&amp;gt; allow / deny / require approval
      |
      v
[MCP Tool Call]
      |
      v
[Audit Log + Repo/File Coordination]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That does two useful things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It stops the same unsafe action from being retried blindly.&lt;/li&gt;
&lt;li&gt;It gives you enough evidence to fix the workflow instead of blaming “the AI.”&lt;/li&gt;
&lt;/ol&gt;
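&lt;p&gt;A minimal sketch of that flow, assuming a role-based policy table and an in-memory audit log. The roles, tool names, and the &lt;code&gt;callTool&lt;/code&gt; wrapper are all hypothetical, and &lt;code&gt;fakeTool&lt;/code&gt; stands in for a real MCP client call:&lt;/p&gt;

```javascript
// Hypothetical policy table: role -> tool -> decision. Unknown pairs are denied.
const policy = {
  reviewer: { read_file: "allow" },
  builder: { read_file: "allow", write_file: "allow", deploy: "require_approval" },
};

const auditLog = [];

function decide(role, tool) {
  return (policy[role] || {})[tool] || "deny";
}

// Wraps a tool call: identity in, decision made, attempt recorded either way.
function callTool(agent, tool, args, runTool) {
  const decision = decide(agent.role, tool);
  auditLog.push({ agentId: agent.id, role: agent.role, tool, args, decision, at: new Date().toISOString() });
  if (decision !== "allow") return { ok: false, decision };
  return { ok: true, decision, result: runTool(args) };
}

// Stand-in for a real MCP client call.
const fakeTool = (args) => "ran with " + JSON.stringify(args);

console.log(callTool({ id: "agent-7", role: "builder" }, "write_file", { path: "a.txt" }, fakeTool));
console.log(callTool({ id: "agent-7", role: "builder" }, "deploy", { env: "prod" }, fakeTool));
console.log(callTool({ id: "agent-9", role: "reviewer" }, "write_file", { path: "a.txt" }, fakeTool));
console.log("audit entries:", auditLog.length);
```

&lt;p&gt;Denied and approval-gated attempts are logged too, which is what makes a repeated failure traceable to a specific agent and role.&lt;/p&gt;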

&lt;h2&gt;
  
  
  One quick check you can run today
&lt;/h2&gt;

&lt;p&gt;If you’re exposing or using MCP servers, start by checking what they actually expose.&lt;/p&gt;

&lt;p&gt;A simple scan can catch issues like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missing auth&lt;/li&gt;
&lt;li&gt;overly broad capabilities&lt;/li&gt;
&lt;li&gt;spec compliance problems&lt;/li&gt;
&lt;li&gt;accidental public exposure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Runnable example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @authora/agent-audit
agent-audit scan https://your-mcp-server.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a quick way to answer: “Is this server safe enough for agents to call repeatedly?”&lt;/p&gt;

&lt;p&gt;If you prefer no install, there’s also a browser-based scanner in the links below.&lt;/p&gt;

&lt;h2&gt;
  
  
  What “good” looks like
&lt;/h2&gt;

&lt;p&gt;You do &lt;strong&gt;not&lt;/strong&gt; need a giant platform rollout to improve this.&lt;/p&gt;

&lt;p&gt;Even a lightweight setup helps a lot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Give each agent a verifiable identity&lt;/li&gt;
&lt;li&gt;Require auth on MCP endpoints&lt;/li&gt;
&lt;li&gt;Add policy checks before sensitive tools run&lt;/li&gt;
&lt;li&gt;Lock files/tasks when multiple agents share a repo&lt;/li&gt;
&lt;li&gt;Log tool calls with agent/session attribution&lt;/li&gt;
&lt;li&gt;Add approval for deploys, deletes, secrets, and billing actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That changes the failure mode from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Why does the agent keep doing this?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This agent role can’t do that anymore, and we know exactly what happened.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a much better place to be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to tighten up agent workflows without a big migration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Want to check your MCP server?&lt;/strong&gt; Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run a codebase scan for agent security issues:&lt;/strong&gt; &lt;code&gt;npx @authora/agent-audit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a verified badge to your agent:&lt;/strong&gt; &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More resources and papers:&lt;/strong&gt; &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The part nobody likes hearing
&lt;/h2&gt;

&lt;p&gt;A lot of repeated agent mistakes are really &lt;strong&gt;systems design mistakes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We dropped autonomous tools into shared codebases and gave them inconsistent identity, fuzzy permissions, and weak coordination. Of course they keep making the same errors. We built an environment where repetition is cheap and accountability is blurry.&lt;/p&gt;

&lt;p&gt;The good news: this is fixable with normal engineering discipline.&lt;/p&gt;

&lt;p&gt;How are you handling agent identity, MCP permissions, or shared-repo coordination today? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why MCP context is broken (and how a knowledge graph fixes it)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Fri, 10 Apr 2026 09:39:28 +0000</pubDate>
      <link>https://dev.to/authora/why-mcp-context-is-broken-and-how-a-knowledge-graph-fixes-it-1i64</link>
      <guid>https://dev.to/authora/why-mcp-context-is-broken-and-how-a-knowledge-graph-fixes-it-1i64</guid>
      <description>&lt;p&gt;Last week, we watched an agent do something &lt;em&gt;technically correct&lt;/em&gt; and completely wrong.&lt;/p&gt;

&lt;p&gt;It had access to an MCP server with docs, tickets, code search, and deployment tools. The task sounded simple: “find the bug, patch it, and open a PR.” Instead, the agent pulled half the repo into context, mixed stale ticket history with current code, and started proposing fixes for the wrong service.&lt;/p&gt;

&lt;p&gt;Nothing was “broken” in the protocol. The problem was &lt;strong&gt;context overload&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s the trap with MCP right now: once you connect enough tools, your agent stops suffering from lack of context and starts drowning in it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem: more tools != better decisions
&lt;/h2&gt;

&lt;p&gt;A lot of MCP setups grow like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add GitHub tools&lt;/li&gt;
&lt;li&gt;add docs search&lt;/li&gt;
&lt;li&gt;add tickets&lt;/li&gt;
&lt;li&gt;add Slack&lt;/li&gt;
&lt;li&gt;add logs&lt;/li&gt;
&lt;li&gt;add deployment APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first it feels powerful. Then the agent starts doing what all overloaded systems do: grabbing too much, ranking poorly, and stitching together irrelevant facts.&lt;/p&gt;

&lt;p&gt;The failure mode isn’t just token cost. It’s &lt;strong&gt;bad action selection&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your agent can’t tell:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which repo relates to which service&lt;/li&gt;
&lt;li&gt;which ticket is current vs resolved&lt;/li&gt;
&lt;li&gt;which API belongs to which environment&lt;/li&gt;
&lt;li&gt;which human approved what&lt;/li&gt;
&lt;li&gt;which tool output should be trusted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…then “just give it more context” becomes a reliability bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a knowledge graph helps
&lt;/h2&gt;

&lt;p&gt;The fix isn’t “stuff less data into prompts.”&lt;br&gt;&lt;br&gt;
The fix is to give the agent &lt;strong&gt;structure before context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A knowledge graph lets you model relationships explicitly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Service -&amp;gt; owned_by -&amp;gt; Team&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PR -&amp;gt; fixes -&amp;gt; Ticket&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Runbook -&amp;gt; applies_to -&amp;gt; Service&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Agent -&amp;gt; approved_for -&amp;gt; Action&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MCP Tool -&amp;gt; exposes -&amp;gt; Resource&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Resource -&amp;gt; environment -&amp;gt; Production&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of asking the agent to infer relationships from giant blobs of text, you let it query the graph first and only pull the relevant context second.&lt;/p&gt;

&lt;p&gt;Think of it like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Without graph:
Prompt = docs + tickets + code + logs + hope

With graph:
Query graph -&amp;gt; identify relevant entities -&amp;gt; fetch only connected context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That changes the agent’s job from “understand everything” to “follow the map.”&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple architecture
&lt;/h2&gt;

&lt;p&gt;Here’s the pattern that works well:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         +------------------+
         |   MCP Servers    |
         | docs / git / ops |
         +--------+---------+
                  |
                  v
        +---------------------+
        | Entity extraction   |
        | services, tickets,  |
        | repos, owners, envs |
        +----------+----------+
                   |
                   v
        +---------------------+
        |  Knowledge Graph    |
        | nodes + relations   |
        +----------+----------+
                   |
         graph query first
                   |
                   v
        +---------------------+
        | Agent prompt builder|
        | only relevant ctx   |
        +---------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key idea: &lt;strong&gt;MCP remains your execution layer&lt;/strong&gt;, but the graph becomes your retrieval and routing layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What goes in the graph?
&lt;/h2&gt;

&lt;p&gt;You do &lt;em&gt;not&lt;/em&gt; need a perfect enterprise ontology.&lt;/p&gt;

&lt;p&gt;Start with the entities your agents already trip over:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repositories&lt;/li&gt;
&lt;li&gt;services&lt;/li&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;environments&lt;/li&gt;
&lt;li&gt;tickets&lt;/li&gt;
&lt;li&gt;PRs&lt;/li&gt;
&lt;li&gt;humans/teams&lt;/li&gt;
&lt;li&gt;agents&lt;/li&gt;
&lt;li&gt;tools&lt;/li&gt;
&lt;li&gt;approvals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And a few practical relationships:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;depends_on&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;owned_by&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;deployed_to&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fixes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;approved_by&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;can_access&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;related_to&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s enough to cut a lot of noisy retrieval.&lt;/p&gt;

&lt;h2&gt;
  
  
  Runnable example: build a tiny graph in Node.js
&lt;/h2&gt;

&lt;p&gt;This isn’t a production graph database, but it shows the pattern.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;graphology
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;graphology&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Graph&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;svc:billing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;service&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;repo:payments-api&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;repo&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ticket:1234&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ticket&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;env:prod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;env&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;repo:payments-api&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;svc:billing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;implements&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ticket:1234&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;svc:billing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;affects&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;svc:billing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;env:prod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deployed_to&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Neighbors of billing:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;neighbors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;svc:billing&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Neighbors of billing: [ 'repo:payments-api', 'ticket:1234', 'env:prod' ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That tiny step already gives you a better retrieval strategy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;identify the target entity (&lt;code&gt;svc:billing&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;pull connected nodes&lt;/li&gt;
&lt;li&gt;fetch MCP context only for those nodes&lt;/li&gt;
&lt;/ol&gt;
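&lt;p&gt;Those three steps might be wired together roughly like this. To keep the sketch dependency-free it uses a plain adjacency list instead of &lt;code&gt;graphology&lt;/code&gt;. The &lt;code&gt;svc:search&lt;/code&gt; entries and the &lt;code&gt;fetchers&lt;/code&gt; table are hypothetical stand-ins for real MCP tool calls:&lt;/p&gt;

```javascript
// The billing entities from the example above, plus a hypothetical second service.
const neighbors = {
  "svc:billing": ["repo:payments-api", "ticket:1234", "env:prod"],
  "svc:search": ["repo:search-api", "ticket:9999"],
};

// Hypothetical fetchers keyed by node prefix; real ones would call MCP tools.
const fetchers = {
  repo: (id) => "README and recent commits for " + id,
  ticket: (id) => "description and status of " + id,
  env: (id) => "deploy config for " + id,
};

function buildContext(target) {
  // Steps 1-2: start from the target entity and pull its connected nodes.
  const connected = neighbors[target] || [];
  // Step 3: fetch context only for those nodes, skipping unknown node types.
  return connected
    .filter((id) => id.split(":")[0] in fetchers)
    .map((id) => ({ id, context: fetchers[id.split(":")[0]](id) }));
}

console.log(buildContext("svc:billing"));
```

&lt;p&gt;Nothing attached to &lt;code&gt;svc:search&lt;/code&gt; is fetched for a billing task, however similar its text might look.&lt;/p&gt;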

&lt;p&gt;Instead of asking the agent to search &lt;em&gt;everything&lt;/em&gt;, you constrain the blast radius.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where people get this wrong
&lt;/h2&gt;

&lt;p&gt;A few common mistakes:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. They build a vector search pipeline and call it solved
&lt;/h3&gt;

&lt;p&gt;Embeddings are useful, but semantic similarity is not the same as operational relevance.&lt;/p&gt;

&lt;p&gt;A runbook for “billing retries” can rank as a close match for a query about “payment failures” and still describe the wrong system.&lt;/p&gt;
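&lt;p&gt;One way to picture the difference: treat vector hits as candidates and let a graph edge decide which ones apply. The similarity scores, the runbook IDs, and the &lt;code&gt;appliesTo&lt;/code&gt; mapping below are all invented for illustration:&lt;/p&gt;

```javascript
// Hypothetical vector-search hits: similarity scores are made up for illustration.
const candidates = [
  { id: "runbook:payment-failures", score: 0.91 },
  { id: "runbook:billing-retries", score: 0.89 },
  { id: "runbook:search-latency", score: 0.42 },
];

// Hypothetical graph edges: which service each runbook applies to.
const appliesTo = {
  "runbook:payment-failures": "svc:checkout",
  "runbook:billing-retries": "svc:billing",
  "runbook:search-latency": "svc:search",
};

// Keep only candidates connected to the target service, then rank by similarity.
function relevantRunbooks(targetService) {
  return candidates
    .filter((c) => appliesTo[c.id] === targetService)
    .sort((a, b) => b.score - a.score)
    .map((c) => c.id);
}

console.log(relevantRunbooks("svc:checkout"));
```

&lt;p&gt;A similarity cutoff alone would have kept both of the top two runbooks. The edge check drops the one that belongs to a different service.&lt;/p&gt;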

&lt;h3&gt;
  
  
  2. They skip authorization edges
&lt;/h3&gt;

&lt;p&gt;This one matters a lot for MCP. Your graph shouldn’t just model knowledge. It should model &lt;strong&gt;who or what is allowed to act&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If OPA or another policy engine is already working for you, use it. The point is not to replace good authorization systems. The point is to stop leaving access decisions implicit in prompt text.&lt;/p&gt;
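&lt;p&gt;As a sketch of what explicit authorization edges could look like, here the graph stores &lt;code&gt;approved_for&lt;/code&gt; edges next to a hypothetical &lt;code&gt;allowed_in&lt;/code&gt; relation, and an action is permitted only when both edges exist. The agent and action names are illustrative, and a real setup might delegate this check to a policy engine:&lt;/p&gt;

```javascript
// Hypothetical authorization edges, stored as "from|relation|to" keys.
const edges = new Set([
  "agent:release-bot|approved_for|action:deploy",
  "action:deploy|allowed_in|env:staging",
  "agent:docs-bot|approved_for|action:edit_docs",
  "action:edit_docs|allowed_in|env:staging",
]);

const hasEdge = (from, rel, to) => edges.has([from, rel, to].join("|"));

// An agent may act only if it is approved for the action
// and the action is allowed in the target environment.
function mayAct(agent, action, env) {
  return [
    hasEdge(agent, "approved_for", action),
    hasEdge(action, "allowed_in", env),
  ].every(Boolean);
}

console.log(mayAct("agent:release-bot", "action:deploy", "env:staging"));
console.log(mayAct("agent:release-bot", "action:deploy", "env:prod"));
console.log(mayAct("agent:docs-bot", "action:deploy", "env:staging"));
```

&lt;p&gt;Because a missing edge means “no,” anything nobody modeled is denied by default instead of being left to the prompt.&lt;/p&gt;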

&lt;h3&gt;
  
  
  3. They try to model everything on day one
&lt;/h3&gt;

&lt;p&gt;Don’t. Start with the relationships behind your highest-cost failures.&lt;/p&gt;

&lt;p&gt;Usually that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;wrong repo&lt;/li&gt;
&lt;li&gt;wrong environment&lt;/li&gt;
&lt;li&gt;wrong ticket&lt;/li&gt;
&lt;li&gt;wrong approver&lt;/li&gt;
&lt;li&gt;wrong tool&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this matters more as MCP grows
&lt;/h2&gt;

&lt;p&gt;MCP makes tool integration easier, which is great. But easier integration means more context sources, more actions, and more chances for agents to connect the wrong dots.&lt;/p&gt;

&lt;p&gt;A knowledge graph architecture gives you a way to scale &lt;strong&gt;relevance&lt;/strong&gt; and &lt;strong&gt;control&lt;/strong&gt; together.&lt;/p&gt;

&lt;p&gt;That’s the real win:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fewer useless tokens&lt;/li&gt;
&lt;li&gt;fewer wrong actions&lt;/li&gt;
&lt;li&gt;better auditability&lt;/li&gt;
&lt;li&gt;clearer authorization boundaries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not because the agent got “smarter,” but because your system stopped making it guess.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to test your MCP setup and see what your server is exposing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to check your MCP server? Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npx @authora/agent-audit&lt;/code&gt; to scan your codebase&lt;/li&gt;
&lt;li&gt;Add a verified badge to your agent: &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt; for more resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re already using a graph or another way to control MCP context, I’d love to hear how you’re doing it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How are you handling agent context selection today — vector search, hand-written routing, knowledge graphs, or something else? Drop your approach below.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>Why MCP agents keep hallucinating in big codebases (and how knowledge graphs fix it)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Thu, 09 Apr 2026 14:57:07 +0000</pubDate>
      <link>https://dev.to/authora/why-mcp-agents-keep-hallucinating-in-big-codebases-and-how-knowledge-graphs-fix-it-j62</link>
      <guid>https://dev.to/authora/why-mcp-agents-keep-hallucinating-in-big-codebases-and-how-knowledge-graphs-fix-it-j62</guid>
      <description>&lt;p&gt;Last week, an agent was asked a very normal question in a very not-normal codebase:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Add audit logging to the user deletion flow.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It found a &lt;code&gt;deleteUser()&lt;/code&gt; function.&lt;br&gt;
It found an &lt;code&gt;AuditService&lt;/code&gt;.&lt;br&gt;
It made the change.&lt;br&gt;
It passed local checks.&lt;/p&gt;

&lt;p&gt;And it was still wrong.&lt;/p&gt;

&lt;p&gt;Why? Because in this repo, user deletion actually happened through a saga, the audit event was emitted from a worker, and the “obvious” function it edited was only used in tests. The agent didn’t fail because it was dumb. It failed because it had a flat view of a graph-shaped system.&lt;/p&gt;

&lt;p&gt;That’s the real reason MCP agents hallucinate in complex codebases: &lt;strong&gt;they retrieve files, not relationships&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  The problem isn’t just context windows
&lt;/h2&gt;

&lt;p&gt;A lot of people frame this as a token problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repo too big&lt;/li&gt;
&lt;li&gt;too many files&lt;/li&gt;
&lt;li&gt;not enough context&lt;/li&gt;
&lt;li&gt;model guesses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s true, but incomplete.&lt;/p&gt;

&lt;p&gt;In large systems, the hard part isn’t finding &lt;em&gt;a&lt;/em&gt; file. It’s understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which service actually owns the behavior&lt;/li&gt;
&lt;li&gt;which code path is production vs dead code&lt;/li&gt;
&lt;li&gt;what calls what&lt;/li&gt;
&lt;li&gt;what data shape flows where&lt;/li&gt;
&lt;li&gt;which permissions or policies gate execution&lt;/li&gt;
&lt;li&gt;which tool or MCP server should even be used&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A vector search can find “similar text.”&lt;br&gt;
It does &lt;strong&gt;not&lt;/strong&gt; reliably tell an agent:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“this method is a wrapper around a deprecated internal API, and the real side effect happens three hops later in a queue consumer.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s where a knowledge graph helps.&lt;/p&gt;
&lt;h2&gt;
  
  
  What the graph gives the agent
&lt;/h2&gt;

&lt;p&gt;Think of a knowledge graph as a map of the codebase and tooling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;files&lt;/li&gt;
&lt;li&gt;functions&lt;/li&gt;
&lt;li&gt;classes&lt;/li&gt;
&lt;li&gt;APIs&lt;/li&gt;
&lt;li&gt;services&lt;/li&gt;
&lt;li&gt;schemas&lt;/li&gt;
&lt;li&gt;owners&lt;/li&gt;
&lt;li&gt;MCP tools&lt;/li&gt;
&lt;li&gt;auth policies&lt;/li&gt;
&lt;li&gt;runtime dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And, more importantly, the edges between them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;calls&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;imports&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;owns&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;emits&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;consumes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;requires_role&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;served_by&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;deprecated_in_favor_of&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of asking:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What file mentions user deletion?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;the agent can ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What is the production execution path for user deletion, and what policy + audit components are attached to it?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a much better question.&lt;/p&gt;
&lt;h2&gt;
  
  
  The shape of the fix
&lt;/h2&gt;

&lt;p&gt;Here’s the mental model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User request
   |
   v
LLM agent
   |
   +--&amp;gt; vector search: "find relevant files"
   |
   +--&amp;gt; knowledge graph: "find real relationships"
   |
   v
Grounded plan
   |
   v
MCP tools / code changes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vector search is still useful. Keep it.&lt;br&gt;
But in a complex repo, &lt;strong&gt;vector search should retrieve candidates, and the graph should validate the path&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  A tiny runnable example
&lt;/h2&gt;

&lt;p&gt;If you want to see the pattern in code, here’s a minimal graph query example in Node using &lt;code&gt;graphology&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;graphology
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;graphology&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;graph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Graph&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;api/deleteUser&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;worker/userDeletionSaga&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addNode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;audit/logUserDeletion&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;api/deleteUser&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;worker/userDeletionSaga&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;emits&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEdge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;worker/userDeletionSaga&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;audit/logUserDeletion&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;calls&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Downstream from api/deleteUser:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;graph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEachOutboundNeighbor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;api/deleteUser&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That example only lists direct neighbors, but the idea scales: once your agent can traverse relationships instead of matching text, it is much less likely to “fix” the wrong place.&lt;/p&gt;
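
&lt;p&gt;To follow a path across several hops, a breadth-first walk is enough. The sketch below uses a plain adjacency list rather than &lt;code&gt;graphology&lt;/code&gt; so it runs with no dependencies; the &lt;code&gt;edges&lt;/code&gt; object and the &lt;code&gt;downstream&lt;/code&gt; helper are illustrative names, not part of any library.&lt;/p&gt;

```javascript
// Hypothetical adjacency list: node -> outgoing edges with a relationship type.
const edges = {
  "api/deleteUser": [{ to: "worker/userDeletionSaga", type: "emits" }],
  "worker/userDeletionSaga": [{ to: "audit/logUserDeletion", type: "calls" }],
  "audit/logUserDeletion": [],
};

// Breadth-first walk that records every node reachable from `start`,
// with the hop count and the edge type used to reach it.
function downstream(start) {
  const seen = new Set([start]);
  const queue = [{ node: start, hops: 0 }];
  const reached = [];
  while (queue.length > 0) {
    const { node, hops } = queue.shift();
    for (const edge of edges[node] || []) {
      if (seen.has(edge.to)) continue; // skip nodes we already visited
      seen.add(edge.to);
      reached.push({ node: edge.to, via: edge.type, hops: hops + 1 });
      queue.push({ node: edge.to, hops: hops + 1 });
    }
  }
  return reached;
}

for (const step of downstream("api/deleteUser")) {
  console.log(`${step.hops} hop(s): ${step.node} (via ${step.via})`);
}
// 1 hop(s): worker/userDeletionSaga (via emits)
// 2 hop(s): audit/logUserDeletion (via calls)
```

&lt;p&gt;The audit step only shows up at hop two, which is exactly the kind of fact a text match on “user deletion” would not surface.&lt;/p&gt;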

&lt;h2&gt;
  
  
  Where this matters most with MCP
&lt;/h2&gt;

&lt;p&gt;MCP makes agents more useful because they can actually do things: read code, call internal tools, inspect docs, hit APIs.&lt;/p&gt;

&lt;p&gt;It also makes mistakes more expensive.&lt;/p&gt;

&lt;p&gt;If an agent hallucinates while choosing among 50+ tools, or picks the right tool with the wrong assumptions about the code path, you get confident nonsense with side effects.&lt;/p&gt;

&lt;p&gt;In practice, the worst failures I’ve seen look like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;agent retrieves a plausible file&lt;/li&gt;
&lt;li&gt;agent infers architecture from naming&lt;/li&gt;
&lt;li&gt;agent calls the wrong MCP tool or edits the wrong layer&lt;/li&gt;
&lt;li&gt;output looks clean, but behavior is wrong&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A knowledge graph reduces that by giving the agent a way to verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Is this path actually reachable?”&lt;/li&gt;
&lt;li&gt;“What service owns this?”&lt;/li&gt;
&lt;li&gt;“What tool is allowed to act here?”&lt;/li&gt;
&lt;li&gt;“What approval or policy is required before execution?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already have OPA or another policy engine in place, great. Use it. The graph doesn’t replace policy; it gives the agent better grounding before policy enforcement kicks in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical advice if you want to implement this
&lt;/h2&gt;

&lt;p&gt;You do &lt;strong&gt;not&lt;/strong&gt; need a giant “AI graph platform” project to get value.&lt;/p&gt;

&lt;p&gt;Start small:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build nodes for files, functions, services, MCP tools&lt;/li&gt;
&lt;li&gt;add edges for imports, calls, ownership, and auth requirements&lt;/li&gt;
&lt;li&gt;mark deprecated paths explicitly&lt;/li&gt;
&lt;li&gt;let retrieval fetch top candidate files&lt;/li&gt;
&lt;li&gt;let graph traversal rank or reject candidate actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even a partial graph helps: every relationship the agent can look up is one it no longer has to guess from file names.&lt;/p&gt;
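
&lt;p&gt;Here is one way the last three bullets could fit together, as a rough sketch. The node names, the &lt;code&gt;deprecatedInFavorOf&lt;/code&gt; field, and the precomputed &lt;code&gt;reachableFromProd&lt;/code&gt; set are all assumptions made up for illustration; a real graph would derive them from static analysis.&lt;/p&gt;

```javascript
// Hypothetical node metadata, as a code graph might store it.
const nodes = {
  "legacy/removeUser": { deprecatedInFavorOf: "api/deleteUser" },
  "api/deleteUser": {},
  "docs/user-deletion.md": {},
};

// Assumed to be precomputed: nodes that production entry points can reach.
const reachableFromProd = new Set(["api/deleteUser"]);

// Follow deprecation edges to the replacement, then keep only reachable targets.
function resolveCandidates(candidates) {
  const accepted = [];
  const rejected = [];
  for (const name of candidates) {
    const replacement = (nodes[name] || {}).deprecatedInFavorOf;
    const target = replacement || name;
    if (!reachableFromProd.has(target)) {
      rejected.push({ name, reason: "not reachable from production" });
    } else if (!accepted.includes(target)) {
      accepted.push(target);
    }
  }
  return { accepted, rejected };
}

// Pretend these came back from vector search, best match first.
const result = resolveCandidates([
  "legacy/removeUser",
  "docs/user-deletion.md",
  "api/deleteUser",
]);
console.log(result.accepted); // [ 'api/deleteUser' ]
console.log(result.rejected.map((r) => r.name)); // [ 'docs/user-deletion.md' ]
```

&lt;p&gt;Retrieval ranked the deprecated wrapper first, but the graph redirects the agent to its replacement before any edit happens.&lt;/p&gt;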

&lt;p&gt;A simple rule of thumb:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If your agent can answer “what mentions this?” but not “what actually depends on this?”, it will hallucinate in production-shaped repos.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you’re working with MCP servers or agent-heavy workflows, a few free tools that may help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to check your MCP server? Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npx @authora/agent-audit&lt;/code&gt; to scan your codebase&lt;/li&gt;
&lt;li&gt;Add a verified badge to your agent: &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt; for more resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Agents don’t just need more context.&lt;br&gt;
They need &lt;strong&gt;structured context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In simple repos, embeddings and file search can get surprisingly far.&lt;br&gt;
In complex codebases, the missing piece is usually relationship awareness: execution paths, ownership, policy, and tool boundaries.&lt;/p&gt;

&lt;p&gt;That’s what knowledge graphs are good at.&lt;br&gt;
Not because they’re fancy, but because your system is already a graph whether your agent knows it or not.&lt;/p&gt;

&lt;p&gt;How are you grounding agents in large codebases today: embeddings, static analysis, graphs, something else? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;This post was created with AI assistance.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Why multi-agent AI security is broken (and the identity patterns that actually work)</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Wed, 08 Apr 2026 22:54:41 +0000</pubDate>
      <link>https://dev.to/authora/why-multi-agent-ai-security-is-broken-and-the-identity-patterns-that-actually-work-2oo5</link>
      <guid>https://dev.to/authora/why-multi-agent-ai-security-is-broken-and-the-identity-patterns-that-actually-work-2oo5</guid>
      <description>&lt;p&gt;Last Tuesday, a “harmless” coding agent in staging opened a PR, fetched secrets from the wrong environment, and kicked off a deploy it was never supposed to touch.&lt;/p&gt;

&lt;p&gt;Nothing “hacked” us. The agent did exactly what the system allowed.&lt;/p&gt;

&lt;p&gt;That’s the part I think a lot of teams miss with multi-agent setups: the problem usually isn’t model quality. It’s identity.&lt;/p&gt;

&lt;p&gt;Once you have more than one agent — planner, coder, reviewer, deployer, support bot, whatever — you need answers to very boring questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who is this agent, exactly?&lt;/li&gt;
&lt;li&gt;What is it allowed to do?&lt;/li&gt;
&lt;li&gt;Can it act on behalf of someone else?&lt;/li&gt;
&lt;li&gt;How do we prove what happened later?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don’t answer those, your “AI fleet” becomes a shared root account with vibes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern that breaks first: shared credentials
&lt;/h2&gt;

&lt;p&gt;A lot of agent systems still look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent A ----\
Agent B -----+----&amp;gt; same API key / same GitHub token / same MCP access
Agent C ----/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works great until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one agent gets prompt-injected&lt;/li&gt;
&lt;li&gt;one workflow needs narrower permissions&lt;/li&gt;
&lt;li&gt;you need an audit trail&lt;/li&gt;
&lt;li&gt;you want approvals for risky actions&lt;/li&gt;
&lt;li&gt;you need to revoke one agent without breaking all of them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Shared credentials are convenient, but they destroy attribution and least privilege.&lt;/p&gt;

&lt;h2&gt;
  
  
  The identity pattern that actually works
&lt;/h2&gt;

&lt;p&gt;The most reliable pattern we’ve seen is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Give each agent its own cryptographic identity&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Issue short-lived delegated access&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enforce policy at the tool boundary&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Log every action with agent identity + delegation chain&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice, it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Human/User]
    |
    | delegates task
    v
[Planner Agent] -- short-lived token --&amp;gt; [Coder Agent]
    |                                        |
    | policy check                           | calls tool / MCP server
    v                                        v
[Approval / Policy Engine] -------------&amp;gt; [GitHub, CI, Cloud, DB]

Audit log = who delegated what to whom, for which action, when
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the difference between “an agent did something” and “the review agent, acting on behalf of the release workflow, was allowed to update only this repo for 10 minutes.”&lt;/p&gt;

&lt;h2&gt;
  
  
  What to implement first
&lt;/h2&gt;

&lt;p&gt;You do &lt;strong&gt;not&lt;/strong&gt; need a giant platform rollout to improve this.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Per-agent identity
&lt;/h3&gt;

&lt;p&gt;Use a distinct identity for every agent process or role. Ideally, that identity is cryptographic, not just a string in config.&lt;/p&gt;

&lt;p&gt;Ed25519 keys are a good fit here because they’re fast, small, and easy to verify.&lt;/p&gt;

&lt;p&gt;Why it matters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;revocation is targeted&lt;/li&gt;
&lt;li&gt;audit logs become useful&lt;/li&gt;
&lt;li&gt;tools can verify the caller instead of trusting network location&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) Delegation, not credential sharing
&lt;/h3&gt;

&lt;p&gt;If Agent A needs Agent B to perform work, don’t hand over a long-lived secret. Mint a scoped, short-lived token representing delegated rights.&lt;/p&gt;

&lt;p&gt;OAuth token exchange and delegation-chain patterns are solid here. If you’re already using a standard like RFC 8693 (OAuth 2.0 Token Exchange), great. If not, even a simple internal delegation model is better than “just reuse the deploy token.”&lt;/p&gt;
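
&lt;p&gt;A simple internal delegation model might look like the sketch below. It is not an RFC 8693 implementation: the claim names (&lt;code&gt;from&lt;/code&gt;, &lt;code&gt;to&lt;/code&gt;, &lt;code&gt;scope&lt;/code&gt;, &lt;code&gt;expiresAt&lt;/code&gt;) and both helper functions are invented for illustration, and it uses Node’s built-in &lt;code&gt;crypto&lt;/code&gt; module for Ed25519 signatures.&lt;/p&gt;

```javascript
const crypto = require("crypto");

// Assumption: the planner agent holds an Ed25519 keypair and issues delegations.
const planner = crypto.generateKeyPairSync("ed25519");

// Mint a short-lived, scoped delegation. Field names are illustrative, not a standard.
function mintDelegation(privateKey, { from, to, scope, ttlSeconds }, now = Date.now()) {
  const claims = { from, to, scope, expiresAt: now + ttlSeconds * 1000 };
  const payload = Buffer.from(JSON.stringify(claims));
  const signature = crypto.sign(null, payload, privateKey); // null algorithm = Ed25519 default
  return { payload: payload.toString("base64"), signature: signature.toString("base64") };
}

// Check signature, expiry, and scope before honoring a delegated call.
function checkDelegation(publicKey, token, requiredScope, now = Date.now()) {
  const payload = Buffer.from(token.payload, "base64");
  const signature = Buffer.from(token.signature, "base64");
  if (!crypto.verify(null, payload, publicKey, signature)) return false;
  const claims = JSON.parse(payload.toString("utf8"));
  if (now >= claims.expiresAt) return false;
  return claims.scope.includes(requiredScope);
}

const token = mintDelegation(planner.privateKey, {
  from: "planner",
  to: "coder",
  scope: ["repo:open_pr"],
  ttlSeconds: 600, // ten minutes
});
console.log(checkDelegation(planner.publicKey, token, "repo:open_pr")); // true
console.log(checkDelegation(planner.publicKey, token, "deploy:prod")); // false
```

&lt;p&gt;The token expires on its own, so a leaked copy is only useful for the ten-minute window and only for the one scope it names.&lt;/p&gt;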

&lt;h3&gt;
  
  
  3) Policy at the edge
&lt;/h3&gt;

&lt;p&gt;Your tools should not trust every “internal” caller equally.&lt;/p&gt;

&lt;p&gt;Put policy checks at the MCP server, gateway, or edge proxy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;this agent can read issues&lt;/li&gt;
&lt;li&gt;that agent can open PRs&lt;/li&gt;
&lt;li&gt;only approved agents can trigger deploys&lt;/li&gt;
&lt;li&gt;production actions require human approval&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If OPA fits your stack, use OPA. Seriously. You don’t need to reinvent policy engines for this.&lt;/p&gt;

&lt;h3&gt;
  
  
  4) Approval workflows for destructive actions
&lt;/h3&gt;

&lt;p&gt;Treat &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;, &lt;code&gt;rotate&lt;/code&gt;, &lt;code&gt;publish&lt;/code&gt;, and &lt;code&gt;charge&lt;/code&gt; as special.&lt;/p&gt;

&lt;p&gt;Agents are great at moving fast. That’s exactly why risky actions need explicit approval gates.&lt;/p&gt;
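
&lt;p&gt;Steps 3 and 4 can be combined into one decision function at the tool boundary. This is a minimal sketch: the agent names, the &lt;code&gt;grants&lt;/code&gt; table, and the &lt;code&gt;approvedBy&lt;/code&gt; field are hypothetical, and a real setup would more likely delegate the decision to a policy engine such as OPA.&lt;/p&gt;

```javascript
// Hypothetical allowlist: which agent may perform which action.
const grants = {
  "triage-bot": ["read_issue"],
  "coder-bot": ["read_issue", "open_pr"],
  "release-bot": ["deploy"],
};

// Actions that always need a human approval, matching the list above.
const destructive = new Set(["delete", "deploy", "rotate", "publish", "charge"]);

// Returns "allow", "needs_approval", or "deny". Anything not granted is denied.
function decide({ agent, action, approvedBy }) {
  const granted = (grants[agent] || []).includes(action);
  if (!granted) return "deny";
  if (destructive.has(action)) {
    return approvedBy ? "allow" : "needs_approval";
  }
  return "allow";
}

console.log(decide({ agent: "coder-bot", action: "open_pr" })); // allow
console.log(decide({ agent: "coder-bot", action: "deploy" })); // deny
console.log(decide({ agent: "release-bot", action: "deploy" })); // needs_approval
console.log(decide({ agent: "release-bot", action: "deploy", approvedBy: "alice" })); // allow
```

&lt;p&gt;Note the order: the grant check runs first, so an approval can never widen what an agent was allowed to do in the first place.&lt;/p&gt;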

&lt;h2&gt;
  
  
  A tiny runnable example: generate an agent identity
&lt;/h2&gt;

&lt;p&gt;Here’s a minimal Node example using Ed25519:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;tweetnacl tweetnacl-util
node agent-id.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// agent-id.js&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nacl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tweetnacl&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;util&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tweetnacl-util&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;keypair&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;nacl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sign&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keyPair&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;publicKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeBase64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;keypair&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;publicKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secretKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;util&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encodeBase64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;keypair&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;secretKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Agent public key:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;publicKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Never print secret key material; keep it in a secret manager or KMS.&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Secret key generated; base64 length:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secretKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn’t a full identity system, but it’s the right direction: every agent gets its own keypair, and downstream systems verify who’s calling.&lt;/p&gt;
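
&lt;p&gt;The verification half could look roughly like this. To stay dependency-free, the sketch swaps &lt;code&gt;tweetnacl&lt;/code&gt; for Node’s built-in &lt;code&gt;crypto&lt;/code&gt; module, which also supports Ed25519. The &lt;code&gt;knownAgents&lt;/code&gt; registry and the request shape are assumptions for illustration, and there is no replay protection here (a real system would add a timestamp or nonce to the signed message).&lt;/p&gt;

```javascript
const crypto = require("crypto");

// Each agent gets its own Ed25519 keypair (Node's built-in crypto, not tweetnacl).
const coderAgent = crypto.generateKeyPairSync("ed25519");

// Hypothetical registry a tool server might keep: agent name -> public key.
const knownAgents = new Map([["coder-agent", coderAgent.publicKey]]);

// Agent side: sign the serialized request, including the claimed agent name.
function signRequest(agentName, privateKey, body) {
  const message = Buffer.from(JSON.stringify({ agentName, body }));
  const signature = crypto.sign(null, message, privateKey).toString("base64");
  return { agentName, body, signature };
}

// Tool side: reject unknown agents and any request whose signature does not match.
function verifyRequest(request) {
  const publicKey = knownAgents.get(request.agentName);
  if (!publicKey) return false;
  const message = Buffer.from(
    JSON.stringify({ agentName: request.agentName, body: request.body })
  );
  return crypto.verify(null, message, publicKey, Buffer.from(request.signature, "base64"));
}

const request = signRequest("coder-agent", coderAgent.privateKey, {
  tool: "open_pr",
  repo: "example/repo",
});
console.log(verifyRequest(request)); // true

// Tampering with the body after signing invalidates the signature.
const tampered = { ...request, body: { tool: "deploy", repo: "example/repo" } };
console.log(verifyRequest(tampered)); // false
```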

&lt;h2&gt;
  
  
  Common mistake: securing the model, not the workflow
&lt;/h2&gt;

&lt;p&gt;Teams spend a lot of time on model guardrails and not enough on execution boundaries.&lt;/p&gt;

&lt;p&gt;But in multi-agent systems, the blast radius usually comes from &lt;strong&gt;what the agent can do&lt;/strong&gt;, not what it can say.&lt;/p&gt;

&lt;p&gt;A secure fleet is mostly boring infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identities&lt;/li&gt;
&lt;li&gt;scoped tokens&lt;/li&gt;
&lt;li&gt;policy checks&lt;/li&gt;
&lt;li&gt;approvals&lt;/li&gt;
&lt;li&gt;audit logs&lt;/li&gt;
&lt;li&gt;isolation for untrusted execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s true whether you’re orchestrating coding agents, support agents, or background task runners.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to tighten up your agent security without buying anything first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to check your MCP server? Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;npx @authora/agent-audit&lt;/code&gt; to scan your codebase&lt;/li&gt;
&lt;li&gt;Add a verified badge to your agent: &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Check out &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt; for more resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are useful starting points even if you end up building the rest yourself.&lt;/p&gt;

&lt;p&gt;The big shift is simple: stop thinking of agents as “features” and start treating them like &lt;strong&gt;workloads with identities&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That’s when multi-agent systems become governable instead of mysterious.&lt;/p&gt;

&lt;p&gt;How are you handling agent identity in your stack today? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was created with AI assistance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI agents just got dangerous: default permit is the security bug nobody talks about</title>
      <dc:creator>Authora Dev</dc:creator>
      <pubDate>Mon, 06 Apr 2026 09:00:35 +0000</pubDate>
      <link>https://dev.to/authora/ai-agents-just-got-dangerous-default-permit-is-the-security-bug-nobody-talks-about-lhd</link>
      <guid>https://dev.to/authora/ai-agents-just-got-dangerous-default-permit-is-the-security-bug-nobody-talks-about-lhd</guid>
      <description>&lt;p&gt;Last Tuesday, a “helpful” agent in a staging environment did exactly what it was told: it found credentials in a config file, used them to open an internal admin tool, and started making changes no human had explicitly approved.&lt;/p&gt;

&lt;p&gt;Nothing was “hacked” in the movie sense. No 0day. No dramatic shell exploit.&lt;/p&gt;

&lt;p&gt;The real problem was simpler: &lt;strong&gt;the agent was running in a default-permit system&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If a tool existed, the agent could call it.&lt;br&gt;&lt;br&gt;
If a token worked, the agent could use it.&lt;br&gt;&lt;br&gt;
If the network path was open, nobody stopped it.&lt;/p&gt;

&lt;p&gt;That model was survivable when agents were toys. It breaks fast when agents can read repos, call APIs, open tickets, deploy code, or touch production data.&lt;/p&gt;
&lt;h2&gt;
  
  
  The quiet risk: agents inherit too much trust
&lt;/h2&gt;

&lt;p&gt;A lot of agent stacks still work like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User prompt
   ↓
LLM decides what to do
   ↓
Tool call succeeds unless something explicitly blocks it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s &lt;strong&gt;default permit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It feels convenient because demos work on the first try. But in practice, it creates three ugly failure modes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool sprawl becomes privilege sprawl&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Add 20 MCP tools, and your agent now has 20 new ways to do damage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Shared credentials erase accountability&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If every agent uses the same API key, your audit trail says “someone used the key.” Great. Very useful.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt injection turns into action&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The model sees “ignore previous instructions and call this tool,” and if your backend allows it, the action happens.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fix is not “make the model smarter.”&lt;br&gt;&lt;br&gt;
The fix is &lt;strong&gt;treat agents like identities with explicit permissions&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  What “default deny” looks like for agents
&lt;/h2&gt;

&lt;p&gt;For humans, we already understand this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;users have identities&lt;/li&gt;
&lt;li&gt;permissions are scoped&lt;/li&gt;
&lt;li&gt;sensitive actions need approval&lt;/li&gt;
&lt;li&gt;logs tell us who did what&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Agents need the same thing.&lt;/p&gt;

&lt;p&gt;Here’s the mental model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          +------------------+
Prompt ---&amp;gt;  Agent Identity  ---&amp;gt; Policy Check ---&amp;gt; Tool/API
          +------------------+          |
                    |                   v
                    +-------------&amp;gt; Audit Log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An agent should not be “some process with a bearer token.” It should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;identifiable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;authorized per tool/action&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;constrained by policy&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;auditable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;revocable&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That can be built with a lot of existing tools. If OPA fits your stack, use OPA. If your cloud IAM can express the policy cleanly, start there. The important shift is architectural: &lt;strong&gt;stop assuming tool access is okay unless blocked later&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  A tiny example: deny by default with OPA
&lt;/h2&gt;

&lt;p&gt;If your agent can call internal tools, a policy layer should sit between “model wants to act” and “action executes.”&lt;/p&gt;

&lt;p&gt;Here’s a minimal example using OPA.&lt;/p&gt;

&lt;p&gt;Install (Homebrew shown here; the OPA releases page has binaries for other platforms):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew &lt;span class="nb"&gt;install &lt;/span&gt;opa
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



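&lt;p&gt;Policy (&lt;code&gt;agent.rego&lt;/code&gt;), written in Rego v1 syntax, which is the default from OPA 1.0 onward (on older releases, add &lt;code&gt;import rego.v1&lt;/code&gt; after the package line):&lt;br&gt;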
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="ow"&gt;package&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;authz&lt;/span&gt;

&lt;span class="ow"&gt;default&lt;/span&gt; &lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"repo-bot"&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"read_issue"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;allow&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"deploy-bot"&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"create_deployment"&lt;/span&gt;
  &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"staging"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"agent":"repo-bot","action":"read_issue"}'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  opa &lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="nt"&gt;-I&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; agent.rego &lt;span class="s2"&gt;"data.agent.authz.allow"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'{"agent":"repo-bot","action":"create_deployment","env":"prod"}'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  opa &lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="nt"&gt;-I&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; agent.rego &lt;span class="s2"&gt;"data.agent.authz.allow"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;opa eval&lt;/code&gt; prints a JSON result. In the first, the expression value should be &lt;code&gt;true&lt;/code&gt;; in the second, &lt;code&gt;false&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That’s the point: &lt;strong&gt;if you didn’t explicitly allow it, it doesn’t happen&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can put this in front of MCP tools, internal APIs, CI actions, or deployment jobs. The policy engine matters less than the pattern.&lt;/p&gt;
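
&lt;p&gt;To show the pattern without a policy engine, here is the same pair of rules as a plain Node gate wrapped around a tool function. The &lt;code&gt;guard&lt;/code&gt; helper and the &lt;code&gt;createDeployment&lt;/code&gt; tool are hypothetical; in a real deployment the &lt;code&gt;isAllowed&lt;/code&gt; check would more likely be a call out to OPA or your IAM system.&lt;/p&gt;

```javascript
// The same two rules as the Rego policy above, as data. Everything else is denied.
const rules = [
  { agent: "repo-bot", action: "read_issue" },
  { agent: "deploy-bot", action: "create_deployment", env: "staging" },
];

// A request is allowed only if every field of some rule matches it.
function isAllowed(input) {
  return rules.some((rule) => Object.keys(rule).every((key) => rule[key] === input[key]));
}

// Wrap a tool so the policy check runs before the tool body does.
function guard(toolName, tool) {
  return (input) => {
    if (!isAllowed(input)) {
      throw new Error(`denied: ${input.agent} may not ${input.action} via ${toolName}`);
    }
    return tool(input);
  };
}

// Hypothetical tool; a real one would call an API or an MCP server.
const createDeployment = guard("deployments", (input) => `deployed to ${input.env}`);

console.log(
  createDeployment({ agent: "deploy-bot", action: "create_deployment", env: "staging" })
); // deployed to staging

try {
  createDeployment({ agent: "deploy-bot", action: "create_deployment", env: "prod" });
} catch (err) {
  console.log(err.message); // denied: deploy-bot may not create_deployment via deployments
}
```

&lt;p&gt;Because the tool is only reachable through &lt;code&gt;guard&lt;/code&gt;, adding a new tool does not grant anything until someone writes a rule for it.&lt;/p&gt;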

&lt;h2&gt;
  
  
  Where teams get stuck
&lt;/h2&gt;

&lt;p&gt;The hardest part isn’t writing the deny rule. It’s untangling assumptions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“the agent runs inside our VPC, so it’s trusted”&lt;/li&gt;
&lt;li&gt;“it only has staging creds”&lt;/li&gt;
&lt;li&gt;“we’ll inspect logs if something weird happens”&lt;/li&gt;
&lt;li&gt;“the tool server already has auth”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those controls are not useless. They’re just incomplete.&lt;/p&gt;

&lt;p&gt;An agent is an &lt;strong&gt;actor&lt;/strong&gt; making decisions at runtime. Once it can chain tools together, static trust boundaries stop being enough.&lt;/p&gt;

&lt;p&gt;A good baseline looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unique identity per agent&lt;/li&gt;
&lt;li&gt;short-lived credentials&lt;/li&gt;
&lt;li&gt;per-tool authorization&lt;/li&gt;
&lt;li&gt;delegation with scope and expiry&lt;/li&gt;
&lt;li&gt;approval for high-risk actions&lt;/li&gt;
&lt;li&gt;immutable audit logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that sounds like overkill, compare it to what you already require from humans touching prod.&lt;/p&gt;
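
&lt;p&gt;For the audit log item, one common building block is a hash chain, sketched below with Node’s built-in &lt;code&gt;crypto&lt;/code&gt; module. This makes the log tamper-evident rather than truly immutable: anyone who can rewrite the whole log can also recompute the hashes, so the latest hash would still need to be stored somewhere the agent cannot write. The entry shape and function names are assumptions for illustration.&lt;/p&gt;

```javascript
const crypto = require("crypto");

// Hash of the previous entry's hash plus the serialized event.
function entryHash(prevHash, event) {
  return crypto.createHash("sha256").update(prevHash + JSON.stringify(event)).digest("hex");
}

// Each entry stores the hash of the previous entry, so edits break the chain.
function appendEntry(log, event) {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : "genesis";
  log.push({ event, prevHash, hash: entryHash(prevHash, event) });
  return log;
}

// Recompute every hash from the start; any altered event or link returns false.
function verifyChain(log) {
  let prevHash = "genesis";
  for (const entry of log) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== entryHash(prevHash, entry.event)) return false;
    prevHash = entry.hash;
  }
  return true;
}

const auditLog = [];
appendEntry(auditLog, { agent: "repo-bot", action: "read_issue" });
appendEntry(auditLog, { agent: "deploy-bot", action: "create_deployment", env: "staging" });
console.log(verifyChain(auditLog)); // true

// Rewriting history after the fact is detected.
auditLog[0].event.agent = "someone-else";
console.log(verifyChain(auditLog)); // false
```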

&lt;h2&gt;
  
  
  MCP makes this more urgent, not less
&lt;/h2&gt;

&lt;p&gt;MCP is making tool integration much easier. That’s good for developers, but it also means agents can reach more systems with less friction.&lt;/p&gt;

&lt;p&gt;The danger is obvious: &lt;strong&gt;easy tool connectivity without strong authorization becomes default permit at scale&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you’re exposing an MCP server, ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can any connected agent call every tool?&lt;/li&gt;
&lt;li&gt;Are dangerous tools separated from read-only tools?&lt;/li&gt;
&lt;li&gt;Do you know which identity invoked which action?&lt;/li&gt;
&lt;li&gt;Can you revoke one agent without breaking all automation?&lt;/li&gt;
&lt;li&gt;Do you have a policy gate before execution?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer to most of those is “not really,” now is the time to fix it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;If you want to pressure-test your setup, here are a few free tools that help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Want to check your MCP server?&lt;/strong&gt; Try &lt;a href="https://tools.authora.dev" rel="noopener noreferrer"&gt;https://tools.authora.dev&lt;/a&gt;&lt;br&gt;&lt;br&gt;
It scans for security issues, spec compliance, and exposure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Want to scan your codebase for agent security issues?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Run &lt;code&gt;npx @authora/agent-audit&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Want a visible identity signal for your agent?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Add a verified badge: &lt;a href="https://passport.authora.dev" rel="noopener noreferrer"&gt;https://passport.authora.dev&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Want more agent security resources?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Check out &lt;a href="https://github.com/authora-dev/awesome-agent-security" rel="noopener noreferrer"&gt;https://github.com/authora-dev/awesome-agent-security&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The big takeaway
&lt;/h2&gt;

&lt;p&gt;The biggest mistake in agent security right now is treating access control like a cleanup task.&lt;/p&gt;

&lt;p&gt;It’s not.&lt;/p&gt;

&lt;p&gt;If your agent can act, it needs identity.&lt;br&gt;&lt;br&gt;
If it has identity, it needs permissions.&lt;br&gt;&lt;br&gt;
If it has permissions, they should be &lt;strong&gt;explicit&lt;/strong&gt;, not assumed.&lt;/p&gt;

&lt;p&gt;Default permit made sense for prototypes.&lt;br&gt;&lt;br&gt;
For real systems, it’s how “helpful automation” turns into an incident report.&lt;/p&gt;

&lt;p&gt;How are you handling agent identity and tool authorization today? Drop your approach below.&lt;/p&gt;

&lt;p&gt;-- Authora team&lt;/p&gt;

&lt;p&gt;This post was created with AI assistance.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
