<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Steve Gonzalez</title>
    <description>The latest articles on DEV Community by Steve Gonzalez (@goweft).</description>
    <link>https://dev.to/goweft</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3839366%2F25073b27-58cf-4933-975b-920278b4336f.png</url>
      <title>DEV Community: Steve Gonzalez</title>
      <link>https://dev.to/goweft</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/goweft"/>
    <language>en</language>
    <item>
      <title>I Replaced My AI Chat Interface With a Terminal Shell</title>
      <dc:creator>Steve Gonzalez</dc:creator>
      <pubDate>Mon, 06 Apr 2026 05:16:21 +0000</pubDate>
      <link>https://dev.to/goweft/i-replaced-my-ai-chat-interface-with-a-terminal-shell-5aoh</link>
      <guid>https://dev.to/goweft/i-replaced-my-ai-chat-interface-with-a-terminal-shell-5aoh</guid>
      <description>&lt;p&gt;Most AI tools give you a chat window. You type, the model responds, you copy what you need and paste it somewhere else. The conversation and the artifact live in different places.&lt;/p&gt;

&lt;p&gt;I wanted the artifact to appear &lt;em&gt;next to the conversation&lt;/em&gt;, stream in as it was generated, and stay there — editable, persistent, tab-switchable — without ever leaving the terminal.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;CAS&lt;/strong&gt;: Conversational Agent Shell.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it looks like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ chat ──────────────────────┐ ┌─ [l] Todo List For Easter ──────────────┐
│                             │ │                                          │
│ you › make a todo list for  │ │  Easter Todo List                        │
│       easter                │ │                                          │
│                             │ │  ## 🗓 Planning &amp;amp; Budget                 │
│ cas › Created list          │ │                                          │
│       workspace "Todo List  │ │  [ ] Set date and time for Easter Sunday │
│       For Easter".          │ │  [ ] Confirm guest list and RSVPs        │
│       Edit directly or ask  │ │  [ ] Create budget for food              │
│       me to make changes.   │ │  [ ] Check family availability           │
│                             │ │  [ ] Book reservations                   │
│ &amp;gt; █                         │ │                                          │
└─────────────────────────────┘ └──────────────────────────────────────────┘
  ↑↓ scroll  │  enter send  │  tab workspace  │  ctrl+n new session
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Left panel: conversation. Right panel: the workspace, streaming in as the model generates it. You stay in the terminal the whole time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea
&lt;/h2&gt;

&lt;p&gt;There's a debate in HCI that goes back to 1997.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q0qa4fa0vq74u72xfnf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1q0qa4fa0vq74u72xfnf.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Ben Shneiderman argued that direct manipulation gives users control that delegation never can. Pattie Maes argued that agents reduce cognitive load that direct manipulation can't scale to. Both were right. They were arguing about the wrong dichotomy.&lt;/p&gt;

&lt;p&gt;CAS resolves it architecturally: &lt;strong&gt;agents generate, users manipulate.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You describe what you want. The agent produces it. Once it exists, you own it — you edit it directly, you scroll it, you tab between workspaces, you undo changes. The agent is a producer. You are the controller.&lt;/p&gt;

&lt;h2&gt;
  
  
  How messages flow
&lt;/h2&gt;

&lt;p&gt;Every message passes through a local routing layer before any model is called.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jnj6ibwovxkuexmyxop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jnj6ibwovxkuexmyxop.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Intent detection is pure regex — sub-millisecond, deterministic. The routing decision fires before the LLM even knows a message arrived.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"write a project proposal"          → create workspace (document)
"make a todo list"                   → create workspace (list)
"create a python script"            → create workspace (code)
"add a conclusion section"          → edit active workspace
"run it"                            → execute code workspace
"combine the proposal and checklist" → merge workspaces
"standup"                           → run Lua plugin
"how long should this be?"          → chat reply
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plugins are checked first, then close, run, combine, edit, and create, in that priority order. Self-edit phrases like "I'll fix it myself" are caught before the edit patterns fire. The ordering matters because a message can match more than one pattern, and the first match wins.&lt;/p&gt;
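
&lt;p&gt;The priority ordering can be sketched in a few lines. CAS itself is written in Go; this is a minimal Python illustration, and every pattern and name in it (&lt;code&gt;RULES&lt;/code&gt;, &lt;code&gt;detect_intent&lt;/code&gt;, the regexes) is a hypothetical stand-in rather than the project's actual rule set.&lt;/p&gt;

```python
import re

# Hypothetical patterns, listed in priority order. The first match wins.
RULES = [
    (re.compile(r"^\s*close\b", re.IGNORECASE), "close"),
    (re.compile(r"^\s*run\b", re.IGNORECASE), "run"),
    (re.compile(r"^\s*(combine|merge)\b", re.IGNORECASE), "combine"),
    # Self-edit guard: sits before the edit pattern so it can shadow it.
    (re.compile(r"\bI'll\b.*\bmyself\b", re.IGNORECASE), "chat"),
    (re.compile(r"\b(add|update|fix|change|remove)\b", re.IGNORECASE), "edit"),
    (re.compile(r"^\s*(write|make|create)\b", re.IGNORECASE), "create"),
]


def detect_intent(message, plugin_commands=()):
    """Return the first matching intent kind, or "chat" if nothing matches."""
    # Plugins are checked before any built-in pattern.
    if message.strip().lower() in plugin_commands:
        return "plugin"
    for pattern, kind in RULES:
        if pattern.search(message):
            return kind
    return "chat"
```

&lt;p&gt;Without the guard, "I'll fix it myself" would match the edit pattern on the word "fix", which is why its position in the list matters.&lt;/p&gt;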

&lt;h2&gt;
  
  
  Deterministic contracts
&lt;/h2&gt;

&lt;p&gt;Every workspace operation passes through a contract layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="n"&gt;contract&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CheckPreconditions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;   &lt;span class="c"&gt;// is this operation permitted?&lt;/span&gt;
&lt;span class="n"&gt;contract&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CheckInvariants&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;      &lt;span class="c"&gt;// are all invariants satisfied?&lt;/span&gt;
&lt;span class="n"&gt;contract&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CheckPostconditions&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c"&gt;// did the output meet requirements?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These run in Go, not in the model. The model cannot modify, bypass, or reason about them. Any violation fails the operation closed. The approach follows Bertrand Meyer's Design by Contract (1986), a 40-year-old idea that turns out to be exactly right for agentic systems.&lt;/p&gt;
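
&lt;p&gt;As a rough illustration of what failing closed means here, the sketch below runs an edit through the three kinds of checks and returns a new workspace only if all of them pass. It is Python rather than the project's Go, and the specific rules (closed flag, title, size limit) and names (&lt;code&gt;apply_edit&lt;/code&gt;, &lt;code&gt;ContractViolation&lt;/code&gt;) are assumptions made for the example.&lt;/p&gt;

```python
class ContractViolation(Exception):
    """Raised when any check fails; the edit is abandoned (fail closed)."""


def apply_edit(workspace, new_content, max_bytes=100_000):
    """Return an updated copy of workspace, or raise without changing anything."""
    # Precondition: is this operation permitted?
    if workspace.get("closed"):
        raise ContractViolation("precondition: workspace is closed")
    # Invariant: properties that must hold for every workspace.
    if not workspace.get("title"):
        raise ContractViolation("invariant: workspace must have a title")
    # Postcondition: did the output meet requirements?
    if not new_content.strip():
        raise ContractViolation("postcondition: content must not be empty")
    if len(new_content.encode("utf-8")) > max_bytes:
        raise ContractViolation("postcondition: content exceeds size limit")
    # Only a fully validated edit produces a new workspace value.
    return dict(workspace, content=new_content)
```

&lt;p&gt;Because the original dictionary is never mutated, a violation leaves the workspace exactly as it was.&lt;/p&gt;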

&lt;h2&gt;
  
  
  Code execution
&lt;/h2&gt;

&lt;p&gt;Say &lt;code&gt;run it&lt;/code&gt; with an active code workspace. CAS detects the language from content (bash, Python, Go, JavaScript, Ruby), writes to a temp file, and executes in a sandboxed subprocess.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;you › create a python script to compute fibonacci
     → [c] tab opens, tokens stream in

you › run it
     → ran python (23ms, exit 0)
       1, 1, 2, 3, 5, 8, 13, 21, 34, 55
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The runner uses process group isolation, a restricted environment (only &lt;code&gt;PATH&lt;/code&gt; is inherited), and a 30-second timeout that kills the entire process tree. No LLM call is involved: intent detection routes directly to the runner.&lt;/p&gt;
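
&lt;p&gt;The three runner properties can be approximated with the Python standard library on a POSIX system. This is a sketch under that assumption, not the CAS runner (which is Go); &lt;code&gt;run_code&lt;/code&gt; and its return shape are hypothetical, and a separate process group alone is isolation for cleanup purposes, not a full security sandbox.&lt;/p&gt;

```python
import os
import signal
import subprocess
import tempfile


def run_code(source, interpreter, timeout_s=30.0):
    """Run source with interpreter; return (exit_code, output), exit_code None on timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write(source)
        path = handle.name
    env = {"PATH": os.environ.get("PATH", "")}  # only PATH is inherited
    proc = subprocess.Popen(
        [interpreter, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env=env,
        start_new_session=True,  # child leads its own session and process group
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the whole group so grandchildren do not outlive the timeout.
        os.killpg(proc.pid, signal.SIGKILL)
        output, _ = proc.communicate()
        return None, output
    finally:
        os.unlink(path)
    return proc.returncode, output
```

&lt;p&gt;Calling &lt;code&gt;run_code(script, sys.executable)&lt;/code&gt; runs a Python script this way; other languages would pass a different interpreter path.&lt;/p&gt;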

&lt;h2&gt;
  
  
  Cross-workspace operations
&lt;/h2&gt;

&lt;p&gt;With multiple tabs open, CAS resolves which workspace you're addressing by fuzzy-matching title fragments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"update the proposal"                → targets "Project Proposal"
"add the script code to the report"  → edits Report with Script as LLM context
"combine the proposal and checklist" → new workspace from both sources
"merge all workspaces"               → synthesizes everything into one
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Edits that reference another workspace by name include that workspace's content in the LLM prompt automatically.&lt;/p&gt;
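
&lt;p&gt;One simple way to resolve a target workspace from title fragments is word overlap between the message and each title. The sketch below shows that idea only; the scoring CAS actually uses is not described here, and &lt;code&gt;resolve_workspace&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import re


def resolve_workspace(message, titles):
    """Return the title sharing the most words with message, or None if no word matches."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    best_title, best_score = None, 0
    for title in titles:
        fragments = set(re.findall(r"[a-z0-9]+", title.lower()))
        score = len(words.intersection(fragments))
        # Strictly greater: on a tie, the earlier title is kept.
        if score > best_score:
            best_title, best_score = title, score
    return best_title
```

&lt;p&gt;A production matcher would likely also drop stop words such as "the", so that a title like "The Report" does not win on an article alone.&lt;/p&gt;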

&lt;h2&gt;
  
  
  Lua plugins
&lt;/h2&gt;

&lt;p&gt;Drop &lt;code&gt;.lua&lt;/code&gt; files in &lt;code&gt;~/.cas/plugins/&lt;/code&gt; to add custom commands without recompiling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- ~/.cas/plugins/standup.lua&lt;/span&gt;
&lt;span class="n"&gt;cas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"standup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Daily standup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;workspaces&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="kd"&gt;local&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;ipairs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"- "&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="s2"&gt;" ("&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;..&lt;/span&gt; &lt;span class="s2"&gt;")"&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
    &lt;span class="n"&gt;cas&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;table.concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type &lt;code&gt;standup&lt;/code&gt; and the plugin runs — no LLM call, sub-millisecond. The Lua VM is sandboxed: no file I/O, no &lt;code&gt;os.execute&lt;/code&gt;, no network. API: &lt;code&gt;cas.command()&lt;/code&gt;, &lt;code&gt;cas.reply()&lt;/code&gt;, &lt;code&gt;cas.workspaces()&lt;/code&gt;, &lt;code&gt;cas.active()&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-provider
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ollama — local, private, no API key&lt;/span&gt;
./cas

&lt;span class="c"&gt;# Anthropic — cloud, no GPU required&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CAS_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
./cas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Documents and lists route to &lt;code&gt;qwen3.5:9b&lt;/code&gt; locally or Sonnet on Anthropic. Code routes to &lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; locally or Haiku. All overridable via env vars.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;p&gt;Single static Go binary. No runtime, no server, no browser. SSH to a remote machine and run it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;internal/
├── intent/      Regex intent detection — 7 intent kinds
├── contract/    Design by Contract enforcement
├── workspace/   Lifecycle: create, update, undo, close
├── shell/       Session manager + workspace resolver
├── llm/         Ollama + Anthropic streaming
├── runner/      Code execution — sandboxed subprocess
├── plugin/      Lua plugin runtime (gopher-lua)
├── store/       SQLite (WAL) + in-memory store
└── conductor/   Behavioral learning
ui/              Bubble Tea TUI: split panel, tabs, streaming
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;245 tests across all packages. 8 TUI integration tests that spawn the real binary in tmux and interact with it as a user would.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Requires Go 1.25+&lt;/span&gt;
git clone https://github.com/goweft/cas.git
&lt;span class="nb"&gt;cd &lt;/span&gt;cas
go build &lt;span class="nt"&gt;-o&lt;/span&gt; cas ./cmd/cas

&lt;span class="c"&gt;# Local inference&lt;/span&gt;
ollama pull qwen3.5:9b &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ollama pull qwen2.5-coder:7b
./cas

&lt;span class="c"&gt;# Or cloud&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CAS_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-key
./cas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why a terminal
&lt;/h2&gt;

&lt;p&gt;It's already where the work happens. It composes with existing tools — export to markdown, pipe to pandoc, commit to git. And it works over SSH: run CAS on a machine with a GPU, access it from a laptop without one.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/goweft/cas" rel="noopener noreferrer"&gt;goweft/cas&lt;/a&gt; — Apache 2.0&lt;/p&gt;

</description>
      <category>go</category>
      <category>ai</category>
      <category>terminal</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Found Anthropic's Source Map in a Production Bundle - So I Built Five Security Tools</title>
      <dc:creator>Steve Gonzalez</dc:creator>
      <pubDate>Mon, 06 Apr 2026 05:04:16 +0000</pubDate>
      <link>https://dev.to/goweft/i-found-anthropics-source-map-in-a-production-bundle-so-i-built-five-security-tools-published-215f</link>
      <guid>https://dev.to/goweft/i-found-anthropics-source-map-in-a-production-bundle-so-i-built-five-security-tools-published-215f</guid>
      <description>&lt;p&gt;On March 31, 2026, I was reviewing a Claude Code release when I found something unexpected: a complete JavaScript source map (a &lt;code&gt;.js.map&lt;/code&gt; file) shipped inside the production bundle. Source maps are development artifacts. They can contain the original, pre-minified source code, internal file paths, variable names, and architectural structure. In a production bundle, they're a blueprint of your codebase handed to anyone who looks.&lt;/p&gt;

&lt;p&gt;This wasn't an Anthropic-specific failure. Source map leakage is one of the most common pre-publish mistakes in modern JavaScript tooling. Bundlers generate them by default. Developers forget to exclude them. CI pipelines don't check for them. And AI coding tools — which generate and publish code faster than any human can review — make the problem worse.&lt;/p&gt;

&lt;p&gt;I built five open-source security tools in response. This post explains what I found, why it matters for AI agent systems specifically, and what each tool does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Source Map Leak Actually Exposes
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;.js.map&lt;/code&gt; file can expose the original unminified source code (when the bundler embeds it in the &lt;code&gt;sourcesContent&lt;/code&gt; field), internal file paths and project structure, pre-mangled variable and function names, and source-to-output mappings that let anyone reconstruct your build process.&lt;/p&gt;

&lt;p&gt;For a company like Anthropic, this means internal architecture details, module boundaries, and naming conventions that would normally take months of reverse engineering — handed over in a single file.&lt;/p&gt;

&lt;p&gt;For any organization shipping AI agents, the risk is compounded: agents generate and publish code autonomously, often faster than security review can keep up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Tooling Makes This Worse
&lt;/h2&gt;

&lt;p&gt;Traditional developer tools have a human in the loop at publish time. You run &lt;code&gt;npm publish&lt;/code&gt;, you notice the 847KB &lt;code&gt;.map&lt;/code&gt; file in the tarball, you stop.&lt;/p&gt;

&lt;p&gt;AI coding agents change this. An agent that can write, commit, and publish code can do all three faster than a human can review. The attack surface isn't just "developer forgets to exclude source maps" — it's "agent generates a release, publishes it, and the source map was never on anyone's checklist."&lt;/p&gt;

&lt;p&gt;This is the gap the five tools address. Not fixing the underlying problem (that's a toolchain problem), but making the gap visible and catchable before it becomes public.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. tenter — Pre-publish artifact scanner
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repos:&lt;/strong&gt; &lt;a href="https://github.com/goweft/tenter" rel="noopener noreferrer"&gt;goweft/tenter&lt;/a&gt; (Python, GitHub Actions) · &lt;a href="https://github.com/goweft/tenter-rs" rel="noopener noreferrer"&gt;goweft/tenter-rs&lt;/a&gt; (Rust, static binary)&lt;/p&gt;

&lt;p&gt;tenter scans a directory before publish and fails if it finds artifacts that shouldn't ship: source maps, &lt;code&gt;.env&lt;/code&gt; files, private keys, debug builds, or secrets matching common patterns.&lt;/p&gt;

&lt;p&gt;v1 ships as a GitHub Action on the Marketplace. It takes four lines of YAML in an existing workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;goweft/tenter@v1&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./dist&lt;/span&gt;
    &lt;span class="na"&gt;fail-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;source-maps,env-files,secrets&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;v2 (&lt;code&gt;tenter-rs&lt;/code&gt;) is a Rust rewrite: a single ~2MB static binary with no runtime dependencies and the same rule set and config format. It runs anywhere, including minimal containers and non-GitHub CI.&lt;/p&gt;

&lt;p&gt;The source map that triggered this whole sprint would have failed a tenter scan immediately.&lt;/p&gt;
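
&lt;p&gt;To make the idea concrete, here is a minimal sketch of a pre-publish scan. It is not tenter's code: the rule names mirror the &lt;code&gt;fail-on&lt;/code&gt; values shown above, but the matching logic, the constants, and the &lt;code&gt;scan&lt;/code&gt; function are assumptions made for illustration.&lt;/p&gt;

```python
import os

# Hypothetical rules; the real tenter rule set is not reproduced here.
BLOCKED_NAMES = {".env": "env-files"}
BLOCKED_SUFFIXES = {".map": "source-maps"}
KEY_MARKER = "PRIVATE KEY-----"


def scan(root):
    """Return sorted (relative_path, rule) pairs for files that should not ship."""
    findings = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            suffix = os.path.splitext(name)[1]
            if name in BLOCKED_NAMES:
                findings.append((rel, BLOCKED_NAMES[name]))
            elif suffix in BLOCKED_SUFFIXES:
                findings.append((rel, BLOCKED_SUFFIXES[suffix]))
            else:
                # Fall back to a content check for PEM-style private keys.
                with open(full, "r", errors="ignore") as handle:
                    if KEY_MARKER in handle.read():
                        findings.append((rel, "secrets"))
    return sorted(findings)
```

&lt;p&gt;A CI step would call &lt;code&gt;scan("./dist")&lt;/code&gt; and exit non-zero if the list is not empty.&lt;/p&gt;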

&lt;h2&gt;
  
  
  2. unshear — Fork divergence detector
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/goweft/unshear" rel="noopener noreferrer"&gt;goweft/unshear&lt;/a&gt; · Rust&lt;/p&gt;

&lt;p&gt;When someone forks an AI agent framework and removes safety mechanisms, unshear finds the delta. It compares a forked codebase against its upstream and surfaces files where safety-related patterns — guardrails, validation, rate limits, audit logging — were removed or weakened.&lt;/p&gt;

&lt;p&gt;Named after the shear lines in composite materials: the place where layers separate under stress. A forked agent that stripped its safety layer looks structurally similar to the original until you pull on it.&lt;/p&gt;

&lt;p&gt;Rust was a deliberate choice: a security tool that itself has 200 transitive dependencies is a liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. ratine — Agent memory poisoning detection
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/goweft/ratine" rel="noopener noreferrer"&gt;goweft/ratine&lt;/a&gt; · Python&lt;/p&gt;

&lt;p&gt;Ratine detects prompt injection attempts in agent memory stores. As agents accumulate context — conversation history, retrieved documents, tool results — that context becomes an attack surface. A malicious document retrieved during a research task can contain instructions that persist into future agent actions.&lt;/p&gt;

&lt;p&gt;Ratine scans memory stores (ChromaDB, plain JSON, SQLite) for patterns consistent with injection: instruction-like language in unexpected positions, escalation patterns, attempts to override system-level constraints.&lt;/p&gt;
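
&lt;p&gt;The simplest form of this kind of check is a pattern scan over stored entries. The sketch below assumes memory has already been loaded into a list of dictionaries with &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; keys; the three patterns and the &lt;code&gt;scan_memory&lt;/code&gt; function are illustrative heuristics, not Ratine's actual rules or API.&lt;/p&gt;

```python
import re

# Hypothetical heuristics for instruction-like text inside stored content.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"(reveal|print) (the |your )?system prompt", re.IGNORECASE),
]


def scan_memory(entries):
    """Return (entry_id, matched_text) for each entry that matches a pattern."""
    hits = []
    for entry in entries:
        for pattern in INJECTION_PATTERNS:
            match = pattern.search(entry["text"])
            if match:
                hits.append((entry["id"], match.group(0)))
                break  # one hit per entry is enough to flag it
    return hits
```

&lt;p&gt;Fixed patterns like these are easy to evade, so a real detector would need more than keyword matching; the point is only to show where such a scan sits in the data flow.&lt;/p&gt;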

&lt;p&gt;Named after a type of textured yarn — the attack surface is threaded through otherwise normal content.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. crocking — AI authorship detection
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/goweft/crocking" rel="noopener noreferrer"&gt;goweft/crocking&lt;/a&gt; · Python&lt;/p&gt;

&lt;p&gt;Crocking identifies code likely generated by an LLM. This matters for supply chain security: AI-generated code has characteristic patterns that differ from human-written code, and knowing provenance helps assess risk. Code that was generated, not written, may not have been reviewed with the same scrutiny.&lt;/p&gt;

&lt;p&gt;This is not about whether AI-generated code is "bad." It's about provenance transparency — knowing what you're actually running.&lt;/p&gt;

&lt;p&gt;Named after the textile term for dye that rubs off. The AI fingerprint is often visible if you know what to look for.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. heddle — Runtime trust enforcement
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/goweft/heddle" rel="noopener noreferrer"&gt;goweft/heddle&lt;/a&gt; · Python&lt;/p&gt;

&lt;p&gt;Heddle is the most architectural of the five. It's a self-hosted MCP (Model Context Protocol) mesh runtime where agents are defined as YAML configs, auto-register as MCP servers, and can bidirectionally consume and expose tools.&lt;/p&gt;

&lt;p&gt;Every tool call passes through deterministic contract enforcement before execution. Trust tiers (T1 read-only through T4 admin) control what each agent can do. Every action is audit-logged. The security model maps directly to OWASP Agentic Top 10 and NIST AI RMF.&lt;/p&gt;

&lt;p&gt;The name comes from the heddle in a loom — the component that controls which threads are lifted. Security is in the architecture, not bolted on afterward.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Source Map Leak Actually Tells Us
&lt;/h2&gt;

&lt;p&gt;The Anthropic source map incident was minor in isolation. No credentials were exposed, no production systems were affected. But it's a useful signal: even organizations with mature security practices miss pre-publish checks on non-traditional artifact types.&lt;/p&gt;

&lt;p&gt;AI tooling generates a category of artifact — bundles, packages, compiled agents, memory exports — that existing security tooling wasn't designed to inspect. The gap isn't in the tools that exist; it's in the tools that don't exist yet.&lt;/p&gt;

&lt;p&gt;These five tools are a start. They're all open source, and every repo has tests and CI. The more interesting question is what the full picture looks like when AI agents are generating and publishing code at scale, autonomously, faster than human review can keep up.&lt;/p&gt;

&lt;p&gt;That's the problem worth solving.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/goweft/tenter" rel="noopener noreferrer"&gt;goweft/tenter&lt;/a&gt; — Python, GitHub Marketplace&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/goweft/tenter-rs" rel="noopener noreferrer"&gt;goweft/tenter-rs&lt;/a&gt; — Rust static binary&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/goweft/unshear" rel="noopener noreferrer"&gt;goweft/unshear&lt;/a&gt; — Rust&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/goweft/ratine" rel="noopener noreferrer"&gt;goweft/ratine&lt;/a&gt; — Python&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/goweft/crocking" rel="noopener noreferrer"&gt;goweft/crocking&lt;/a&gt; — Python&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/goweft/heddle" rel="noopener noreferrer"&gt;goweft/heddle&lt;/a&gt; — Python, MCP runtime&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>security</category>
      <category>opensource</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The Security Gap in MCP Tool Servers (And What I Built to Fix It)</title>
      <dc:creator>Steve Gonzalez</dc:creator>
      <pubDate>Wed, 25 Mar 2026 22:36:59 +0000</pubDate>
      <link>https://dev.to/goweft/the-security-gap-in-mcp-tool-servers-and-what-i-built-to-fix-it-1hlg</link>
      <guid>https://dev.to/goweft/the-security-gap-in-mcp-tool-servers-and-what-i-built-to-fix-it-1hlg</guid>
      <description>&lt;p&gt;MCP has no security model. I built Heddle — a policy-and-trust layer that turns YAML configs into validated, policy-enforced MCP tool servers.&lt;/p&gt;

&lt;p&gt;MCP (Model Context Protocol) is how AI agents connect to tools. Claude Desktop uses it, Cursor uses it, and thousands of developers are building MCP servers to give AI access to their APIs, databases, and infrastructure.&lt;/p&gt;

&lt;p&gt;There's one problem: &lt;strong&gt;MCP has no security model.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The protocol defines how a client talks to a server, but says nothing about what that server is allowed to do. Authentication between client and server is optional, and local stdio servers typically run with none. There is no authorization on which tools can be called and no audit trail of what happened. The spec assumes you'll handle all of that yourself.&lt;/p&gt;

&lt;p&gt;Most people don't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Goes Wrong
&lt;/h2&gt;

&lt;p&gt;I run a self-hosted server with Prometheus, Grafana, Ollama, Gitea, and a handful of other services. I wanted Claude Desktop to query all of them through MCP. The standard approach is to write a Python FastMCP server for each one — a few dozen lines per service, hardcode the API key, register the tools, done.&lt;/p&gt;

&lt;p&gt;That works until you think about what you've actually built:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every MCP server has full access to whatever its process can reach.&lt;/strong&gt; Your Prometheus tool can also hit your Grafana API, your Gitea API, and anything else on localhost. There's no scoping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;API keys live in environment variables or config files.&lt;/strong&gt; If you have 9 MCP servers, you have 9 places where credentials sit in plaintext with no access policy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nothing is logged.&lt;/strong&gt; If Claude calls a tool that restarts a service or deletes data, there's no record of which tool was called, with what parameters, by which agent, at what time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;There's no concept of read-only vs. write.&lt;/strong&gt; A tool either exists or it doesn't. MCP doesn't know that &lt;code&gt;query_prometheus&lt;/code&gt; is safe to call freely but &lt;code&gt;restart_service&lt;/code&gt; should require approval.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool composition creates emergent risks.&lt;/strong&gt; When Claude has access to multiple MCP servers, it can chain calls across them. Server A reads sensitive data, Server B posts to an external API — Claude could combine them in ways neither server was designed for.&lt;/p&gt;

&lt;p&gt;These aren't theoretical risks. During development, I declared an agent as read-only (Trust Tier 1) but gave it a tool that used HTTP POST. The system I built caught it — blocked the call, logged a trust violation, and forced me to either fix the config or explicitly upgrade the trust level. Without that enforcement, the tool would have silently worked and I'd never have known my security model was wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/goweft/heddle" rel="noopener noreferrer"&gt;Heddle&lt;/a&gt; is a runtime that sits between your YAML config and the MCP protocol. You define your tools in a config file, and Heddle validates, secures, and serves them — with policy enforcement on every call.&lt;/p&gt;

&lt;p&gt;Here's a complete tool server for Prometheus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prometheus-bridge&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0.0"&lt;/span&gt;
  &lt;span class="na"&gt;exposes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;query_prometheus&lt;/span&gt;
      &lt;span class="na"&gt;access&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PromQL&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query"&lt;/span&gt;
      &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;string&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;true&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_alerts&lt;/span&gt;
      &lt;span class="na"&gt;access&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;read&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;active&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Prometheus&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;alerts"&lt;/span&gt;
  &lt;span class="na"&gt;http_bridge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tool_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;query_prometheus&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9090/api/v1/query"&lt;/span&gt;
      &lt;span class="na"&gt;query_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;query&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tool_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;get_alerts&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:9090/api/v1/alerts"&lt;/span&gt;
  &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;trust_tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run &lt;code&gt;heddle run agents/prometheus-bridge.yaml&lt;/code&gt; and Claude can query Prometheus in natural language. But every call goes through a six-layer dispatch pipeline before it reaches the API:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limiting&lt;/strong&gt; → &lt;strong&gt;Access mode check&lt;/strong&gt; → &lt;strong&gt;Escalation rules&lt;/strong&gt; → &lt;strong&gt;Input validation&lt;/strong&gt; → &lt;strong&gt;Trust tier enforcement&lt;/strong&gt; → &lt;strong&gt;HTTP bridge execution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each layer can independently block the call and log why.&lt;/p&gt;
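
&lt;p&gt;To make the "each layer can block and log why" idea concrete, here is a minimal sketch of a layered dispatcher. It is not Heddle's actual code: the &lt;code&gt;ToolCall&lt;/code&gt; and &lt;code&gt;Decision&lt;/code&gt; types, the two example checks, and the audit record shape are all assumptions made for illustration, and only two of the six layers are shown.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    method: str
    params: dict


@dataclass
class Decision:
    allowed: bool
    layer: str = ""
    reason: str = ""


def check_access_mode(call):
    # Hypothetical layer: refuse anything that is not a read-style method.
    if call.method != "GET":
        return "write method not permitted"
    return None


def check_input(call):
    # Hypothetical layer: reject an obviously suspicious parameter value.
    for value in call.params.values():
        if "../" in str(value):
            return "path traversal pattern in parameter"
    return None


# Ordered (layer name, check) pairs; a check returns a block reason or None.
LAYERS = [("access_mode", check_access_mode), ("input_validation", check_input)]


def dispatch(call, audit):
    """Run the call through every layer; the first layer that objects wins."""
    for name, check in LAYERS:
        reason = check(call)
        if reason is not None:
            audit.append({"tool": call.tool, "blocked_by": name, "reason": reason})
            return Decision(False, name, reason)
    audit.append({"tool": call.tool, "blocked_by": None, "reason": "allowed"})
    return Decision(True)
```

&lt;p&gt;Keeping the layers in an ordered list means a new control is one more entry in the list, and every block decision carries the name of the layer that made it.&lt;/p&gt;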

&lt;h2&gt;
  
  
  The Security Controls
&lt;/h2&gt;

&lt;p&gt;The dispatch pipeline enforces these controls on every tool call:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trust Tiers (T1–T4).&lt;/strong&gt; Each config declares a trust level. T1 (observer) can only use GET: any POST, PUT, or DELETE is blocked at runtime, not just warned about. T2 (worker) allows scoped writes. T3 (operator) allows cross-agent invocation. T4 (privileged) requires human approval. This caught a real misconfiguration: a T1 agent tried to POST, and the enforcer blocked it before the request ever left the process.&lt;/p&gt;
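
&lt;p&gt;A runtime method check along these lines could look like the sketch below. The tier-to-method table is an assumption (the article only states that T1 is GET-only), and &lt;code&gt;enforce_trust_tier&lt;/code&gt; and &lt;code&gt;TrustViolation&lt;/code&gt; are hypothetical names, not Heddle's API. T4 is deliberately left out of the table, so it falls through to "nothing allowed" as a stand-in for the human-approval step.&lt;/p&gt;

```python
# Assumed mapping of trust tier to permitted HTTP methods (illustrative only).
ALLOWED_METHODS = {
    1: {"GET"},
    2: {"GET", "POST", "PUT", "PATCH"},
    3: {"GET", "POST", "PUT", "PATCH", "DELETE"},
}


class TrustViolation(Exception):
    """Raised when a config attempts a method its tier does not permit."""


def enforce_trust_tier(trust_tier, method):
    # Unknown tiers get an empty set, so the default is to deny.
    allowed = ALLOWED_METHODS.get(trust_tier, set())
    if method.upper() not in allowed:
        raise TrustViolation(f"T{trust_tier} config may not issue {method.upper()}")
```

&lt;p&gt;Calling this before the HTTP request object is built is what makes the block happen inside the process rather than at the remote API.&lt;/p&gt;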

&lt;p&gt;&lt;strong&gt;Access Mode Annotations.&lt;/strong&gt; Every tool is declared as &lt;code&gt;access: read&lt;/code&gt; or &lt;code&gt;access: write&lt;/code&gt;. T1 configs with write tools are rejected at load time — before the server even starts. This is the schema-level version of least privilege.&lt;/p&gt;
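
&lt;p&gt;The load-time rejection can be sketched as a plain function over the parsed config. This assumes the YAML has already been loaded into a dict with the &lt;code&gt;exposes&lt;/code&gt; and &lt;code&gt;runtime&lt;/code&gt; keys shown earlier; &lt;code&gt;validate_config&lt;/code&gt; is a hypothetical helper and may not match how Heddle structures its own validator.&lt;/p&gt;

```python
def validate_config(config):
    """Return a list of load-time errors; an empty list means the config may start."""
    errors = []
    tier = config.get("runtime", {}).get("trust_tier")
    for tool in config.get("exposes", []):
        access = tool.get("access")
        if access not in ("read", "write"):
            errors.append(f"{tool.get('name')}: access must be 'read' or 'write'")
        elif tier == 1 and access == "write":
            errors.append(f"{tool.get('name')}: write tool not allowed in a T1 config")
    return errors
```

&lt;p&gt;Returning every error at once, instead of failing on the first, tends to make a rejected config quicker to fix.&lt;/p&gt;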

&lt;p&gt;&lt;strong&gt;Credential Broker.&lt;/strong&gt; API keys are stored in &lt;code&gt;~/.heddle/secrets.json&lt;/code&gt; with per-config access policies. Configs reference them as &lt;code&gt;{{secret:prometheus-token}}&lt;/code&gt; — resolved at runtime, never written to the YAML file. A config can only access secrets it's been explicitly granted. Unauthorized access is denied, logged, and returns a placeholder instead of the real value.&lt;/p&gt;
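
&lt;p&gt;Here is a rough sketch of how &lt;code&gt;{{secret:NAME}}&lt;/code&gt; references might be resolved against a per-config grant list. The data shapes (a dict of secrets, a dict mapping config names to granted secret names), the placeholder string, and the &lt;code&gt;resolve_secrets&lt;/code&gt; name are all assumptions; the real broker reads &lt;code&gt;~/.heddle/secrets.json&lt;/code&gt;, whose format is not shown here.&lt;/p&gt;

```python
import re

SECRET_REF = re.compile(r"\{\{secret:([A-Za-z0-9_.-]+)\}\}")
PLACEHOLDER = "[secret-denied]"  # hypothetical placeholder value


def resolve_secrets(text, config_name, secrets, grants, audit):
    """Replace {{secret:NAME}} references in text on behalf of one config."""

    def substitute(match):
        name = match.group(1)
        granted = name in grants.get(config_name, set()) and name in secrets
        # Every lookup is recorded, whether or not it succeeds.
        audit.append({"config": config_name, "secret": name, "granted": granted})
        return secrets[name] if granted else PLACEHOLDER

    return SECRET_REF.sub(substitute, text)
```

&lt;p&gt;Because denied lookups return a placeholder instead of raising, a misconfigured agent fails at the remote API with a bad credential while the audit trail shows exactly which secret it asked for.&lt;/p&gt;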

&lt;p&gt;&lt;strong&gt;Escalation Rules.&lt;/strong&gt; Declarative conditions that hold a tool call for review instead of executing it. For example, my VRAM orchestrator has a rule that holds any &lt;code&gt;smart_load&lt;/code&gt; call if the model name contains "27b", because loading a 27-billion-parameter model consumes most of my 24 GB of GPU memory. The rule triggers, the call is held, and the audit log records why.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;escalation_rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;large-model-load&lt;/span&gt;
    &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;will&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;consume&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;most&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;24GB&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;VRAM"&lt;/span&gt;
    &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;smart_load"&lt;/span&gt;
    &lt;span class="na"&gt;param_contains&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;27b"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
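
&lt;p&gt;Matching a call against a rule like the one above comes down to a tool-name comparison plus a substring test per parameter. The sketch below assumes rules are parsed into dicts with the same keys as the YAML and that matching is case-sensitive; &lt;code&gt;matching_rule&lt;/code&gt; is a hypothetical helper, not part of Heddle's documented interface.&lt;/p&gt;

```python
def matching_rule(rules, tool, params):
    """Return the first rule that should hold this call, or None."""
    for rule in rules:
        if rule.get("tool") != tool:
            continue
        needles = rule.get("param_contains", {})
        # Every listed parameter must contain its substring for the rule to fire.
        if all(needle in str(params.get(key, "")) for key, needle in needles.items()):
            return rule
    return None
```

&lt;p&gt;When a rule is returned, the dispatcher would hold the call and write the rule's &lt;code&gt;reason&lt;/code&gt; to the audit log instead of executing the tool.&lt;/p&gt;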



&lt;p&gt;&lt;strong&gt;Input Validation.&lt;/strong&gt; Type checking, length limits, and injection pattern detection on every parameter. The validator catches shell injection (&lt;code&gt;; rm -rf /&lt;/code&gt;), SQL injection (&lt;code&gt;' OR 1=1&lt;/code&gt;), path traversal (&lt;code&gt;../../etc/passwd&lt;/code&gt;), and LLM prompt injection (&lt;code&gt;ignore previous instructions&lt;/code&gt;). In strict mode, these are blocked. In permissive mode, they're logged and passed through.&lt;/p&gt;
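
&lt;p&gt;The strict versus permissive behaviour can be illustrated with a handful of regular expressions. The patterns and the length limit below are deliberately simplistic assumptions chosen to match the four examples in the paragraph; they are not Heddle's real pattern set, and pattern matching alone is not a complete defence against any of these attack classes.&lt;/p&gt;

```python
import re

# Illustrative patterns only; a real validator needs a much broader set.
PATTERNS = {
    "shell_injection": re.compile(r";\s*rm\s+-rf"),
    "sql_injection": re.compile(r"'\s*OR\s+1\s*=\s*1", re.IGNORECASE),
    "path_traversal": re.compile(r"\.\./"),
    "prompt_injection": re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE),
}
MAX_LENGTH = 2000  # hypothetical limit


def validate_param(value, strict=True):
    """Return (allowed, findings) for one parameter value."""
    if not isinstance(value, str):
        return (False, ["not_a_string"])
    findings = [name for name, pattern in PATTERNS.items() if pattern.search(value)]
    if len(value) > MAX_LENGTH:
        findings.append("too_long")
    # Strict mode blocks on any finding; permissive mode only reports it.
    allowed = not (strict and findings)
    return (allowed, findings)
```

&lt;p&gt;In permissive mode the caller still receives the findings, so they can be logged even though the call is passed through.&lt;/p&gt;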

&lt;p&gt;&lt;strong&gt;Hash-Chained Audit Log.&lt;/strong&gt; Every tool call, trust violation, credential access, and escalation hold is logged as a JSON Lines entry. Each entry includes a SHA-256 hash of the previous entry — if anyone modifies or deletes a log entry, the chain breaks and verification fails.&lt;/p&gt;
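
&lt;p&gt;A hash chain over JSON Lines entries can be sketched in a few lines. The field name &lt;code&gt;prev_hash&lt;/code&gt;, the all-zero genesis value, and the in-memory list standing in for the log file are assumptions for illustration; Heddle's actual entry format may differ.&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the very first entry


def append_entry(log, event):
    """Append an event whose record embeds the SHA-256 of the previous line."""
    prev_hash = hashlib.sha256(log[-1].encode()).hexdigest() if log else GENESIS
    record = {**event, "prev_hash": prev_hash}
    log.append(json.dumps(record, sort_keys=True))


def verify_chain(log):
    """Return True if every line's prev_hash matches the line before it."""
    expected = GENESIS
    for line in log:
        if json.loads(line).get("prev_hash") != expected:
            return False
        expected = hashlib.sha256(line.encode()).hexdigest()
    return True
```

&lt;p&gt;One limitation of this simple form: the newest entry is not covered by any later hash, so altering or truncating the tail goes unnoticed until the latest hash is recorded somewhere outside the log.&lt;/p&gt;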

&lt;p&gt;&lt;strong&gt;Config Signing.&lt;/strong&gt; All YAML configs are signed with HMAC-SHA256. If a config is modified after signing, the runtime detects the tampering. AI-generated configs (from Heddle's natural language generator) are automatically quarantined in a staging directory until explicitly promoted.&lt;/p&gt;
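
&lt;p&gt;Signing and verifying a config with HMAC-SHA256 needs only the Python standard library. The function names and the idea of passing the key as raw bytes are assumptions for this sketch; how Heddle stores the key and the signature is not covered here.&lt;/p&gt;

```python
import hashlib
import hmac


def sign_config(config_bytes, key):
    """Return the hex HMAC-SHA256 of the config file contents."""
    return hmac.new(key, config_bytes, hashlib.sha256).hexdigest()


def verify_config(config_bytes, key, signature):
    """Return True only if the contents still match the recorded signature."""
    expected = sign_config(config_bytes, key)
    # compare_digest avoids leaking where the mismatch occurs through timing.
    return hmac.compare_digest(expected, signature)
```

&lt;p&gt;Since HMAC is a keyed construction, anyone who can edit the YAML but does not hold the key cannot produce a matching signature for the modified file.&lt;/p&gt;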

&lt;h2&gt;
  
  
  What It Looks Like Running
&lt;/h2&gt;

&lt;p&gt;I'm currently running 46 tools from 9 configs through a single MCP connection to Claude Desktop. The configs include Prometheus, Grafana, Ollama, Gitea, an RSS aggregator, a RAG search API, a GPU VRAM orchestrator, and a daily operations briefing agent.&lt;/p&gt;

&lt;p&gt;Every one of those 46 tools goes through the same dispatch pipeline. The Prometheus tools are T1 (read-only, 5 tools). The Ollama bridge is T2 (can POST for text generation). The VRAM orchestrator is T3 (can invoke other agents, has escalation rules on destructive operations).&lt;/p&gt;

&lt;p&gt;The trust tiers aren't just labels; they're enforced. A T1 config cannot make a POST request, even if the HTTP bridge URL is correct and the API would accept it. The enforcer blocks the call before the request is constructed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Framework Mapping
&lt;/h2&gt;

&lt;p&gt;Every security control maps to at least one industry framework. This matters if you're in an organization that needs to demonstrate compliance, or if you're building a portfolio that shows applied security architecture (which is why I built this):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Control&lt;/th&gt;
&lt;th&gt;OWASP Agentic Top 10&lt;/th&gt;
&lt;th&gt;NIST AI RMF&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trust tiers&lt;/td&gt;
&lt;td&gt;#3 Excessive Agency&lt;/td&gt;
&lt;td&gt;GV-1.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credential broker&lt;/td&gt;
&lt;td&gt;#7 Unsafe Credential Mgmt&lt;/td&gt;
&lt;td&gt;MAP-3.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit logging&lt;/td&gt;
&lt;td&gt;#9 Insufficient Logging&lt;/td&gt;
&lt;td&gt;MS-2.6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input validation&lt;/td&gt;
&lt;td&gt;#1 Prompt Injection&lt;/td&gt;
&lt;td&gt;MS-2.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Config signing&lt;/td&gt;
&lt;td&gt;#8 Supply Chain&lt;/td&gt;
&lt;td&gt;GV-6.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Escalation rules&lt;/td&gt;
&lt;td&gt;#3 Excessive Agency&lt;/td&gt;
&lt;td&gt;GV-1.3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The full threat model with 8 threat categories is in the repo at &lt;a href="https://github.com/goweft/heddle/blob/master/docs/threat-model.md" rel="noopener noreferrer"&gt;docs/threat-model.md&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/goweft/heddle.git
&lt;span class="nb"&gt;cd &lt;/span&gt;heddle
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[dev]"&lt;/span&gt;

&lt;span class="c"&gt;# Try a starter pack&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;packs/prometheus.yaml agents/
heddle validate agents/prometheus.yaml
heddle run agents/prometheus.yaml &lt;span class="nt"&gt;--port&lt;/span&gt; 8200
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Heddle ships with 6 starter packs (Prometheus, Grafana, Gitea/GitHub, Ollama, Sonarr, and Radarr) that you can drop into &lt;code&gt;agents/&lt;/code&gt; and run immediately. All are read-only (T1) except Ollama, which is T2 so that it can POST for text generation.&lt;/p&gt;

&lt;p&gt;Or generate a config from natural language:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;heddle generate &lt;span class="s2"&gt;"agent that wraps the Home Assistant API"&lt;/span&gt; &lt;span class="nt"&gt;--model&lt;/span&gt; qwen3:14b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Claude Desktop, Cursor, and any MCP client that supports stdio transport.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Heddle&lt;/strong&gt; is open source (MIT) at &lt;a href="https://github.com/goweft/heddle" rel="noopener noreferrer"&gt;github.com/goweft/heddle&lt;/a&gt;. 126 tests, 15 security controls, and a threat model mapped to OWASP Agentic Top 10 and NIST AI RMF. If you're exposing APIs to AI agents, I'd like to know what security controls you wish existed.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>security</category>
      <category>python</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
