DEV Community: Steve Gonzalez

I Replaced My AI Chat Interface With a Terminal Shell

Steve Gonzalez — Mon, 06 Apr 2026 05:16:21 +0000

Most AI tools give you a chat window. You type, the model responds, you copy what you need and paste it somewhere else. The conversation and the artifact live in different places.

I wanted the artifact to appear next to the conversation, stream in as it was generated, and stay there — editable, persistent, tab-switchable — without ever leaving the terminal.

So I built CAS: Conversational Agent Shell.

What it looks like

┌─ chat ──────────────────────┐ ┌─ [l] Todo List For Easter ──────────────┐
│                             │ │                                          │
│ you › make a todo list for  │ │  Easter Todo List                        │
│       easter                │ │                                          │
│                             │ │  ## 🗓 Planning & Budget                 │
│ cas › Created list          │ │                                          │
│       workspace "Todo List  │ │  [ ] Set date and time for Easter Sunday │
│       For Easter".          │ │  [ ] Confirm guest list and RSVPs        │
│       Edit directly or ask  │ │  [ ] Create budget for food              │
│       me to make changes.   │ │  [ ] Check family availability           │
│                             │ │  [ ] Book reservations                   │
│ > █                         │ │                                          │
└─────────────────────────────┘ └──────────────────────────────────────────┘
  ↑↓ scroll  │  enter send  │  tab workspace  │  ctrl+n new session

Left panel: conversation. Right panel: the workspace, streaming in as the model generates it. You stay in the terminal the whole time.

The idea

There's a debate in HCI that goes back to 1997.

Ben Shneiderman argued that direct manipulation gives users control that delegation never can. Pattie Maes argued that agents reduce cognitive load that direct manipulation can't scale to. Both were right. They were arguing about the wrong dichotomy.

CAS resolves it architecturally: agents generate, users manipulate.

You describe what you want. The agent produces it. Once it exists, you own it — you edit it directly, you scroll it, you tab between workspaces, you undo changes. The agent is a producer. You are the controller.

How messages flow

Every message passes through a near-zero-latency routing layer before any model is called.

Intent detection is pure regex — sub-millisecond, deterministic. The routing decision fires before the LLM even knows a message arrived.

"write a project proposal"          → create workspace (document)
"make a todo list"                   → create workspace (list)
"create a python script"            → create workspace (code)
"add a conclusion section"          → edit active workspace
"run it"                            → execute code workspace
"combine the proposal and checklist" → merge workspaces
"standup"                           → run Lua plugin
"how long should this be?"          → chat reply

Plugins are checked first. Then close, run, combine, edit, create — in that priority order. Self-edit phrases like "I'll fix it myself" are caught before the edit patterns fire. The ordering matters.
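The priority ordering can be sketched as an ordered list of compiled patterns. This is a minimal Python illustration, not CAS's Go implementation: the regexes, the intent labels, and the `detect_intent` name are all assumptions chosen to match the example messages above.

```python
import re

# Hypothetical patterns; the post does not show CAS's real regexes.
SELF_EDIT = re.compile(r"\bi('ll| will)\b.*\b(fix|edit|change|do)\b.*\bmyself\b", re.I)

# Checked top to bottom, mirroring the stated priority:
# close, run, combine, edit, create.
ORDERED_PATTERNS = [
    ("close",   re.compile(r"^\s*close\b", re.I)),
    ("run",     re.compile(r"^\s*(run|execute)\b", re.I)),
    ("combine", re.compile(r"^\s*(combine|merge)\b", re.I)),
    ("edit",    re.compile(r"\b(add|update|fix|change|remove|rewrite)\b", re.I)),
    ("create",  re.compile(r"^\s*(write|make|create|draft)\b", re.I)),
]

def detect_intent(message, plugin_commands=frozenset()):
    """Return an intent label without calling any model."""
    text = message.strip()
    if text.lower() in plugin_commands:   # plugins are checked first
        return "plugin"
    if SELF_EDIT.search(text):            # "I'll fix it myself" is not an edit request
        return "chat"
    for kind, pattern in ORDERED_PATTERNS:
        if pattern.search(text):
            return kind
    return "chat"                         # nothing matched: plain chat reply
```

Because the edit pattern in this sketch is unanchored, "I'll fix it myself" would be classified as an edit if the self-edit check did not run first, which is one concrete reason the ordering matters.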

Deterministic contracts

Every workspace operation passes through a contract layer:

contract.CheckPreconditions()   // is this operation permitted?
contract.CheckInvariants()      // are all invariants satisfied?
contract.CheckPostconditions()  // did the output meet requirements?

These run in Go, not in the model. The model cannot modify, bypass, or reason about them. Any violation fails the operation closed. Based on Bertrand Meyer's Design by Contract (1986) — a 40-year-old idea that turns out to be exactly right for agentic systems.
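A fail-closed contract check might look roughly like the following. CAS implements this in Go; the Python sketch below only illustrates the pattern, and the `Workspace` fields, the specific checks, and the size limit are assumptions rather than the project's actual rules.

```python
from dataclasses import dataclass

class ContractViolation(Exception):
    """Raised when any check fails; the operation is abandoned (fail closed)."""

@dataclass
class Workspace:
    title: str
    kind: str            # "document", "list", or "code"
    content: str = ""
    closed: bool = False

def apply_edit(ws, new_content, max_bytes=1_000_000):
    # Precondition: is this operation permitted?
    if ws.closed:
        raise ContractViolation("precondition: workspace is closed")
    # Build a candidate so a failed check leaves the workspace untouched.
    candidate = Workspace(ws.title, ws.kind, new_content, ws.closed)
    # Invariants: properties every workspace state must satisfy.
    if candidate.kind not in {"document", "list", "code"}:
        raise ContractViolation("invariant: unknown workspace kind")
    if len(candidate.content.encode("utf-8")) > max_bytes:
        raise ContractViolation("invariant: content exceeds size limit")
    # Postcondition: did the output meet requirements?
    if not candidate.content.strip():
        raise ContractViolation("postcondition: edit produced empty content")
    ws.content = candidate.content
    return ws
```

The model's output only ever arrives as `new_content`; the checks themselves are ordinary code that the model never sees or influences.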

Code execution

Say "run it" with an active code workspace. CAS detects the language from content (bash, Python, Go, JavaScript, Ruby), writes to a temp file, and executes in a sandboxed subprocess.

you › create a python script to compute fibonacci
     → [c] tab opens, tokens stream in

you › run it
     → ran python (23ms, exit 0)
       1, 1, 2, 3, 5, 8, 13, 21, 34, 55

Process group isolation, restricted environment (only PATH inherited), 30-second timeout that kills the entire tree. No LLM call — intent detection routes directly to the runner.
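The three isolation measures map onto standard process APIs. The sketch below shows them in Python on a POSIX system; it is not the CAS runner (which is Go), the `run_code` name and return shape are assumptions, and language detection is left out by passing the interpreter in explicitly.

```python
import os
import signal
import subprocess
import sys
import tempfile
import time

def run_code(source, interpreter=sys.executable, timeout=30.0):
    """Write source to a temp file and run it in its own process group (POSIX only)."""
    with tempfile.NamedTemporaryFile("w", suffix=".src", delete=False) as f:
        f.write(source)
        path = f.name
    started = time.monotonic()
    proc = subprocess.Popen(
        [interpreter, path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        env={"PATH": os.environ.get("PATH", "")},  # only PATH is inherited
        start_new_session=True,                     # child leads a new process group
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)         # kill the whole group, not just the child
        out, err = proc.communicate()
    finally:
        os.unlink(path)
    elapsed_ms = int((time.monotonic() - started) * 1000)
    return proc.returncode, out, err, elapsed_ms
```

Starting the child in a new session makes its process group ID equal to its PID, so a single `killpg` reaches any grandchildren it spawned.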

Cross-workspace operations

With multiple tabs open, CAS resolves which workspace you're addressing by fuzzy-matching title fragments:

"update the proposal"                → targets "Project Proposal"
"add the script code to the report"  → edits Report with Script as LLM context
"combine the proposal and checklist" → new workspace from both sources
"merge all workspaces"               → synthesizes everything into one

Edits that reference another workspace by name include that workspace's content in the LLM prompt automatically.
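One simple way to resolve a target from title fragments is to score each open workspace by word overlap. The post does not describe CAS's actual matching algorithm, so the scoring rule, the stopword list, and the `resolve_workspace` name below are assumptions for illustration only.

```python
import re

STOPWORDS = {"the", "and", "for", "all", "with", "into"}

def tokens(text):
    """Lowercase words of three or more characters, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= 3 and w not in STOPWORDS]

def resolve_workspace(message, titles):
    """Return the title sharing the most word fragments with the message, or None."""
    words = tokens(message)
    best_title, best_score = None, 0
    for title in titles:
        title_words = tokens(title)
        # A message word counts if it is a prefix of a title word or vice versa.
        score = sum(
            1 for w in words
            if any(t.startswith(w) or w.startswith(t) for t in title_words)
        )
        if score > best_score:
            best_title, best_score = title, score
    return best_title
```

Returning `None` when nothing matches lets the caller fall back to the active workspace instead of guessing.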

Lua plugins

Drop .lua files in ~/.cas/plugins/ to add custom commands without recompiling:

-- ~/.cas/plugins/standup.lua
cas.command("standup", "Daily standup", function()
    local ws = cas.workspaces()
    local lines = {}
    for i, w in ipairs(ws) do
        lines[i] = "- " .. w.title .. " (" .. w.type .. ")"
    end
    cas.reply(table.concat(lines, "\n"))
end)

Type "standup" and the plugin runs with no LLM call, in under a millisecond. The Lua VM is sandboxed: no file I/O, no os.execute, no network. API: cas.command(), cas.reply(), cas.workspaces(), cas.active().

Multi-provider

# Ollama — local, private, no API key
./cas

# Anthropic — cloud, no GPU required
export CAS_PROVIDER=anthropic
export ANTHROPIC_API_KEY=sk-ant-...
./cas

Documents and lists route to qwen3.5:9b locally or Sonnet on Anthropic. Code routes to qwen2.5-coder:7b locally or Haiku. All overridable via env vars.

The stack

Single static Go binary. No runtime, no server, no browser. SSH to a remote machine and run it.

internal/
├── intent/      Regex intent detection — 7 intent kinds
├── contract/    Design by Contract enforcement
├── workspace/   Lifecycle: create, update, undo, close
├── shell/       Session manager + workspace resolver
├── llm/         Ollama + Anthropic streaming
├── runner/      Code execution — sandboxed subprocess
├── plugin/      Lua plugin runtime (gopher-lua)
├── store/       SQLite (WAL) + in-memory store
└── conductor/   Behavioral learning
ui/              Bubble Tea TUI: split panel, tabs, streaming

245 tests across all packages. 8 TUI integration tests that spawn the real binary in tmux and interact with it as a user would.

Quick start

# Requires Go 1.25+
git clone https://github.com/goweft/cas.git
cd cas
go build -o cas ./cmd/cas

# Local inference
ollama pull qwen3.5:9b && ollama pull qwen2.5-coder:7b
./cas

# Or cloud
export CAS_PROVIDER=anthropic
export ANTHROPIC_API_KEY=your-key
./cas

Why a terminal

It's already where the work happens. It composes with existing tools — export to markdown, pipe to pandoc, commit to git. And it works over SSH: run CAS on a machine with a GPU, access it from a laptop without one.

Source: goweft/cas — Apache 2.0

I Found Anthropic's Source Map in a Production Bundle - So I Built Five Security Tools

Steve Gonzalez — Mon, 06 Apr 2026 05:04:16 +0000

On March 31, 2026, I was reviewing a Claude Code release when I found something unexpected: a complete JavaScript source map — a .js.map file — shipped inside the production bundle. Source maps are development artifacts. They contain the original, pre-minified source code, internal file paths, variable names, and architectural structure. In a production bundle, they're a blueprint of your codebase handed to anyone who looks.

This wasn't an Anthropic-specific failure. Source map leakage is one of the most common pre-publish mistakes in modern JavaScript tooling. Bundlers generate them by default. Developers forget to exclude them. CI pipelines don't check for them. And AI coding tools — which generate and publish code faster than any human can review — make the problem worse.

I built five open-source security tools in response. This post explains what I found, why it matters for AI agent systems specifically, and what each tool does.

What a Source Map Leak Actually Exposes

A .js.map file contains the original unminified source code, internal file paths and project structure, pre-mangled variable and function names, and source-to-output mappings that let anyone reconstruct your build process.

For a company like Anthropic, this means internal architecture details, module boundaries, and naming conventions that would normally take months of reverse engineering — handed over in a single file.

For any organization shipping AI agents, the risk is compounded: agents generate and publish code autonomously, often faster than security review can keep up.

Why AI Tooling Makes This Worse

Traditional developer tools have a human in the loop at publish time. You run npm publish, you notice the 847KB .map file in the tarball, you stop.

AI coding agents change this. An agent that can write, commit, and publish code can do all three faster than a human can review. The attack surface isn't just "developer forgets to exclude source maps" — it's "agent generates a release, publishes it, and the source map was never on anyone's checklist."

This is the gap the five tools address. Not fixing the underlying problem (that's a toolchain problem), but making the gap visible and catchable before it becomes public.

1. tenter — Pre-publish artifact scanner

Repos: goweft/tenter (Python, GitHub Actions) · goweft/tenter-rs (Rust, static binary)

tenter scans a directory before publish and fails if it finds artifacts that shouldn't ship: source maps, .env files, private keys, debug builds, or secrets matching common patterns.

v1 ships as a GitHub Action on the Marketplace. Four lines of YAML, zero config:

- uses: goweft/tenter@v1
  with:
    path: ./dist
    fail-on: source-maps,env-files,secrets

v2 (tenter-rs) is a Rust rewrite: a single ~2MB static binary, no runtime dependencies, identical rule set and config format. It runs anywhere, including minimal containers and non-GitHub CI.

The source map that triggered this whole sprint would have failed a tenter scan immediately.
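The core of a pre-publish scan is small enough to sketch. The following is not tenter's code: the rule names echo the `fail-on` values above, but the glob lists, the secret regex, and the `scan` function are assumptions made for illustration.

```python
import fnmatch
import os
import re

# Hypothetical rule set, loosely following the categories named in the post.
FILENAME_RULES = {
    "source-maps": ["*.map"],
    "env-files": [".env", ".env.*"],
    "private-keys": ["*.pem", "id_rsa", "id_ed25519"],
}
# PEM private key headers and AWS-style access key IDs.
SECRET_PATTERN = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----|AKIA[0-9A-Z]{16}")

def scan(root):
    """Return sorted (rule, relative_path) findings for files that should not ship."""
    findings = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            for rule, globs in FILENAME_RULES.items():
                if any(fnmatch.fnmatch(name, g) for g in globs):
                    findings.append((rule, rel))
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    if SECRET_PATTERN.search(f.read(1_000_000)):
                        findings.append(("secrets", rel))
            except OSError:
                continue   # unreadable file: skip rather than crash the scan
    return sorted(findings)
```

In CI, a non-empty result would translate to a non-zero exit code so the publish step never runs.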

2. unshear — Fork divergence detector

Repo: goweft/unshear · Rust

When someone forks an AI agent framework and removes safety mechanisms, unshear finds the delta. It compares a forked codebase against its upstream and surfaces files where safety-related patterns — guardrails, validation, rate limits, audit logging — were removed or weakened.

Named after the shear lines in composite materials: the place where layers separate under stress. A forked agent that stripped its safety layer looks structurally similar to the original until you pull on it.

Rust was a deliberate choice: a security tool that itself has 200 transitive dependencies is a liability.
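The comparison idea can be shown in a few lines, here in Python rather than Rust to keep the example short. The marker regex and the `safety_delta` function are hypothetical; the post does not list which patterns unshear actually looks for.

```python
import re

# Hypothetical markers for safety-related code.
SAFETY_PATTERN = re.compile(r"guardrail|validate|rate_limit|audit_log", re.I)

def safety_delta(upstream, fork):
    """Map each file path to the number of safety-pattern matches lost in the fork.

    Both arguments are dicts of {path: file contents}.
    """
    report = {}
    for path, original in upstream.items():
        before = len(SAFETY_PATTERN.findall(original))
        after = len(SAFETY_PATTERN.findall(fork.get(path, "")))
        if after < before:
            report[path] = before - after
    return report
```

A file deleted from the fork counts as losing all of its matches, since the fork's content for that path is treated as empty.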

3. ratine — Agent memory poisoning detection

Repo: goweft/ratine · Python

Ratine detects prompt injection attempts in agent memory stores. As agents accumulate context — conversation history, retrieved documents, tool results — that context becomes an attack surface. A malicious document retrieved during a research task can contain instructions that persist into future agent actions.

Ratine scans memory stores (ChromaDB, plain JSON, SQLite) for patterns consistent with injection: instruction-like language in unexpected positions, escalation patterns, attempts to override system-level constraints.

Named after a type of textured yarn — the attack surface is threaded through otherwise normal content.
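A pattern-based pass over memory entries could look something like this. The heuristics below are illustrative assumptions, not Ratine's rules, and the entry shape (`id` and `text` keys) is invented for the example.

```python
import re

# Hypothetical heuristics; the post does not give the actual rules.
INJECTION_PATTERNS = {
    "override": re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    "role-hijack": re.compile(r"\byou are now\b|\bact as (the )?system\b", re.I),
    "escalation": re.compile(r"\b(disable|bypass) (the )?(safety|guardrails?|filters?)\b", re.I),
}

def scan_memory(entries):
    """Return (entry_id, rule) pairs for entries that look like injected instructions."""
    hits = []
    for entry in entries:
        for rule, pattern in INJECTION_PATTERNS.items():
            if pattern.search(entry["text"]):
                hits.append((entry["id"], rule))
    return hits
```

Regexes like these will miss paraphrased attacks and flag some benign text, so a sketch like this is a triage signal, not a verdict.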

4. crocking — AI authorship detection

Repo: goweft/crocking · Python

Crocking identifies code likely generated by an LLM. This matters for supply chain security: AI-generated code has characteristic patterns that differ from human-written code, and knowing provenance helps assess risk. Code that was generated, not written, may not have been reviewed with the same scrutiny.

This is not about whether AI-generated code is "bad." It's about provenance transparency — knowing what you're actually running.

Named after the textile term for dye that rubs off. The AI fingerprint is often visible if you know what to look for.

5. heddle — Runtime trust enforcement

Repo: goweft/heddle · Python

Heddle is the most architectural of the five. It's a self-hosted MCP (Model Context Protocol) mesh runtime where agents are defined as YAML configs, auto-register as MCP servers, and can bidirectionally consume and expose tools.

Every tool call passes through deterministic contract enforcement before execution. Trust tiers (T1 read-only through T4 admin) control what each agent can do. Every action is audit-logged. The security model maps directly to OWASP Agentic Top 10 and NIST AI RMF.

The name comes from the heddle in a loom — the component that controls which threads are lifted. Security is in the architecture, not bolted on afterward.

What the Source Map Leak Actually Tells Us

The Anthropic source map incident was minor in isolation. No credentials were exposed, no production systems were affected. But it's a useful signal: even organizations with mature security practices miss pre-publish checks on non-traditional artifact types.

AI tooling generates a category of artifact — bundles, packages, compiled agents, memory exports — that existing security tooling wasn't designed to inspect. The gap isn't in the tools that exist; it's in the tools that don't exist yet.

These five tools are a start. They're all open source, and every repo has tests and CI. The more interesting question is what the full picture looks like when AI agents are generating and publishing code at scale, autonomously, faster than human review can keep up.

That's the problem worth solving.

Links

goweft/tenter — Python, GitHub Marketplace
goweft/tenter-rs — Rust static binary
goweft/unshear — Rust
goweft/ratine — Python
goweft/crocking — Python
goweft/heddle — Python, MCP runtime

The Security Gap in MCP Tool Servers (And What I Built to Fix It)

Steve Gonzalez — Wed, 25 Mar 2026 22:36:59 +0000

MCP has no security model. I built Heddle — a policy-and-trust layer that turns YAML configs into validated, policy-enforced MCP tool servers.

MCP (Model Context Protocol) is how AI agents connect to tools. Claude Desktop uses it, Cursor uses it, and thousands of developers are building MCP servers to give AI access to their APIs, databases, and infrastructure.

There's one problem: MCP has no security model.

The protocol defines how a client talks to a server, but says nothing about what that server is allowed to do. Authorization is optional in the spec and defined only for HTTP-based transports, so a typical local stdio server runs with no authentication between client and server. No authorization on which tools can be called. No audit trail of what happened. The spec assumes you'll handle all of that yourself.

Most people don't.

What Actually Goes Wrong

I run a self-hosted server with Prometheus, Grafana, Ollama, Gitea, and a handful of other services. I wanted Claude Desktop to query all of them through MCP. The standard approach is to write a Python FastMCP server for each one — a few dozen lines per service, hardcode the API key, register the tools, done.

That works until you think about what you've actually built:

Every MCP server has full access to whatever its process can reach. Your Prometheus tool can also hit your Grafana API, your Gitea API, and anything else on localhost. There's no scoping.

API keys live in environment variables or config files. If you have 9 MCP servers, you have 9 places where credentials sit in plaintext with no access policy.

Nothing is logged. If Claude calls a tool that restarts a service or deletes data, there's no record of which tool was called, with what parameters, by which agent, at what time.

There's no concept of read-only vs. write. A tool either exists or it doesn't. MCP doesn't know that query_prometheus is safe to call freely but restart_service should require approval.

Tool composition creates emergent risks. When Claude has access to multiple MCP servers, it can chain calls across them. Server A reads sensitive data, Server B posts to an external API — Claude could combine them in ways neither server was designed for.

These aren't theoretical risks. During development, I declared an agent as read-only (Trust Tier 1) but gave it a tool that used HTTP POST. The system I built caught it — blocked the call, logged a trust violation, and forced me to either fix the config or explicitly upgrade the trust level. Without that enforcement, the tool would have silently worked and I'd never have known my security model was wrong.

What I Built

Heddle is a runtime that sits between your YAML config and the MCP protocol. You define your tools in a config file, and Heddle validates, secures, and serves them — with policy enforcement on every call.

Here's a complete tool server for Prometheus:

agent:
  name: prometheus-bridge
  version: "1.0.0"
  exposes:
    - name: query_prometheus
      access: read
      description: "Run a PromQL query"
      parameters:
        query: { type: string, required: true }
    - name: get_alerts
      access: read
      description: "List active Prometheus alerts"
  http_bridge:
    - tool_name: query_prometheus
      method: GET
      url: "http://localhost:9090/api/v1/query"
      query_params: { query: query }
    - tool_name: get_alerts
      method: GET
      url: "http://localhost:9090/api/v1/alerts"
  runtime:
    trust_tier: 1

Run heddle run agents/prometheus-bridge.yaml and Claude can query Prometheus in natural language. But every call goes through a six-layer dispatch pipeline before it reaches the API:

Rate limiting → Access mode check → Escalation rules → Input validation → Trust tier enforcement → HTTP bridge execution

Each layer can independently block the call and log why.
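Structurally, a pipeline like this is a list of checks run in order, where the first failure stops the call and is recorded. The sketch below shows that shape only; the `dispatch` signature, the `Blocked` exception, and the example layer are assumptions, not Heddle's internals.

```python
class Blocked(Exception):
    """Raised by a layer to stop the call; the message is the reason."""

def dispatch(call, layers, execute, audit):
    """Run each (name, check) layer in order, then execute the call.

    The first layer to raise Blocked stops the call and is logged.
    """
    for name, check in layers:
        try:
            check(call)
        except Blocked as blocked:
            audit.append({"tool": call["tool"], "blocked_by": name, "reason": str(blocked)})
            raise
    audit.append({"tool": call["tool"], "blocked_by": None})
    return execute(call)

def access_mode_check(call):
    # Example layer: a read-only (T1) agent may not call a write tool.
    if call["access"] == "write" and call["trust_tier"] == 1:
        raise Blocked("write tool called by a T1 agent")
```

Keeping every layer behind the same tiny interface is what lets each one block independently: adding rate limiting or input validation means appending another `(name, check)` pair.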

The Security Controls

The dispatch pipeline enforces these controls on every tool call:

Trust Tiers (T1–T4). Each config declares a trust level. T1 (observer) can only use GET — any POST/PUT/DELETE is blocked at runtime, not just warned. T2 (worker) allows scoped writes. T3 (operator) allows cross-agent invocation. T4 (privileged) requires human approval. I caught a real misconfiguration with this — a T1 agent tried to POST and the enforcer blocked it before the request ever left the process.

Access Mode Annotations. Every tool is declared as access: read or access: write. T1 configs with write tools are rejected at load time — before the server even starts. This is the schema-level version of least privilege.
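The two checks described so far, one at load time and one at request time, can be sketched as follows. The dict layout mirrors the YAML example earlier in the post; the function names and the `TrustViolation` exception are assumptions.

```python
READ_ONLY_METHODS = {"GET"}   # the post names GET as the only method T1 may use

class TrustViolation(Exception):
    pass

def validate_config(config):
    """Load-time check: a T1 config may not declare write tools."""
    agent = config["agent"]
    tier = agent["runtime"]["trust_tier"]
    for tool in agent["exposes"]:
        if tier == 1 and tool["access"] == "write":
            raise TrustViolation(f"T1 config declares write tool {tool['name']!r}")

def enforce_method(tier, method):
    """Runtime check: a T1 agent may only issue read-only HTTP methods."""
    if tier == 1 and method.upper() not in READ_ONLY_METHODS:
        raise TrustViolation(f"T1 agent attempted {method.upper()}")
```

The runtime check is not redundant with the load-time one: a tool can be annotated `access: read` and still be wired to a POST in the HTTP bridge, which is the misconfiguration described above.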

Credential Broker. API keys are stored in ~/.heddle/secrets.json with per-config access policies. Configs reference them as {{secret:prometheus-token}}, resolved at runtime and never written to the YAML file. A config can only access secrets it's been explicitly granted. Unauthorized access is denied and logged, and the config receives a placeholder instead of the real value.
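Resolving `{{secret:...}}` references against a grant table might be sketched like this. Only the reference syntax comes from the post; the `resolve_secrets` function, the grant-table shape, and the placeholder string are assumptions.

```python
import re

SECRET_REF = re.compile(r"\{\{secret:([A-Za-z0-9_-]+)\}\}")
PLACEHOLDER = "<denied>"   # hypothetical placeholder value

def resolve_secrets(text, config_name, secrets, grants, audit):
    """Replace {{secret:name}} references that this config is allowed to read.

    secrets: {name: value}; grants: {config_name: set of secret names}.
    Every lookup, granted or not, is appended to the audit list.
    """
    def substitute(match):
        name = match.group(1)
        allowed = name in grants.get(config_name, set()) and name in secrets
        audit.append({"config": config_name, "secret": name, "granted": allowed})
        return secrets[name] if allowed else PLACEHOLDER
    return SECRET_REF.sub(substitute, text)
```

Because a denied lookup yields a placeholder rather than an exception, the downstream request fails at the remote API instead of leaking whether the secret exists.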

Escalation Rules. Declarative conditions that hold a tool call for review instead of executing it. For example, my VRAM orchestrator has a rule that holds any smart_load call if the model name contains "27b", because loading a 27-billion-parameter model consumes most of my 24GB GPU memory. The rule triggers, the call is held, and the audit log records why.

escalation_rules:
  - name: large-model-load
    reason: "Loading a model that will consume most of the 24GB VRAM"
    tool: "smart_load"
    param_contains:
      model_name: "27b"

Input Validation. Type checking, length limits, and injection pattern detection on every parameter. The validator catches shell injection (; rm -rf /), SQL injection (' OR 1=1), path traversal (../../etc/passwd), and LLM prompt injection (ignore previous instructions). In strict mode, these are blocked. In permissive mode, they're logged and passed through.
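The strict and permissive modes differ only in what happens after a finding. A minimal sketch, assuming one hypothetical signature per injection class named above (real detection would need far more than four regexes):

```python
import re

# Hypothetical signatures, one per class of injection named in the post.
INJECTION_SIGNATURES = {
    "shell": re.compile(r"[;&|`]\s*\w|\$\("),
    "sql": re.compile(r"'\s*or\s+\d+\s*=\s*\d+", re.I),
    "path-traversal": re.compile(r"\.\./"),
    "prompt": re.compile(r"ignore (all )?previous instructions", re.I),
}

def validate_param(value, max_length=1024, strict=True, log=None):
    """Return True if the value may pass to the tool.

    Findings are appended to log when one is given. In strict mode any
    finding blocks the value; in permissive mode it is logged and passed.
    """
    if not isinstance(value, str):      # type check
        return False
    if len(value) > max_length:         # length limit
        return False
    findings = [name for name, sig in INJECTION_SIGNATURES.items() if sig.search(value)]
    if log is not None:
        log.extend(findings)
    return not (strict and findings)
```

Permissive mode is useful while tuning signatures: the log shows what strict mode would have blocked without breaking legitimate calls.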

Hash-Chained Audit Log. Every tool call, trust violation, credential access, and escalation hold is logged as a JSON Lines entry. Each entry includes a SHA-256 hash of the previous entry — if anyone modifies or deletes a log entry, the chain breaks and verification fails.
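The chaining itself takes only a few lines with the standard library. This is a sketch of the general technique, not Heddle's log format: the field names and the all-zero genesis hash are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64   # assumed value for the first entry's prev_hash

def _digest(entry):
    # Canonical JSON so the same entry always hashes to the same value.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()

def append_entry(log, event):
    """Append an entry that carries the SHA-256 hash of the previous entry."""
    prev = _digest(log[-1]) if log else GENESIS
    log.append({"event": event, "prev_hash": prev})

def verify_chain(log):
    """Return True if every entry's prev_hash matches the entry before it."""
    expected = GENESIS
    for entry in log:
        if entry["prev_hash"] != expected:
            return False
        expected = _digest(entry)
    return True
```

One limitation of this minimal form: tampering with or truncating the final entry goes unnoticed, because nothing yet hashes it. Anchoring the latest hash somewhere outside the log is one common way to close that gap.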

Config Signing. All YAML configs are signed with HMAC-SHA256. If a config is modified after signing, the runtime detects the tampering. AI-generated configs (from Heddle's natural language generator) are automatically quarantined in a staging directory until explicitly promoted.
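HMAC-SHA256 signing and verification are available directly in Python's standard library. The sketch below shows the primitive only; how Heddle stores signatures and manages the key is not described in the post, so `sign_config` and `verify_config` are illustrative names.

```python
import hashlib
import hmac

def sign_config(yaml_bytes, key):
    """Return the hex HMAC-SHA256 of the raw config bytes."""
    return hmac.new(key, yaml_bytes, hashlib.sha256).hexdigest()

def verify_config(yaml_bytes, key, signature):
    """Return True only if the config is byte-for-byte what was signed."""
    expected = sign_config(yaml_bytes, key)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, signature)
```

Signing the raw bytes rather than the parsed YAML means even a whitespace change invalidates the signature, which is the conservative choice for tamper detection.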

What It Looks Like Running

I'm currently running 46 tools from 9 configs through a single MCP connection to Claude Desktop. The configs cover Prometheus, Grafana, Ollama, Gitea, an RSS aggregator, a RAG search API, a GPU VRAM orchestrator, and a daily operations briefing agent.

Every one of those 46 tools goes through the same dispatch pipeline. The Prometheus tools are T1 (read-only, 5 tools). The Ollama bridge is T2 (can POST for text generation). The VRAM orchestrator is T3 (can invoke other agents, has escalation rules on destructive operations).

The trust tiers aren't just labels — they're enforced. A T1 config physically cannot make a POST request, even if the HTTP bridge URL is correct and the API would accept it. The enforcer blocks it before the request is constructed.

Framework Mapping

Every security control maps to at least one industry framework. This matters if you're in an organization that needs to demonstrate compliance, or if you're building a portfolio that shows applied security architecture (which is why I built this):

Control           | OWASP Agentic Top 10       | NIST AI RMF
------------------|----------------------------|------------
Trust tiers       | #3 Excessive Agency        | GV-1.3
Credential broker | #7 Unsafe Credential Mgmt  | MAP-3.4
Audit logging     | #9 Insufficient Logging    | MS-2.6
Input validation  | #1 Prompt Injection        | MS-2.5
Config signing    | #8 Supply Chain            | GV-6.1
Escalation rules  | #3 Excessive Agency        | GV-1.3

The full threat model with 8 threat categories is in the repo at docs/threat-model.md.

Getting Started

git clone https://github.com/goweft/heddle.git
cd heddle
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"

# Try a starter pack
cp packs/prometheus.yaml agents/
heddle validate agents/prometheus.yaml
heddle run agents/prometheus.yaml --port 8200

Heddle ships with 6 starter packs — Prometheus, Grafana, Gitea/GitHub, Ollama, Sonarr, and Radarr — that you can drop into agents/ and run immediately. All read-only (T1) except Ollama (T2 for text generation).

Or generate a config from natural language:

heddle generate "agent that wraps the Home Assistant API" --model qwen3:14b

Works with Claude Desktop, Cursor, and any MCP client that supports stdio transport.

Heddle is open source (MIT) at github.com/goweft/heddle. 126 tests, 15 security controls, and a threat model mapped to OWASP Agentic Top 10 and NIST AI RMF. If you're exposing APIs to AI agents, I'd like to know what security controls you wish existed.