Leo

Posted on May 12

LocalFirst – I built a harness for my AI tool proxy, found 2 bypasses

#ai #mcp #agents

Hi, I built LocalFirst, a local boundary layer for AI coding agents such as Claude Code and for MCP clients.

It sits between the agent and the cloud model and decides, per tool result, what is allowed to re-enter the next request:

LOCAL – run in-process; the tool result's bytes are not forwarded
PASS – forward unchanged
BLOCK – synthetic refusal goes back to the model; the action never runs
TRANSFORM – redact secrets, distill an 800-line grep result to the relevant 50 lines, etc.

One human-readable policy.yml drives all of it. The same engine governs both Claude Code traffic (via an HTTP proxy on port 8081) and MCP traffic, so a single deny_paths rule produces byte-identical BLOCK audit rows under both protocols.
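
To make the four outcomes concrete, here is a minimal sketch of a per-tool-call decision function. It is not LocalFirst's actual engine: the decide function, the policy fields other than deny_paths (local_tools, max_output_lines), and the naive substring path matching are all assumptions made for illustration.

```javascript
// Hypothetical in-memory policy; the real policy.yml schema may differ.
const examplePolicy = {
  deny_paths: [".env", "secrets/"], // deny_paths is named in the post
  local_tools: ["ls"],              // assumed field name
  max_output_lines: 50,             // assumed field name
};

// Decide what happens to one tool call and its result text.
// Returns one of the four actions: BLOCK, LOCAL, TRANSFORM, PASS.
function decide(policy, call, resultText) {
  const path = call.input && call.input.file_path;
  // Naive substring match, purely illustrative.
  if (path && policy.deny_paths.some((p) => path.includes(p))) {
    return { action: "BLOCK", rule: "deny_paths" };
  }
  if (policy.local_tools.includes(call.name)) {
    return { action: "LOCAL" };
  }
  if (resultText.split("\n").length > policy.max_output_lines) {
    return { action: "TRANSFORM", rule: "distill" };
  }
  return { action: "PASS" };
}

// Both protocol adapters would call the same function with the same
// inputs, which is why their serialized decisions come out identical.
const call = { name: "Read", input: { file_path: "/repo/.env" } };
console.log(JSON.stringify(decide(examplePolicy, call, "X=1")));
```

The point of the sketch is the single entry point: if the HTTP proxy adapter and the MCP adapter both funnel into one function, there is no second code path that could drift.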

Why I'm posting v0.8.0

I built a real test harness for it. The harness spawns an actual Claude Code subordinate session in a scratch tempdir with a scratch policy and audit chain, points it at the proxy, and runs adversarial scenarios end-to-end against a real model.

Two real bypasses fell out on the first useful run.

Bypass 1: deny_paths via Claude Code auto-context

The host runtime injects synthetic tool_use Read + tool_result <file content> pairs into the OUTBOUND request body as agentic context, before any model-emitted tool call.

My original gate sat at the tool-call boundary on the response side and never saw them, so a denied file's bytes were reaching the model anyway.

Bypass 2: Secret redaction via the same path

An AKIA-shaped fixture (a fake AWS access key ID) inside an auto-context tool_result was forwarded unredacted.
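
As an illustration of what redacting that traffic could look like, the sketch below walks an outbound request body and scrubs AKIA-shaped strings from every tool_result block. It is an assumption-laden example, not the LocalFirst implementation: redactToolResults and the placeholder text are hypothetical, the body layout (messages containing content blocks with type "tool_result") follows the Anthropic Messages API shape as I understand it, and the regex is the commonly used pattern for AWS access key IDs (AKIA followed by 16 uppercase alphanumerics).

```javascript
// Commonly used pattern for AWS access key IDs.
const AKIA_PATTERN = /\bAKIA[0-9A-Z]{16}\b/g;

function redactText(text) {
  return text.replace(AKIA_PATTERN, "[REDACTED_AWS_KEY_ID]");
}

// Returns a redacted copy of an outbound request body.
// The input body is left untouched.
function redactToolResults(body) {
  const copy = structuredClone(body); // available in Node >= 17
  for (const msg of copy.messages ?? []) {
    if (!Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (block.type !== "tool_result") continue;
      if (typeof block.content === "string") {
        // tool_result content given as a plain string
        block.content = redactText(block.content);
      } else if (Array.isArray(block.content)) {
        // tool_result content given as a list of typed parts
        for (const part of block.content) {
          if (part.type === "text") part.text = redactText(part.text);
        }
      }
    }
  }
  return copy;
}
```

Note that this only works if it runs on the request body itself. A redactor that only inspects model-emitted tool calls never sees a tool_result the client injected on its own.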

Unit tests could not have caught either of these. They were not bugs in my policy logic. They were a category of traffic the policy logic never saw, because that body shape only appears when a real host runtime is exercising the proxy.

The fix

A second enforcement gate on the outbound request body: src/outbound-policy.js.

The same policy now applies at both boundaries:

Path 1 – tool-call boundary, model-emitted tool_use
Path 2 – outbound auto-context boundary, client-injected tool_result in the next request body

That symmetry now covers deny_paths, secret redaction, distillation, and max_output_tokens. Each path-2 enforcement writes its own audit row with direction: "outbound", so the report command and the independent verifier see both gates uniformly.
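
A rough sketch of what a path-2 gate for deny_paths could look like follows. It is not the contents of src/outbound-policy.js: enforceOutbound, the refusal text, and every audit row field except direction are hypothetical, and the assumption that a Read call carries its path in input.file_path is mine. The idea is to pair each tool_result with the tool_use that shares its id, and replace the result when the path is denied.

```javascript
// Scan an outbound request body, block denied tool_result content,
// and collect one audit row per enforcement.
function enforceOutbound(body, denyPaths) {
  const out = structuredClone(body); // do not mutate the caller's body
  const auditRows = [];
  const pathById = new Map(); // tool_use id -> file path it read

  for (const msg of out.messages ?? []) {
    if (!Array.isArray(msg.content)) continue;
    for (const block of msg.content) {
      if (
        block.type === "tool_use" &&
        block.input &&
        typeof block.input.file_path === "string"
      ) {
        pathById.set(block.id, block.input.file_path);
      }
      if (block.type === "tool_result") {
        const path = pathById.get(block.tool_use_id);
        // Naive substring match, purely illustrative.
        if (path && denyPaths.some((p) => path.includes(p))) {
          block.content = "[blocked by policy: deny_paths]";
          auditRows.push({
            direction: "outbound", // field named in the post
            action: "BLOCK",       // remaining fields are assumed
            rule: "deny_paths",
            path,
          });
        }
      }
    }
  }
  return { body: out, auditRows };
}
```

Because the gate returns its audit rows alongside the rewritten body, the caller can append them to the same log the tool-call gate writes to, which is what lets a report or verifier treat both gates uniformly.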

Audit story

The log is a SHA-256 hash chain from a fixed genesis sentinel.

docs/AUDIT.md contains a ~30-line standalone Python walker, so a buyer or auditor can confirm chain integrity without trusting LocalFirst's own code.

The harness re-walks the chain on every scenario, so a run where the defense held but the chain is broken fails instead of silently passing.
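
For readers unfamiliar with hash-chained logs, here is a minimal sketch of the general technique. It is not the Python walker from docs/AUDIT.md and does not reproduce LocalFirst's row format: the all-zero genesis value, the row fields (prev, payload, hash), and hashing the JSON serialization of the payload are assumptions chosen to keep the example short.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical genesis sentinel; the real one may differ.
const GENESIS = "0".repeat(64);

// Hash of a row = SHA-256(previous hash + serialized payload).
function rowHash(prevHash, payload) {
  return createHash("sha256")
    .update(prevHash)
    .update(JSON.stringify(payload))
    .digest("hex");
}

// Append a row that commits to the hash of the row before it.
function appendRow(chain, payload) {
  const prev = chain.length ? chain[chain.length - 1].hash : GENESIS;
  chain.push({ prev, payload, hash: rowHash(prev, payload) });
  return chain;
}

// Re-walk the chain from genesis and report the first broken row.
function verifyChain(chain) {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (row.prev !== prev || row.hash !== rowHash(prev, row.payload)) {
      return { ok: false, brokenAt: i };
    }
    prev = row.hash;
  }
  return { ok: true, brokenAt: -1 };
}
```

A harness in this style would combine two checks per scenario: the defense held, and verifyChain reports ok. Editing any earlier row changes its hash, which then no longer matches what the following rows committed to, so tampering surfaces at the first altered row.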

What LocalFirst is not

Not a sandbox against the developer on a single-user box. The shell alias is trivially bypassable. The point is enforceable org policy plus tamper-evident evidence, not adversarial isolation from the user. The real-world fit is the same as a corporate HTTP proxy: meaningful when the base URL is set by something the user cannot edit on their own.
Not provider-agnostic yet. Adapters currently cover Claude Code, Cline, and MCP. OpenAI and Gemini agents are not in scope today.
Parts of the adapter layer carry Claude Code-specific shape knowledge. The auto-context body format that produced Bypass 1 is a Claude Code-specific pattern, not a universal one.
Platform support: I develop on Windows and the harness is validated there. macOS and Linux should work, but I have not run the harness on them yet.

Stack

Node ≥18
2,372 unit tests, 113 smoke tests
Apache-2.0 with explicit patent grant

Install

npm install -g @localfirst-ai/localfirst
localfirst policy init
localfirst register
claude
