I had two workspaces. One was a sandbox — messy, exploratory, version-controlled on GitHub. The other was a curated portfolio — polished, employer-facing, local-only. The boundary between them was architecturally clear: research stays in the sandbox, finished artifacts get promoted one-way to the portfolio. Simple.
Except the boundary kept failing.
I found three copies of my Obsidian vault in different locations on my machine — Systemic_Intelligence_Vault, Systemic_Intelligence_Vault_Antigravity, and Systemic_Intelligence_Vault_Claude. Each was a variant with slightly different content. Scripts would target the wrong root directory. Absolute paths from the portfolio would show up hardcoded inside sandbox files, coupling two systems that were supposed to be independent. And the part that should have bothered me most — I'd already designed solutions for all of this months earlier. Beacon files, verification scripts, promotion gates. They were documented in my Program Architecture. They just weren't enforced.
That's when I realized: documentation isn't a boundary. Enforcement is.
This is Part 2 of Building at the Edges of LLM Tooling. If you're running models across separate workspaces — a research environment and a curated one — contamination is the default without enforcement infrastructure. Documentation isn't a boundary; enforcement is. Start here.
Why It Breaks
Contamination between workspaces isn't dramatic. It's entropic. Copy a folder for backup. Reference a path in a script. Create a variant for testing. Each action small and reasonable. But without enforcement, they accumulate into what I started calling spaghetti — tangled, unclear boundaries where messy research bleeds into curated space and curated assumptions leak back into exploratory work.
LLM-assisted workflows multiply this entropy. Every IDE agent, every model session, every tool creates its own assumptions about where things live and what the rules are. A model operating in one copy of your repository doesn't know another copy exists. It doesn't know its configuration file differs from the one in the canonical version. It just follows whatever instructions it finds.
This creates three contamination vectors I kept hitting.
Path contamination. Multiple copies of a project in different locations, no single canonical root. Scripts break. Models reference wrong directories. When my MP assistant surfaced that I'd previously hit "wrong root, wrong name, quoted ~" failures — and that I'd already designed beacon files to prevent them — that was the signal. I was solving the same problem for the second time because the first solution was a document, not a gate.
Behavioral contamination. This one is invisible, which makes it worse. When I ran a diff between the main vault and the Antigravity variant, the content was nearly identical. But the .cursorrules in one said "You are working inside the Systemic Intelligence Vault — a living knowledge system using Expansive Closure Protocol methodology." The other said "You are working inside a local Obsidian vault." Same content. Same prompts. Completely different AI behavior — different assumptions about what the project was, what methodology to apply, how to treat the files. No error message. Just quietly wrong outputs that I couldn't explain until I thought to diff the configuration files.
Promotion contamination. The one-way flow from sandbox to portfolio is supposed to be a clean gate. My architecture document was explicit: "No cross-contamination of messy research into MP; promotion remains one-way." But without preflight checks, drafts leak into the curated space. Without provenance tracking, you lose the lineage between source and promoted artifact. Without lane enforcement, raw chat transcripts and agent logs can accidentally get promoted alongside finished work.
What I Tried
The first instinct was procedural — checklists, handoff protocols, naming conventions. I documented which folders were forbidden from promotion. I wrote architecture docs explaining the one-way flow. I created naming standards: YYYYMMDD_Title_v#.#.md in the sandbox, similar conventions in the portfolio. I detailed guardrails across five categories: conceptual, procedural, semantic, structural, behavioral.
It didn't hold. Manual discipline degrades under load. Checklists get skipped at speed. The memory of which copy is canonical blurs after a week away from the project.
So I moved to technical enforcement. The pattern that emerged had three layers.
Beacon files for canonical roots. A zero-byte .MASTER_PORTFOLIO_ROOT file dropped at the true root of the portfolio. Every script checks for it before operating. If the beacon is missing, the script fails immediately rather than silently working on the wrong directory. My MP assistant was blunt about this: "Pick one canonical root and treat everything else as a copy." The implementation was a trivial bash script — check for the beacon, exit with an error if it's absent. It's almost embarrassingly simple, and that's exactly why it works. You can't forget it, because it's a hard gate, not a soft convention.
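In shell, the whole gate fits in a few lines. This is a minimal sketch: the beacon filename comes from above, but the function name and error wording are my own.

```shell
#!/usr/bin/env bash
# Hard gate: refuse to operate unless the beacon marks the canonical root.
BEACON=".MASTER_PORTFOLIO_ROOT"   # zero-byte marker file at the true root

require_beacon() {
  local root="$1"
  if [[ ! -f "$root/$BEACON" ]]; then
    echo "FATAL: $BEACON not found in $root -- refusing to touch a non-canonical copy" >&2
    return 1
  fi
}

# Every portfolio script opens with:
#   require_beacon "/path/to/portfolio" || exit 1
```

The point is not the code; it's that the check runs before every operation, so a wrong root fails loudly instead of working quietly.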
Preflight checks before cross-boundary operations. Before anything moves from sandbox to portfolio, a script verifies: Is the source file actually in the sandbox repo? Is it from a permitted lane? Does it contain hardcoded portfolio paths that would create coupling? Is the metadata marked as ready for promotion? Any failure blocks the operation.
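Those four checks can be sketched as a single shell function. The lane names, the `promotion: ready` marker, and the portfolio path pattern are illustrative assumptions; only the checks themselves come from my setup.

```shell
#!/usr/bin/env bash
# Preflight gate for sandbox -> portfolio promotion. Any failed check
# blocks the operation; nothing moves on a partial pass.
SANDBOX_ROOT="${SANDBOX_ROOT:-$HOME/sandbox}"
ALLOWED_LANES="artifacts essays"            # hypothetical permitted lanes
PORTFOLIO_PATTERN="/Master_Portfolio/"      # hypothetical coupling smell

preflight() {
  local src="$1" lane
  # 1. Source must actually live inside the sandbox repo.
  if [[ "$src" != "$SANDBOX_ROOT"/* ]]; then
    echo "BLOCK: $src is outside the sandbox" >&2; return 1
  fi
  # 2. Source must come from a permitted lane (top-level folder).
  lane="${src#"$SANDBOX_ROOT"/}"; lane="${lane%%/*}"
  if [[ " $ALLOWED_LANES " != *" $lane "* ]]; then
    echo "BLOCK: lane '$lane' is not promotable" >&2; return 1
  fi
  # 3. No hardcoded portfolio paths that would couple the two systems.
  if grep -q "$PORTFOLIO_PATTERN" "$src"; then
    echo "BLOCK: hardcoded portfolio path in $src" >&2; return 1
  fi
  # 4. Metadata must explicitly mark the file ready for promotion.
  if ! grep -q "^promotion: ready" "$src"; then
    echo "BLOCK: $src is not marked 'promotion: ready'" >&2; return 1
  fi
}
```

Because the lane check reads the top-level folder from the path itself, raw chat transcripts and agent logs stay unpromotable no matter how their filenames look.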
Pointer-only provenance. Instead of copying source context into the portfolio, promoted artifacts carry a provenance record with only metadata: source system, source reference, date, what was promoted, what was excluded, promotion rationale, and optionally a hash for integrity verification. No content bleed. The portfolio stays clean while the traceability chain remains intact.
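A record like that can be generated at promotion time from metadata alone. The YAML field names below are my own rendering of the list above, not a fixed schema:

```shell
#!/usr/bin/env bash
# Emit a pointer-only provenance record: references and a hash, no content.
write_provenance() {
  local src="$1" artifact="$2"
  cat <<EOF
provenance:
  source_system: sandbox
  source_ref: $src
  promoted_on: $(date +%Y-%m-%d)
  promoted: $artifact
  excluded: drafts, chat transcripts, agent logs
  rationale: met promotion gate
  sha256: $(sha256sum "$src" | cut -d' ' -f1)
EOF
}
```

The hash lets you verify later that the promoted artifact still matches its source without ever copying source content across the boundary.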
For the behavioral contamination problem, the fix required a mindset shift: treat configuration files as part of the system, not as local preference. Version-control .cursorrules and .claude/settings.json. Mark one copy as canonical and every variant as explicitly labeled. When the AI behaves differently than expected, the first diagnostic is "which copy am I in?" — and the beacon system makes that immediately answerable.
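Both halves of that shift are small in practice. A minimal sketch, assuming a git repo per workspace copy (function names and paths are illustrative):

```shell
#!/usr/bin/env bash
# Treat agent configuration as part of the system, not local preference.

pin_config() {   # commit the agent config files in the repo at $1
  git -C "$1" add .cursorrules .claude/settings.json
  git -C "$1" commit -m "Pin canonical agent configuration"
}

config_drift() { # nonzero exit means the two copies would behave differently
  diff "$1/.cursorrules" "$2/.cursorrules"
}
```

With the config pinned, "which copy am I in?" stops being a forensic exercise: one diff against the canonical root answers it.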
What It Revealed
Boundaries in LLM workflows exist at two levels. The first comes almost for free; the second takes deliberate work.
The first level is conceptual: this space is for exploration, that space is for finished work, and promotion flows one direction. The pattern: document the architecture, trust discipline to hold it.
The second level is enforcement: the beacon that makes scripts fail if they're in the wrong root, the preflight check that blocks forbidden lanes, the version-controlled configuration files that prevent invisible behavioral drift. This is where the boundary actually holds.
The gap between levels one and two is where spaghetti grows. And the tell is recurring failures. When my own assistant surfaced that I'd already designed beacons and verification scripts months earlier — to solve the exact same failures I was hitting again — that was the signal. If you're hitting the same contamination pattern twice, you didn't fail at design. You failed at enforcement.
The other insight was about invisible contamination. Content drift is noisy — files can be diffed, differences merged. Configuration drift is silent. Different .cursorrules across vault copies means different AI behavior with no error message, no visible discrepancy in the content, just outputs that feel subtly wrong. This is the hardest contamination to catch because there's no signal until you look for it.
The Reusable Rule
If you maintain separate workspaces for exploratory and curated work — and if LLM agents operate in those spaces — contamination is the default without enforcement infrastructure.
Start with the diagnostic. When you catch yourself re-discovering where files live, that's path contamination — drop a beacon file and make your scripts check for it. When the AI produces unexpectedly different output for the same prompt, check which workspace copy you're in — configuration drift is likely. When you're promoting work from sandbox to portfolio, ask whether the promotion path has a preflight check or whether you're relying on your own attention. Your attention will fail.
The anti-spaghetti principle is this: every boundary between workspaces needs a corresponding enforcement mechanism. Conceptual boundaries document intent. Enforcement mechanisms preserve it. And when the same failure recurs, the missing piece is almost never the design — it's the gate that makes the design non-optional.