Jailbreaking Claude Cowork: Escaping the “Sandbox”

Check out the repo: Cowork-Bridge


TL;DR: The "Bridge" Protocol

  • Claude’s Cowork mode is a powerful orchestrator, but its security sandbox (running under bwrap) blocks real-world developer tasks like hitting private APIs, running Docker, or using local git credentials.
  • I built a bidirectional filesystem bridge that keeps the Cowork VM as the frontend UX while delegating restricted tasks to a host-side watcher.
  • The Hack: an "RPC over a mounted folder" protocol using JSON requests and responses.
  • The Capability: host-side execution of curl, git, docker, and even "Claude-to-Claude" delegation using the unrestricted host CLI.
  • The Secret Sauce: a script that rewrites Cowork's internal systemPrompt to remove "guardrail rituals" and treat the agent like a high-speed power tool.
  • The Setup: fully automated via a background daemon that detects and "bridges" new Cowork sessions as soon as they are created.


A Love/Hate Relationship with Claude Cowork

Cowork mode is awesome… until you try to do real power-user things.
I wanted Claude in Cowork to:

  • hit arbitrary APIs (not just allowlisted domains)
  • push to git remotes with real credentials
  • run Docker on my machine
  • use my custom agents + MCP servers
  • behave like a dev tool, not a guarded assistant

Instead, I kept slamming into the reality of Cowork’s sandbox: a VM launched with isolation tooling (in my case I confirmed it’s running under bwrap / bubblewrap), with restricted host access and outbound network limitations that show up the moment your workflow touches something outside the box.

So I built a bridge: a bidirectional filesystem protocol between the sandboxed Cowork VM and an unrestricted Claude CLI running on my Mac (optionally in Docker). Cowork writes requests into a shared folder; a host-side watcher executes them with full capabilities and writes responses back.

It’s basically “RPC over a mounted folder.” Here is the idea in a diagram:

┌─────────────────────────────────────────────────────────────────┐
│ COWORK VM (sandboxed)                                           │
│                                                                 │
│ "Fetch from api.example.com, then summarize"                    │
│                            │                                    │
│                            ▼                                    │
│   /mnt/outputs/.bridge/requests/job-2026...-001.json            │
└───────────────────────────┬─────────────────────────────────────┘
                            │  (mounted output folder)
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│ HOST MAC (unrestricted)                                         │
│                                                                 │
│ watcher.sh sees new request → executes it (curl/git/docker/CLI) │
│ → writes response JSON (or stream log)                          │
│                            │                                    │
│                            ▼                                    │
│   .../outputs/.bridge/responses/job-2026...-001.json            │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│ COWORK VM                                                       │
│                                                                 │
│ Reads response JSON (or tails stream file) and continues        │
└─────────────────────────────────────────────────────────────────┘


Cowork stays your chat UX + orchestration brain.
Your Mac (and optionally host Claude CLI) becomes the executor with real access.


Why Cowork feels powerful… and then suddenly doesn’t

Cowork is designed to be safe and broadly usable. That’s great. But if you’re doing developer workflows, the safety posture often translates to friction:

  • outbound HTTP is restricted (in my environment it was allowlisted)
  • anything touching your real machine—Docker, credentialed git, custom toolchains—becomes awkward or blocked
  • the agent behaves like it must follow guardrail rituals (more on that later)

My goal became simple:

Keep Cowork for the UX, but delegate anything restricted to a host-side executor I control.

That’s the bridge.


Architecture: two “sides” + one shared directory

Conceptually, the repo is two components:

  • VM-side (cowork-bridge): a small “skill” that writes requests into the shared folder and polls for responses.

  • Host-side (cli-bridge watcher): a script (watcher.sh) that watches for new requests and executes them on the host with full permissions (optionally inside Docker).

The transport layer is just files.


Part 1 — The protocol (deep dive)

The protocol is intentionally boring. That’s a feature:

  • you can inspect it with ls and cat
  • you can debug it by reading the request/response JSON
  • it doesn’t depend on network access
  • it works even when the sandbox is locked down

The bridge lives under a .bridge/ directory inside the session’s shared outputs folder:

.bridge/
├── requests/ # Cowork writes request JSON files here
├── responses/ # Host watcher writes response JSON files here
├── streams/ # Optional: streaming output log files
└── logs/ # Audit/debug logs

There’s also a small status.json file written during initialization so the VM side can confirm it’s wired up.
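
From inside the VM, confirming that wiring is a one-liner (sketch; the exact status.json schema is whatever the init script writes):

# Confirm the bridge is wired up before writing any requests.
test -f /mnt/outputs/.bridge/status.json && cat /mnt/outputs/.bridge/status.json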

File locations (VM vs host)

This is the key trick: the same files appear in two places.

Inside the Cowork VM, you’ll typically see it under something like:

/sessions/<session-name>/mnt/outputs/.bridge/

On the host Mac, the same folder is inside Claude’s session directory:

~/Library/Application Support/Claude/local-agent-mode-sessions/
<account-id>/<workspace-id>/local_<session-id>/outputs/.bridge/

Once you accept that “Cowork sessions are just directories,” this stops feeling like a hack and starts feeling like infrastructure.

Request format

Each request is a single JSON file at:

requests/<job-id>.json

A request has:

  • id — unique job ID (also the filename)
  • timestamp
  • type — what kind of operation to run
  • type-specific fields (command, args, url, prompt, etc.)
  • timeout — seconds
  • optional env — values or references (e.g., $HOST_API_KEY)
  • optional cwd — where to run on host
  • optional stream: true — for streaming output

Here’s a representative exec request:

{
  "id": "job-20260201-001",
  "timestamp": "2026-02-01T03:14:15Z",
  "type": "exec",
  "command": "bash",
  "args": ["-lc", "curl -s https://api.example.com/data | jq ."],
  "timeout": 30,
  "cwd": "~/projects/my-app"
}

The watcher script is careful to avoid the classic “shell injection by string concatenation” problem: for sensitive handlers it builds argument arrays and sends prompts via stdin rather than interpolating them into shell strings.
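
Here’s a minimal sketch of that pattern (my illustration, not the repo’s exact code; $req, $out, $err are placeholders, and it assumes jq plus a timeout binary on the host):

# Extract the command and args from the request JSON into a real argv array,
# so nothing from the request is ever interpolated into a shell string.
cmd=$(jq -r '.command' "$req")
args=()
while IFS= read -r a; do args+=("$a"); done < <(jq -r '.args[]' "$req")

# Run with a timeout; stdout/stderr are captured for the response JSON.
timeout "$(jq -r '.timeout // 30' "$req")" "$cmd" "${args[@]}" >"$out" 2>"$err"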

Request types (and why they exist)

In principle, you could make everything exec. But first-class request types make the system safer and easier to audit.

Supported types:

  • exec — run a shell command
  • http — make an HTTP request directly (safer than crafting curl strings; example after this list)
  • git — git operations with credentials
  • node — node/npm/npx/yarn/pnpm style commands
  • docker — run docker commands on host
  • prompt — delegate to host Claude CLI (Claude-to-Claude)
  • env — inject env vars into the VM session settings
  • file — read/write/list/check existence on the host filesystem
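
For example, an http request written from the VM side might look like this (only the fields from the request-format section are grounded; the file name is made up):

# Write an "http" request; the host watcher performs the fetch with full network access.
cat > /mnt/outputs/.bridge/requests/job-20260201-002.json <<'EOF'
{
  "id": "job-20260201-002",
  "timestamp": "2026-02-01T03:20:00Z",
  "type": "http",
  "url": "https://api.example.com/data",
  "timeout": 30
}
EOF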

Response format

Each response is a JSON file at:

responses/<job-id>.json

A response includes:

  • id
  • timestamp
  • status
  • exit_code (where relevant)
  • stdout / stderr (unless streaming)
  • duration_ms
  • error (structured when possible)

Example:

{
  "id": "job-20260201-001",
  "timestamp": "2026-02-01T03:14:16Z",
  "status": "completed",
  "exit_code": 0,
  "stdout": "{ \"ok\": true }",
  "stderr": "",
  "duration_ms": 842,
  "error": null
}

Status values

  • pending — request received, not started
  • running — executing
  • completed — success
  • failed — finished with error
  • timeout — exceeded timeout
  • streaming — output is in a stream file (see below)

Streaming protocol (for logs + huge outputs)

Some outputs don’t belong inside JSON:

  • long-running commands (docker logs -f, watchers)
  • huge outputs (large diffs, large JSON responses)
  • long Claude CLI responses
  • anything you’d naturally “tail”

Streaming can happen in three ways:

  • Explicit: request includes "stream": true
  • Auto-stream: output exceeds a threshold (default is ~50KB in the watcher)
  • Naturally continuous: logs/watch commands

What happens during streaming

  1. Host creates streams/<job-id>.log
  2. Host writes output incrementally to the stream file
  3. Host writes a sentinel line **STREAM_END** when complete
  4. Host writes the final responses/<job-id>.json with bytes_written, exit_code, etc.

On the Cowork side, it just does:

tail -f /mnt/outputs/.bridge/streams/job-20260201-logs.log

Or you can read incrementally with offset tracking if you want to be fancy.
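
A minimal sketch of that incremental read, reusing the stream file from above and the **STREAM_END** sentinel:

# Poll the stream file and print only bytes we haven't seen yet.
log=/mnt/outputs/.bridge/streams/job-20260201-logs.log
offset=0
until [ -f "$log" ]; do sleep 1; done   # wait for the host to create the stream
while :; do
  size=$(wc -c < "$log")
  if [ "$size" -gt "$offset" ]; then
    tail -c +"$((offset + 1))" "$log"   # emit only the new bytes
    offset=$size
  fi
  grep -q 'STREAM_END' "$log" && break  # sentinel written by the host when done
  sleep 1
done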


Part 2 — The skills: VM-side ergonomics + host-side execution

The bridge feels “native” only if Cowork can use it naturally. That’s why there are two skills:

1. cowork-bridge (inside the VM)

This is the “client” side. It gives Cowork a repeatable workflow:

  • confirm bridge is initialized (status.json)
  • write a request JSON file into requests/
  • poll responses/ until the matching response exists
  • return the result (or stream pointer)

The skill doc includes a simple step-by-step:

“Write request” -> “Poll for response” -> “Tail stream”

It also includes helper shell functions that make this nice from inside the VM:

  • job ID generator
  • “write request + wait”
  • “stream request + tail until sentinel”

This matters more than it sounds: once you have one-liners, Cowork can be guided to reliably use them and you stop re-inventing glue code.
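
To make it concrete, here’s the shape of a “write request + wait” helper (function names are mine; the repo’s helpers differ in detail):

BRIDGE=/mnt/outputs/.bridge

# Generate a unique job ID (also used as the request/response filename).
bridge_job_id() { printf 'job-%s-%04d' "$(date +%Y%m%d%H%M%S)" "$RANDOM"; }

# Write an exec request, then poll responses/ until the host answers.
bridge_exec() {
  local id; id=$(bridge_job_id)
  jq -n --arg id "$id" --arg cmd "$1" \
    '{id: $id, timestamp: (now | todate), type: "exec",
      command: "bash", args: ["-lc", $cmd], timeout: 30}' \
    > "$BRIDGE/requests/$id.json"
  while [ ! -f "$BRIDGE/responses/$id.json" ]; do sleep 1; done
  jq -r '.stdout' "$BRIDGE/responses/$id.json"
}

# Usage: bridge_exec 'curl -s https://api.example.com/data | jq .'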

2. cli-bridge (host watcher)

The host side is a loop that:

  • finds the latest active session (or uses an explicit --session)
  • finds that session’s .bridge/requests/
  • processes each JSON request file
  • routes by type
  • writes response JSON
  • logs everything to .bridge/logs/bridge.log
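
Boiled down, that loop looks roughly like this (exec-only and heavily simplified; the real watcher.sh routes every type, streams large output, and enforces the blocked-command list):

bridge="$SESSION_DIR/outputs/.bridge"   # $SESSION_DIR as found by session-finder.sh
while :; do
  for req in "$bridge"/requests/*.json; do
    [ -e "$req" ] || continue
    id=$(jq -r '.id' "$req")
    [ -f "$bridge/responses/$id.json" ] && continue   # already answered
    # exec requests carry {command: "bash", args: ["-lc", <cmd>]}, per the format above.
    out=$(bash -lc "$(jq -r '.args[1]' "$req")" 2>&1); code=$?
    jq -n --arg id "$id" --arg out "$out" --argjson code "$code" \
      '{id: $id, timestamp: (now | todate),
        status: (if $code == 0 then "completed" else "failed" end),
        exit_code: $code, stdout: $out, stderr: "", error: null}' \
      > "$bridge/responses/$id.json"
  done
  sleep 1   # default poll interval
done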

Watcher defaults (useful details)

The watcher is configurable via env vars, but defaults include:

  • poll interval: 1 second
  • stream threshold: ~50KB (51200 bytes)
  • allowed types: exec/http/git/node/docker/prompt/env/file
  • a small blocked-command list for obviously dangerous footguns (rm -rf /, etc.)

Streaming support on host

Streaming is currently supported for:

  • exec
  • prompt (Claude-to-Claude), including streamed Claude output

Other types typically return normal JSON responses.

Part 3 — Claude-to-Claude delegation (prompt)

This is the feature that turns the bridge from “job runner” into “distributed agent.”

A prompt request looks like:

{
  "id": "job-20260201-claude-001",
  "type": "prompt",
  "prompt": "Fetch latest issues in my repo and summarize",
  "options": {
    "agent": "my-github-agent",
    "model": "sonnet",
    "tools": ["Bash", "Read", "Write"],
    "system_prompt": "You are helping a sandboxed Cowork session. Be concise."
  },
  "timeout": 120
}

On the host, the watcher calls Claude CLI roughly like:

  • sends prompt via stdin (safer)
  • passes through --agent, --model, --tools, --system-prompt
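
In shell terms, something like this (hedged sketch; flag spellings are the ones listed above as pass-throughs, not verified against any particular CLI version):

# Prompt goes in on stdin rather than the command line; options become flags.
jq -r '.prompt' "$req" | claude \
  --agent "$(jq -r '.options.agent' "$req")" \
  --model "$(jq -r '.options.model' "$req")" \
  --system-prompt "$(jq -r '.options.system_prompt' "$req")" \
  > "$bridge/streams/$id.log" 2>&1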

So Cowork becomes:

  • frontend (chat + context + orchestration)
  • router (decide what to delegate)
  • consumer (read the response and continue)

And host Claude becomes:

  • full network
  • real filesystem
  • your real tools + credentials
  • your agent library

Once you have this, “sandbox limitations” stop being blockers—they become a routing decision.

Part 4 — System prompt rewrites: making Cowork behave like a power tool

This is the part that feels slightly illegal the first time you do it. It turns out that the Cowork session configuration lives in a plain directory on the host, and the system prompt is stored right there in the JSON.

Why prompt injection matters

The default Cowork prompt is optimized for broad safety and “non-developer” workflows, which can translate to:

  • mandatory TodoWrite
  • mandatory ask-a-question-before-work patterns
  • refusal patterns for curl/wget/requests even via bash
  • verbose confirmations
  • “read skill docs even for trivial actions”

That was simply too much overhead. So I created prompt presets that:

  • assume developer competence
  • allow terse, CLI-like responses
  • make TodoWrite/AskUserQuestion optional (not ritual)
  • build in “use the bridge automatically when sandbox limits appear”
  • make Claude-to-Claude delegation first-class

The presets

  • power-user — dev mode + bridge awareness
  • cli-mode — terse, Claude Code-esque behavior
  • minimal — barebones prompt, maximum freedom
  • unrestricted — “no limitations” posture (use carefully)

Injecting prompts

In the repo, there’s a script that can:

  • list presets
  • backup the original config (cowork_settings.json.original)
  • inject a preset
  • show the current prompt (truncated)
  • restore the original

Example:

# list presets
./scripts/inject-prompt.sh --list

# backup original
./scripts/inject-prompt.sh --backup power-user

# inject power-user
./scripts/inject-prompt.sh power-user

# restore original
./scripts/inject-prompt.sh --restore

The best part: changes apply immediately (next message), no restart needed.

A note on template variables

The prompt files are JSON and support template vars (things like current working directory, selected folders, etc.), which means you can write prompts that adapt to the session context.

Part 5 — The scripts: making it reliable across sessions

If this stayed a manual hack, it would die in a week. Sessions come and go. Paths change. New sessions appear. So I built scripts that turn it into “set and forget.”

install.sh — main installer

What it does:

  • installs skills into ~/.claude/skills/
  • installs CLI wrappers into ~/.local/bin/
  • optionally installs a daemon (launchd) to auto-setup new sessions

Flags:

  • --auto — install/start daemon
  • --setup-existing — retrofit existing sessions
  • --full — do both

session-finder.sh (cowork-session) — finding active sessions

It scans:

~/Library/Application Support/Claude/local-agent-mode-sessions

and can list sessions or print the latest one (based on mtime / recency). This is used by basically every other script.
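
Finding “the latest session” is mostly an mtime sort over that tree. A one-liner sketch of the idea:

# Newest session directory wins (ls -t sorts by modification time, -d keeps dirs).
ls -td ~/Library/Application\ Support/Claude/local-agent-mode-sessions/*/*/local_* | head -1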

bridge-init.sh (cowork-bridge-init) — wire up one session

This is the actual “plumbing” script. It:

  1. creates .bridge/requests|responses|streams|logs
  2. writes .bridge/status.json
  3. finds the session’s account/workspace IDs
  4. injects the VM-side skill into Claude’s skills-plugin registry path
  5. updates manifest.json to register the skill
  6. injects an env var toggle into .claude/settings.json:
{
  "env": {
    "BRIDGE_ENABLED": "true"
  }
}

That env var becomes a “capability bit” Cowork can check.
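
Inside the VM, gating on it is trivial (sketch):

# Only route through the bridge if this session was actually initialized.
[ "${BRIDGE_ENABLED:-false}" = "true" ] && echo "bridge available" || echo "sandbox-only"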

setup-all-sessions.sh — retrofit everything

This loops over existing sessions and runs the init steps on any that aren’t already configured.

Useful flags:

  • --force — reconfigure everything
  • --dry-run — show what would change

auto-setup-daemon.sh (cowork-bridge-daemon) — set-and-forget

This is the “always on” piece. It runs continuously and detects new Cowork sessions as they appear. It:

  • polls every 2 seconds by default
  • uses fswatch if installed (faster)
  • tracks known sessions in ~/.claude/.bridge-known-sessions
  • initializes any new session automatically
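
The core of the daemon is a compare-against-known-sessions loop, roughly like this (simplified; the exact init invocation is the repo’s, not verified here):

known=~/.claude/.bridge-known-sessions
while :; do
  for s in ~/Library/Application\ Support/Claude/local-agent-mode-sessions/*/*/local_*; do
    grep -qxF "$s" "$known" 2>/dev/null && continue    # already bridged
    cowork-bridge-init "$s" && echo "$s" >> "$known"   # invocation sketched; see bridge-init.sh
  done
  sleep 2   # default poll interval (fswatch replaces this when available)
done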

This is the difference between “I built a bridge” and “my environment is always bridged.”

bridge-uninstall.sh — clean removal

It supports:

  • uninstall one session
  • uninstall all sessions
  • remove global installs
  • full cleanup

…and importantly it can run in --dry-run mode so you can see what it would remove.

inject-session.sh — power tools for session config

This script is basically: “treat session config as editable infrastructure.”

It supports commands like:

  • show — print session config
  • model opus|sonnet|haiku — switch models
  • prompt — inject prompt
  • approve-path — pre-approve file access path
  • mount — pre-mount a folder
  • enable-tool / disable-tool — toggle MCP tools
  • list-tools
  • backup / restore
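
Hypothetical invocations (the subcommand syntax here is my shorthand for the commands listed above):

./scripts/inject-session.sh show
./scripts/inject-session.sh model sonnet
./scripts/inject-session.sh approve-path ~/projects
./scripts/inject-session.sh backup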

This is the moment you stop thinking of Cowork as “an app” and start thinking of it as “a runtime with configuration.”


Part 6 — Security: you’re the sandbox now

A bridge like this gives Cowork real power. So the host watcher includes:

  • allowed-type allowlist
  • blocked command substrings for obvious disasters
  • timeouts
  • logging for audit/debug

This isn’t “secure” in the formal sense. It’s “controlled by you.”
Treat it like SSH keys: useful, powerful, and worth respecting.


Closing: Cowork becomes a UI, not a cage

This project started with “why can’t Cowork just curl this endpoint?” and ended with a pretty clean architecture:

  • Cowork = orchestration + context + UX
  • Bridge = transport
  • Host watcher = executor
  • Host Claude CLI = unrestricted agent brain

Once you have this, you stop fighting the sandbox and start routing around it.

And once you add prompt rewrites + session automation, it stops being a hack and starts being a workflow.
