nghiahsgs

Posted on May 16

⭐ I gave Claude Code a real Ubuntu computer (and open-sourced the engine)

#agents #mcp #opensource #showdev

TL;DR: I built taw-computer, an open-source MCP server that gives any AI agent (Claude Code, Cursor, etc.) a real Ubuntu computer to drive. 40 tools, two sandbox backends, browser automation via Playwright with Set-of-Mark prompting. Hosted version at shipkit.cc if you want to try without self-hosting.

The problem

When you use Claude Code or Cursor today, the AI can:

Read and edit files in your project
Run commands in your terminal
That's it.

It cannot:

Open a browser and verify the UI it just shipped
Test against a real database it doesn't have installed locally
Deploy a preview link to send a client
Install random apt packages without contaminating your machine
Use a GUI app like a designer would use Figma

A lot of "agentic coding" workflows hit this wall. The agent writes the code, you copy-paste it into a terminal, run it, screenshot the bug, and paste it back. The agent is reading code through a straw.

What I built

A sandboxed Ubuntu desktop, exposed as MCP tools.

┌─ Your editor (Claude Code / Cursor) ───┐
│                                        │
│  "Open Chrome, search 'X', take        │
│   a screenshot of the first result"    │
│                                        │
└──────────────┬─────────────────────────┘
               │ MCP (HTTP + Bearer)
               ▼
┌─ taw-computer MCP server ──────────────┐
│  40 tool handlers                      │
└──────────────┬─────────────────────────┘
               │ Docker API / Firecracker
               ▼
┌─ Ubuntu 22.04 sandbox ─────────────────┐
│  bash · Node 20 · Python 3             │
│  Chromium + Playwright + CDP           │
│  xfce4 desktop + VNC live view         │
│  Postgres / MySQL / Redis clients      │
└────────────────────────────────────────┘

Setup is one command:

claude mcp add --transport http shipkit https://shipkit.cc/mcp \
--header "Authorization: Bearer YOUR_KEY"

Now your agent can do this in one turn:

> open chrome on the shipkit workspace, search "claude opus 4.7",
take a screenshot of the top result, then send it to my email.

⏺ vm_create(label: "research")
⏺ browser_open()
⏺ browser_navigate("https://google.com")
⏺ browser_snapshot()        ← Set-of-Mark applied
⏺ browser_click_ref("[2]")  ← search box, ref 2
⏺ browser_type_ref("[2]", "claude opus 4.7")
⏺ browser_key("Enter")
⏺ browser_snapshot()
⏺ desktop_screenshot()
⏺ send_email(to: "me@x.com", attachments: [...])

You watch it happen live by opening the VNC URL the agent returns.
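Under the hood, each ⏺ line in the trace is one MCP tools/call request sent over the HTTP transport with the Bearer key. The sketch below only builds those messages. It assumes a browser_navigate tool that takes a url argument (the real argument names come from the server's tools/list response), and it leaves out the initialize handshake and session header a real MCP client performs first.

```typescript
// Sketch: the JSON-RPC 2.0 message behind one "⏺ tool(...)" line.
// Tool argument names (e.g. `url`) are assumptions; real schemas come from tools/list.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

let nextId = 1;

// Build a tools/call message with a fresh request id.
function toolCall(name: string, args: Record<string, unknown> = {}): ToolCallRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Wrap a message for the Streamable HTTP transport with Bearer auth.
// (The initialize handshake and Mcp-Session-Id header are omitted for brevity.)
function toHttp(endpoint: string, apiKey: string, msg: ToolCallRequest): HttpRequestSpec {
  return {
    method: "POST",
    url: endpoint,
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(msg),
  };
}

const req = toHttp("https://shipkit.cc/mcp", "YOUR_KEY", toolCall("browser_navigate", { url: "https://google.com" }));
console.log(req.body);
// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"browser_navigate","arguments":{"url":"https://google.com"}}}
```

The server answers each call with a JSON-RPC result, which is how screenshots and ref tables flow back into the agent's context.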

The interesting bits

1. Set-of-Mark prompting beats coordinate-guessing

Many LLM browser-automation setups ask the model to output pixel coordinates. That breaks down because (a) LLMs are weak at precise spatial reasoning and (b) page layouts shift between screenshots.

Set-of-Mark instead draws numbered badges on every clickable element before sending the screenshot to the model. The model just says "click [2]": no coordinates, no guessing.

// browser_snapshot returns:
// - PNG with [0] [1] [2] [3] overlays on buttons, inputs, links
// - JSON mapping ref → element description
//
// Then browser_click_ref looks up ref → DOM element → CDP click event

In my experience it is far more reliable than the coordinate-based approaches I've used.
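A minimal sketch of the Set-of-Mark bookkeeping, assuming the snapshot step has already collected each interactive element's role, label and bounding box. Refs are assigned in order after dropping zero-size elements, and a ref resolves to the center of its box. This swaps the post's ref → DOM element lookup for stored coordinates; a real browser_click_ref could then fire CDP Input.dispatchMouseEvent at that point. All type and function names here are hypothetical.

```typescript
// Hypothetical shapes; not taw-computer's actual code.
interface ElementBox {
  role: string;   // e.g. "button", "link", "textbox"
  label: string;  // visible text or accessible name
  x: number;      // CSS pixels, top-left corner
  y: number;
  width: number;
  height: number;
}

interface Mark {
  ref: string;    // "[0]", "[1]", ...
  description: string;
  box: ElementBox;
}

// Assign refs in order; the same numbers would be drawn as badges on the PNG.
function assignMarks(elements: ElementBox[]): Map<string, Mark> {
  const marks = new Map<string, Mark>();
  elements
    .filter((el) => el.width > 0 && el.height > 0) // skip invisible elements
    .forEach((box, i) => {
      const ref = `[${i}]`;
      marks.set(ref, { ref, description: `${box.role} "${box.label}"`, box });
    });
  return marks;
}

// The ref → description table the model reads next to the annotated screenshot.
function describeMarks(marks: Map<string, Mark>): string {
  return Array.from(marks.values())
    .map((m) => `${m.ref} ${m.description}`)
    .join("\n");
}

// "click [2]" → center of that element's box (a point a CDP mouse event could target).
function resolveClick(marks: Map<string, Mark>, ref: string): { x: number; y: number } {
  const mark = marks.get(ref);
  if (!mark) throw new Error(`unknown ref ${ref}; take a fresh snapshot`);
  const { x, y, width, height } = mark.box;
  return { x: x + width / 2, y: y + height / 2 };
}

const marks = assignMarks([
  { role: "link", label: "Gmail", x: 900, y: 20, width: 40, height: 20 },
  { role: "div", label: "", x: 0, y: 0, width: 0, height: 0 }, // invisible, gets no ref
  { role: "link", label: "Images", x: 950, y: 20, width: 50, height: 20 },
  { role: "textbox", label: "Search", x: 300, y: 300, width: 500, height: 40 },
]);
console.log(describeMarks(marks));
// [0] link "Gmail"
// [1] link "Images"
// [2] textbox "Search"
console.log(resolveClick(marks, "[2]")); // { x: 550, y: 320 }
```

Because refs are only valid for the snapshot that produced them, throwing on an unknown ref (rather than guessing) nudges the agent to re-snapshot after the page changes.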

2. Two backends, one interface

The repo abstracts the sandbox behind a SandboxManager interface, so two backends plug in:

SandboxManager (interface)
├── DockerSandbox       (works on any Linux VPS)
└── FirecrackerSandbox  (microVM, needs /dev/kvm)

By default it auto-detects KVM and falls back to Docker. On a dev VPS without nested virtualization, Docker is fine. For multi-tenant production, you'd want bare metal + Firecracker for hardware isolation.
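Here is one way the single-interface, two-backend split with KVM auto-detection could look. It's a sketch: the method names are assumptions, the backends are in-memory stubs where the real code would call the Docker API or the Firecracker API socket, and the /dev/kvm check is injected so the selection logic can be exercised anywhere.

```typescript
// Hypothetical interface and stub backends; the real SandboxManager may differ.
type BackendKind = "docker" | "firecracker";

interface Sandbox {
  id: string;
  backend: BackendKind;
}

interface SandboxManager {
  readonly kind: BackendKind;
  create(label: string): Promise<Sandbox>;
  destroy(id: string): Promise<void>;
}

class DockerSandbox implements SandboxManager {
  readonly kind: BackendKind = "docker";
  private seq = 0;
  async create(label: string): Promise<Sandbox> {
    // Real backend: create the per-VM network and container via the Docker API.
    return { id: `${label}-${++this.seq}`, backend: this.kind };
  }
  async destroy(_id: string): Promise<void> {
    // Real backend: remove the container and its network.
  }
}

class FirecrackerSandbox implements SandboxManager {
  readonly kind: BackendKind = "firecracker";
  private seq = 0;
  async create(label: string): Promise<Sandbox> {
    // Real backend: boot a microVM (kernel + rootfs) through the Firecracker API socket.
    return { id: `${label}-${++this.seq}`, backend: this.kind };
  }
  async destroy(_id: string): Promise<void> {
    // Real backend: shut the microVM down and release its resources.
  }
}

// Prefer Firecracker when /dev/kvm exists, otherwise fall back to Docker.
// The probe is injected so the choice is testable without a KVM host.
function createSandboxManager(pathExists: (path: string) => boolean): SandboxManager {
  return pathExists("/dev/kvm") ? new FirecrackerSandbox() : new DockerSandbox();
}

const manager = createSandboxManager((p) => p === "/dev/kvm");
manager.create("research").then((vm) => console.log(vm)); // { id: 'research-1', backend: 'firecracker' }
```

In Node you'd call createSandboxManager(existsSync) with existsSync from node:fs. The device existing is only a first check; a stricter probe would also confirm the process can open it, since Firecracker needs read/write access to /dev/kvm.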

3. Per-container bridge networks (real isolation, not "share one big bridge")

A detail that surprised even my security-conscious users: each VM gets its own user-defined Docker network (taw-net-<vmId>) instead of sharing the default bridge.

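async create(opts) {
  const name = `shipkit-${genId()}`;   // genId: ID helper (not shown)
  const netName = `taw-net-${name}`;
  // execPromise: presumably a promisified child_process.exec
  await execPromise(`docker network create ${netName}`);
  // → IP space, ARP table, broadcast domain all isolated per VM
  // (attaching the container to netName is elided here)
}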

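VM-to-VM scans inside the same host hit two walls: each VM sits on its own bridge, so ARP and broadcast traffic never reach a neighbor, and Docker's inter-network isolation drops routed traffic between different bridges. On top of that, an egress filter blocks spam ports and cloud metadata (IMDS) IPs.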
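To make the isolation concrete, here is a sketch that only generates the host commands for one VM: a dedicated bridge network, DROP rules in Docker's DOCKER-USER chain, and a container attached to nothing else. Assumptions, not taken from the repo: the "spam ports" are the SMTP ports 25, 465 and 587, the metadata address is 169.254.169.254, and the bridge interface is given an explicit name via com.docker.network.bridge.name so iptables can match it with -i.

```typescript
// Assumed values, not from taw-computer: which ports and IPs to block.
const BLOCKED_TCP_PORTS = [25, 465, 587];  // outbound mail (spam)
const METADATA_IPS = ["169.254.169.254"];  // cloud instance metadata (IMDS)

// Returns the commands as strings; nothing is executed here.
function isolationCommands(vmId: string, bridge: string, image: string): string[] {
  // Linux interface names are limited to 15 characters.
  if (bridge.length > 15) throw new Error(`bridge name too long: ${bridge}`);
  const netName = `taw-net-${vmId}`;
  const cmds = [
    // One user-defined bridge per VM: its own subnet and L2 broadcast domain.
    `docker network create --driver bridge --opt com.docker.network.bridge.name=${bridge} ${netName}`,
  ];
  // DOCKER-USER is evaluated before Docker's own forwarding rules,
  // so these drops apply to traffic leaving this VM's bridge.
  for (const port of BLOCKED_TCP_PORTS) {
    cmds.push(`iptables -I DOCKER-USER -i ${bridge} -p tcp --dport ${port} -j DROP`);
  }
  for (const ip of METADATA_IPS) {
    cmds.push(`iptables -I DOCKER-USER -i ${bridge} -d ${ip} -j DROP`);
  }
  // Attach the container only to its private network.
  cmds.push(`docker run -d --name ${vmId} --network ${netName} ${image}`);
  return cmds;
}

// "taw-computer" is a hypothetical image name.
for (const cmd of isolationCommands("shipkit-a1b2c3", "tawbr-a1b2c3", "taw-computer")) {
  console.log(cmd);
}
```

This yields six commands per VM. IPv6 egress would need matching ip6tables rules.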

4. Live preview via subdomain

Agent runs npm run dev on port 5173. Calls share_port(5173). Gets back https://<random16hex>.shipkit.cc. You send the link to your client. Done.

The proxy is a Node HTTP layer that maps each request's subdomain to a vmId and then to that VM's container IP, so sharing a port never requires editing nginx config.
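The share_port flow can be sketched as a small registry: it mints a random 16-hex subdomain, remembers which VM and port it points to, and on every request maps the Host header to that VM's current container IP. The class and method names are hypothetical, and the token source is passed in so the example is deterministic.

```typescript
// Hypothetical registry; the post only says vmId → container IP is resolved per request.
interface Share {
  vmId: string;
  port: number;
}

class PortShareRegistry {
  private shares = new Map<string, Share>();        // subdomain → share
  private containerIps = new Map<string, string>(); // vmId → container IP
  private baseDomain: string;
  private randomHex16: () => string;

  constructor(baseDomain: string, randomHex16: () => string) {
    this.baseDomain = baseDomain;
    this.randomHex16 = randomHex16;
  }

  setContainerIp(vmId: string, ip: string): void {
    this.containerIps.set(vmId, ip);
  }

  // share_port(5173) → https://<random16hex>.<baseDomain>
  sharePort(vmId: string, port: number): string {
    const sub = this.randomHex16();
    if (!/^[0-9a-f]{16}$/.test(sub)) throw new Error("token must be 16 lowercase hex chars");
    this.shares.set(sub, { vmId, port });
    return `https://${sub}.${this.baseDomain}`;
  }

  // Host header → upstream URL, or null (the proxy would answer 404).
  // Looking the IP up per request means a container with a new IP needs no config edit.
  resolve(hostHeader: string): string | null {
    const host = hostHeader.replace(/:\d+$/, "").toLowerCase();
    const suffix = `.${this.baseDomain}`;
    if (!host.endsWith(suffix)) return null;
    const share = this.shares.get(host.slice(0, -suffix.length));
    if (!share) return null;
    const ip = this.containerIps.get(share.vmId);
    return ip ? `http://${ip}:${share.port}` : null;
  }
}

const registry = new PortShareRegistry("shipkit.cc", () => "0123456789abcdef");
registry.setContainerIp("shipkit-a1b2c3", "172.18.0.2");
console.log(registry.sharePort("shipkit-a1b2c3", 5173)); // https://0123456789abcdef.shipkit.cc
console.log(registry.resolve("0123456789abcdef.shipkit.cc:443")); // http://172.18.0.2:5173
```

In production the token would come from a CSPRNG, e.g. randomBytes(8).toString("hex") from node:crypto, which yields exactly 16 hex characters. Forwarding the request body and WebSocket upgrades (which dev servers like Vite use for hot reload) is left out.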

What's NOT there yet (honest section)

No GPU passthrough — can't run CUDA inference inside the VM
Snapshots are full image commits — works but inefficient at scale
Firefox/Safari — only Chromium browser tools
No multi-region — single-VPS deploy for now
Compliance certs — none. Use for dev/PoC, don't put production secrets in there

The README is straightforward about Docker-vs-Firecracker trade-offs. If you need real hardware isolation for multi-tenant workloads, Firecracker on bare-metal is the answer (work in progress).

Why open source the engine?

I tried Browserbase, E2B and Modal: all paid SaaS, mostly closed-source, vendor-locked. For something as fundamental as "give my agent a computer", I wanted the engine open so:

People can self-host without paying me
Security can be audited externally
Forks and contributions are possible

The hosted version (shipkit.cc) adds the SaaS layer (multi-user auth, billing, dashboard, port sharing infra) on top. The engine stays free.

Try it

# Self-host (Docker required)
git clone https://github.com/the-agents-work/taw-computer
cd taw-computer
npm install && npm run image:build
npm run dev # stdio mode for Claude Desktop

# Or use the hosted version
claude mcp add --transport http shipkit https://shipkit.cc/mcp \
  --header "Authorization: Bearer YOUR_KEY"   # register at https://shipkit.cc to get a key

Repo: https://github.com/the-agents-work/taw-computer. Star it and open an issue if it's useful, especially if you find Firecracker production blockers; I'm actively working on that path.

Roast welcome.
