I Built a Remote Control for AI Coding Agents — Here's the Engineering Behind CorvusTunnel

#ai #programming #opensource #productivity

Last month I caught myself SSH-ing into my own workstation from my phone — just to approve a file change that Claude Code was waiting on. I was at a café, my laptop was at home running a long refactor, and the agent had been blocked for forty minutes because it needed a yes/no answer.

I didn't want to slap --dangerously-skip-permissions on everything either. Even when I did that, I'd still have to sit in front of my PC to hand the agent its next task. It finishes a job, waits for instructions, and I'm stuck at my desk like it's 2019.

That was the moment I decided to build CorvusTunnel.

The problem that was only half-solved

AI coding agents are getting good — good enough that I actually leave them running while I step away. Claude Code, Codex CLI, Antigravity — they all need a human in the loop at some point. They want to create a file, delete a function, run a migration. And they wait.

Now, some people have already solved pieces of this. Claude has its own remote control. Codex has one too. Happy Coder exists. But every one of them only works with its own agent. If you use Claude Code in the morning and Codex in the afternoon — or you're evaluating all three on the same project — you need separate tools, separate setups, separate apps on your phone.

I wanted one tool that reaches all of them at once. Run a command, scan a QR code, and have a real terminal on my phone that's encrypted end-to-end. No accounts, no port forwarding, no Docker. CorvusTunnel talks to whatever agent you're running because it doesn't care which agent it is — it's just a terminal.

Architecture: the relay pattern

The hardest part of "connect your phone to your computer" is NAT traversal. Your home machine is behind a router, your phone is on a cell network, and neither can directly reach the other.

I considered three approaches:

Option 1: Port forwarding. The user opens a port on their router. This is hostile to non-experts and a security surface I didn't want to create.

Option 2: Cloudflare Tunnel. cloudflared creates an outbound tunnel from your machine. This works but requires installing a separate binary and a Cloudflare account. I kept it as a --no-relay fallback.

Option 3: A relay server. Both sides connect outbound to a lightweight WebSocket relay. The relay holds no keys and forwards opaque bytes. This is what CorvusTunnel does by default.

The relay is a small Cloudflare Worker. It matches sessions by ID and forwards bytes. It never holds keys, never buffers messages beyond delivery, and never inspects content. If you don't trust even that, --no-relay --no-tunnel puts you on LAN-only mode with zero third-party involvement.

Cryptography: why NaCl, not TLS

TLS terminates at the relay. Even if I ran my own relay and pinned certificates, anyone who compromised the relay server would see plaintext. I needed encryption where only the two endpoints hold keys.

I chose NaCl (via libsodium's Python binding, PyNaCl) for the entire session:

Key exchange: Each side generates an ephemeral X25519 keypair for the session. The server's public key travels in the QR code alongside the boot token; after the scan, the browser generates its own keypair and sends its public key back.
Authenticated encryption: Every WebSocket frame is encrypted with crypto_box (XSalsa20-Poly1305). This gives authenticated encryption — tampering is detected, not just eavesdropping.
Forward secrecy: Keys are ephemeral and never touch disk. When the session ends, the keys are garbage-collected. Compromising a future session reveals nothing about past sessions.

The implementation is around 120 lines of Python. There's no custom cryptography — just NaCl primitives composed in the standard way.

One design choice I'm particularly happy with: the QR code carries everything the phone needs to start the key exchange in a single scan. No manual back-and-forth, no "enter this code on your phone." Scan, connect, encrypted. The entire handshake completes in under a second.

The terminal: PTY over WebSocket

The core of CorvusTunnel is a pseudo-terminal (PTY) that runs the agent process and streams its output over WebSocket to an xterm.js instance in the browser.

On the server side, the executor spawns the agent (claude, codex, or agy) in a PTY created with Python's pty.openpty(). Every byte the agent writes to its terminal, stdout and stderr alike, goes through the crypto layer and arrives at the browser as an encrypted WebSocket frame. The browser decrypts it and feeds it to xterm.js, which renders it as a real terminal: colors, cursor movement, alternate screen buffer, everything.

Input works the same way in reverse. Tap a key on your phone, it gets encrypted, sent to the relay, forwarded to your machine, decrypted, and written to the PTY master, where the agent reads it on stdin. The agent sees it as normal keyboard input.

This means CorvusTunnel doesn't need to understand the agent's protocol. It doesn't parse prompts or intercept approval flows. It's just a terminal. Whatever the agent shows you locally, you see remotely. This is why supporting multiple agents was trivial — they all just run in a terminal.
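
The server-side half of that pipeline can be sketched with the standard library alone. This is a simplified, Unix-only illustration: run_in_pty and its on_output callback are hypothetical names, and where the real executor would encrypt each chunk and push it to a WebSocket, the sketch just hands the raw bytes to the callback.

```python
import os
import pty
import subprocess


def run_in_pty(argv, on_output):
    """Run argv attached to a fresh PTY, passing each output chunk to on_output."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        argv,
        stdin=slave_fd,
        stdout=slave_fd,
        stderr=slave_fd,
        close_fds=True,
    )
    os.close(slave_fd)  # the parent only needs the master side
    try:
        while True:
            try:
                chunk = os.read(master_fd, 4096)
            except OSError:
                # Linux reports EIO once the child has closed the slave side.
                break
            if not chunk:
                break  # other platforms signal EOF with an empty read
            on_output(chunk)
    finally:
        os.close(master_fd)
    return proc.wait()
```

Remote keystrokes would go the other way with os.write(master_fd, data). A real server would also keep the master open for the whole session and multiplex reads and writes, rather than blocking in a loop like this.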

FastAPI on two ports

The server runs two FastAPI applications on two ports:

Port 8000 (public): Serves the web UI, handles WebSocket connections, and exposes the API that the browser talks to. This is the port that relay traffic reaches.
Port 8001 (internal, localhost-only): Admin endpoints for health checks, session management, and audit log access. This port is never exposed outside the machine.

The split was deliberate. Port 8000 faces the internet (through the relay) and gets the full security stack: rate limiting, IP bans, body-size caps, security headers, CORS policy. Port 8001 is bound to 127.0.0.1 and trusts the caller implicitly.

QR code as the entire auth flow

I wanted zero configuration on the phone side. No app to install, no token to paste, no URL to type. Just scan.

The QR code encodes a URL containing a one-time boot token, the server's ephemeral X25519 public key, and the relay session ID. When the browser opens this URL, it exchanges the boot token for a session token (the boot token is immediately invalidated), generates its own keypair, derives the shared secret, and opens the encrypted WebSocket connection.

The boot token has a 60-second TTL. If nobody scans the QR code within a minute, the server shuts down and the token expires. No stale links linger.
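
The one-time, 60-second behavior can be sketched in a few lines of standard-library Python. Treat the details as assumptions: the BootTokenStore class, the t/k/s parameter names, and the choice to put them in the URL fragment are illustrative, not CorvusTunnel's actual format.

```python
import secrets
import time
from urllib.parse import urlencode

BOOT_TOKEN_TTL = 60.0  # seconds


class BootTokenStore:
    """Issues boot tokens that can be redeemed once, within the TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._expires_at = {}  # boot token -> expiry timestamp

    def issue(self) -> str:
        token = secrets.token_urlsafe(32)
        self._expires_at[token] = self._clock() + BOOT_TOKEN_TTL
        return token

    def redeem(self, token: str):
        """Return a fresh session token, or None if unknown, used, or expired."""
        # pop() removes the token on first use, which makes it single-use.
        expires_at = self._expires_at.pop(token, None)
        if expires_at is None or self._clock() > expires_at:
            return None
        return secrets.token_urlsafe(32)


def pairing_url(base_url: str, boot_token: str,
                server_public_key: bytes, session_id: str) -> str:
    """Build the URL that would be rendered as a QR code."""
    params = urlencode({
        "t": boot_token,
        "k": server_public_key.hex(),
        "s": session_id,
    })
    return f"{base_url}#{params}"
```

Because redeem() removes the token before checking its expiry, a second scan of the same QR code fails even when it arrives inside the 60-second window.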

The frontend: Svelte, not React

The web UI is built with Svelte and Vite. Two reasons:

Bundle size. The entire frontend compiles to ~180KB gzipped. On a phone over a cell connection, this matters. React would have added 40KB+ just for the runtime.
PWA performance. CorvusTunnel installs as a PWA. Svelte's compiled output has less JavaScript to parse on startup, which makes the "app" feel native-fast even on mid-range phones.

On top of xterm.js, I added quick-action chips (context-aware buttons like "Approve", "Reject", "Continue") that adapt to what the agent is showing, and a pinned favorites bar for commands you use often.

Hardening: assume the relay is hostile

Even though I run the relay, I designed CorvusTunnel as if the relay were adversarial:

E2E encryption means the relay can't read traffic even if compromised
IP-bound session tokens make a stolen token useless from any other address
One-time WebSocket tickets with 30s TTL prevent replay attacks
Automatic IP bans after repeated auth failures
Explicit client-IP trust chain — X-Forwarded-For only honored from TRUSTED_PROXIES
Append-only audit logging in JSONL for full forensic trail
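
Of the items above, the client-IP trust chain is the easiest to get subtly wrong, so here is a minimal sketch of the idea. The client_ip function and the example networks in TRUSTED_PROXIES are hypothetical; only the rule itself (honor X-Forwarded-For solely when the direct peer is a trusted proxy) comes from the list above.

```python
import ipaddress

# Example values only; a real deployment would load these from configuration.
TRUSTED_PROXIES = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def _is_trusted(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    return any(ip in network for network in TRUSTED_PROXIES)


def client_ip(peer_addr: str, x_forwarded_for=None) -> str:
    """Resolve the client address used for rate limits, bans, and token binding."""
    # A header sent by an untrusted peer is attacker-controlled: ignore it.
    if not x_forwarded_for or not _is_trusted(peer_addr):
        return peer_addr
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Walk right to left: the rightmost entries were appended by our own
    # proxies, so the first untrusted hop is the real client.
    for hop in reversed(hops):
        try:
            if not _is_trusted(hop):
                return hop
        except ValueError:
            return peer_addr  # malformed entry: fall back to the socket peer
    return peer_addr
```

Walking the header from the right matters because a client can prepend anything it likes to X-Forwarded-For. With the example networks above, client_ip("10.0.0.5", "6.6.6.6, 198.51.100.7") resolves to 198.51.100.7, not the spoofed leftmost entry.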

What I learned

Keep the abstraction low. My first prototype tried to parse the agent's output and present a custom UI for approvals. This was fragile, agent-specific, and broke every time Claude Code changed its output format. The terminal abstraction is crude but universal. It's also what made multi-agent support trivial — I didn't have to reverse-engineer three different agent protocols. They all run in a terminal. Done.

--dangerously-skip-permissions isn't the answer. It's tempting to just auto-approve everything so the agent doesn't block. But even if you're comfortable with that risk, you still need to give it the next task when it finishes. The real problem isn't permissions — it's presence.

QR codes are underrated for auth. The QR flow is faster than OAuth, simpler than magic links, and in LAN-only mode works without an internet connection. Everything the phone needs to start the key exchange is in the payload.

NaCl is a joy to use. No cipher suite negotiation, no mode selection, and no nonce bookkeeping, because PyNaCl generates a random nonce for each message when you don't supply one. One function encrypts, one function decrypts. If TLS doesn't cover your threat model, NaCl is the answer.