DEV Community

Arthur Jean

Building a native terminal for AI coding agents in Rust + GPUI

Author: Arthur Jean, solo indie maker. More on paneflow.dev and arthurjean.com.
Repo: github.com/ArthurDEV44/paneflow. MIT licensed.

I spend my days running Claude Code, Codex, and OpenCode in parallel panes. Different agents on different branches, each with their own dev server, all in one terminal. tmux can do this, but every multiplexer I tried treated agents the same as any other shell process: no first-class branch context, no live dev-server detection, no session restore that survives a reboot, no programmatic way for an external tool to drive the terminal. I built Paneflow because I needed it.

This is a post-mortem, not a launch post. Paneflow is a native terminal workspace (splits, panes, branch-aware workspaces, session restore) built in pure Rust on top of Zed's GPUI framework and the upstream alacritty_terminal crate. It started as a port of cmux, a macOS-only Swift/AppKit project, and the Rust rewrite forced a string of decisions I had no good intuition for at the start. I want to walk through the ones that mattered: which UI frameworks I tried and rejected, how the GPUI/alacritty boundary actually looks, how dev-server detection works under the hood, the N-ary layout tree that replaced binary splits, the cross-platform PTY plumbing, the JSON-RPC control plane that makes agents first-class, and four lessons that surprised me.

If you take one thing away, take this: for a terminal emulator, the UI framework must own glyph rasterization. Everything else flows from that single constraint.

Screenshot: Paneflow showing a multi-workspace sidebar, an in-pane markdown viewer reading a PRD document, and the OpenCode AI agent active in the right pane.

Why a terminal for AI coding agents

VSCode and classic IDEs matter less and less when you work with Claude Code or Codex. The work has shifted: instead of typing code in an editor, I'm directing agents from a shell. One pane runs Claude Code working on a feature branch. Another runs Codex doing a refactor on a second branch. A third tails the dev server. A fourth has notes and a markdown viewer open.

The terminals we know weren't designed for this. They render a grid, they pipe stdin/stdout, they let you split. They don't know which workspace is on which branch, which pane has a server bound to port 5173, or which session was restored from yesterday's setup. Every multiplexer treats every pane identically, so the cognitive load of orchestration lands entirely on you.

Paneflow is built exactly for this mode. Workspaces are git-aware: each one knows its branch and the sidebar surfaces it. Dev-server detection scans for Vite, Next.js, Webpack, and others, then resolves their actual listening ports through the kernel, not by trusting whatever the framework printed. AI agent buttons in the tab bar launch Claude Code, Codex, or OpenCode in the active pane with one keystroke. A local JSON-RPC server exposes the app so external tools can drive workspaces, send keystrokes, or post agent lifecycle events programmatically. Sessions save and restore, so closing the app and reopening it tomorrow lands you exactly where you left off.

Side-by-side comparisons vs cmux, WezTerm, iTerm2, and Warp are at paneflow.dev/compare. The rest of this post is about how the engine works.

Why not Electron, why not Tauri

The first option I evaluated was Electron + xterm.js. The architecture review rejected it immediately: "Electron: heavy memory footprint (contradicts cmux's 'not Electron' philosophy)." cmux was conceived as a native, low-RAM tool, and the whole point of porting it was to keep that property on Linux and Windows. So Electron was out without debate.

The second option was Tauri, and I genuinely spent time there. I started a real implementation, laid out the architecture, got the first interactions running. On paper it held up: lighter binary than Electron, frontend in whichever stack you want, Rust backend for the sensitive logic, cross-platform distribution with minimal friction. But as I went deeper, two things sharpened up.

First, the technical blocker. The webview API surface is too narrow for the kind of low-level keyboard, IME, and input grabbing a multiplexer needs. Everything you take for granted in a native terminal (precise key chords without a JS intermediary, clean Asian IME, focus management between panes without a round-trip to the webview) becomes a workaround or a hole. For an editor or a typical productivity app, those trade-offs are acceptable. For a terminal that has to absorb every keystroke at sub-frame latency, they are not.

Second, and this is what really decided it: I wanted to build something new, not wrap a webview with one more layer on top. When Zed started, they could have built on Electron, on Tauri, on GTK or Qt. They chose to write their own framework, GPUI, because no existing option could deliver what they wanted for code editors: dense GPU-accelerated text rendering at 120 fps with sub-frame keystroke-to-pixel latency. The result speaks for itself.

Zed's bet for IDEs applies word for word to terminals. Not wrap, not lean on a portability layer that flattens every feature to the lowest common denominator, but build on a foundation designed specifically for dense, native, low-latency text rendering. The difference with Zed is that I did not have to write the framework: they had already done it. GPUI owns text rendering end-to-end: shaping, atlasing, GPU draw, the lot. There is no "bring your own text path," there is no webview to work around. Adopting GPUI meant choosing the "dedicated native framework" approach over "webview with a Rust shell." Architectural ambition over ecosystem ease. That was the decision.

The GPUI mental model

A two-paragraph briefing so the rest of this post reads cleanly. Mutable state in GPUI lives in Entity<T>. You mutate through a Context<T>, you signal observers with cx.notify(), you spawn async work with cx.spawn(). Views implement Render and return a div() tree built with a Tailwind-shaped builder API. Every observable thing in Paneflow (the app root, the cursor blink phase, every terminal view, every pane, the sidebar) is an Entity<T>.

Keyboard input flows through GPUI's actions! macro, which generates zero-sized typed structs in the paneflow namespace (SplitHorizontally, LayoutTiled, UndoClosePane, ...) that the framework dispatches through the focus chain. Adding a keybinding is one struct, one handler, one bind. GPUI's repaint is diff-based and dependency-tracked at observation time, so a cx.notify() only invalidates the rects that subscribed. That's the whole surface area you need to follow what comes next.

Plugging in alacritty_terminal

The terminal grid itself is alacritty_terminal = "0.26" from crates.io (src-app/Cargo.toml:26). I migrated off Zed's internal fork as soon as 0.26 was released. Staying on a vendored fork of someone else's editor's fork of alacritty was not a sustainable path.

The shared state is unsurprising: Arc<FairMutex<Term<ZedListener>>>. ZedListener is a thin newtype wrapping a futures UnboundedSender<AlacEvent>. alacritty's Term calls listener.send_event(event) whenever the grid mutates; the receiver lives on the GPUI main thread.

What is surprising is that Paneflow does not use alacritty's EventLoop::spawn(). That helper is convenient for embedded-terminal-in-an-editor cases, but a multiplexer wants finer control over OSC scanning, synchronized output, and shutdown. Instead, every terminal session runs two hand-rolled detached threads (src-app/src/terminal/pty_loops.rs):

  1. pty_reader_loop reads PTY bytes into a 4096-byte buffer, runs OSC scanners (OSC 7 for CWD, OSC 133 for prompt boundaries, XTVersion for shell identification), advances the VTE Processor against the locked Term, and emits a single wakeup when bytes were processed outside a DEC 2026 synchronized-output window.
  2. pty_message_loop receives Msg from a PtyNotifier over std::sync::mpsc and writes input or resize commands back to the PTY master.
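To make the OSC scanning step in the reader loop concrete, here is a minimal sketch of an OSC 7 extractor. It is not Paneflow's scanner: the name parse_osc7 is hypothetical, it assumes the whole sequence arrives in a single buffer, and it skips percent-decoding of the path.

```rust
/// Extract the path from an OSC 7 sequence: `ESC ] 7 ; file://host/path`,
/// terminated by BEL (0x07) or by ST (`ESC \`).
/// Hypothetical helper: assumes the full sequence sits in `buf`.
pub fn parse_osc7(buf: &[u8]) -> Option<String> {
    const PREFIX: &[u8] = b"\x1b]7;";
    let start = buf.windows(PREFIX.len()).position(|w| w == PREFIX)? + PREFIX.len();
    let rest = &buf[start..];
    // The payload ends at BEL or at the ESC that begins the ST terminator.
    let end = rest.iter().position(|&b| b == 0x07 || b == 0x1b)?;
    let url = std::str::from_utf8(&rest[..end]).ok()?;
    let after_scheme = url.strip_prefix("file://")?;
    // Skip the hostname: the path starts at the first '/' after it.
    let slash = after_scheme.find('/')?;
    Some(after_scheme[slash..].to_string())
}
```

A real scanner also has to cope with a sequence split across two reads, which is one reason to own the reader loop rather than delegate it.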

The reader's hot path is roughly this:

let mut term = term.lock();
processor.advance(&mut *term, &buf[..n]);
drop(term);
if processor.sync_bytes_count() < n {
    listener.send_event(AlacEvent::Wakeup);
}

The check on sync_bytes_count() is the synchronized-output coalesce: a TUI like neovim or btop announces "I'm about to draw a frame, hold the wakeup until I'm done," and we honor it. Without that check, a busy TUI would generate hundreds of wakeups per logical frame and the GPU would render them all.

On the GPUI side, the TerminalView's cx.spawn event loop drains the wakeup channel for up to 4 ms (max 100 events), then issues one cx.update() and one cx.notify(). That's the keystroke-to-pixel path: shell writes, reader thread, VTE, wakeup, 4 ms batcher, cx.notify(), GPUI diff repaint, atlas draw. There's a PANEFLOW_LATENCY_PROBE=1 env var on debug builds that traces this path end-to-end, which is the right tool when you suspect the batcher is doing something wrong.
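The batching step can be sketched with nothing but std channels. This is an illustration of the "drain for up to 4 ms, max 100 events" idea, not Paneflow's code: the real loop runs inside cx.spawn on a futures channel, and the name drain_batch is an assumption.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Block for the first event, then keep draining until `window` elapses
/// or `max_events` have been collected. Returns how many events were
/// coalesced into this batch (0 means every sender is gone).
pub fn drain_batch<T>(rx: &Receiver<T>, window: Duration, max_events: usize) -> usize {
    if rx.recv().is_err() {
        return 0;
    }
    let mut count = 1;
    let deadline = Instant::now() + window;
    while count < max_events {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(_) => count += 1,
            // Window expired or the channel closed: ship what we have.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    count
}
```

Each call corresponds to one repaint request: however many wakeups landed inside the window, the caller would issue a single update and a single notify.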

Detecting dev servers from /proc/net/tcp

One of the things I wanted from the start was a sidebar that knows what's running in each workspace. If a pane is hosting a Vite dev server on port 5173, I want a one-click open. If a Next.js app is on 3000, same. The naive solution is to regex the terminal output: catch "vite dev: http://localhost:5173" when the framework prints it. That works half the time. The other half, the line scrolls off, or the framework prints to stderr in a way the OSC scanner missed, or the user piped it through tee and the announcement never came back.

So Paneflow uses two signals. The first is output matching: src-app/src/terminal/service_detector.rs regex-matches a list of 22 framework signatures against the terminal output as a fast, immediate signal:

const FRAMEWORKS: &[(&str, &str, bool)] = &[
    ("next.js", "Next.js", true),
    ("turbopack", "Next.js", true),
    ("vite", "Vite", true),
    ("nuxt", "Nuxt", true),
    ("remix", "Remix", true),
    ("astro", "Astro", true),
    ("webpack-dev-server", "Webpack", true),
    ("uvicorn", "uvicorn", false),
    ("flask", "Flask", false),
    ("axum", "Axum", false),
    // ...
];

The third tuple element is is_frontend: frontend frameworks get a clickable URL in the sidebar; backend ones get a status badge.
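As a rough illustration of how such a table can be consulted, the sketch below uses case-insensitive substring matching instead of the regex matching the real detector uses. The function name detect_framework is hypothetical, and only three entries from the table are repeated.

```rust
/// (needle, display name, is_frontend): three entries copied from the table above.
const FRAMEWORKS: &[(&str, &str, bool)] = &[
    ("next.js", "Next.js", true),
    ("vite", "Vite", true),
    ("uvicorn", "uvicorn", false),
];

/// Return the display name and frontend flag of the first signature found in `line`.
/// Simplification: plain substring search on the lowercased line, not a regex.
pub fn detect_framework(line: &str) -> Option<(&'static str, bool)> {
    let lowered = line.to_lowercase();
    FRAMEWORKS
        .iter()
        .find(|(needle, _, _)| lowered.contains(*needle))
        .map(|&(_, name, is_frontend)| (name, is_frontend))
}
```

Table order matters with a first-match rule, so more specific signatures would need to come before more general ones.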

The ground truth, though, comes from the kernel. src-app/src/workspace/ports.rs walks the PID tree of every workspace and queries listening sockets directly. On Linux, that means parsing /proc/net/tcp and /proc/net/tcp6:

#[cfg(target_os = "linux")]
pub fn detect_ports(pids: &[u32]) -> Vec<u16> {
    let mut all_pids = HashSet::new();
    for &pid in pids {
        for descendant in collect_descendant_pids(pid) {
            all_pids.insert(descendant);
        }
    }
    let owned_inodes = collect_socket_inodes(&all_pids);
    let mut ports = Vec::new();
    for path in &["/proc/net/tcp", "/proc/net/tcp6"] {
        if let Ok(content) = read_capped(path, 256 * 1024) {
            for line in content.lines().skip(1) {
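                let fields: Vec<&str> = line.split_whitespace().collect();
                // Skip malformed or truncated rows instead of panicking on an out-of-range index.
                if fields.len() < 10 { continue; }
                if fields[3] != "0A" { continue; } // 0A = TCP_LISTEN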
                if let Some(port_hex) = fields[1].split(':').next_back()
                    && let Ok(port) = u16::from_str_radix(port_hex, 16)
                    && let Ok(inode) = fields[9].parse::<u64>()
                    && owned_inodes.contains(&inode)
                {
                    ports.push(port);
                }
            }
        }
    }
    ports.sort_unstable();
    ports.dedup();
    ports
}

The walk has three steps. First, collect_descendant_pids BFS-walks /proc/{pid}/task/{pid}/children to gather every descendant of the workspace's shell PID. Then collect_socket_inodes reads /proc/{pid}/fd/ for each PID, looking for symlinks of the shape socket:[<inode>], and accumulates the inode set. Finally, /proc/net/tcp is parsed line by line: counting whitespace-separated fields from zero, field 3 is the TCP state (hex 0A is TCP_LISTEN), and fields 1 and 9 are the local address and the socket inode. We keep only ports whose inode is owned by our PID set, so we don't surface the SSH listener on port 22 just because someone happens to have a shell open.
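The two parsing steps are easy to check in isolation. The helpers below are a sketch with hypothetical names (parse_listen_row, parse_socket_link), written against the /proc formats described above rather than copied from ports.rs.

```rust
/// Parse one data row of /proc/net/tcp. Returns (port, inode) for sockets
/// in the LISTEN state and None for everything else.
pub fn parse_listen_row(line: &str) -> Option<(u16, u64)> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    if fields.len() < 10 || fields[3] != "0A" {
        return None; // short row, or not TCP_LISTEN
    }
    // Field 1 is "<hex address>:<hex port>"; the port is the part after the last ':'.
    let port_hex = fields[1].rsplit(':').next()?;
    let port = u16::from_str_radix(port_hex, 16).ok()?;
    let inode = fields[9].parse::<u64>().ok()?;
    Some((port, inode))
}

/// Parse the target of a /proc/<pid>/fd/<n> symlink of the shape `socket:[<inode>]`.
pub fn parse_socket_link(target: &str) -> Option<u64> {
    target.strip_prefix("socket:[")?.strip_suffix(']')?.parse().ok()
}
```

For a row whose local address is 0100007F:1435, the port is hex 1435, which is 5173 in decimal: a Vite server bound to 127.0.0.1.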

macOS doesn't have /proc, so the branch there uses libproc:

#[cfg(target_os = "macos")]
pub fn detect_ports(pids: &[u32]) -> Vec<u16> {
    use libproc::libproc::file_info::{ListFDs, ProcFDType, pidfdinfo};
    use libproc::libproc::net_info::{SocketFDInfo, SocketInfoKind, TcpSIState};
    use libproc::libproc::proc_pid::listpidinfo;
    // ... for each pid in the workspace tree:
    //   listpidinfo::<ListFDs>(pid) to get the FD table
    //   filter to ProcFDType::Socket
    //   pidfdinfo::<SocketFDInfo>(pid, fd) to get the socket info
    //   keep entries where soi_kind is SocketInfoKind::Tcp and tcpsi_state is TcpSIState::Listen
}

Windows is currently a stub: detect_ports returns an empty Vec and the sidebar falls back to the regex path. The native Win32 API is GetExtendedTcpTable, which is on the roadmap.

The combination matters. Regex catches the immediate "Vite started" line within the second; /proc catches anything actually listening even when the announcement was lost. Both paths feed into the same sidebar entity via GPUI's EventEmitter, which fires ActivityBurst events on PTY activity rather than polling (see lesson #3 below).

The N-ary layout tree

My first attempt at panes used a binary SplitNode enum: Leaf | Split { direction, ratio, first, second }. It worked. It also produced terrible UX the moment you wanted three equal columns. You had to nest a split inside a split, and a 50/50 split nested inside a 50/50 split gives you 50/25/25, not 33/33/33. Every preset became a special case. Every drag-resize on the outer divider produced visually inconsistent inner sizes.
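The arithmetic is quick to verify. The sketch below rebuilds a stripped-down version of the binary shape (direction omitted) and computes the share of the root each leaf receives; leaf_fractions and three_panes are illustrative names, not code from the repo.

```rust
/// The binary shape from the first attempt, with `direction` left out for brevity.
pub enum SplitNode {
    Leaf,
    Split { ratio: f32, first: Box<SplitNode>, second: Box<SplitNode> },
}

/// Push the fraction of the root's extent that each leaf ends up with, in order.
pub fn leaf_fractions(node: &SplitNode, share: f32, out: &mut Vec<f32>) {
    match node {
        SplitNode::Leaf => out.push(share),
        SplitNode::Split { ratio, first, second } => {
            let r = *ratio;
            leaf_fractions(first, share * r, out);
            leaf_fractions(second, share * (1.0 - r), out);
        }
    }
}

/// Three panes the only way a binary tree allows: a 50/50 split nested in a 50/50 split.
pub fn three_panes() -> SplitNode {
    SplitNode::Split {
        ratio: 0.5,
        first: Box::new(SplitNode::Leaf),
        second: Box::new(SplitNode::Split {
            ratio: 0.5,
            first: Box::new(SplitNode::Leaf),
            second: Box::new(SplitNode::Leaf),
        }),
    }
}
```

Getting true thirds out of this shape means setting the outer ratio to 1/3 while the inner one stays at 1/2, which is exactly the kind of per-preset special case that disappears with an N-ary container.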

I rewrote the tree as an N-ary structure. The relevant types live in src-app/src/layout/tree.rs:42-58:

pub struct LayoutChild {
    pub node: LayoutTree,
    pub ratio: Rc<Cell<f32>>,
    pub computed_size: Rc<Cell<f32>>,
}

pub enum LayoutTree {
    Leaf(Entity<Pane>),
    Container {
        direction: SplitDirection,
        children: Vec<LayoutChild>,
        drag: Rc<Cell<Option<DragState>>>,
        container_size: Rc<Cell<f32>>,
    },
}

Three things to call out:

  • Vec<LayoutChild>, not (Box<Self>, Box<Self>). A container holds any number of children. Three columns is one container with three children, not a binary tree of containers. Drag-resizing across siblings becomes a single ratio rebalance, not a recursive walk.
  • Rc<Cell<f32>> for ratios. GPUI's render tree is single-threaded, and the layout body is rebuilt on every Render call. Putting ratios behind Rc<Cell<f32>> lets the render closure read the ratio while the drag handler writes it, with no Arc and no lock. This is one of the patterns I underestimated coming in: GPUI's "no Arc<Mutex<...>> for UI state" rule pushes you toward Rc<Cell<...>> and Rc<RefCell<...>> everywhere, and that's correct.
  • Constants are deliberately small. MIN_PANE_SIZE = 80.0 (tree.rs:62), DIVIDER_PX = 4.0 (tree.rs:60), max 32 panes per workspace, max 20 workspaces. The clamp on resize is dynamic, MIN_PANE_SIZE / container_size at drag time, not a fixed range.
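The ratio sharing and the dynamic clamp from the bullets above fit in a few lines. This is a sketch under assumptions: drag_divider is a hypothetical name, the ratios are passed as a plain slice of Rc<Cell<f32>> rather than through LayoutChild, and the real handler may distribute space differently.

```rust
use std::cell::Cell;
use std::rc::Rc;

const MIN_PANE_SIZE: f32 = 80.0;

/// Move the divider between child `index` and child `index + 1` by `delta_px`.
/// Only the two adjacent ratios change, and each side keeps at least
/// MIN_PANE_SIZE pixels, expressed as a ratio of the current container size.
pub fn drag_divider(ratios: &[Rc<Cell<f32>>], index: usize, delta_px: f32, container_size: f32) {
    let (Some(left), Some(right)) = (ratios.get(index), ratios.get(index + 1)) else {
        return;
    };
    if container_size <= 0.0 {
        return;
    }
    let min_ratio = MIN_PANE_SIZE / container_size;
    let pair = left.get() + right.get();
    if pair < 2.0 * min_ratio {
        return; // not enough room for two minimum-size panes
    }
    let wanted = left.get() + delta_px / container_size;
    let new_left = wanted.clamp(min_ratio, pair - min_ratio);
    left.set(new_left);
    right.set(pair - new_left);
}
```

Because the cells are shared through Rc, a render closure holding a clone of the same Rc sees the new ratio on the next frame without any lock.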

Four presets ship today: even_h and even_v (built from the same from_panes_equal constructor), main_vertical (60% left, 40% stacked right), and tiled (the tmux algorithm: increment rows then cols alternately until rows*cols >= N, then fill row-by-row). Rendering emits GPUI flex divs with flex_basis(relative(ratio)) per child, and the divider is a 4 px element with a drag listener that rewrites two adjacent ratios in place.
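The tiled grid sizing described above is small enough to write out. The function names are illustrative, and this is a sketch of the algorithm as the paragraph states it, not an excerpt from the repo.

```rust
/// Grow rows, then columns, alternately until the grid can hold `n` panes.
/// Returns (rows, cols).
pub fn tiled_grid(n: usize) -> (usize, usize) {
    let (mut rows, mut cols) = (1, 1);
    while rows * cols < n {
        rows += 1;
        if rows * cols < n {
            cols += 1;
        }
    }
    (rows, cols)
}

/// Row-by-row fill: the (row, column) cell that pane `index` lands in.
pub fn tiled_cell(index: usize, cols: usize) -> (usize, usize) {
    (index / cols, index % cols)
}
```

Five panes produce a 3x2 grid with one empty cell in the last row, and seven panes produce 3x3 with two empty cells.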

Cross-platform PTY via portable-pty

Paneflow uses portable-pty = "0.8" (src-app/Cargo.toml:29). Its native_pty_system() resolves to ConPTY on Windows and openpty on Unix with no caller-side conditional compilation. The PtyBackend trait in src-app/src/pty.rs has exactly one production implementation, PortablePtyBackend, and the call site has zero #[cfg] guards. The two I/O thread loops in pty_loops.rs are pure std::io::Read/Write over the boxed handle.

Where platform-specific code does appear in the PTY layer, it's at exactly two seams:

  • Process shutdown. Unix uses libc::kill(pid, SIGTERM) with a grace window before SIGKILL. Windows uses TerminateProcess and WaitForSingleObject from windows-sys. Both branches are #[cfg]-guarded in pty_session.rs.
  • CWD detection. Linux reads /proc/<pid>/cwd. macOS calls proc_pidinfo. The Windows branch returns None for now and falls back to OSC 7 emission from the shell prompt. Three small #[cfg(target_os = "...")] blocks; nothing leaks into the rest of the codebase.

That's the entire surface area of "this code knows what OS it's running on" in the PTY layer: fewer than 100 lines. Everything else there compiles unchanged on Linux, macOS, and Windows.
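The CWD seam shows how small these blocks can stay. The sketch below covers only the Linux branch described above and collapses every other platform into a stub; cwd_of is a hypothetical name, and the macOS proc_pidinfo call is deliberately left out.

```rust
use std::path::PathBuf;

/// Linux: /proc/<pid>/cwd is a symlink to the process's working directory.
#[cfg(target_os = "linux")]
pub fn cwd_of(pid: u32) -> Option<PathBuf> {
    std::fs::read_link(format!("/proc/{pid}/cwd")).ok()
}

/// Everywhere else this sketch gives no answer, and the caller would fall
/// back to the directory reported by the shell through OSC 7.
#[cfg(not(target_os = "linux"))]
pub fn cwd_of(_pid: u32) -> Option<PathBuf> {
    None
}
```

Both branches share one signature, so call sites never need their own #[cfg] guard.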

Agent orchestration via JSON-RPC

This is the part the rest of the article was building toward. Paneflow exposes a JSON-RPC 2.0 server on a local socket so external tools, scripts, and AI agents can drive the terminal programmatically. The method surface covers workspace navigation, sending text to panes, splitting surfaces, and a six-method ai.* namespace that lets agents publish their lifecycle events back to the UI: session_start, prompt_submit, tool_use, notification, stop, session_end.

Transport: the interprocess crate's local_socket module. On Unix it's a Unix domain socket at $XDG_RUNTIME_DIR/paneflow/paneflow.sock (with $TMPDIR fallback on macOS); on Windows it's a named pipe at \\.\pipe\paneflow. Same wire protocol (newline-delimited JSON-RPC 2.0), same Rust call sites, zero #[cfg] at the dispatch level. The socket is strictly local: no network surface, no port binding. Trust derives from filesystem mode 0600 set immediately after bind(), plus getsockopt(SO_PEERCRED) on Linux and LOCAL_PEERCRED on macOS: every accepted connection checks the peer's UID against the server's before any method dispatches. A mismatch returns JSON-RPC -32001 permission denied and closes the stream.
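Newline-delimited framing means a client needs very little code. The sketch below is Unix-only and uses std's UnixStream directly rather than the interprocess crate; ping_request and call are hypothetical names, and the request is built by hand to avoid a JSON dependency.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Build a system.ping request as a single JSON line (without the trailing newline).
pub fn ping_request(id: u64) -> String {
    format!(r#"{{"jsonrpc":"2.0","method":"system.ping","id":{id}}}"#)
}

/// Send one request line and read one reply line back.
pub fn call(socket: &Path, request: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?; // the newline is the message delimiter
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```

A caller would pass the socket path given above, for example the paneflow.sock file under $XDG_RUNTIME_DIR/paneflow, and must run as the same user as the server to pass the peer-credential check.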

Architecturally, methods fall in two buckets. Stateless methods (system.ping, system.capabilities, system.identify) reply directly on the socket thread:

match method.as_str() {
    "system.ping" => json!({"jsonrpc": "2.0", "result": {"pong": true}, "id": id}),
    "system.identify" => json!({"jsonrpc": "2.0", "result": {
        "name": "Paneflow",
        "version": env!("CARGO_PKG_VERSION"),
        "protocol": "jsonrpc-2.0"
    }, "id": id}),
    _ => dispatch_to_gpui(&request_tx, method, params, id),
}

Stateful methods (workspace.*, surface.*, ai.*) need the GPUI main thread because that's where all mutable state lives. The socket thread cannot touch Entity<T> directly. So dispatch_to_gpui wraps the request in an IpcRequest with a one-shot response channel, sends it across an mpsc queue, and blocks on the reply with a 5-second timeout:

fn dispatch_to_gpui(
    request_tx: &mpsc::Sender<IpcRequest>,
    method: String,
    params: Value,
    id: Value,
) -> Value {
    let (resp_tx, resp_rx) = mpsc::channel();
    let ipc_req = IpcRequest { method, params, _id: id.clone(), response_tx: resp_tx };
    if request_tx.send(ipc_req).is_err() {
        return json!({"jsonrpc": "2.0", "error": {"code": -32000, "message": "App shutting down"}, "id": id});
    }
    match resp_rx.recv_timeout(Duration::from_secs(5)) {
        Ok(result) => promote_response(result, id),
        Err(_) => json!({"jsonrpc": "2.0", "error": {"code": -32000, "message": "Timeout"}, "id": id}),
    }
}

On the GPUI side, process_ipc_requests (in src-app/src/app/ipc_handler.rs) drains the receiver each tick, dispatches by method name, and sends the result back through the per-request response channel. Stateful handlers can return a structured JsonRpcError via a _jsonrpc_error sentinel value that promote_response rewrites into a proper JSON-RPC error envelope at the boundary, so handlers stay synchronous and don't have to construct envelopes themselves.
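The receiving half of that handoff can be sketched with std channels alone. This is a simplified stand-in, not the code in ipc_handler.rs: params and results are plain strings instead of JSON values, the submit helper is hypothetical, and a Vec stands in for the pane that receives text.

```rust
use std::sync::mpsc::{self, Receiver, Sender};

/// Simplified request: string params and a per-request reply channel.
pub struct IpcRequest {
    pub method: String,
    pub params: String,
    pub response_tx: Sender<Result<String, String>>,
}

/// Socket-thread side: enqueue a request and hand back the channel to wait on.
pub fn submit(request_tx: &Sender<IpcRequest>, method: &str, params: &str) -> Receiver<Result<String, String>> {
    let (response_tx, response_rx) = mpsc::channel();
    let _ = request_tx.send(IpcRequest { method: method.into(), params: params.into(), response_tx });
    response_rx
}

/// Main-thread side: drain everything queued since the last tick without blocking.
/// Returns the number of requests handled.
pub fn process_ipc_requests(rx: &Receiver<IpcRequest>, sent_text: &mut Vec<String>) -> usize {
    let mut handled = 0;
    while let Ok(req) = rx.try_recv() {
        let result = match req.method.as_str() {
            "surface.send_text" => {
                sent_text.push(req.params);
                Ok("ok".to_string())
            }
            other => Err(format!("Method not found: {other}")),
        };
        // The caller may have timed out and dropped its receiver; ignore that.
        let _ = req.response_tx.send(result);
        handled += 1;
    }
    handled
}
```

Using try_recv keeps the UI thread from ever blocking on the socket side; only the socket thread waits, and it does so with a timeout.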

What this buys, concretely: a Claude Code session can announce itself with ai.session_start and the sidebar marks the pane as agent-active. A shell script can push commands into the active surface with surface.send_text. A test harness can spin up a workspace, run a battery of agent calls, and tear it down. The protocol is bytes on a socket; the security model is filesystem mode + peer credentials; the dispatch is a mpsc::channel between two threads.

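Here's the smallest useful client you can write today: socat and one JSON line. It uses printf rather than echo because some shells' echo expands the \n escape, which would split the request across two lines and break the framing.

printf '%s\n' '{"jsonrpc":"2.0","method":"surface.send_text","params":{"text":"ls\n"},"id":1}' \
  | socat - "UNIX-CONNECT:$XDG_RUNTIME_DIR/paneflow/paneflow.sock"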

The full method surface is documented in the README.

Four things I got wrong

1. I over-invested in Tauri. I pushed a Tauri implementation pretty far before admitting that the webview API surface could not carry a multiplexer's constraints (low-level input, IME, inter-pane focus). The right time to pivot was about two weeks before the moment I actually pivoted. Lesson: when a framework asks you to work around its own abstraction for two basic features, you're building against the framework, not with it. Pivot.

2. Block-character rendering gaps. Paneflow had a visible gap between adjacent block characters (U+2580 to U+259F) when rendered at certain font sizes. I spent six fix attempts on it: integer cell dimensions, origin rounding, shader edge math, shared-boundary subpixel adjustments. Every fix was plausible. None of them worked. The actual bug was that my block-character coverage table was incomplete: eleven codepoints (U+2594, U+2596 to U+259F) were missing, so they fell through to the default font path, which has slight metric differences. The fix was extending the codepoint table; I found the gap by scanning the binary of a TUI that triggered the bug. Lesson: when GPU quad math looks correct on every probe but the visual artifact persists, the problem is probably which codepoints you're emitting, not how.

3. Polling timers defeat cx.notify(). An early version of the sidebar polled for port-scan results every 500 ms and CWD changes every 2 s. That produced six to eight unnecessary repaints per second when nothing was happening. The fix was a full migration to GPUI's EventEmitter/cx.subscribe/cx.emit push model, with custom events like ActivityBurst and CwdChanged emitted from the PTY reader. Idle repaints went to zero. GPUI's diff repaint is cheap, but it isn't free, and a cx.notify() in a setInterval-shaped loop is exactly the pattern the framework was designed to replace.

4. Session persistence is table-stakes. I shipped the first internal build without session save/restore because the layout code was already complex and I wanted to land the v1 commit. The first thing I did with the build was open four workspaces, rebuild the binary, and lose all four. Every multiplexer user has tmux in their muscle memory. Restart-survives-state isn't a feature, it's the price of admission. Save it to ~/.cache/paneflow/session.json on CloseWindow, restore on startup, scope it into v1.
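Going back to lesson 2, a coverage audit of the kind that would have caught the bug early is a few lines. This is a sketch, not the project's test: missing_block_codepoints is a hypothetical name and the coverage table is modeled as a plain set of codepoints.

```rust
use std::collections::BTreeSet;

/// First and last codepoints of the Unicode Block Elements range.
pub const BLOCK_FIRST: u32 = 0x2580;
pub const BLOCK_LAST: u32 = 0x259F;

/// List every Block Elements codepoint that the coverage table does not handle.
pub fn missing_block_codepoints(covered: &BTreeSet<u32>) -> Vec<u32> {
    (BLOCK_FIRST..=BLOCK_LAST)
        .filter(|cp| !covered.contains(cp))
        .collect()
}
```

Run against a table that handles U+2580 to U+2593 plus U+2595, it reports exactly the eleven codepoints named in the lesson.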

Try it

The repo is github.com/ArthurDEV44/paneflow, MIT licensed. Linux (Wayland and X11) and macOS (Apple Silicon) ship today as signed and notarized builds. Native Windows is in flight: the code is largely ready and the signing infrastructure is being wired up. Prebuilt artifacts are on the download page, and honest side-by-side comparisons vs cmux, WezTerm, iTerm2, and Warp are at paneflow.dev/compare.

Issues, suggestions, and "your N-ary tree should really be a Z-tree" arguments are all welcome on the tracker. I'm especially curious about what's missing for your workflow versus the multiplexer you use today; that's the feedback that shapes the next release.

If this was useful: the engineering work I'm most proud of is in the terminal/element/ module, where glyph shaping, atlas blits, and the APCA adjustment pipeline all live. It didn't fit in this post, so I'll write that one up next.
