Orhan Engin Okay
Why doesn't Windows have its own iTerm2? I'm building D-Terminal.

Bringing the third-party terminal ecosystem Linux and macOS take for granted to Windows — multi-tab, multi-pane, themes, profiles, and an agent-aware AI layer on top. Tauri v2 + Vue 3 + Rust

If you write code on macOS, you have iTerm2, Warp, Alacritty, Kitty, WezTerm, Hyper. On Linux you have all of those, plus Tilix, Terminator, Foot, and the entire kitty-and-friends crowd. On Windows you have... Windows Terminal. Which is excellent — and basically alone in its category.

I noticed the gap from the inside, switching between machines. Every time I came back to Windows I missed the choice. Not the absence of any one feature — the absence of another option that was actually trying.

So I started building one.

D-Terminal is a third-party Windows terminal in Rust + Vue 3 on Tauri v2. v0.9.3 just shipped. The spine of the project is the same thing every other terminal in this category gets right — multi-tab, multi-pane, profile system, themes, real multitasking — and the one opinion I added on top is that it tries to be agent-aware, because half my workday now goes through Claude Code, Codex, and Aider.

This post is the architectural tour: what's in the box, what "agent-aware" means, and why I'm shipping ~5MB binaries instead of another 200MB Electron app.

What's in the box
The table-stakes layer first, because it's the actual reason the project exists. Windows users don't always realize how much of this Linux/macOS users get by default:

  1. Splits and tabs, per-tab independent split tree. Drag a pane title onto another pane's edge to rearrange. tmux-style zoom (Ctrl+Shift+Z) for full-pane focus. Inline rename with double-click. # group tags for color-coded pane grouping.
  2. Profile system: PowerShell / CMD / WSL out of the box, plus user-defined profiles for SSH hosts, Docker exec, pwsh 7, Python REPL — anything with shell + args + cwd + env + icon + color badge.
  3. 14 built-in themes with runtime swap, plus JSON for custom themes. Mica / Acrylic / None vibrancy switch on Win11 22H2+.
  4. xterm.js engine — WebGL renderer, Canvas/DOM fallback, OSC 7/8/133, sixel and iTerm2 inline image protocols, Unicode 11, smart links (file paths, git SHAs, IPs).
  5. Block-based command history (OSC 133): every command + output + exit code captured automatically, color-coded status, re-runnable, copy-able, sendable to AI in one click.
  6. Output triggers (iTerm2 parity): regex match → auto-action (toast, AI hand-off, snippet execution).
  7. Broadcast input across all panes (tmux sync-panes), command palette (Ctrl+Shift+P), Quake-mode hotkey (F1), bracketed-paste mode for safe multi-line pastes.
  8. Live system overlay (DFetch): CPU, RAM, disk, GPU, IP, with KVKK/GDPR-friendly hostname/IP masking by default. Snapshot + broadcast across panes.
  9. Per-pane git diff +/- chip in the title (OSC 7 cwd + git shortstat).

Most of the year-one work was making sure these felt right. Ten panes is a normal day for me; broadcast input is one keystroke; layout restores on relaunch. The AI layer earns its place because the foundation is solid.
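To make the profile shape concrete, a user-defined SSH profile might look like this as JSON (the field names here are illustrative, not D-Terminal's actual schema):

```json
{
  "name": "prod-ssh",
  "shell": "ssh",
  "args": ["deploy@prod.example.com"],
  "cwd": "~",
  "env": { "TERM": "xterm-256color" },
  "icon": "server",
  "colorBadge": "#d33682"
}
```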

What "agent-aware" actually means

When I started running Claude Code, Codex, and Aider regularly, I noticed three friction points my terminal was making worse, not better:

  • I couldn't tell at a glance which pane needed me. A waiting tool-approval prompt looks identical to a finished run. I'd alt-tab away, come back, and find an agent had been blocked for ten minutes.

  • Cost was invisible until it wasn't. Token spend was happening across 4–5 panes in parallel. By the time I checked a dashboard, I'd already burned the budget.

  • Output was undifferentiated. Agent stdout, my own commands, build logs — all the same scrollback. Hard to revisit a tool call from twenty minutes ago.

So D-Terminal maintains a per-pane state machine about the agent, not just about the PTY:

running — agent is producing output
waiting — agent has paused, typically for a tool-approval prompt
interrupted — user broke the loop

Each pane gets a small ambient badge in its title — current state, live token count, running cost. No dashboard, no polling, no leaving the terminal to check.
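A minimal sketch of that per-pane state machine in Rust, using the lifecycle event names introduced later in the post; the `PaneAgent` struct and badge formatting are illustrative, not D-Terminal's actual types:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AgentState {
    Running,     // agent is producing output
    Waiting,     // paused, typically on a tool-approval prompt
    Interrupted, // user broke the loop
}

// Illustrative pane-side bookkeeping, not the real implementation.
struct PaneAgent {
    state: AgentState,
    tokens: u64,
    cost_usd: f64,
}

impl PaneAgent {
    fn new() -> Self {
        PaneAgent { state: AgentState::Running, tokens: 0, cost_usd: 0.0 }
    }

    // Fold an incoming lifecycle event into the state machine.
    fn on_event(&mut self, event: &str) {
        match event {
            "agent.wait" => self.state = AgentState::Waiting,
            "agent.start" | "agent.tool.request" | "agent.tool.complete" => {
                self.state = AgentState::Running
            }
            _ => {}
        }
    }

    // User broke the loop (e.g. Ctrl+C in the pane).
    fn interrupt(&mut self) {
        self.state = AgentState::Interrupted;
    }

    // Render the ambient badge shown in the pane title.
    fn badge(&self) -> String {
        let label = match self.state {
            AgentState::Running => "running",
            AgentState::Waiting => "waiting",
            AgentState::Interrupted => "interrupted",
        };
        format!("{} · {:.1}K tokens · ${:.2}", label, self.tokens as f64 / 1000.0, self.cost_usd)
    }
}

fn main() {
    let mut pane = PaneAgent::new();
    pane.tokens = 14_200;
    pane.cost_usd = 0.21;
    println!("{}", pane.badge()); // running · 14.2K tokens · $0.21
    pane.on_event("agent.wait");
    println!("{:?}", pane.state); // Waiting
}
```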

Three layers under the hood:

1. Detection
The terminal heuristically detects when something agent-shaped is running in a pane: process info, command-line patterns, and characteristic output signatures from Claude Code, Codex, Aider, and Cursor. When detection fires, the pane flips into agent mode.
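As a sketch, the first tier of that detection can be as simple as substring checks against the pane's command line (the pattern list here is illustrative, not D-Terminal's actual matcher):

```rust
// Tier-1 heuristic: decide whether a pane's process looks agent-shaped
// from its command line. Returns a display name on a match.
fn looks_like_agent(cmdline: &str) -> Option<&'static str> {
    let lowered = cmdline.to_lowercase();
    // Hypothetical pattern table; the real detector also uses process
    // info and characteristic output signatures.
    const AGENTS: &[(&str, &str)] = &[
        ("claude", "Claude Code"),
        ("codex", "Codex"),
        ("aider", "Aider"),
        ("cursor", "Cursor"),
    ];
    AGENTS
        .iter()
        .find(|(needle, _)| lowered.contains(needle))
        .map(|(_, name)| *name)
}

fn main() {
    println!("{:?}", looks_like_agent("claude --continue")); // Some("Claude Code")
    println!("{:?}", looks_like_agent("cargo build"));       // None
}
```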

2. OSC 9999
OSC sequences (Operating System Commands) are how terminals have always exchanged structured metadata with the programs running inside them. OSC 7 reports the working directory. OSC 8 makes hyperlinks clickable. OSC 133 brackets command + output + exit code so a terminal can do block-based history.
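For reference, the OSC 133 markers a shell integration emits look like this (FinalTerm semantics, as adopted by iTerm2 and others):

```shell
# OSC 133 prompt markers let a block-aware terminal delimit
# command + output + exit code.
printf '\033]133;A\033\\'          # A: prompt start
printf '\033]133;B\033\\'          # B: command start (user input begins)
printf '\033]133;C\033\\'          # C: command executed, output follows
printf '\033]133;D;%s\033\\' "$?"  # D: command finished, with exit code
```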

OSC 9999 is the protocol I added on top — a vendored channel where agent state, tool calls, completed steps, and pending approvals can be carried out-of-band, invisible to the user but fully readable to the terminal:

```
\x1b]9999;<event>;<key>=<value>;...\x1b\\
```

Events include lifecycle markers (agent.start, agent.tool.request, agent.tool.complete, agent.wait, agent.end) and a token-accounting payload. The terminal parses; the user never sees the bytes.

For agents that don't emit OSC 9999 natively — currently all of them — D-Terminal falls back to heuristic parsing of stdout. The heuristic detector is the tier-2 path; OSC 9999 is the door I'm leaving open for agents to opt in.
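A sketch of what parsing that sequence looks like on the terminal side, assuming the `\x1b]9999;...\x1b\\` framing shown above with semicolon-separated `key=value` pairs:

```rust
// Parse one OSC 9999 payload of the form \x1b]9999;<event>;<key>=<value>;...\x1b\\
// Returns the event name plus key/value pairs. A sketch of the protocol as
// described in the post, not the terminal's actual OSC handler.
fn parse_osc_9999(seq: &str) -> Option<(String, Vec<(String, String)>)> {
    // Strip the OSC introducer and the ST terminator.
    let body = seq.strip_prefix("\x1b]9999;")?.strip_suffix("\x1b\\")?;
    let mut parts = body.split(';');
    let event = parts.next()?.to_string();
    let kv = parts
        .filter_map(|p| {
            let (k, v) = p.split_once('=')?;
            Some((k.to_string(), v.to_string()))
        })
        .collect();
    Some((event, kv))
}

fn main() {
    let seq = "\x1b]9999;agent.tool.request;tool=bash;tokens=1421\x1b\\";
    if let Some((event, kv)) = parse_osc_9999(seq) {
        println!("{event}: {kv:?}"); // agent.tool.request: [("tool", "bash"), ("tokens", "1421")]
    }
}
```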

3. UI, including auto-split
Once a pane has agent state, the rest is presentation: a colored chip in the pane title, a live cost badge in the status bar, a soft pulse when a pane goes into waiting. Parallel batches — multiple agents launched at once — get auto-split so each lane lives in its own visual track.

That last one is small but it's the feature I missed most before I had it. Watching four agents share one scrollback feed is the developer-tools equivalent of trying to read four overlapping TTY streams.

Why a new app, not a Windows Terminal extension

Two answers.

The ecosystem one. Windows Terminal is good, but it's also alone in its category. Linux and macOS users have a competitive market of terminals each with their own opinion — that competition produces ideas. Windows hasn't had that pressure. Adding a second actively-developed third-party option is itself the point. Even if D-Terminal stayed strictly at parity with Windows Terminal feature-for-feature, having two is healthier than having one.

The technical one. Windows Terminal's extension model is JSON profile + a few hooks. There's no path to render arbitrary chips and badges in the pane title, maintain per-pane state machines outside of the PTY, inject a popover for "send this output block to AI," or run an HTTP proxy for AI providers inside the host process. An agent-aware terminal isn't a config-level concern — it's a frame around the terminal, and the frame has to be its own application.

The architecture, briefly

I didn't want to ship a 200MB Electron app for a terminal. So:

Tauri v2 as the host. Rust core, system WebView2 for the UI, no bundled Chromium. Result: ~5MB binary, ~100MB RAM at rest. Roughly 5× lighter than the Electron equivalent.

node-pty sidecar for PTY work, because writing a ConPTY bridge in Rust well is its own quarter. The sidecar is a pkg-bundled standalone executable — users don't need Node.js installed. IPC between Tauri and the sidecar is length-prefixed binary, with a heartbeat: if the Tauri process dies, the sidecar exits, no zombies.
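The length-prefixed framing is the classic length-then-payload scheme; here is a sketch (the 4-byte big-endian prefix is an assumption for illustration, not the actual wire format):

```rust
use std::io::{self, Read, Write};

// Length-prefixed binary framing of the kind used between a host process
// and a PTY sidecar: a 4-byte big-endian length, then the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> io::Result<()> {
    // Round-trip through an in-memory buffer standing in for the pipe.
    let mut wire = Vec::new();
    write_frame(&mut wire, b"pty:spawn powershell")?;
    let mut cursor = io::Cursor::new(wire);
    let frame = read_frame(&mut cursor)?;
    assert_eq!(frame, b"pty:spawn powershell".to_vec());
    Ok(())
}
```

Framing matters because the pipe is a byte stream: without an explicit length, the reader can't tell where one message ends and the next begins.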

Vue 3 + xterm.js for the UI layer. WebGL renderer with a Canvas/DOM fallback chain. xterm.js handles the standard OSC sequences out of the box; OSC 9999 is custom parsing on top.

Windows DPAPI for AI provider keys. The frontend never holds a plaintext key. When you call Anthropic or OpenAI, the request goes Frontend → Rust → DPAPI unwrap → outbound HTTP. XSS in the WebView2 layer doesn't leak credentials, because the WebView2 layer never had them.

ARM64 + x64 dual-architecture builds. Surface Pro X and Snapdragon laptops are real machines real people use, and shipping x64-only on Windows in 2026 feels like an anachronism.

That's the load-bearing 20%. The remaining 80% is what was already in "What's in the box," plus the AI-side niceties — 4 cloud providers (Anthropic, OpenAI, Gemini, plus a custom OpenAI-compatible endpoint), 5 local runtimes (Ollama, LM Studio, Jan, Text Generation WebUI, llama.cpp server), and a natural-language command generator (# on an empty prompt → the AI writes the shell command for you).

What's still rough

Pre-1.0 for reasons:

  • Code signing isn't done. SignPath FOSS application is in flight. Until it lands, Windows SmartScreen warns "Unknown publisher" on the NSIS installer, and Microsoft Defender ML occasionally false-flags the unsigned x64 NSIS as Wacatac.B!ml. The MSI installers come up clean (verified against Kaspersky + VirusTotal sandbox, all major engines pass). Workaround: prefer the MSI for now.

  • unsafe-eval is still in CSP because of vue-i18n 9. The v1.0.5 milestone migrates to vue-i18n 11 and drops it.

  • Agent OSC 9999 is one-directional today — the terminal observes, but doesn't yet broadcast back to the agent. The bidirectional handshake is on the v1.1 path.

  • Microsoft Store submission is the v1.0 line in the sand. MSIX packaging and identity migration are mid-flight.

Try it, contribute, what I'm working on

D-Terminal v0.9.3 ships for Windows 10 1809+ and Windows 11, ARM64 and x64, MSI and NSIS. MIT, single-maintainer.

→ github.com/AmrasElessar/d-terminal

Two lanes are open for community contribution: language packs (30+ stub locale files waiting for translators) and themes.

Architecture and feature PRs are intentionally closed for now — I'm holding the spine of the project tightly through v1.0.

If "Windows deserves a real third-party terminal market" is a thing you also believe, the issues tab is the right place to push back, file bugs, or suggest themes. And if you build agents and want to emit OSC 9999 natively so terminals like this can read you cleanly without heuristics — let's talk. The protocol is small enough to fit in a gist, and it could drop into any framework that already speaks ANSI.

The chip in pane 2 currently reads running · 14.2K tokens · $0.21, so I should probably go see what Claude Code is up to.
