DEV Community: Karl Wirth

Codex vs Claude Code (2026): A Real Head-to-Head

Karl Wirth — Sun, 26 Jul 2026 21:00:00 +0000

The "Codex vs Claude Code" debate usually gets framed as a model fight: Anthropic's Claude Opus 4.8 and Sonnet 4.6 against OpenAI's codex-tuned gpt-5.5. On most coding benchmarks the two sit within a few points of each other, and the lead shifts every few weeks. So the model question is mostly settled, or at least unsettled in a way no blog post can resolve.

What actually separates them in 2026 is working style and the harness around the model: how you start a session, how you review diffs, how you run several agents at once, how you plan work, and how you keep multi-day context. This guide does both halves. First a straight head-to-head on the two agents, then the harness comparison that decides how much you get out of either one.

Codex vs Claude Code at a glance

Dimension	OpenAI Codex	Claude Code
Default model (2026)	gpt-5.5, codex-tuned	Claude Opus 4.8 and Sonnet 4.6
Where it runs	CLI, IDE extension, desktop app, cloud, ChatGPT mobile follow-up	CLI, Claude desktop app, VS Code extension, Remote Control
Working style	Supervised passes with explicit approval modes	Long autonomous repo runs with subagents
Diff review	Inline in the CLI; richer in the app	Inline in the CLI; richer in the desktop app
Mobile	Follow up on runs from the ChatGPT app	Remote Control mirror; no native review app
Platform	macOS and Windows app; CLI on Linux	macOS, Windows, and Linux
Openness	CLI open source (Apache 2.0); app and cloud closed	Closed source
Pricing shape	Bundled with ChatGPT plans; usage-based	Anthropic subscription or API credits; usage-based
Context file	AGENTS.md	CLAUDE.md

The short version: Codex spreads across more surfaces and makes the supervised, approval-gated path explicit, which suits careful work where you want to see each step. Claude Code feels most natural for long, low-interruption runs inside a repo, and it runs on Linux as a first-class target. The benchmarks are close enough that this difference in rhythm matters more than the raw model.

Which should you pick

Pick Codex if you want explicit approval modes and supervised passes, you already live in the ChatGPT and OpenAI ecosystem, and you like being able to follow a run from your phone.

Pick Claude Code if you want long autonomous terminal sessions, you are standardized on Claude, you lean on first-party skills, hooks, and subagents, and you want official Linux support.

Run both if your choice is really task-shaped rather than fixed. One job wants Codex for a careful, reviewed pass; the next wants Claude Code for a larger delegated run. That is common, and it is the case the rest of this guide is about, because once you want both, the harness matters more than either agent on its own.

What "harness" actually means

A harness is the tooling between you and the model. It includes:

Session entry point. CLI, native app, IDE extension, web console, mobile.
Diff review. Inline terminal text, file-by-file visual diffs, or PR-style review.
Session management. One window at a time, tabs, kanban, or full orchestration.
Planning layer. None, a markdown file, or a structured plan document the agent reads from.
Memory and context files. AGENTS.md for Codex, CLAUDE.md for Claude Code, plus any repo conventions the harness loads automatically.
Parallelism. Running one task end to end or fanning out work to several agents.

Same model, different harness, different output velocity. That gap is the whole bet.

The Codex harness options

1. Codex CLI

The default. The CLI is open source under Apache 2.0, sandboxes commands by default, supports approval modes, and reads AGENTS.md for context. It is fast, scriptable, and the most direct way to talk to Codex.

What you get:

A single session in a single terminal window
Configurable sandbox and approval policy
Solid GitHub integration through gh
AGENTS.md for project context

What you do not get: visual diff review, parallel session management, a planning surface, or anything to look at while the agent is running.

2. Official Codex Desktop App

OpenAI's native desktop app is no longer a thin CLI wrapper. It runs multiple Codex agents in parallel, organizes work by projects and threads, includes worktree support, and lets you review diffs and comment on changes. It is available on macOS and Windows, not Linux.

The strength: it is the cleanest first-party experience and inherits everything from the CLI, including AGENTS.md and approval policies.

The trade-off: it is Codex only, with no support for Claude Code or other engines. If you ever want to run a parallel agent on a different model, you leave the app.

3. Codex in ChatGPT and the cloud

Codex inside ChatGPT lets you delegate tasks to cloud sandboxes, then review the resulting PRs in GitHub. Useful for fire-and-forget scaffolding, less useful for interactive work where you want to steer the agent.

4. Third-party Codex GUIs

A small set of third-party tools wrap Codex in a richer workspace. The two worth knowing:

CodexMonitor (open source, MIT, Tauri): multi-workspace and multi-thread Codex management with worktrees and built-in diff stats.
Nimbalyst (open source, MIT-licensed desktop app): a visual workspace for Codex with parallel sessions, file-by-file visual diff review, markdown and mockup editors, planning documents, and an iOS companion. Also runs Claude Code as a peer engine.

The Claude Code harness options

1. Claude Code CLI

Anthropic's official terminal binary. It reads CLAUDE.md, supports --allowedTools to pre-approve specific tools, can spawn subagents, and picks sessions back up with claude -c (continue the most recent conversation) and claude -r (resume a chosen one). Like the Codex CLI, it is fast and direct, with no visual layer.

2. Claude Code in Claude Desktop and VS Code

Anthropic ships Claude Code inside the Claude desktop app and as a VS Code extension. Both are first-party and well integrated, and both are still single-session in shape. The same desktop app also hosts Claude Cowork, a separate agent aimed at non-technical knowledge work rather than coding, so do not confuse the two when you are choosing a coding harness.

3. Third-party Claude Code GUIs

The third-party Claude Code ecosystem is broader than Codex's:

Opcode (formerly Claudia): desktop GUI with checkpoints and timeline.
Claude Squad: tmux-plus-worktrees terminal multiplexer.
Nimbalyst: visual workspace with kanban sessions, optional one-click worktrees per session, visual diff review across markdown and code, planning documents, and the same Codex-side support as above.

Harness-vs-harness, head to head

This is the comparison that decides day-to-day velocity, because it is where the work actually lives.

Capability	Codex CLI	Codex App	Claude Code CLI	Claude Code Desktop	Nimbalyst
Engines	Codex	Codex	Claude Code	Claude Code	Codex + Claude Code
Session entry	Terminal	Native chat	Terminal	Native chat	Visual workspace
Parallel sessions	Manual (tmux)	Built in	Manual (tmux)	Limited	Kanban with 6+
Visual diff review	No	Yes (chat-style)	No	Yes (chat-style)	File-by-file inline
Markdown WYSIWYG	No	No	No	No	Yes
Mockups, diagrams, data models	No	No	No	No	Yes
Planning documents	None	None	None	None	Built in
Git worktree per session	Manual	Built in	Manual	Manual	Optional one-click
Mobile app	None	ChatGPT follow-up	None	Remote Control mirror	iOS companion
Linux support	Yes	No	Yes	Yes	Yes
Open source	Apache 2.0	No	No	No	MIT

The pattern: the official harnesses are each built around a single engine, and their parallel-session support ranges from manual tmux setups to the Codex app's built-in threads. The visual-workspace harnesses are good at multi-engine, multi-session work with structured review.

When each harness wins

The Codex CLI wins when you want speed and scriptability. Quick scaffolds, CI-driven runs, and headless automation belong in the CLI. The harness is "as little as possible," which is the right call for those jobs.

The official Codex desktop app wins if you live entirely in OpenAI's ecosystem, you are on macOS or Windows, and Codex is the only engine you need. It is the cleanest first-party experience, and it includes worktrees and projects out of the box.

Claude Code Desktop and the VS Code extension win when you want the official Anthropic experience for Claude Code specifically, with first-party skills, hooks, and subagents, plus Linux support on the official tool.

A visual workspace harness wins when:

You want to run Codex and Claude Code in the same project
You want parallel sessions on a kanban with status visible at a glance
You want to review every change inline, file by file, before it lands
You want planning documents the agent reads from and writes back to
You want a mobile app that is more than a remote-desktop mirror
You are on Linux, where the official Codex app does not run

That slot is what Nimbalyst is built for. Same Codex engine, same Claude Code engine, more workflow around them, with the desktop and iOS apps MIT licensed and free for individual use.

The honest verdict

If your work is mostly single-session and your model preference is fixed, the official harnesses are excellent. They are well built, they are first-party, and they keep getting better.

If your work is parallel, multi-engine, or visual, the harness gap is where the productivity sits. Picking Codex over Claude Code or the reverse matters less than picking a harness that lets you run several agents at once, review their work without scrolling terminal output, and plan the next move in the same place you executed the last one.

The Codex vs Claude Code question is real. The harness question is bigger. If you want to stop choosing and run both, Nimbalyst is free to download.

FAQ

Is Codex better than Claude Code?

Neither wins outright in 2026. On most coding benchmarks Codex (gpt-5.5) and Claude Code (Opus 4.8, Sonnet 4.6) land within a few points, and the lead shifts every few weeks. The larger difference is working style: Codex makes supervised, approval-gated passes explicit, while Claude Code feels more natural for long autonomous runs. Many developers keep both and pick per task.

Codex vs Claude Code: which is cheaper?

Both are usage-based rather than flat per-seat, so cost tracks how much you run. Codex comes bundled with ChatGPT plans, and Claude Code bills through an Anthropic subscription or API credits with model access varying by plan. Confirm current pricing on each vendor's site, since both change often.

Can I use Codex and Claude Code together?

Yes. They keep separate session contexts, so the practical way to combine them is one workspace where both read the same project files, plans, and trackers. A visual workspace like Nimbalyst runs Claude Code and Codex side by side, each with its own transcript and git worktree.

Codex vs Claude Code for long autonomous runs?

Claude Code still feels more native for sustained, low-interruption repo work, especially for teams that lean on subagents and longer delegated runs. Codex can run long tasks too, particularly through its app and cloud flows, but its approval modes nudge toward a more supervised rhythm.

Does Codex have a mobile advantage over Claude Code?

Codex work can be followed from the ChatGPT mobile app. Claude Code has Remote Control to mirror a session, but no equivalent native review surface. If mobile review matters, a workspace with a dedicated mobile app covers both agents.

Codex vs Claude Code on Linux?

Codex CLI runs on Linux, but OpenAI's Codex desktop app ships for macOS and Windows only. Claude Code runs on macOS, Windows, and Linux. On Linux you either stay in the Codex CLI or use a cross-platform workspace for a visual layer.

Codex vs Claude Code: which should I pick in 2026?

Pick Codex if you want explicit approval modes and supervised passes across the CLI, IDE, cloud, and app. Pick Claude Code if you want long autonomous terminal runs and you are standardized on Claude. When the choice is genuinely task-dependent, run both in one workspace instead of committing to a single harness.

Why we put Obsidian, Linear, Terminal, Codex app, and Conductor in one workspace

Karl Wirth — Sun, 26 Jul 2026 19:00:00 +0000

For the last year, my daily stack for working with coding agents has looked something like this. Plans and specs in Obsidian. Diagrams in Excalidraw. Tasks and bugs in Linear. Agent sessions in the Codex app and Claude Cowork app, Conductor for parallel Claude Code, and a terminal when I want raw control. Code in VS Code. Diff review in the terminal or on GitHub.

The integration tax

When I start a session, I want to point the agent at the plan I wrote yesterday, the mockup I sketched, and the task it's executing against. Today that means copying file paths, pasting context, and hoping the agent picks up the right thing. The agent can reach files via MCP, but it doesn't know that this plan in Obsidian is connected to that task in Linear, or that the session running in Conductor is the one that needs the mockup attached.

When the agent finishes, I want to review its changes, mark the task as done, and update the plan with what we learned. Today that spans four places, with a context switch for each: the agent transcript lives in one, the diff in another, the task in a third, the plan in a fourth.

The result is that I spend more time moving artifacts and state between apps than I do actually directing the work. The agent runs with a fraction of the context, because most of what would help it lives outside its reach.

What integration actually buys you

Putting the planning doc, the diagram, the task, the session, and the diff in one workspace changes the shape of the work in a few specific ways.

Sessions, files, and tasks become linked. Open a session and you can see which files it touched, which tasks it ran against, and which plan it executed. Open a file and you can see which sessions changed it. Open a task and you can see the session that did the work.

Context flows to the agent automatically. The agent already has access to the same files, plans, diagrams, and tasks you do, because everything lives in the same workspace. You don't paste a path. You don't repeat yourself across tools. The integration is the context.

Review happens where the work is. When an agent changes a file, red and green diffs show up inline with per-block accept and reject. The surface where you read and write is also the surface where you review, so there is no app switch from editing to reviewing.

Project knowledge stays connected. A month later, when you're trying to remember why something was built a certain way, you can follow plan to task to session to diff to commit, with each piece pointing to the others, instead of reconstructing it from memory.
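To make the linking idea concrete, here is a minimal sketch of a two-way index between sessions, files, and tasks. It is not Nimbalyst's actual data model; the class name, method names, and IDs are all hypothetical and exist only to show why storing each link in both directions lets you start from a file, a task, or a session.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WorkspaceIndex:
    """Hypothetical in-memory index linking sessions, files, and tasks."""

    files_by_session: dict = field(default_factory=lambda: defaultdict(set))
    sessions_by_file: dict = field(default_factory=lambda: defaultdict(set))
    task_by_session: dict = field(default_factory=dict)
    sessions_by_task: dict = field(default_factory=lambda: defaultdict(set))

    def record_edit(self, session_id: str, path: str) -> None:
        # Store the link in both directions so either side can be the entry point.
        self.files_by_session[session_id].add(path)
        self.sessions_by_file[path].add(session_id)

    def assign_task(self, session_id: str, task_id: str) -> None:
        self.task_by_session[session_id] = task_id
        self.sessions_by_task[task_id].add(session_id)

    def history_for_file(self, path: str) -> list:
        """Return (session, task) pairs for every session that changed a file."""
        sessions = self.sessions_by_file.get(path, set())
        return sorted((s, self.task_by_session.get(s)) for s in sessions)


index = WorkspaceIndex()
index.assign_task("session-a", "TASK-12")  # made-up IDs for illustration
index.record_edit("session-a", "src/profile.ts")
index.record_edit("session-b", "src/profile.ts")
print(index.history_for_file("src/profile.ts"))
```

A session with no task shows up with None in the second slot, which is roughly the "why was this built?" gap the paragraph above describes.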

A few concrete examples

A spec lives next to its mockup, its data model, and its task. When you hand the task to an agent, the agent already has the spec, the mockup, and the model as context because they're in the same workspace.

A bug report is linked to the session that's fixing it. When the agent proposes a fix, the diff shows up inline. You accept the parts you like, ask for changes on the rest, and the bug moves to review without leaving the app.

A planning conversation from three weeks ago is searchable. You find the session, see the files it touched, open the resulting plan, and pick up where you left off. Nothing was lost to a closed terminal tab.

Three sessions run in parallel on a kanban board. You see what each one is doing, what's waiting on review, and what's blocked. The work that used to require tabbing through terminals fits on one screen.

What Nimbalyst is

Nimbalyst is the workspace we built around this idea. Visual editors for markdown, mockups, diagrams, data models, spreadsheets, and code. A session manager with kanban, parallel sessions, search, and resume. A task tracker where agents and humans both create, update, and execute tasks linked to files and sessions. Diff review with per-block accept and reject. Developer tools like a terminal, worktrees, visual git, and agent-driven commits and branches. An iOS app for managing sessions away from the desk.

It supports Codex, Claude Code, OpenCode (alpha), and Copilot (alpha) through a pluggable provider system, so you're not locked into one agent. The desktop and iOS apps are MIT licensed.

If you're running multiple agents across multiple projects today, you've probably felt the same friction we did. We'd love your feedback and contributions.

Repo: github.com/nimbalyst/nimbalyst
Website: nimbalyst.com

How to Use OpenAI Codex with a Visual Workspace

Karl Wirth — Sun, 26 Jul 2026 17:00:00 +0000

OpenAI's Codex CLI is fast and powerful out of the box, but the terminal is a thin surface for real work. Once you are reviewing more than a handful of diffs a day, planning multi-step features, or running more than one Codex session at a time, you start to want a visual workspace around the agent.

This guide walks through how to use OpenAI Codex with a visual workspace from setup to a full review-and-ship loop. The walkthrough uses Nimbalyst, an open-source workspace that runs Codex as a first-class engine, but most of the patterns translate to any Codex GUI.

What you will be able to do at the end

Run Codex sessions inside a visual editor instead of a terminal
Edit AGENTS.md in WYSIWYG markdown with inline AI diffs
Review every Codex change file by file, accept or reject inline
Run several Codex sessions in parallel, each on its own branch
Plan features in markdown documents the agent reads from
Monitor sessions from your phone while you are away from the laptop

Step 1. Install the Codex CLI

The visual workspace runs the actual Codex CLI under the hood, so install it first.

npm install -g @openai/codex
codex --version

Authenticate with your OpenAI API key or sign in with your ChatGPT plan.

codex login

Verify Codex works in a small repo before adding the visual layer:

cd ~/some-test-repo
codex

If you get a working session, the engine is fine. Now you can wrap it.

Step 2. Install the visual workspace

Download Nimbalyst for macOS, Windows, or Linux from the download page, or build from source on GitHub. The desktop app is MIT licensed and free for individual use.

After installing, open Nimbalyst and add your project folder. The workspace detects your AGENTS.md, your .git directory, and any existing Codex configuration automatically.

Step 3. Write a real AGENTS.md

The biggest single quality jump for Codex sessions comes from a well-written AGENTS.md. This is the file Codex reads at the start of every session. Treat it like onboarding documentation for a new engineer.

In the workspace, open AGENTS.md in the WYSIWYG markdown editor. Write something like:

# AGENTS.md

## Project context
This is a TypeScript Astro site deployed to Cloudflare Pages.
Content lives in src/content/, components in src/components/.

## Conventions
- Tailwind utility classes only, no inline style attributes
- All copy lives in src/data/*.yaml, not in component files
- Run `npm run build` before declaring a task done

## Commands
- `npm run dev` for local preview (do not run, the user does that)
- `npm run build` for production build with Pagefind indexing

## Out of scope
- Do not modify .env or wrangler.toml
- Do not add client-side JS frameworks

In a visual editor you can ask Codex itself to draft or refine AGENTS.md. Every change is shown as an inline red and green diff before it lands, so you keep editorial control without leaving the document.
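If you want a cheap guard against an AGENTS.md that drifts or loses a section, a small check like the one below can run before a commit. This is a sketch under assumptions: Codex does not require any particular headings, the section names are simply the ones from the sample file above, and missing_sections is a hypothetical helper, not part of any tool.

```python
import re

# Section names taken from the sample AGENTS.md above; adjust to your own file.
EXPECTED_SECTIONS = ["Project context", "Conventions", "Commands", "Out of scope"]

# Matches level-2 markdown headings only ("## Title"), one per line.
H2_HEADING = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)


def missing_sections(agents_md: str, expected=EXPECTED_SECTIONS) -> list[str]:
    """Return the expected "## " headings that the file does not contain."""
    found = {m.group(1).strip().lower() for m in H2_HEADING.finditer(agents_md)}
    return [name for name in expected if name.lower() not in found]


draft = """\
# AGENTS.md

## Project context
This is a TypeScript Astro site deployed to Cloudflare Pages.

## Commands
- `npm run build` for production build with Pagefind indexing
"""
print(missing_sections(draft))
```

The comparison is case-insensitive on purpose, so a heading edited to "## Out Of Scope" still counts.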

Step 4. Start a Codex session in the workspace

In Nimbalyst, open a new session and pick Codex as the engine. Type a prompt the way you would in the terminal:

Add a `Last updated` line under each blog post title using the post's
`updated` frontmatter field if present, falling back to `date`.

The session opens with a chat panel on one side and your file tree on the other. As Codex works, files it touches appear in a per-session "files edited" sidebar. No more searching git status to figure out what changed.

Step 5. Review every diff visually

This is the part that makes the visual workspace pay for itself.

When Codex finishes (or pauses to ask a question), open the diff viewer. Every file Codex touched appears as a red-and-green inline diff. You can:

Accept a file's changes whole
Reject changes you do not like
Edit a change inline and re-run the agent against the corrected file

For markdown changes, the diff is shown in the rendered document, not the raw markdown. For mockups or diagrams, the diff is visual: you see the actual UI or graph change, not JSON. For data models, you see schema changes drawn out, not raw Prisma.

This is the workflow Codex's terminal output does not give you. It is the difference between "trust and merge" and "review and ship."

Step 6. Run a second Codex session in parallel

Open a new session. Pick Codex again. Optionally enable "git worktree" to run this session on its own checkout in a sibling directory, isolated from the first.

Now you have two Codex agents running in the same project, each on its own branch, each visible on the kanban board with status, current file, and last action. Repeat for as many sessions as you want.

A common pattern:

Session A: large refactor on a feature branch
Session B: small bug fix on a hotfix branch
Session C: scaffolding a new component on its own branch
Session D: writing tests for the merged refactor

The kanban shows you which sessions need input, which finished, and which crashed.
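For readers who want to see what a per-session worktree amounts to in plain git, the sketch below builds the git worktree add command for each session without running it. How Nimbalyst implements its "git worktree" option is not documented here, so treat this as a rough equivalent; the repository path, the session names, the branch names, and the sibling-directory naming scheme are all assumptions.

```python
from pathlib import Path


def worktree_command(repo: Path, session: str, branch: str) -> list[str]:
    """Build the `git worktree add` call for one session (nothing is executed)."""
    # Sibling directory next to the main checkout, e.g. ../site-session-b
    checkout = repo.parent / f"{repo.name}-{session}"
    # `-b` creates the new branch and checks it out in the new worktree.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(checkout)]


# Hypothetical sessions mirroring the pattern above.
sessions = {
    "session-a": "feature/refactor",
    "session-b": "hotfix/bug-fix",
    "session-c": "feature/new-component",
}
for name, branch in sessions.items():
    print(" ".join(worktree_command(Path("/work/site"), name, branch)))
```

Each printed line can be pasted into a shell as-is. Because every session gets its own directory and branch, two agents never write to the same working tree, which is the isolation the step above relies on.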

Step 7. Plan features in markdown the agent reads from

For anything bigger than a one-shot prompt, write a plan document. In the workspace, create a file such as plans/user-profile.md and structure it:

# Feature: User profile page

## Goal
Add a profile page at /u/[username] showing the user's avatar, bio, and last 10 sessions.

## Constraints
- Must use existing auth middleware
- Must work with our existing Prisma schema, no migrations
- Must pass type check and tests before merge

## Tasks
- [ ] Add the route
- [ ] Add the data loader
- [ ] Add the component
- [ ] Add the test

Then start a Codex session and tell it to "execute the plan in plans/user-profile.md, marking each task as you finish it." Codex updates the plan in place as it works, and the diff viewer shows the checkbox flips and any new sub-tasks the agent added.
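Because the plan is ordinary markdown, progress is easy to check outside the workspace too. Here is a minimal sketch that splits a plan's task list into done and remaining items; plan_progress is a hypothetical helper, and it assumes the agent flips checkboxes using the usual "- [x]" task-list syntax.

```python
import re

# Matches task-list items such as "- [ ] text" or "- [x] text".
TASK_LINE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def plan_progress(markdown: str) -> tuple[list[str], list[str]]:
    """Split a plan's task list into (done, remaining) task titles."""
    done, remaining = [], []
    for line in markdown.splitlines():
        match = TASK_LINE.match(line)
        if not match:
            continue  # headings, prose, and blank lines are ignored
        bucket = done if match.group(1).lower() == "x" else remaining
        bucket.append(match.group(2).strip())
    return done, remaining


plan = """\
## Tasks
- [x] Add the route
- [x] Add the data loader
- [ ] Add the component
- [ ] Add the test
"""
done, remaining = plan_progress(plan)
print(f"{len(done)} done, {len(remaining)} remaining; next: {remaining[0]}")
```

A check like this could gate a merge on an empty remaining list, though whether that is worth automating depends on how strictly your team treats the plan.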

Step 8. Monitor from your phone

If your workspace ships a mobile companion (Nimbalyst has an iOS app), pair it with your desktop. From your phone you can:

See which Codex sessions are running, paused, or done
Review diffs and accept or reject changes
Answer Codex prompts when an agent needs input
Start new sessions

The point is not "code from the toilet." The point is that an agent that finishes while you are in a meeting does not have to sit idle until you are back at your laptop.

Common questions

Do I still need the Codex CLI installed? Yes. The visual workspace runs the real CLI under the hood. Your existing CLI auth, AGENTS.md, and approval policies all still apply.

Does this work with the official Codex desktop app? They are different tools. The official app is a clean Codex-only chat GUI. Nimbalyst is a multi-engine visual workspace that adds editors, diff review, planning, and Linux support. You can use both if you want.

What about Claude Code? The same workspace runs Claude Code. Pick whichever engine fits the task: Codex for fast scaffolding, Claude Code for deep refactors. See Claude Code vs Codex CLI: when to use which.

Is it free? The Nimbalyst desktop app is free for individual use and open source under MIT. You bring your own Codex access, either an OpenAI API key or a ChatGPT plan.

Best Tools for Agentic Coding in 2026

Karl Wirth — Sun, 26 Jul 2026 15:00:00 +0000

Agentic coding stopped being a buzzword roughly a year ago and is now the way a growing number of working developers actually ship code. The tools have multiplied accordingly. Picking the right ones, and the right combination of them, matters more than it used to.

This is a working tour of the agentic coding tool landscape in May 2026. It is opinionated and based on running these tools daily on real product work. I build one of the tools listed, Nimbalyst, and I will say so when we get there. The rest of the comparisons are based on actual use.

Four layers, not one list

The biggest mistake when picking agentic coding tools is treating them as one category. There are at least four layers, and the best setups use one tool from each.

The agent itself. The thing that reads, plans, and writes code.
The IDE or editor layer. Where the human sits when they want to read or change a single file.
The workspace layer. Where multiple agents, multiple sessions, and the non-code artifacts (notes, tasks, mockups, decisions) live.
The harness. The instructions, rules, and tools that wrap the agent and make it good at your specific codebase.

The list below is organized by layer.

Layer 1: The agents

Claude Code (Anthropic). Still the most capable general-purpose coding agent for long, multi-step work in a real repository. Strong at planning, careful with risky changes, good at honoring instruction files. Best at sustained work where context and reasoning matter more than raw speed.

OpenAI Codex. Now a serious peer to Claude Code for day-to-day coding work. OpenAI supports both cloud and local Codex flows, plus MCP connectivity. In practice it is often faster and more aggressive, particularly on well-scoped changes. For the full three-way breakdown, see OpenCode vs Codex vs Claude Code.

Gemini Code Assist. The third serious option. Particularly relevant if you already live in Google Cloud, VS Code, or JetBrains. Worth keeping in the mix even if the center of gravity in agentic coding still feels more Claude-and-Codex shaped.

Open-weights agents (Llama, Qwen, DeepSeek through local runtimes). Quality is climbing fast. Practical for cost-sensitive bulk work and for shops with hard data-residency rules. Not yet a frontier-tier replacement for Claude or Codex on complex tasks, but the gap is shrinking. The open-source, model-agnostic OpenCode agent is a common way to drive these local models through one harness.

The right answer is not picking one. The teams getting the most out of agentic coding are routing different kinds of work to different agents.

Layer 2: The IDE / editor layer

Cursor. Cursor 3 (April 2026) is built around multi-agent work, but it still feels strongest when the developer is the driver for file-level work, review, and handoff. Strong UX, strong momentum.

Windsurf. Windsurf 2.0 introduced the Agent Command Center for managing local and cloud agents in one place. Strong if you want an IDE-shaped surface and you like Cascade's approach to autonomy.

GitHub Copilot (agent mode / cloud agent). Still the default starting point for a lot of enterprises and the easiest path to "agentic coding in the IDE and GitHub workflow my team already uses." Reliable, conservative, well-integrated with GitHub.

Zed AI. Lean, fast editor with an optional Vim mode and first-class AI integration. Underrated for developers who want speed and minimalism over the kitchen-sink approach.

JetBrains AI Assistant. The right answer if you live in IntelliJ, RubyMine, Rider, or any other JetBrains IDE. Less hype, solid execution.

The IDE layer is the most crowded and the most commoditized. Almost any of these works fine. Pick the one whose editing model and keymap you already know.

Layer 3: The workspace layer

This is the layer that did not exist a year ago and is now the most important one to get right.

Anthropic's desktop app for Claude Code. Anthropic now gives Claude Code a visual desktop surface for parallel local sessions with isolated worktrees. The best single-agent surface for Claude Code specifically.

Nimbalyst. An open-source visual workspace for AI coding that runs Claude Code and Codex as first-class agents in the same workspace. Sessions, tasks, decisions, files, mockups, diagrams, diffs, and code live in one place. The desktop and iOS apps are MIT licensed. Built for the case where you use more than one agent and want the surface itself to be open.

Conductor and other multi-session managers. A growing category of tools that wrap multiple terminal-based agents with a manager UI. Strong for developers who want a session manager without leaving terminal-shaped thinking.

The choice at this layer comes down to one question: are you a one-agent shop or a multi-agent shop? If you genuinely only use Claude Code, the Anthropic app is the most polished surface. If you use Claude Code plus Codex (or expect to within a year), a vendor-neutral workspace is the right primitive.

Layer 4: The harness

Project instruction files. CLAUDE.md, AGENTS.md, path-scoped rule files. Checked into the repo. Every team doing this seriously has at least one of these by now.

MCP servers. Model Context Protocol is the current best way to expose tools to agents. A growing ecosystem of community MCP servers (Linear, GitHub, Playwright, Postgres, file systems, screenshot capture) means agents can reach real systems without bespoke integrations. Anthropic, OpenAI, and several IDE vendors all support MCP as of mid-2026.

Skill libraries and slash commands. Reusable agent recipes (/release, /review-pr, /triage). Claude Code supports slash commands and skills directly, and OpenAI's Codex docs now treat saved workflows and skills as a first-class pattern. Investment here compounds.

Live-state tools. Direct database queries, log readers, screenshot captures, end-to-end test runners the agent can loop on. The difference between an agent that needs a human at every step and an agent that can finish a long task on its own usually comes down to whether the harness exposes these.

A working stack

The setups I see working best in May 2026 look something like this:

Agents: Claude Code as the default, Codex for tasks where it is faster or stronger.
IDE: whatever the developer already knows. Cursor, Zed, or Copilot are the most common.
Workspace: Nimbalyst when the team uses more than one agent or wants the surface open. Anthropic's app when Claude Code is the only agent.
Harness: CLAUDE.md and AGENTS.md in the repo, MCP servers for live state (logs, database, browser, screenshots), at least three recurring slash commands the team uses every week.

This is the shape of a serious 2026 agentic coding setup. Different from a year ago when "use Claude Code in the terminal" was almost the whole answer.

Where the field is still weak

A few honest gaps as of this writing.

Cross-agent context handoff. When Claude Code finishes work and Codex picks it up, the transcript does not travel cleanly. Most teams copy-paste.
Cross-session review. Three parallel agents touch coupled code. No tool yet treats the combined changeset as one reviewable unit.
Scheduled agents with full workspace access. Background delegation is improving, but real scheduled agents that see the same context as interactive sessions are still an open area.
Shared harness across teams. A team's harness is mostly tribal knowledge in CLAUDE.md files. There is no good package manager equivalent yet for sharing rule sets and skills across teams.

If you are picking a workspace surface this year, weight it toward the one that has the best chance of closing those gaps without locking you to one model vendor. That is the rationale behind Nimbalyst, and it is the lens I would apply to any of the alternatives as well.

The agents are going to keep getting better. The work this year is in the four layers around them.

Best Desktop App for Claude Code on Mac

Karl Wirth — Sun, 26 Jul 2026 13:00:00 +0000

Claude Code started life as a terminal tool. For a lot of developers that is still the right answer, especially if you live in tmux and never want to leave. For everyone else, the past six months have produced a real set of desktop options on the Mac, and the differences between them matter more than they look at first.

This is a working developer's view of the options, what each one is good at, and where I think the gaps still are. I build Nimbalyst, which is one of the options below, so treat that section accordingly. The rest of the comparison is based on running these tools daily on an M2 Mac.

What "desktop app for Claude Code" actually means in 2026

There are now three honest answers to "how do I run Claude Code on a Mac."

Terminal: claude in iTerm or Ghostty, optionally inside tmux. Anthropic's CLI, which is still the canonical surface.
Anthropic's desktop app for Claude Code: Claude Code now runs inside Anthropic's desktop app, which gives it a visual surface for parallel sessions and managed worktrees.
Third-party desktop workspaces: native Mac apps from outside Anthropic that wrap Claude Code (and usually other agents) into a richer surface. Nimbalyst is the one I work on, and there are a handful of others.

Each of these is the "best" answer for a different kind of work.

Terminal Claude Code

The terminal is still the lowest-friction way to start a Claude Code session. One command, no UI to learn, full keyboard control.

Where it shines:

Single-session, single-repo work where you do not need to glance across multiple things at once.
Developers who already have a strong tmux or terminal multiplexer setup.
CI-like flows where you want to script Claude Code with shell.

Where it falls apart:

Running more than two or three sessions at once. You end up alt-tabbing through terminals trying to remember which one is doing what.
Reading large diffs. Terminal diffs are usable for ten lines, painful for a thousand.
Anything visual: mockups, diagrams, screenshots from a long session.

If you are running one Claude Code session at a time, the terminal is fine. The friction starts when you go parallel.
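
For the scripted, CI-like case mentioned above, here is a minimal sketch. It is written in Python rather than shell, assumes the claude CLI is installed and on PATH, and relies on the CLI's -p (print) flag for a single non-interactive run. The function names and the idea of wrapping the call this way are illustrative assumptions, not official tooling.

```python
import subprocess


def headless_command(prompt: str) -> list[str]:
    """Build the argv for one non-interactive Claude Code run (-p prints the result and exits)."""
    return ["claude", "-p", prompt]


def run_headless(prompt: str, cwd: str = ".") -> str:
    """Run the command inside a repo checkout and return whatever the CLI printed.

    Assumes `claude` is installed and authenticated; raises CalledProcessError on a non-zero exit.
    """
    completed = subprocess.run(
        headless_command(prompt),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

A CI job could call run_headless with a fixed prompt and fail the step when the call raises. The prompt wording and what you do with the output are up to your pipeline.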

Anthropic's desktop app for Claude Code

Anthropic's desktop app gives Claude Code a visual surface for running multiple sessions in parallel, with isolated worktrees handled for you. This is a genuine improvement over the terminal for parallel work.

Where it shines:

First-party support. If Claude is the only agent you care about, the official surface is the safest long-term bet and the most polished single-agent UI available.
Worktrees are managed for you. You stop thinking about branch hygiene.

Where it has gaps:

Single-vendor by design. There is no path to running an OpenAI Codex session next to a Claude Code session in the same workspace. If you use both (most developers I talk to do), you are running two apps.
Sessions are the unit of work. Tasks, decisions, mockups, and data models live somewhere else. You end up keeping notes in a separate app.
Closed source. The desktop app is not something you or your company can extend, fork, or pin to a specific version.

For a developer who has standardized on Claude Code and wants the most polished single-agent experience, this is a strong default.

Nimbalyst

Nimbalyst is an open-source visual workspace for AI coding. It runs Claude Code and OpenAI Codex as first-class agents in the same workspace, with a pluggable agent layer for whatever comes next. The desktop app ships as a native Mac app.

Where it fits:

You run more than one agent. Claude Code for some tasks, Codex for others, both pointed at the same files. Switching between them does not mean switching apps.
You want sessions, tasks, decisions, files, and visual artifacts in one place. Mockups, diagrams, diffs, and markdown are all editable in the same canvas as the agent sessions that produced them.
You want the surface itself to be open. The desktop and iOS apps are MIT licensed. You can run it, read it, fork it, or pin a reviewed version.

Where it is the wrong tool:

You only use Claude Code and you want the single most polished Claude-only UI. The Anthropic app will feel cleaner.
You never leave the terminal and have no interest in a GUI. The CLI is right there.

Nimbalyst is the answer when "desktop app for Claude Code" is really "desktop app for the way I actually work, which includes Claude Code and other agents and a lot of non-code artifacts."

A quick honest matrix

Need	Best fit
Single agent, single repo, terminal-native	Claude Code CLI
Single agent, lots of sessions, polished UI	Anthropic desktop app
Multiple agents (Claude Code + Codex) in one workspace	Nimbalyst
Visual editing of mockups, diagrams, diffs, markdown alongside agent sessions	Nimbalyst
Open-source surface you can pin or extend	Nimbalyst

What I would actually recommend

If you are picking today and you are on a Mac:

Trying Claude Code for the first time: start in the terminal for a week. Get a feel for how the agent thinks.
Running Claude Code daily, single agent, want a GUI: install Anthropic's official desktop app. It is the most polished single-vendor experience.
Running Claude Code plus Codex, or anticipating that you will: install Nimbalyst. The single-workspace, multi-agent property gets harder to retrofit later, and the surface is yours to keep.

The model layer is going to keep churning. Whichever desktop app you pick is the surface your prompts, context, and team workflow will live in for the next few years. Pick the one that does not assume the answer to "which agent" is going to stay the same forever.

Best Agent Harness for Claude Code and Codex: How to Choose

Karl Wirth — Sat, 25 Jul 2026 21:00:00 +0000

A year ago "agent harness" was an inside-baseball term used mostly within AI labs. In 2026 it has become one of the most important decisions a serious AI-coding shop makes, and most teams are making it accidentally.

A harness is everything around the model that helps it do the right thing when it needs to. The model itself is interchangeable. The harness is not. As frontier models keep flipping the leaderboard every few weeks, the harness is increasingly where your real investment lives.

This post is a practical evaluation guide for picking or building one for Claude Code and Codex. For the deeper definition and architecture, read "What Is an Agent Harness?" Here, the focus is the decision: which approach gives your team a durable advantage without making you maintain infrastructure you do not need?

What an agent harness actually is

A harness is context plus restraint plus empowerment.

Context: the things the model needs to know to do good work in your codebase. Your conventions, your past decisions, the way you build React components, the shape of your data model, the open tracker items related to what is being worked on right now.
Restraint: the rules that keep the model inside the lines. Do not use dynamic imports here. Never write to the D1 database from this path. Always ask before running this kind of command.
Empowerment: the tools the model can reach for. Direct access to log files, the ability to query the running app's state, a sandboxed browser for end-to-end testing, screenshots of the UI it just changed.

Strip those three out and what is left is a chat box pointed at a fast autocomplete engine. Put them in and you have something that can iterate on real work and get measurably better over time.

The parts of a real harness

A harness is not one file. In a working setup, it is at least these:

A root instruction file (a CLAUDE.md, an AGENTS.md, or the equivalent startup file your agent reads). The first thing every agent reads. Project conventions, critical rules, the map to the rest of the harness.
Path-scoped rules. Files that activate when the agent touches a particular area. "When you are working on IPC handlers, read this." "When you are styling components, follow these Tailwind conventions."
Skills, examples, and recipes. Worked examples that show, not just tell. The model is much better at imitating a good example than at parsing prose.
Tools that touch live state. Read the log file. Query the local database. Take a screenshot of the UI. Run the end-to-end test suite in a loop until it passes. These are what turn a code generator into something that can verify its own work.
A linked workspace. Tracker items, sessions, commits, files, and decisions are all addressable, so the agent can see "this bug is linked to that session is linked to those files is linked to that commit history."

If you only have the first one, you have a notes file. If you have all five, you have a harness that compounds in value every week.
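
To make the path-scoped rules idea concrete, here is a small sketch of how a harness might decide which rule files apply to the file an agent is touching. The glob patterns, the rule file names, and the rules_for helper are all hypothetical. Real agents each have their own mechanism for scoping rules, so treat this as an illustration of the lookup, not any tool's actual format.

```python
from fnmatch import fnmatch

# Hypothetical mapping from path globs to rule files kept in the repo.
PATH_RULES = {
    "src/ipc/*": "rules/ipc-handlers.md",
    "src/components/*": "rules/tailwind-conventions.md",
}


def rules_for(touched_path: str, path_rules: dict[str, str] = PATH_RULES) -> list[str]:
    """Return the rule files whose glob matches the path being edited."""
    return [rule for pattern, rule in path_rules.items() if fnmatch(touched_path, pattern)]
```

Called with "src/ipc/auth.ts" this returns only the IPC rule file, and with "README.md" it returns an empty list, which is the point: rules load when they are relevant and stay out of the context window otherwise.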

Four approaches you can evaluate

Most teams choose among four shapes. The labels overlap, so compare the ownership boundary rather than the marketing category.

Approach	What it provides	Best fit	Main tradeoff
Vendor-native	One coding agent's built-in execution, tools, permissions, memory, and interface	A team standardizing on one agent and optimizing for fast adoption	Project workflow can become coupled to one product
Agent framework	Libraries and primitives for loops, tools, state, and evaluation	A team building a custom runtime or specialized agent product	You still need to assemble the human workspace and production controls
Fully in-house	A custom runtime plus project context, tools, policy, traces, and coordination	Specialized or regulated environments with a platform team	Highest ongoing maintenance burden
Open-source workspace plus project layer	Inspectable shared infrastructure with repository-owned context and policy	Teams that want portability without rebuilding generic capabilities	Requires deliberate evaluation and integration

The open-source vs in-house comparison goes deeper on what should stay under project control. Whichever route you choose, the test is the same: can you see what happened, constrain what can happen, verify the result, and point a different model at the same project layer?

Why this matters more for Claude Code and Codex than for any single agent

Many teams that use Claude Code are also testing Codex. Claude is better at some kinds of work, Codex is better at others, and frontier models keep trading positions. The teams getting the most out of agentic coding right now are routing different tasks to different agents.

That changes the harness requirements. A harness that only works inside one vendor's app does not survive the next model swap. The valuable harness is the one that is portable across agents: the same context, the same rules, the same tools, available to whichever agent you point at the work.

Concretely, that means:

Instruction files in the repo, not in the vendor's UI. CLAUDE.md and AGENTS.md checked into git, readable by any agent that respects them.
Tools exposed through an open protocol. MCP (Model Context Protocol) is the current best answer. Tools written once, reachable by Claude Code, Codex, and whatever lands next.
A workspace surface that is not owned by one model vendor. If the surface is from the same company as the model, it may eventually optimize for keeping you on that model.
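
A quick way to hold yourself to the first point is a check that both instruction files actually exist in the repo. The sketch below assumes the files live at the repository root. The helper name is made up for illustration.

```python
from pathlib import Path

# The two instruction files named in the text; extend the tuple for other agents.
INSTRUCTION_FILES = ("CLAUDE.md", "AGENTS.md")


def missing_instruction_files(repo_root: str) -> list[str]:
    """Return the instruction files that are absent from the repository root."""
    root = Path(repo_root)
    return [name for name in INSTRUCTION_FILES if not (root / name).is_file()]
```

Run it in a pre-commit hook or CI step and fail when the list is non-empty.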

Evaluation criteria for the best harness

For Claude Code plus Codex specifically, the best harness in 2026 has four properties.

Open and inspectable. You can read every file the agent reads. No hidden system prompts owned by a vendor. If a rule is firing, you can find it. If a tool is wrong, you can fix it.

Multi-agent by design. The same harness drives Claude Code, Codex, and any other agent your team wants to try. Switching agents on a task is a one-click decision, not a migration.

Workspace-aware. The harness can see across sessions, tasks, files, and decisions. An agent fixing a bug can read the linked tracker item, the related session transcripts, and the commit history without you copy-pasting any of it.

Loopable. The agent can run, observe, evaluate, and try again, using real tools (Playwright, log queries, screenshots) rather than guessing. This is the difference between agents that need a human in the loop on every step and agents that can grind through a long task while you do something else.

Any harness missing one of those four properties is going to feel limiting within months.
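
The "loopable" property is easiest to see as code. This is a bare sketch of a run, observe, evaluate, retry loop. The attempt and verify callables are placeholders you would back with real tools (a test runner, a log query, a screenshot diff), and the attempt cap is an assumed safety limit.

```python
from typing import Callable


def run_until_verified(
    attempt: Callable[[int], str],
    verify: Callable[[str], bool],
    max_attempts: int = 5,
) -> tuple[bool, int]:
    """Run, observe, evaluate, retry. Returns (verified, attempts_used)."""
    for attempt_number in range(1, max_attempts + 1):
        observation = attempt(attempt_number)  # run one agent step and capture real output
        if verify(observation):                # evaluate against a real check, not a guess
            return True, attempt_number
    return False, max_attempts                 # give up and hand back to a human
```

The cap matters as much as the loop: an agent that cannot verify its work after a few tries should stop and report, not keep burning tokens.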

Add two more criteria before choosing:

Restrained by design. Powerful tools have explicit scopes, approval gates, and audit trails. The system never treats access to a tool as blanket permission to use it against shared state.

Measurable on your work. You can replay real tasks with fixed inputs, capture complete evidence, and distinguish a harness improvement from model variance or a lucky run.
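
One simple way to separate a harness improvement from a lucky run is to repeat each task several times under both setups and compare pass rates per task. The sketch below assumes you have already recorded pass/fail results. The data shape and function name are illustrative.

```python
def pass_rate_deltas(
    baseline: dict[str, list[bool]],
    candidate: dict[str, list[bool]],
) -> dict[str, float]:
    """Per-task change in pass rate (candidate minus baseline) over repeated runs."""
    deltas = {}
    for task in sorted(baseline.keys() & candidate.keys()):  # only tasks run under both setups
        base_rate = sum(baseline[task]) / len(baseline[task])
        cand_rate = sum(candidate[task]) / len(candidate[task])
        deltas[task] = cand_rate - base_rate
    return deltas
```

A single run per task tells you almost nothing. Several runs per task, with the model and inputs held fixed, at least show whether a change moves the rate or just the dice.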

Where Nimbalyst fits

I build Nimbalyst, so this is the part where I tell you what we are doing about it. Nimbalyst is an open-source visual workspace for AI coding that is built to be a harness across Claude Code and Codex.

Concretely:

Project-level instruction files (CLAUDE.md, AGENTS.md, scoped rule files) are first-class. Any agent in the workspace reads them.
MCP tools are a first-class part of the system. Agents can query live state, read log files, drive the UI, take screenshots, and run verification loops against the app.
Sessions, tasks, decisions, mockups, diagrams, and code all live in the same workspace and are linkable. An agent can see the bug, the related sessions, the related files, and the history that connects them.
Claude Code and Codex are first-class agents today and the agent layer is pluggable for the next one.
The desktop and iOS apps are MIT licensed.

It is not the only valid answer. A determined team can hand-roll a harness with a careful repo layout, a shared MCP server, and discipline about which app each developer uses. The reason I think a workspace-shaped harness wins is that the linking is the part that does not exist in a pile of files.

How to pick (or build) yours this quarter

If you are about to invest serious time in agentic coding, three concrete moves:

Move your harness into the repo. Whatever is in your head about how the codebase should be written needs to be in a CLAUDE.md and an AGENTS.md, both checked in.
Write at least one path-scoped rule and one tool that reads live state. Any path-scoped rule. Any live-state tool. The first one is the hardest. The second is when the harness starts to compound.
Pick a surface that does not lock you to one model vendor. Whether that is Nimbalyst, a careful CLI setup, or something else, the test is the same. Can a new agent that ships next month run against the same harness with one config change?

The model and agent layers will keep moving. The harness is the part you own.

Before standardizing, run representative tasks through both your current setup and the candidate harness. The Agent Harness Benchmark Protocol holds the model and environment fixed, scores correctness before efficiency, and requires traces and test output behind every claim. If you want a concrete starting structure, use the Open-Source Agent Harness Blueprint.

FAQ

What is the best agent harness for Claude Code and Codex?

The best harness is one your team can inspect, version, verify, and move across models. It should keep project context and tools in durable formats, enforce permission boundaries, and prove results with repeatable evidence.

Can Claude Code and Codex share one project harness?

Yes. Keep shared instructions, project tools, architecture context, and evaluations in repository files or open protocols, then use thin agent-specific entry points for the runtime differences.

Should I choose an open-source or in-house agent harness?

Keep project-specific rules, permissions, context, examples, and evaluations in house. Adopt an inspectable open-source base for generic workspace and orchestration capabilities unless your requirements justify maintaining those layers yourself.

How should I evaluate an agent harness?

Use real hidden-solution tasks and paired runs. Hold the model, repository, prompt, base tools, and budget fixed, then compare correctness, verification, policy adherence, context efficiency, recovery, and provenance.

Integrate 80% of everything for agent and human context

Karl Wirth — Sat, 25 Jul 2026 19:00:00 +0000

Why deep integration matters for agent context

To ship a single feature with a coding agent, most teams touch seven systems: Jira or Linear for the ticket, Slack for the thread that clarified what to actually build, Obsidian, Notion, Google Docs, or Confluence for the plan, Excalidraw, Miro, or Lucid for the diagram, Figma for the mockup, the IDE for the diff, and Codex or Claude Code for the working sessions. Plus the local files themselves.

Each tool owns one slice of the data and the work. The connections between those slices mostly live in human heads. The ticket does not know which session touched it. The plan does not know which diagram shaped it. The session does not know which thread changed the requirement it just implemented. An agent cannot know any of this unless a human reconstructs it.

So the human stitches. You open seven tabs, retell the story to whichever agent you started today, paste links, summarize threads, and hope the model has enough fragments to do useful work. Then the session ends and the next one starts cold, and the lookups and stitching start over.

In that workflow, the human is the bottleneck, and the agent ends up working on disconnected fragments.

Context as a graph, not a pile of tabs

Take a request like "Pick up where the last session left off and ship the streak tracker." Across a fragmented stack, that requires a chain of lookups through separate tools, separate auth models, and separate data models, and the agent still may not know which prior session touched the work. The artifacts already exist, but you are missing the connections between them.

In an integrated workspace, those artifacts become nodes with typed edges: tracker item to message thread to plan to spec to session to diagram to diff to files. Both the human and the agent can traverse the same graph, and the human can do it visually. So that same query is a single traversal. The tracker item, the plan, the spec, the discussion threads, the prior session, the design diagram, the open PR, and the files already changed are all connected.

A graph like this is a connective layer that any agent can plug into, whether the model is Claude Code, Codex, OpenCode, or whatever comes next.
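
Here is a toy version of that graph, to show why the query becomes a single traversal. The node IDs, edge type names, and the WorkGraph class are invented for illustration and do not reflect any product's data model.

```python
from collections import defaultdict, deque


class WorkGraph:
    """Artifacts as nodes, typed edges between them, traversable in both directions."""

    def __init__(self) -> None:
        self._edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def link(self, src: str, edge_type: str, dst: str) -> None:
        # Store the edge both ways so a traversal can start from any artifact.
        self._edges[src].append((edge_type, dst))
        self._edges[dst].append((edge_type, src))

    def connected(self, start: str) -> list[str]:
        """Breadth-first walk returning every artifact reachable from start."""
        seen = {start}
        order: list[str] = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _edge_type, neighbor in self._edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    order.append(neighbor)
                    queue.append(neighbor)
        return order
```

Link a tracker item to its plan, the plan to a prior session, and the session to an open diff, and connected("tracker:streak") hands back all three in one call. In a fragmented stack each of those hops is a separate tool and a separate login.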

Integrated visual editors are how the human stays in the loop

A graph of typed edges is only useful to a human if they can actually see and edit the things on the other end of those edges without leaving the place they started.

Editors have to be part of the workspace, accessed natively within any context. Reading a message thread that references a diagram? Open the diagram right there and edit it. Sitting in a tracker item that links to a mockup? Pull up the mockup and adjust it without losing your place. Working in a markdown spec that embeds a diagram? Click into the diagram, change it, and the spec updates. The same is true for plans, diffs, code files, and sessions. Every artifact in the graph has a first-class editor inside the workspace, and you can move between them without switching apps, losing context, or copying anything between tools.

You also work visually with your agent in the same artifact, with the same edits visible to both of you in real time. When the agent modifies a mockup, you see the red and green diff and approve it. When you redraw part of a diagram, the agent picks up the change for its next step. The visual surface and the agent's working surface are the same surface.

Why we are building the 80% that matters

Deeply integrated agent context will not exist as long as the underlying work is scattered across seven or more separate applications.

So we are building the 80% of those products that matters for human and agent workflows, then integrating those data models into one graph.

That already includes:

A tracker that holds tickets, bugs, decisions, and ideas
A markdown editor with WYSIWYG and red/green diffs
Diagrams as first-class files
Mockups that render
A code editor
Sessions that persist

Message threads are next, because the conversation around a piece of work is part of the work.

These applications are agent-native and deeply integrated, both visually and in the data. They share IDs and a workspace, all in one graph. Our thesis is that the winning environment for human-agent work is an integrated workspace where the work, the discussion, the decisions, the files, and the sessions all belong to the same system.

Nimbalyst is one example you can learn from and use

Nimbalyst is an open-source visual workspace where agents, sessions, tasks, and files live in one place. Markdown, mockups, diagrams, diffs, and code all open in the same canvas. Claude Code and Codex run as first-class agents today, and the agent layer is pluggable for the next one. The desktop and iOS apps are MIT licensed. Steal what is useful for your own setup, or use it as-is if it fits.

Claude Code Pricing for Engineering Teams (2026 Guide)

Karl Wirth — Sat, 25 Jul 2026 17:00:00 +0000

Claude Code pricing in 2026 is straightforward on the surface and complicated underneath. The headline numbers (the Claude subscription tiers and the API token rates) only tell part of the story. For an engineering leader trying to budget a team rollout, what matters is the total cost per developer per month, how that cost scales with parallel sessions and long-context work, and where the line items hide. This guide walks through the actual Claude Code cost math for an engineering team in 2026, with the numbers and the trade-offs we have seen in practice.

Claude Code Pricing: Quick Answer

Individual plans are straightforward: Claude Pro is $20 per month, Max 5x is $100 per month, and Max 20x is $200 per month when billed monthly.
Team billing is different from individual billing. Team standard seats do not include Claude Code. Team premium seats do.
Enterprise billing is different again. Anthropic currently lists Enterprise as $20 per seat plus usage billed at API rates.
API pricing is the real variable cost line. Current list pricing is $3 input / $15 output per million tokens for Sonnet 4.6 and $5 input / $25 output for Opus 4.7.
Anthropic's own team-usage guidance is the best planning anchor. The company says API-billed Claude Code deployments average about $13 per active developer day and roughly $150 to $250 per developer per month, with wide variance.
Official Claude Code Review is a separate spend line. Anthropic says Claude Code Review averages roughly $15 to $25 per review and does not count against included plan usage.

The full breakdown follows.

The Four Billing Paths That Matter

Anthropic now has four pricing paths that matter to engineering leaders. Most confusion comes from mixing them together.

1. Individual Pro

Claude Pro is $20 per month when billed monthly, or $17 per month on the annual plan. It includes Claude Code. This is the cleanest entry point for an individual developer evaluating the workflow or using Claude Code for shorter daily sessions.

2. Individual Max

Max comes in two monthly tiers:

Max 5x: $100 per month
Max 20x: $200 per month

Both include Claude Code and raise usage capacity above Pro. This is the simplest predictable-budget option for heavy individual users who want a hard monthly ceiling instead of token billing.

3. Team and Enterprise seats

This is where many team rollouts get misread.

Team standard seat: $25 per seat per month when billed monthly, or $20 on annual billing. More usage than Pro, but no Claude Code.
Team premium seat: $125 per seat per month when billed monthly, or $100 on annual billing. Includes Claude Code and Claude Cowork.
Enterprise: currently listed as $20 per seat plus usage at API rates.

For an engineering leader, the key point is simple: a Team plan does not automatically mean every seat can use Claude Code. Only premium seats can.

4. API / usage-based billing

The API path matters for custom automation, CI-triggered workflows, and Enterprise-style usage billing. Anthropic's current list pricing is:

Sonnet 4.6: $3 input / $15 output per million tokens
Opus 4.7: $5 input / $25 output per million tokens

This is the least predictable billing path, but it is the most flexible one for scheduled runs, background workflows, and organization-managed spend controls.
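
To see what those rates mean for a given workload, here is the arithmetic as a small sketch. The rates are the list prices quoted above. The dictionary keys are plain labels chosen for this example, not API model identifiers, and the token counts you feed in are your own estimates.

```python
# (input, output) list price in USD per million tokens, as quoted in this guide.
RATES_PER_MILLION = {
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a workload at list price, ignoring any discounts."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

For example, 2 million input tokens and 500,000 output tokens on Sonnet 4.6 come to $6.00 plus $7.50, or $13.50. The same volume on Opus 4.7 is $10.00 plus $12.50, or $22.50.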

What a Team Should Actually Budget

The useful question is not "what does Claude Code cost" in the abstract. It is "which billing path are we standardizing on, and which developers really need Claude Code access."

Individual-plan rollout

If your team is small and decentralized, the budget bands are simple:

Occasional user: Pro at $20 per month
Heavy daily user: Max 5x at $100 per month
Power user with parallel sessions: Max 20x at $200 per month

This is the easiest rollout to understand, but it is not the cleanest model for centralized admin, shared spend controls, or procurement.

Managed team rollout

If you want centralized billing, do not assume "put everyone on Team" solves the Claude Code question. It does not. The budgeting question becomes: how many people need premium seats, and how many only need standard Claude access.

That distinction matters because Team standard is materially cheaper than Team premium, and only Team premium includes Claude Code.
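
The seat math is worth writing down, because the split between premium and standard seats drives the total. This sketch uses the Team seat prices listed earlier in this guide. The function name and the example headcounts are illustrative.

```python
# Team seat prices in USD per seat per month, as listed in this guide.
SEAT_PRICES = {
    "standard": {"monthly": 25, "annual": 20},
    "premium": {"monthly": 125, "annual": 100},
}


def team_monthly_cost(premium_seats: int, standard_seats: int, billing: str = "monthly") -> int:
    """Monthly seat spend for a Team plan; only premium seats include Claude Code."""
    return (
        premium_seats * SEAT_PRICES["premium"][billing]
        + standard_seats * SEAT_PRICES["standard"][billing]
    )
```

A ten-person team with six premium and four standard seats is $850 per month on monthly billing. Putting all ten on premium is $1,250.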

Usage-billed rollout

If you are using Enterprise billing or API-based automation, Anthropic's own cost guidance is the most credible benchmark: around $13 per active developer day and roughly $150 to $250 per developer per month on average, with wide variation by model choice, codebase size, and automation volume.

That benchmark is much more useful than generic blog estimates because it reflects real token-billed deployments rather than seat-based subscriptions.
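
The daily and monthly figures can be tied together with simple arithmetic. The sketch below uses only the $13 per active day average quoted above. Dividing the monthly range by that figure suggests roughly 12 to 19 active days per developer per month, which is an inference from the two numbers, not something Anthropic states.

```python
DAILY_AVERAGE_USD = 13  # average per active developer day, as quoted above


def monthly_estimate(active_days: int, daily_usd: float = DAILY_AVERAGE_USD) -> float:
    """Estimated monthly usage spend for one developer."""
    return active_days * daily_usd


def implied_active_days(monthly_usd: float, daily_usd: float = DAILY_AVERAGE_USD) -> float:
    """How many active days a monthly figure implies at the daily average."""
    return monthly_usd / daily_usd
```

A developer active 15 days in a month lands at $195, inside the quoted range. Someone active every working day will likely land above it.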

Hidden Cost Lines Most Teams Miss

Four line items repeatedly surprise engineering leaders the first time they budget Claude Code for a team.

Standard seats vs premium seats

This is the biggest one. A Team plan does not mean every seat has Claude Code. If you assume it does, your first real rollout plan will be under-budgeted.

Usage spillover after included limits

Anthropic now supports extra usage and pay-as-you-go paths after included limits are reached. That is good for keeping developers unblocked, but it also means your "fixed seat cost" can quietly become a hybrid seat-plus-usage bill if you are not watching it.

Separate billing for Claude Code Review

Anthropic's official Claude Code Review is not bundled into included Claude Code usage. It is billed separately through usage credits, and Anthropic says the average review costs about $15 to $25. If your team plans to review every pull request with Claude, budget that as its own line item.
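
As a rough sizing aid, multiply your monthly pull request count by the quoted per-review range. The range comes from the figure above. The function name and the PR volume are assumptions you would replace with your own numbers.

```python
REVIEW_COST_RANGE_USD = (15, 25)  # average cost per review, as quoted above


def monthly_review_budget(reviews_per_month: int) -> tuple[int, int]:
    """Low and high monthly estimates for reviewing every pull request."""
    low, high = REVIEW_COST_RANGE_USD
    return reviews_per_month * low, reviews_per_month * high
```

At 200 reviewed pull requests a month, that is $3,000 to $5,000, which can rival the seat bill for a small team.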

Automation multiplies usage faster than chat

The most expensive Claude Code workflows are usually not everyday chat turns. They are automation-heavy flows: CI-triggered runs, long refactors, multiple parallel sessions, and repeated PR review passes. Those are the workflows that push teams toward premium seats, usage credits, or Enterprise spend controls.

Budgeting Claude Code for a Team

A practical budgeting approach for an engineering leader rolling Claude Code out across a team of five to twenty engineers:

Choose the procurement model first. Decide whether you want individual subscriptions, Team seats, or Enterprise/API billing before you estimate per-developer cost.
Count premium-seat users, not just total developers. On Team plans, the important question is how many developers truly need Claude Code, not how many employees need Claude access.
Track review usage separately from coding usage. Official Claude Code Review is its own spend line and should not be mixed into the base seat estimate.
Expect automation users to cost more than interactive users. The developer who runs background jobs, CI flows, or several parallel sessions is the one who breaks your first budget model.
Reforecast quarterly. Anthropic's packaging and model lineup change often enough that annual assumptions get stale.

For more on what good Claude Code adoption looks like at a team level, see our guide to orchestrating Claude Code sessions on a kanban board and our breakdown of the best Claude Code GUI tools in 2026.

Frequently Asked Questions

How much does Claude Code cost?

For individuals, Claude Code currently costs $20 per month on Pro, $100 per month on Max 5x, and $200 per month on Max 20x. For teams, the answer depends on plan type: Team premium seats include Claude Code, while Enterprise is priced as seat cost plus usage.

What is Claude Code API pricing?

Claude Code API pricing follows Anthropic's standard model pricing. As of May 27, 2026, Anthropic lists Sonnet 4.6 at $3 input / $15 output per million tokens and Opus 4.7 at $5 input / $25 output per million tokens. There is no separate Claude Code token price beyond those underlying model rates.

What is the cheapest way to use Claude Code?

The cheapest straightforward way to use Claude Code is Claude Pro at $20 per month. For teams, the cheapest centralized path is not automatically Team standard, because Team standard does not include Claude Code. If you need Claude Code on a Team plan, you are really choosing between Team premium seats and Enterprise-style usage billing.

How much should a team budget for Claude Code per developer?

The cleanest answer is to budget by seat type, not by a single blended average. On Team, standard seats and premium seats have very different costs and only premium includes Claude Code. On usage-billed deployments, Anthropic's own benchmark of roughly $150 to $250 per developer per month is a better planning anchor than a generic blog average.

Is Claude Code worth the cost for a team?

Claude Code is worth the cost when the team uses it as a real workflow layer rather than a novelty. The strongest justification usually comes from teams that use it for implementation, refactoring, and review loops repeatedly enough that the seat or usage cost is small relative to saved engineering time. The weakest justification is buying premium access for everyone before you know who will actually use it heavily.

AI Code Review Tools for Engineering Teams (2026)

Karl Wirth — Sat, 25 Jul 2026 15:00:00 +0000

AI code review in 2026 has moved from a demo curiosity to a real layer of the engineering workflow. The tools have matured. The integrations into GitHub and other forges are now table stakes. Most teams are now choosing between several credible AI code review options rather than deciding whether to adopt one at all. For an engineering leader, the question is no longer "should we use AI code review" but "where in the review pipeline does it actually help, and which tools are worth the line in the budget." This guide covers the practical state of AI code review tools in 2026, the integration patterns that work, and the patterns that quietly waste developer time.

AI Code Review: Quick Answer

AI code review tools are most useful as a first-pass reviewer that flags issues before a human reviewer opens the PR.
GitHub AI code review is now real product surface area, not just a prompt trick. Copilot and Codex both ship first-party PR review on GitHub, while Anthropic now distinguishes between official Claude Code Review and custom Claude Code automations.
AI code review is best at style consistency, common bug patterns, security smells, and missing test coverage.
AI code review is weakest at architectural review, intent matching, and judgment calls that depend on team context.
The realistic team rollout is one AI reviewer in the PR pipeline plus human review for the parts AI cannot judge.
Cost models differ more than teams expect. Copilot review rides on Copilot plans, Anthropic's official Claude Code Review is usage-based, and dedicated review vendors add their own seat pricing.

The full breakdown follows.

What AI Code Review Actually Does Well

Three categories of feedback are the consistent strength of AI code review tools in 2026.

Style consistency and minor refactors

AI code review tools reliably catch inconsistent style, dead code, unused imports, mis-named variables, and small refactors that would otherwise live in a "nits" pile and slow the human reviewer down. This is the least controversial value AI code review delivers and the easiest to validate.

Common bug patterns

Off-by-one errors, null handling gaps, missing error returns, wrong loop bounds, missing await, and similar standard mistakes are well-suited to AI review. The tool reads the diff, recognizes the pattern from millions of similar examples, and posts an inline comment. The hit rate is high enough that engineers stop dismissing the comments after a couple of weeks.
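
For a sense of the kind of mistake this covers, here is a made-up off-by-one in Python. Neither function comes from a real codebase. The pair just shows a wrong loop bound next to the corrected one.

```python
def window_sums_buggy(values: list[int], size: int) -> list[int]:
    # Off-by-one: the range stops one window early, so the final window is skipped.
    return [sum(values[i:i + size]) for i in range(len(values) - size)]


def window_sums(values: list[int], size: int) -> list[int]:
    # Correct bound: there are len(values) - size + 1 full windows.
    return [sum(values[i:i + size]) for i in range(len(values) - size + 1)]
```

The buggy version passes a casual read and any test that does not check the last window, which is exactly why a pattern-matching reviewer is useful here.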

Security and obvious vulnerability smells

AI code review catches obvious security smells (SQL injection patterns, hardcoded secrets, unvalidated input flowing into shell commands, weak crypto usage) with high precision. Catching subtle vulnerabilities still requires dedicated security tooling. Catching the obvious ones at PR time has measurable impact.

Where AI Code Review Fails

AI code review tools still consistently miss three categories as of 2026.

Architectural intent

AI cannot read your design doc, your team's last six retros, or the constraint that drove the current shape of the module. It can tell you that a function is long. It cannot tell you that the function is long because the team decided last quarter to keep it inline rather than abstract. Architectural review remains a human job.

Intent matching

The PR description says "fix flaky login test." The diff also removes a retry on an unrelated network call. AI code review usually does not flag the second change as out of scope, because the diff is syntactically clean and the test is passing. Human review still catches the off-scope changes that matter.

Team-specific context

Every team has a few patterns that look wrong but are right (or look right but are wrong) because of a specific historical decision. The AI reviewer comes in fresh every PR and re-suggests the "fix" that the team rejected six months ago. The tools are improving on this with custom rule packs and team configuration, but the gap is still real.

AI Code Review Tools Worth Knowing in 2026

The AI code review space has matured into a handful of credible options. The right pick depends on which forge you live in and how much custom configuration your team wants.

GitHub Copilot for code review

GitHub Copilot code review is the lowest-friction GitHub-native option. A reviewer can request Copilot on a PR the same way they request a human reviewer, and teams can later enable automatic reviews if they want Copilot on every pull request. Copilot leaves comment reviews, not approvals.

Best for: GitHub-native teams that already pay for Copilot. The friction to adopt is essentially zero.
Limitations: Less configurable than dedicated AI code review tools. The signal-to-noise tuning is what GitHub ships globally rather than what your team prefers.

CodeRabbit

CodeRabbit is a dedicated AI code review platform with GitHub, GitLab, and Bitbucket integration. It posts inline review comments, supports custom rules, and integrates with Linear and Jira for context. CodeRabbit is one of the more configurable options in 2026.

Best for: Teams that want a dedicated AI code review tool with per-team configuration and forge flexibility.
Limitations: Adds a per-seat line item on top of existing AI tools. Effective use requires investment in custom rule packs.

Greptile

Greptile focuses on whole-codebase context, indexing the repository so its review comments reference patterns elsewhere in the code rather than reviewing the diff in isolation. Strong on consistency and codebase-aware refactors.

Best for: Larger codebases where consistency across files matters more than diff-only review.
Limitations: Higher setup cost. The codebase indexing step is non-trivial for very large monorepos.

Codium / Qodo Merge

Qodo Merge is the rebranded Codium AI code review product. It runs as a GitHub Action or app and produces a structured review with categorized findings. Good defaults, low setup cost.

Best for: Teams that want structured AI code review output without much configuration.
Limitations: Less customizable than CodeRabbit. The structured output is opinionated and not every team's preferred review style.

Codex for PR review

OpenAI now ships first-party Codex code review for GitHub pull requests. Once enabled, Codex can automatically review PRs as they move from draft to ready, and reviewers can explicitly ask for a review with @codex review. OpenAI positions it as a whole-codebase reviewer that can reason about dependencies and validate behavior by running code and tests.

Best for: GitHub teams already standardizing on Codex who want a first-party agentic reviewer instead of adding a separate review vendor.
Limitations: GitHub-centric. Still not a replacement for human approval, especially on architectural or product-intent questions.

Claude Code for PR review

Claude Code has two different review stories, and teams should not blur them together. Anthropic's official Claude Code Review is a separate research-preview feature for Team and Enterprise plans that analyzes pull requests and posts inline comments. Separately, Claude Code GitHub Actions and the Claude Code SDK let teams build custom review workflows with their own prompts and triggers.

Best for: Teams already committed to Anthropic that want either an official managed review product or a programmable review pipeline.
Limitations: Official Claude Code Review is not the same thing as "just run Claude on a PR." The managed product is limited to Team and Enterprise orgs and billed separately, while the DIY path takes more setup and tuning.

Integration Patterns That Work

Three integration patterns are now the working standard for AI code review.

Pattern 1: AI as first-pass, human as approver

The AI code reviewer posts comments first. The PR author addresses or dismisses each comment. Only then does a human reviewer get assigned. This pattern saves human reviewer time on the easy comments and focuses human attention on judgment calls. It is the most common pattern in 2026.
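The gating rule in Pattern 1 can be sketched in a few lines. This is a minimal illustration, not any vendor's schema: the `AIComment` type, its `state` values, and `ready_for_human_review` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical comment states; real review tools expose their own fields.
HANDLED_STATES = {"addressed", "dismissed"}


@dataclass
class AIComment:
    path: str
    body: str
    state: str = "open"  # stays "open" until the PR author acts on it


def ready_for_human_review(comments: list[AIComment]) -> bool:
    """Return True once the author has addressed or dismissed every AI comment."""
    return all(c.state in HANDLED_STATES for c in comments)


comments = [
    AIComment("auth/login.py", "Missing await on fetch_user()", state="addressed"),
    AIComment("auth/login.py", "Unused import: os"),
]
print(ready_for_human_review(comments))  # False: one comment is still open
```

In practice this check would sit wherever the team assigns reviewers, so the human only gets pulled in once the easy comments are out of the way.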

Pattern 2: AI as a parallel reviewer

The AI runs in parallel with a human reviewer. Both leave comments. The author addresses both. This pattern works when the team treats AI comments as a second opinion rather than a gating reviewer. It is the right pattern when the team has not yet built trust with the AI reviewer's signal-to-noise.

Pattern 3: AI as a precommit gate

The AI reviews before the PR is even opened, as a precommit or local check. This shifts feedback left and avoids cluttering the PR with comments. It works well for style and minor-bug categories. It does not work as well for the comments that benefit from full PR context.

Most teams settle on Pattern 1 over time. Pattern 2 is a useful starting point during the AI reviewer trust-building phase. Pattern 3 is best as a supplement rather than the only AI review layer.

A Practical Rollout for an Engineering Team

A working AI code review rollout for a team of five to twenty engineers:

Start with the tool already in your forge. If you are on GitHub, request Copilot reviews manually on one repo for two weeks before enabling automatic reviews. If you are piloting Codex, enable it on one repo and compare its findings against the human review stream.
Calibrate signal-to-noise. Track how many AI comments are useful versus dismissed. If the dismissal rate is over 70%, the tool needs configuration. If it is under 30%, you are getting good signal and the tool is paying for itself.
Add a dedicated AI code review tool if needed. If the built-in reviewer's signal-to-noise does not improve with configuration, evaluate CodeRabbit, Greptile, or Qodo Merge as a higher-quality alternative.
Train the team to address comments fast. AI review comments age badly. The PR author should address every AI comment (accept, dismiss, or reply) before requesting human review. This habit alone is more important than the choice of tool.
Keep human review for the things AI cannot judge. Architectural intent, off-scope changes, and team-specific decisions stay with a human reviewer. AI handles the rest.
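The calibration step above is simple arithmetic, and a small sketch makes the thresholds concrete. The function names and the sample counts are assumptions for illustration; the 70% and 30% cutoffs come from the rollout step, and the label for the middle band is ours, since the guide gives no rule for it.

```python
def dismissal_rate(useful: int, dismissed: int) -> float:
    """Share of AI comments the team dismissed, as a fraction of all AI comments."""
    total = useful + dismissed
    if total == 0:
        raise ValueError("no AI comments recorded yet")
    return dismissed / total


def calibration_verdict(rate: float) -> str:
    """Map a dismissal rate onto the thresholds from the rollout step."""
    if rate > 0.70:
        return "needs configuration"
    if rate < 0.30:
        return "good signal"
    return "keep tuning"  # assumption: the middle band has no stated rule


# Hypothetical two-week sample: 12 useful comments, 48 dismissed.
rate = dismissal_rate(useful=12, dismissed=48)
print(f"{rate:.0%} dismissed -> {calibration_verdict(rate)}")
# 80% dismissed -> needs configuration
```

Counting "useful versus dismissed" by hand for two weeks is usually enough; the point is to make the decision on a number rather than on the loudest complaint.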

What AI Code Review Means for the Workspace

AI code review is one piece of a larger pattern: AI is now part of every step of the engineering workflow, not just the writing step. Planning, drafting, reviewing, and shipping all have AI tools now. The cost of running multiple AI tools alongside each other has gone up. The value of having one place to see what every AI tool is doing has gone up alongside it.

Nimbalyst is the open-source visual workspace we have been building for exactly this scenario. We run Claude Code and Codex sessions on a shared kanban board, review their diffs inline, plan with mockups and diagrams in the same app, and we are now layering AI code review feedback into the same workspace. The desktop and iOS apps are MIT-licensed. The pattern works for a solo developer and scales to a team of twenty.

Frequently Asked Questions

What is AI code review?

AI code review is the use of an AI tool to review code changes before or during the human code review process. The AI reads a diff, optionally indexes the surrounding codebase, and posts inline comments suggesting fixes, flagging bugs, catching security smells, or pointing out style inconsistencies. AI code review is most useful as a first-pass reviewer that handles the routine feedback so human reviewers can focus on architectural and team-context judgment.

What is the best AI code review tool in 2026?

The best AI code review tool in 2026 depends on how much setup you want and which stack you already run. GitHub Copilot is the easiest starting point for GitHub-native teams. Codex is the strongest first-party agentic reviewer if you want GitHub PR review plus follow-up fixes in the same loop. Claude Code is strongest when you want Anthropic in the stack, but you need to distinguish between Anthropic's managed Claude Code Review product and a custom Claude Code automation you build yourself. CodeRabbit, Greptile, and Qodo still make sense when you want a dedicated review layer with a more vendor-managed workflow.

How does GitHub AI code review work?

On GitHub, AI code review usually means one of three things: Copilot requested as a reviewer, Codex configured to review pull requests, or a GitHub App from a dedicated vendor. With Copilot specifically, teams usually start by manually requesting a review from the Reviewers menu, then optionally enable automatic reviews later. Copilot leaves comment reviews rather than approvals, so human approval still carries the merge decision.

Is AI code review good enough to replace human review?

No. AI code review in 2026 is good enough to handle most style and small-bug feedback, but it is not good enough to handle architectural review, intent matching, or team-specific context. The realistic rollout is AI code review as a first-pass reviewer and a human reviewer as the final approver. Teams that try to replace human review entirely typically discover the gap during a postmortem on something AI did not flag.

How much does AI code review cost?

AI code review pricing is not one clean category. GitHub Copilot code review is part of Copilot plan entitlements. Anthropic's official Claude Code Review is billed separately through usage credits, with Anthropic saying a review averages roughly $15 to $25 depending on PR size and verification work. Dedicated review platforms typically add their own per-seat pricing. Codex review is better thought of as part of the broader Codex-and-GitHub workflow than as a separate review-only SKU.
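To see what a usage-based figure means for a budget, here is a rough back-of-envelope sketch. It assumes every PR gets one managed review and that the quoted $15 to $25 average holds for your PRs; the function name and the 120-PR volume are hypothetical.

```python
def monthly_review_cost(
    prs_per_month: int, low: float = 15.0, high: float = 25.0
) -> tuple[float, float]:
    """Return a (low, high) monthly estimate from a per-review price range."""
    return prs_per_month * low, prs_per_month * high


# Hypothetical team volume: 120 reviewed PRs in a month.
low, high = monthly_review_cost(120)
print(f"${low:,.0f} to ${high:,.0f} per month")  # $1,800 to $3,000 per month
```

Comparing that range against a per-seat quote from a dedicated vendor is usually the quickest way to tell which pricing shape fits your PR volume.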

Can I use Claude Code or Codex as an AI code reviewer?

Yes, but not in the same way. Codex has an official GitHub PR review flow, including automatic review on ready-for-review PRs and explicit @codex review triggers. Claude Code also supports PR review, but teams need to choose between Anthropic's official Claude Code Review product and a custom workflow built with Claude Code GitHub Actions or the SDK. If you want the least setup, use the managed product. If you want the most control, build the workflow yourself.

Best AI Mockup Tools for Developers in 2026

Karl Wirth — Sat, 25 Jul 2026 13:00:00 +0000

If you are searching for the best AI mockup tools in 2026 as a developer, the question that matters is which artifact comes out the other end, and whether your coding agent can do useful work from it. Pretty first screens are common; agent-ready output is not.

A designer can stop at a polished screen. A developer usually cannot. You need something a Claude Code or Codex session can inspect, diff, turn into components, and keep iterating against after the first pass. Output format, file ownership, and agent-readiness end up mattering more than screenshot quality.

Disclosure up front: we build Nimbalyst. Nimbalyst is an open-source visual workspace where you work with agents, sessions, tasks, and files, and edit markdown, mockups, diagrams, diffs, and code. It runs Claude Code and OpenAI Codex side by side, with pluggable agent harnesses. Nimbalyst Mockups is the mockup editor inside that workspace. I am including it here because it solves a concrete developer workflow problem, but I will keep the trade-offs explicit.

One notable change from older 2025 and early 2026 roundups: Galileo AI is no longer on this list. The current galileo.ai product is an AI observability and evaluation platform, not a UI mockup tool, so it no longer belongs in a live buyer's guide for this category.

Quick answer

If you want the cleanest agent handoff, choose Nimbalyst Mockups or Subframe.
If you want a Git-backed app builder that can become production code fast, choose v0, Lovable, or Bolt.new.
If your team already lives in Figma, choose Figma Make.
If you want lower-fidelity planning and flow work, choose Uizard.
If you want a free experiment that generates UI designs and front-end code, test Google Stitch.

Quick comparison

Tool	Primary output	Where the canonical artifact lives	Can a coding agent work from it?	License / business model
Nimbalyst Mockups	`.mockup.html` HTML/CSS mockup file	Your repo	Yes, directly	Open source. Desktop and iOS apps are MIT
v0 by Vercel	Git-backed app code, editor, preview	v0 project plus GitHub when connected	Yes, through repo sync or export	Proprietary hosted product
Lovable	Full-stack app project with GitHub connection	Lovable project plus GitHub when connected	Yes, after Git sync	Proprietary hosted product
Figma Make	Functional prototype or web app in Figma Make	Figma cloud	Partially, through export, Dev Mode, or MCP bridge	Proprietary hosted product
Bolt.new	Running app with Git-backed version history	Bolt project plus GitHub when connected	Yes, after Git sync	Proprietary hosted product
Uizard	Wireframes, screens, prototypes, handoff assets	Uizard cloud	Partially. Better for human handoff than direct agent loop	Proprietary hosted product
Google Stitch	UI designs plus front-end code	Google Labs cloud	Partially. Exported code is usable, but the tool is not agent-native	Free Google Labs experiment
Subframe	React + Tailwind code, inspectable design surface	Subframe cloud plus synced codebase	Yes, through export and MCP flow	Proprietary hosted product

What developers should optimize for

Before picking a tool, decide which of these workflows you actually want:

Repo file first. The mockup is a file in your codebase. The agent reads it directly.
Git-backed app builder. The tool lives in the cloud, but the code can stay in sync with GitHub.
Design tool with export or bridge. The mockup lives in a design runtime, and code comes later through export, Dev Mode, or MCP.
Prototype-first planning. The output is mainly for humans, not for a coding agent to pick up without translation.

Picking the right workflow shape matters more than ranking the tools by visual polish.

Nimbalyst Mockups

Nimbalyst Mockups generates .mockup.html files that render as live HTML/CSS mockups in the editor and live beside your code in the repo.

What comes out: A real file. Plain HTML/CSS inside a .mockup.html extension.
Why developers like it: A coding agent can open the file directly, reason about it, and implement from it in the same workspace.
Where it is weaker: It is a planning surface, not a pixel-perfect design tool. If you need polished marketing comps, Figma Make or Stitch will look better faster.
Licensing: Nimbalyst's desktop and iOS apps are MIT licensed. The local app is free for individuals.

This is the lowest-friction option if your real goal is "mock it up, then have the agent ship it."
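Because the mockups are ordinary files, anything that can walk the repo can find them. A minimal sketch, assuming only the `.mockup.html` suffix described above; the `find_mockups` helper is a hypothetical name, not part of Nimbalyst.

```python
from pathlib import Path


def find_mockups(repo_root: str) -> list[Path]:
    """List every *.mockup.html file under repo_root, sorted by path."""
    return sorted(Path(repo_root).rglob("*.mockup.html"))


# Print the mockups in the current directory tree, if there are any.
for mockup in find_mockups("."):
    print(mockup)
```

The same property is what lets a coding agent read the mockup with its normal file tools, with no export step in between.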

v0 by Vercel

v0 has moved well beyond "prompt me a component." Current v0 is a Git-aware development platform with an integrated editor, previews, automatic branching, automatic commits, and pull requests.

What comes out: Real app code. v0 is strongest with Next.js, React, Tailwind, and shadcn/ui style workflows.
Why developers like it: It can work on existing GitHub repos, generate production-leaning code, and manage a PR-based workflow without leaving the product.
Where it is weaker: The mockup is no longer the center of the product. v0 is closer to an app builder than a dedicated mockup tool, so it can be heavier than necessary for low-fi planning.
Licensing: Proprietary hosted product.

If you already want Git-backed code and Vercel-style deployment, v0 is one of the strongest choices in the category.

Lovable

Lovable sits in the same broad camp as v0, but leans even harder toward "describe the product, get a working app."

What comes out: A full-stack app project, not just a screen or component.
Why developers like it: It is fast for turning a product idea into something runnable and shareable. GitHub connectivity gives it a stronger ongoing workflow than a one-time export model.
Where it is weaker: It is better for building the app than for maintaining a durable mockup artifact that sits beside the code as a design spec. If your team wants a mockup that stays the source document, Lovable is not the cleanest shape.
Licensing: Proprietary hosted product.

Choose Lovable when the mockup is a waypoint on the way to a working product, rather than a stable design artifact for an external coding agent to keep revisiting.

Figma Make

Figma Make is the strongest option when your team already works in Figma and wants prompt-to-app behavior inside that ecosystem.

What comes out: A Figma Make file that can become a functional prototype or web app. Figma also supports code export paths and broader Dev Mode workflows.
Why developers like it: It fits the design system many product teams already have, and it gives a cleaner bridge to engineering than older static-design handoff models.
Where it is weaker: The canonical artifact still lives in Figma's cloud, not in your repo. An external coding agent can work from exported code or a bridge, but not as directly as it can from a repo file.
Licensing: Proprietary hosted product.

If your organization is already deep in Figma, Make is the default answer. If you want the mockup to be repo-native, look elsewhere.

Bolt.new

Bolt.new is best understood as a browser-native app builder with strong GitHub integration and fast iteration speed.

What comes out: A running app project, with version control tied to GitHub when connected.
Why developers like it: Bolt auto-commits changes and can pull external GitHub updates back in. It is good when you want to see something running immediately.
Where it is weaker: Like Lovable, Bolt is more app-builder than mockup editor. It optimizes for getting something live, not for preserving a dedicated mockup artifact a coding agent keeps consulting.
Licensing: Proprietary hosted product.

Bolt is a good fit if "show me the product working now" matters more than preserving a long-lived design document.

Uizard

Uizard is still one of the clearer tools in the low-fi planning lane: wireframes, flows, quick screen sets, and collaboration with non-designers.

What comes out: Uizard-native screens and prototypes, plus handoff assets and export options.
Why developers like it: It is fast for multi-screen ideation, rough product flows, and collaborative planning with PMs and founders.
Where it is weaker: The core artifact is still a cloud design document, not a repo-native file or an MCP-addressable design surface. That makes it weaker for direct coding-agent loops than the repo-first and Git-backed tools above.
Licensing: Proprietary hosted product.

Uizard is strong when the immediate next step is discussion, iteration, or human review. It is weaker when the next step is "agent, implement this."

Google Stitch

Google Stitch is the most interesting free experiment in the category right now.

What comes out: UI designs and front-end code from text or image prompts.
Why developers like it: Google positions Stitch as a bridge between design and development, with paste-to-Figma and front-end code export built into the story.
Where it is weaker: Stitch is still a Google Labs experiment, so the roadmap and long-term durability are less predictable than with a mature product. It also does not give you the same repo-native or MCP-native loop as Nimbalyst Mockups or Subframe.
Licensing: Free experiment from Google Labs, not an open-source product.

If budget matters and you want to test ideas quickly, Stitch is worth trying. I would be cautious about making it the center of a long-lived team workflow.

Subframe

Subframe is one of the few tools in this category that speaks directly to agent-driven developer workflows.

What comes out: React + Tailwind code, CSS inspection, component syncing, and MCP-oriented prompts.
Why developers like it: Subframe explicitly documents setup for Claude Code, Cursor, and Codex MCP servers. That makes it much more agent-friendly than a typical design tool.
Where it is weaker: The canonical design surface still lives in the vendor's cloud. It is cleaner than Figma for code-minded teams, but it still differs from a plain file in your repo.
Licensing: Proprietary hosted product.

If you want a visual editor an agent can actually work with through an MCP-style flow, Subframe is one of the best current options.

What actually preserves the agent loop

From a developer workflow perspective, the eight tools above collapse into three patterns:

1. Repo-native mockups

Nimbalyst Mockups is the cleanest example. The mockup is just a file in the codebase. The same agent that read the plan can read the mockup and implement from it.

2. Git-backed app builders

v0, Lovable, and Bolt.new all fit here. The tool experience is cloud-native, but the code can live in GitHub and move through a real branching workflow. This is much better than one-shot export, but it is still not the same as a design artifact that lives directly in the repo from the start.

3. Design surfaces with export or MCP bridges

Figma Make, Stitch, Uizard, and Subframe fit here, though Subframe is the strongest for direct agent workflows because it explicitly supports code export and MCP-oriented usage.

If you want the shortest path from "I have an idea" to "my agent can ship this," choose from pattern 1 or the stronger end of patterns 2 and 3.

Which one should you pick?

Pick Nimbalyst Mockups if you want the mockup to live in the repo as a file your coding agent can read directly.
Pick v0 if you want a Git-backed app builder that already thinks in code, branches, and PRs.
Pick Lovable if your main goal is to turn product ideas into working apps quickly.
Pick Figma Make if your team already lives in Figma and wants prompt-to-app inside that world.
Pick Bolt.new if speed to a working app matters more than preserving a dedicated mockup artifact.
Pick Uizard if you want fast low-fi ideation and collaborative flows.
Pick Google Stitch if you want a free experiment that outputs both UI designs and front-end code.
Pick Subframe if you want a visual editor with a stronger MCP and code-export story for agent workflows.

Try the repo-native route

If the pattern you want is "the mockup is a file in the repo, and the agent can implement from it without a separate handoff step," that is exactly what Nimbalyst Mockups is for.

The broader workspace runs Claude Code and Codex side by side, mockups sit next to specs and components in git, and the same agent that drafts the mockup can implement it. If that workflow is what you are optimizing for, download Nimbalyst or browse Nimbalyst Mockups in the extension catalog.

FAQ

Which AI mockup tool gives a coding agent the cleanest handoff?

Nimbalyst Mockups gives the cleanest direct handoff because the mockup is a plain .mockup.html file in your repo. Subframe is also strong because it supports MCP workflows and React plus Tailwind export.

Can Claude Code or Codex work from Figma Make?

Yes, but not as directly as a file in your repo. Figma Make can export code, and Figma's broader Dev Mode and MCP tooling can bridge design data into an agent workflow.

Are any of these AI mockup tools open source?

Nimbalyst is the only open-source product in this list. The desktop and iOS apps are MIT licensed. The other tools here are proprietary cloud products or experiments.

What should developers optimize for when choosing an AI mockup tool?

Optimize for the artifact that comes out the other end: whether the mockup becomes a repo file, a Git-backed codebase, an MCP-addressable design surface, or a cloud-only design document.

Karl Wirth is a founder of Nimbalyst, an open-source visual workspace where you work with agents, sessions, tasks, and files, and edit markdown, mockups, diagrams, diffs, and code. Nimbalyst Mockups generates AI UI mockups that sit as files in your repo.

Best Mobile Apps for OpenAI Codex in 2026

Karl Wirth — Fri, 24 Jul 2026 21:00:00 +0000

On May 14, 2026, OpenAI put Codex into the ChatGPT mobile app. That changed the answer to "codex mobile app" overnight.

If we want to use Codex from a phone in 2026, we now have one official first-party path, one browser fallback, two terminal workarounds, and one native iOS app built around the fact that many teams run Codex and Claude Code side by side. Those options are not interchangeable. Some are great for quick approvals. Some are good enough only if we already live in tmux. Some are best avoided unless we have no other path.

This guide covers the five realistic ways to work with Codex from a phone in mid-2026, what each one actually runs, what the phone can and cannot do, and which teams each option fits. We list our own iOS app inside the lineup, but keep it in the same table and the same standard as everything else.

At a Glance

Option	What actually runs	Best mobile use	Local Codex	Cloud Codex	Claude Code too
ChatGPT mobile app	Codex on a connected laptop, Mac mini, devbox, or remote environment	Official live steering, approvals, check-ins	Yes	Yes	No
Nimbalyst iOS	Codex CLI sessions hosted by Nimbalyst desktop on the Mac	One mobile board for Codex and Claude Code	Yes	No	Yes
Tailscale + SSH	Codex CLI on our own machine	Raw terminal access	Yes	No	Yes, manually
ChatGPT web on mobile	Codex web in a phone browser	Browser fallback when the app is not available	No	Yes	No
Conductor + Tailscale handoff	Codex in a Conductor workspace on a Mac, reached over SSH	Conductor fallback for existing users	Yes	No	Yes

What Mobile Codex Means in 2026

The phrase codex mobile app now covers three different patterns:

The official OpenAI path: ChatGPT mobile connects to machines where Codex is running and lets us steer live work from the phone
The cloud task path: Codex web inside ChatGPT can still run cloud tasks tied to GitHub
The self-hosted path: Codex CLI runs on our own Mac or devbox and we reach it through a third-party mobile surface or plain SSH

A useful mobile workflow still comes down to four things:

Checking live progress without opening the laptop
Answering a clarifying question or approving the next step
Reviewing the diff or at least the result summary
Starting a new task while the idea is still fresh

The five options below cover that ground in very different ways.

The Options

1. ChatGPT Mobile App

Platform: iOS, Android | Price: Included with ChatGPT plans, with usage limits that vary by plan

This is now the default first-party answer for Codex on a phone. OpenAI's current mobile story is not just "cloud tasks in ChatGPT." The app can connect to machines where Codex is already running and load the live state of that environment on the phone.

How Codex actually runs: Codex runs on a connected machine such as a laptop, Mac mini, devbox, or managed remote environment, while the ChatGPT mobile app mirrors the live thread state, approvals, and project context. Codex web inside ChatGPT still exists for cloud tasks, but the mobile app is now the main first-party surface for active work.

What we can do from the phone:

Check active threads across connected hosts
Answer questions and approve the next command or action
Review screenshots, terminal output, test results, and diffs
Start new work from the phone and keep the thread moving while away from the desk

What we cannot do:

See Claude Code sessions on the same board
Get a vendor-agnostic queue across different agent harnesses
Avoid the ChatGPT and Codex account model entirely

Best for: Teams that already live inside the OpenAI stack and want the cleanest first-party mobile path for Codex.

2. Nimbalyst iOS

Platform: iOS native | Price: Free (MIT licensed)

Nimbalyst is an open-source visual workspace that runs Claude Code and OpenAI Codex side by side, with pluggable agent harnesses. The iOS app is a native companion to the desktop, not a web wrapper. We built it because no other tool gave us one mobile board for both Codex and Claude Code sessions.

How Codex actually runs: Codex sessions run on the Mac under the Nimbalyst desktop app, using the official Codex CLI under the hood. The iOS app syncs session state, transcripts, and diffs through the Nimbalyst sync layer so we can see and act on those sessions from a phone.

What we can do from the phone:

See every Codex session across every project on a single kanban board, color-coded by status
Read the full session transcript
Review diffs in a mobile-native viewer with red and green highlights, file-by-file swipe, and zoom
Reply to a session that is waiting for input, by keyboard or dictation
Start a new local Codex task with a project, branch, and prompt
Monitor Claude Code sessions on the same board, with no app switching
Receive push notifications when a session finishes, fails, or asks a question

What we cannot do:

Manage ChatGPT-hosted Codex web tasks today
Replace the desktop for full code editing
Help teams that do not want a Mac host for local Codex sessions

Pricing: The desktop and iOS apps are MIT licensed and free. The optional collaboration server can be self-hosted.

Best for: Teams running both Codex and Claude Code who want one mobile surface for both, with real diff review instead of terminal output.

3. Tailscale Plus SSH in Blink Shell, Termius, or Termux

Platform: iOS, Android | Price: Tailscale free or $5 per user per month for teams, plus the SSH client if paid

This is the direct terminal path. We put Tailscale on the Mac or devbox and on the phone, open a real SSH client such as Blink Shell or Termius, and drive Codex CLI remotely. iSH is possible on iOS, but it is more of a hobbyist workaround than the clean path.

How Codex actually runs: Codex CLI runs on our own machine. Tailscale is the network layer. We still need SSH on the host, either standard SSH over the tailnet or Tailscale SSH enabled on the destination.

What we can do from the phone:

See the Codex CLI live in a terminal
Type commands, answer prompts, kill the session, and reconnect later
Keep the exact same local Codex workflow we already use on desktop

What we cannot do:

Get a good visual diff review experience
Manage many sessions gracefully on one small screen
Survive a dropped connection unless we started the session inside tmux, screen, or similar (the host itself still has to stay awake)
Get push notifications unless we wire them up separately

Best for: Developers who already live in tmux and want the lowest-friction way to keep a local Codex CLI session reachable from a phone.
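
The reconnect-later part of this path depends on running the agent inside a named tmux session on the host. Here is a minimal sketch of the command we would paste into the phone's SSH client, assuming tmux is installed on the host. The host name `devbox` and the session name `codex` are placeholders, not anything Tailscale or Codex defines.

```python
import shlex


def attach_command(host: str, session: str = "codex") -> list[str]:
    """Build an ssh argv that attaches to (or creates) a named tmux session.

    `host` is a hypothetical tailnet name or IP; `session` is an arbitrary label.
    """
    # `tmux new-session -A -s NAME` attaches if NAME exists, otherwise creates it.
    remote = f"tmux new-session -A -s {shlex.quote(session)}"
    # `-t` forces a pseudo-terminal, which tmux needs when run as a remote command.
    return ["ssh", "-t", host, remote]


if __name__ == "__main__":
    # Print the command so it can be pasted into a mobile SSH client.
    print(shlex.join(attach_command("devbox")))
```

Because the same command both creates and reattaches, a dropped mobile connection only costs us a reconnect. Whatever is running inside the tmux session keeps going as long as the host stays awake.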

4. ChatGPT Web on a Mobile Browser

Platform: Any mobile browser | Price: Same as the underlying ChatGPT plan

This is the fallback if we cannot or do not want to use the native ChatGPT app. If we already use Codex web in ChatGPT, the phone browser gives us a usable but weaker version of that same surface.

How Codex actually runs: For cloud tasks, Codex runs in OpenAI's hosted environment tied to ChatGPT and GitHub. The browser is just the client surface. In practice this is the mobile browser version of Codex web, not a separate product.

What we can do from the phone browser:

Open Codex web from the same ChatGPT account
Check recent tasks and read summaries
Start lightweight new tasks
Jump out to GitHub when the next step is PR review

What we cannot do:

Match the native mobile app on ergonomics
Depend on a polished phone UI for long review sessions
Treat it as the best path if the ChatGPT app is available

Best for: Quick checks from a borrowed phone, tablet browser, or locked-down device where installing the app is not an option.

5. Conductor With a Tailscale Handoff

Platform: macOS desktop, with SSH over Tailscale as the mobile bridge | Price: Conductor is free today, plus Tailscale if we use it

Conductor is a Mac app for running Codex and Claude Code in isolated workspaces. It does not have a native mobile app. If we want phone access today, the practical path is to reach the underlying Mac over SSH and work from there.

How Codex actually runs: Conductor runs agents locally on the Mac inside its isolated workspaces. The phone does not get the Conductor UI. It gets whatever terminal access we have to that host.

What we can do from the phone:

SSH into the Mac that hosts the Conductor workspace
Inspect the workspace from the terminal
Resume or steer the underlying Codex session if the terminal workflow is already in place

What we cannot do:

See Conductor's native diff viewer on the phone
Get a first-party mobile UI from Conductor
Turn Conductor into a great phone experience without adding our own terminal habits and scripts

Best for: Conductor users who want any phone access at all and are willing to live in a terminal to get it.

Comparison Table

Feature	Nimbalyst iOS	ChatGPT App	Tailscale + SSH	ChatGPT Web	Conductor + SSH
Native mobile app	Yes	Yes	No	No	No
Runs local Codex sessions	Via desktop sync	Yes	Yes	No	Yes
Manages cloud Codex tasks	No	Yes	No	Yes	No
Visual diff review on phone	Yes	Yes	No	Limited	No
Reply to a waiting session	Yes (chat)	Yes	Yes (terminal)	Limited	Yes (terminal)
Start new sessions from phone	Yes	Yes	Yes (manual)	Yes	Yes (manual)
Push notifications	Yes (per session)	App-level	No	No	No
Handles Claude Code too	Yes	No	Yes (manually)	No	Yes
Host must stay on for local work	Yes	Yes	Yes	n/a for cloud	Yes
Open source	Yes (MIT apps)	No	Partial	No	No
Price	Free	Included with plan	Free to low-cost	Included with plan	Free + Tailscale

FAQ

Is there an official Codex mobile app?

Yes. As of May 14, 2026, Codex is in preview inside the ChatGPT mobile app on iOS and Android. That is now the official first-party mobile path.

Can we run Codex CLI directly on an iPhone?

Not in the way most people mean it. In practice, we run Codex on a Mac, devbox, or remote environment and control or monitor it from the phone through ChatGPT mobile, SSH, or a third-party mobile surface.

What is the best Codex mobile app if we also use Claude Code?

If we want one mobile surface for both, Nimbalyst iOS is the strongest fit in this list. The official ChatGPT app is stronger for pure Codex, but it does not help with Claude Code mobile workflows.

Is Claude Code mobile the same thing as Codex mobile?

No. Claude Code mobile and Codex mobile are different product ecosystems. The overlap happens in third-party tools and terminal workflows, not in the official first-party apps.

How to Choose

If we want the official Codex path with the fewest moving parts: use the ChatGPT mobile app
If we run both Codex and Claude Code and want one board for everything: use Nimbalyst iOS
If we already live in tmux and care more about raw access than mobile UX: use Tailscale plus a real SSH client
If we only need a browser fallback: use ChatGPT web on mobile and keep expectations low
If we already organize agent work in Conductor: treat SSH over Tailscale as the bridge, not as a polished mobile product

Mobile matters because longer-running agent work creates more judgment checkpoints away from the desk. If we can answer a question, approve a diff, or redirect a task from the phone, the work keeps moving. If not, every coffee run and commute becomes dead time.

Download the Nimbalyst iOS app to manage Codex and Claude Code sessions from a phone, and the Nimbalyst desktop app to run them on the Mac.

Claude Code Desktop vs Nimbalyst after using both for a month

Karl Wirth — Fri, 24 Jul 2026 19:00:00 +0000

Thirty days, two apps, one answer

For the last month we used Claude Code Desktop and Nimbalyst side by side on the same repos, for the same kind of work, on the same machines. We expected to settle on one. We did not.

That surprised us.

Anthropic's desktop app is now much better than the caricature people had in mind a few months ago. The current desktop app gives you parallel Code sessions with git worktree isolation, a built-in terminal, file editor, visual diff review, side chats, remote sessions, and Computer Use. It is a real product.

If you searched for a Claude Code Desktop alternative, the short answer after a month is:

Pick Claude Code Desktop if you mostly want the best official surface for one coding session at a time, the newest Anthropic features first, or Computer Use.
Pick Nimbalyst if your bottleneck is managing many agent sessions across code, specs, mockups, diagrams, and reviews.
Keep both installed if your week contains both of those shapes.

We landed on the third answer.

What we are actually comparing

Claude Desktop now has three tabs: Chat, Cowork, and Code. For developers, the relevant comparison is mostly against the Code tab, with Cowork mattering in two places: Dispatch can kick off coding work, and Computer Use spans the broader desktop experience.

As of June 1, 2026, Dispatch lives in Cowork rather than the Code tab directly, and Anthropic limits Dispatch to Pro and Max plans. So it is relevant here, but not identical to the day-to-day Code workflow.

Picking the right comparison matters because the wrong one makes the whole article mushy.

If the question is "does Anthropic now have a real first-party GUI for Claude Code?" the answer is clearly yes.

If the question is "does that eliminate the need for a workspace around coding agents?" my answer after using both is no.

Nimbalyst sets out to be more than a nicer skin on top of Claude Code. A clearer description: Nimbalyst is the open-source visual workspace for running Claude Code and Codex side by side. Agents, sessions, tasks, and files in one place. You can edit markdown, mockups, diagrams, diffs, and code. The desktop and iOS apps are MIT licensed. Our product thesis is that once you are running multiple agents and working across more than source files, the unit of work stops being a chat thread and starts being a session in a workspace.

Two different jobs, two different products.

Where Claude Code Desktop wins

Claude Code Desktop has three advantages worth calling out.

1. It is the official surface

Anthropic ships Claude Code, so Anthropic's app gets new Claude Code capabilities first. If a new model, tool, plugin path, or workflow primitive ships, the official app is the shortest path to using it. Third-party tools can catch up quickly, but quickly is still later than day one.

There is also a trust advantage in the official path. The desktop app runs the same underlying Claude Code engine with a graphical interface around it. If you want the least translated, least mediated path into Claude Code, the official app wins that category.

2. It is now very good at focused coding work

Older comparisons understate this part.

Claude Code Desktop is no longer a one-session toy. Anthropic now supports multiple Code sessions in parallel, and for git repos each session gets its own isolated worktree by default. A lot of the file-stomping pain that used to make desktop wrappers feel lightweight has been removed.
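
For readers who have not used git worktrees, here is a rough sketch of what per-session isolation means at the git level. It illustrates the general technique and is not a description of how Anthropic's app does it. The sibling `-worktrees` directory and the `agent/` branch prefix are made-up conventions for this example.

```python
from pathlib import Path


def worktree_command(repo: Path, session_id: str) -> list[str]:
    """Build a `git worktree add` argv giving one session its own checkout and branch."""
    # Hypothetical layout: worktrees live in a sibling directory next to the repo.
    path = repo.parent / f"{repo.name}-worktrees" / session_id
    branch = f"agent/{session_id}"
    # `-C repo` runs git inside that repo; `-b` creates a new branch for the worktree.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]
```

Each session then edits files in its own directory on its own branch, so two agents working in parallel cannot overwrite each other's uncommitted changes.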

For one developer doing one concentrated piece of engineering work, the app feels good. The built-in terminal matters. The file editor matters. Side chats are smarter than they sound. Visual diff review matters. Remote sessions matter once a test suite or migration runs long enough that you do not want it tied to your laptop.

If your workflow is mostly "open one repo, push one task hard, stay inside Claude Code," Anthropic has built a serious answer.

3. Computer Use is strongest here

Computer Use is the biggest reason we would not uninstall Claude Desktop.

As of June 1, 2026, Anthropic offers Computer Use inside Claude Desktop on macOS and Windows, in research preview for Pro and Max plans. Other apps can build on Anthropic's Computer Use APIs, but Anthropic has the most integrated no-setup path today.

When you need an agent to drive a browser, click through an internal tool, open a simulator, or operate software with no API worth talking to, the official desktop app has a real edge.

Where Nimbalyst is better for our actual week

Once Claude Desktop became a credible multi-session coding app, the Nimbalyst case got narrower and clearer.

Our stronger claim is simpler: the workspace problems start after the coding session.

1. Session management over time, not just in the moment

Anthropic has already fixed a lot of the old "open another terminal" pain, which counts as progress. What we still hit on busy weeks is the question of state across twelve sessions and three projects, and which of them need attention first.

Nimbalyst's kanban is built around that question. Each session is a work item with a phase (planning, implementing, or validating), context, transcript, files, and links to the surrounding work. It sounds almost trivial until you have enough agent output that you cannot hold it all in your head anymore.

Claude Desktop helps us run sessions. Nimbalyst helps us manage the inventory of sessions.

We saw that distinction stay true all month.
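
To make "inventory of sessions" concrete, here is a small sketch of a session as a work item and the "which ones need attention first" question as a sort. It is an illustration only, not Nimbalyst's actual data model. The field names and the `waiting_for_input` flag are assumptions for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    IMPLEMENTING = "implementing"
    VALIDATING = "validating"


@dataclass
class Session:
    title: str
    project: str
    phase: Phase
    waiting_for_input: bool = False  # hypothetical flag: the agent asked a question
    files: list[str] = field(default_factory=list)


def needs_attention_first(sessions: list[Session]) -> list[Session]:
    """Order sessions so blocked ones come first, then by project and title."""
    return sorted(sessions, key=lambda s: (not s.waiting_for_input, s.project, s.title))


def board(sessions: list[Session]) -> dict[Phase, list[str]]:
    """Group session titles into kanban columns, one per phase."""
    columns: dict[Phase, list[str]] = {phase: [] for phase in Phase}
    for s in sessions:
        columns[s.phase].append(s.title)
    return columns
```

With twelve sessions across three projects, the sorted list answers "what is blocked on me" and the grouped view answers "where is everything," which are the two questions a chat-thread list does not answer on its own.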

2. The work is not just code

A lot of AI coding commentary still assumes the artifact is always source code plus a diff.

Our weeks do not look like that anymore.

A feature usually starts with a markdown plan, then a mockup, then a diagram, then maybe a schema change, then the code, then a review pass. Claude Desktop is much better once code exists. Nimbalyst is better when the work spans mockups, diagrams, data models, spreadsheets, markdown, and code in the same place.

Here is where the lived difference shows up: the agent can work with the actual artifact, in the same workspace, without us bouncing between four tools and restitching context through prompts.

It changes how much setup each task needs.

3. Claude Code and Codex in one workspace

The canonical Nimbalyst line matters here because it is exactly the point. Nimbalyst is the open-source visual workspace for running Claude Code and Codex side by side.

Some tasks are better in Claude Code. Some are better in Codex. Sometimes we want a careful second read from one agent on work the other one produced. Sometimes we want the same plan run by both.

The official Anthropic app should be Claude-first. Anthropic's job is to be the best home for Claude. Nimbalyst is useful because the workspace stays useful even when the best agent for the task changes.

4. Mobile matters more than people admit

We do not want to write production code from a phone.

We do want to approve a change, answer a blocked question, read the last transcript output, or check whether a session finished while we were away from the desk.

A native mobile companion fills exactly that kind of low-friction gap. It is not glamorous, but it changes how often you can keep parallel work moving.

A week with both installed

Here is the concrete split that showed up for us.

Monday: We opened Nimbalyst first. Three sessions went onto the board: one code refactor, one planning/spec pass, one visual artifact task. The kind of morning where the workspace matters more than the individual chat.

Tuesday: We opened Claude Desktop for a deep architectural question we wanted to keep inside one long coding thread. Not about orchestration. About depth. The official app was the right tool.

Later Tuesday: We needed an agent to navigate software with no API, click through a settings flow, export a config, and bring it back. Claude Desktop handled that through Computer Use. That alone justifies keeping it around.

Wednesday and Thursday: Back in Nimbalyst. Multiple parallel sessions, several artifacts open at once, one careful second-pass review in Codex, one long-running implementation in Claude Code. The board, the visual editors, and the mixed-agent setup compound here.

Friday: We were away from the desk and still wanted to keep work moving. A Nimbalyst-shaped moment. Read the output, approve one thing, send another session back with a correction, move on.

The pattern ended up being stable.

Claude Desktop handled the official first-party path, deep single-thread work, and Computer Use.

Nimbalyst handled the workspace.

Cost and pricing, stated carefully

Pricing here is not a "buy one app or the other" question.

Claude Desktop is free to download, but meaningful Claude Code and Cowork usage sits on paid Claude plans. As of June 1, 2026, Anthropic lists Pro at $20 per month and Max at $100 or $200 per month for individual users, with Team and Enterprise sold separately. Dispatch and the current Computer Use preview also have plan-specific limits.

Nimbalyst's desktop app is free and open source. You bring whatever Claude plan or API access you already use for the underlying agent.

So the practical cost question is less about app price and more about whether a workspace layer saves you enough coordination overhead to matter.

For us, it does.

If you want a Claude Code Desktop alternative

Our actual answer is narrower than "switch."

If you want the best official way to run Claude Code with the newest Anthropic features and the strongest Computer Use story, use Claude Code Desktop.

If you want a workspace for running many agent sessions, across code and non-code artifacts, with Claude Code and Codex side by side, use Nimbalyst.

If your work contains both deep single-thread coding and board-level orchestration, run both. The two products cover different layers of the stack, and using them together is a normal pattern rather than a sign of indecision.

A month in, we ended up there.

If you want the workspace layer, download Nimbalyst and run it next to Claude Code Desktop for a week. The split shows up quickly.

FAQ

Is Nimbalyst a good Claude Code Desktop alternative?

Yes, if your pain is not 'I need a GUI for Claude Code' but 'I need a workspace for multiple agent sessions, visual artifacts, and planning.' Claude Code Desktop is stronger as the official first-party app. Nimbalyst is stronger when you need a workspace layer above Claude Code.

When should I use Claude Code Desktop instead of Nimbalyst?

Use Claude Code Desktop when you want Anthropic's official surface, the newest Claude Code features first, or Computer Use in the most integrated setup. It is especially strong for deep single-thread coding work.

Can I use Claude Code Desktop and Nimbalyst together?

Yes. We landed on that setup. Claude Code Desktop handles the official Anthropic path, deep single-session work, and Computer Use. Nimbalyst handles parallel sessions, visual editors, planning, and mobile review.
