David Van Assche (S.L)

Posted on Jun 23 • Edited on Jun 24

Top AI Coding CLIs of 2026 (mid year): the harness is everything

#ai #opensource #coding #cli

The May piece (Every AI Coding CLI in 2026: the complete map, 30 tools compared) sorted tools by pricing and openness. That is one axis. It is not the axis a reader actually cares about when they open a terminal and need to pick something. The axis that matters is surface affordance: what shape of work am I trying to do, and what shape of tool meets that shape?

This follow-up reorganises around that question, narrows to engineers using terminals, and refreshes everything that changed in 30 days. The Desktop / Cloud Web / Cloud Agent piece (for non-technical builders) is the companion article.

Two framing observations before the lists.

First: the wars are over and the wire format is settled. Every serious CLI harness in mid-2026 accepts at least one of OpenAI-compatible or Anthropic-Messages endpoints. The "which protocol" question stopped mattering. The "which provider, which model, at what price" question is the live one.

Second: "free" almost always means free-software-plus-paid-tokens. The CLI is open source and free; the API tokens it spends are not. There are still a handful of genuinely-free-with-a-real-model offerings, but the centre of gravity moved.

A clean four-way taxonomy

After two articles' worth of poking at this, the cleanest split is four categories. Mutually exclusive, collectively exhaustive, no overlap arguments:

Category	Where you work	Examples
CLI	A terminal	Claude Code, Codex CLI, Aider, OpenCode
Desktop	A native IDE or extension	Cursor, Devin Desktop, Zed, Continue
Cloud Web	A browser tab	claude.ai, ChatGPT Canvas, Gemini Canvas
Cloud Agent	Async, you hand off a task	Devin, Replit Agent, Bolt, Lovable

This article is about the first column. The rest live in the companion piece.

A few tools span two surfaces. When they do, place them by primary user surface (where the person actually sits) and note the secondary as an architectural property, not a separate product.

What changed in 30 days

The AI coding tooling landscape moves fast enough that any month-old map needs a delta page.

Retired, sunsetted, or in EOL mode.

Phind shut down January 16, 2026, citing "commoditisation by foundation-model providers". The cleanest single example of the thesis.
Gemini CLI sunsets paid auth June 18, 2026. Replaced by Antigravity CLI (GA since May 19 at I/O).
Roo Code announced shutdown April 21, archived May. Users migrate to Kilo Code.
Cascade (Windsurf's agent) end-of-life July 1, 2026.
iFlow CLI sunset announced April 17, 2026. Verify before recommending.
Amazon Q Developer: new signups blocked from May 15, 2026. Existing seats only. Effectively EOL.

Rebranded.

Windsurf is now Devin Desktop (June 2, 2026), following Cognition's $250M acquisition.
Mistral's Le Chat is now Vibe (May 28, 2026). The Vibe CLI shipped its 2.0 in January.
Goose moved from Block to the Linux Foundation Agentic AI Foundation in April 2026. New repo at aaif-goose/goose. Foundation-governed vendor neutrality, not just code-level neutrality.

Pricing churn worth noting.

Qwen Code's free OAuth retired April 15, 2026. Cheapest paths today: local Qwen3-Coder on Ollama (~46GB at 4-bit), OpenRouter's qwen/qwen3-coder:free rate-limited tier ($10 one-time buys 1k req/day), or Alibaba's $50/mo ModelStudio Coding Plan (about 90k req/mo). The CLI is still Apache-2.0; you just cannot run it free against Alibaba any more.
Devin's enterprise-only $500/mo floor collapsed in April 2026. Now $20/mo + $2.25 per ACU. Biggest pricing event of the year for cloud agents (covered in the companion piece).
GitHub Copilot moved to usage-based AI Credits on June 1, 2026. Free CLI now token-metered.
DeepSeek V3 / R1 deprecate July 24, 2026, replaced by V4 (Pro + Flash, April 2026). Anything on the original article that referenced V3 pricing is already obsolete.

If you are working from a one-month-old map, this is your patch.

CLIs in mid-2026: free vs paid

I am splitting by tier because at the terminal you mostly care about two things: how much it costs, and whether you can point it at your own model. For each, I am noting the third-party API support and the local-inference path (Ollama, LM Studio, llama.cpp, vLLM), because in mid-2026 that combination is what defines a flexible harness.

Free CLI, BYO tokens or local inference

These are open-source or free-tier CLIs. The tool is free; what you pay for is the model behind it. All of them accept OpenAI-compatible endpoints and most also accept Anthropic-Messages, which means you can route them at any provider whose API speaks one of those two protocols (including DeepSeek, Qwen via OpenRouter, GLM/Z.AI, Kimi, Mistral, plus any local Ollama / LM Studio / vLLM endpoint).

Tool	What it is	Model-specific tilt	Note
Aider	Apache-2.0, git-native pair programming. Repomap + auto-commit.	None (LiteLLM under the hood)	Mature, ~39k stars
Goose	Foundation-governed (LF AAIF since April 2026). 15+ providers.	None	~29k stars
OpenCode (sst)	Apache-2.0; 75+ providers via Vercel AI SDK + Models.dev registry	None	~172k stars, the breakout
Crush (Charm)	Apache-2.0. Mid-session model switching. TUI polish.	None	Active
Codex CLI (OpenAI)	Free if you have any ChatGPT plan (uses plan quota); `--oss` flag for local Ollama; BYOK at API rates otherwise	OpenAI-tilted, `--oss` opens it up	April 2026 billing → token credits
Antigravity CLI (Google)	Free tier via Google account	Strongly Gemini	Replaced Gemini CLI on May 19
Qwen Code (Alibaba)	Apache-2.0 CLI; free OAuth retired April 15, 2026. Now: OpenRouter free-rate, Fireworks, DashScope, local Ollama, or paid Alibaba plan	Qwen3-Coder	Active, tier change
Mistral Vibe CLI	Apache-2.0 CLI. Devstral 2 free at launch (planned $0.40/$2.00 per 1M tokens)	Devstral 2 / Mistral	Vibe 2.0 January 2026
Kimi Code CLI (Moonshot)	OSS CLI; Modified MIT model weights; BYOK to Kimi API	Kimi K2 family	K2.7-Code shipped June 12
Hermes Agent (Nous Research)	OSS. Native CLI + TUI + desktop app. "Agent that grows with you."	None — multi-provider, Llama 4 local documented	Blank Slate mode June 20
DeepSeek-TUI / Deep Code	Community projects, no first-party CLI from DeepSeek	DeepSeek V4	Active
sgpt (shell-gpt)	OSS. One-shot shell-native. LiteLLM bridge to everything.	None	Last update May 2026
OpenClaw	OSS BYOK. Kilo Gateway aggregator option (500+ models at 0% markup).	None	Active
Kilo CLI	MIT CLI sibling of Kilo Code extension	None	Active

A note on what "model-specific tilt" means: harnesses like Qwen Code, Kimi Code CLI, Mistral Vibe, and Antigravity are purpose-built around one family of models, with provider configs and prompt formatting tuned to that family. They will run other models via OpenAI-compatible endpoints, but their default registry and best-tested path is the family they ship for. The model-agnostic ones (Aider, Goose, OpenCode, Crush, Hermes, OpenClaw, sgpt) treat providers as interchangeable from day one.

A note on what is genuinely free at the model level (not just at the CLI level): Codex CLI on ChatGPT Free tier, Antigravity CLI on Google's free Gemini tier, GitHub Copilot Free plan (limited credits), Amp by Sourcegraph (free while the ad-supported model is in test), OpenRouter's free Qwen3-Coder route (rate-limited). Everything else is BYOK and you pay for the tokens.

A note on what is missing first-party: DeepSeek has no first-party CLI (Deep Code and DeepSeek-TUI are community). Zhipu/GLM ships day-one integrations into Claude Code / Cline / Goose / OpenCode / Crush / Kilo rather than its own CLI. Meta has no Llama-native CLI; Hermes Agent is the de-facto Llama-4 host. gpt-oss lives inside Codex CLI's --oss mode and every BYOK harness via Ollama.

Paid-subscription CLIs

These require an active subscription to use meaningfully. No real free tier for daily-driver work.

Tool	Pricing	Provider matrix	Local	Tilt
Claude Code (Anthropic)	Bundled into Claude paid plans, or BYOK at Anthropic API rates. No free CLI usage.	Anthropic-Messages only on managed; subagent/extension layer is the workaround	No first-party local	Strongly Claude
GitHub Copilot CLI	Free plan exists but CLI consumes AI Credits (1 credit = $0.01 since June 1, 2026); Pro $10, Pro+ $39, Max $100 per month	Claude Code + OpenAI Codex wired as third-party agents inside Copilot	No	Multi-model
Amp (Sourcegraph)	Currently free during ad-model test; Enterprise tier exists	Claude Opus 4.7 + others via Sourcegraph	No	None
Alibaba ModelStudio Coding Plan	$50/month, ~90k req/mo, 6k per rolling 5-hour window	Replaces the retired Qwen Code free tier	N/A	Qwen3-Coder

Routers and proxies

Worth knowing about because most of the free-tier CLIs above can be pointed at one of these as a single aggregated provider.

OpenRouter — multi-provider aggregator with one API surface. Paid per-token. Free routes available for some models (Qwen3-Coder, Hermes 3 405B).
Together AI / Fireworks / DeepInfra / Groq — US-jurisdiction hosters serving Chinese and Meta open weights. Useful when you want capability without the data-residency tradeoff.
Kilo Gateway — 500+ models at 0% markup. Pairs cleanly with OpenClaw and Kilo CLI.
9router, CLIProxyAPI — OSS self-hosted routers if you want to manage your own bills and rate limits.

Where does the tool stop

A second axis worth tracking is not just what tool, but where does it stop:

suggest → edit → commit → PR → deploy → live app

Your choice is partly about how far you want automation to terminate. Aider stops at edit. Claude Code stops at PR. Codex CLI stops at PR (sandboxed). Devin stops at deploy. Bolt stops at live app.

Pick by where you want the handoff back to the human, not just by capability.

Open-weights cloud API pricing (June 2026 snapshot)

If you are pointing an OSS CLI at a cloud model, here is the current landscape. All prices USD per million tokens. Verified against official provider pages on 2026-06-21.

Model	Provider	In	Out	Cached In	Context	License
DeepSeek V4-Flash	api.deepseek.com	0.14	0.28	0.0028	1M	MIT
DeepSeek V4-Pro	api.deepseek.com	0.435	0.87	0.003625	1M	MIT
GLM-4.6	Z.AI	0.60	2.20	0.11	200K	MIT
GLM-4.5-Air	Z.AI	0.20	1.10	0.03	128K	MIT
Qwen3-Max (Intl)	Alibaba Model Studio	1.20	6.00	tiered	252K	Apache 2.0
Qwen3-Coder-Plus	Alibaba Intl	1.00	5.00	tiered	1M	Apache 2.0
Kimi K2.6	platform.kimi.ai	0.95	4.00	0.16	200K	Modified MIT
MiniMax M2.7	MiniMax PayGo	0.30	1.20	0.06	197K	MIT
Mistral Large 3	mistral.ai	0.50	1.50	n/d	128K	Apache 2.0
Mistral Small 4	mistral.ai	0.10	0.30	n/d	128K	Apache 2.0
Devstral 2	mistral.ai	0.40	2.00	n/d	128K	Apache 2.0
Llama 4 Maverick	DeepInfra	0.15	0.60	n/d	1M	Llama 4 Community
Llama 4 Scout	Together AI	0.08	0.30	n/d	10M	Llama 4 Community
gpt-oss-120B	Groq	0.15	0.60	n/d	128K	Apache 2.0
gpt-oss-20B	Groq	0.075	0.30	n/d	128K	Apache 2.0

Three things to notice.

First, DeepSeek V4-Flash at $0.14 per million input tokens is the floor for credible agentic coding. Anything cheaper is a smaller-capability tier. The May article's pricing data was V3-era and is already obsolete.

Second, open-weights API pricing dropped roughly 80% year-on-year (early 2025 to early 2026, per inference.net's cross-provider analysis). Chinese providers drove the floor; US hosters competed on serving speed rather than on the models themselves.

Third, tool-use is now universal on this list. By mid-2026, function-calling stopped being a differentiator. Context windows similarly inflated: 128K is the floor, 200K is normal, frontier pushes to 1M and 10M.

One pitfall worth flagging: Qwen's tiered billing. Alibaba bills the whole request at the tier set by the input-token count. A coding agent that swells context mid-conversation can jump from $1.00/Mtok to $6.00/Mtok input in a single step. Worth a footnote in your config.

Sovereignty: EU, USA, World

This is the political part of the picture that does not go away by pretending it is not there. Compliance teams ask about it. Procurement asks about it. So:

China-based providers (DeepSeek, Z.AI/Zhipu, Alibaba/Qwen, Moonshot/Kimi, MiniMax, 01.AI) dominate two ends at once: the cheapest credible frontier and the largest concentration of permissively-licensed flagship weights. Mostly MIT or Apache 2.0 across the board. Data residency depends on which endpoint you call; Singapore international endpoints sit outside mainland jurisdiction, mainland endpoints don't.

US-based hosters (Together, Fireworks, Groq, DeepInfra, OpenRouter) mostly don't own a model. They serve Chinese open weights at US-jurisdiction inference cost. Useful if you want Chinese-trained capability but US-or-EU data residency. The single US-trained open frontier family is Meta's Llama 4 under the Community License (open, but not OSI-approved). OpenAI's gpt-oss-120B/20B (Apache 2.0) is the only fully-permissive US flagship-tier open release.

EU-based. Mistral. That is the list. The only sovereign EU frontier-capable open-weights stack. GDPR-native, La Plateforme runs in EU DCs. Slightly more expensive at the high end than Chinese equivalents, dramatically cheaper than US closed frontier. Codestral, Devstral, Magistral Small, Mistral Large 3, Mistral Small 4 are all Apache 2.0. Magistral Medium (reasoning) is the only closed/premier model in the lineup.

If your compliance position is "no customer data leaves EU jurisdiction" the practical answer in mid-2026 is: self-host open weights on EU hardware, or use Mistral. That is the entire shortlist.

Ecodex: the calibration-first CLI

Disclosure first: I work on Empirica, and Ecodex is Empirica's CLI harness. Alpha, daily-driven by the team, opinionated and based on the Empirica system for Claude.

I am including it because it competes on an axis the rest of this article does not cover. Every CLI above is competing on the same thing: better edits, better context, better tool-use. Ecodex competes on metacognition and governance: it is a coding CLI that is accountable for what it claims to know.

The shape, briefly.

Ecodex is a fork of openai/codex bundled with the Empirica epistemic-discipline framework. It does two things stock codex does not.

Per-action enforcement. A Sentinel firewall sits between the model and the tools. State-changing tools (Edit, Write, Bash on non-read commands) require an open transaction with a passed CHECK gate. Investigation tools (Read, Grep, Glob) flow freely until a hypothesis-bearing prompt arms an investigation-proportionality budget. The agent literally cannot edit a file without first declaring what it knows and passing the gate. The block is not silently dropped; the sentinel emits an explicit permissionDecision: deny and codex honours it.
Per-transaction calibration. Every unit of work opens with a PREFLIGHT (the agent declares thirteen calibration vectors representing its current epistemic state) and closes with a POSTFLIGHT (the same vectors re-declared, then grounded against deterministic services like test results, git metrics, artifact counts). The divergence between what the agent claimed and what actually happened gets recorded. Over time, the divergence becomes a signal you can act on.

Out of the box it ships curated open-weights provider defaults: DeepSeek, Qwen3-Coder, Kimi K2.6, GLM, Mistral (Devstral 2 for agentic coding, Codestral for completion, EU-hosted at api.mistral.ai/v1, shipped by default as of commit c2457d0d6e), and local routes via Ollama, LM Studio, llama.cpp, vLLM. Hot-swap mid-session via /model, no restart.

That Mistral default is worth pausing on if your stack has an EU compliance constraint. It makes Ecodex the only harness in this comparison set that ships an EU-hosted cloud provider as a first-class pick and lets you self-host the same open weights (Devstral is open weights) on your own EU hardware via vLLM or Ollama. Code never leaves the EU on either path.

The 30-second moment, if you want to see the differentiator without reading more:

$ ecodex
> /model        # pick DeepSeek V4-Flash, or local Qwen3-Coder via Ollama
> fix the off-by-one in parse_range in utils.py

The statusline shows the live phase (noetic, then praxic) and an intuition-vs-search badge. The agent reads and greps freely. If it attempts to Edit before grounding, the Sentinel blocks the call with a visible reason ("praxic tool requires CHECK=proceed"). It investigates more, passes CHECK, makes the fix, runs the tests, and at POSTFLIGHT prints the grounded delta. You see belief measured against outcome.

Install paths:

brew install nubaeon/tap/ecodex
# or: cargo install --git https://github.com/Nubaeon/ecodex codex-cli
# or: direct binary from https://github.com/Nubaeon/ecodex/releases/latest

It is alpha. It is opinionated. The discipline overhead is the point. If you do not want a CLI that argues with you about whether you have done enough investigation, this is not the CLI for you. If you do, it is the only one I am aware of that builds that discipline into the harness rather than asking you to remember to do it yourself.

Source: github.com/EmpiricaAI/ecodex. The compliance crosswalk (mapping the substrate to EU AI Act, GDPR, ISO 42001) is at docs/ecodex/positioning/compliance-crosswalk.md, relevant if your stack has a regulatory anchor.

What I would actually use

Three reads for three working contexts. Not the only right answers; just where I would start in mid-2026.

Serious work, paid, closed weights. Claude Code for the reasoning model, Codex CLI for the sandbox. Switch between them per task shape.
Open weights, BYO model. OpenCode for the breadth (75+ providers), Aider for git-native discipline. Goose if you specifically want foundation-governed vendor neutrality.
Open weights with accountability built in. Ecodex, with the caveats above. The category-of-one for now.

What I am not saying

That the duopoly is going away. Claude Code and Codex CLI are not getting cheaper; they are getting better, and the closed-frontier reasoning capability gap is still real.
That open weights are at parity with closed frontier on every task. They are close on coding-specific benchmarks, behind on certain reasoning shapes, and the gap is narrowing fast.
That you should pick by sovereignty before capability. Pick by capability for the task; reach for sovereignty when the compliance position requires it.

The honest summary is: the floor is much lower than it was a year ago, the open tier is genuinely usable for daily work, and the most interesting category to watch is the one that does not compete on "better edits" but on "accountable edits". Reader's pick.

The Desktop / Cloud Web / Cloud Agent companion is the next post in this series. It covers Cursor, the Windsurf-to-Devin-Desktop rebrand, Cloud Agent's price collapse, and the builder versus engineer split.

Comments and corrections welcome. Especially: tools I should have included, pricing I got wrong, or sovereignty framings I am missing.

Series: AI Coding Harness Map (2026). Pricing verified against official provider pages on 2026-06-21.

DEV Community