DEV Community

Cover image for Coding CLIs in mid-2026: the engineer's map (and what changed in 30 days)
David Van Assche (S.L)
David Van Assche (S.L)

Posted on

Coding CLIs in mid-2026: the engineer's map (and what changed in 30 days)

The May piece (Every AI Coding CLI in 2026: the complete map, 30 tools compared) sorted tools by pricing and openness. That is one axis. It is not the axis a reader actually cares about when they open a terminal and need to pick something. The axis that matters is surface affordance: what shape of work am I trying to do, and what shape of tool meets that shape?

This follow-up reorganises around that question, narrows to engineers using terminals, and refreshes everything that changed in 30 days. The Desktop / Cloud Web / Cloud Agent piece (for non-technical builders) is the companion article.

Two framing observations before the lists.

First: the wars are over and the wire format is settled. Every serious CLI harness in mid-2026 accepts at least one of OpenAI-compatible or Anthropic-Messages endpoints. The "which protocol" question stopped mattering. The "which provider, which model, at what price" question is the live one.

Second: "free" almost always means free-software-plus-paid-tokens. The CLI is open source and free; the API tokens it spends are not. There are still a handful of genuinely-free-with-a-real-model offerings, but the centre of gravity moved.


A clean four-way taxonomy

After two articles' worth of poking at this, the cleanest split is four categories. Mutually exclusive, collectively exhaustive, no overlap arguments:

Category Where you work Examples
CLI A terminal Claude Code, Codex CLI, Aider, OpenCode
Desktop A native IDE or extension Cursor, Devin Desktop, Zed, Continue
Cloud Web A browser tab claude.ai, ChatGPT Canvas, Gemini Canvas
Cloud Agent Async, you hand off a task Devin, Replit Agent, Bolt, Lovable

This article is about the first column. The rest live in the companion piece.

A few tools span two surfaces. When they do, place them by primary user surface (where the person actually sits) and note the secondary as an architectural property, not a separate product.


What changed in 30 days

The AI coding tooling landscape moves fast enough that any month-old map needs a delta page.

Retired, sunsetted, or in EOL mode.

  • Phind shut down January 16, 2026, citing "commoditisation by foundation-model providers". The cleanest single example of the thesis.
  • Gemini CLI sunsets paid auth June 18, 2026. Replaced by Antigravity CLI (GA since May 19 at I/O).
  • Roo Code announced shutdown April 21, archived May. Users migrate to Kilo Code.
  • Cascade (Windsurf's agent) end-of-life July 1, 2026.
  • iFlow CLI sunset announced April 17, 2026. Verify before recommending.
  • Amazon Q Developer: new signups blocked from May 15, 2026. Existing seats only. Effectively EOL.

Rebranded.

  • Windsurf is now Devin Desktop (June 2, 2026), following Cognition's $250M acquisition.
  • Mistral's Le Chat is now Vibe (May 28, 2026). The Vibe CLI shipped its 2.0 in January.
  • Goose moved from Block to the Linux Foundation Agentic AI Foundation in April 2026. New repo at aaif-goose/goose. Foundation-governed vendor neutrality, not just code-level neutrality.

Pricing churn worth noting.

  • Qwen Code's free OAuth retired April 15, 2026. Cheapest paths today: local Qwen3-Coder on Ollama (~46GB at 4-bit), OpenRouter's qwen/qwen3-coder:free rate-limited tier ($10 one-time buys 1k req/day), or Alibaba's $50/mo ModelStudio Coding Plan (about 90k req/mo). The CLI is still Apache-2.0; you just cannot run it free against Alibaba any more.
  • Devin's enterprise-only $500/mo floor collapsed in April 2026. Now $20/mo + $2.25 per ACU. Biggest pricing event of the year for cloud agents (covered in the companion piece).
  • GitHub Copilot moved to usage-based AI Credits on June 1, 2026. Free CLI now token-metered.
  • DeepSeek V3 / R1 deprecate July 24, 2026, replaced by V4 (Pro + Flash, April 2026). Anything on the original article that referenced V3 pricing is already obsolete.

If you are working from a one-month-old map, this is your patch.


CLIs in mid-2026: free vs paid

I am splitting by tier because at the terminal you mostly care about two things: how much it costs, and whether you can point it at your own model. For each, I am noting the third-party API support and the local-inference path (Ollama, LM Studio, llama.cpp, vLLM), because in mid-2026 that combination is what defines a flexible harness.

Free CLI, BYO tokens or local inference

These are open-source or free-tier CLIs. The tool is free; what you pay for is the model behind it. All of them accept OpenAI-compatible endpoints and most also accept Anthropic-Messages, which means you can route them at any provider whose API speaks one of those two protocols (including DeepSeek, Qwen via OpenRouter, GLM/Z.AI, Kimi, Mistral, plus any local Ollama / LM Studio / vLLM endpoint).

Tool What it is Model-specific tilt Note
Aider Apache-2.0, git-native pair programming. Repomap + auto-commit. None (LiteLLM under the hood) Mature, ~39k stars
Goose Foundation-governed (LF AAIF since April 2026). 15+ providers. None ~29k stars
OpenCode (sst) Apache-2.0; 75+ providers via Vercel AI SDK + Models.dev registry None ~172k stars, the breakout
Crush (Charm) Apache-2.0. Mid-session model switching. TUI polish. None Active
Codex CLI (OpenAI) Free if you have any ChatGPT plan (uses plan quota); --oss flag for local Ollama; BYOK at API rates otherwise OpenAI-tilted, --oss opens it up April 2026 billing → token credits
Antigravity CLI (Google) Free tier via Google account Strongly Gemini Replaced Gemini CLI on May 19
Qwen Code (Alibaba) Apache-2.0 CLI; free OAuth retired April 15, 2026. Now: OpenRouter free-rate, Fireworks, DashScope, local Ollama, or paid Alibaba plan Qwen3-Coder Active, tier change
Mistral Vibe CLI Apache-2.0 CLI. Devstral 2 free at launch (planned $0.40/$2.00 per 1M tokens) Devstral 2 / Mistral Vibe 2.0 January 2026
Kimi Code CLI (Moonshot) OSS CLI; Modified MIT model weights; BYOK to Kimi API Kimi K2 family K2.7-Code shipped June 12
Hermes Agent (Nous Research) OSS. Native CLI + TUI + desktop app. "Agent that grows with you." None — multi-provider, Llama 4 local documented Blank Slate mode June 20
DeepSeek-TUI / Deep Code Community projects, no first-party CLI from DeepSeek DeepSeek V4 Active
sgpt (shell-gpt) OSS. One-shot shell-native. LiteLLM bridge to everything. None Last update May 2026
OpenClaw OSS BYOK. Kilo Gateway aggregator option (500+ models at 0% markup). None Active
Kilo CLI MIT CLI sibling of Kilo Code extension None Active

A note on what "model-specific tilt" means: harnesses like Qwen Code, Kimi Code CLI, Mistral Vibe, and Antigravity are purpose-built around one family of models, with provider configs and prompt formatting tuned to that family. They will run other models via OpenAI-compatible endpoints, but their default registry and best-tested path is the family they ship for. The model-agnostic ones (Aider, Goose, OpenCode, Crush, Hermes, OpenClaw, sgpt) treat providers as interchangeable from day one.

A note on what is genuinely free at the model level (not just at the CLI level): Codex CLI on ChatGPT Free tier, Antigravity CLI on Google's free Gemini tier, GitHub Copilot Free plan (limited credits), Amp by Sourcegraph (free while the ad-supported model is in test), OpenRouter's free Qwen3-Coder route (rate-limited). Everything else is BYOK and you pay for the tokens.

A note on what is missing first-party: DeepSeek has no first-party CLI (Deep Code and DeepSeek-TUI are community). Zhipu/GLM ships day-one integrations into Claude Code / Cline / Goose / OpenCode / Crush / Kilo rather than its own CLI. Meta has no Llama-native CLI; Hermes Agent is the de-facto Llama-4 host. gpt-oss lives inside Codex CLI's --oss mode and every BYOK harness via Ollama.

Paid-subscription CLIs

These require an active subscription to use meaningfully. No real free tier for daily-driver work.

Tool Pricing Provider matrix Local Tilt
Claude Code (Anthropic) Bundled into Claude paid plans, or BYOK at Anthropic API rates. No free CLI usage. Anthropic-Messages only on managed; subagent/extension layer is the workaround No first-party local Strongly Claude
GitHub Copilot CLI Free plan exists but CLI consumes AI Credits (1 credit = $0.01 since June 1, 2026); Pro $10, Pro+ $39, Max $100 per month Claude Code + OpenAI Codex wired as third-party agents inside Copilot No Multi-model
Amp (Sourcegraph) Currently free during ad-model test; Enterprise tier exists Claude Opus 4.7 + others via Sourcegraph No None
Alibaba ModelStudio Coding Plan $50/month, ~90k req/mo, 6k per rolling 5-hour window Replaces the retired Qwen Code free tier N/A Qwen3-Coder

Routers and proxies

Worth knowing about because most of the free-tier CLIs above can be pointed at one of these as a single aggregated provider.

  • OpenRouter — multi-provider aggregator with one API surface. Paid per-token. Free routes available for some models (Qwen3-Coder, Hermes 3 405B).
  • Together AI / Fireworks / DeepInfra / Groq — US-jurisdiction hosters serving Chinese and Meta open weights. Useful when you want capability without the data-residency tradeoff.
  • Kilo Gateway — 500+ models at 0% markup. Pairs cleanly with OpenClaw and Kilo CLI.
  • 9router, CLIProxyAPI — OSS self-hosted routers if you want to manage your own bills and rate limits.

Where does the tool stop

A second axis worth tracking is not just what tool, but where does it stop:

suggest → edit → commit → PR → deploy → live app
Enter fullscreen mode Exit fullscreen mode

Your choice is partly about how far you want automation to terminate. Aider stops at edit. Claude Code stops at PR. Codex CLI stops at PR (sandboxed). Devin stops at deploy. Bolt stops at live app.

Pick by where you want the handoff back to the human, not just by capability.


Open-weights cloud API pricing (June 2026 snapshot)

If you are pointing an OSS CLI at a cloud model, here is the current landscape. All prices USD per million tokens. Verified against official provider pages on 2026-06-21.

Model Provider In Out Cached In Context License
DeepSeek V4-Flash api.deepseek.com 0.14 0.28 0.0028 1M MIT
DeepSeek V4-Pro api.deepseek.com 0.435 0.87 0.003625 1M MIT
GLM-4.6 Z.AI 0.60 2.20 0.11 200K MIT
GLM-4.5-Air Z.AI 0.20 1.10 0.03 128K MIT
Qwen3-Max (Intl) Alibaba Model Studio 1.20 6.00 tiered 252K Apache 2.0
Qwen3-Coder-Plus Alibaba Intl 1.00 5.00 tiered 1M Apache 2.0
Kimi K2.6 platform.kimi.ai 0.95 4.00 0.16 200K Modified MIT
MiniMax M2.7 MiniMax PayGo 0.30 1.20 0.06 197K MIT
Mistral Large 3 mistral.ai 0.50 1.50 n/d 128K Apache 2.0
Mistral Small 4 mistral.ai 0.10 0.30 n/d 128K Apache 2.0
Devstral 2 mistral.ai 0.40 2.00 n/d 128K Apache 2.0
Llama 4 Maverick DeepInfra 0.15 0.60 n/d 1M Llama 4 Community
Llama 4 Scout Together AI 0.08 0.30 n/d 10M Llama 4 Community
gpt-oss-120B Groq 0.15 0.60 n/d 128K Apache 2.0
gpt-oss-20B Groq 0.075 0.30 n/d 128K Apache 2.0

Three things to notice.

First, DeepSeek V4-Flash at $0.14 per million input tokens is the floor for credible agentic coding. Anything cheaper is a smaller-capability tier. The May article's pricing data was V3-era and is already obsolete.

Second, open-weights API pricing dropped roughly 80% year-on-year (early 2025 to early 2026, per inference.net's cross-provider analysis). Chinese providers drove the floor; US hosters competed on serving speed rather than on the models themselves.

Third, tool-use is now universal on this list. By mid-2026, function-calling stopped being a differentiator. Context windows similarly inflated: 128K is the floor, 200K is normal, frontier pushes to 1M and 10M.

One pitfall worth flagging: Qwen's tiered billing. Alibaba bills the whole request at the tier set by the input-token count. A coding agent that swells context mid-conversation can jump from $1.00/Mtok to $6.00/Mtok input in a single step. Worth a footnote in your config.


Sovereignty: EU, USA, World

This is the political part of the picture that does not go away by pretending it is not there. Compliance teams ask about it. Procurement asks about it. So:

China-based providers (DeepSeek, Z.AI/Zhipu, Alibaba/Qwen, Moonshot/Kimi, MiniMax, 01.AI) dominate two ends at once: the cheapest credible frontier and the largest concentration of permissively-licensed flagship weights. Mostly MIT or Apache 2.0 across the board. Data residency depends on which endpoint you call; Singapore international endpoints sit outside mainland jurisdiction, mainland endpoints don't.

US-based hosters (Together, Fireworks, Groq, DeepInfra, OpenRouter) mostly don't own a model. They serve Chinese open weights at US-jurisdiction inference cost. Useful if you want Chinese-trained capability but US-or-EU data residency. The single US-trained open frontier family is Meta's Llama 4 under the Community License (open, but not OSI-approved). OpenAI's gpt-oss-120B/20B (Apache 2.0) is the only fully-permissive US flagship-tier open release.

EU-based. Mistral. That is the list. The only sovereign EU frontier-capable open-weights stack. GDPR-native, La Plateforme runs in EU DCs. Slightly more expensive at the high end than Chinese equivalents, dramatically cheaper than US closed frontier. Codestral, Devstral, Magistral Small, Mistral Large 3, Mistral Small 4 are all Apache 2.0. Magistral Medium (reasoning) is the only closed/premier model in the lineup.

If your compliance position is "no customer data leaves EU jurisdiction" the practical answer in mid-2026 is: self-host open weights on EU hardware, or use Mistral. That is the entire shortlist.


Ecodex: the calibration-first CLI

Disclosure first: I work on Empirica, and Ecodex is Empirica's CLI harness. Alpha, daily-driven by the team, opinionated and based on the Empirica system for Claude.

I am including it because it competes on an axis the rest of this article does not cover. Every CLI above is competing on the same thing: better edits, better context, better tool-use. Ecodex competes on metacognition and governance: it is a coding CLI that is accountable for what it claims to know.

The shape, briefly.

Ecodex is a fork of openai/codex bundled with the Empirica epistemic-discipline framework. It does two things stock codex does not.

  • Per-action enforcement. A Sentinel firewall sits between the model and the tools. State-changing tools (Edit, Write, Bash on non-read commands) require an open transaction with a passed CHECK gate. Investigation tools (Read, Grep, Glob) flow freely until a hypothesis-bearing prompt arms an investigation-proportionality budget. The agent literally cannot edit a file without first declaring what it knows and passing the gate. The block is not silently dropped; the sentinel emits an explicit permissionDecision: deny and codex honours it.
  • Per-transaction calibration. Every unit of work opens with a PREFLIGHT (the agent declares thirteen calibration vectors representing its current epistemic state) and closes with a POSTFLIGHT (the same vectors re-declared, then grounded against deterministic services like test results, git metrics, artifact counts). The divergence between what the agent claimed and what actually happened gets recorded. Over time, the divergence becomes a signal you can act on.

Out of the box it ships curated open-weights provider defaults: DeepSeek, Qwen3-Coder, Kimi K2.6, GLM, Mistral (Devstral 2 for agentic coding, Codestral for completion, EU-hosted at api.mistral.ai/v1, shipped by default as of commit c2457d0d6e), and local routes via Ollama, LM Studio, llama.cpp, vLLM. Hot-swap mid-session via /model, no restart.

That Mistral default is worth pausing on if your stack has an EU compliance constraint. It makes Ecodex the only harness in this comparison set that ships an EU-hosted cloud provider as a first-class pick and lets you self-host the same open weights (Devstral is open weights) on your own EU hardware via vLLM or Ollama. Code never leaves the EU on either path.

The 30-second moment, if you want to see the differentiator without reading more:

$ ecodex
> /model        # pick DeepSeek V4-Flash, or local Qwen3-Coder via Ollama
> fix the off-by-one in parse_range in utils.py
Enter fullscreen mode Exit fullscreen mode

The statusline shows the live phase (noetic, then praxic) and an intuition-vs-search badge. The agent reads and greps freely. If it attempts to Edit before grounding, the Sentinel blocks the call with a visible reason ("praxic tool requires CHECK=proceed"). It investigates more, passes CHECK, makes the fix, runs the tests, and at POSTFLIGHT prints the grounded delta. You see belief measured against outcome.

Install paths:

brew install nubaeon/tap/ecodex
# or: cargo install --git https://github.com/Nubaeon/ecodex codex-cli
# or: direct binary from https://github.com/Nubaeon/ecodex/releases/latest
Enter fullscreen mode Exit fullscreen mode

It is alpha. It is opinionated. The discipline overhead is the point. If you do not want a CLI that argues with you about whether you have done enough investigation, this is not the CLI for you. If you do, it is the only one I am aware of that builds that discipline into the harness rather than asking you to remember to do it yourself.

Source: github.com/EmpiricaAI/ecodex. The compliance crosswalk (mapping the substrate to EU AI Act, GDPR, ISO 42001) is at docs/ecodex/positioning/compliance-crosswalk.md, relevant if your stack has a regulatory anchor.


What I would actually use

Three reads for three working contexts. Not the only right answers; just where I would start in mid-2026.

  • Serious work, paid, closed weights. Claude Code for the reasoning model, Codex CLI for the sandbox. Switch between them per task shape.
  • Open weights, BYO model. OpenCode for the breadth (75+ providers), Aider for git-native discipline. Goose if you specifically want foundation-governed vendor neutrality.
  • Open weights with accountability built in. Ecodex, with the caveats above. The category-of-one for now.

What I am not saying

  • That the duopoly is going away. Claude Code and Codex CLI are not getting cheaper; they are getting better, and the closed-frontier reasoning capability gap is still real.
  • That open weights are at parity with closed frontier on every task. They are close on coding-specific benchmarks, behind on certain reasoning shapes, and the gap is narrowing fast.
  • That you should pick by sovereignty before capability. Pick by capability for the task; reach for sovereignty when the compliance position requires it.

The honest summary is: the floor is much lower than it was a year ago, the open tier is genuinely usable for daily work, and the most interesting category to watch is the one that does not compete on "better edits" but on "accountable edits". Reader's pick.


The Desktop / Cloud Web / Cloud Agent companion is the next post in this series. It covers Cursor, the Windsurf-to-Devin-Desktop rebrand, Cloud Agent's price collapse, and the builder versus engineer split.

Comments and corrections welcome. Especially: tools I should have included, pricing I got wrong, or sovereignty framings I am missing.

Series: AI Coding Harness Map (2026). Pricing verified against official provider pages on 2026-06-21.

Top comments (0)