Jonathan Murray for Backboard.io

R-CLI: an open-source model harness that beats Claude Code

R-CLI by Backboard.io, running Codex 5.5, scored 92% on Terminal-Bench 2.1, the standard benchmark for autonomous coding agents. That puts it at the top of the global leaderboard. Yes, we beat OpenAI using their own model. That's great, but we're more excited about the next point.

Try it - use the DEVTOCLI promo code in your Backboard.io account.

Inside R-CLI, Backboard.io's coding harness, an open-source model just beat Claude Code at coding.

Not matched. Beat. On Terminal-Bench 2.1, Backboard.io's R-CLI running GLM 5.1 (fully open source) scores 70%. Claude Code running Opus 4.7 scores 69.7%. The open model is in front by 0.3 percentage points, and it costs a fraction of what Claude Code costs to run.

R-CLI is the coding surface of Backboard.io, the full-stack, model-agnostic AI platform. If you have been looking for an open-source Claude Code alternative, this is the one that does not ask you to trade away performance to get it.

The numbers

Setup                 | Model     | Open source? | Terminal-Bench 2.1
Backboard.io R-CLI    | GLM 5.1   | Yes          | 70%
Backboard.io R-CLI    | Codex 5.5 | No           | 92%
Anthropic Claude Code | Opus 4.7  | No           | 69.7%

Two results worth sitting with:

  • With an open-source model, R-CLI beats Claude Code. No proprietary model required to get past the best closed coding agent.
  • With a frontier model (Codex 5.5), R-CLI hits 92%. The same harness scales up when you want maximum capability.

The harness is the product, not the model

Here is the part most people get backwards. A coding agent's score is not mostly about the model. It is about the harness around it: how it plans, how it manages context, how it recovers from mistakes, how it delegates work.

R-CLI is built on Backboard.io's RLM, our recursive coding engine. Instead of stuffing one giant context window and hoping the model keeps track, the RLM breaks work into bounded child contexts and delegates off the main model. The orchestration does the heavy lifting. That is why a 70% open-source result is even possible: the harness closes the gap that the model alone would leave open.
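To make the idea of bounded child contexts concrete, here is a minimal sketch of recursive delegation. It is an illustration only, not R-CLI's or the RLM's actual implementation: `Task`, `solve`, `call_model`, and `max_context_chars` are hypothetical names, and the character cap stands in for whatever budget a real engine would enforce.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """A unit of work that may be split into smaller subtasks (hypothetical type)."""
    description: str
    subtasks: List["Task"] = field(default_factory=list)


def solve(task: Task, call_model: Callable[[str], str],
          max_context_chars: int = 2000) -> str:
    """Solve a task recursively, giving every call its own bounded context.

    `call_model` is a stand-in for any model call: it takes a prompt string
    and returns a short result string.
    """
    # Each child is solved in a fresh context that never sees its siblings.
    child_results = [
        solve(child, call_model, max_context_chars) for child in task.subtasks
    ]
    # The parent sees only its own description plus the children's short
    # results, not the full working history of each child.
    context = "\n".join([task.description, *child_results])
    # Hard cap so that no single call can grow without limit.
    return call_model(context[:max_context_chars])
```

With a stub such as `call_model = lambda ctx: "summary:" + ctx.splitlines()[0]`, a root task with two children triggers three small model calls instead of one large one, and the root call receives only two one-line summaries.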

Swap the model, keep the harness. Run GLM 5.1 to beat Claude Code on open source. Run Codex 5.5 to hit 92%. Same R-CLI underneath.

We destroy them on cost

Performance parity would already be a story. Cost is where it stops being close.

Run R-CLI on an open-source model and you are not paying a per-token premium to a frontier lab at all. Self-host it and the marginal cost of a coding run approaches the cost of your own compute. Even when you choose to run R-CLI on a top closed model like Codex, the recursive engine does the same work for meaningfully less than the raw harness, because it is not burning tokens on a bloated single context.
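A rough back-of-the-envelope model shows why a single growing context can cost more than bounded ones. This is a simplified sketch with made-up numbers, not a measurement of R-CLI or of any vendor's tool: it counts input tokens only, ignores prompt caching and output tokens, and the function names and parameters are hypothetical.

```python
def single_context_tokens(steps: int, tokens_per_step: int) -> int:
    """Input tokens sent when every step re-sends the whole history so far."""
    # Step k carries the k chunks accumulated up to and including itself,
    # so the total grows quadratically with the number of steps.
    return sum(k * tokens_per_step for k in range(1, steps + 1))


def bounded_context_tokens(steps: int, tokens_per_step: int,
                           summary_tokens: int) -> int:
    """Input tokens sent when each step runs in a fresh child context.

    Assumes the parent reads one short summary per step instead of the
    full history, so the total grows linearly.
    """
    child_total = steps * tokens_per_step
    parent_total = steps * summary_tokens
    return child_total + parent_total


# Illustrative numbers only: 20 steps of 1,000 tokens, 100-token summaries.
print(single_context_tokens(20, 1000))        # 210000
print(bounded_context_tokens(20, 1000, 100))  # 22000
```

Under these assumptions the bounded approach sends roughly a tenth of the input tokens. Real savings depend on the task, the model, and how much of the history a step genuinely needs.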

Better score. Open model. A fraction of the cost. Pick all three.

Your code never leaves your VPC

Every closed coding tool (Claude Code, Codex, Copilot, Cursor) ships your source to a vendor's API to function. For a lot of teams that is a hard stop: defence, intelligence, regulated health and finance, anyone with real IP to protect.

Because R-CLI can run entirely on an open-source model, it can also run fully on-prem and air-gapped. Frontier-level coding with zero code leaving your infrastructure. The GLM 5.1 result is the proof that on-prem is not a downgrade. You are not choosing between privacy and performance anymore.

Frontier coding, air-gapped. That combination did not exist until now.

Run it yourself

R-CLI is in alpha right now. We are bringing developers in to run it on their own repos, on the model of their choice, and report back with real numbers, not scripted praise.

Request alpha access: backboard.io

Once you are in, the flow is simple:

  1. Install R-CLI and drop in your Backboard.io API key.
  2. Point it at a model. Choose GLM 5.1 (open source) to reproduce the 70%, Codex 5.5 for 92%, or your own on-prem deployment.
  3. Run it on your codebase and check the result against your own tasks.

We are not asking you to trust the leaderboard. We are asking you to run it and see.

One key, the whole stack

Here is what that Backboard.io API key actually unlocks. It does not just run R-CLI.

The same key gives R-CLI native access to the top coding models (Codex, Opus, and the rest) with nothing else to wire up. And the moment you want to build the software around your code, the same key already reaches the entire Backboard.io platform:

  • 17,000+ models for agents, chatbots, and anything else you are building, routed behind one key.
  • Memory and stateful threads, so what you build remembers users across conversations.
  • Agentic RAG over your own documents.
  • Voice (text-to-speech and speech-to-text), image, web search, and parallel tool calls, all on the same key.

You are not standing up a coding tool here, then a model gateway, then a memory service, then a voice provider. You add one API key and you can build software, ship agentic AI, add voice and image, and run tool calls, all from the same place. R-CLI writes the code. Backboard.io is the stack the code runs on.

Why we built it

Backboard.io's thesis is simple: the best AI infrastructure should be the most open and the most accessible, not the most locked down. R-CLI is that thesis applied to coding: the same one-key platform that gives you memory, model routing, and RAG, now pointed at your codebase. The best score on Terminal-Bench 2.1 with an open model, runnable on your own hardware, at a cost that makes closed tools hard to justify.

The open-source Claude Code alternative is not a compromise version. It is the better one.

Request alpha access: backboard.io
