Jonathan Murray for Backboard.io

We bet against the GPU arms race. Here's what shipped.

On July 1 we announced four things at once. The press release version is here. This is the version for people who actually build things.

The short story: while the industry spends hundreds of billions on new hardware, we took the opposite bet. Get more out of the GPUs that already exist, and keep everything inside the customer's own environment.

Here's what came out of it.

BackboardQuant: compression that doesn't lobotomize the model

Everyone who has quantized a model knows the trade: smaller and faster, but dumber. The interesting engineering problem was making that trade disappear.

BackboardQuant (yes, we call it BBQ) compresses models by up to 70% with functionally no quality loss. In our testing, compressed models retained full-precision performance while running up to 2.7x faster.

What that means in practice: at the top of that range, one GPU does the work of two or three. If you're serving models at scale and GPU time dominates your costs, that's your inference bill cut by more than half without touching your architecture. It ships built into our enterprise deployments.
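
For anyone who wants to check the arithmetic against their own numbers, here is a minimal sketch. It assumes inference cost is proportional to GPU-hours and that the speedup applies evenly across the workload; neither holds perfectly in production. The function names and the 140 GB model size are hypothetical, chosen only for illustration.

```python
def cost_fraction(speedup: float) -> float:
    """Fraction of the original GPU cost left after a throughput speedup.

    Assumption: cost is proportional to GPU-hours, and the speedup
    applies uniformly to the whole workload.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def compressed_size_gb(original_gb: float, compression: float) -> float:
    """Model size after compression (0.70 means 70% smaller)."""
    if not 0.0 <= compression < 1.0:
        raise ValueError("compression must be in [0, 1)")
    return original_gb * (1.0 - compression)


# Best-case figures quoted in this post: 2.7x faster, 70% smaller.
savings = 1.0 - cost_fraction(2.7)
print(f"GPU cost saved at 2.7x: {savings:.0%}")

# Hypothetical 140 GB full-precision model, not a measured result.
print(f"140 GB model at 70% compression: {compressed_size_gb(140, 0.70):.0f} GB")
```

Under these assumptions the break-even for "more than half" is a 2x speedup, so the claim depends on where in the "up to 2.7x" range your workload lands.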

Backboard Studio: the benchmark result we didn't expect

We built Studio because frontier-lab coding tools are excellent and priced like it. The goal was matching them at a fraction of the cost.

The result on Terminal-Bench 2.1, the neutral public harness for agentic coding:

  • Backboard Studio running Claude Opus 4.8: 79.8%
  • Opus 4.8 on its own harness: 74.6%

The harness matters more than people think. Same model, better scaffolding, about five points better.

The part I care most about: running GLM 5.2, an open-source model, Studio clears 72%. That's frontier-class agentic coding with no proprietary model in the loop. Pair that with a built-in token optimizer that cuts frontier model usage by up to 30%, and "up to 90% cheaper" stops sounding like marketing.

Studio runs in the cloud or fully self-hosted, so proprietary code never leaves your infrastructure. It's available now.

Nash: one app instead of shadow AI

Every enterprise we talk to has the same problem: employees are pasting company data into whatever chat app they found. The fix isn't a ban, it's a sanctioned option that's better than what they'd find on their own.

Nash gives users thousands of models across text and image in one chat app, with memory that stays out of the model providers' hands. Consumer and enterprise, live at hellonash.ai.

Memory: still #1, and you can check

Backboard ranks first on LoCoMo and LongMemEval, the two leading independent AI memory benchmarks. We published the results and the harnesses so you can reproduce them yourself:

The throughline: sovereign by design

None of these are separate products bolted together. The whole stack (API, application layer, and models) can run inside a customer's own cloud. Data never leaves. For governments, hospitals, and banks, that's the difference between "we'd love to use AI" and actually using it.

One more thing, because it matters to us: all of this was built in Nepean, Ontario, by a team made up entirely of graduates of Canadian universities, colleges, and CEGEPs. The default assumption is that this kind of work only happens in San Francisco. It doesn't.

Try it

If you write code, Backboard Studio is the fastest way to see whether any of this holds up. Run it against whatever you're using now and compare the bill.

Questions about the benchmarks, the compression numbers, or the harness? Ask in the comments. I'll answer.