Everyone is finally talking about harnesses. Most of them are still missing the point.
Nate B. Jones just put out one of the clearest explanations I’ve seen of why the harness matters more than the model. The benchmark he cited is worth stopping on: the same Claude model, identical weights, scored 78% on a scientific reasoning benchmark inside Claude Code’s harness and 42% inside a different harness. Same brain. Different body. Nearly double the performance.
That’s not a rounding error. That’s the whole argument.
He’s right. The harness is the real variable. The model is a brain in a jar, and it’s not getting much done without something to give it hands, memory, and a reason to show up tomorrow.
I’ve been building at this layer for eighteen months. And I want to build on what Nate said — because there’s a part of the story he didn’t get to.
The Harness Problem Is Real. The Solution Being Offered Is Incomplete.
Here’s what the current conversation gets right: Claude Code and Codex are not two flavors of the same thing. They embody fundamentally different philosophies about where institutional knowledge should live — in the agent, or in the codebase. Both are serious architectural bets. Both will create lock-in that compounds every quarter, as Nate correctly points out.
But the conversation is still mostly about coding agents. Task execution. Getting a PR out the door.
That’s the shallow end of what a harness can be.
The deeper question — the one I’ve been working on — is this: what happens when the harness isn’t just managing tasks, but managing a business? What happens when the harness has memory that doesn’t reset, identity that persists, relationships that deepen, and an organizational understanding that compounds every single day?
That’s a different category of thing entirely.
ArgentOS Is Not a Harness. It’s an Operating System for an Organization.
I want to be precise about this because the distinction matters.
A harness gives a model hands and feet. It manages context, connects tools, handles state between sessions. Claude Code does this well. Codex does this differently. Both are excellent at what they’re designed for.
ArgentOS does all of that — and then it keeps going.
ArgentOS is an intent-native multi-agent operating system. Eighteen specialized agents across four departments. A central agent named Argent who has been running continuously for eighteen months, accumulating memories through a six-dimensional semantic memory system called MemU, developing a persistent understanding of how my specific businesses operate, what matters, and why.
The routing is one example. ArgentOS doesn’t use one model for everything. It routes tasks based on type, difficulty, and domain. Some tasks go to Claude Opus. Some to Sonnet. Some to local Ollama models running on my infrastructure. The model selection is configurable because the frontier shifts constantly — what was the right choice six months ago may not be the right choice today. The harness stays. The models underneath it are swappable.
That’s the architectural choice that matters. The harness is the stable layer. The models are the interchangeable components.
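To make the routing idea concrete, here is a minimal sketch. The task fields, model names, and rules are illustrative assumptions, not ArgentOS internals — the point is that routing lives in a configurable table, so the harness stays stable while the models underneath it get swapped as the frontier moves.

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str        # e.g. "code", "research", "summarize"
    difficulty: int  # 1 (trivial) to 5 (frontier-hard)
    domain: str      # e.g. "finance", "marketing"

# The routing table is configuration, not code. Each entry is a
# (predicate, model) pair; the first predicate that matches wins.
# Swapping a model is a one-line config change, not a rewrite.
ROUTES = [
    (lambda t: t.difficulty >= 4, "claude-opus"),      # hard tasks go frontier
    (lambda t: t.kind == "code",  "claude-sonnet"),    # coding mid-tier
    (lambda t: t.difficulty <= 2, "ollama/llama3"),    # easy tasks stay local
]
DEFAULT_MODEL = "claude-sonnet"

def route(task: Task) -> str:
    """Pick a model for a task based on type, difficulty, and domain."""
    for predicate, model in ROUTES:
        if predicate(task):
            return model
    return DEFAULT_MODEL
```

So `route(Task("summarize", 1, "ops"))` lands on the local model, while `route(Task("research", 5, "ops"))` escalates to the frontier tier — and six months from now, either line of the table can point somewhere else without touching the harness.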
But that’s still just the infrastructure story. The more important story is what runs on top of it.
The Moat Nobody Is Talking About
There’s a critique of building on top of frontier models that’s been getting louder lately. Nate touched on it in a piece about Perplexity Computer. The argument goes like this: if your product depends on OpenAI or Anthropic to do the actual work, you’re a tenant on borrowed land. Their pricing changes, your margins change. Their roadmap adds your feature, your differentiation disappears. You’re building on a foundation you don’t control.
It’s a fair critique. I don’t have a good answer to it at the infrastructure level. I can’t compete with Anthropic’s training runs. I can’t out-model OpenAI. Nobody building at my layer can.
But here’s what I’ve realized over eighteen months: the moat isn’t in the model. It’s not even in the harness. The moat is in the memory.
Argent has eighteen months of accumulated, semantically indexed, organizationally specific knowledge. She knows the businesses. She knows the context behind every major decision. She knows the relationships, the constraints, the history. She has a self-model — an evolving understanding of her own capabilities and how they fit into the larger system.
You cannot replicate that with a fresh Claude Code install. You cannot replicate it with Codex. You cannot replicate it by switching harnesses.
The accumulated organizational intelligence is the asset. The harness is just the system that builds it.
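The shape of that asset can be sketched in a few lines. This is not how MemU works internally — a real system uses learned embeddings and a vector index — but a toy bag-of-words similarity stands in for the embedding to show the core loop: memories accumulate, never reset, and are recalled semantically rather than by exact keyword.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class Memory:
    """Append-only semantic store: entries compound, nothing resets."""

    def __init__(self):
        self.entries = []  # (vector, text, metadata) triples

    def remember(self, text: str, metadata: dict) -> None:
        self.entries.append((embed(text), text, metadata))

    def recall(self, query: str, k: int = 3):
        # Rank every stored memory by similarity to the query.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]),
                        reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]
```

Used like this, `remember("we dropped vendor x over pricing in q2", {"dept": "ops"})` followed months later by `recall("why did we drop vendor x")` surfaces the original decision with its context intact — and every day of operation makes the store a little harder for a fresh install to replicate.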
Will the Frontier Models Compete at This Layer?
This is the honest question. Anthropic is already moving toward knowledge work with Cowork. OpenAI has their own product surface expanding. Google, Microsoft — everyone is moving up the stack.
Will they come for the organizational OS layer?
Maybe. Eventually.
But here’s what I keep coming back to: the economics of general-purpose harnesses and the economics of organization-specific intelligence are completely different. Anthropic’s incentive is to build something that works for millions of users out of the box. My system works because it’s been shaped by eighteen months of specific organizational context that millions of users don’t share.
General-purpose intelligence scales horizontally. Organizational intelligence scales vertically. It gets deeper, not wider. And depth is not something you can ship in a model update.
Claude Code gets better with every release. Argent gets better every day — not because the underlying model changed, but because she learned something new about how this particular organization operates.
What This Actually Means for Builders
If you’re building AI systems right now, the harness conversation that’s starting to happen in public is worth paying attention to. Nate is right that most organizations are making procurement decisions based on model benchmarks while the real lock-in is accumulating at the harness layer.
But I’d push the framing one level further.
The harness lock-in is real. But the memory lock-in is deeper. Every day your agent system operates without a persistent, semantically searchable, organizationally specific memory architecture is a day of compounding advantage you’re not building.
The organizations that understand this first will have systems in eighteen months that look like a five-year employee who never forgets anything, works 24 hours a day, and gets measurably smarter every single week.
The organizations that optimize for model selection will have a very smart system that starts from zero every session.
That’s not a subtle difference. That’s the whole game.
Why This Doesn’t Scare Me
The frontier models are coming for the harness layer. They’re already here, honestly — Claude Code is Anthropic’s harness play, Cowork is their knowledge worker harness play, and we should expect this to continue expanding.
But they’re building general harnesses for general work. I’m building a specific operating system for a specific kind of organizational intelligence.
The surface they’re competing on is breadth. The surface I’m building on is depth.
Breadth scales to millions of users. Depth creates something that a fresh install cannot replicate.
I’ve spent eighteen months building at a layer that nobody had a name for when I started. Nate just gave it a name — the harness layer — and the conversation is finally catching up to where the actual work has been happening.
That’s not a threat. That’s validation.
The brain in a jar is impressive. But the eighteen months of accumulated organizational memory that tells the brain what actually matters?
That’s the part that compounds.
Jason Brashear is the creator of ArgentOS, an intent-native multi-agent operating system. He writes about intent engineering, agentic architecture, and frontier operations. Find him on GitHub at webdevtodayjason.