If you quit OpenClaw after Anthropic pulled third-party harness support, I don't blame you.
That week was a mess. One day Claude was the center of the stack. Then Anthropic cut subscription coverage for third-party harnesses, Boris Cherny said some of the Claude CLI blocking looked like an overactive abuse classifier, Peter Steinberger tried to route around it, and OpenClaw still kept hitting the wall when the request looked too much like a harnessed system-prompt wrapper.123
That is the part a lot of people missed. The problem was not just billing. The problem was that "Claude CLI is allowed" and "OpenClaw using Claude CLI works" turned out to be two different statements.
So I stopped trying to save the old stack.
I rebuilt it around GPT 5.4 as the orchestrator, Gemini for research, Claude only where first-party Anthropic paths still made sense, and a local-first retrieval layer that did not depend on vendor mood swings. It took more than a model swap. I had to fix crashing bootstrap injections, tighten routing rules until the agent stopped bullshitting itself, and treat memory and embeddings like infrastructure instead of a nice extra.
This post is the field report. If you think OpenClaw became unusable after the Anthropic mess, fair enough. If you think GPT 5.4 cannot anchor a serious OpenClaw setup, I have receipts.
The Week Anthropic Pulled the Rug
On April 3, Anthropic emailed subscribers to say that starting April 4 at 12 p.m. Pacific, Claude Pro and Max subscription limits would no longer cover third-party harnesses, with OpenClaw called out by name.4 Continuing to use Claude through those tools would require extra usage, billed separately from the subscription.5 Anthropic's rationale was blunt: subscription plans were not built for the call volume and usage patterns of agentic harnesses.6
That part was not rumor. It was confirmed in the customer email, picked up by TechCrunch, The Register, VentureBeat, and a dozen community threads full of people realizing their entire setup had just changed cost models overnight.567
At that point the obvious question was whether Claude Code CLI could still serve as a bridge. Peter found a workaround path that routed requests through the Claude CLI backend instead of the older OAuth-backed OpenClaw flow. For a minute, that looked like a real survival route. Then it became clear that "Claude CLI is allowed" and "OpenClaw using Claude CLI works" were not actually the same thing.23
That distinction is the whole fight.
What Boris Said, and Why That Still Wasn't Enough
On April 6, Boris Cherny replied publicly that the blocking behavior did not look intentional and was "likely an overactive abuse classifier," and that Anthropic was looking into it while clarifying policy going forward.1 If you were trying to keep OpenClaw alive on Claude, that sounded like hope.
Peter treated it like hope. So did a lot of users.
But the later public discussion makes the problem obvious. Peter said OpenClaw added Claude CLI support based on those assurances, only to find that Anthropic still blocked the setup in practice. His explanation is the clearest version of the current state: Claude CLI usage may be allowed in the abstract, but OpenClaw still appears to hit blocking when the request includes the sort of appended system prompt and harness context that marks it as OpenClaw-shaped traffic.23
That is why this felt so maddening. Anthropic did not seem to be checking only the binary or the auth path. It looked more like they were classifying the behavior and the envelope. A plain `claude -p` call could appear fine. The same model, wrapped in OpenClaw's injected prompt layer, could trip a different outcome.
That is a very different kind of problem than a simple billing toggle.
Why I Stopped Treating Claude as the Center of the Stack
Once that sank in, the answer got ugly but obvious.
I stopped trying to make OpenClaw depend on Anthropic's goodwill.
Instead of centering the whole system on Claude and using other models as occasional helpers, I inverted it:
- GPT 5.4 became the main orchestrator for planning, coding, structure, and conversation.
- Gemini 3 Pro became the research and long-context lane.
- Claude Opus stayed in the stack only through first-party Anthropic surfaces, where it was still worth using for architecture, review, and deep reasoning.
- Local embeddings became mandatory infrastructure, not a nice-to-have.
That part matters more than people think. A lot of "OpenClaw is unusable now" complaints were really complaints about losing a cheap Claude substrate. Once that substrate went bad, the rest of the stack had no spine. GPT 5.4 gave the system a spine again.
Not because it was magical. Because it was stable, good at orchestration, and available on terms that did not collapse the minute Anthropic changed its classifier.
The Ugly OpenClaw Reality That Skeptics Were Right About
The skeptics were not wrong about the rough edges.
The week after the Anthropic cutoff, OpenClaw could absolutely feel cursed. I saw runs that posted "working" notifications for twenty-plus minutes and then stalled. I saw hollow filler replies instead of actual task progress. I saw backend glitches pile on top of billing chaos. And I saw enough papercuts around PDFs and update behavior that a normal user would have been justified in walking away.
That is not me hedging. That is me giving the critics their due.
There were at least two categories of problems happening at once:
- External instability: Anthropic changed the economics and kept the technical boundaries vague.
- Internal OpenClaw fragility: some core workflows still had rough implementation edges right when users needed confidence most.
If all you saw was that surface layer, OpenClaw looked broken and overcomplicated.
The trick was surviving long enough to harden the inside.
The First Big Fix: Local-First Enforcement Had to Stop Crashing the Gateway
One of the most important fixes I made predated the April blowup, but it became much more important after it.
I had a local-first enforcement hook that injected routing rules at bootstrap. The idea was straightforward: force the stack to check local systems and cheap retrieval paths before burning third-party tokens. In practice, the first version crashed the gateway on every message.
The root cause was stupid and surgical.
My hook was pushing malformed bootstrap file objects. OpenClaw's bootstrap pipeline expected a workspace file object with the shape:

```typescript
{ name, path, content, missing }
```

The broken hook was effectively sending the wrong keys, which meant `path` was `undefined`. Somewhere inside the bundled runtime, OpenClaw did a `file.path.replace(...)` call. Since `path` was missing, the gateway face-planted before the agent could even reply.
The fix was to stop guessing and match the actual expected shape exactly:

```typescript
event.context.bootstrapFiles.push({
  name: "LOCAL_FIRST_RULES.md" as any,
  path: `${event.context.workspaceDir}/LOCAL_FIRST_RULES.md`,
  content: RULES_CONTENT,
  missing: false,
});
```
That one fix took the system from "broken before every reply" to "capable of receiving routing constraints at startup." It also taught the most annoying lesson of the whole month: a lot of agent reliability work is not model work. It is object shape, lifecycle timing, and boring plumbing.
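In hindsight, the cheap defense is to validate the shape at the hook boundary instead of letting the bundled runtime crash on a missing field. Here is a minimal sketch of that guard; `BootstrapFile`, `isBootstrapFile`, and `pushBootstrapFile` are my own illustrative names, not OpenClaw API:

```typescript
// Hypothetical guard: check the workspace-file shape before pushing it
// into the bootstrap pipeline, so a missing `path` fails loudly at the
// hook instead of as a crash deep inside the runtime.
type BootstrapFile = {
  name: string;
  path: string;
  content: string;
  missing: boolean;
};

function isBootstrapFile(obj: unknown): obj is BootstrapFile {
  if (typeof obj !== "object" || obj === null) return false;
  const f = obj as Record<string, unknown>;
  return (
    typeof f.name === "string" &&
    typeof f.path === "string" && // the field my broken hook omitted
    typeof f.content === "string" &&
    typeof f.missing === "boolean"
  );
}

function pushBootstrapFile(files: BootstrapFile[], candidate: unknown): void {
  if (!isBootstrapFile(candidate)) {
    throw new Error(
      `bootstrap file rejected, wrong shape: ${JSON.stringify(candidate)}`
    );
  }
  files.push(candidate);
}
```

Same plumbing, but the failure happens where you can read it.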
The Second Big Fix: Prompt Rules Only Work When They Are Concrete and Mean
After the crash fix, I still had a behavior problem.
The agent knew it should prefer local systems first, but "should" is not enforcement. Vague rules got ignored. Gentle guidance got rationalized away. Anything that felt like judgment instead of a hard edge leaked.
So I rewrote the routing rules until they behaved like guardrails instead of wishes.
The final structure was basically this:
- Pre-flight check first
- Code Search API before grepping repos
- Prompt Library before drafting prompts from scratch
- Agent Intel before answering knowledge questions
- Delegate file-scanning tasks instead of doing them directly in the expensive orchestrator lane
- Hard violations for common shortcuts
The part that finally worked was not abstract policy. It was specifics.
For example, coder delegation did not start sticking until I added:
- explicit command blocklists for `grep`, `find`, `cat`, `head`, `tail`, `sed`, `awk`, and friends
- examples of the exact rationalizations I wanted to kill
- direct language that said "no, even if it would be faster"
- cost framing tied to me personally paying for wasted calls
That last part sounds petty, but it worked. Models respond better when the rule cashes out into a human consequence.
The Third Big Fix: Split the Work Across Strict Agent Lanes
This was the real system redesign.
I stopped pretending one general-purpose agent should do everything well and cheaply.
Instead, I ran OpenClaw with strict lanes:
Lane 1: GPT 5.4 as the orchestrator
This is the lane that made the whole thing survivable.
GPT 5.4 handled:
- planning
- orchestration
- coding direction
- multi-step reasoning
- structured drafting
- deciding which lane should do what
This turned out to be the right use of it. Not as a gimmick. Not as a pure chatbot. As the traffic cop and field commander.
Lane 2: Parallel research lanes instead of one research monoculture
I stopped treating research like a single-provider job.
Gemini 3 Pro handled the giant-context and multimodal work:
- long source packs
- vision work
- large-doc sweeps
- broad comparison passes
Perplexity handled sourced current-event sweeps and fast citation gathering.
Claude Opus handled the tighter synthesis pass when I wanted a second serious read on architecture, tradeoffs, or whether a conclusion actually held up.
Running those lanes in parallel was better than pretending one model should be researcher, fact-checker, and synthesizer all at once.
That also kept GPT 5.4 from being used like an overpriced research scraper.
Lane 3: Native harnesses for serious building
For serious builds, I stopped pretending the generic orchestrator lane should do all the heavy lifting itself.
I used native coding harnesses where they were strongest:
- Codex CLI for build-heavy coding work in its own native loop
- Claude Code when I wanted Anthropic's first-party harness behavior and tool loop
- OpenClaw as the orchestrator that decides when those lanes should take over
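The lane split above reduces to a routing decision. Here is a toy version of that decision as a pure function; the lane names and task fields are from my setup, not any OpenClaw router API, and real routing also weighs context size and cost:

```typescript
// Sketch of the lane-routing decision. Everything undecided stays with
// the orchestrator; specialist lanes only take tasks they are best at.
type Lane =
  | "gpt-orchestrator"
  | "gemini-research"
  | "perplexity-search"
  | "opus-review"
  | "codex-build"
  | "claude-code-build";

interface Task {
  kind: "plan" | "research" | "current-events" | "review" | "build";
  needsLongContext?: boolean;
  needsNativeAnthropicHarness?: boolean;
}

function route(task: Task): Lane {
  switch (task.kind) {
    case "research":
      // giant-context and multimodal work goes to the long-context lane
      return task.needsLongContext ? "gemini-research" : "perplexity-search";
    case "current-events":
      return "perplexity-search"; // sourced sweeps, fast citations
    case "review":
      return "opus-review"; // second serious read on architecture/tradeoffs
    case "build":
      // first-party Anthropic path when its harness behavior is wanted
      return task.needsNativeAnthropicHarness
        ? "claude-code-build"
        : "codex-build";
    default:
      return "gpt-orchestrator"; // planning and everything undecided
  }
}
```
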
This part matters because harness engineering changes outcomes. Public 2026 benchmark writeups put GPT-5.4-Codex at 56.8% on SWE-bench Pro, while Anthropic's later Opus line was reported at 64.3% on the same benchmark family in a first-party-style coding setup.89 Separate harness comparisons also make the bigger point: the same base model can move a lot depending on which harness is driving it, which is why I care less about leaderboard chest-thumping and more about using the model in the environment where it actually behaves best.10
There was also a policy boundary here. Anthropic's published position was that subscriber OAuth access was reserved for Claude Code and Claude.ai, while third-party harnesses needed API-key access or explicit permission.11 So even when Opus looked better in a given harness, the practical Anthropic-safe path was Claude Code, not trying to smuggle subscriber OAuth back through OpenClaw.
That is why Claude stayed in the stack, but not at the center of it.
That was the emotional adjustment some people never made. They were still trying to get back to the old world. I was building for the new one.
The Quiet Hero: Local Embeddings and Memory Cards
The part I trust most in this stack is not the fanciest model. It is the memory system.
A lot of agent setups fail because they rely on fresh-session vibes and giant transcript sludge. When the vendor changes behavior, the whole thing turns into a goldfish with tools.
I wanted something closer to a compact operating memory:
- a slim `MEMORY.md` as the index
- atomic knowledge cards in `memory/cards/*.md`
- daily raw logs in `memory/YYYY-MM-DD.md`
- semantic search across all of it
- local embeddings first
For embeddings, I run `qwen3-embedding:8b` locally through Ollama for memory search and code retrieval. That matters for three reasons.
First, it is cheap. Second, it is private. Third, it removes one more dependency on third-party token economics for the boring but essential retrieval layer.
This is the part that reminds me most of the better agent-memory systems people keep circling around lately. Keep the hot context small. Keep durable knowledge chunked. Search first, load second. Do not dump a 60 KB memory blob into every session and pray.
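The "search first, load second" step is just ranked retrieval over card embeddings. This sketch uses toy vectors so the ranking logic stands on its own; in my setup the vectors come from `qwen3-embedding:8b` via Ollama, and the card shape and function names here are mine, not OpenClaw's:

```typescript
// Minimal retrieval step: rank memory cards by cosine similarity to the
// query embedding, and load only the top few into the session.
interface MemoryCard {
  path: string; // e.g. a file under memory/cards/
  embedding: number[]; // precomputed locally, stored alongside the card
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Only these paths get loaded into the hot context -- never the whole
// memory directory, and never a 60 KB blob.
function searchCards(query: number[], cards: MemoryCard[], k: number): string[] {
  return [...cards]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k)
    .map((c) => c.path);
}
```
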
That change alone made the stack feel less brittle.
The Missing Glue: Handoff Notes Between Codex, Claude Code, and OpenClaw
One thing I also borrowed from the gbrain and gstack style of thinking was this: the useful part is not just having multiple agents. It is having a clean handoff surface between them.
That is where a lot of multi-model setups quietly fall apart. Codex figures something out. Claude Code discovers a root cause. OpenClaw finishes the task. Then the lesson dies in whichever terminal happened to learn it.
I got sick of that.
So I added a Memory Handoff workflow for Claude Code and the other coding lanes. When a session produces something durable, it is supposed to emit a short structured handoff instead of burying the answer in chat exhaust. The format is dead simple: what happened, why it matters, the durable facts, the evidence, and where the knowledge should land next.
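The format is small enough to pin down as a data shape. This is my own illustration of it, not a published OpenClaw schema; the field names are assumptions that mirror the five parts described above:

```typescript
// A handoff note as data: what happened, why it matters, the durable
// facts, the evidence, and where the knowledge should land next.
interface MemoryHandoff {
  whatHappened: string;
  whyItMatters: string;
  durableFacts: string[];
  evidence: string[]; // commands run, file paths, issue links
  destination: string; // e.g. memory/cards/, TOOLS.md, rules/
}

function renderHandoff(h: MemoryHandoff): string {
  return [
    "# Memory Handoff",
    `## What happened\n${h.whatHappened}`,
    `## Why it matters\n${h.whyItMatters}`,
    `## Durable facts\n${h.durableFacts.map((f) => `- ${f}`).join("\n")}`,
    `## Evidence\n${h.evidence.map((e) => `- ${e}`).join("\n")}`,
    `## Destination\n${h.destination}`,
  ].join("\n\n");
}
```

A session that learned something durable emits one of these instead of a wall of chat exhaust.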
That matters because I do not want three competing memory systems.
- Codex and Claude Code can keep local session context
- OpenClaw on Rocinante is the canonical long-term memory
- handoffs are the bridge
In practice that means architecture decisions, weird bug roots, workflow changes, setup gotchas, and user preferences get written as small transfer documents in `.claude/memory-handoffs/`. From there, Rocinante can route them into `memory/cards/`, `TOOLS.md`, `USER.md`, `rules/`, or `.learnings/` depending on what the fact actually is.
That sounds boring. Good. Boring is what you want from memory plumbing.
The important part is that it kills the old handoff-letter problem. I am not pasting giant end-of-session summaries back into a fresh model anymore. I am pushing small, structured notes across lanes so Codex, Claude Code, and OpenClaw stop relearning the same lesson like idiots.
The Ingestion Layer: n8n as the Information Bus
The other piece people should know about is n8n.
I do not just use n8n for social posting. It has turned into the information bus for the stack.
On my setup, n8n already owns a bunch of glue work that would otherwise become brittle shell scripts and forgotten cron notes: blog publishing, cross-post fan-out, draft queues, webhook-driven publishing, and audit logging. Once that was in place, it made sense to treat it as part of the agent infrastructure too.
So the stack is not just model routing plus memory cards. It is also ingestion.
- webhooks for one-shot publish and routing events
- scheduled jobs for polling, dedupe, and follow-up work
- workflow state for draft queues and publish history
- audit trails that can be shipped into Wazuh
That matters because good agent systems are really data-movement systems in disguise. The model is only one layer. The rest is capture, routing, storage, and retrieval.
In my case, n8n handles a lot of the "something happened, now move it where it belongs" work. A blog draft can become a canonical post, then a cross-post job, then a social preview queue, then a record in the broader content pipeline. A Claude Code memory handoff can get ingested on schedule so durable knowledge does not stay trapped in a repo-local folder. Different jobs. Same philosophy.
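The routing half of that work can be sketched as a pure function. The artifact kinds and destination names here are from my setup and purely illustrative, not an n8n API:

```typescript
// "Something happened, now move it where it belongs" as a routing
// decision: map an incoming artifact to the downstream jobs that
// should receive it.
type ArtifactKind = "blog-draft" | "memory-handoff" | "audit-event";

function destinationsFor(kind: ArtifactKind): string[] {
  switch (kind) {
    case "blog-draft":
      // canonical post first, then the fan-out jobs
      return ["publish-canonical", "cross-post-queue", "social-preview-queue"];
    case "memory-handoff":
      // pull durable knowledge out of the repo-local folder on schedule
      return ["ingest-to-memory-cards"];
    case "audit-event":
      return ["wazuh-audit-trail"];
  }
}
```

In n8n this is a switch node and a few webhooks; the function is just the decision made explicit.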
That is the part I think more OpenClaw users should steal. Not my exact stack. The pattern.
Use the orchestrator for judgment. Use specialist lanes for depth. Use structured handoffs so context survives model boundaries. Use an automation layer like n8n so the system keeps moving even when no one is staring at the terminal.
The Bugs That Made the Trench Story Real
There were also smaller but very real wounds.
PDF handling
OpenClaw had document handling gaps and PDF weirdness bad enough that it became one of the receipts I kept for public criticism. The public issue for automatic inbound PDF text extraction was already there, and I later tracked the related fix work around `standardFontDataUrl`, because broken document ingestion does not just look sloppy: it kills trust in the agent's ability to work with real inputs.12
Systemd update behavior
Another nasty one was update behavior that could silently strip user-added environment directives when regenerating the managed systemd unit. There is a public issue for that too, and it is the kind of bug that makes people think the product is haunted because their setup breaks after a successful update.13
The meta-problem
Those bugs matter beyond their own scope. They compound. If billing is unstable, prompt behavior is unstable, and upgrades silently stomp local directives, people stop giving the tool the benefit of the doubt.
That is why the fix story matters. Reliability is not one grand heroic patch. It is a pile of ugly little wins that stop the system from bleeding out.
What I Would Say to OpenClaw Quitters
If you quit because Anthropic killed the old Claude subscription path, I think your frustration was fair.
If you quit because OpenClaw felt too fragile during that transition, that was fair too.
But if you are writing off OpenClaw entirely, I think you are throwing away the wrong part.
The wrong part was depending on Anthropic as the unquestioned center of the stack.
The useful part is still here:
- multi-lane orchestration
- agent-specific routing
- background work
- channel integration
- local-first retrieval
- structured memory
- tool access
That combination is still powerful.
The post-Anthropic version of OpenClaw just needs a different default architecture:
- GPT 5.4 as the main orchestrator, with Codex CLI for native build-heavy coding work
- parallel research lanes across Gemini, Perplexity, and Opus
- Claude only through first-party lanes like Claude Code or direct API usage
- local embeddings for retrieval
- strict routing rules
- bootstrap-level prompt injection defenses
That is not a consolation prize. It is a more grown-up design.
The Proof I Wanted for the Skeptics
The reason I am writing this at all is simple.
I wanted receipts.
Not "trust me bro, I made it work." Not a tweet thread with the edges sanded off. Real receipts. Real failure modes. Real fixes. Real architecture changes.
If I answer somebody on X about OpenClaw after the Anthropic mess, I want to point to a field report that says:
- yes, the Anthropic cutoff was real
- yes, the Claude CLI classifier mess was real
- yes, OpenClaw had enough rough edges to make people quit
- and yes, I still made GPT 5.4 work as the orchestrator anyway
That last part is the point.
Not that GPT 5.4 solved every problem. It did not. Not that Anthropic stopped mattering. They still matter. The point is that once the stack stopped assuming Claude had to be the center of the universe, OpenClaw became usable again.
That is a very different claim from "everything is fine."
Everything was not fine.
It was a trench story. Then it became a system.
The Bottom Line
Anthropic did pull the rug on third-party subscription harnessing.456 Boris did publicly say some of the Claude CLI blocking looked like a bug or overactive classifier and that it was being addressed.1 Peter did try to build around that guidance, and then said OpenClaw still hit blocks in practice because the harness behavior itself appeared to trigger Anthropic's defenses.23
That left OpenClaw users in a miserable spot: shaky policy, inconsistent runtime behavior, and a lot of reasons to distrust the stack.
My answer was not to pretend that away. My answer was to redesign the stack around reality.
That is what this whole post comes down to.
I did not save OpenClaw by finding a new Claude loophole. I saved my OpenClaw workflow by making Claude optional, making GPT 5.4 the orchestrator, making retrieval local-first, and making the routing rules strict enough that the system stopped lying to itself.
That is the version of OpenClaw I still believe in.
Originally published at solomonneas.dev/blog/openclaw-after-anthropic-how-i-made-gpt-54-work. Licensed under CC BY-NC-ND 4.0 - attribution required, no commercial use, no derivatives.
1. Boris Cherny (@bcherny), "@steipete This is not intentional, likely an overactive abuse classifier. Looking, and working on clarifying the policy going forward," X, April 6, 2026, https://x.com/bcherny/status/2041035127430754686.
2. Peter Steinberger (@steipete), "Since this is blowing up on hacker news. Boris said that CLI usage is allowed. Thus we added support for it, only to find out that we are still blocked there...," X, April 21, 2026, https://x.com/steipete/status/2046685973233189375.
3. "Anthropic says OpenClaw-style Claude CLI usage is allowed again," Hacker News, accessed April 22, 2026, https://news.ycombinator.com/item?id=47844269.
4. Anthropic third-party harness cutoff timeline captured in the author's OpenClaw operating logs, April 3 to April 4, 2026.
5. Anthony Ha, "Anthropic Says Claude Code Subscribers Will Need to Pay Extra for OpenClaw Usage," TechCrunch, April 4, 2026, https://techcrunch.com/2026/04/04/anthropic-says-claude-code-subscribers-will-need-to-pay-extra-for-openclaw-support/.
6. Thomas Claburn, "Anthropic Closes Door on Subscription Use of OpenClaw," The Register, April 6, 2026, https://www.theregister.com/2026/04/06/anthropic_closes_door_on_subscription/.
7. Carl Franzen, "Anthropic Cuts Off the Ability to Use Claude Subscriptions With OpenClaw and Similar Tools," VentureBeat, April 4, 2026, https://venturebeat.com/technology/anthropic-cuts-off-the-ability-to-use-claude-subscriptions-with-openclaw-and.
8. "Claude Code vs Codex vs Aider vs OpenCode vs Pi 2026," thoughts.jock.pl, accessed April 22, 2026, https://thoughts.jock.pl/p/ai-coding-harness-agents-2026.
9. Better Stack Community, "Claude Opus 4.7: Benchmarks, Tokenizer Changes, and Coding Performance," accessed April 22, 2026, https://betterstack.com/community/guides/ai/claude-opus-4-7/.
10. Ibid.
11. Thomas Claburn, "Anthropic: No, Absolutely Not, You May Not Use Third-Party Harnesses With Claude Subs," The Register, February 20, 2026, https://www.theregister.com/2026/02/20/anthropic_clarifies_ban_third_party_claude_access/.
12. "Feature: Auto-extract text from inbound PDF/document attachments," Issue #28818, openclaw/openclaw, GitHub, February 27, 2026, https://github.com/openclaw/openclaw/issues/28818.
13. "openclaw update silently drops user-added EnvironmentFile/Environment directives when regenerating systemd unit," Issue #66248, openclaw/openclaw, GitHub, April 2026, https://github.com/openclaw/openclaw/issues/66248.