Shouvik Palit

Escalate the Model, Not the Conversation

Trooper started as a fallback proxy for agents. Claude hits a quota, falls back to Ollama, session continues. No crashes, no lost context.
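
A minimal sketch of that fallback idea in Go. This is not Trooper's actual code: the Provider interface, ErrQuota, and the stub providers are hypothetical names made up for illustration. The part that matters is that every attempt receives the same history, so a fallback never drops context.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is one turn of the conversation.
type Message struct {
	Role    string
	Content string
}

// Provider is a hypothetical abstraction over an LLM backend.
type Provider interface {
	Name() string
	Complete(history []Message) (string, error)
}

// ErrQuota stands in for a provider's "quota exhausted" failure.
var ErrQuota = errors.New("quota exceeded")

// stub is a fake provider used only for this demo.
type stub struct {
	name  string
	reply string
	err   error
}

func (s stub) Name() string                         { return s.name }
func (s stub) Complete(_ []Message) (string, error) { return s.reply, s.err }

// completeWithFallback tries providers in order and hands the same history
// to each one. In this sketch only quota errors trigger a fallback.
func completeWithFallback(history []Message, providers ...Provider) (used, reply string, err error) {
	if len(providers) == 0 {
		return "", "", errors.New("no providers configured")
	}
	for _, p := range providers {
		reply, err = p.Complete(history)
		if err == nil {
			return p.Name(), reply, nil
		}
		if !errors.Is(err, ErrQuota) {
			return p.Name(), "", err
		}
	}
	return "", "", fmt.Errorf("all providers hit their quota: %w", err)
}

func main() {
	history := []Message{{Role: "user", Content: "Why is this query slow?"}}
	claude := stub{name: "claude", err: ErrQuota}
	ollama := stub{name: "ollama", reply: "Check your EXPLAIN output."}

	used, reply, err := completeWithFallback(history, claude, ollama)
	fmt.Println(used, reply, err)
}
```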

The interesting problem that came up wasn't model routing. It was context preservation.

When you're debugging something hard, you build up context over many turns. The problem statement, what you've tried, what failed. When you switch from a local model to Claude, all of that context has to go with it. And when you come back to local, the local model needs to know what Claude said.

That's what Trooper 4.0 solves.


How it works

A local model handles requests by default. Fast, free, private.

When it gets stuck, one click escalates to Claude — the full conversation history is injected automatically. Claude answers. Then control returns to the local model, which continues the conversation knowing exactly what Claude said.

No copy-pasting. No restarting the conversation. No lost context.

The escalation moment

You're debugging a slow Postgres query. Llama gives you a decent answer — check your EXPLAIN output, look for function calls on indexed columns. Good start.

Not enough. You hit Escalate.

Claude receives the full session. It knows you're debugging a slow query. It knows what Llama already told you. It picks up exactly where the conversation left off.

You click Back to local.

Now ask Llama to summarize what Claude said.

It does. Correctly. Because the session store was updated with Claude's response. Llama reads the full history including what Claude said and continues from there.
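
One way to picture the session store, assuming a plain in-memory map keyed by session ID. SessionStore, Append, History, and the Source tag are illustrative names; the post does not show Trooper's real store.

```go
package main

import (
	"fmt"
	"sync"
)

// Message records one turn and, hypothetically, which side produced it.
type Message struct {
	Role    string // "user" or "assistant"
	Source  string // assumed tag: "local" or "cloud"
	Content string
}

// SessionStore is a hypothetical in-memory store keyed by session ID.
type SessionStore struct {
	mu       sync.RWMutex
	sessions map[string][]Message
}

func NewSessionStore() *SessionStore {
	return &SessionStore{sessions: make(map[string][]Message)}
}

// Append adds one turn to a session, creating the session if needed.
func (s *SessionStore) Append(id string, m Message) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sessions[id] = append(s.sessions[id], m)
}

// History returns a copy so callers cannot mutate the stored turns.
func (s *SessionStore) History(id string) []Message {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]Message(nil), s.sessions[id]...)
}

func main() {
	store := NewSessionStore()
	store.Append("s1", Message{Role: "user", Content: "Why is this query slow?"})
	store.Append("s1", Message{Role: "assistant", Source: "local", Content: "Check your EXPLAIN output."})
	store.Append("s1", Message{Role: "assistant", Source: "cloud", Content: "(Claude's answer goes here)"})

	// The next local turn reads every earlier turn, including the cloud one.
	for _, m := range store.History("s1") {
		fmt.Printf("%-9s %-5s %s\n", m.Role, m.Source, m.Content)
	}
}
```

Because the cloud reply is stored as just another turn, the local model needs no special handling to "know what Claude said". It simply receives a longer history.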

What's under the hood

Trooper is a Go proxy that sits between your client and any LLM provider. The chat UI is a static HTML file served by the same process.

When you escalate:

  1. The UI fetches the full session history from /session/:id
  2. Sends it to Claude via /v1/messages with X-Force-Cloud: true
  3. Claude's response gets written back to the session store via /session/:id/append
  4. Next local turn, Llama reads the full history including Claude's response
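
The first three steps can be sketched as a small Go client. Several details here are assumptions, not taken from Trooper: the HTTP methods, the JSON shape of /session/:id (an array of role/content messages), the body of /session/:id/append (a single message), and the placeholder model name. The /v1/messages request and response follow the Anthropic Messages API layout. A fake server stands in for Trooper so the sketch runs offline.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Message is a role/content pair, as in the Anthropic Messages API.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// callJSON sends an optional JSON body and decodes an optional JSON reply.
func callJSON(c *http.Client, method, url string, hdr map[string]string, in, out any) error {
	var body bytes.Buffer
	if in != nil {
		if err := json.NewEncoder(&body).Encode(in); err != nil {
			return err
		}
	}
	req, err := http.NewRequest(method, url, &body)
	if err != nil {
		return err
	}
	for k, v := range hdr {
		req.Header.Set(k, v)
	}
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("%s %s: %s", method, url, resp.Status)
	}
	if out == nil {
		return nil
	}
	return json.NewDecoder(resp.Body).Decode(out)
}

// escalate fetches the history, forces one cloud turn, and appends the reply.
func escalate(c *http.Client, base, id string) (string, error) {
	var history []Message // step 1: full session history (shape assumed)
	if err := callJSON(c, http.MethodGet, base+"/session/"+id, nil, nil, &history); err != nil {
		return "", err
	}
	req := map[string]any{"model": "placeholder-model", "max_tokens": 1024, "messages": history}
	var out struct {
		Content []struct {
			Text string `json:"text"`
		} `json:"content"`
	}
	force := map[string]string{"X-Force-Cloud": "true"} // step 2: bypass the local model
	if err := callJSON(c, http.MethodPost, base+"/v1/messages", force, req, &out); err != nil {
		return "", err
	}
	if len(out.Content) == 0 {
		return "", fmt.Errorf("cloud response had no content blocks")
	}
	turn := Message{Role: "assistant", Content: out.Content[0].Text} // step 3: write back
	return turn.Content, callJSON(c, http.MethodPost, base+"/session/"+id+"/append", nil, turn, nil)
}

// newFakeTrooper is a stand-in server for session "s1"; it is not Trooper.
func newFakeTrooper(store *[]Message) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/session/s1", func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(*store)
	})
	mux.HandleFunc("/v1/messages", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Force-Cloud") != "true" {
			http.Error(w, "expected X-Force-Cloud: true", http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, `{"content":[{"type":"text","text":"canned cloud reply"}]}`)
	})
	mux.HandleFunc("/session/s1/append", func(w http.ResponseWriter, r *http.Request) {
		var m Message
		json.NewDecoder(r.Body).Decode(&m)
		*store = append(*store, m)
	})
	return httptest.NewServer(mux)
}

func main() {
	store := []Message{{Role: "user", Content: "Why is this query slow?"}}
	srv := newFakeTrooper(&store)
	defer srv.Close()
	reply, err := escalate(srv.Client(), srv.URL, "s1")
	fmt.Println(reply, err, len(store))
}
```

Step 4 needs no client code: once the reply is appended, the next local request reads it as part of the history.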

The SITREP panel on the right extracts intent, confidence, entities and open loops from the conversation using a rule-based classifier — no LLM call needed.
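
To give a feel for what "rule-based" can mean here, a toy Go classifier follows. Everything in it is an assumption for illustration: the keyword rules, the entity vocabulary, the confidence formula (share of keyword hits won by the top intent), and the open-loop heuristic (user questions asked since the last assistant reply). Trooper's real rules are not shown in this post.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type Message struct {
	Role    string
	Content string
}

// Sitrep mirrors the four fields the panel is described as showing.
type Sitrep struct {
	Intent     string
	Confidence float64
	Entities   []string
	OpenLoops  []string
}

// rules is a hypothetical keyword table; earlier entries win ties.
var rules = []struct {
	intent   string
	keywords []string
}{
	{"debug", []string{"slow", "error", "timeout"}},
	{"explain", []string{"why", "how does", "what is"}},
}

// entityRe matches a small, made-up vocabulary in lowercased text.
var entityRe = regexp.MustCompile(`\b(postgres|ollama|claude|llama|index|query)\b`)

func classify(history []Message) Sitrep {
	var userText strings.Builder
	var open []string
	for _, m := range history {
		switch m.Role {
		case "user":
			userText.WriteString(strings.ToLower(m.Content) + " ")
			if strings.Contains(m.Content, "?") {
				open = append(open, m.Content)
			}
		case "assistant":
			open = nil // in this sketch, any reply closes earlier questions
		}
	}
	text := userText.String()
	s := Sitrep{Intent: "unknown", OpenLoops: open}

	best, total := 0, 0
	for _, r := range rules {
		hits := 0
		for _, kw := range r.keywords {
			hits += strings.Count(text, kw)
		}
		total += hits
		if hits > best {
			best, s.Intent = hits, r.intent
		}
	}
	if total > 0 {
		s.Confidence = float64(best) / float64(total)
	}

	seen := map[string]bool{}
	for _, e := range entityRe.FindAllString(text, -1) {
		if !seen[e] {
			seen[e] = true
			s.Entities = append(s.Entities, e)
		}
	}
	return s
}

func main() {
	history := []Message{{Role: "user", Content: "This Postgres query is slow. Any idea?"}}
	fmt.Printf("%+v\n", classify(history))
}
```

Since this is only string matching, it can run on every turn without adding latency or cost, which is the appeal of skipping the LLM call.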


The proxy still works

The proxy layer is unchanged. Agents, SDK clients, curl — everything routes through /v1/messages the same way it always did.

# Agent flow — unchanged
export ANTHROPIC_BASE_URL=http://localhost:3000
export OPENAI_BASE_URL=http://localhost:3000

# Chat UI — new in 4.0
open http://localhost:3000/chat

Try it

git clone https://github.com/shouvik12/trooper
cd trooper
export CLAUDE_API_KEY=sk-ant-...  # optional
export OLLAMA_MODEL=llama3.1:8b
go run .
open http://localhost:3000/chat

Works without a Claude key: escalation falls back to Ollama. Add the key when you want real cloud escalation.


github.com/shouvik12/trooper
