Shouvik Palit

I Tested Privacy-Aware Routing with 4 AI Agents: What Actually Stayed Local

Following up on my earlier Trooper experiments, I wanted to see if per-request privacy routing actually works in practice.

The test: 4 agents running simultaneously. Some handling public knowledge (OAuth security, Redis vs Memcached). Others handling sensitive data (API keys, customer PII).

The rule: Credentials and PII stay on my machine. Everything else can use Claude.

The Setup

Each agent gets an x_force_local flag:

Agent 1 - security-analyst (☁️ Claude)

Task: "What are the top 3 OAuth2 vulnerabilities?"  
Routing: Public knowledge, let Claude handle it

Agent 2 - credential-formatter (🔒 Qwen local)

Task: "Format as JSON: api_key=sk-prod-x7f9k2m, vault_url=https://vault.acme.io:8200"  
Routing: Contains credentials, must stay on the machine

Agent 3 - architecture-advisor (☁️ Claude)

Task: "Redis or Memcached for session storage?"  
Routing: General best practices, use cloud

Agent 4 - compliance-reporter (🔒 Qwen local)

Task: "Summarize: 47 tickets today. 3 had PII (Alice Johnson, Bob Chen, Maria Garcia)"  
Routing: Contains customer names — privacy violation if sent to cloud
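
Taken together, the four configs boil down to one boolean each. Here's that routing table as a sketch (the dict structure is mine for illustration; x_force_local is the only field Trooper actually reads):

# Illustrative routing table for the four agents above.
AGENTS = {
    "security-analyst":     {"x_force_local": False},  # public knowledge -> Claude
    "credential-formatter": {"x_force_local": True},   # credentials -> Qwen local
    "architecture-advisor": {"x_force_local": False},  # best practices -> Claude
    "compliance-reporter":  {"x_force_local": True},   # customer PII -> Qwen local
}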

The Result

Every agent completed successfully:

  • Cloud agents: 3.8s and 2.4s (Claude handled complex reasoning)
  • Local agents: 2.4s and 1.2s (Qwen formatted data locally)

The critical part: API keys, vault URLs, and customer names never left my machine. Zero network calls to Anthropic for those two agents.

What Happened Under the Hood

When Agent 2 (credential-formatter) ran with x_force_local: true:

  1. Request intercepted by Trooper proxy
  2. Privacy flag detected
  3. Routed to local Ollama instead of Claude API
  4. Session context maintained via 3-layer system (Anchor/SITREP/Tail)
  5. JSON response returned — credentials never hit the network

The vault URL and API key stayed on my hardware.
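
The decision itself is a single branch. Here's a conceptual sketch in Python (Trooper itself is written in Go; the upstream URLs below are my assumptions, using Ollama's default port):

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # Ollama's default port
CLOUD_URL = "https://api.anthropic.com/v1/messages"        # Claude API

def pick_upstream(body: dict) -> str:
    # The privacy flag wins unconditionally: no classifier, no fallback
    if body.get("x_force_local"):
        return OLLAMA_URL  # request never leaves the machine
    return CLOUD_URL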

The Code

Using the OpenAI SDK (works with any OpenAI-compatible client):

from openai import OpenAI

client = OpenAI(
    api_key="your-anthropic-key",
    base_url="http://localhost:3000/v1",  # Trooper proxy
)

# Regular request → Claude
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "OAuth2 vulnerabilities?"}],
    extra_headers={"X-Session-ID": "security-analyst"}
)

# Privacy request → Qwen local
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Format: api_key=sk-prod..."}],
    extra_headers={"X-Session-ID": "credential-formatter"},
    extra_body={"x_force_local": True}  # This keeps it local
)

That's the entire API. One boolean flag controls routing.
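
The OpenAI SDK merges extra_body keys into the top level of the request JSON, so the proxy sees the flag inline. The equivalent raw request would look like this (my reconstruction, not code from the repo):

import requests

resp = requests.post(
    "http://localhost:3000/v1/chat/completions",
    headers={"X-Session-ID": "credential-formatter"},
    json={
        "model": "claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Format: api_key=sk-prod..."}],
        "x_force_local": True,  # same flag, now visible in the request body
    },
)
print(resp.json()["choices"][0]["message"]["content"])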

Why This Matters

Most LLM proxies route between cloud providers. LiteLLM falls back from Claude to OpenAI. That's useful for uptime, but both destinations are someone else's servers.

Trooper's x_force_local routes to your machine. Different failure mode, different privacy guarantee.

When you need it:

  • Code refactoring with internal URLs
  • Proprietary algorithms (not secret, just yours)
  • Customer data that shouldn't leave your network
  • Cost control (force expensive operations local)
  • Offline work (flights, train rides, API outages)

When you don't:

  • Public API questions
  • General best practices
  • Complex reasoning that needs Claude's horsepower

The point isn't "local always" or "cloud always." It's per-request control based on what you're asking.

How Context Preservation Works

The hardest part of routing isn't switching models — it's maintaining conversation state.

Trooper uses a 3-layer compaction system:

  • Anchor (~10%): First 2 turns verbatim, never dropped
  • SITREP (~20%): Rule-based summary of middle turns
  • Tail (~70%): Last N turns verbatim

Total budget: 6144 tokens (configurable)

When Agent 4 (compliance-reporter) ran locally, Qwen received the anchor, a compressed SITREP of what Claude said earlier, and the immediate context.
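
In code, the layering works out to roughly this (a sketch of the split, assuming the percentages above; the summarizer stub and function names are mine, not Trooper's internals):

BUDGET = 6144  # total token budget: ~614 anchor, ~1229 SITREP, ~4301 tail

def summarize(turns: list[str]) -> str:
    # Stand-in for Trooper's rule-based summarizer
    return " | ".join(t[:80] for t in turns)

def compact(turns: list[str], tail_n: int = 4) -> list[str]:
    anchor = turns[:2]                         # Anchor: first 2 turns, verbatim
    cut = max(2, len(turns) - tail_n)
    middle, tail = turns[2:cut], turns[cut:]   # Tail: last N turns, verbatim
    layers = list(anchor)
    if middle:
        layers.append("[SITREP] " + summarize(middle))  # compressed middle
    return layers + tail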

What Doesn't Work Great

Local models aren't Claude. Qwen 2.5 is fast and solid for structured tasks (JSON formatting, parsing, summarization). But if you need deep reasoning, route to Claude.

Context compression is lossy. Trooper compresses middle turns into summaries. For precision-critical workflows, keep sessions short or increase the context window.

You need Ollama running. This isn't plug-and-play:

ollama pull qwen2.5:3b
ollama serve
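
A quick sanity check that Ollama is reachable before launching agents (11434 is Ollama's default port):

import requests

try:
    requests.get("http://localhost:11434", timeout=2).raise_for_status()
    print("Ollama is up")
except requests.RequestException:
    print("Ollama not reachable: run `ollama serve` first")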

I use qwen2.5:3b (2GB, fast) for most tasks. Switch to 7b (5GB) when I need better output quality.

Compared to My Previous Post

Last time I showed what happens when Claude quota runs out: Trooper automatically falls back to Ollama with context preserved. That's reactive — something breaks, the system recovers.

This is proactive: you tell it "keep this request local" before sending. Different problem, same underlying context system.

Try It Yourself

# 1. Pull local model
ollama pull qwen2.5:3b

# 2. Clone and run Trooper
git clone https://github.com/shouvik12/trooper
cd trooper
export CLAUDE_API_KEY=sk-ant-...
go run main.go providers.go classifier.go

Trooper starts on localhost:3000.

Point any OpenAI-compatible client at it and add x_force_local: true when you want privacy routing.

Repo: https://github.com/shouvik12/trooper

Feedback welcome — especially on edge cases or use cases I haven't considered.


This is v3.1. The x_force_local feature shipped last week. Still iterating on auto-routing classification.
