<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shouvik Palit</title>
    <description>The latest articles on DEV Community by Shouvik Palit (@shouvik12).</description>
    <link>https://dev.to/shouvik12</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3890793%2Fc3274ea7-2b55-4b67-9d70-2f3c08b63374.png</url>
      <title>DEV Community: Shouvik Palit</title>
      <link>https://dev.to/shouvik12</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shouvik12"/>
    <language>en</language>
    <item>
      <title>I Tested Privacy-Aware Routing with 4 AI Agents: What Actually Stayed Local</title>
      <dc:creator>Shouvik Palit</dc:creator>
      <pubDate>Wed, 13 May 2026 02:40:52 +0000</pubDate>
      <link>https://dev.to/shouvik12/i-tested-privacy-aware-routing-with-4-ai-agents-what-actually-stayed-local-39oa</link>
      <guid>https://dev.to/shouvik12/i-tested-privacy-aware-routing-with-4-ai-agents-what-actually-stayed-local-39oa</guid>
      <description>&lt;p&gt;Following up on my &lt;a href="https://dev.to/shouvik12/how-i-built-a-go-proxy-that-keeps-your-llm-conversation-alive-when-cloud-quota-runs-out-8k5"&gt;earlier Trooper experiments&lt;/a&gt;, I wanted to see if per-request privacy routing actually works in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The test:&lt;/strong&gt; 4 agents running simultaneously. Some handling public knowledge (OAuth security, Redis vs Memcached). Others handling sensitive data (API keys, customer PII).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The rule:&lt;/strong&gt; Credentials and PII stay on my machine. Everything else can use Claude.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Each agent gets a &lt;code&gt;x_force_local&lt;/code&gt; flag:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent 1 - security-analyst (☁️ Claude)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: "What are the top 3 OAuth2 vulnerabilities?"  
Routing: Public knowledge, let Claude handle it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Agent 2 - credential-formatter (🔒 Qwen local)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Task:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Format as JSON: api_key=sk-prod-x7f9k2m, vault_url=https://vault.acme.io:8200"&lt;/span&gt;&lt;span class="w"&gt;  
&lt;/span&gt;&lt;span class="err"&gt;Routing:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Contains&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;credentials&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;must&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;stay&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;machine&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Agent 3 - architecture-advisor (☁️ Claude)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: "Redis or Memcached for session storage?"  
Routing: General best practices, use cloud
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Agent 4 - compliance-reporter (🔒 Qwen local)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;`Task: "Summarize: 47 tickets today. 3 had PII (Alice Johnson, Bob Chen, Maria Garcia)"  
Routing: Contains customer names — privacy violation if sent to cloud`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvsb7hdks8f7yurzyfn5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftvsb7hdks8f7yurzyfn5.png" alt=" " width="800" height="487"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every agent completed successfully:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud agents:&lt;/strong&gt; 3.8s and 2.4s (Claude handled complex reasoning)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local agents:&lt;/strong&gt; 2.4s and 1.2s (Qwen formatted data locally)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The critical part:&lt;/strong&gt; API keys, vault URLs, and customer names never left my machine. Zero network calls to Anthropic for those two agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened Under the Hood
&lt;/h2&gt;

&lt;p&gt;When Agent 2 (credential-formatter) ran with &lt;code&gt;x_force_local: true&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request intercepted by Trooper proxy&lt;/li&gt;
&lt;li&gt;Privacy flag detected&lt;/li&gt;
&lt;li&gt;Routed to local Ollama instead of Claude API&lt;/li&gt;
&lt;li&gt;Session context maintained via 3-layer system (Anchor/SITREP/Tail)&lt;/li&gt;
&lt;li&gt;JSON response returned — credentials never hit the network&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The vault URL and API key stayed on my hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;Using the OpenAI SDK (works with any OpenAI-compatible client):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-anthropic-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:3000/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Trooper proxy
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Regular request → Claude
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OAuth2 vulnerabilities?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Session-ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;security-analyst&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Privacy request → Qwen local
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Format: api_key=sk-prod...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Session-ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credential-formatter&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x_force_local&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;# This keeps it local
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire API. One boolean flag controls routing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Most LLM proxies route between cloud providers. LiteLLM falls back from Claude to OpenAI. That's useful for uptime, but both destinations are someone else's servers.&lt;/p&gt;

&lt;p&gt;Trooper's &lt;code&gt;x_force_local&lt;/code&gt; routes to &lt;strong&gt;your machine&lt;/strong&gt;. Different failure mode, different privacy guarantee.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When you need it:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code refactoring with internal URLs&lt;/li&gt;
&lt;li&gt;Proprietary algorithms (not secret, just yours)&lt;/li&gt;
&lt;li&gt;Customer data that shouldn't leave your network&lt;/li&gt;
&lt;li&gt;Cost control (force expensive operations local)&lt;/li&gt;
&lt;li&gt;Offline work (flights, train rides, API outages)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When you don't:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public API questions&lt;/li&gt;
&lt;li&gt;General best practices&lt;/li&gt;
&lt;li&gt;Complex reasoning that needs Claude's horsepower&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point isn't "local always" or "cloud always." It's per-request control based on what you're asking.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Context Preservation Works
&lt;/h2&gt;

&lt;p&gt;The hardest part of routing isn't switching models — it's maintaining conversation state.&lt;/p&gt;

&lt;p&gt;Trooper uses a 3-layer compaction system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;**Anchor (~10%):** First 2 turns verbatim, never dropped  
**SITREP (~20%):** Rule-based summary of middle turns  
**Tail (~70%):** Last N turns verbatim
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Total budget: 6144 tokens (configurable)&lt;/p&gt;

&lt;p&gt;When Agent 4 (compliance-reporter) ran locally, Qwen received the anchor, a compressed SITREP of what Claude said earlier, and the immediate context.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Doesn't Work Great
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Local models aren't Claude.&lt;/strong&gt; Qwen 2.5 is fast and solid for structured tasks (JSON formatting, parsing, summarization). But if you need deep reasoning, route to Claude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context compression is lossy.&lt;/strong&gt; Trooper compresses middle turns into summaries. For precision-critical workflows, keep sessions short or increase the context window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You need Ollama running.&lt;/strong&gt; This isn't plug-and-play:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull qwen2.5:3b
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use &lt;code&gt;qwen2.5:3b&lt;/code&gt; (2GB, fast) for most tasks. Switch to &lt;code&gt;7b&lt;/code&gt; (5GB) when I need better output quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compared to My Previous Post
&lt;/h2&gt;

&lt;p&gt;Last time I showed what happens when Claude quota runs out: Trooper automatically falls back to Ollama with context preserved. That's &lt;strong&gt;reactive&lt;/strong&gt; — something breaks, the system recovers.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;proactive&lt;/strong&gt;: you tell it "keep this request local" before sending. Different problem, same underlying context system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Pull local model&lt;/span&gt;
ollama pull qwen2.5:3b

&lt;span class="c"&gt;# 2. Clone and run Trooper&lt;/span&gt;
git clone https://github.com/shouvik12/trooper
&lt;span class="nb"&gt;cd &lt;/span&gt;trooper
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
go run main.go providers.go classifier.go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Trooper starts on &lt;code&gt;localhost:3000&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Point any OpenAI-compatible client at it and add &lt;code&gt;x_force_local: true&lt;/code&gt; when you want privacy routing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/shouvik12/trooper" rel="noopener noreferrer"&gt;https://github.com/shouvik12/trooper&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback welcome — especially on edge cases or use cases I haven't considered.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is v3.1. The x_force_local feature shipped last week. Still iterating on auto-routing classification.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>privacy</category>
    </item>
    <item>
      <title>How I built a Go proxy that keeps your LLM conversation alive when cloud quota runs out</title>
      <dc:creator>Shouvik Palit</dc:creator>
      <pubDate>Sun, 03 May 2026 01:23:28 +0000</pubDate>
      <link>https://dev.to/shouvik12/how-i-built-a-go-proxy-that-keeps-your-llm-conversation-alive-when-cloud-quota-runs-out-8k5</link>
      <guid>https://dev.to/shouvik12/how-i-built-a-go-proxy-that-keeps-your-llm-conversation-alive-when-cloud-quota-runs-out-8k5</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
If you've ever been mid-conversation with Claude or GPT, hit a quota limit, and switched to a local Ollama model,you know the pain. The local model has zero context. It's like walking into a meeting 45 minutes late and nobody catches you up.&lt;br&gt;
I got frustrated enough to build something about it. That something is Trooper.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Trooper&lt;/strong&gt;&lt;br&gt;
Trooper is a lightweight Go proxy (~850 lines, two files) that sits between your application and your LLM providers. When a cloud provider returns a quota error (429, 402, 529), Trooper automatically falls back to a local Ollama instance without dropping the conversation context.&lt;br&gt;
Single binary. Zero dependencies. Easy to audit since it sits in front of your API keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real problem: context loss on fallback&lt;/strong&gt;&lt;br&gt;
Most fallback proxies solve the routing problem but ignore the context problem. They either pass the raw message history as-is (which blows up the local model's context window) or they truncate the oldest turns (which kills continuity).&lt;br&gt;
Neither works well in practice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution: three-layer context compaction&lt;/strong&gt;&lt;br&gt;
Trooper uses a structured compaction strategy before handing off to Ollama:&lt;br&gt;
&lt;strong&gt;Anchor&lt;/strong&gt; : The first two turns of the conversation are always preserved. These establish the original intent and set the tone.&lt;br&gt;
&lt;strong&gt;SITREP&lt;/strong&gt; : The middle turns get compressed into a structured summary called a SITREP. It extracts intent, entities, open loops, recent actions, and resolved items. The local model gets situational awareness, not raw history.&lt;br&gt;
&lt;strong&gt;Tail&lt;/strong&gt; : The most recent turns are preserved within a configurable token budget.&lt;/p&gt;

&lt;p&gt;A real SITREP looks like this in the logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📦  Context compaction triggered — 538 tokens exceeds 500 budget
📦  Context compaction complete
    Total turns    : 7
    Anchor turns   : 2 (~43 tokens)
    Middle turns   : 2 → SITREP (~71 tokens)
    Recent turns   : 3 (~323 tokens)
    Tokens used    : 437 / 500
    SITREP         : intent="trooper" stage=unclear confidence=0.60 open=1 actions=0 resolved=0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The local model knows what you were working on, what's broken, what's been resolved, and what the last few exchanges were. That's enough to keep the conversation coherent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Go&lt;/strong&gt;&lt;br&gt;
Single binary distribution was the main reason. No runtime, no dependencies, drop it anywhere and it runs. The codebase being ~850 lines also means anyone can read the whole thing in an afternoon — important for something that proxies API keys.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider support&lt;/strong&gt;&lt;br&gt;
Trooper currently supports Claude, Gemini, and OpenAI as cloud providers with automatic fallback to Ollama. The provider chain is configurable via environment variables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's next&lt;/strong&gt;&lt;br&gt;
V3.0 is focused on foundation hardening — concurrency fixes and improved error handling. V3.1 will improve the SITREP extraction quality on longer conversations, which is where intent detection starts to degrade today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it&lt;/strong&gt;&lt;br&gt;
github.com/shouvik12/trooper&lt;br&gt;
Would love feedback on the context compaction approach — especially from anyone running larger local models. What's your cold-start latency on fallback?&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>llm</category>
      <category>ai</category>
      <category>go</category>
    </item>
  </channel>
</rss>
