<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Luke Manning</title>
    <description>The latest articles on DEV Community by Luke Manning (@manningworks).</description>
    <link>https://dev.to/manningworks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878613%2F4da04de4-7eb2-47fe-afa3-3b00648b7332.png</url>
      <title>DEV Community: Luke Manning</title>
      <link>https://dev.to/manningworks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/manningworks"/>
    <language>en</language>
    <item>
      <title>Straico Has Great Models But No Streaming, So I Built a Proxy</title>
      <dc:creator>Luke Manning</dc:creator>
      <pubDate>Wed, 15 Apr 2026 15:08:54 +0000</pubDate>
      <link>https://dev.to/manningworks/straico-has-great-models-but-no-streaming-so-i-built-a-proxy-31b7</link>
      <guid>https://dev.to/manningworks/straico-has-great-models-but-no-streaming-so-i-built-a-proxy-31b7</guid>
      <description>&lt;p&gt;I use &lt;a href="https://opencode.ai/" rel="noopener noreferrer"&gt;OpenCode&lt;/a&gt; as my main AI coding tool. I switched from Claude Code after Anthropic started going after open source projects and I &lt;a href="https://github.com/anthropics/claude-code/issues/38335" rel="noopener noreferrer"&gt;kept hitting session limits on my subscription&lt;/a&gt; extremely fast. &lt;/p&gt;

&lt;p&gt;OpenCode works with any OpenAI-compatible API. &lt;a href="https://straico.com/" rel="noopener noreferrer"&gt;Straico&lt;/a&gt; gives me access to Claude, GPT, Gemini, DeepSeek, and a bunch more through a single API key. Cheap too. Problem is, Straico's API is missing two things OpenCode needs: streaming responses and function calling.&lt;/p&gt;

&lt;p&gt;Without streaming, OpenCode just hangs. Never gets a response. But Straico keeps eating tokens on their end anyway. Without function calling, the AI can't use tools like reading files or running bash commands. Both are non-negotiable for an agentic coding tool.&lt;/p&gt;

&lt;p&gt;So I built a proxy. It sits between OpenCode and Straico, translating requests and responses to fill in the gaps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenCode
  → localhost:8000 (my proxy)
    → Straico API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What started as "just simulate streaming and inject tool definitions" turned into a surprisingly full-featured thing. The codebase is at &lt;a href="https://github.com/ManningWorks/DOAI-Proxy" rel="noopener noreferrer"&gt;github.com/ManningWorks/DOAI-Proxy&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;The Architecture I Ended Up With&lt;/h2&gt;

&lt;p&gt;I didn't start with a provider pattern. I started with four files: &lt;code&gt;server.js&lt;/code&gt;, &lt;code&gt;streaming.js&lt;/code&gt;, &lt;code&gt;tools.js&lt;/code&gt;, &lt;code&gt;utils.js&lt;/code&gt;. But once I started thinking about the possibility of adding other providers down the line, I refactored into something cleaner.&lt;/p&gt;

&lt;p&gt;The provider pattern lives in &lt;code&gt;providers/&lt;/code&gt;. &lt;code&gt;BaseProvider&lt;/code&gt; is an abstract class that handles the interface contract and retry logic. &lt;code&gt;StraicoProvider&lt;/code&gt; extends it with Straico-specific request/response transformation. &lt;code&gt;ProviderFactory&lt;/code&gt; instantiates the right one based on the &lt;code&gt;PROVIDER_TYPE&lt;/code&gt; env var.&lt;/p&gt;

&lt;p&gt;Right now only Straico exists, but the factory already has stubs for OpenAI and Anthropic. The &lt;code&gt;ADDING_PROVIDERS.md&lt;/code&gt; doc in the repo lays out how to add a new one.&lt;/p&gt;

&lt;p&gt;The other modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;server.js&lt;/code&gt; - Express server, routing, auth, request lifecycle&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;streaming.js&lt;/code&gt; - SSE simulation with two modes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tools.js&lt;/code&gt; - Tool injection and response parsing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;utils.js&lt;/code&gt; - Logging, formatting, log rotation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;utils/model-limits.js&lt;/code&gt; - Fetches context limits from Straico's API&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;summarizer.js&lt;/code&gt; - Conversation summarization for long sessions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scripts/sync-opencode-config.js&lt;/code&gt; - Syncs model list to OpenCode config&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Streaming Without Streaming&lt;/h2&gt;

&lt;p&gt;Straico returns the full response at once. No SSE. No chunks. The proxy has to fake it.&lt;/p&gt;

&lt;p&gt;Two modes: &lt;code&gt;none&lt;/code&gt; and &lt;code&gt;smart&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;none&lt;/code&gt; is what I'd recommend as default. It sends the entire response in one SSE chunk, then the &lt;code&gt;[DONE]&lt;/code&gt; marker. Fast, no formatting issues, still technically SSE.&lt;/p&gt;
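
&lt;p&gt;For anyone who hasn't stared at SSE payloads before, here's a rough sketch of what that single chunk looks like. The function and field values are my illustration of OpenAI's chunk format, not code from the repo:&lt;/p&gt;

```javascript
// Illustration only: wrap a complete response as one OpenAI-style SSE chunk,
// followed by the [DONE] marker. Names and IDs here are made up for the sketch.
function toSingleChunkSSE(model, content) {
  const chunk = {
    id: "chatcmpl-proxy",
    object: "chat.completion.chunk",
    model,
    choices: [{ index: 0, delta: { content }, finish_reason: "stop" }],
  };
  // One data frame with the whole response, then the terminator frame.
  return "data: " + JSON.stringify(chunk) + "\n\ndata: [DONE]\n\n";
}

const frame = toSingleChunkSSE("some-model", "Hello");
console.log(frame.split("\n\n")[1]); // the [DONE] frame
```

&lt;p&gt;One &lt;code&gt;data:&lt;/code&gt; frame with the whole response, one &lt;code&gt;[DONE]&lt;/code&gt; frame. Clients that expect streaming are satisfied; nothing actually streams.&lt;/p&gt;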

&lt;p&gt;&lt;code&gt;smart&lt;/code&gt; is more interesting. It splits the response into chunks with delays to simulate real streaming. The naive approach is &lt;code&gt;responseText.match(new RegExp('.{1,15}', 'g'))&lt;/code&gt; and that kind of works. But it breaks markdown. Split mid-bold, mid-code-block, or mid-backtick and the rendering glitches.&lt;/p&gt;

&lt;p&gt;So &lt;code&gt;smartChunkText()&lt;/code&gt; in &lt;code&gt;streaming.js&lt;/code&gt; looks for safe boundaries. It prefers splitting on newlines, then whitespace. It also checks for markdown delimiters and extends the chunk to avoid splitting them. There's a max size limit (&lt;code&gt;targetSize * 10&lt;/code&gt;) to prevent infinite extension.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// streaming.js - simplified version of the boundary logic&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delim&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;**&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;__&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;`&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delimStart&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delimStart&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;delimStart&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delimEnd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;delimStart&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;delim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delimEnd&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delimEnd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;maxSize&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Default is 15 characters per chunk with 80ms delay. That feels about right for most models. Configurable via &lt;code&gt;STREAM_CHUNK_SIZE&lt;/code&gt; and &lt;code&gt;STREAM_DELAY_MS&lt;/code&gt; env vars.&lt;/p&gt;

&lt;p&gt;I set &lt;code&gt;STREAM_MODE=none&lt;/code&gt; as the recommended default. &lt;code&gt;smart&lt;/code&gt; works but it's more of a showcase thing. The boundary detection catches most cases but I wouldn't trust it with complex nested markdown.&lt;/p&gt;

&lt;h2&gt;Function Calling via Prompt Injection&lt;/h2&gt;

&lt;p&gt;Straico doesn't support function calling natively. The workaround: inject tool definitions into the system prompt and parse the AI's response to detect tool calls.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;injectToolsIntoSystem()&lt;/code&gt; in &lt;code&gt;tools.js&lt;/code&gt; appends a formatted list of available tools to the system message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You have access to the following tools:
- bash: Run bash commands
- read: Read file contents

When you need to use a tool, format your response like this:
TOOL_CALL: &amp;lt;tool_name&amp;gt;
ARGUMENTS: &amp;lt;json_arguments&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a sentinel comment (&lt;code&gt;&amp;lt;!-- proxy-tools-injected --&amp;gt;&lt;/code&gt;) to prevent double-injection if the same messages get processed twice.&lt;/p&gt;
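
&lt;p&gt;A minimal sketch of that guard (the sentinel string matches the post; the function shape is my assumption, not the repo's exact code):&lt;/p&gt;

```javascript
// Sketch of the double-injection guard. The \u003C escapes are just literal
// angle brackets, kept escaped here for formatting reasons.
const SENTINEL = "\u003C!-- proxy-tools-injected --\u003E";

function injectTools(systemText, tools) {
  if (systemText.includes(SENTINEL)) return systemText; // already injected
  const lines = tools.map((t) => "- " + t.name + ": " + t.description);
  return (
    systemText +
    "\n\nYou have access to the following tools:\n" +
    lines.join("\n") +
    "\n" + SENTINEL
  );
}
```

&lt;p&gt;Running the same messages through twice is a no-op, which matters because the proxy can't assume the client never resends a conversation it already transformed.&lt;/p&gt;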

&lt;p&gt;The tricky part is parsing. Different models output tool calls in different formats. I ended up with four parsers that run in sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Minimax XML&lt;/strong&gt; - &lt;code&gt;&amp;lt;minimax:tool_call&amp;gt;&lt;/code&gt; with &lt;code&gt;&amp;lt;invoke&amp;gt;&lt;/code&gt; tags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude XML&lt;/strong&gt; - &lt;code&gt;&amp;lt;invoke name="..."&amp;gt;&lt;/code&gt; with &lt;code&gt;&amp;lt;parameter_list&amp;gt;&lt;/code&gt; tags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Native&lt;/strong&gt; - JSON with &lt;code&gt;"tool_calls": [...]&lt;/code&gt; embedded in the response&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text Format&lt;/strong&gt; - The &lt;code&gt;TOOL_CALL: / ARGUMENTS:&lt;/code&gt; format from the injection prompt&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each parser tries to extract tool calls from the response text. The first one that succeeds wins. This was a gradual thing. I started with just the text format parser. Then Minimax models returned XML. Then Claude models returned different XML. Then some models returned JSON that looked like OpenAI's format. Four parsers later and it handles most cases.&lt;/p&gt;
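
&lt;p&gt;The chain itself is simple. Here's a sketch of the dispatch, with a stripped-down stand-in for the text format parser (the real parsers are more defensive than this):&lt;/p&gt;

```javascript
// First parser that returns a non-empty result wins; otherwise the response
// is treated as plain text. Parser bodies here are illustrative stand-ins.
function parseToolCalls(text, parsers) {
  for (const parse of parsers) {
    const calls = parse(text);
    if (calls?.length > 0) return calls;
  }
  return null; // no tool calls detected: plain text response
}

// Naive stand-in for the TOOL_CALL: / ARGUMENTS: text format parser.
const textFormatParser = (text) => {
  const m = text.match(/TOOL_CALL:\s*(\w+)\s*ARGUMENTS:\s*(\{.*\})/s);
  return m ? [{ name: m[1], arguments: m[2] }] : null;
};
```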

&lt;p&gt;The text format parser was the hardest to get right. Matching &lt;code&gt;TOOL_CALL: tool_name ARGUMENTS: {json}&lt;/code&gt; seems simple until the JSON contains nested objects, strings with braces, or the model forgets the space between the tool name and ARGUMENTS. The implementation tracks brace depth to find where the JSON actually ends:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;argsStartIndex&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;char&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;responseText&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;char&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;braceCount&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;char&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;braceCount&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;braceCount&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;argsEndIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;foundClosingBrace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The proxy also validates tool calls against the list of available tools. If the model invents a tool that doesn't exist, it gets filtered out. If all tool calls are invalid, the response is treated as regular text.&lt;/p&gt;
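
&lt;p&gt;That filter step might look like this (names are my assumptions for illustration):&lt;/p&gt;

```javascript
// Drop calls to tools the client never offered; if nothing survives,
// signal the caller to treat the response as regular text.
function filterValidCalls(calls, availableTools) {
  const known = new Set(availableTools.map((t) => t.name));
  const valid = calls.filter((c) => known.has(c.name));
  return valid.length > 0 ? valid : null; // null means: plain text after all
}
```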

&lt;h2&gt;Tool Call Streaming&lt;/h2&gt;

&lt;p&gt;OpenCode expects tool calls to arrive as SSE chunks, same as regular text. &lt;code&gt;streamToolCalls()&lt;/code&gt; in &lt;code&gt;streaming.js&lt;/code&gt; sends an init chunk with the tool name and ID, then an args chunk with the arguments, then a final chunk with &lt;code&gt;finish_reason: 'tool_calls'&lt;/code&gt;. Each chunk has a small delay (20ms, 10ms, 20ms) to feel like actual streaming.&lt;/p&gt;
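
&lt;p&gt;Roughly, those three chunks follow OpenAI's streaming tool-call format. This is my reconstruction of the shapes, delays omitted:&lt;/p&gt;

```javascript
// My reconstruction of the three chunk shapes in OpenAI's streaming format:
// init (name plus id), args delta, then the finish chunk.
function toolCallChunks(id, name, args) {
  const base = { object: "chat.completion.chunk" };
  return [
    { ...base, choices: [{ index: 0, delta: { tool_calls: [
      { index: 0, id, type: "function", function: { name, arguments: "" } },
    ] } }] },
    { ...base, choices: [{ index: 0, delta: { tool_calls: [
      { index: 0, function: { arguments: args } },
    ] } }] },
    { ...base, choices: [{ index: 0, delta: {}, finish_reason: "tool_calls" }] },
  ];
}
```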

&lt;h2&gt;Conversation Summarization&lt;/h2&gt;

&lt;p&gt;This one sneaked up on me. Straico enforces per-model context limits. Some models have 8k-token contexts, others 128k. OpenCode sends the entire conversation history with every request, and in a long coding session that history grows fast.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;summarizer.js&lt;/code&gt; checks if the estimated token count is approaching the model's limit. When it hits a configurable threshold (default 70% of the model's &lt;code&gt;word_limit&lt;/code&gt;), it takes all but the most recent messages, sends them to Straico for summarization, and replaces them with a single summary message.&lt;/p&gt;
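
&lt;p&gt;The check is conceptually just a threshold comparison. A sketch, assuming a crude four-characters-per-token estimate (the estimation method and helper name are my assumptions; the 70% default is from the proxy):&lt;/p&gt;

```javascript
// Hedged sketch: estimate tokens from character count and compare against
// a fraction of the model's limit. Real token counting is model-specific.
function needsSummarization(messages, wordLimit, threshold = 0.7) {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  const estimatedTokens = Math.ceil(chars / 4); // rough heuristic
  return estimatedTokens > wordLimit * threshold;
}
```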

&lt;p&gt;The summarization itself uses Straico's &lt;code&gt;smart_llm_selector&lt;/code&gt; with &lt;code&gt;pricing_method: balance&lt;/code&gt;, so it picks a cheap model for the summary. Configurable via &lt;code&gt;SUMMARIZATION_MODEL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I'm still not 100% sure this is the right approach. The summary is lossy. Sometimes the model needs context from earlier messages that the summary glossed over. But without it, long sessions just fail with context limit errors. Tradeoff.&lt;/p&gt;

&lt;h2&gt;Model Limits and Validation&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;utils/model-limits.js&lt;/code&gt; fetches all available models from Straico's &lt;code&gt;/models&lt;/code&gt; endpoint at startup. It caches their context limits (&lt;code&gt;word_limit&lt;/code&gt;) and max output tokens (&lt;code&gt;max_output&lt;/code&gt;). The proxy uses this to validate incoming requests. If &lt;code&gt;estimated_input_tokens + max_tokens &amp;gt; word_limit&lt;/code&gt;, it rejects the request with a 400 error before even hitting Straico.&lt;/p&gt;
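
&lt;p&gt;The pre-flight check is a one-liner in spirit. A sketch with assumed names (the inequality matches the post):&lt;/p&gt;

```javascript
// Reject before forwarding when input plus requested output exceeds
// the model's cached context limit. Names are illustrative.
function validateRequest(estimatedInputTokens, maxTokens, limits) {
  if (estimatedInputTokens + maxTokens > limits.word_limit) {
    return { status: 400, error: "context limit exceeded" };
  }
  return null; // OK to forward to Straico
}
```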

&lt;p&gt;The model list is also exposed at &lt;code&gt;/v1/models&lt;/code&gt; so OpenCode can discover what's available. There's an admin endpoint at &lt;code&gt;/v1/admin/refresh-models&lt;/code&gt; to force a refresh if Straico adds new models.&lt;/p&gt;

&lt;p&gt;The sync script (&lt;code&gt;scripts/sync-opencode-config.js&lt;/code&gt;) goes one step further. It fetches the model list from Straico, then updates &lt;code&gt;~/.config/opencode/opencode.json&lt;/code&gt; with all chat-type models. The Docker entrypoint runs this script before starting the server, so the model list is always current.&lt;/p&gt;

&lt;h2&gt;Authentication&lt;/h2&gt;

&lt;p&gt;Four modes, controlled by &lt;code&gt;AUTH_MODE&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;required&lt;/code&gt; - Needs &lt;code&gt;PROXY_API_KEY&lt;/code&gt;, rejects requests without it. Default in production.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;optional&lt;/code&gt; - Uses the key if set, warns if not. Default in development.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;disabled&lt;/code&gt; - No auth. For isolated environments.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;external&lt;/code&gt; - Trusts an external auth header. For when the proxy sits behind an API gateway or service mesh.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key comparison uses &lt;code&gt;crypto.timingSafeEqual&lt;/code&gt; to prevent timing attacks. Took me a moment to realise I needed buffer length checks too, since &lt;code&gt;timingSafeEqual&lt;/code&gt; throws if the buffers are different lengths.&lt;/p&gt;

&lt;h2&gt;Retry and Graceful Shutdown&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;BaseProvider.makeRequestWithRetry()&lt;/code&gt; wraps every API call with exponential backoff. Retries on 429, 5xx, and network errors (&lt;code&gt;ECONNREFUSED&lt;/code&gt;, &lt;code&gt;ECONNRESET&lt;/code&gt;, &lt;code&gt;ETIMEDOUT&lt;/code&gt;). Default is 3 attempts with a 1-second base delay.&lt;/p&gt;
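
&lt;p&gt;The policy sketched out (helper names are mine; the retryable conditions and delays match the post):&lt;/p&gt;

```javascript
// Retry on 429, 5xx, and common network errors, with delays that double
// from a 1-second base. Attempt numbering starts at 0 here.
const RETRYABLE_CODES = new Set(["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT"]);

function isRetryable(status, code) {
  if (status === 429) return true;
  if (status >= 500) return true;
  return RETRYABLE_CODES.has(code);
}

function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt; // attempt 0: 1s, attempt 1: 2s, attempt 2: 4s
}
```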

&lt;p&gt;Graceful shutdown was one of those things I didn't think about until I ran into issues. When Docker sends SIGTERM, the proxy stops accepting new requests and waits for active ones to drain. There's a timeout (default 30 seconds) after which it force-exits. Without this, long-running streaming responses would get cut off mid-chunk when the container restarted.&lt;/p&gt;

&lt;h2&gt;Docker Setup&lt;/h2&gt;

&lt;p&gt;The Dockerfile uses &lt;code&gt;node:18-alpine&lt;/code&gt; and an entrypoint script. The entrypoint runs the OpenCode config sync, then starts the server.&lt;/p&gt;

&lt;p&gt;Docker Compose mounts two volumes. The &lt;code&gt;.env&lt;/code&gt; file for config. And &lt;code&gt;~/.config/opencode&lt;/code&gt; so the sync script can write to the OpenCode config file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./.env:/app/.env:ro&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;~/.config/opencode:/root/.config/opencode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One thing I got wrong initially was the Dockerfile &lt;code&gt;CMD&lt;/code&gt;. I had &lt;code&gt;CMD ["node", "server.js"]&lt;/code&gt; which meant the config sync never ran. Switched to &lt;code&gt;ENTRYPOINT ["/app/docker-entrypoint.sh"]&lt;/code&gt; and that fixed it. Small thing, but it meant every container restart would have stale model lists.&lt;/p&gt;

&lt;h2&gt;The Straico-Specific Quirks&lt;/h2&gt;

&lt;p&gt;Straico's API is mostly OpenAI-compatible but with some differences that caught me out.&lt;/p&gt;

&lt;p&gt;Tool result messages use &lt;code&gt;role: "tool"&lt;/code&gt; in OpenAI format. Straico doesn't support that role. The proxy converts them to &lt;code&gt;role: "user"&lt;/code&gt; with a &lt;code&gt;[Tool Result]:&lt;/code&gt; prefix. Same with assistant messages that contain tool calls. Those get converted to the text format the injection prompt expects.&lt;/p&gt;
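
&lt;p&gt;The role conversion is a small transform. A sketch with assumed names (the &lt;code&gt;[Tool Result]:&lt;/code&gt; prefix is from the post):&lt;/p&gt;

```javascript
// Straico has no "tool" role, so re-cast tool results as user messages
// with a prefix the model can recognise. Function name is illustrative.
function convertForStraico(message) {
  if (message.role === "tool") {
    return { role: "user", content: "[Tool Result]: " + message.content };
  }
  return message;
}
```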

&lt;p&gt;Empty assistant messages get filtered out entirely. Some models return an assistant message with empty content before making a tool call. Straico chokes on those.&lt;/p&gt;

&lt;p&gt;There's a &lt;code&gt;TOOL_RESULT_MAX_LENGTH&lt;/code&gt; env var that truncates large tool outputs. Some tool results (file reads, command output) can be massive. Without truncation, they blow out the context window and the next request fails.&lt;/p&gt;

&lt;p&gt;The proxy also normalises messages. OpenAI sends content as arrays of objects (text parts, image parts, system reminders). The proxy flattens those into plain strings and strips out &lt;code&gt;&amp;lt;system-reminder&amp;gt;&lt;/code&gt; tags. Straico doesn't know what to do with the array format.&lt;/p&gt;
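
&lt;p&gt;A sketch of that normalisation (the function name and exact filtering are my assumptions):&lt;/p&gt;

```javascript
// Flatten OpenAI content-part arrays to a plain string and strip
// system-reminder tags. The \u003C escapes are literal angle brackets,
// kept escaped here for formatting reasons.
const REMINDER = /\u003Csystem-reminder\u003E[\s\S]*?\u003C\/system-reminder\u003E/g;

function flattenContent(content) {
  if (typeof content === "string") return content.replace(REMINDER, "").trim();
  const text = content
    .filter((part) => part.type === "text") // drop image parts etc.
    .map((part) => part.text)
    .join("\n");
  return text.replace(REMINDER, "").trim();
}
```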

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;The provider pattern is solid but I'd start with it from the beginning rather than refactoring into it. The four-file structure worked fine until I wanted to add features that crossed module boundaries. The abstraction would have saved me some reshuffling.&lt;/p&gt;

&lt;p&gt;The smart streaming mode is neat but I'd think harder about whether it's worth the complexity. The boundary detection handles most markdown but not all edge cases. &lt;code&gt;none&lt;/code&gt; mode is faster and more reliable. I use &lt;code&gt;none&lt;/code&gt; day to day.&lt;/p&gt;

&lt;p&gt;The summarization feature is the part I'm least confident about. It works, but the lossy compression means sometimes context gets dropped at exactly the wrong moment. I might revisit this with a sliding window approach instead of a hard summarize-and-replace.&lt;/p&gt;

&lt;h2&gt;Where It Stands&lt;/h2&gt;

&lt;p&gt;The proxy handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All 90+ Straico models through a single endpoint&lt;/li&gt;
&lt;li&gt;Streaming simulation (both modes)&lt;/li&gt;
&lt;li&gt;Function calling with four parser strategies&lt;/li&gt;
&lt;li&gt;Conversation summarization for long sessions&lt;/li&gt;
&lt;li&gt;Model context validation&lt;/li&gt;
&lt;li&gt;Authentication with four modes&lt;/li&gt;
&lt;li&gt;Retry with exponential backoff&lt;/li&gt;
&lt;li&gt;Graceful shutdown with request draining&lt;/li&gt;
&lt;li&gt;Docker deployment with automatic model sync&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It runs on my machine and OpenCode talks to it at &lt;code&gt;http://localhost:8000/v1&lt;/code&gt;. Works well enough that I don't think about it most of the time. Which is exactly what a proxy should do.&lt;/p&gt;

&lt;p&gt;The code is on GitHub if you want to look or use it. Or add a provider. The architecture supports it.&lt;/p&gt;

</description>
      <category>straico</category>
      <category>openai</category>
      <category>showdev</category>
      <category>opencode</category>
    </item>
  </channel>
</rss>
