<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeanfrancois Arcand</title>
    <description>The latest articles on DEV Community by Jeanfrancois Arcand (@jfarcand).</description>
    <link>https://dev.to/jfarcand</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3817009%2F2ab1b17e-9dff-42fe-9bad-7916c4e3d9fe.jpeg</url>
      <title>DEV Community: Jeanfrancois Arcand</title>
      <link>https://dev.to/jfarcand</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jfarcand"/>
    <language>en</language>
    <item>
      <title>I built an OpenAI-compatible server that routes to whichever AI CLI you have installed</title>
      <dc:creator>Jeanfrancois Arcand</dc:creator>
      <pubDate>Tue, 10 Mar 2026 14:42:15 +0000</pubDate>
      <link>https://dev.to/jfarcand/what-happens-when-you-treat-ai-clis-as-llm-backends-k0h</link>
      <guid>https://dev.to/jfarcand/what-happens-when-you-treat-ai-clis-as-llm-backends-k0h</guid>
      <description>&lt;p&gt;I've been building dev tools that need LLM completions. The usual path is: pick a provider, wire up their SDK, manage API keys, handle their specific response format. Repeat for each provider you want to support.&lt;/p&gt;

&lt;p&gt;But I already had &lt;code&gt;claude code&lt;/code&gt;, &lt;code&gt;copilot&lt;/code&gt;, and &lt;code&gt;codex&lt;/code&gt; installed on my machine. They're authenticated, they have access to the models, they even have their own billing. Why am I signing up for separate API keys?&lt;/p&gt;

&lt;p&gt;So I tried something different: what if I just spawned those CLI tools as subprocesses and parsed their output?&lt;/p&gt;

&lt;h2&gt;The idea&lt;/h2&gt;

&lt;p&gt;Every AI coding CLI emits structured output in some format — JSON, NDJSON, JSONL — when you ask nicely. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Claude Code&lt;/span&gt;
claude &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"hello"&lt;/span&gt; &lt;span class="nt"&gt;--output-format&lt;/span&gt; json

&lt;span class="c"&gt;# GitHub Copilot&lt;/span&gt;
copilot &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"hello"&lt;/span&gt;

&lt;span class="c"&gt;# Codex&lt;/span&gt;
codex &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="s2"&gt;"hello"&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="nt"&gt;--full-auto&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output varies wildly between tools, but the information is the same: a text completion, maybe a model name, maybe token counts. So I wrote parsers for each one and put a common trait on top.&lt;/p&gt;
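
&lt;p&gt;To give a feel for that normalization step, here's a minimal sketch. The per-provider field names below are invented for illustration (the real schemas vary by tool and version, which is exactly why the parsers exist):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use serde_json::Value;

/// The common shape every parser produces (simplified).
struct Completion {
    text: String,
    model: Option&amp;lt;String&amp;gt;,
}

/// Sketch: pull a completion out of whatever JSON a CLI printed.
/// The field names below are illustrative, not the tools' real schemas.
fn parse_cli_output(provider: &amp;amp;str, raw: &amp;amp;str) -&amp;gt; Option&amp;lt;Completion&amp;gt; {
    let v: Value = serde_json::from_str(raw).ok()?;
    let text = match provider {
        "claude_code" =&amp;gt; v.get("result")?.as_str()?.to_string(),
        "copilot" =&amp;gt; v.get("response")?.as_str()?.to_string(),
        _ =&amp;gt; v.get("text")?.as_str()?.to_string(),
    };
    let model = v.get("model").and_then(|m| m.as_str()).map(String::from);
    Some(Completion { text, model })
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
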

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb6y2425cjr08kx3mo6g.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbb6y2425cjr08kx3mo6g.jpeg" alt="Ski sign in Québec: RALENTISSEZ — slow down, tricky turns ahead" width="800" height="1066"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That became &lt;a href="https://github.com/dravr-ai/dravr-embacle" rel="noopener noreferrer"&gt;embacle&lt;/a&gt;, a Rust library. But honestly, most people don't need the Rust library directly. What turned out to be more useful is what sits on top of it.&lt;/p&gt;
&lt;h2&gt;embacle-server: drop-in replacement for OpenAI's API&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;embacle-server&lt;/code&gt; is an OpenAI-compatible HTTP server. You start it, point your existing OpenAI client at it, and it routes requests to whichever CLI tool you want.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;embacle-server
embacle-server &lt;span class="nt"&gt;--provider&lt;/span&gt; claude_code &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now any OpenAI client works against it. Python, TypeScript, curl — doesn't matter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "claude:opus",
    "messages": [{"role": "user", "content": "hello"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get back a standard OpenAI-shaped response. SSE streaming works too — same &lt;code&gt;data: [DONE]&lt;/code&gt; protocol.&lt;/p&gt;
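
&lt;p&gt;Or from Rust with a plain HTTP client. A sketch assuming &lt;code&gt;reqwest&lt;/code&gt; (with its &lt;code&gt;json&lt;/code&gt; feature) and &lt;code&gt;tokio&lt;/code&gt; as dependencies; nothing here is embacle-specific:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use serde_json::{json, Value};

// Minimal client sketch: POST an OpenAI-shaped request, read the reply.
#[tokio::main]
async fn main() -&amp;gt; Result&amp;lt;(), Box&amp;lt;dyn std::error::Error&amp;gt;&amp;gt; {
    let body = json!({
        "model": "claude:opus",
        "messages": [{"role": "user", "content": "hello"}]
    });
    let resp: Value = reqwest::Client::new()
        .post("http://localhost:8080/v1/chat/completions")
        .json(&amp;amp;body)
        .send()
        .await?
        .json()
        .await?;
    // Standard OpenAI shape: the completion text lives at
    // choices[0].message.content.
    println!("{}", resp["choices"][0]["message"]["content"]);
    Ok(())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
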

&lt;p&gt;&lt;strong&gt;Important caveat:&lt;/strong&gt; this is not free. The CLI tools use your existing subscriptions. If you're paying for Claude Code or Copilot, embacle lets your other applications use those same tokens through a standard API. It doesn't bypass billing — it reuses the access you already have.&lt;/p&gt;

&lt;h3&gt;Model routing&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;model&lt;/code&gt; field determines which backend handles the request. Prefix with the provider name to be explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Route to Claude Code&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;: &lt;span class="s2"&gt;"claude:opus"&lt;/span&gt;, &lt;span class="s2"&gt;"messages"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;...]&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Route to Copilot&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;: &lt;span class="s2"&gt;"copilot:gpt-4o"&lt;/span&gt;, &lt;span class="s2"&gt;"messages"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;...]&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Route to Gemini&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"model"&lt;/span&gt;: &lt;span class="s2"&gt;"gemini:gemini-2.5-pro"&lt;/span&gt;, &lt;span class="s2"&gt;"messages"&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;...]&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or pass a bare model name and the server's default provider handles it.&lt;/p&gt;

&lt;h3&gt;Multiplex: fan out to multiple providers&lt;/h3&gt;

&lt;p&gt;This one surprised me by being useful. You can send the same prompt to multiple providers simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8080/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": ["claude:opus", "copilot:gpt-4o"], "messages": [...]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get back an array of responses. Good for comparing outputs or doing consensus-based validation.&lt;/p&gt;
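
&lt;p&gt;A consensus check on top of that takes a few lines. This sketch assumes each element of the returned array is OpenAI-shaped, with the text at &lt;code&gt;choices[0].message.content&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use serde_json::Value;

/// Sketch: do all providers in a multiplexed response agree?
/// Assumes each array element is an OpenAI-shaped completion object.
fn all_agree(responses: &amp;amp;[Value]) -&amp;gt; bool {
    let mut answers = responses.iter().map(|r| {
        r["choices"][0]["message"]["content"]
            .as_str()
            .unwrap_or("")
            .trim()
    });
    match answers.next() {
        Some(first) =&amp;gt; answers.all(|a| a == first),
        None =&amp;gt; false,
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
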

&lt;h3&gt;Tool calling (even when the CLI doesn't support it)&lt;/h3&gt;

&lt;p&gt;Here's where it gets interesting. Most CLI tools don't have native function calling. They just emit text. But the OpenAI API expects &lt;code&gt;tool_calls&lt;/code&gt; in the response.&lt;/p&gt;

&lt;p&gt;embacle bridges this with a text-based simulation layer. It injects a tool catalog into the system prompt, asks the LLM to emit &lt;code&gt;&amp;lt;tool_call&amp;gt;&lt;/code&gt; XML blocks, parses them out, and returns proper OpenAI-shaped &lt;code&gt;tool_calls&lt;/code&gt; in the response. The caller never knows the difference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"properties"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;}}}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool_choice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"auto"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What's the weather in Montréal?"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works with any of the 10 supported CLI backends. The simulation isn't perfect — it depends on the LLM following the XML format — but in practice it works reliably with Claude Code, GPT-5.3-codex, and GitHub Copilot.&lt;/p&gt;
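
&lt;p&gt;The extraction side of that simulation is conceptually simple. Here's a sketch of the idea; embacle's actual tag grammar and error handling live in the library and may differ:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;/// Sketch: pull &amp;lt;tool_call&amp;gt;...&amp;lt;/tool_call&amp;gt; blocks out of raw LLM text.
/// If nothing parses, the caller falls back to returning plain text.
fn extract_tool_calls(text: &amp;amp;str) -&amp;gt; Vec&amp;lt;&amp;amp;str&amp;gt; {
    let (open, close) = ("&amp;lt;tool_call&amp;gt;", "&amp;lt;/tool_call&amp;gt;");
    let mut calls = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(open) {
        let after = &amp;amp;rest[start + open.len()..];
        match after.find(close) {
            Some(end) =&amp;gt; {
                calls.push(after[..end].trim());
                rest = &amp;amp;after[end + close.len()..];
            }
            None =&amp;gt; break, // unterminated block: keep what we have
        }
    }
    calls
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
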

&lt;h2&gt;embacle-mcp: MCP server&lt;/h2&gt;

&lt;p&gt;If you're in the MCP ecosystem (Claude Desktop, Cursor, etc.), &lt;code&gt;embacle-mcp&lt;/code&gt; exposes the same providers via JSON-RPC 2.0 over stdio or HTTP/SSE. Install it, add it to your MCP config, and all 10 CLI backends become reachable as MCP tools.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;embacle-mcp
embacle-mcp &lt;span class="nt"&gt;--transport&lt;/span&gt; stdio &lt;span class="nt"&gt;--provider&lt;/span&gt; claude_code
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
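
&lt;p&gt;For Claude Desktop, the registration typically looks something like this (the exact config file and schema depend on your MCP client, so treat it as a sketch):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mcpServers": {
    "embacle": {
      "command": "embacle-mcp",
      "args": ["--transport", "stdio", "--provider", "claude_code"]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
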



&lt;h2&gt;For Rust developers&lt;/h2&gt;

&lt;p&gt;OK, if you've read this far and you write Rust, here's the library-level view.&lt;/p&gt;

&lt;p&gt;All 10 runners implement the same trait:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[async_trait]&lt;/span&gt;
&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;trait&lt;/span&gt; &lt;span class="n"&gt;LlmProvider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Send&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nb"&gt;Sync&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;default_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunnerError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;complete_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;ChatStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunnerError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;health_check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunnerError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using it is straightforward. First, add the dependency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;embacle&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.8.1"&lt;/span&gt;
&lt;span class="py"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"full"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pick a runner and call &lt;code&gt;complete()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;path&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;PathBuf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;embacle&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;ClaudeCodeRunner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunnerConfig&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;embacle&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;types&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LlmProvider&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;dyn&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;error&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Error&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;RunnerConfig&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;PathBuf&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"claude"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;ClaudeCodeRunner&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;vec!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nn"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Respond with exactly: PING_OK"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="nf"&gt;.complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="py"&gt;.content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(())&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, this spawns &lt;code&gt;claude -p "Respond with exactly: PING_OK" --output-format json&lt;/code&gt;, reads stdout, parses the JSON, and gives you back a typed &lt;code&gt;ChatResponse&lt;/code&gt;. If the process times out, exits non-zero, or emits garbage, you get a typed &lt;code&gt;RunnerError&lt;/code&gt; — not a panic.&lt;/p&gt;
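
&lt;p&gt;Stripped of embacle's timeout and retry handling, that mechanism is roughly the following. The &lt;code&gt;"result"&lt;/code&gt; field name is an assumption for illustration; the library's parsers know each tool's real schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use tokio::process::Command;

// Sketch of the spawn-and-parse mechanism, not embacle's actual code.
async fn raw_complete(prompt: &amp;amp;str) -&amp;gt; Result&amp;lt;String, Box&amp;lt;dyn std::error::Error&amp;gt;&amp;gt; {
    let out = Command::new("claude")
        .args(["-p", prompt, "--output-format", "json"])
        .output()
        .await?;
    if !out.status.success() {
        return Err(format!("claude exited with {}", out.status).into());
    }
    let v: serde_json::Value = serde_json::from_slice(&amp;amp;out.stdout)?;
    // "result" is an assumed field name for illustration.
    Ok(v["result"].as_str().unwrap_or_default().to_string())
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
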

&lt;p&gt;Because every runner implements the same trait, you can swap providers, build fallback chains, add metrics decorators, or run structured output extraction on top — all through the same interface. The &lt;a href="https://docs.rs/embacle" rel="noopener noreferrer"&gt;docs.rs page&lt;/a&gt; has working examples for each of these.&lt;/p&gt;
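
&lt;p&gt;For example, a fallback chain needs nothing from the library beyond the trait. A sketch, assuming &lt;code&gt;ChatResponse&lt;/code&gt; is exported from &lt;code&gt;embacle::types&lt;/code&gt; like the other types and that &lt;code&gt;RunnerError&lt;/code&gt; derives &lt;code&gt;Debug&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;use embacle::types::{ChatRequest, ChatResponse, LlmProvider};

/// Sketch: try each provider in order; return the first success.
async fn complete_with_fallback(
    providers: &amp;amp;[Box&amp;lt;dyn LlmProvider&amp;gt;],
    request: &amp;amp;ChatRequest,
) -&amp;gt; Option&amp;lt;ChatResponse&amp;gt; {
    for p in providers {
        match p.complete(request).await {
            Ok(resp) =&amp;gt; return Some(resp),
            Err(e) =&amp;gt; eprintln!("{} failed ({e:?}); trying next", p.name()),
        }
    }
    None
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
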

&lt;h2&gt;The tradeoffs&lt;/h2&gt;

&lt;p&gt;I should be honest about what this approach gives up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latency.&lt;/strong&gt; Spawning a subprocess per request is slower than a persistent HTTP connection. For interactive use it's fine. For high-throughput batch processing, you'll feel it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No streaming everywhere.&lt;/strong&gt; Some CLIs don't support streaming output. embacle falls back to buffered mode, but you won't get token-by-token delivery from every provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depends on installed tools.&lt;/strong&gt; If &lt;code&gt;claude&lt;/code&gt; isn't on the machine, the Claude runner won't work. The &lt;code&gt;discovery&lt;/code&gt; module probes for binaries at startup, but you're still coupled to what's installed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool simulation is best-effort.&lt;/strong&gt; The XML-based tool calling works well in practice but isn't a guarantee. If the LLM doesn't follow the format, the parse fails and you get the raw text back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For my use case — an internal platform where multiple services need LLM completions and the machine already has authenticated CLI tools — the tradeoffs are worth it. Zero API key management, one interface, and I can switch between Claude and Copilot by changing a string.&lt;/p&gt;




&lt;p&gt;Source: &lt;a href="https://github.com/dravr-ai/dravr-embacle" rel="noopener noreferrer"&gt;github.com/dravr-ai/dravr-embacle&lt;/a&gt; — Apache-2.0.&lt;/p&gt;

&lt;p&gt;Install from source with &lt;code&gt;cargo install embacle-server&lt;/code&gt; (or &lt;code&gt;embacle-mcp&lt;/code&gt;), or grab a prebuilt binary from the &lt;a href="https://github.com/dravr-ai/dravr-embacle/releases" rel="noopener noreferrer"&gt;GitHub releases page&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>openai</category>
      <category>mcp</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
