<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lynkr</title>
    <description>The latest articles on DEV Community by Lynkr (@lynkr).</description>
    <link>https://dev.to/lynkr</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3645387%2F794ced23-25c9-41ed-863a-401839a48d59.png</url>
      <title>DEV Community: Lynkr</title>
      <link>https://dev.to/lynkr</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lynkr"/>
    <language>en</language>
    <item>
      <title>How to Configure LibreChat with Lynkr Using a Custom OpenAI-Compatible Endpoint</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Tue, 16 Jun 2026 05:02:25 +0000</pubDate>
      <link>https://dev.to/lynkr/how-to-configure-librechat-with-lynkr-using-a-custom-openai-compatible-endpoint-3423</link>
      <guid>https://dev.to/lynkr/how-to-configure-librechat-with-lynkr-using-a-custom-openai-compatible-endpoint-3423</guid>
      <description>&lt;h1&gt;
  
  
  How to Configure LibreChat with Lynkr Using a Custom OpenAI-Compatible Endpoint
&lt;/h1&gt;

&lt;p&gt;LibreChat is one of the best open-source AI chat and agent surfaces for teams that want self-hosting, MCP support, flexible model backends, and a real product surface instead of a demo UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt; is an open-source &lt;strong&gt;LLM gateway&lt;/strong&gt; built for coding assistants, agents, and MCP-heavy workflows. It gives you one OpenAI-compatible endpoint in front of multiple providers, with routing, caching, and cleaner model infrastructure behind it.&lt;/p&gt;

&lt;p&gt;Put together, they make a clean split:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LibreChat&lt;/strong&gt; handles the app layer: chat UI, agents, files, MCP, user workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr&lt;/strong&gt; handles the gateway layer: routing, provider switching, fallback, caching, and model control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why this pairing works so well. LibreChat already supports custom OpenAI-compatible endpoints, and Lynkr is a strong fit for the kind of multi-provider, tool-using, agentic traffic LibreChat users actually generate.&lt;/p&gt;

&lt;p&gt;This article is the practical follow-up to the architecture case: the goal here is to get LibreChat talking to Lynkr with a minimal working setup.&lt;/p&gt;

&lt;p&gt;I built Lynkr, so founder disclosure applies. I’m keeping this grounded in what LibreChat and Lynkr support today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Lynkr fits LibreChat especially well
&lt;/h2&gt;

&lt;p&gt;There are plenty of tools that can sit between an app and a model provider, but Lynkr is a particularly strong fit for LibreChat for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-compatible endpoint&lt;/strong&gt;: LibreChat already has a clean seam for custom OpenAI-compatible APIs, which makes Lynkr easy to drop in underneath it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built for agentic traffic&lt;/strong&gt;: LibreChat is not just plain chat. It supports agents, MCP, tools, and more complex request patterns. Lynkr is designed for those heavier workflows, not just one-shot completions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing + caching in one place&lt;/strong&gt;: if you want LibreChat to stay clean at the product layer while the backend evolves, Lynkr gives you a better home for that logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider portability&lt;/strong&gt;: you can keep LibreChat stable while moving between Ollama, OpenRouter, Bedrock, OpenAI, and others behind the gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Good fit for MCP and coding workflows&lt;/strong&gt;: Lynkr was built around the kind of traffic these users actually generate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means this setup is not just “put any proxy in front of LibreChat.” It is specifically about using a gateway that matches the workload.&lt;/p&gt;




&lt;h2&gt;
  
  
  What you need
&lt;/h2&gt;

&lt;p&gt;Before starting, you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a running &lt;strong&gt;LibreChat&lt;/strong&gt; instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js 20+&lt;/strong&gt; for Lynkr&lt;/li&gt;
&lt;li&gt;at least one backend provider configured in Lynkr&lt;/li&gt;
&lt;li&gt;one LibreChat endpoint that points to Lynkr instead of directly to a provider&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Repo references:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LibreChat:&lt;/strong&gt; &lt;a href="https://github.com/danny-avila/LibreChat" rel="noopener noreferrer"&gt;github.com/danny-avila/LibreChat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr:&lt;/strong&gt; &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two details matter here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;LibreChat explicitly supports &lt;strong&gt;custom OpenAI-compatible APIs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr exposes an &lt;strong&gt;OpenAI-compatible &lt;code&gt;/v1&lt;/code&gt; endpoint&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s the seam we’re using.&lt;/p&gt;




&lt;h2&gt;
  
  
  The target architecture
&lt;/h2&gt;

&lt;p&gt;The setup we want looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser
  ↓
LibreChat
  ↓
Lynkr (OpenAI-compatible endpoint)
  ↓
OpenAI / Bedrock / OpenRouter / Ollama / Anthropic-compatible backends / others
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In other words:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LibreChat owns the chat UI, agents, MCP, files, and workflows&lt;/li&gt;
&lt;li&gt;Lynkr owns routing, caching, and model-side control&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 1: Install and start Lynkr
&lt;/h2&gt;

&lt;p&gt;Install Lynkr globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create a minimal &lt;code&gt;.env&lt;/code&gt; for Lynkr.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example A: local testing with Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5-coder:latest
&lt;span class="nv"&gt;FALLBACK_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
&lt;/span&gt;&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8081
&lt;span class="nv"&gt;PROMPT_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;SEMANTIC_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start Lynkr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example B: cloud setup with OpenRouter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter
&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_openrouter_key
&lt;span class="nv"&gt;FALLBACK_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
&lt;/span&gt;&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8081
&lt;span class="nv"&gt;PROMPT_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;SEMANTIC_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Optional direct smoke test against Lynkr
&lt;/h3&gt;

&lt;p&gt;Before touching LibreChat, you can test Lynkr directly with a simple OpenAI-compatible request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8081/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer dummy-key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that succeeds, the gateway path is working before LibreChat is added on top.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick health check
&lt;/h3&gt;

&lt;p&gt;Once Lynkr is running, verify the endpoint responds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8081/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a JSON response showing the service is running.&lt;/p&gt;

&lt;p&gt;At this point, Lynkr should expose an OpenAI-compatible base URL at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8081/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the URL LibreChat should talk to.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Decide how LibreChat should see Lynkr
&lt;/h2&gt;

&lt;p&gt;LibreChat supports custom endpoints and also lets users provide a custom &lt;code&gt;baseURL&lt;/code&gt; for supported endpoint flows.&lt;/p&gt;

&lt;p&gt;For this setup, the important part is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LibreChat should send requests to &lt;strong&gt;Lynkr’s &lt;code&gt;/v1&lt;/code&gt; base URL&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;LibreChat should use a model name that Lynkr will accept and route&lt;/li&gt;
&lt;li&gt;LibreChat should not need to know which upstream provider you finally use behind Lynkr&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the mental model is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LibreChat config → one base URL → Lynkr
Lynkr config → actual providers, routing, fallback, cache
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That separation is the whole point.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Add Lynkr as the LibreChat custom endpoint
&lt;/h2&gt;

&lt;p&gt;The exact UI path can differ depending on how you run LibreChat and how you expose custom endpoints, but the working shape is the same.&lt;/p&gt;

&lt;p&gt;In LibreChat, configure a custom OpenAI-compatible endpoint with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Base URL:&lt;/strong&gt; &lt;code&gt;http://localhost:8081/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API key:&lt;/strong&gt; any value Lynkr accepts for your setup, or the key your deployment expects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; a model string that Lynkr can map or forward&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Minimal example values
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Endpoint Type: Custom OpenAI-compatible endpoint
Base URL: http://localhost:8081/v1
API Key: dummy-key
Model: gpt-4o-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
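
&lt;p&gt;In deployments that define custom endpoints through LibreChat’s &lt;code&gt;librechat.yaml&lt;/code&gt; file, the values above might translate into an entry like the one below. Treat it as a minimal sketch rather than a verified configuration: the &lt;code&gt;version&lt;/code&gt; value, the endpoint name &lt;code&gt;Lynkr&lt;/code&gt;, and the display label are assumptions, so check the field names against the LibreChat documentation for your release.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# librechat.yaml (sketch; adjust to your LibreChat release)
version: 1.2.1              # assumption: use the config version your release expects
cache: true
endpoints:
  custom:
    - name: "Lynkr"         # assumption: any label you like
      apiKey: "dummy-key"   # same placeholder key as in the example values above
      baseURL: "http://localhost:8081/v1"
      models:
        default: ["gpt-4o-mini"]
        fetch: false        # keep the model list fixed instead of querying the gateway
      titleConvo: true
      titleModel: "gpt-4o-mini"
      modelDisplayLabel: "Lynkr"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Setting &lt;code&gt;fetch: false&lt;/code&gt; keeps the model menu limited to the names you list, which is useful while you are still validating a single model string.&lt;/p&gt;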



&lt;p&gt;If your LibreChat instance and Lynkr instance run on different hosts, replace &lt;code&gt;localhost&lt;/code&gt; with the actual reachable host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://lynkr.internal:8081/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://lynkr.yourdomain.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are running LibreChat in Docker and Lynkr on the host machine, you may need to use a host-reachable name rather than &lt;code&gt;localhost&lt;/code&gt;, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://host.docker.internal:8081/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s one of the most common gotchas.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Pick a model name strategy
&lt;/h2&gt;

&lt;p&gt;This part trips people up more than it should.&lt;/p&gt;

&lt;p&gt;LibreChat wants a model name. Lynkr also needs to know what to do with that model name.&lt;/p&gt;

&lt;p&gt;There are two clean ways to handle this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: pass through a real model name
&lt;/h3&gt;

&lt;p&gt;Use a model name that corresponds to the backend you want Lynkr to use.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gpt-4o-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-3-5-sonnet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the simplest starting point if Lynkr is forwarding traffic in a straightforward way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: use stable logical model names in LibreChat
&lt;/h3&gt;

&lt;p&gt;A better long-term pattern is to let LibreChat use a stable name like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chat-fast
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chat-quality
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then map those choices in the gateway layer.&lt;/p&gt;

&lt;p&gt;That way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LibreChat users keep the same model choices&lt;/li&gt;
&lt;li&gt;Lynkr can change the real backend later&lt;/li&gt;
&lt;li&gt;you can move from OpenRouter to Bedrock or from cloud to local without rewriting the app-side model menu&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even if you do not formalize that on day one, this is the direction I’d recommend.&lt;/p&gt;
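
&lt;p&gt;On the LibreChat side, the logical names could be the only models the endpoint advertises. The fragment below is a sketch that assumes custom endpoints are defined in &lt;code&gt;librechat.yaml&lt;/code&gt; and that Lynkr is already configured to resolve &lt;code&gt;chat-fast&lt;/code&gt; and &lt;code&gt;chat-quality&lt;/code&gt;. The gateway-side mapping is not shown because it depends on your Lynkr configuration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# librechat.yaml (sketch): expose only stable logical names to users
endpoints:
  custom:
    - name: "Lynkr"         # assumption: illustrative endpoint name
      apiKey: "dummy-key"
      baseURL: "http://localhost:8081/v1"
      models:
        default: ["chat-fast", "chat-quality"]
        fetch: false        # do not replace the list with upstream model names
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;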




&lt;h2&gt;
  
  
  Step 5: Run a smoke test from LibreChat
&lt;/h2&gt;

&lt;p&gt;Once the endpoint is configured, test a simple chat request first.&lt;/p&gt;

&lt;p&gt;Use something cheap and easy to inspect, like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Write a Python function that reverses a string.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If that works, test a second request that makes backend behavior easier to reason about, like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Summarize the difference between Redis and PostgreSQL in 5 bullets.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What you’re looking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LibreChat sends successfully to Lynkr&lt;/li&gt;
&lt;li&gt;Lynkr forwards successfully to the configured provider&lt;/li&gt;
&lt;li&gt;the response comes back normally in LibreChat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this works, the base integration is done.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 6: Add one routing policy behind Lynkr
&lt;/h2&gt;

&lt;p&gt;This is where the setup becomes more useful than direct provider wiring.&lt;/p&gt;

&lt;p&gt;A good first pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route lightweight general chat to a cheaper model&lt;/li&gt;
&lt;li&gt;route harder reasoning or code-heavy work to a stronger model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Conceptually, the setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LibreChat user chooses: chat-fast
  → Lynkr routes to cheaper tier

LibreChat user chooses: chat-quality
  → Lynkr routes to stronger tier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even without exposing every provider in LibreChat, you still get flexibility behind the gateway.&lt;/p&gt;

&lt;p&gt;That’s cleaner than giving end users six raw vendor choices and expecting them to know when each one is appropriate.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 7: Add fallback once the happy path works
&lt;/h2&gt;

&lt;p&gt;Do not start with fallback complexity on the first try. Get one provider working first.&lt;/p&gt;

&lt;p&gt;After that, the next useful improvement is fallback.&lt;/p&gt;

&lt;p&gt;Example pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;primary backend: OpenRouter&lt;/li&gt;
&lt;li&gt;fallback backend: Bedrock&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;primary backend: local Ollama model&lt;/li&gt;
&lt;li&gt;fallback backend: stronger cloud model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you a much better operational story than “LibreChat is hardwired to one provider and breaks when that provider has a bad day.”&lt;/p&gt;

&lt;p&gt;That’s one of the clearest reasons to keep failover logic below the app layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  A concrete request flow example
&lt;/h2&gt;

&lt;p&gt;Here’s what this looks like in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Without Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LibreChat → OpenAI directly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want to change providers, add fallback, or introduce routing, those concerns start leaking into the app layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  With Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LibreChat → Lynkr → OpenRouter
                    ↘ Bedrock fallback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LibreChat stays pointed at one endpoint.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;less config churn in the app&lt;/li&gt;
&lt;li&gt;easier backend changes&lt;/li&gt;
&lt;li&gt;cleaner rollout of new models&lt;/li&gt;
&lt;li&gt;a better place to add caching and routing&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common gotchas
&lt;/h2&gt;

&lt;p&gt;Here are the ones most likely to waste your time.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Wrong base URL
&lt;/h3&gt;

&lt;p&gt;If you point LibreChat at:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:8081/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



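&lt;p&gt;you may see requests fail even though Lynkr is running: OpenAI-compatible clients typically append paths such as &lt;code&gt;/chat/completions&lt;/code&gt; to the base URL, so without &lt;code&gt;/v1&lt;/code&gt; they can miss Lynkr’s &lt;code&gt;/v1/chat/completions&lt;/code&gt; route.&lt;/p&gt;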

&lt;p&gt;Use the &lt;code&gt;/v1&lt;/code&gt; base URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Docker networking confusion
&lt;/h3&gt;

&lt;p&gt;If LibreChat runs in Docker, &lt;code&gt;localhost&lt;/code&gt; usually means &lt;strong&gt;the container itself&lt;/strong&gt;, not your host machine.&lt;/p&gt;

&lt;p&gt;Use a network-reachable host such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://host.docker.internal:8081/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or a proper internal hostname.&lt;/p&gt;
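
&lt;p&gt;On Linux hosts, &lt;code&gt;host.docker.internal&lt;/code&gt; is not always defined inside containers by default. With Docker Engine 20.10 or newer, one way to add it is a Compose override like the sketch below. The service name &lt;code&gt;api&lt;/code&gt; is an assumption about your LibreChat Compose file, so match it to the service that runs the LibreChat backend.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# docker-compose.override.yml (sketch)
services:
  api:                      # assumption: the LibreChat backend service name
    extra_hosts:
      # maps host.docker.internal to the Docker host's gateway address
      - "host.docker.internal:host-gateway"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;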

&lt;h3&gt;
  
  
  3. Model name mismatch
&lt;/h3&gt;

&lt;p&gt;If LibreChat sends a model string Lynkr does not recognize or route correctly, requests will fail even though the endpoint is reachable.&lt;/p&gt;

&lt;p&gt;When debugging, simplify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pick one known model string&lt;/li&gt;
&lt;li&gt;use one provider&lt;/li&gt;
&lt;li&gt;get one request working&lt;/li&gt;
&lt;li&gt;only then layer routing or aliases on top&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Starting with too much complexity
&lt;/h3&gt;

&lt;p&gt;Don’t try to validate all of these at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;custom endpoint&lt;/li&gt;
&lt;li&gt;multiple providers&lt;/li&gt;
&lt;li&gt;fallback&lt;/li&gt;
&lt;li&gt;caching&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;agents&lt;/li&gt;
&lt;li&gt;MCP tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Get the base chat completion path working first.&lt;/p&gt;

&lt;p&gt;Then expand.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Treating LibreChat like the provider control plane
&lt;/h3&gt;

&lt;p&gt;LibreChat is excellent at the user/product layer.&lt;/p&gt;

&lt;p&gt;But if you keep using the app layer to own provider switching, cost control, and failover, you lose the main architectural benefit of using a gateway underneath it.&lt;/p&gt;

&lt;p&gt;Keep the responsibilities split.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why this setup is worth it
&lt;/h2&gt;

&lt;p&gt;Even the minimal version gives you a better foundation than direct provider wiring.&lt;/p&gt;

&lt;p&gt;You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;one stable endpoint&lt;/strong&gt; in LibreChat&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;provider portability&lt;/strong&gt; behind the scenes&lt;/li&gt;
&lt;li&gt;a better place for &lt;strong&gt;routing&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;a better place for &lt;strong&gt;fallback&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;a better place for &lt;strong&gt;caching&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;a cleaner path from simple chat UI to broader agent infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And as your setup grows, that separation only gets more valuable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Minimal checklist
&lt;/h2&gt;

&lt;p&gt;If you just want the shortest possible version, this is it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Lynkr&lt;/li&gt;
&lt;li&gt;Start Lynkr on port &lt;code&gt;8081&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Verify Lynkr responds at &lt;code&gt;http://localhost:8081/&lt;/code&gt; and accepts requests under &lt;code&gt;http://localhost:8081/v1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;In LibreChat, add a custom OpenAI-compatible endpoint&lt;/li&gt;
&lt;li&gt;Set the base URL to &lt;code&gt;http://localhost:8081/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use a model string Lynkr can route&lt;/li&gt;
&lt;li&gt;Run one smoke-test prompt&lt;/li&gt;
&lt;li&gt;Only after that, add routing and fallback&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;LibreChat already gives you the hard part: a good open-source surface for chat, agents, MCP, and self-hosting.&lt;/p&gt;

&lt;p&gt;Lynkr gives you the missing infrastructure layer under it.&lt;/p&gt;

&lt;p&gt;That combination is stronger than pushing all model concerns into the app itself.&lt;/p&gt;

&lt;p&gt;If you’re building a self-hosted AI stack that needs to survive provider churn, model changes, and growing workflow complexity, this is the shape I’d use.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LibreChat:&lt;/strong&gt; &lt;a href="https://github.com/danny-avila/LibreChat" rel="noopener noreferrer"&gt;github.com/danny-avila/LibreChat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr:&lt;/strong&gt; &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this article was useful, star the repos and let me know if you want the next one to go deeper on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;LibreChat + Lynkr with Docker Compose&lt;/li&gt;
&lt;li&gt;model aliases and routing strategy&lt;/li&gt;
&lt;li&gt;using LibreChat agents on top of a Lynkr-backed stack&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Cut Microsoft Agent Framework Costs With a Gateway Layer</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Sun, 14 Jun 2026 09:19:54 +0000</pubDate>
      <link>https://dev.to/lynkr/how-to-cut-microsoft-agent-framework-costs-with-a-gateway-layer-5gke</link>
      <guid>https://dev.to/lynkr/how-to-cut-microsoft-agent-framework-costs-with-a-gateway-layer-5gke</guid>
      <description>&lt;p&gt;Microsoft Agent Framework is built for production multi-agent systems, which is exactly why its LLM bill can grow faster than expected. If you are running workflows with retries, handoffs, tools, and checkpoints, the easiest savings do not come from prompting harder — they come from adding a gateway layer under the framework.&lt;/p&gt;

&lt;p&gt;I built Lynkr, so obvious founder disclosure: this article uses Lynkr as the gateway example. I’ll keep it practical and focus on where the cost actually shows up in Microsoft Agent Framework workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is a real Microsoft Agent Framework problem
&lt;/h2&gt;

&lt;p&gt;The current Microsoft Agent Framework README positions it as a production-grade framework for Python and .NET, with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-agent workflows&lt;/li&gt;
&lt;li&gt;sequential, concurrent, handoff, and group collaboration patterns&lt;/li&gt;
&lt;li&gt;middleware&lt;/li&gt;
&lt;li&gt;observability&lt;/li&gt;
&lt;li&gt;provider flexibility&lt;/li&gt;
&lt;li&gt;checkpointing and human-in-the-loop flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of stack where token usage grows quietly.&lt;/p&gt;

&lt;p&gt;A single prompt-response app is easy to reason about. A production workflow is not. Once you add routing, retries, multiple agents, MCP tools, and long-lived execution state, the same context starts getting resent over and over.&lt;/p&gt;

&lt;p&gt;That creates four predictable cost leaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the spend comes from in Microsoft Agent Framework workloads
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Repeated shared context across agents
&lt;/h3&gt;

&lt;p&gt;Multi-agent systems reuse a lot of the same context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task instructions&lt;/li&gt;
&lt;li&gt;tool definitions&lt;/li&gt;
&lt;li&gt;previous messages&lt;/li&gt;
&lt;li&gt;workflow state&lt;/li&gt;
&lt;li&gt;grounding context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even when the framework orchestrates cleanly, the model provider still sees repeated input tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tool-heavy steps explode prompt size
&lt;/h3&gt;

&lt;p&gt;Once agents start using tools, responses stop looking like simple chat. You get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;search results&lt;/li&gt;
&lt;li&gt;file reads&lt;/li&gt;
&lt;li&gt;JSON blobs&lt;/li&gt;
&lt;li&gt;browser outputs&lt;/li&gt;
&lt;li&gt;execution traces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those payloads are often much larger than the user’s actual request.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Not every task needs the same model
&lt;/h3&gt;

&lt;p&gt;A workflow step that says “classify this,” “summarize these logs,” or “extract the next action” does not need the same model as “resolve a hard bug across four files.”&lt;/p&gt;

&lt;p&gt;Without a routing layer, teams overpay by sending too much easy work to premium models.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Retries and loops multiply waste
&lt;/h3&gt;

&lt;p&gt;Production agent systems do retries, fallbacks, approvals, and re-runs. That is good engineering. It is also how token bills get weird at the end of the month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The gateway pattern that fits Microsoft Agent Framework
&lt;/h2&gt;

&lt;p&gt;The cleanest setup is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Microsoft Agent Framework app
        ↓
     Lynkr gateway
        ↓
OpenAI / Azure OpenAI / Bedrock / OpenRouter / Ollama / Databricks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The framework keeps doing orchestration. The gateway handles cost control under it.&lt;/p&gt;

&lt;p&gt;That split matters because you do &lt;strong&gt;not&lt;/strong&gt; want cost logic duplicated across every agent, every workflow node, and every environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lynkr changes underneath the framework
&lt;/h2&gt;

&lt;p&gt;Lynkr is a self-hosted LLM gateway for Claude Code, Cursor, Codex, and general OpenAI-compatible workloads. The points below are taken from its current README and published benchmark report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;version &lt;code&gt;9.5.0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;13+ providers&lt;/li&gt;
&lt;li&gt;zero code changes at the app layer once the base URL points at the gateway&lt;/li&gt;
&lt;li&gt;benchmarked token reductions from smart tool selection and JSON compression&lt;/li&gt;
&lt;li&gt;semantic cache hits around 171ms in the published benchmark report&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The part that makes it useful for Microsoft Agent Framework is not “one more abstraction layer.” It is that the framework keeps its orchestration role while the gateway centralizes the three cost levers that matter most.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prompt and semantic caching
&lt;/h3&gt;

&lt;p&gt;Agent workflows repeat themselves more than most teams realize.&lt;/p&gt;

&lt;p&gt;A classification step comes back with the same shape.&lt;br&gt;
A retry asks nearly the same thing again.&lt;br&gt;
A second agent gets almost the same upstream context.&lt;br&gt;
A human-in-the-loop resume often replays the same state plus one decision.&lt;/p&gt;

&lt;p&gt;Caching is how you stop paying full price for near-duplicate work.&lt;/p&gt;

&lt;p&gt;In Lynkr’s published benchmark report, semantic cache hits returned in about &lt;strong&gt;171ms&lt;/strong&gt;. That matters in production workflows because a cache hit saves latency as well as spend.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tool payload compression
&lt;/h3&gt;

&lt;p&gt;This is the least talked-about savings lever, and one of the most useful.&lt;/p&gt;

&lt;p&gt;Microsoft Agent Framework makes it easier to build workflows that use tools. But once tools start returning structured output, your bottleneck becomes payload size, not just model choice.&lt;/p&gt;

&lt;p&gt;Lynkr’s benchmark report shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;53% fewer tokens&lt;/strong&gt; on tool-heavy requests through smart tool selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;87.6% compression&lt;/strong&gt; on large JSON tool results in the benchmarked scenario&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That maps well to framework workloads that push around logs, traces, extracted documents, or structured tool responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Tier routing
&lt;/h3&gt;

&lt;p&gt;Not every orchestration step should hit the same model.&lt;/p&gt;

&lt;p&gt;A practical tiering setup looks like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simple extraction or classification → cheaper fast model&lt;/li&gt;
&lt;li&gt;normal agent work → balanced model&lt;/li&gt;
&lt;li&gt;deep reasoning or hard refactors → premium model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between “we support multiple providers” and “we actively spend less.”&lt;/p&gt;

&lt;p&gt;Microsoft Agent Framework already gives you the orchestration surface. A gateway adds the policy layer under it.&lt;/p&gt;
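
&lt;p&gt;The tiering above can be sketched as a plain lookup table. This is only an illustration: the step labels, tier names, and model identifiers are hypothetical placeholders, not Lynkr configuration, and a real gateway would more likely score each request than trust a fixed label.&lt;/p&gt;

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"
    BALANCED = "balanced"
    PREMIUM = "premium"


# Hypothetical model names; substitute whatever your providers expose.
MODEL_FOR_TIER = {
    Tier.FAST: "small-fast-model",
    Tier.BALANCED: "mid-tier-model",
    Tier.PREMIUM: "premium-reasoning-model",
}

# Hypothetical step labels an orchestrator might attach to each call.
TIER_FOR_STEP = {
    "extraction": Tier.FAST,
    "classification": Tier.FAST,
    "agent_work": Tier.BALANCED,
    "deep_reasoning": Tier.PREMIUM,
    "hard_refactor": Tier.PREMIUM,
}


def pick_model(step: str) -> str:
    """Return the model for a step; unknown steps default to the balanced tier."""
    tier = TIER_FOR_STEP.get(step, Tier.BALANCED)
    return MODEL_FOR_TIER[tier]


print(pick_model("classification"))
print(pick_model("hard_refactor"))
```

&lt;p&gt;Defaulting unknown steps to the balanced tier is a design choice in this sketch: it avoids silently sending unlabelled work to the cheapest model.&lt;/p&gt;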

&lt;h2&gt;
  
  
  A concrete use case: customer support triage agents
&lt;/h2&gt;

&lt;p&gt;I think this use case is under-covered, and it is a very good fit for Lynkr.&lt;/p&gt;

&lt;p&gt;Imagine a support workflow built with Microsoft Agent Framework:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;ingest a new ticket&lt;/li&gt;
&lt;li&gt;classify product area and urgency&lt;/li&gt;
&lt;li&gt;summarize the issue&lt;/li&gt;
&lt;li&gt;search internal docs or run retrieval&lt;/li&gt;
&lt;li&gt;draft a response&lt;/li&gt;
&lt;li&gt;escalate only ambiguous or risky cases to a stronger model or a human&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those steps are &lt;strong&gt;not&lt;/strong&gt; equally hard.&lt;/p&gt;

&lt;p&gt;If every one of them uses the same premium model, you pay premium price for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;deduplication&lt;/li&gt;
&lt;li&gt;templated summaries&lt;/li&gt;
&lt;li&gt;known-answer lookups&lt;/li&gt;
&lt;li&gt;low-risk drafts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly where a gateway helps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why this works especially well
&lt;/h3&gt;

&lt;p&gt;Support triage has all three patterns a gateway can optimize:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated ticket shapes → cacheable&lt;/li&gt;
&lt;li&gt;structured tool results from retrieval/search → compressible&lt;/li&gt;
&lt;li&gt;mixed difficulty across workflow steps → routable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of baking cost logic into each agent, you let the framework orchestrate and let the gateway decide how expensive each turn should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dummy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8081/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Your Microsoft Agent Framework components can keep using an OpenAI-compatible endpoint
# while Lynkr handles routing, caching, and payload optimization underneath.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The point is not this exact snippet. The point is the boundary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your agents keep their workflow logic&lt;/li&gt;
&lt;li&gt;your framework keeps orchestration&lt;/li&gt;
&lt;li&gt;your gateway handles provider choice, caching, and token reduction&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I would route differently in this workload
&lt;/h2&gt;

&lt;p&gt;If I were wiring Microsoft Agent Framework for support triage, I would usually do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ticket classification&lt;/strong&gt; → cheap fast model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FAQ / known-issue matching&lt;/strong&gt; → cheap fast model plus cache&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;retrieval-grounded answer draft&lt;/strong&gt; → mid-tier model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;escalation for ambiguous, legal, or high-risk cases&lt;/strong&gt; → strongest model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;repeat follow-up questions on the same issue&lt;/strong&gt; → let cache catch them where possible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a much stronger operating model than “default everything to the best model and hope prompt engineering saves us later.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Where competitors can still win
&lt;/h2&gt;

&lt;p&gt;Fairness note: if your top priority is enterprise dashboards, centralized governance, or deeper out-of-the-box observability, other gateway products can be stronger on those axes.&lt;/p&gt;

&lt;p&gt;But for Microsoft Agent Framework teams trying to reduce the cost of agentic workloads without rewriting the app, the combination I care about is simpler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keep the framework for orchestration&lt;/li&gt;
&lt;li&gt;insert a gateway once&lt;/li&gt;
&lt;li&gt;let caching, compression, and tier routing do the cost work globally&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The practical takeaway
&lt;/h2&gt;

&lt;p&gt;Microsoft Agent Framework makes it easier to build serious agent systems. It also makes it easier to accidentally overpay for them.&lt;/p&gt;

&lt;p&gt;The underused pattern is not “choose a cheaper model.” It is putting a gateway layer under the framework so repeated context, oversized tool payloads, and easy workflow steps stop being billed like hard reasoning.&lt;/p&gt;

&lt;p&gt;That is the real use case for Lynkr here: &lt;strong&gt;production multi-agent workflows where the waste comes from orchestration overhead, not just model price.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you want, I can write a follow-up with a full Microsoft Agent Framework example using a support triage workflow and a concrete Lynkr routing setup.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>LiteLLM vs Lynkr for AI Coding Workflows: Where the Token Savings Actually Come From</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Wed, 10 Jun 2026 20:58:28 +0000</pubDate>
      <link>https://dev.to/lynkr/litellm-vs-lynkr-for-ai-coding-workflows-where-the-token-savings-actually-come-from-1482</link>
      <guid>https://dev.to/lynkr/litellm-vs-lynkr-for-ai-coding-workflows-where-the-token-savings-actually-come-from-1482</guid>
      <description>&lt;p&gt;Most LLM gateways promise the same thing: one endpoint, many providers. That part is useful, but it is not where the real savings come from in AI coding workflows.&lt;/p&gt;

&lt;p&gt;The expensive part is what happens inside repeated coding sessions: oversized tool schemas, large JSON tool results, repeated context, and using expensive models for turns that do not need them.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt;, so take this as a founder comparison. I’ll keep it honest: LiteLLM is a solid provider abstraction layer. But if your goal is specifically to reduce spend in Claude Code, Cursor, or Codex-style workflows, the difference is not “which gateway supports more providers.” The difference is whether the gateway cuts tokens &lt;em&gt;before they reach the model&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with most “gateway savings” claims
&lt;/h2&gt;

&lt;p&gt;There are a few common ways gateways claim to save money:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route to cheaper models&lt;/li&gt;
&lt;li&gt;add fallbacks&lt;/li&gt;
&lt;li&gt;centralize traffic&lt;/li&gt;
&lt;li&gt;track budgets&lt;/li&gt;
&lt;li&gt;cache exact repeated prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of that helps.&lt;/p&gt;

&lt;p&gt;But coding workflows have a different cost shape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the same repo context is sent over and over&lt;/li&gt;
&lt;li&gt;tool definitions bloat every request&lt;/li&gt;
&lt;li&gt;tool outputs can be huge&lt;/li&gt;
&lt;li&gt;not every turn deserves the strongest model&lt;/li&gt;
&lt;li&gt;agent loops magnify small inefficiencies into large bills&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why “multi-provider support” is not enough. You need token reduction at the gateway layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I benchmarked
&lt;/h2&gt;

&lt;p&gt;I recently ran a benchmark comparing Lynkr and LiteLLM on the &lt;strong&gt;same backend providers&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama local&lt;/li&gt;
&lt;li&gt;Moonshot&lt;/li&gt;
&lt;li&gt;Azure OpenAI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The benchmark covered 9 scenarios across 4 feature categories, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool-heavy requests&lt;/li&gt;
&lt;li&gt;large JSON tool outputs&lt;/li&gt;
&lt;li&gt;paraphrased cache hits&lt;/li&gt;
&lt;li&gt;simple vs complex routing decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full report:&lt;br&gt;
&lt;a href="https://github.com/Fast-Editor/Lynkr/blob/main/BENCHMARK_REPORT.md" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr/blob/main/BENCHMARK_REPORT.md&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Smart tool selection: 53% fewer tokens
&lt;/h2&gt;

&lt;p&gt;One of the easiest ways to waste tokens is forwarding every possible tool definition on every request.&lt;/p&gt;

&lt;p&gt;A read-only question does not need write, edit, bash, or git tools, yet many setups send those tool definitions anyway.&lt;/p&gt;

&lt;p&gt;Lynkr classifies the request and strips irrelevant tool schemas before forwarding.&lt;/p&gt;
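
&lt;p&gt;To make the idea concrete, here is a minimal sketch of request-aware tool filtering. It is not Lynkr's classifier: the keyword check, the tool names, and the &lt;code&gt;select_tools&lt;/code&gt; helper are all assumptions made for illustration, using the OpenAI-style function tool shape.&lt;/p&gt;

```python
# Hypothetical set of tools that cannot modify anything.
READ_ONLY_TOOLS = {"read_file", "grep", "list_dir"}

# Crude stand-in for a real request classifier: words that imply mutation.
WRITE_HINTS = ("write", "edit", "fix", "refactor", "commit", "run", "delete")


def is_read_only(prompt: str) -> bool:
    """Treat a prompt as read-only when none of its words is a write hint."""
    words = prompt.lower().split()
    return not any(hint in words for hint in WRITE_HINTS)


def select_tools(prompt: str, tools: list[dict]) -> list[dict]:
    """Drop write-capable tool schemas when the request looks read-only."""
    if not is_read_only(prompt):
        return tools
    return [t for t in tools if t["function"]["name"] in READ_ONLY_TOOLS]


def tool(name: str) -> dict:
    """Build a bare tool schema; real schemas carry full parameter definitions."""
    return {"type": "function", "function": {"name": name, "parameters": {}}}


tools = [tool(n) for n in ("read_file", "grep", "write_file", "bash", "git_commit")]
kept = select_tools("What does the parse_config function return?", tools)
print([t["function"]["name"] for t in kept])
```

&lt;p&gt;In this toy run only the two read-only schemas are forwarded. The saving comes from the schemas that never reach the model, so it grows with the size of each dropped definition.&lt;/p&gt;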

&lt;h3&gt;
  
  
  Benchmark result
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Proxy&lt;/th&gt;
&lt;th&gt;Tokens billed&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr&lt;/td&gt;
&lt;td&gt;959&lt;/td&gt;
&lt;td&gt;$0.0044&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LiteLLM&lt;/td&gt;
&lt;td&gt;2,085&lt;/td&gt;
&lt;td&gt;$0.0091&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Result: 53% fewer tokens, 52% cheaper on the same model and prompt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That matters because coding sessions are not one-shot prompts. If every turn is carrying unnecessary tool baggage, your costs quietly double.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Large JSON tool results: 87.6% fewer tokens
&lt;/h2&gt;

&lt;p&gt;Another hidden cost is tool output.&lt;/p&gt;

&lt;p&gt;If a bash command, grep, file read, or agent step returns a large structured JSON payload, that payload gets forwarded to the model. And that gets expensive fast.&lt;/p&gt;

&lt;p&gt;Lynkr uses &lt;strong&gt;TOON compression&lt;/strong&gt; for large JSON tool results before sending them upstream.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark result
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Proxy&lt;/th&gt;
&lt;th&gt;Tokens billed&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr&lt;/td&gt;
&lt;td&gt;427&lt;/td&gt;
&lt;td&gt;$0.009&lt;/td&gt;
&lt;td&gt;12s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LiteLLM&lt;/td&gt;
&lt;td&gt;3,458&lt;/td&gt;
&lt;td&gt;$0.018&lt;/td&gt;
&lt;td&gt;12s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Result: 87.6% compression and 50% cheaper, with the same latency in this benchmark.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the kind of optimization that matters in real agent workflows, because those systems often generate verbose intermediate outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Semantic cache: 171ms responses, 0 billed tokens on cache hit
&lt;/h2&gt;

&lt;p&gt;Exact-match caching is useful, but coding workflows often produce near-duplicate prompts rather than byte-for-byte repeats.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Explain TCP vs UDP”&lt;/li&gt;
&lt;li&gt;“What is the difference between TCP and UDP?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lynkr uses semantic caching, so paraphrased prompts can hit cache too.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark result
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Tokens billed&lt;/th&gt;
&lt;th&gt;Response time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First call (cold)&lt;/td&gt;
&lt;td&gt;2,857&lt;/td&gt;
&lt;td&gt;1,891ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Second call (paraphrased cache hit)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;171ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Result: 171ms response time and 0 billed tokens on cache hit.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the kind of win that changes the economics of repeated team usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Tier routing: not every prompt deserves the same model
&lt;/h2&gt;

&lt;p&gt;Routing to the cheapest available model is not the same thing as routing &lt;em&gt;correctly&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If someone asks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“What does git stash do?” → local/free model is fine&lt;/li&gt;
&lt;li&gt;“Design a secure JWT vs cookie architecture for banking auth” → that should escalate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lynkr scores requests across &lt;strong&gt;15 dimensions&lt;/strong&gt; including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token count&lt;/li&gt;
&lt;li&gt;code complexity&lt;/li&gt;
&lt;li&gt;reasoning markers&lt;/li&gt;
&lt;li&gt;risk patterns&lt;/li&gt;
&lt;li&gt;agentic signals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it routes automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark result
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Request&lt;/th&gt;
&lt;th&gt;Lynkr&lt;/th&gt;
&lt;th&gt;LiteLLM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“What does git stash do?”&lt;/td&gt;
&lt;td&gt;local/free tier&lt;/td&gt;
&lt;td&gt;local/free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JWT vs cookies security analysis&lt;/td&gt;
&lt;td&gt;cloud model&lt;/td&gt;
&lt;td&gt;cheapest local model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That difference matters. Cheap routing is only good when it is still the right call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monthly cost projection
&lt;/h2&gt;

&lt;p&gt;The benchmark includes a simple cost projection for &lt;strong&gt;100,000 requests/month&lt;/strong&gt; using a tool-heavy agentic workload:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Proxy&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LiteLLM&lt;/td&gt;
&lt;td&gt;~$818&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lynkr&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$409&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;That is roughly 50% cheaper on the same backend.&lt;/strong&gt;&lt;/p&gt;
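
&lt;p&gt;The arithmetic behind that projection, worked through with the approximate figures from the table. Nothing here is new data; the variable names are just illustrative.&lt;/p&gt;

```python
REQUESTS_PER_MONTH = 100_000

# Approximate monthly costs in USD, taken from the projection table.
monthly_cost = {"LiteLLM": 818.0, "Lynkr": 409.0}

# Average cost of a single request for each proxy.
per_request = {name: cost / REQUESTS_PER_MONTH for name, cost in monthly_cost.items()}

# Relative saving of Lynkr against LiteLLM on the same workload.
savings = 1 - monthly_cost["Lynkr"] / monthly_cost["LiteLLM"]

for name, cost in per_request.items():
    print(f"{name}: ${cost:.5f} per request")
print(f"Savings: {savings:.0%}")
```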

&lt;p&gt;This is the key point: if you compare gateways on equal footing, the savings do not come from magic. They come from removing waste before tokens ever hit the provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where LiteLLM is still strong
&lt;/h2&gt;

&lt;p&gt;LiteLLM is still a strong product if your main need is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;provider abstraction&lt;/li&gt;
&lt;li&gt;budget controls&lt;/li&gt;
&lt;li&gt;standard proxy behavior&lt;/li&gt;
&lt;li&gt;existing Python-heavy infra&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want a broad proxy layer and do not care much about coding-workflow-specific token optimization, LiteLLM is a reasonable choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Lynkr is different
&lt;/h2&gt;

&lt;p&gt;Lynkr is built around AI coding and agent workflows specifically.&lt;/p&gt;

&lt;p&gt;That means it focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;smart tool selection&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TOON compression for large JSON outputs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;semantic cache&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;automatic complexity-based tier routing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MCP integration&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Code Mode&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;long-term memory&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;drop-in compatibility for Claude Code, Cursor, and Codex&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;13+ providers supported&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Mode&lt;/strong&gt; reduces MCP tool-definition overhead by &lt;strong&gt;~96%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 code changes required&lt;/strong&gt; for drop-in integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The real takeaway
&lt;/h2&gt;

&lt;p&gt;If all you want is “many providers behind one API,” a gateway like LiteLLM covers that.&lt;/p&gt;

&lt;p&gt;But if your actual goal is to make AI coding infrastructure materially cheaper, the important question is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does the gateway reduce tokens before they reach the model?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is where the biggest savings come from.&lt;/p&gt;

&lt;p&gt;For AI coding workflows, the biggest cost levers are usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;removing irrelevant tools&lt;/li&gt;
&lt;li&gt;compressing tool output&lt;/li&gt;
&lt;li&gt;caching semantically similar turns&lt;/li&gt;
&lt;li&gt;routing simple requests to cheap models and escalating only when needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the layer I built Lynkr around.&lt;/p&gt;

&lt;p&gt;If you want to look at the benchmark or try it yourself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Benchmark report: &lt;a href="https://github.com/Fast-Editor/Lynkr/blob/main/BENCHMARK_REPORT.md" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr/blob/main/BENCHMARK_REPORT.md&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are building around Claude Code, Cursor, Codex, or MCP workflows, I’d be curious what your biggest source of token waste has been.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devtools</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How Efficient Model Routing Can Save Up to 80% in AI Costs Without Compromising Output Quality</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Wed, 10 Jun 2026 04:01:33 +0000</pubDate>
      <link>https://dev.to/lynkr/explainable-llm-routing-is-the-missing-layer-in-agentic-systems-and-why-it-matters-for-lynkr-12ph</link>
      <guid>https://dev.to/lynkr/explainable-llm-routing-is-the-missing-layer-in-agentic-systems-and-why-it-matters-for-lynkr-12ph</guid>
      <description>&lt;p&gt;Why did this workflow get cheaper last week?&lt;/p&gt;

&lt;p&gt;Why did support quality drop after a routing change?&lt;/p&gt;

&lt;p&gt;Was the failure caused by the model, the router, or the task decomposition?&lt;/p&gt;

&lt;p&gt;Most multi-model systems can route for cost. Very few can explain why a task was sent to a specific model, what tradeoff was made, and whether the cheaper path was actually justified.&lt;/p&gt;

&lt;p&gt;That is not just a research gap. It is an operational one.&lt;/p&gt;

&lt;p&gt;Once an agent stack starts making economic decisions on every turn, developers need routing decisions they can inspect, replay, and override. In production, the only layer positioned to provide that is the gateway.&lt;/p&gt;

&lt;p&gt;I went through the paper &lt;em&gt;Explainable Model Routing for Agentic Workflows&lt;/em&gt; (&lt;a href="https://arxiv.org/abs/2604.03527v1" rel="noopener noreferrer"&gt;arXiv:2604.03527&lt;/a&gt;). It introduces &lt;strong&gt;Topaz&lt;/strong&gt;, a routing framework built around a useful idea: model routing should be interpretable by humans, not just optimized in the background.&lt;/p&gt;

&lt;p&gt;That matters because explainable routing is only valuable if it is attached to the layer that actually sees the real levers in production: cost, quality sensitivity, cache behavior, fallback paths, provider performance, and per-step policy decisions.&lt;/p&gt;

&lt;p&gt;That layer is the gateway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Topaz in one minute
&lt;/h2&gt;

&lt;p&gt;Topaz keeps the core routing loop simple and interpretable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Skill-based model profiles&lt;/strong&gt;: models are represented through capabilities like logic, code generation, tool use, factual knowledge, writing quality, instruction following, and summarization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit cost-quality optimization&lt;/strong&gt;: routing decisions are made through visible optimization logic instead of opaque heuristics alone.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer-facing explanations&lt;/strong&gt;: the system turns those decisions into plain-language reasoning a human can audit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the right direction. A routed system is only trustworthy if a developer can tell the difference between intelligent specialization and silent quality regression.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real production takeaway
&lt;/h2&gt;

&lt;p&gt;The paper is framed as a routing contribution, but the more important implication is where explainability has to live in practice.&lt;/p&gt;

&lt;p&gt;A router can score tasks. A gateway can explain the system.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;The gateway is the only layer with enough visibility to answer the questions teams actually ask after launch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which provider and model handled each step?&lt;/li&gt;
&lt;li&gt;did the system downgrade because the task was low risk or because the budget threshold fired?&lt;/li&gt;
&lt;li&gt;was there a cache hit or miss?&lt;/li&gt;
&lt;li&gt;did the request escalate because of tool complexity?&lt;/li&gt;
&lt;li&gt;did a fallback trigger because of timeout, rate limit, or policy?&lt;/li&gt;
&lt;li&gt;which step is safe to replay under a different routing policy?&lt;/li&gt;
&lt;li&gt;which user-visible step should be pinned to a stronger model no matter what?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If explainability stops at “the router chose model B because skill-match was 0.81,” it is not enough.&lt;/p&gt;

&lt;p&gt;In production, teams need a trace they can debug.&lt;/p&gt;

&lt;p&gt;They need to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;what happened&lt;/li&gt;
&lt;li&gt;why it happened&lt;/li&gt;
&lt;li&gt;what it cost&lt;/li&gt;
&lt;li&gt;what would have happened under a different policy&lt;/li&gt;
&lt;li&gt;what should be overridden next time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is gateway territory.&lt;/p&gt;

&lt;h2&gt;
  
  
  A concrete example
&lt;/h2&gt;

&lt;p&gt;Take a simple support workflow with four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Classify the incoming issue&lt;/strong&gt; → cheap model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate a fix plan&lt;/strong&gt; → strong reasoning model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute tool-heavy actions&lt;/strong&gt; → model optimized for tool use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write the final customer-facing response&lt;/strong&gt; → premium model&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A production-grade explanation layer should not just say “the system routed efficiently.” It should explain each step in operational terms.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue classification&lt;/strong&gt;: routed to a cheaper model because quality sensitivity was low and the task profile was narrow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix planning&lt;/strong&gt;: escalated because the task required stronger reasoning and a downgrade would have increased regression risk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-heavy execution&lt;/strong&gt;: assigned to a tool-optimized model because the step depended on multiple tool calls and fallback risk was higher on weaker models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Final response&lt;/strong&gt;: pinned to a premium model because it was user-visible and policy disallowed aggressive downgrades&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback event&lt;/strong&gt;: rerouted after a timeout or rate-limit threshold was hit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost note&lt;/strong&gt;: cache miss on shared context increased input cost for this run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the kind of explanation developers can work with.&lt;/p&gt;

&lt;p&gt;It tells them whether the system behaved correctly, where cost increased, where quality was protected, and what policy they may want to change.&lt;/p&gt;
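
&lt;p&gt;One way to make explanations like these inspectable is to emit a structured record per step. The sketch below is an assumption about what such a record could contain; the &lt;code&gt;RoutingDecision&lt;/code&gt; fields and model labels are hypothetical and are not a schema defined by Topaz or Lynkr.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RoutingDecision:
    step: str
    model: str                      # hypothetical model label
    reason: str                     # plain-language explanation for the route
    cache_hit: bool
    fallback: Optional[str] = None  # e.g. "timeout" or "rate_limit"
    pinned: bool = False            # True when policy forbids downgrading


trace = [
    RoutingDecision(
        "classify_issue", "cheap-model",
        "low quality sensitivity, narrow task profile", cache_hit=True,
    ),
    RoutingDecision(
        "fix_plan", "reasoning-model",
        "needs stronger reasoning; downgrade raises regression risk", cache_hit=False,
    ),
    RoutingDecision(
        "final_response", "premium-model",
        "user-visible; policy disallows aggressive downgrades",
        cache_hit=False, pinned=True,
    ),
]

# One JSON line per step keeps the trace easy to store, search, and replay.
for decision in trace:
    print(json.dumps(asdict(decision)))
```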

&lt;h2&gt;
  
  
  Routing alone is not enough
&lt;/h2&gt;

&lt;p&gt;Routing is only one part of the cost stack.&lt;/p&gt;

&lt;p&gt;For real agent and coding workflows, the bigger savings usually come from three levers working together at the gateway layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prompt caching
&lt;/h3&gt;

&lt;p&gt;A lot of agent loops resend the same long context: repo maps, attached files, prior tool traces, or repeated instructions.&lt;/p&gt;

&lt;p&gt;If the gateway can preserve or inject provider-side caching correctly, it cuts repeated input cost before routing even starts.&lt;/p&gt;

&lt;p&gt;Without gateway visibility, teams cannot explain whether a run was cheaper because the router made a better choice or because the system got a cache hit.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tier routing
&lt;/h3&gt;

&lt;p&gt;Not every step deserves the expensive model.&lt;/p&gt;

&lt;p&gt;Low-risk classification, formatting, and shallow transformations can route down. Hard reasoning, recovery paths, and user-visible outputs should stay higher.&lt;/p&gt;

&lt;p&gt;But those choices need replay and override. A team has to be able to inspect a downgrade decision and say: this was safe, this was too aggressive, this customer-facing step should never go below tier X.&lt;/p&gt;
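
&lt;p&gt;The "never go below tier X" override can be expressed as a per-step floor applied after the router proposes a tier. This is a minimal sketch under stated assumptions: the tier names, step names, and &lt;code&gt;apply_floor&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
from enum import IntEnum


class Tier(IntEnum):
    CHEAP = 1
    STANDARD = 2
    PREMIUM = 3


# Hypothetical per-step floors a team sets after reviewing past decisions.
TIER_FLOOR = {
    "final_customer_response": Tier.PREMIUM,
    "fix_plan": Tier.STANDARD,
}


def apply_floor(step: str, proposed: Tier) -> Tier:
    """Raise the router's proposed tier to the step's floor; never lower it."""
    floor = TIER_FLOOR.get(step, Tier.CHEAP)
    return max(proposed, floor)


print(apply_floor("final_customer_response", Tier.CHEAP).name)
print(apply_floor("classification", Tier.CHEAP).name)
```

&lt;p&gt;Keeping the floor separate from the router means a team can tighten or relax policy for one step without retraining or re-tuning the routing logic itself.&lt;/p&gt;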

&lt;h3&gt;
  
  
  3. Tool-flow compression
&lt;/h3&gt;

&lt;p&gt;In agent systems, the tool loop itself becomes expensive. Every extra round trip can resend context, increase latency, and amplify token waste.&lt;/p&gt;

&lt;p&gt;That is why patterns like MCP Code Mode matter. Compressing tool-heavy work into fewer round trips changes the economics of the whole system.&lt;/p&gt;

&lt;p&gt;Again, the gateway is where that becomes observable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;round-trip count&lt;/li&gt;
&lt;li&gt;tool-heavy vs plain completion flow&lt;/li&gt;
&lt;li&gt;token growth across steps&lt;/li&gt;
&lt;li&gt;fallback behavior during execution&lt;/li&gt;
&lt;li&gt;total cost deltas after policy changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why explainable routing belongs next to gateway observability, not as a thin layer on top of a black-box router.&lt;/p&gt;

&lt;h2&gt;
  
  
  The skepticism this space needs
&lt;/h2&gt;

&lt;p&gt;There is a real failure mode here: “explainable routing” can turn into theater.&lt;/p&gt;

&lt;p&gt;A few reasons to be skeptical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;skill taxonomies drift&lt;/strong&gt;: the categories used to profile models can stop matching real workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;explanations can become post-hoc&lt;/strong&gt;: a clean trace is useless if it is not faithful to the actual decision path&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;quality sensitivity is hard to label&lt;/strong&gt;: teams often underestimate which steps are truly user-visible or regression-sensitive&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pretty traces are not enough&lt;/strong&gt;: developers need replay, policy override, and audit logs, not just a narrative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why the standard should be higher.&lt;/p&gt;

&lt;p&gt;An explanation system should be judged on whether it helps a team debug regressions, justify cost changes, and safely tighten routing policy over time.&lt;/p&gt;

&lt;p&gt;If it cannot support replay and override, it is not operationally complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters for Lynkr
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt;, so the obvious disclosure is that I read Topaz through the lens of what an LLM gateway should expose in production.&lt;/p&gt;

&lt;p&gt;The core idea is straightforward: the gateway is where cost, quality, fallback, caching, and provider behavior meet. That makes it the natural home for explainable routing.&lt;/p&gt;

&lt;p&gt;For Lynkr specifically, that means explainability should connect to the things that actually drive outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;provider/model selection&lt;/li&gt;
&lt;li&gt;prompt caching behavior&lt;/li&gt;
&lt;li&gt;tier routing policy&lt;/li&gt;
&lt;li&gt;tool-heavy vs standard completion paths&lt;/li&gt;
&lt;li&gt;fallback events&lt;/li&gt;
&lt;li&gt;cache hit/miss impact&lt;/li&gt;
&lt;li&gt;downgrade risk on user-visible steps&lt;/li&gt;
&lt;li&gt;replay and override of routing decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is also why routing by itself is not enough.&lt;/p&gt;

&lt;p&gt;The real win is stacking levers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt caching to cut repeated input cost&lt;/li&gt;
&lt;li&gt;tier routing to reserve premium models for the steps that justify them&lt;/li&gt;
&lt;li&gt;tool-flow compression to reduce waste across agent loops&lt;/li&gt;
&lt;li&gt;observability strong enough to explain where savings came from and where quality risk entered the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between “we routed to a cheaper model” and “we know exactly why this workflow cost less, where the risk moved, and which policy we want to change next.”&lt;/p&gt;

&lt;h2&gt;
  
  
  The actual shift
&lt;/h2&gt;

&lt;p&gt;The shift is not just from single-model apps to multi-model systems.&lt;/p&gt;

&lt;p&gt;It is from &lt;strong&gt;opaque orchestration&lt;/strong&gt; to &lt;strong&gt;auditable orchestration&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Topaz is useful because it pushes routing toward human-interpretable decisions. The stronger takeaway is that explainability belongs at the gateway layer, because that is the only place with enough visibility to audit cost, quality, fallback, caching, and provider behavior across the whole system.&lt;/p&gt;

&lt;p&gt;That is where production routing gets real.&lt;/p&gt;

&lt;p&gt;If you are building multi-model or agentic systems, this is the right question to ask next:&lt;/p&gt;

&lt;p&gt;not just &lt;em&gt;can the system route?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;but &lt;em&gt;can the system explain, replay, and override the route when something breaks?&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paper: &lt;a href="https://arxiv.org/abs/2604.03527v1" rel="noopener noreferrer"&gt;Explainable Model Routing for Agentic Workflows&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;github.com/Fast-Editor/Lynkr&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;


</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Make PydanticAI Agents Cheaper with Lynkr</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Tue, 09 Jun 2026 04:59:04 +0000</pubDate>
      <link>https://dev.to/lynkr/how-to-make-pydanticai-agents-cheaper-with-lynkr-285m</link>
      <guid>https://dev.to/lynkr/how-to-make-pydanticai-agents-cheaper-with-lynkr-285m</guid>
      <description>&lt;p&gt;PydanticAI is one of the cleanest ways to build structured LLM agents in Python. But once those agents start doing real work (tool calls, validation retries, structured outputs, and multi-step flows), the token bill climbs faster than most teams expect.&lt;/p&gt;

&lt;p&gt;Lynkr fits underneath that stack as an &lt;strong&gt;LLM gateway&lt;/strong&gt;. It does not replace PydanticAI. It makes the model layer under it cheaper and easier to control with tier routing, prompt caching, and provider flexibility.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Founder disclosure: I built Lynkr, so take that into account. I’ll keep this practical and focus on where the fit is real.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why PydanticAI is compelling in the first place
&lt;/h2&gt;

&lt;p&gt;I spent time going through PydanticAI because it solves a problem a lot of Python agent frameworks make messy: keeping agent code structured without giving up flexibility.&lt;/p&gt;

&lt;p&gt;What stood out to me is that PydanticAI is built around the same things Python teams already care about in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;typed agents&lt;/li&gt;
&lt;li&gt;structured outputs&lt;/li&gt;
&lt;li&gt;dependency injection&lt;/li&gt;
&lt;li&gt;tool calling&lt;/li&gt;
&lt;li&gt;model/provider flexibility&lt;/li&gt;
&lt;li&gt;observability and eval-friendly workflows&lt;/li&gt;
&lt;li&gt;graph support for more complex control flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo positions it as a production-grade Python agent framework, and that shows up quickly in the design. The README emphasizes model-agnostic support across OpenAI, Anthropic, Gemini, Bedrock, Ollama, Groq, OpenRouter, LiteLLM, and more. It also leans heavily into typed outputs, MCP integration, durable execution, and validation-driven retries.&lt;/p&gt;

&lt;p&gt;That combination makes PydanticAI attractive for teams that want agent workflows to feel more like real Python systems and less like prompt spaghetti.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the token spend starts to leak
&lt;/h2&gt;

&lt;p&gt;The part that matters economically is not whether the framework is good. PydanticAI is good.&lt;/p&gt;

&lt;p&gt;The problem is that good structure does not automatically mean cheap execution.&lt;/p&gt;

&lt;p&gt;In practice, cost starts leaking in a few predictable places:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated system instructions across multiple runs&lt;/li&gt;
&lt;li&gt;the same output schema getting sent over and over&lt;/li&gt;
&lt;li&gt;validation failures triggering retries&lt;/li&gt;
&lt;li&gt;tools being selected or called in multiple rounds&lt;/li&gt;
&lt;li&gt;expensive models getting used for easy intermediate steps&lt;/li&gt;
&lt;li&gt;long workflows carrying too much repeated context forward&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PydanticAI’s strengths can actually make this more visible.&lt;/p&gt;

&lt;p&gt;If you use typed outputs, the model may need another pass when validation fails.&lt;br&gt;
If you use tools, there can be multiple model turns around those tools.&lt;br&gt;
If you use graphs or longer agent flows, repeated context starts compounding.&lt;br&gt;
If you keep one premium model as the default for everything, simple steps inherit premium-model pricing for no good reason.&lt;/p&gt;

&lt;p&gt;None of that is a PydanticAI flaw. It is just what happens when a framework makes it easier to build richer agent workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Lynkr fits
&lt;/h2&gt;

&lt;p&gt;The right way to understand Lynkr here is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;PydanticAI stays the application layer&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lynkr becomes the gateway layer underneath it&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means your Python agent logic does not need to become a mess of provider-specific conditionals just to get better economics.&lt;/p&gt;

&lt;p&gt;You keep using PydanticAI for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agent structure&lt;/li&gt;
&lt;li&gt;typed outputs&lt;/li&gt;
&lt;li&gt;tools&lt;/li&gt;
&lt;li&gt;graphs&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;application logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And you use Lynkr for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model routing&lt;/li&gt;
&lt;li&gt;prompt caching&lt;/li&gt;
&lt;li&gt;provider switching&lt;/li&gt;
&lt;li&gt;centralized cost control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That separation matters because most teams do not want to rebuild their agent code every time they want to try a cheaper provider, add routing, or move one class of requests off an expensive model.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Route easy turns to cheaper models
&lt;/h2&gt;

&lt;p&gt;One of the easiest ways to overspend in agent systems is to treat every turn like frontier reasoning.&lt;/p&gt;

&lt;p&gt;A lot of PydanticAI work is not actually frontier reasoning.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification before the main task&lt;/li&gt;
&lt;li&gt;extraction from predictable text&lt;/li&gt;
&lt;li&gt;tool selection&lt;/li&gt;
&lt;li&gt;formatting into a structured schema&lt;/li&gt;
&lt;li&gt;intermediate planning&lt;/li&gt;
&lt;li&gt;low-risk follow-up steps after a strong first pass&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those steps often do not need the best model in your stack.&lt;/p&gt;

&lt;p&gt;Lynkr helps by putting routing under the agent, so easier turns can go to cheaper models while harder turns still escalate when they need to.&lt;/p&gt;

&lt;p&gt;That is a much better cost shape than paying premium-model rates for every structured substep just because the app has one default model configured.&lt;/p&gt;
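
&lt;p&gt;As a rough illustration of that idea, here is a minimal sketch of tier selection by step type. The step names, tier labels, and the &lt;code&gt;pick_tier&lt;/code&gt; helper are all hypothetical. This is not Lynkr’s routing logic, only the shape of the decision a routing layer makes.&lt;/p&gt;

```python
# Hypothetical mapping from workflow step type to model tier.
# These names are illustrative assumptions, not Lynkr configuration.
STEP_TIERS = {
    "classification": "cheap",
    "extraction": "cheap",
    "tool_selection": "cheap",
    "formatting": "cheap",
    "intermediate_planning": "mid",
    "final_synthesis": "premium",
}

TIER_ORDER = ["cheap", "mid", "premium"]


def pick_tier(step, validation_failures=0):
    """Return a tier for a step, escalating one level after repeated failures."""
    # Unknown steps default to the premium tier so quality is not silently lost.
    tier = STEP_TIERS.get(step, "premium")
    if validation_failures >= 2:
        # Escalate one tier, capped at the top of the ladder.
        next_index = min(TIER_ORDER.index(tier) + 1, len(TIER_ORDER) - 1)
        tier = TIER_ORDER[next_index]
    return tier


print(pick_tier("classification"))
print(pick_tier("classification", validation_failures=2))
```

&lt;p&gt;The escalation rule is the part worth keeping from this sketch: cheap by default, with a path upward when a step keeps failing validation.&lt;/p&gt;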

&lt;h2&gt;
  
  
  2. Stop paying repeatedly for the same context
&lt;/h2&gt;

&lt;p&gt;This is the biggest recurring waste pattern in real agent systems.&lt;/p&gt;

&lt;p&gt;A PydanticAI workflow often reuses a lot of stable prompt material:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;system instructions&lt;/li&gt;
&lt;li&gt;output schemas&lt;/li&gt;
&lt;li&gt;tool descriptions&lt;/li&gt;
&lt;li&gt;dependency-derived context&lt;/li&gt;
&lt;li&gt;conversation framing that barely changes between turns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that prompt material is sent again and again, the system keeps paying for mostly the same input.&lt;/p&gt;

&lt;p&gt;This is where Lynkr’s caching layer matters.&lt;/p&gt;

&lt;p&gt;Instead of treating every call as fully fresh, the gateway can cut down repeated prompt spend underneath the workflow. That matters more as the workflow gets longer, as the schema gets larger, or as the tool surface grows.&lt;/p&gt;

&lt;p&gt;For small toy demos, this does not matter much.&lt;br&gt;
For real agent workloads, it matters a lot.&lt;/p&gt;
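
&lt;p&gt;A back-of-the-envelope sketch shows why the effect grows with workflow length. The token counts and the &lt;code&gt;cached_percent&lt;/code&gt; parameter are made-up inputs, and the function is a simplification for intuition, not a model of how Lynkr or any provider bills cached tokens.&lt;/p&gt;

```python
def billed_input_tokens(stable_prefix, new_per_turn, turns, cached_percent=0):
    """Estimate billed input tokens across a multi-turn workflow.

    cached_percent is the share of the stable prefix assumed not to be
    billed on turns after the first. Real caches usually discount rather
    than waive those tokens, so treat this as a simplification.
    """
    first_turn = stable_prefix + new_per_turn
    later_prefix = stable_prefix * (100 - cached_percent) // 100
    return first_turn + (turns - 1) * (later_prefix + new_per_turn)


# Illustrative numbers: 3,000 stable tokens, 500 new tokens per turn, 6 turns.
uncached = billed_input_tokens(3000, 500, 6)
cached = billed_input_tokens(3000, 500, 6, cached_percent=90)
print(uncached, cached)
```

&lt;p&gt;With these invented numbers the stable prefix dominates the bill, which is why a larger schema or tool surface makes caching matter more.&lt;/p&gt;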

&lt;h2&gt;
  
  
  3. Keep the app stable while changing the economics
&lt;/h2&gt;

&lt;p&gt;One reason teams tolerate waste for too long is that optimizing the stack usually means rewriting too much application code.&lt;/p&gt;

&lt;p&gt;PydanticAI already gives you a clean framework for the agent logic. The useful part of Lynkr is that it lets you change the economics without ripping that logic apart.&lt;/p&gt;

&lt;p&gt;That gives you room to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compare providers more easily&lt;/li&gt;
&lt;li&gt;reduce lock-in&lt;/li&gt;
&lt;li&gt;shift easy steps to cheaper models&lt;/li&gt;
&lt;li&gt;keep premium models for the parts that actually need them&lt;/li&gt;
&lt;li&gt;centralize model behavior across multiple agent workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the win is not just lower cost. It is lower cost &lt;strong&gt;without turning your Python codebase into provider-routing glue&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: structured extraction plus tools
&lt;/h2&gt;

&lt;p&gt;A simple example makes the fit clearer.&lt;/p&gt;

&lt;p&gt;Say you have a PydanticAI workflow that does this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;user submits messy unstructured text&lt;/li&gt;
&lt;li&gt;agent extracts typed fields into a schema&lt;/li&gt;
&lt;li&gt;validation fails on one field and triggers a retry&lt;/li&gt;
&lt;li&gt;agent calls a tool to enrich one part of the result&lt;/li&gt;
&lt;li&gt;final typed response is returned to the app&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is a perfectly reasonable workflow.&lt;/p&gt;

&lt;p&gt;It is also exactly the kind of flow where hidden waste appears:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the schema is repeated&lt;/li&gt;
&lt;li&gt;instructions are repeated&lt;/li&gt;
&lt;li&gt;the retry adds another paid turn&lt;/li&gt;
&lt;li&gt;the tool step adds more model interaction&lt;/li&gt;
&lt;li&gt;the same premium model may be used for all five stages&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Under Lynkr, that workflow can be made cheaper in the places that usually do not need the strongest model every time.&lt;/p&gt;

&lt;p&gt;The extraction/classification layer can be routed down.&lt;br&gt;
Repeated prompt material can be cached.&lt;br&gt;
The harder step can still route up if needed.&lt;/p&gt;

&lt;p&gt;That is the real value: not changing what the workflow does, but changing how expensively it gets there.&lt;/p&gt;
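
&lt;p&gt;To put rough numbers on that workflow, the sketch below compares running every model turn on a premium tier against routing the easier turns down. The prices, token counts, and tier assignments are hypothetical placeholders chosen only to show the arithmetic.&lt;/p&gt;

```python
# Hypothetical input prices in dollars per million tokens; real prices vary.
PRICE_PER_MILLION = {"premium": 10, "cheap": 1}

# (model turn, input tokens, tier when routed). All numbers are illustrative.
TURNS = [
    ("extract", 4000, "cheap"),
    ("validation_retry", 4200, "cheap"),
    ("tool_enrichment", 4500, "cheap"),
    ("final_response", 5000, "premium"),
]


def cost_microdollars(turns, force_tier=None):
    """Sum input cost in millionths of a dollar (tokens * dollars per million)."""
    total = 0
    for _name, tokens, tier in turns:
        total += tokens * PRICE_PER_MILLION[force_tier or tier]
    return total


all_premium = cost_microdollars(TURNS, force_tier="premium")
routed = cost_microdollars(TURNS)
print(all_premium / 1_000_000, routed / 1_000_000)
```

&lt;p&gt;The workflow is identical in both cases. Only the tier assigned to each turn changes.&lt;/p&gt;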

&lt;h2&gt;
  
  
  What the integration shape looks like
&lt;/h2&gt;

&lt;p&gt;I am intentionally keeping this part conceptual rather than reciting exact config syntax from memory.&lt;/p&gt;

&lt;p&gt;The practical setup is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PydanticAI points to the Lynkr base URL&lt;/li&gt;
&lt;li&gt;Lynkr handles provider and routing behavior underneath&lt;/li&gt;
&lt;li&gt;your agent code stays mostly the same&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the integration story that matters.&lt;/p&gt;

&lt;p&gt;The point is not “replace your framework.”&lt;br&gt;
The point is “keep your framework, improve the model layer under it.”&lt;/p&gt;
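
&lt;p&gt;At the wire level, “points to the Lynkr base URL” just means ordinary OpenAI-style HTTP requests go to the gateway instead of a provider. Here is a sketch using only the Python standard library. The port, the &lt;code&gt;/v1&lt;/code&gt; path, the model name, and the placeholder key are assumptions to check against your own Lynkr deployment. The request is built but not sent.&lt;/p&gt;

```python
import json
import urllib.request

# Assumptions: Lynkr is running locally on port 8081 and serves an
# OpenAI-style /v1/chat/completions route. Verify both for your setup.
LYNKR_BASE_URL = "http://localhost:8081/v1"


def build_chat_request(model, user_text, api_key="placeholder-key"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        LYNKR_BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_chat_request("example-model", "Extract the invoice total.")
print(request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

&lt;p&gt;In a PydanticAI app, the same base URL would go into whatever OpenAI-compatible provider setting your installed version exposes, which is why the agent code itself barely changes.&lt;/p&gt;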

&lt;h2&gt;
  
  
  Where Lynkr does not replace framework-level discipline
&lt;/h2&gt;

&lt;p&gt;This part matters because it is where a lot of gateway writing becomes dishonest.&lt;/p&gt;

&lt;p&gt;Lynkr can cut model cost and make provider switching easier, but it does not fix a badly designed agent workflow.&lt;/p&gt;

&lt;p&gt;If a PydanticAI app is looping too much, retrying too aggressively, or making unnecessary tool calls, those problems still exist. The gateway can reduce the price of those mistakes. It does not remove them.&lt;/p&gt;

&lt;p&gt;What Lynkr helps with is the economics and control layer around the workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route simpler steps to cheaper models&lt;/li&gt;
&lt;li&gt;keep expensive models for the calls that actually need them&lt;/li&gt;
&lt;li&gt;cache repeated work&lt;/li&gt;
&lt;li&gt;avoid getting locked to one provider&lt;/li&gt;
&lt;li&gt;standardize how requests move across providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What it does &lt;strong&gt;not&lt;/strong&gt; do on its own:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;redesign weak prompts&lt;/li&gt;
&lt;li&gt;stop bad retry logic&lt;/li&gt;
&lt;li&gt;fix overly chatty agent graphs&lt;/li&gt;
&lt;li&gt;choose the right tool boundaries for your app&lt;/li&gt;
&lt;li&gt;replace evaluation and tracing discipline&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That matters because a lot of agent cost does not come from one expensive call. It comes from repeated mediocre decisions across a workflow.&lt;/p&gt;

&lt;p&gt;PydanticAI is useful because it gives structure to the application layer. Lynkr is useful because it gives control to the model-routing layer. They solve different problems, and they work better together than separately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should care
&lt;/h2&gt;

&lt;p&gt;PydanticAI + Lynkr is a strong fit if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you are running a meaningful number of agent calls&lt;/li&gt;
&lt;li&gt;you want structured workflows in Python&lt;/li&gt;
&lt;li&gt;you care about typed outputs and tool use&lt;/li&gt;
&lt;li&gt;your workflows retry or branch often enough for costs to become visible&lt;/li&gt;
&lt;li&gt;you want provider flexibility without constantly changing application code&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing thought
&lt;/h2&gt;

&lt;p&gt;PydanticAI solves the structure problem well. Lynkr helps solve the economics problem underneath it.&lt;/p&gt;

&lt;p&gt;If you are building typed Python agents and starting to notice that retries, tools, and repeated context are quietly inflating cost, this is a very practical combination to test.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are already using PydanticAI, I’d be curious where the spend is showing up first in your workflow.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Run CrewAI With 50% Lower LLM Cost Using Lynkr</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Sun, 07 Jun 2026 19:24:01 +0000</pubDate>
      <link>https://dev.to/lynkr/run-crewai-with-50-lower-llm-cost-using-lynkr-4ajh</link>
      <guid>https://dev.to/lynkr/run-crewai-with-50-lower-llm-cost-using-lynkr-4ajh</guid>
      <description>&lt;p&gt;If you are building multi-agent systems in Python, &lt;code&gt;CrewAI&lt;/code&gt; is one of the most widely used frameworks to know.&lt;/p&gt;

&lt;p&gt;And if your CrewAI workloads are starting to get expensive, the simplest way to control that spend is to put an LLM gateway in front of them instead of wiring every agent directly to one provider.&lt;/p&gt;

&lt;p&gt;In this article, I’ll explain what CrewAI is, why it got so popular, and how to use it with &lt;strong&gt;Lynkr&lt;/strong&gt; so your agents can run with better model routing, caching, and lower cost.&lt;/p&gt;

&lt;p&gt;I built Lynkr, so that part comes with the obvious founder disclosure. Still, CrewAI is worth understanding on its own because it has become one of the main entry points for people building agent systems in Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is CrewAI?
&lt;/h2&gt;

&lt;p&gt;CrewAI is an open-source Python framework for orchestrating multiple AI agents.&lt;/p&gt;

&lt;p&gt;At the time of writing, the GitHub repo has &lt;strong&gt;53k stars&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The project describes itself as a:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Fast and Flexible Multi-Agent Automation Framework&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Its core idea is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define agents with roles and goals&lt;/li&gt;
&lt;li&gt;define tasks&lt;/li&gt;
&lt;li&gt;decide how they collaborate&lt;/li&gt;
&lt;li&gt;run them as a system instead of a single prompt chain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the mental model behind the name &lt;code&gt;CrewAI&lt;/code&gt;: not one agent, but a &lt;strong&gt;crew&lt;/strong&gt; of specialized agents working together.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why CrewAI matters
&lt;/h2&gt;

&lt;p&gt;A lot of agent demos are still just one prompt plus one tool call.&lt;/p&gt;

&lt;p&gt;CrewAI matters because it pushes people toward more structured systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;researcher agent&lt;/li&gt;
&lt;li&gt;writer agent&lt;/li&gt;
&lt;li&gt;reviewer agent&lt;/li&gt;
&lt;li&gt;planner agent&lt;/li&gt;
&lt;li&gt;execution agent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each one can have a different role, context, and tool setup.&lt;/p&gt;

&lt;p&gt;That makes it useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;research pipelines&lt;/li&gt;
&lt;li&gt;content workflows&lt;/li&gt;
&lt;li&gt;internal business automation&lt;/li&gt;
&lt;li&gt;data gathering + summarization flows&lt;/li&gt;
&lt;li&gt;agent handoff patterns&lt;/li&gt;
&lt;li&gt;more production-style orchestration than “just call the model again”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reason it got traction is that it sits in a nice middle ground:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;higher-level than wiring every agent loop yourself&lt;/li&gt;
&lt;li&gt;more concrete than vague "agent platform" marketing&lt;/li&gt;
&lt;li&gt;easy enough for Python developers to start with quickly&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The two big concepts in CrewAI: Crews and Flows
&lt;/h2&gt;

&lt;p&gt;From the current repo README, CrewAI emphasizes two core concepts.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Crews
&lt;/h3&gt;

&lt;p&gt;Crews are teams of agents collaborating with autonomy.&lt;/p&gt;

&lt;p&gt;This is the “multi-agent” part most people think of first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;specialized roles&lt;/li&gt;
&lt;li&gt;role-based collaboration&lt;/li&gt;
&lt;li&gt;delegation&lt;/li&gt;
&lt;li&gt;agents working together toward a result&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Flows
&lt;/h3&gt;

&lt;p&gt;Flows are the more controlled, event-driven side.&lt;/p&gt;

&lt;p&gt;This is where CrewAI becomes more production-friendly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;execution paths&lt;/li&gt;
&lt;li&gt;state management&lt;/li&gt;
&lt;li&gt;conditional logic&lt;/li&gt;
&lt;li&gt;integration with normal Python code&lt;/li&gt;
&lt;li&gt;more deterministic orchestration when you need it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is a big part of the pitch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Crews&lt;/strong&gt; for agent autonomy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flows&lt;/strong&gt; for production control&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why CrewAI gets expensive fast
&lt;/h2&gt;

&lt;p&gt;This part usually becomes obvious after the first real project.&lt;/p&gt;

&lt;p&gt;A single-agent script is one thing.&lt;/p&gt;

&lt;p&gt;A multi-agent system is different.&lt;/p&gt;

&lt;p&gt;Costs grow because you now have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple agents making separate LLM calls&lt;/li&gt;
&lt;li&gt;handoffs between agents&lt;/li&gt;
&lt;li&gt;intermediate summaries&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;reflection/replanning&lt;/li&gt;
&lt;li&gt;tool use across several steps&lt;/li&gt;
&lt;li&gt;repeated context being passed around the system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the problem is not just “what model am I using?”&lt;/p&gt;

&lt;p&gt;It becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;do all agents need the same expensive model?&lt;/li&gt;
&lt;li&gt;should the planner use the same model as the formatter?&lt;/li&gt;
&lt;li&gt;how much repeated context is being resent?&lt;/li&gt;
&lt;li&gt;can simple routing/classification work go to cheaper models?&lt;/li&gt;
&lt;li&gt;can repeated flows benefit from cache hits?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of workload where a gateway layer starts making sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Lynkr fits
&lt;/h2&gt;

&lt;p&gt;If CrewAI is the orchestration layer, Lynkr can sit underneath it as the &lt;strong&gt;LLM gateway&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means your architecture becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CrewAI agents / flows
        ↓
      Lynkr
        ↓
Ollama / OpenRouter / Bedrock / OpenAI / Azure / Databricks / others
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of wiring each agent stack directly to one provider, you point your model traffic at one gateway endpoint and let that layer decide what happens next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use Lynkr with CrewAI?
&lt;/h2&gt;

&lt;p&gt;This is the important part.&lt;/p&gt;

&lt;p&gt;The real benefit is &lt;strong&gt;not&lt;/strong&gt; just “use any provider.”&lt;/p&gt;

&lt;p&gt;That is table stakes now.&lt;/p&gt;

&lt;p&gt;The better reason is that Lynkr gives you three strong levers for agent workloads:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prompt caching
&lt;/h3&gt;

&lt;p&gt;Multi-agent systems resend a lot of context.&lt;/p&gt;

&lt;p&gt;That can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;system prompts&lt;/li&gt;
&lt;li&gt;task descriptions&lt;/li&gt;
&lt;li&gt;agent roles and backstories&lt;/li&gt;
&lt;li&gt;previous step context&lt;/li&gt;
&lt;li&gt;the same instructions reused across repeated runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lynkr’s caching layer helps reduce the amount of repeated input you pay for.&lt;/p&gt;

&lt;p&gt;For agent systems, that matters a lot more than it does in one-off chat prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tier routing
&lt;/h3&gt;

&lt;p&gt;Not every step in a CrewAI workflow deserves your strongest model.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;p&gt;Use a cheaper/faster model for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;classification&lt;/li&gt;
&lt;li&gt;routing&lt;/li&gt;
&lt;li&gt;formatting&lt;/li&gt;
&lt;li&gt;deterministic transformation&lt;/li&gt;
&lt;li&gt;simple extraction&lt;/li&gt;
&lt;li&gt;narrow sub-tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a stronger model for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;planning&lt;/li&gt;
&lt;li&gt;reasoning-heavy synthesis&lt;/li&gt;
&lt;li&gt;ambiguous task decomposition&lt;/li&gt;
&lt;li&gt;final high-stakes output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly what tier routing is for.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. One stable model endpoint
&lt;/h3&gt;

&lt;p&gt;Once your agents grow from a prototype into a system, you usually want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one model boundary&lt;/li&gt;
&lt;li&gt;one place to switch providers&lt;/li&gt;
&lt;li&gt;one place to add failover&lt;/li&gt;
&lt;li&gt;one place to add policy and cost control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is what a gateway layer gives you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lynkr says it does well today
&lt;/h2&gt;

&lt;p&gt;From the current Lynkr README, the main cost/performance claims are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;53% fewer tokens on tool-heavy requests&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;87.6% compression on large JSON tool results&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;171ms semantic cache hits&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;automatic tier routing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;zero code changes at the client boundary once the endpoint is swapped&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those numbers come from coding-tool workloads, not from a published CrewAI benchmark.&lt;/p&gt;

&lt;p&gt;So the honest framing is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I am &lt;strong&gt;not&lt;/strong&gt; claiming a public CrewAI benchmark showing exactly 50% lower cost on every workload&lt;/li&gt;
&lt;li&gt;I &lt;strong&gt;am&lt;/strong&gt; saying CrewAI has the exact kind of multi-step agent workload where these levers matter most&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why “50% lower cost” is a reasonable headline for this category of workload, but the actual result will depend on how your CrewAI system is built.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to get started with CrewAI
&lt;/h2&gt;

&lt;p&gt;From the current CrewAI README, installation starts like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv pip &lt;span class="nb"&gt;install &lt;/span&gt;crewai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you also want the tools extras:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'crewai[tools]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The project also provides a CLI starter for creating a new crew project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;crewai create crew &amp;lt;project_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That scaffolds a project with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;main.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;crew.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agents.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tasks.yaml&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.env&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So CrewAI is designed to be used as a real project structure, not just a single script.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple mental model for CrewAI code
&lt;/h2&gt;

&lt;p&gt;A simple way to think about CrewAI is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;define &lt;strong&gt;who&lt;/strong&gt; each agent is&lt;/li&gt;
&lt;li&gt;define &lt;strong&gt;what&lt;/strong&gt; each task needs done&lt;/li&gt;
&lt;li&gt;define &lt;strong&gt;how&lt;/strong&gt; work moves between agents&lt;/li&gt;
&lt;li&gt;then execute the whole workflow as one coordinated system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the real shift from a normal single-agent app.&lt;/p&gt;

&lt;p&gt;You are not just prompting one model repeatedly.&lt;br&gt;
You are designing a small working system with roles, handoffs, and outputs.&lt;/p&gt;

&lt;p&gt;A minimal conceptual example looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;

&lt;span class="n"&gt;researcher&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find the best information on a topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are great at gathering relevant details&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Writer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Turn research into a clear output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;backstory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You write concise, structured summaries&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;research_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the latest browser agent frameworks&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;write_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short technical summary from the research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;writer&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;researcher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;research_task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;write_task&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is not copied from their exact starter file, but it reflects the basic CrewAI model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;roles&lt;/li&gt;
&lt;li&gt;tasks&lt;/li&gt;
&lt;li&gt;orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to use CrewAI with Lynkr
&lt;/h2&gt;

&lt;p&gt;The practical pattern is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;install CrewAI&lt;/li&gt;
&lt;li&gt;install and start Lynkr&lt;/li&gt;
&lt;li&gt;point the model calls used by your CrewAI stack at Lynkr instead of directly at one provider&lt;/li&gt;
&lt;li&gt;let Lynkr handle routing/caching/provider flexibility underneath&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  1. Install Lynkr
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Configure Lynkr
&lt;/h2&gt;

&lt;p&gt;A simple cloud-backed setup from the current Lynkr README looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter
&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-key
&lt;span class="nv"&gt;FALLBACK_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
&lt;/span&gt;&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8081
&lt;span class="nv"&gt;PROMPT_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;SEMANTIC_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start Lynkr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want local-first testing, Lynkr also supports local backends like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;li&gt;llama.cpp&lt;/li&gt;
&lt;li&gt;LM Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is useful for CrewAI because some low-value steps can run cheaply or locally, while harder reasoning tasks can still escalate.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Route CrewAI’s model traffic through Lynkr
&lt;/h2&gt;

&lt;p&gt;The exact code depends on which model client you use with CrewAI.&lt;/p&gt;

&lt;p&gt;The architecture is the important part:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CrewAI model client → Lynkr base URL → actual provider(s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because Lynkr gives you an OpenAI-compatible gateway surface, the integration is most natural when your CrewAI model configuration can target an OpenAI-style endpoint.&lt;/p&gt;

&lt;p&gt;That lets you keep CrewAI as the orchestration layer while Lynkr becomes the control plane for model choice and cost behavior.&lt;/p&gt;
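&lt;p&gt;As a minimal sketch of that wiring, assuming you use CrewAI’s &lt;code&gt;LLM&lt;/code&gt; class and that Lynkr is listening on port 8081 as in the &lt;code&gt;.env&lt;/code&gt; above: the &lt;code&gt;/v1&lt;/code&gt; path, the &lt;code&gt;LYNKR_BASE_URL&lt;/code&gt; and &lt;code&gt;LYNKR_API_KEY&lt;/code&gt; variable names, and the model name are placeholders assumed for illustration, so check the Lynkr README for the exact endpoint.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os

# Assumed default: port 8081 matches the .env above; the /v1 path is a guess
# at the usual OpenAI-style prefix and may differ in your Lynkr version.
DEFAULT_LYNKR_BASE_URL = "http://localhost:8081/v1"


def lynkr_llm_kwargs(model):
    """Return keyword arguments for an OpenAI-style client aimed at Lynkr."""
    return {
        # The "openai/" prefix is the provider hint commonly used with CrewAI
        # for OpenAI-compatible endpoints; adjust if your version differs.
        "model": "openai/" + model,
        "base_url": os.environ.get("LYNKR_BASE_URL", DEFAULT_LYNKR_BASE_URL),
        # Placeholder key; the real provider key stays in Lynkr's own .env.
        "api_key": os.environ.get("LYNKR_API_KEY", "placeholder-key"),
    }


def build_crewai_llm(model):
    """Create a CrewAI LLM that sends its traffic through the gateway."""
    # Imported here so the helper above works without CrewAI installed.
    from crewai import LLM

    return LLM(**lynkr_llm_kwargs(model))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You would then pass &lt;code&gt;llm=build_crewai_llm("your-model-name")&lt;/code&gt; to each &lt;code&gt;Agent&lt;/code&gt;, so the crew definition itself does not change.&lt;/p&gt;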

&lt;h2&gt;
  
  
  A better way to think about model assignment in CrewAI
&lt;/h2&gt;

&lt;p&gt;Here is where most teams leave money on the table.&lt;/p&gt;

&lt;p&gt;They do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;planner agent → expensive model&lt;/li&gt;
&lt;li&gt;researcher agent → same expensive model&lt;/li&gt;
&lt;li&gt;formatter agent → same expensive model&lt;/li&gt;
&lt;li&gt;reviewer agent → same expensive model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is easy, but wasteful.&lt;/p&gt;

&lt;p&gt;A better shape is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;planner → strong reasoning model&lt;/li&gt;
&lt;li&gt;researcher → medium model&lt;/li&gt;
&lt;li&gt;summarizer → medium or cheap model&lt;/li&gt;
&lt;li&gt;formatter → cheap model&lt;/li&gt;
&lt;li&gt;repeated workflows → cached through gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The point is not that every step should be cheap.&lt;/p&gt;

&lt;p&gt;The point is that &lt;strong&gt;different agent roles have different model requirements&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;CrewAI already encourages role specialization.&lt;/p&gt;

&lt;p&gt;Lynkr makes it easier to pair that with &lt;strong&gt;cost specialization&lt;/strong&gt;.&lt;/p&gt;
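&lt;p&gt;One way to picture the role list above is a small lookup table. This is only a sketch: the tier names and model identifiers are hypothetical, and the summarizer is placed on the medium tier here even though a cheap model may be enough for it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical tier names and model identifiers; substitute whatever your
# gateway or provider actually exposes.
TIER_MODELS = {
    "strong": "strong-reasoning-model",
    "medium": "medium-model",
    "cheap": "cheap-model",
}

# Role-to-tier assignment mirroring the list above.
ROLE_TIERS = {
    "planner": "strong",
    "researcher": "medium",
    "summarizer": "medium",
    "formatter": "cheap",
}


def model_for_role(role, default_tier="medium"):
    """Pick a model name for an agent role, falling back to a middle tier."""
    tier = ROLE_TIERS.get(role, default_tier)
    return TIER_MODELS[tier]


for role in ("planner", "researcher", "summarizer", "formatter"):
    print(role, "uses", model_for_role(role))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;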

&lt;h2&gt;
  
  
  A concrete example
&lt;/h2&gt;

&lt;p&gt;Imagine a CrewAI workflow for market research.&lt;/p&gt;

&lt;p&gt;You have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one agent gathering raw sources&lt;/li&gt;
&lt;li&gt;one agent extracting facts&lt;/li&gt;
&lt;li&gt;one agent writing the report&lt;/li&gt;
&lt;li&gt;one agent reviewing for quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a gateway, teams often default to one premium model for all four.&lt;/p&gt;

&lt;p&gt;With Lynkr underneath, the better pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gather/extract → cheaper tier&lt;/li&gt;
&lt;li&gt;writing → medium tier&lt;/li&gt;
&lt;li&gt;review/final reasoning → stronger tier&lt;/li&gt;
&lt;li&gt;repeated report skeleton/context → cache where possible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is a much more rational cost shape.&lt;/p&gt;
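&lt;p&gt;To make the cost shape concrete, here is a back-of-the-envelope sketch. Every number in it is made up for illustration (prices in cents per million tokens, token counts per run); none of them are measurements or real provider prices.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Made-up numbers for illustration only: prices are in cents per million
# tokens and token counts are per run. None of these are measurements.
PRICE_CENTS_PER_MILLION = {"cheap": 20, "medium": 100, "strong": 1000}

TOKENS_PER_STEP = {
    "gather": 400_000,
    "extract": 300_000,
    "write": 200_000,
    "review": 100_000,
}

# The tiered pattern from the list above.
TIERED_PLAN = {
    "gather": "cheap",
    "extract": "cheap",
    "write": "medium",
    "review": "strong",
}


def run_cost_cents(plan):
    """Sum the cost of one run given a step-to-tier plan."""
    total = 0
    for step, tokens in TOKENS_PER_STEP.items():
        total += tokens * PRICE_CENTS_PER_MILLION[plan[step]]
    return total / 1_000_000


all_strong = {step: "strong" for step in TOKENS_PER_STEP}
print("all strong:", run_cost_cents(all_strong), "cents")
print("tiered:", run_cost_cents(TIERED_PLAN), "cents")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Swap in your own prices and token counts; the comparison only means something with real figures from your workload.&lt;/p&gt;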

&lt;h2&gt;
  
  
  Why this matters more for CrewAI than for normal apps
&lt;/h2&gt;

&lt;p&gt;A normal app may only hit the LLM a few times.&lt;/p&gt;

&lt;p&gt;A CrewAI system can multiply the number of LLM calls because the framework is designed around multiple agents and structured orchestration.&lt;/p&gt;

&lt;p&gt;So the value of a gateway grows with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;number of agents&lt;/li&gt;
&lt;li&gt;number of task handoffs&lt;/li&gt;
&lt;li&gt;amount of repeated context&lt;/li&gt;
&lt;li&gt;number of production runs&lt;/li&gt;
&lt;li&gt;number of providers you want to evaluate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why CrewAI is such a good fit for the “put a gateway underneath it” pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lynkr does not replace
&lt;/h2&gt;

&lt;p&gt;Important distinction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CrewAI is still the orchestration framework&lt;/li&gt;
&lt;li&gt;Lynkr is still the LLM gateway&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lynkr does &lt;strong&gt;not&lt;/strong&gt; replace CrewAI’s agent/task/flow model.&lt;/p&gt;

&lt;p&gt;It complements it by making the model layer cheaper and more flexible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest tradeoffs
&lt;/h2&gt;

&lt;p&gt;It is worth being direct here.&lt;/p&gt;

&lt;p&gt;A gateway adds another infrastructure layer.&lt;/p&gt;

&lt;p&gt;That is worth it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you have multiple agents&lt;/li&gt;
&lt;li&gt;you care about spend&lt;/li&gt;
&lt;li&gt;you want provider flexibility&lt;/li&gt;
&lt;li&gt;you are moving toward production usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It may not be worth it when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you are just learning CrewAI&lt;/li&gt;
&lt;li&gt;you are running a toy example once&lt;/li&gt;
&lt;li&gt;simplicity matters more than control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I would not tell every beginner to add a gateway on day one.&lt;/p&gt;

&lt;p&gt;But once your CrewAI project becomes real, the gateway question shows up quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;CrewAI is one of the most important open-source frameworks in the multi-agent Python ecosystem right now.&lt;/p&gt;

&lt;p&gt;It gives you a useful structure for building agent systems with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;roles&lt;/li&gt;
&lt;li&gt;tasks&lt;/li&gt;
&lt;li&gt;crews&lt;/li&gt;
&lt;li&gt;flows&lt;/li&gt;
&lt;li&gt;production-style orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And if those systems are getting expensive, Lynkr is a practical way to put a cost-and-routing layer underneath them.&lt;/p&gt;

&lt;p&gt;That gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one stable model endpoint&lt;/li&gt;
&lt;li&gt;provider flexibility&lt;/li&gt;
&lt;li&gt;caching for repeated context&lt;/li&gt;
&lt;li&gt;tier routing for different agent roles&lt;/li&gt;
&lt;li&gt;a better chance of keeping multi-agent systems affordable as they scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to try the stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CrewAI: &lt;code&gt;https://github.com/crewAIInc/crewAI&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr: &lt;code&gt;https://github.com/Fast-Editor/Lynkr&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are already running CrewAI in production, I think the right question is not:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“What is the best model?”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;“Which parts of my agent system actually deserve the expensive model?”&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>What Is browser-use? And How to save 50% of tokens while using it.</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Sun, 07 Jun 2026 07:31:07 +0000</pubDate>
      <link>https://dev.to/lynkr/what-is-browser-use-and-how-to-run-it-through-lynkr-9fn</link>
      <guid>https://dev.to/lynkr/what-is-browser-use-and-how-to-run-it-through-lynkr-9fn</guid>
      <description>&lt;p&gt;If you are building AI agents that can actually &lt;em&gt;do things on websites&lt;/em&gt;, &lt;code&gt;browser-use&lt;/code&gt; is one of the most important open-source projects to understand right now.&lt;/p&gt;

&lt;p&gt;And if you want to use it without being locked into a single model path, &lt;code&gt;Lynkr&lt;/code&gt; is a clean way to put a gateway between your browser agent and whichever LLMs you want behind it.&lt;/p&gt;

&lt;p&gt;I built Lynkr, so take the integration section with that disclosure in mind. Still, &lt;code&gt;browser-use&lt;/code&gt; is genuinely one of the most interesting repos in the agent stack right now, and it is worth understanding on its own.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is browser-use?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;browser-use&lt;/code&gt; is an open-source framework for giving LLM agents access to a real browser.&lt;/p&gt;

&lt;p&gt;In plain English:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it opens a browser&lt;/li&gt;
&lt;li&gt;lets an agent inspect the current page state&lt;/li&gt;
&lt;li&gt;click buttons&lt;/li&gt;
&lt;li&gt;type into inputs&lt;/li&gt;
&lt;li&gt;extract information&lt;/li&gt;
&lt;li&gt;navigate across sites&lt;/li&gt;
&lt;li&gt;and complete real browser workflows from a prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project’s GitHub description is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Make websites accessible for AI agents. Automate tasks online with ease.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At the time of writing, the repo has &lt;strong&gt;97.5k stars&lt;/strong&gt;, which tells you this is not some niche experiment anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why browser-use blew up
&lt;/h2&gt;

&lt;p&gt;A lot of “AI agents” stop at text generation.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;browser-use&lt;/code&gt; matters because it pushes into the next step: &lt;strong&gt;agents that can interact with software the same way a user does&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means you can build workflows like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;filling out forms&lt;/li&gt;
&lt;li&gt;pulling data out of dashboards&lt;/li&gt;
&lt;li&gt;logging into tools and clicking through UI flows&lt;/li&gt;
&lt;li&gt;checking prices, calendars, tickets, or inventory&lt;/li&gt;
&lt;li&gt;testing internal tools&lt;/li&gt;
&lt;li&gt;handling repetitive browser tasks that don’t have a clean API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the real appeal: many businesses do not need another chatbot. They need automation for systems that only really exist behind a browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  What browser-use gives you
&lt;/h2&gt;

&lt;p&gt;From the repo and quickstart, the project gives you a few things that make it practical:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an open-source Python agent framework&lt;/li&gt;
&lt;li&gt;a browser abstraction the agent can control&lt;/li&gt;
&lt;li&gt;examples for common browser tasks&lt;/li&gt;
&lt;li&gt;a CLI for persistent browser automation&lt;/li&gt;
&lt;li&gt;optional cloud/browser infrastructure from the Browser Use team&lt;/li&gt;
&lt;li&gt;support for multiple LLM backends in its quickstart examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Their human quickstart shows the core pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;browser_use&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatBrowserUse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find the number of stars of the browser-use repo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatBrowserUse&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important concept is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Browser()&lt;/code&gt; handles the browser session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Agent(...)&lt;/code&gt; handles the goal and step-by-step decisions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;llm=...&lt;/code&gt; controls which model layer is making those decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last part is exactly where Lynkr becomes useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Lynkr fits
&lt;/h2&gt;

&lt;p&gt;If &lt;code&gt;browser-use&lt;/code&gt; is the browser-side execution layer, Lynkr can sit under it as the &lt;strong&gt;LLM gateway&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That gives you one stable endpoint between your browser agent and the actual providers behind it.&lt;/p&gt;

&lt;p&gt;Instead of hard-wiring one provider path everywhere, you can put this in the middle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;browser-use agent
      ↓
    Lynkr
      ↓
Ollama / OpenRouter / Bedrock / OpenAI / Azure / Databricks / others
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That matters because browser agents are usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multi-step&lt;/li&gt;
&lt;li&gt;tool-heavy&lt;/li&gt;
&lt;li&gt;iterative&lt;/li&gt;
&lt;li&gt;expensive when they retry or explore a page&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And those are exactly the workloads where routing and token optimization matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why use Lynkr with browser-use?
&lt;/h2&gt;

&lt;p&gt;The basic answer is: &lt;strong&gt;browser agents create lots of LLM calls, and Lynkr helps you control that cost while staying flexible on providers&lt;/strong&gt;.&lt;br&gt;
Lynkr also offers tiered routing, which can help you cut token usage by 50-60%.&lt;/p&gt;

&lt;p&gt;From the current Lynkr README, the relevant levers are listed below. All of these values are relative to LiteLLM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;53% fewer tokens on tool-heavy requests&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;87.6% compression on large JSON/tool outputs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;171ms semantic cache hits&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;automatic tier routing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;zero code changes at the client boundary once the endpoint is swapped&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though those numbers come from coding-tool workloads, the shape maps well to browser agents too:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;page-state dumps can get large&lt;/li&gt;
&lt;li&gt;repeated task loops can benefit from cache hits&lt;/li&gt;
&lt;li&gt;simple browser steps do not always need your most expensive model&lt;/li&gt;
&lt;li&gt;hard navigation/reasoning steps can be escalated to a stronger model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the win is not just “use another model.”&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;one gateway endpoint&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;provider flexibility&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;routing cheap vs expensive work differently&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;lower spend on repetitive agent loops&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  When this combination makes sense
&lt;/h2&gt;

&lt;p&gt;Using &lt;code&gt;browser-use&lt;/code&gt; with Lynkr makes the most sense if you are doing any of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;running browser agents repeatedly in production&lt;/li&gt;
&lt;li&gt;experimenting with multiple providers for reliability or cost&lt;/li&gt;
&lt;li&gt;mixing local and cloud models&lt;/li&gt;
&lt;li&gt;trying to avoid hard vendor lock-in&lt;/li&gt;
&lt;li&gt;building internal automations where cost per workflow matters&lt;/li&gt;
&lt;li&gt;wanting one OpenAI-compatible gateway for several agent systems, not just browser-use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are just trying one script once, direct provider setup is fine.&lt;/p&gt;

&lt;p&gt;If you are building a real browser-agent workflow that you will run over and over, putting a gateway in front of it starts to make more sense.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to use browser-use
&lt;/h2&gt;

&lt;p&gt;The project’s quickstart uses &lt;code&gt;uv&lt;/code&gt; and Python 3.11+.&lt;/p&gt;
&lt;h2&gt;
  
  
  1. Install browser-use
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv init
uv add browser-use
uv &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;If Chromium is not already installed, their repo also mentions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uvx browser-use &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Create a simple browser-use script
&lt;/h2&gt;

&lt;p&gt;Start with a minimal example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;browser_use&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatBrowserUse&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Browser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Open GitHub and find the number of stars on the browser-use repository&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ChatBrowserUse&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This verifies that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python is set up correctly&lt;/li&gt;
&lt;li&gt;the browser launches&lt;/li&gt;
&lt;li&gt;the agent can take a goal and act on it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gets you the baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Install Lynkr
&lt;/h2&gt;

&lt;p&gt;Now add the gateway layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; lynkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Start Lynkr with a provider behind it
&lt;/h2&gt;

&lt;p&gt;For a simple cloud setup, the current Lynkr README shows OpenRouter like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openrouter
&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-key
&lt;span class="nv"&gt;FALLBACK_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
&lt;/span&gt;&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8081
&lt;span class="nv"&gt;PROMPT_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;SEMANTIC_CACHE_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start Lynkr:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lynkr start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a free/local path, Lynkr also supports local providers like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama&lt;/li&gt;
&lt;li&gt;llama.cpp&lt;/li&gt;
&lt;li&gt;LM Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means you can test browser agents locally first, then move harder tasks to cloud models later.&lt;/p&gt;
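&lt;p&gt;Before wiring in the agent, it can help to confirm the gateway answers at all. The sketch below builds a plain OpenAI-style chat request using only the standard library. It assumes Lynkr serves a chat completions route under &lt;code&gt;/v1&lt;/code&gt; on the port from the &lt;code&gt;.env&lt;/code&gt; above; that path and the model name are assumptions to verify against the README.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

# Assumptions: Lynkr listens on the PORT from the .env above and serves an
# OpenAI-style chat completions route under /v1. Verify both in the README.
LYNKR_CHAT_URL = "http://localhost:8081/v1/chat/completions"


def build_chat_request(prompt, model="your-model-name", url=LYNKR_CHAT_URL):
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,  # placeholder; use a model your provider offers
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("Reply with the single word: ready")
print(request.get_method(), request.full_url)
# With Lynkr running, uncomment the next line to make the call:
# print(send(request))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;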

&lt;h2&gt;
  
  
  5. Point browser-use at Lynkr
&lt;/h2&gt;

&lt;p&gt;This is the part that depends on which LLM wrapper you use inside &lt;code&gt;browser-use&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The repo’s README shows examples like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ChatBrowserUse()&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ChatGoogle(...)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ChatAnthropic(...)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The general pattern is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if your selected browser-use model wrapper supports a custom base URL / OpenAI-compatible endpoint, point it at Lynkr&lt;/li&gt;
&lt;li&gt;Lynkr then forwards the request to the actual backend provider you configured&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The integration idea is the same as any other app using a gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;browser-use LLM client → Lynkr base URL → chosen providers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because Lynkr exposes an OpenAI-compatible surface and already routes traffic for clients like Claude Code, Cursor, Codex, Cline, and Continue, the practical fit is strongest when your browser-use stack can talk to an OpenAI-style endpoint.&lt;/p&gt;
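&lt;p&gt;Here is a hedged sketch of what that can look like. It assumes your installed browser-use version exports a &lt;code&gt;ChatOpenAI&lt;/code&gt; wrapper that accepts &lt;code&gt;base_url&lt;/code&gt; and &lt;code&gt;api_key&lt;/code&gt;; the &lt;code&gt;/v1&lt;/code&gt; path, the environment variable names, and the model name are placeholders, not values taken from either README.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os


def lynkr_client_kwargs(model):
    """Keyword arguments for an OpenAI-style chat wrapper aimed at Lynkr."""
    return {
        "model": model,  # placeholder name
        # Port 8081 matches the .env above; the /v1 path is an assumption.
        "base_url": os.environ.get("LYNKR_BASE_URL", "http://localhost:8081/v1"),
        # Placeholder key; provider credentials stay in Lynkr's own .env.
        "api_key": os.environ.get("LYNKR_API_KEY", "placeholder-key"),
    }


async def main():
    # Imported lazily so the helper above works without browser-use installed.
    # Assumes your browser-use version exports ChatOpenAI with a base_url field.
    from browser_use import Agent, Browser, ChatOpenAI

    agent = Agent(
        task="Open GitHub and find the number of stars on the browser-use repository",
        llm=ChatOpenAI(**lynkr_client_kwargs("your-model-name")),
        browser=Browser(),
    )
    print(await agent.run())

# To run it: import asyncio, then call asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The only change from the baseline script is the &lt;code&gt;llm=...&lt;/code&gt; argument; the task and browser setup stay the same.&lt;/p&gt;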

&lt;h2&gt;
  
  
  A practical architecture to think about
&lt;/h2&gt;

&lt;p&gt;If you are building a serious browser automation system, this is the architecture I would use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your app / worker
      ↓
 browser-use
      ↓
   Lynkr
      ↓
Simple tasks → cheap/local model
Hard tasks   → stronger cloud model
Retries      → cached/routed through same gateway
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a few operational wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one place to change providers&lt;/li&gt;
&lt;li&gt;one place to add caching/routing&lt;/li&gt;
&lt;li&gt;one place to enforce model policy&lt;/li&gt;
&lt;li&gt;one place to swap local/cloud behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What kinds of browser-use tasks benefit most?
&lt;/h2&gt;

&lt;p&gt;The biggest benefit is not “every browser step becomes cheap.”&lt;/p&gt;

&lt;p&gt;The biggest benefit is that &lt;strong&gt;not every step deserves the same model&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;h3&gt;
  
  
  Good candidates for cheaper tiers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;page classification&lt;/li&gt;
&lt;li&gt;checking whether an element exists&lt;/li&gt;
&lt;li&gt;extracting a small piece of text&lt;/li&gt;
&lt;li&gt;moving through obvious deterministic UI steps&lt;/li&gt;
&lt;li&gt;repeated workflows you run every day&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Good candidates for stronger models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;ambiguous navigation&lt;/li&gt;
&lt;li&gt;dense multi-step forms&lt;/li&gt;
&lt;li&gt;recovery after unexpected UI changes&lt;/li&gt;
&lt;li&gt;reasoning-heavy extraction tasks&lt;/li&gt;
&lt;li&gt;flows with messy instructions from users&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is exactly why a gateway helps. Browser agents are not one homogeneous workload.&lt;/p&gt;
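&lt;p&gt;If you wanted to express the two lists above as an application-side hint, a rough sketch might look like this. The step names, tier labels, and the escalate-after-failure rule are all illustrative assumptions; this is not how Lynkr decides routing internally.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Step kinds taken from the two lists above; the labels are illustrative.
CHEAP_STEPS = {
    "page_classification",
    "element_exists_check",
    "small_text_extraction",
    "deterministic_navigation",
}
STRONG_STEPS = {
    "ambiguous_navigation",
    "dense_form",
    "ui_change_recovery",
    "reasoning_heavy_extraction",
}


def tier_for_step(step_kind, failed_attempts=0):
    """Choose a tier for a browser step, escalating after any failure."""
    if step_kind in STRONG_STEPS or failed_attempts:
        return "strong"
    if step_kind in CHEAP_STEPS:
        return "cheap"
    # Unknown step kinds default to the stronger tier to stay on the safe side.
    return "strong"


print(tier_for_step("page_classification"))
print(tier_for_step("page_classification", failed_attempts=1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;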

&lt;h2&gt;
  
  
  A realistic example
&lt;/h2&gt;

&lt;p&gt;Say you are automating a support workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;log into admin panel&lt;/li&gt;
&lt;li&gt;search user account&lt;/li&gt;
&lt;li&gt;open billing page&lt;/li&gt;
&lt;li&gt;check subscription state&lt;/li&gt;
&lt;li&gt;update a field&lt;/li&gt;
&lt;li&gt;confirm success&lt;/li&gt;
&lt;li&gt;export some result back to your app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a gateway, every step may go to the same expensive provider.&lt;/p&gt;

&lt;p&gt;With Lynkr in the middle, you can move toward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cheap model for straightforward navigation&lt;/li&gt;
&lt;li&gt;stronger model when the page layout becomes ambiguous&lt;/li&gt;
&lt;li&gt;cache/reuse repeated context patterns&lt;/li&gt;
&lt;li&gt;preserve one integration point in your app&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a much better shape as soon as workflows become frequent.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lynkr does &lt;em&gt;not&lt;/em&gt; replace here
&lt;/h2&gt;

&lt;p&gt;Important distinction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;browser-use&lt;/code&gt; is still the browser automation layer&lt;/li&gt;
&lt;li&gt;Lynkr is still the LLM gateway layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lynkr does &lt;strong&gt;not&lt;/strong&gt; replace the actual browser agent runtime.&lt;/p&gt;

&lt;p&gt;It sits underneath it and makes the model side more flexible.&lt;/p&gt;

&lt;p&gt;That is why this pairing is interesting: they are complementary, not redundant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest tradeoffs
&lt;/h2&gt;

&lt;p&gt;Since I built Lynkr, it is worth stating the tradeoffs plainly.&lt;/p&gt;

&lt;p&gt;Using a gateway adds another layer to operate.&lt;/p&gt;

&lt;p&gt;That is worth it when you care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;provider control&lt;/li&gt;
&lt;li&gt;cost routing&lt;/li&gt;
&lt;li&gt;caching&lt;/li&gt;
&lt;li&gt;consistent integration across multiple tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is &lt;em&gt;not&lt;/em&gt; automatically worth it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one-off experiments&lt;/li&gt;
&lt;li&gt;tiny local scripts you run once a week&lt;/li&gt;
&lt;li&gt;very early prototypes where simplicity matters more than control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the right mental model is not “everyone needs a gateway.”&lt;/p&gt;

&lt;p&gt;It is “browser agents become more infrastructure-like very quickly, and gateway control starts paying off once that happens.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Why browser-use is worth learning even if you do not use Lynkr
&lt;/h2&gt;

&lt;p&gt;Even without the Lynkr angle, &lt;code&gt;browser-use&lt;/code&gt; matters because it represents a bigger shift:&lt;/p&gt;

&lt;p&gt;we are moving from LLMs that answer questions to LLM systems that can operate software.&lt;/p&gt;

&lt;p&gt;That changes the shape of automation.&lt;/p&gt;

&lt;p&gt;The future stack is not just:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prompt in&lt;/li&gt;
&lt;li&gt;text out&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is increasingly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;goal in&lt;/li&gt;
&lt;li&gt;browser actions&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;retries&lt;/li&gt;
&lt;li&gt;extraction&lt;/li&gt;
&lt;li&gt;completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And &lt;code&gt;browser-use&lt;/code&gt; is one of the clearest open-source projects showing that shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;If you want to understand modern browser agents, start with &lt;code&gt;browser-use&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you want to run those agents with more control over cost, routing, and provider choice, put &lt;code&gt;Lynkr&lt;/code&gt; underneath them as the LLM gateway.&lt;/p&gt;

&lt;p&gt;That combination gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;browser automation on top&lt;/li&gt;
&lt;li&gt;provider flexibility underneath&lt;/li&gt;
&lt;li&gt;one stable endpoint for your model layer&lt;/li&gt;
&lt;li&gt;a cleaner path to scaling beyond a single hard-wired provider&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to try it, start here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;browser-use: &lt;code&gt;https://github.com/browser-use/browser-use&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr: &lt;code&gt;https://github.com/Fast-Editor/Lynkr&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re already using browser-use, I’d be curious about one thing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;would you rather optimize for the strongest possible model on every step, or route browser-agent work by difficulty and cost?&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>tutorial</category>
      <category>devops</category>
    </item>
    <item>
      <title>I Benchmarked Lynkr Against LiteLLM on the Same Backends.</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Sat, 06 Jun 2026 00:14:18 +0000</pubDate>
      <link>https://dev.to/lynkr/i-benchmarked-lynkr-against-litellm-on-the-same-backends-lynkr-was-cheaper-for-tool-heavy-workloads-2onf</link>
      <guid>https://dev.to/lynkr/i-benchmarked-lynkr-against-litellm-on-the-same-backends-lynkr-was-cheaper-for-tool-heavy-workloads-2onf</guid>
      <description>&lt;h2&gt;
  
  
  I Benchmarked Lynkr Against LiteLLM on the Same Backends. Lynkr Was Cheaper for Tool-Heavy Workloads
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Founder disclosure: I built Lynkr, so take this as a technical benchmark write-up, not a neutral industry report. The numbers below come from the same backend providers on both gateways.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you're routing AI coding traffic through a gateway, just switching providers is not enough. The real savings come from reducing the tokens that ever reach the model in the first place.&lt;/p&gt;

&lt;p&gt;I ran Lynkr and LiteLLM against the same backends — Ollama locally, Moonshot, and Azure OpenAI — across 9 scenarios. On the scenarios that actually look like agentic coding work, Lynkr was cheaper because it does three things before forwarding the request upstream: smart tool selection, TOON compression, and semantic caching.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;Lynkr was measurably better on the cost-sensitive parts of the workload:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smart tool selection:&lt;/strong&gt; 53% fewer input tokens, 52% lower cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOON JSON compression:&lt;/strong&gt; 87.6% fewer billed tokens on a large tool result, 50% lower cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic cache:&lt;/strong&gt; 171ms cache-hit response vs 3,282ms for LiteLLM on the same repeat query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier routing:&lt;/strong&gt; escalated hard prompts to stronger models instead of blindly sending everything to the cheapest route&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Lynkr result&lt;/th&gt;
&lt;th&gt;Why it mattered&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool selection&lt;/td&gt;
&lt;td&gt;53% fewer tokens&lt;/td&gt;
&lt;td&gt;Removes irrelevant tool schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TOON compression&lt;/td&gt;
&lt;td&gt;87.6% fewer tokens&lt;/td&gt;
&lt;td&gt;Shrinks large JSON tool outputs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic cache&lt;/td&gt;
&lt;td&gt;171ms cache hit&lt;/td&gt;
&lt;td&gt;Avoids repeat model calls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tier routing&lt;/td&gt;
&lt;td&gt;Escalates hard prompts&lt;/td&gt;
&lt;td&gt;Doesn’t over-optimize for cheapest path&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This matters if you're running Claude Code, Codex, Cursor, or similar agent workflows where tools, file reads, grep output, and repeated context dominate your token bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Same benchmark inputs, same providers, same request shape.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine:&lt;/strong&gt; macOS on Apple Silicon&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr:&lt;/strong&gt; v9.3.2 on Node 20&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM:&lt;/strong&gt; v1.87.1 on Python 3.12&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backends used:&lt;/strong&gt; Ollama local, Moonshot, Azure OpenAI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scenarios:&lt;/strong&gt; 9 total across simple prompts, tools, history, cache, and routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each scenario sent the same HTTP request to both gateways at &lt;code&gt;POST /v1/messages&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Lynkr wins
&lt;/h2&gt;

&lt;h2&gt;
  
  
  1) Smart tool selection
&lt;/h2&gt;

&lt;p&gt;A lot of coding requests are read-only, but the model still gets handed the full tool universe: write, edit, bash, git, file ops, everything.&lt;/p&gt;

&lt;p&gt;Lynkr classifies the request first and strips irrelevant tool schemas before forwarding upstream. So a read-only question does not pay to carry write-capable tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark setup:&lt;/strong&gt; 14 tool definitions attached to every request, which is pretty realistic for a Claude Code or Cursor style session.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr:&lt;/strong&gt; 959 billed input tokens, $0.0044&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM:&lt;/strong&gt; 2,085 billed input tokens, $0.0091&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 53% fewer input tokens and 52% lower cost on the same model and prompt.&lt;/p&gt;

&lt;p&gt;This is the kind of optimization that compounds because it happens before every downstream model call.&lt;/p&gt;
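
&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of tool pruning. It is not Lynkr's actual classifier: the tool names, the &lt;code&gt;WRITE_HINTS&lt;/code&gt; keyword list, and the &lt;code&gt;prune_tools&lt;/code&gt; helper are all hypothetical, and a real gateway would classify requests far more carefully than a keyword match.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Hypothetical tool names and keyword list, for illustration only.
READ_ONLY_TOOLS = {"read_file", "grep", "list_dir"}
WRITE_HINTS = {"write", "edit", "delete", "commit", "run", "fix", "refactor"}

def is_read_only(prompt):
    """Crude stand-in for a classifier: read-only unless a write hint appears."""
    words = set(re.findall(r"[a-z_]+", prompt.lower()))
    return words.isdisjoint(WRITE_HINTS)

def prune_tools(prompt, tools):
    """Drop write-capable tool schemas when the request looks read-only."""
    if is_read_only(prompt):
        return [tool for tool in tools if tool["name"] in READ_ONLY_TOOLS]
    return list(tools)

ALL_TOOLS = [{"name": name} for name in (
    "read_file", "grep", "list_dir", "write_file", "edit_file", "bash", "git_commit",
)]
kept = prune_tools("What does the parse_config function return?", ALL_TOOLS)
print([tool["name"] for tool in kept])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this toy version a question about what a function returns keeps only the three read-only tools, while a prompt containing a word like "edit" keeps the full set. The saving comes from the schemas that are never serialized into the upstream request.&lt;/p&gt;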

&lt;h2&gt;
  
  
  2) TOON compression for tool results
&lt;/h2&gt;

&lt;p&gt;Tool-heavy workflows often blow up because of structured JSON, not because the user wrote a long prompt.&lt;/p&gt;

&lt;p&gt;Lynkr's TOON path compresses large JSON payloads before they hit the provider. Plain text goes through unchanged. The useful effect is that file reads, grep arrays, tool traces, and other structured outputs stop dominating the request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benchmark setup:&lt;/strong&gt; a Bash tool returning 60 grep results as a JSON array, roughly 3,400 tokens unoptimized.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr:&lt;/strong&gt; 427 billed input tokens, $0.009, 12s latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM:&lt;/strong&gt; 3,458 billed input tokens, $0.018, 12s latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; 87.6% token reduction and 50% lower cost at the same latency.&lt;/p&gt;

&lt;p&gt;That last part matters. This was not a tradeoff where cost improved because the request got slower. Compression happened in-process and the wall-clock result stayed flat.&lt;/p&gt;
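
&lt;p&gt;The intuition behind the saving is that a JSON array of similar objects repeats every key on every item. The sketch below shows a simplified tabular encoding loosely modeled on TOON's form for uniform arrays. It is an assumption-laden illustration, not Lynkr's encoder: &lt;code&gt;to_tabular&lt;/code&gt; is a hypothetical helper, the grep results are synthetic, and it ignores quoting, nesting, and values that contain commas.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

def to_tabular(name, rows):
    """Encode a uniform list of flat dicts as one header line plus one row per item."""
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    body = ["  " + ",".join(str(row[field]) for field in fields) for row in rows]
    return "\n".join([header] + body)

# Synthetic stand-in for 60 grep results returned by a tool call.
grep_hits = [
    {"file": f"src/module_{i}.py", "line": 10 + i, "text": "TODO"}
    for i in range(60)
]

as_json = json.dumps(grep_hits)
as_table = to_tabular("hits", grep_hits)

# Character counts are only a rough proxy for billed tokens.
print(len(as_json), len(as_table))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The field names appear once in the header instead of once per result, which is where most of the reduction comes from. Actual token savings depend on the tokenizer and on how uniform the payload is.&lt;/p&gt;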

&lt;h2&gt;
  
  
  3) Semantic cache
&lt;/h2&gt;

&lt;p&gt;The easiest cheap request is the one that never reaches the model.&lt;/p&gt;

&lt;p&gt;Lynkr computes embeddings for the incoming prompt and returns a cached response when a semantically similar request shows up again. In the benchmark, the second prompt was just a paraphrase of the first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Explain TCP vs UDP"&lt;/li&gt;
&lt;li&gt;"What is the difference between TCP and UDP?"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cold run vs cache hit
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr cold:&lt;/strong&gt; 2,857 tokens, 1,891ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr cache hit:&lt;/strong&gt; served from cache in 171ms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM repeat path:&lt;/strong&gt; 54 tokens, 3,282ms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important part is not just token avoidance. The response time dropped from 1.9s to 171ms, about &lt;strong&gt;11x faster&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For interactive tooling, that difference is felt immediately.&lt;/p&gt;
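
&lt;p&gt;Here is a small sketch of the lookup logic, under heavy assumptions. The &lt;code&gt;embed&lt;/code&gt; function is a toy bag-of-words vector standing in for a real embedding model, &lt;code&gt;SemanticCache&lt;/code&gt; is a hypothetical class, and the 0.3 threshold is only that low because the toy vectors are so crude. It shows the shape of the technique, not Lynkr's implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real gateway would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[key] * b[key] for key in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold):
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

    def get(self, prompt):
        """Return the best cached response if it is similar enough, else None."""
        vector = embed(prompt)
        scored = [(cosine(vector, cached), response) for cached, response in self.entries]
        if not scored:
            return None
        score, response = max(scored, key=lambda pair: pair[0])
        return response if score &gt;= self.threshold else None

cache = SemanticCache(threshold=0.3)
cache.put("Explain TCP vs UDP", "cached answer")
print(cache.get("What is the difference between TCP and UDP?"))
print(cache.get("How do I rebase a git branch?"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these toy vectors the paraphrase scores about 0.35 against the stored prompt and is served from the cache, while the unrelated git question shares no words and misses. A production threshold would be tuned against real embeddings, since a loose threshold risks returning an answer to a different question.&lt;/p&gt;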

&lt;h2&gt;
  
  
  4) Tier routing that looks at complexity, not just price
&lt;/h2&gt;

&lt;p&gt;LiteLLM has routing, but in this benchmark configuration it was using &lt;code&gt;cost-based-routing&lt;/code&gt;, which means the gateway picks the cheapest available route first.&lt;/p&gt;

&lt;p&gt;That works for simple questions. It breaks when the prompt genuinely needs a stronger model.&lt;/p&gt;

&lt;p&gt;Lynkr scores requests across 15 dimensions — token size, reasoning markers, code complexity, risk signals, and agentic traits — then routes automatically.&lt;/p&gt;

&lt;p&gt;In the benchmark:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple prompt:&lt;/strong&gt; "What does git stash do?"

&lt;ul&gt;
&lt;li&gt;Lynkr routed to &lt;code&gt;minimax-m2.5&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;LiteLLM routed to local Ollama&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex prompt:&lt;/strong&gt; JWT vs cookies security analysis for a banking architecture

&lt;ul&gt;
&lt;li&gt;Lynkr escalated to &lt;code&gt;moonshot-v1-auto&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;LiteLLM still sent it to local Ollama&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between "cheap by default" and "cheap when appropriate."&lt;/p&gt;
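
&lt;p&gt;A rough sketch of complexity-based routing, assuming a much smaller signal set than the 15 dimensions mentioned above. The &lt;code&gt;SIGNALS&lt;/code&gt; keyword groups, the 200-word cutoff, and the two-hit escalation rule are all hypothetical. Only the two model names come from the benchmark run.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical signal groups; the real scorer is not shown in this post.
SIGNALS = {
    "reasoning": ("why", "trade-off", "compare", "analysis", "analyze"),
    "risk": ("security", "banking", "production", "compliance"),
    "code": ("refactor", "architecture", "concurrency", "migration"),
}
CHEAP_MODEL = "minimax-m2.5"
STRONG_MODEL = "moonshot-v1-auto"

def complexity_score(prompt):
    """Count how many signal groups fire, plus one point for a very long prompt."""
    text = prompt.lower()
    hits = sum(1 for words in SIGNALS.values() if any(word in text for word in words))
    long_prompt = 1 if len(text.split()) &gt;= 200 else 0
    return hits + long_prompt

def route(prompt):
    """Escalate only when at least two independent signals agree."""
    return STRONG_MODEL if complexity_score(prompt) &gt;= 2 else CHEAP_MODEL

print(route("What does git stash do?"))
print(route("JWT vs cookies security analysis for a banking architecture"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Requiring two signals before escalating is one way to keep a single keyword from pushing a trivial question onto the expensive tier. A price-only router has no equivalent of &lt;code&gt;complexity_score&lt;/code&gt;, so both prompts land on the cheapest route.&lt;/p&gt;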

&lt;h2&gt;
  
  
  Why this benchmark matters more than a generic proxy comparison
&lt;/h2&gt;

&lt;p&gt;A lot of gateway comparisons collapse into "who can talk to more providers." That is table stakes now.&lt;/p&gt;

&lt;p&gt;The more important question is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What does the gateway do to reduce spend before the request hits the model?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is where Lynkr is different in practice.&lt;/p&gt;

&lt;p&gt;It stacks three cost levers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tool pruning&lt;/strong&gt; so irrelevant tool schemas do not ride along&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOON compression&lt;/strong&gt; so large structured tool output stops inflating prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic cache&lt;/strong&gt; so repeated or near-repeated requests do not call the model again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Then it adds &lt;strong&gt;tier routing&lt;/strong&gt; on top, so the remaining requests go to the right model for the job.&lt;/p&gt;

&lt;p&gt;That stack is why the benchmark result is interesting. It is not just "Lynkr can route too." It is that Lynkr changes the size and shape of the request before routing even happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost projection at 100,000 requests/month
&lt;/h2&gt;

&lt;p&gt;Using the large JSON tool-result test as a representative tool-heavy scenario:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LiteLLM:&lt;/strong&gt; about &lt;strong&gt;$818/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lynkr:&lt;/strong&gt; about &lt;strong&gt;$409/month&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So on equal footing, same backend, same model class, Lynkr came out roughly &lt;strong&gt;50% cheaper&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is the distinction I'd care about if I were evaluating an LLM gateway for coding agents. Not whether the gateway has another provider adapter, but whether it reduces the number of tokens my provider ever sees.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about Portkey?
&lt;/h2&gt;

&lt;p&gt;Portkey is good at a different layer of the stack.&lt;/p&gt;

&lt;p&gt;It is stronger on managed observability, prompt management, and governance. But this benchmark was not measuring dashboarding or policy UX. It was measuring request-path optimization.&lt;/p&gt;

&lt;p&gt;On that axis, Lynkr is doing something Portkey does not really center on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;automatic complexity detection&lt;/li&gt;
&lt;li&gt;semantic caching&lt;/li&gt;
&lt;li&gt;token compression&lt;/li&gt;
&lt;li&gt;drop-in routing for coding-tool workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I would not frame this as "Portkey but cheaper." They solve different primary problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Important caveats
&lt;/h2&gt;

&lt;p&gt;To keep this honest, there are a few things worth stating clearly.&lt;/p&gt;

&lt;h3&gt;
  
  
  1) This is not a neutral benchmark
&lt;/h3&gt;

&lt;p&gt;I built Lynkr. So the burden is on me to be explicit about methodology and where the numbers come from.&lt;/p&gt;

&lt;h3&gt;
  
  
  2) LiteLLM can look cheaper in headline totals
&lt;/h3&gt;

&lt;p&gt;If LiteLLM routes everything to a free local model, the raw total can look lower. But that is not the useful comparison.&lt;/p&gt;

&lt;p&gt;The fair comparison is &lt;strong&gt;same backend, same prompt, same model class&lt;/strong&gt;. On those apples-to-apples paths, Lynkr was cheaper because it sent fewer tokens upstream.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Lynkr adds system-level context
&lt;/h3&gt;

&lt;p&gt;In this benchmark, Lynkr injected a system prompt with memory and agent instructions, which added about 2,800 tokens of overhead in some scenarios. That is why comparing estimated raw request size to billed tokens can be misleading.&lt;/p&gt;

&lt;p&gt;The correct comparison is billed tokens between Lynkr and LiteLLM on the same scenario.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;Lynkr is for teams running things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Claude Code&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cursor&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hermes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;custom agents using an OpenAI-compatible endpoint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your real problem is reducing spend on coding workflows without rewriting client-side integrations, the benchmark result is pretty simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lynkr wins when the workload includes tools, structured outputs, repeated prompts, and mixed-complexity requests.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is exactly what real coding-agent traffic looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reproducibility
&lt;/h2&gt;

&lt;p&gt;To reproduce the benchmark, run the script from the Lynkr repo root:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node benchmark-tier-routing.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Versions used in this run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lynkr v9.3.2&lt;/li&gt;
&lt;li&gt;LiteLLM v1.87.1&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final takeaway
&lt;/h2&gt;

&lt;p&gt;If all you want is a gateway that forwards requests, Lynkr is not interesting.&lt;/p&gt;

&lt;p&gt;If you want a gateway that makes coding traffic cheaper &lt;strong&gt;before&lt;/strong&gt; it reaches the model, that is where Lynkr starts to stand apart.&lt;/p&gt;

&lt;p&gt;The three levers that mattered in this benchmark were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool selection&lt;/li&gt;
&lt;li&gt;TOON compression&lt;/li&gt;
&lt;li&gt;semantic cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And on top of that, tier routing kept the hard prompts from being sent to the wrong model just because it was cheaper.&lt;/p&gt;

&lt;p&gt;If you want to dig into it, the repo is here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you test it against your own coding workload, I would genuinely like to know where it holds up and where it doesn't.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How a Gateway Layer Could Reduce LLM Costs in TradingAgents</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Tue, 02 Jun 2026 23:02:53 +0000</pubDate>
      <link>https://dev.to/lynkr/how-tradingagents-82k-could-slash-70-of-llm-costs-with-a-gateway-layer-40p5</link>
      <guid>https://dev.to/lynkr/how-tradingagents-82k-could-slash-70-of-llm-costs-with-a-gateway-layer-40p5</guid>
      <description>&lt;p&gt;Multi-agent AI systems are impressive, but they can also become expensive fast.&lt;/p&gt;

&lt;p&gt;That’s especially true for projects like &lt;strong&gt;TradingAgents&lt;/strong&gt;, where multiple agents may gather information, summarize findings, compare signals, and synthesize outputs before arriving at a final result.&lt;/p&gt;

&lt;p&gt;The instinctive way to build systems like this is simple: use one strong model for everything.&lt;/p&gt;

&lt;p&gt;It works — but it’s often wasteful.&lt;/p&gt;

&lt;p&gt;That’s where a gateway layer starts to matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real problem isn’t model cost — it’s overprovisioning
&lt;/h2&gt;

&lt;p&gt;When people talk about LLM cost in agent systems, they often focus on the price of the “main” model.&lt;/p&gt;

&lt;p&gt;But in practice, the bigger issue is usually &lt;strong&gt;overprovisioning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A multi-agent system often sends many different kinds of tasks through the same premium model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;intermediate summaries&lt;/li&gt;
&lt;li&gt;lightweight transformations&lt;/li&gt;
&lt;li&gt;retrieval-adjacent reasoning&lt;/li&gt;
&lt;li&gt;orchestration steps&lt;/li&gt;
&lt;li&gt;final synthesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those tasks don’t all need the same level of capability.&lt;/p&gt;

&lt;p&gt;And once every step uses the most expensive model in the stack, costs rise much faster than they need to.&lt;/p&gt;

&lt;p&gt;That’s not a criticism of TradingAgents specifically. It’s a common pattern in multi-agent design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why TradingAgents is a good example
&lt;/h2&gt;

&lt;p&gt;TradingAgents is exactly the kind of system where this matters.&lt;/p&gt;

&lt;p&gt;A workflow like this usually contains several layers of work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;collecting or interpreting market information&lt;/li&gt;
&lt;li&gt;comparing different signals or perspectives&lt;/li&gt;
&lt;li&gt;generating intermediate summaries&lt;/li&gt;
&lt;li&gt;combining outputs into a final view&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of those steps are relatively lightweight.&lt;br&gt;&lt;br&gt;
Some are more reasoning-heavy.&lt;br&gt;&lt;br&gt;
Some likely matter more for output quality than others.&lt;/p&gt;

&lt;p&gt;That creates a natural opportunity: &lt;strong&gt;not every step has to run on the same model tier&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a gateway layer changes
&lt;/h2&gt;

&lt;p&gt;A gateway layer sits between the application and the underlying model providers.&lt;/p&gt;

&lt;p&gt;Its job is not to “make the model better.”&lt;br&gt;&lt;br&gt;
Its job is to give the system more control over &lt;strong&gt;where different requests go&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In a setup like TradingAgents, that could mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lightweight summarization goes to a cheaper model&lt;/li&gt;
&lt;li&gt;intermediate analysis goes to a balanced mid-tier model&lt;/li&gt;
&lt;li&gt;final synthesis or high-stakes reasoning goes to a stronger premium model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s the key idea.&lt;/p&gt;

&lt;p&gt;The savings do not come from magic.&lt;br&gt;&lt;br&gt;
They come from &lt;strong&gt;routing tasks based on complexity instead of defaulting everything to the same expensive backend&lt;/strong&gt;.&lt;/p&gt;
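
&lt;p&gt;As a rough illustration, the routing idea can be written as a small policy table. Everything in this sketch is made up for the example: the task types, the model names, the per-call prices, and the call counts do not come from TradingAgents or from any measured run.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# All names and numbers below are invented for illustration.
ROUTING_POLICY = {
    "summarize": "cheap-model",
    "analyze": "mid-tier-model",
    "final_synthesis": "premium-model",
}
PRICE_PER_CALL = {"cheap-model": 0.001, "mid-tier-model": 0.005, "premium-model": 0.02}
CALLS_PER_RUN = {"summarize": 12, "analyze": 6, "final_synthesis": 2}

def pick_model(task_type):
    """Unknown task types fall back to the premium tier to protect quality."""
    return ROUTING_POLICY.get(task_type, "premium-model")

def run_cost(choose_model):
    """Total cost of one workflow run under a given model-selection function."""
    return sum(
        PRICE_PER_CALL[choose_model(task)] * count
        for task, count in CALLS_PER_RUN.items()
    )

baseline = run_cost(lambda task: "premium-model")  # everything on the premium tier
routed = run_cost(pick_model)                      # tiered by task type
print(f"all-premium: ${baseline:.3f}  routed: ${routed:.3f}  saved: {1 - routed / baseline:.1%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these invented inputs the run costs $0.40 on a single premium model and $0.082 when routed. The specific figure means nothing on its own. What the sketch shows is that the saving is driven by how many calls fall into the cheap and mid-tier classes.&lt;/p&gt;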

&lt;h2&gt;
  
  
  Where cost savings might actually come from
&lt;/h2&gt;

&lt;p&gt;The interesting thing about systems like TradingAgents is that a lot of model usage may happen before the “final” answer is even produced.&lt;/p&gt;

&lt;p&gt;If multiple agents are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reading inputs&lt;/li&gt;
&lt;li&gt;generating their own interpretations&lt;/li&gt;
&lt;li&gt;refining intermediate outputs&lt;/li&gt;
&lt;li&gt;exchanging context&lt;/li&gt;
&lt;li&gt;contributing to a final synthesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;then the system can accumulate a large number of calls very quickly.&lt;/p&gt;

&lt;p&gt;If all of those calls hit the same premium model, the cost profile becomes hard to justify.&lt;/p&gt;

&lt;p&gt;A gateway layer helps by letting you separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;cheap, repeatable steps&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;moderately complex reasoning&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;high-value final decision steps&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you a more rational stack.&lt;/p&gt;

&lt;p&gt;If a large share of the workflow is made up of summarization, orchestration, and intermediate transformations, then routing those steps to cheaper models could produce substantial savings.&lt;/p&gt;

&lt;p&gt;The exact percentage depends on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how many agents are involved&lt;/li&gt;
&lt;li&gt;how often they call models&lt;/li&gt;
&lt;li&gt;prompt sizes&lt;/li&gt;
&lt;li&gt;context sizes&lt;/li&gt;
&lt;li&gt;whether outputs are recursive or chained&lt;/li&gt;
&lt;li&gt;which steps truly need premium reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real insight is:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;multi-agent systems create natural routing opportunities, and those opportunities often go unused.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where a gateway layer like &lt;strong&gt;Lynkr&lt;/strong&gt; becomes relevant.&lt;/p&gt;

&lt;p&gt;Lynkr is useful in this kind of stack because it can make the model layer more flexible without forcing the application to be rewritten around one provider.&lt;/p&gt;

&lt;p&gt;That means systems like TradingAgents can potentially:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;route cheaper tasks to lower-cost models&lt;/li&gt;
&lt;li&gt;reserve premium models for the hardest reasoning steps&lt;/li&gt;
&lt;li&gt;swap providers without changing the whole application layer&lt;/li&gt;
&lt;li&gt;mix local, cloud, or enterprise backends more cleanly&lt;/li&gt;
&lt;li&gt;introduce fallback behavior if one backend is slow or unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes the architecture more practical, not just cheaper.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger takeaway
&lt;/h2&gt;

&lt;p&gt;The point is not that TradingAgents is “too expensive” or designed incorrectly.&lt;/p&gt;

&lt;p&gt;The point is that &lt;strong&gt;multi-agent systems naturally create different classes of work&lt;/strong&gt;, and those classes should not automatically be priced the same.&lt;/p&gt;

&lt;p&gt;A gateway layer is valuable because it introduces policy into the model layer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which tasks go where&lt;/li&gt;
&lt;li&gt;which tasks deserve premium reasoning&lt;/li&gt;
&lt;li&gt;which tasks can be handled more cheaply&lt;/li&gt;
&lt;li&gt;how the system behaves when one provider fails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a much more durable idea than simply trying to find the single “best” model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;TradingAgents is a useful example because it shows how quickly multi-agent systems can compound model usage.&lt;/p&gt;

&lt;p&gt;Once multiple agents are generating intermediate work before a final result, using one expensive model for everything becomes the easy default — but not always the right one.&lt;/p&gt;

&lt;p&gt;That’s why a gateway layer matters.&lt;/p&gt;

&lt;p&gt;Not because it magically reduces costs.&lt;/p&gt;

&lt;p&gt;But because it gives systems like TradingAgents a way to stop overpaying for the parts of the workflow that don’t need premium intelligence in the first place.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
      <category>devtools</category>
    </item>
    <item>
      <title>How to Self-Host UI-TARS Desktop Without Vendor Lock-In</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Tue, 02 Jun 2026 05:27:44 +0000</pubDate>
      <link>https://dev.to/lynkr/how-to-self-host-ui-tars-desktop-without-vendor-lock-in-2pie</link>
      <guid>https://dev.to/lynkr/how-to-self-host-ui-tars-desktop-without-vendor-lock-in-2pie</guid>
      <description>&lt;p&gt;The next interesting wave of AI tools isn't just about coding assistants.&lt;/p&gt;

&lt;p&gt;It's about &lt;strong&gt;agents that can actually operate software&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's why &lt;strong&gt;UI-TARS Desktop&lt;/strong&gt; is worth paying attention to. It's an open-source multimodal desktop agent from ByteDance's broader TARS ecosystem, designed around a simple but powerful idea: let an AI agent see the interface, understand what's on screen, and interact with the computer like a user would.&lt;/p&gt;

&lt;p&gt;After looking through the GitHub repo, the positioning is pretty clear. UI-TARS Desktop is a native GUI agent with support for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local and remote computer operators&lt;/li&gt;
&lt;li&gt;browser operators&lt;/li&gt;
&lt;li&gt;screenshot-based visual understanding&lt;/li&gt;
&lt;li&gt;mouse and keyboard control&lt;/li&gt;
&lt;li&gt;cross-platform usage&lt;/li&gt;
&lt;li&gt;a broader agent stack that connects vision, GUI actions, and MCP-style tool integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That already makes it interesting.&lt;/p&gt;

&lt;p&gt;But the part that matters most for real-world use is what sits &lt;strong&gt;underneath&lt;/strong&gt; it: the model layer.&lt;/p&gt;

&lt;p&gt;And that's where &lt;strong&gt;Lynkr&lt;/strong&gt; becomes useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Desktop agents are powerful — and expensive to get wrong
&lt;/h2&gt;

&lt;p&gt;Desktop agents are a different category from coding copilots.&lt;/p&gt;

&lt;p&gt;A coding tool mostly works inside text: source files, terminals, prompts, diffs.&lt;/p&gt;

&lt;p&gt;A desktop agent has to deal with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;screenshots&lt;/li&gt;
&lt;li&gt;dynamic UI state&lt;/li&gt;
&lt;li&gt;clicking the right target&lt;/li&gt;
&lt;li&gt;retrying after failure&lt;/li&gt;
&lt;li&gt;latency between action and feedback&lt;/li&gt;
&lt;li&gt;reasoning over visual context&lt;/li&gt;
&lt;li&gt;sometimes switching between browser and desktop flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the model setup matters a lot.&lt;/p&gt;

&lt;p&gt;If the backend is too weak, the agent makes bad decisions.&lt;/p&gt;

&lt;p&gt;If it's too expensive, experimentation becomes painful.&lt;/p&gt;

&lt;p&gt;If it's tied to one provider, the whole stack becomes brittle.&lt;/p&gt;

&lt;p&gt;For teams trying to use tools like UI-TARS Desktop seriously, the bottleneck is not just "is the model smart enough?"&lt;/p&gt;

&lt;p&gt;It's also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;can we run it locally when needed?&lt;/li&gt;
&lt;li&gt;can we swap providers without rewriting the setup?&lt;/li&gt;
&lt;li&gt;can we use cheap models for lighter tasks and stronger ones for harder steps?&lt;/li&gt;
&lt;li&gt;can we fit this into enterprise infra without locking into a single vendor?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of problem Lynkr is built for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lynkr adds beneath UI-TARS Desktop
&lt;/h2&gt;

&lt;p&gt;Lynkr's core value is straightforward: it acts as a universal LLM gateway for AI tools.&lt;/p&gt;

&lt;p&gt;Instead of tying one tool to one provider, Lynkr makes it possible to route requests across different model backends while keeping the tool-facing interface stable.&lt;/p&gt;

&lt;p&gt;That matters a lot for a desktop agent stack.&lt;/p&gt;

&lt;p&gt;A UI-TARS Desktop + Lynkr setup could make it possible to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;test different providers without changing the whole workflow&lt;/li&gt;
&lt;li&gt;use local models for cheaper experimentation&lt;/li&gt;
&lt;li&gt;route more difficult reasoning steps to stronger cloud models&lt;/li&gt;
&lt;li&gt;keep enterprise traffic inside approved backends like Bedrock, Azure, or Databricks&lt;/li&gt;
&lt;li&gt;reduce provider lock-in as the desktop agent ecosystem evolves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In other words: UI-TARS Desktop gives you the &lt;strong&gt;agent interface&lt;/strong&gt;, and Lynkr gives you the &lt;strong&gt;model control plane&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That's a much better architecture than hardwiring one expensive model setup into a fast-moving agent product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters more for multimodal agents
&lt;/h2&gt;

&lt;p&gt;The more multimodal a tool gets, the more useful backend flexibility becomes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Lynkr fits under UI-TARS
&lt;/h2&gt;

&lt;p&gt;The cleanest mental model is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UI-TARS Desktop / Agent TARS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Lynkr&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Ollama, OpenRouter, Bedrock, Azure, Databricks, OpenAI, or another backend&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That gives you one stable endpoint for the agent layer while keeping the actual model choice flexible.&lt;/p&gt;

&lt;p&gt;At a high level, the goal is to point UI-TARS or Agent TARS at Lynkr instead of binding the stack directly to a single vendor.&lt;/p&gt;

&lt;p&gt;In practice, that usually means configuring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a custom model endpoint or base URL&lt;/li&gt;
&lt;li&gt;a model name that Lynkr can route internally&lt;/li&gt;
&lt;li&gt;an API key placeholder or Lynkr-managed credential path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the runtime supports an OpenAI-compatible endpoint, the setup conceptually looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8081/v1&lt;/span&gt;
&lt;span class="py"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;dummy&lt;/span&gt;
&lt;span class="py"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Lynkr can then translate and route that request to the provider you actually want to use.&lt;/p&gt;

&lt;p&gt;That setup makes it easier to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run cheaper local models during experimentation&lt;/li&gt;
&lt;li&gt;send harder multimodal tasks to stronger cloud models&lt;/li&gt;
&lt;li&gt;avoid rewriting agent config every time you change providers&lt;/li&gt;
&lt;li&gt;keep traffic inside enterprise-approved infrastructure&lt;/li&gt;
&lt;li&gt;add fallback behavior when one provider is degraded&lt;/li&gt;
&lt;/ul&gt;
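
&lt;p&gt;For a quick check from outside the agent, the same three settings can drive a plain OpenAI-style request. This is a sketch under assumptions: it reads the environment variables shown above, assumes the gateway serves the standard &lt;code&gt;/chat/completions&lt;/code&gt; path under the base URL, and uses hypothetical helper names (&lt;code&gt;build_chat_request&lt;/code&gt;, &lt;code&gt;send&lt;/code&gt;). It only builds the request unless you call &lt;code&gt;send&lt;/code&gt; yourself.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import os
import urllib.request

# Defaults mirror the example settings above.
BASE_URL = os.environ.get("OPENAI_BASE_URL", "http://localhost:8081/v1")
API_KEY = os.environ.get("OPENAI_API_KEY", "dummy")
MODEL = os.environ.get("MODEL", "gpt-4o")

def build_chat_request(prompt):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=BASE_URL.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_KEY,
        },
        method="POST",
    )

def send(request):
    """Send the request; needs a gateway actually listening at BASE_URL."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_chat_request("Describe the current screen.")
print(request.get_method(), request.full_url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The client only ever sees the base URL and a model name. Which provider answers is decided behind the gateway, so switching backends should not require touching this code.&lt;/p&gt;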

&lt;p&gt;One important caveat: the exact configuration path depends on whether UI-TARS Desktop or Agent TARS exposes a custom OpenAI-compatible endpoint directly, or only vendor-specific settings. Treat this as the intended integration pattern until you have validated the exact runtime path in a live setup.&lt;/p&gt;

&lt;p&gt;A desktop agent doesn't just answer a question. It has to perceive, decide, act, and recover.&lt;/p&gt;

&lt;p&gt;Some steps need raw speed.&lt;/p&gt;

&lt;p&gt;Some need stronger reasoning.&lt;/p&gt;

&lt;p&gt;Some may need privacy or local execution.&lt;/p&gt;

&lt;p&gt;Some may need enterprise compliance.&lt;/p&gt;

&lt;p&gt;A single-model strategy is often the wrong fit.&lt;/p&gt;

&lt;p&gt;That's why a gateway layer matters more here than it does for a simple chatbot.&lt;/p&gt;

&lt;p&gt;With a Lynkr-style routing layer, you can imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lighter steps going to cheaper or local models&lt;/li&gt;
&lt;li&gt;harder planning steps going to stronger reasoning models&lt;/li&gt;
&lt;li&gt;fallback behavior when one provider degrades&lt;/li&gt;
&lt;li&gt;fast experimentation across multiple backends as UI-TARS evolves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes desktop agents much more practical to run, not just more impressive in a demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  UI-TARS Desktop points to a bigger shift
&lt;/h2&gt;

&lt;p&gt;The most interesting thing about UI-TARS Desktop is that it represents a shift in what users expect from AI.&lt;/p&gt;

&lt;p&gt;People are moving from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"answer my question"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"operate the software for me"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a much bigger leap than most AI product copy admits.&lt;/p&gt;

&lt;p&gt;Once an agent is controlling browsers, settings panels, apps, and workflows, the underlying infrastructure starts to matter a lot more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latency matters&lt;/li&gt;
&lt;li&gt;cost matters&lt;/li&gt;
&lt;li&gt;control matters&lt;/li&gt;
&lt;li&gt;provider flexibility matters&lt;/li&gt;
&lt;li&gt;observability and fallback matter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why tools like UI-TARS Desktop and Lynkr feel complementary.&lt;/p&gt;

&lt;p&gt;One is pushing upward into &lt;strong&gt;computer use&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The other is stabilizing the messy model layer underneath.&lt;/p&gt;

&lt;p&gt;That combination is more interesting than either product in isolation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is a strong direction for Lynkr
&lt;/h2&gt;

&lt;p&gt;Lynkr already makes sense as a universal LLM gateway for coding tools.&lt;/p&gt;

&lt;p&gt;But tools like UI-TARS Desktop suggest a bigger opportunity.&lt;/p&gt;

&lt;p&gt;The next generation of AI products won't just be IDE assistants. They'll include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;desktop agents&lt;/li&gt;
&lt;li&gt;browser agents&lt;/li&gt;
&lt;li&gt;multimodal workflow tools&lt;/li&gt;
&lt;li&gt;hybrid systems that combine GUI interaction with tool use and automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those tools are going to need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model portability&lt;/li&gt;
&lt;li&gt;cost optimization&lt;/li&gt;
&lt;li&gt;fallback routing&lt;/li&gt;
&lt;li&gt;local/cloud flexibility&lt;/li&gt;
&lt;li&gt;enterprise-friendly deployment paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a very natural place for Lynkr to sit.&lt;/p&gt;

&lt;p&gt;Not as the flashy top-layer app.&lt;/p&gt;

&lt;p&gt;As the infrastructure that makes those apps more usable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;UI-TARS Desktop is interesting because it pushes AI beyond text and into direct computer interaction.&lt;/p&gt;

&lt;p&gt;Lynkr is interesting because it makes the model layer behind those interactions more portable, flexible, and cost-aware.&lt;/p&gt;

&lt;p&gt;Put them together, and the story is bigger than just "support another tool."&lt;/p&gt;

&lt;p&gt;It becomes a real argument for why &lt;strong&gt;desktop agents should not be locked to a single provider stack&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And honestly, that feels like the right direction for this whole ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;UI-TARS Desktop GitHub repo: &lt;a href="https://github.com/bytedance/UI-TARS-desktop" rel="noopener noreferrer"&gt;https://github.com/bytedance/UI-TARS-desktop&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;UI-TARS model repo: &lt;a href="https://github.com/bytedance/UI-TARS" rel="noopener noreferrer"&gt;https://github.com/bytedance/UI-TARS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agent TARS quick start: &lt;a href="https://agent-tars.com/guide/get-started/quick-start.html" rel="noopener noreferrer"&gt;https://agent-tars.com/guide/get-started/quick-start.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Agent TARS introduction/docs: &lt;a href="https://agent-tars.com/guide/get-started/introduction.html" rel="noopener noreferrer"&gt;https://agent-tars.com/guide/get-started/introduction.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;UI-TARS Desktop quick start: &lt;a href="https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md" rel="noopener noreferrer"&gt;https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/quick-start.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;UI-TARS Desktop SDK docs: &lt;a href="https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/sdk.md" rel="noopener noreferrer"&gt;https://github.com/bytedance/UI-TARS-desktop/blob/main/docs/sdk.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr GitHub repo: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Lynkr docs: &lt;a href="https://fast-editor.github.io/Lynkr/" rel="noopener noreferrer"&gt;https://fast-editor.github.io/Lynkr/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>🐍 How to Use Open Interpreter for Free — With the Latest Models</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Sun, 31 May 2026 06:54:52 +0000</pubDate>
      <link>https://dev.to/lynkr/how-to-use-open-interpreter-for-free-with-the-latest-models-3chp</link>
      <guid>https://dev.to/lynkr/how-to-use-open-interpreter-for-free-with-the-latest-models-3chp</guid>
      <description>&lt;h2&gt;
  
  
  The GPT-4 Code Interpreter You Can Actually Own — And Run for Free
&lt;/h2&gt;

&lt;p&gt;If you've ever used ChatGPT's Code Interpreter (now "Advanced Data Analysis"), you know the feeling: &lt;em&gt;"This is incredible... but why can't I run it locally? Why can't I install my own packages? Why do files disappear after 2 hours?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Open Interpreter&lt;/strong&gt; fixes all of that. It's the open-source version of what ChatGPT's Code Interpreter &lt;em&gt;should&lt;/em&gt; have been — and it runs on &lt;em&gt;your&lt;/em&gt; machine, with &lt;em&gt;your&lt;/em&gt; data, for &lt;em&gt;as long as you want&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But there's always been one painful trade-off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud models&lt;/strong&gt; (GPT-4o, Claude Sonnet) → fast and smart, but costs add up fast&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local models&lt;/strong&gt; (Ollama, Qwen) → free, but slow and less capable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What if you could have &lt;strong&gt;both&lt;/strong&gt; — latest models, near-zero cost?&lt;/p&gt;

&lt;p&gt;That's what this guide covers. Let me show you how.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Open Interpreter?
&lt;/h2&gt;

&lt;p&gt;Open Interpreter (53k★ on GitHub) gives LLMs a &lt;strong&gt;natural-language interface to your entire computer&lt;/strong&gt;. Install and launch it with two commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;open-interpreter
interpreter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can say things like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Analyze this CSV, find outliers, build a dashboard, and email it to me."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And it will — writing Python, running shell commands, installing packages on the fly, and showing you the results, &lt;strong&gt;all in real time&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Makes It Special vs ChatGPT Code Interpreter
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;ChatGPT Code Interpreter&lt;/th&gt;
&lt;th&gt;Open Interpreter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Internet access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Full access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom packages&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ 300 pre-installed only&lt;/td&gt;
&lt;td&gt;✅ Any pip/npm/shell package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;File size limit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100 MB upload limit&lt;/td&gt;
&lt;td&gt;✅ Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime limit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2 minutes max&lt;/td&gt;
&lt;td&gt;✅ Unlimited — runs until done&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Your data stays local&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Uploaded to OpenAI&lt;/td&gt;
&lt;td&gt;✅ Everything runs on your machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model choice&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GPT-4o only&lt;/td&gt;
&lt;td&gt;✅ Any model — local or cloud&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Real Things You Can Do With Open Interpreter
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Data Analysis That Actually Finishes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;interpreter&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;interpreter&lt;/span&gt;

&lt;span class="n"&gt;interpreter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Download my last 6 months of Stripe transactions,
clean the data, find churn patterns, and build a retention dashboard&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It runs Python, Pandas, Plotly — no runtime limit, no upload cap. Your data never leaves your machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Full System Automation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="s2"&gt;"Find all duplicate files over 100MB in ~/Downloads,
ask me before deleting each one, then log what I chose"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can browse directories, run bash, and ask for confirmation before destructive operations.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Multi-Step Research Pipelines
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scrape the top 10 HN posts about AI agents,
summarize each, then save a markdown report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Browser control + Python + file I/O — chained together in one conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Video/Photo Processing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extract audio from every .mp4 in this folder,
transcribe it with Whisper, then save transcripts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It installs &lt;code&gt;ffmpeg&lt;/code&gt;, &lt;code&gt;whisper&lt;/code&gt;, whatever it needs — no manual setup.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Free Models Are Slow, Paid Models Are Expensive
&lt;/h2&gt;

&lt;p&gt;Open Interpreter is &lt;strong&gt;token-hungry by nature&lt;/strong&gt;. Every multi-step task generates a long conversation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model proposes a plan → tokens&lt;/li&gt;
&lt;li&gt;It writes code → tokens&lt;/li&gt;
&lt;li&gt;The output comes back → tokens&lt;/li&gt;
&lt;li&gt;It iterates → more tokens&lt;/li&gt;
&lt;li&gt;It hits an error and fixes it → even more tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single analysis session can burn &lt;strong&gt;50,000–200,000 input tokens&lt;/strong&gt;.&lt;/p&gt;
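&lt;p&gt;To see why the total climbs so quickly, here's a minimal sketch. It assumes every request re-sends the whole conversation so far, and the token sizes are made-up illustration values, not measurements from a real session.&lt;/p&gt;

```python
def cumulative_input_tokens(system_tokens, turn_tokens):
    """Total input tokens billed when each request re-sends all history.

    system_tokens: size of the fixed system prompt (hypothetical value).
    turn_tokens: tokens added to the history on each turn
                 (plan, code, output, error text), one entry per turn.
    """
    history = system_tokens
    total = 0
    for added in turn_tokens:
        history += added   # the new message joins the conversation
        total += history   # the whole conversation is sent as input
    return total


# Hypothetical 20-turn session: 2,000-token system prompt, 500 new tokens per turn.
session_total = cumulative_input_tokens(2_000, [500] * 20)
print(session_total)
```

&lt;p&gt;With these hypothetical numbers the session only produces 10,000 tokens of new content, yet 145,000 input tokens get billed, which lands inside the range above.&lt;/p&gt;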

&lt;h3&gt;
  
  
  Option A: Use GPT-4o / Claude Sonnet Directly
&lt;/h3&gt;

&lt;p&gt;You get speed and quality, but at full retail price. A 30-minute session costs &lt;strong&gt;$1-3&lt;/strong&gt;. Do this daily and you're spending roughly $30-90/month &lt;em&gt;on one tool&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: Run Locally With Ollama (The "Free" Way)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;interpreter &lt;span class="nt"&gt;--local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is truly free — but painfully slow. A local Qwen 2.5-Coder 14B takes &lt;strong&gt;15-30 seconds per response&lt;/strong&gt;. For Open Interpreter's interactive back-and-forth loop, that kills the flow.&lt;/p&gt;

&lt;p&gt;Worse: local models just can't handle complex multi-step tasks as reliably. The analysis I described earlier? It breaks down on a 14B model.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: Latest Models, Almost Free
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt; is an open-source LLM gateway that solves this exact problem. It lets you use the &lt;strong&gt;latest and best models&lt;/strong&gt; — DeepSeek V4, Claude Sonnet 4.5, Gemini 2.5 Pro, GPT-5.5 — while paying &lt;strong&gt;80-90% less&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Open Interpreter uses LiteLLM under the hood, so pointing it at Lynkr is trivial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;interpreter &lt;span class="nt"&gt;--api_base&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:3000/v1"&lt;/span&gt; &lt;span class="nt"&gt;--api_key&lt;/span&gt; &lt;span class="s2"&gt;"anything"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Here's what Lynkr does behind the scenes.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Lynkr Makes Open Interpreter Free (Almost)
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Tier Routing: Smart Models for Smart Work
&lt;/h4&gt;

&lt;p&gt;Not every Open Interpreter step needs GPT-5.5. Listing files? Go to DeepSeek V3 (free). Writing a Python script? Use Sonnet 4.5 or GPT-5.5.&lt;/p&gt;

&lt;p&gt;Lynkr automatically routes each request to the &lt;strong&gt;cheapest capable model&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simple tasks&lt;/strong&gt; (ls, grep, file ops) → GPT-4o Mini / Gemini Flash / DeepSeek V3 ($0-0.15/M)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code generation&lt;/strong&gt; → DeepSeek V4 / Sonnet 4.5 ($1-3/M)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex reasoning&lt;/strong&gt; → GPT-5.5 / Opus 4.5 ($10-15/M — but only used when actually needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; A naive GPT-4o session that would cost about $2.40 drops to roughly &lt;strong&gt;$0.35-0.60&lt;/strong&gt;.&lt;/p&gt;
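&lt;p&gt;The routing idea can be sketched in a few lines. This is an illustration only: the keyword heuristic and the model identifiers below are assumptions for the example, not Lynkr's actual scoring logic or configuration.&lt;/p&gt;

```python
# Hypothetical tier table. The model identifiers are placeholders.
TIERS = {
    "simple": "gpt-4o-mini",
    "code": "deepseek-v4",
    "reasoning": "gpt-5.5",
}

# Hypothetical signal words; a real router would score prompts more carefully.
SIMPLE_WORDS = {"ls", "grep", "list", "rename", "move", "copy"}
REASONING_WORDS = {"architecture", "debug", "design", "plan", "why"}


def pick_tier(prompt):
    """Classify a prompt into a tier using whole-word matches."""
    words = set(prompt.lower().split())
    # Hard-task signals win, so a tricky request is never under-served.
    if words.intersection(REASONING_WORDS):
        return "reasoning"
    if words.intersection(SIMPLE_WORDS):
        return "simple"
    return "code"  # default: ordinary code generation


def route(prompt):
    """Return the model name for the tier the prompt falls into."""
    return TIERS[pick_tier(prompt)]


print(route("list the files in this folder"))
print(route("write a python script to parse the csv"))
print(route("debug why the pipeline deadlocks"))
```

&lt;p&gt;The point of the sketch is the shape: classify first, then look up the cheapest model that fits the class.&lt;/p&gt;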

&lt;h4&gt;
  
  
  2. Prompt Caching: Don't Pay Twice for the Same Work
&lt;/h4&gt;

&lt;p&gt;Open Interpreter repeats the same system context on every turn. Lynkr's &lt;strong&gt;Semantic Cache&lt;/strong&gt; detects repeated prompts and returns cached results.&lt;/p&gt;

&lt;p&gt;For batch operations like "process file X in folder Y", where only the filename changes between calls, the &lt;strong&gt;cache hit rate reaches 60-70%&lt;/strong&gt;. That's real money staying in your pocket.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Local Fallback: Never Get Stuck
&lt;/h4&gt;

&lt;p&gt;Rate limited on OpenAI? Key expired? Lynkr &lt;strong&gt;automatically fails over&lt;/strong&gt; to Ollama or another working provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same config — just works&lt;/span&gt;
interpreter &lt;span class="nt"&gt;--api_base&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:3000/v1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No crashes, no context loss, no retyping your request.&lt;/p&gt;
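&lt;p&gt;Conceptually, failover is a loop over providers. The sketch below is a simplified illustration with stand-in functions (&lt;code&gt;rate_limited_cloud&lt;/code&gt; and &lt;code&gt;local_model&lt;/code&gt; are hypothetical), not Lynkr's implementation.&lt;/p&gt;

```python
class ProviderError(Exception):
    """Raised by a provider callable when it cannot serve the request."""


def complete_with_fallback(prompt, providers):
    """Try each (name, callable) pair in order and return the first success.

    Each callable takes the prompt and returns text, or raises ProviderError.
    Returns a (provider_name, reply) tuple.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why it failed, move on
    raise ProviderError("all providers failed: " + "; ".join(errors))


def rate_limited_cloud(prompt):
    raise ProviderError("429 rate limited")  # simulated cloud outage


def local_model(prompt):
    return "echo: " + prompt  # simulated local reply


used, reply = complete_with_fallback(
    "list files", [("openai", rate_limited_cloud), ("ollama", local_model)]
)
print(used, reply)
```

&lt;p&gt;Because the same prompt is handed to the next provider, the caller never has to retype the request.&lt;/p&gt;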

&lt;h4&gt;
  
  
  4. MCP Code Mode: Fewer Retries = Fewer Tokens
&lt;/h4&gt;

&lt;p&gt;Lynkr reformats code prompts to produce cleaner output. Fewer syntax errors → fewer retries → fewer tokens burnt on error recovery. Each retry avoided saves 3,000-10,000 tokens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before vs After: Real Cost Breakdown
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Session Type&lt;/th&gt;
&lt;th&gt;Naive GPT-4o&lt;/th&gt;
&lt;th&gt;Lynkr (Tier Routing + Cache)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1-hour data analysis&lt;/td&gt;
&lt;td&gt;~$2.40&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$0.35-0.60&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch file processing (100 files)&lt;/td&gt;
&lt;td&gt;~$3.50&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$0.12-0.30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-step research pipeline&lt;/td&gt;
&lt;td&gt;~$5.00&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$0.60-1.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Daily use for a month&lt;/td&gt;
&lt;td&gt;~$75-150&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$10-20&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's roughly &lt;strong&gt;75-95% cheaper&lt;/strong&gt;, depending on the workload, and you're using &lt;em&gt;better&lt;/em&gt; models than GPT-4o alone.&lt;/p&gt;




&lt;h2&gt;
  
  
  Setup: Open Interpreter + Lynkr in 3 Minutes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Install Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx lynkr@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It auto-detects your setup, creates a config, and starts the proxy on port 3000.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Install Open Interpreter
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;open-interpreter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Point Open Interpreter to Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;interpreter &lt;span class="nt"&gt;--api_base&lt;/span&gt; &lt;span class="s2"&gt;"http://localhost:3000/v1"&lt;/span&gt; &lt;span class="nt"&gt;--api_key&lt;/span&gt; &lt;span class="s2"&gt;"anything"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Done.&lt;/strong&gt; Open Interpreter now routes through Lynkr — latest models, tiered routing, prompt caching, local fallback.&lt;/p&gt;




&lt;h2&gt;
  
  
  What About the Latest Models Specifically?
&lt;/h2&gt;

&lt;p&gt;Here are the models you can route through today with Lynkr + Open Interpreter:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Cost via Lynkr&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepSeek V4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code gen, multi-step reasoning&lt;/td&gt;
&lt;td&gt;~$0.50/M tokens (cheapest top-tier)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Balanced code + analysis&lt;/td&gt;
&lt;td&gt;~$3/M tokens (used sparingly via tier routing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex debugging, architecture&lt;/td&gt;
&lt;td&gt;~$15/M tokens (only for hard steps)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen 3-Coder 32B (local)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free fallback&lt;/td&gt;
&lt;td&gt;$0 (via Ollama)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini 2.5 Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast code, vision tasks&lt;/td&gt;
&lt;td&gt;~$1.25/M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPT-4o Mini / DeepSeek V3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple file ops&lt;/td&gt;
&lt;td&gt;$0-0.15/M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Lynkr picks the right one per step automatically. &lt;strong&gt;You don't think about it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Open Interpreter is the most underrated open-source AI tool of 2026.&lt;/strong&gt; It does what ChatGPT Code Interpreter &lt;em&gt;promised&lt;/em&gt; — but on your machine, with your data, at any scale.&lt;/p&gt;

&lt;p&gt;The old trade-off was: use GPT-4o and pay up, or use a local model and deal with the slowness.&lt;/p&gt;

&lt;p&gt;With Lynkr that trade-off is gone. Latest models. Intelligent routing. Local fallback. &lt;strong&gt;Roughly 75-95% cost savings.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can run Open Interpreter for essentially free — with models that beat GPT-4o.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt; — the open-source LLM gateway that makes every AI tool cheaper. Drop a ⭐ if this helped.&lt;/em&gt; ⚡&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How I Cut Aider's Token Bill 80%: Prompt Caching, MCP Code Mode, and Tier Routing</title>
      <dc:creator>Lynkr</dc:creator>
      <pubDate>Sat, 30 May 2026 15:56:21 +0000</pubDate>
      <link>https://dev.to/lynkr/run-aider-on-ollama-bedrock-or-any-llm-provider-one-gateway-every-model-3jm4</link>
      <guid>https://dev.to/lynkr/run-aider-on-ollama-bedrock-or-any-llm-provider-one-gateway-every-model-3jm4</guid>
      <description>&lt;p&gt;Aider is the best terminal AI coding tool I've used. But by default it sends every diff through your OpenAI or Anthropic key, which gets expensive fast on real refactors — a single 100-file repo map can torch a few dollars before Aider even reads your prompt.&lt;/p&gt;

&lt;p&gt;This post shows how to run Aider against &lt;strong&gt;any LLM provider&lt;/strong&gt; (Ollama for free local runs, OpenRouter for mixed-provider routing, AWS Bedrock for enterprise deployments) through a single OpenAI-compatible endpoint, with &lt;strong&gt;prompt caching&lt;/strong&gt; and &lt;strong&gt;MCP Code Mode&lt;/strong&gt; layered on top to cut the bill further. I'll use &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;Lynkr&lt;/a&gt;, the self-hosted gateway I maintain.&lt;/p&gt;

&lt;p&gt;Full disclosure: I build Lynkr. I'm going to make the case for why the combination — gateway + caching + code-mode tools — is the real cost lever, not just "swap your provider."&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup in three commands
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Start the gateway&lt;/span&gt;
npx lynkr@latest

&lt;span class="c"&gt;# 2. Point Aider at it&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;any-value

&lt;span class="c"&gt;# 3. Run Aider with any model name Lynkr knows about&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; claude-sonnet-4-5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Aider speaks the OpenAI Chat Completions protocol; Lynkr speaks it back and quietly translates the call to whichever upstream provider you've configured (Ollama, Bedrock, Anthropic, Azure, OpenRouter, Databricks, llama.cpp, LM Studio, ...). Aider has no idea it's talking to a router.&lt;/p&gt;
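&lt;p&gt;To make the protocol concrete, here's roughly what a Chat Completions request to the gateway looks like. The sketch only builds the request and does not send it. The base URL, key, and model name are the example values from the commands above, and &lt;code&gt;build_chat_request&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, user_message):
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8081/v1", "any-value", "claude-sonnet-4-5", "Say hello"
)
print(request.full_url)

# To actually send it (requires the gateway to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

&lt;p&gt;Any client that can produce this request shape can sit in front of the gateway, which is why Aider needs nothing beyond the two environment variables.&lt;/p&gt;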

&lt;h2&gt;
  
  
  Where the money actually leaks in Aider
&lt;/h2&gt;

&lt;p&gt;Most "save money on AI coding" posts focus on swapping GPT-4o for a cheaper model. That's table stakes. The real spend in an Aider session breaks down roughly like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Call type&lt;/th&gt;
&lt;th&gt;Share of total tokens&lt;/th&gt;
&lt;th&gt;Where it goes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Repo map (system context, sent every turn)&lt;/td&gt;
&lt;td&gt;~50–60%&lt;/td&gt;
&lt;td&gt;Same prefix, every single request&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;File contents you've /add'd&lt;/td&gt;
&lt;td&gt;~20–30%&lt;/td&gt;
&lt;td&gt;Same prefix until you change the files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The actual diff / instruction&lt;/td&gt;
&lt;td&gt;~5–10%&lt;/td&gt;
&lt;td&gt;Genuinely new each turn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit messages, summarization&lt;/td&gt;
&lt;td&gt;~5%&lt;/td&gt;
&lt;td&gt;Cheap model anyway&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Look at that table. &lt;strong&gt;Most of your Aider bill is the same bytes being re-sent over and over.&lt;/strong&gt; Swapping models helps a little. Caching that repetitive prefix helps a lot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lever 1: Prompt caching — cuts the repeated-prefix tax
&lt;/h2&gt;

&lt;p&gt;Anthropic, Bedrock, Gemini, and OpenRouter all support prompt caching now, but Aider doesn't speak any of their cache-control protocols natively (it speaks one — OpenAI's — and only partially). Lynkr sits in the middle and injects &lt;code&gt;cache_control: ephemeral&lt;/code&gt; breakpoints on the right blocks before forwarding upstream.&lt;/p&gt;

&lt;p&gt;What that means in practice: the second Aider request in a session — same repo map, same /added files — only pays for the few hundred tokens of new instruction. Cached input tokens are &lt;strong&gt;10% the price of fresh input&lt;/strong&gt; on Anthropic, &lt;strong&gt;25%&lt;/strong&gt; on Bedrock, free for 5 minutes on Gemini.&lt;/p&gt;
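&lt;p&gt;Here's a rough sketch of both halves: marking a stable prefix with an Anthropic-style &lt;code&gt;cache_control&lt;/code&gt; breakpoint, and the resulting input cost. The helper names, token counts, and the $3 per million price are assumptions for illustration. The sketch also ignores any surcharge a provider applies when the cache is first written, and it is not Lynkr's actual code.&lt;/p&gt;

```python
import copy


def add_cache_breakpoint(system_blocks):
    """Return a copy of Anthropic-style system blocks with the last one
    marked cacheable, so the prefix up to it can be reused next turn."""
    blocks = copy.deepcopy(system_blocks)
    if blocks:
        blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return blocks


def input_cost(cached_tokens, fresh_tokens, price_per_mtok, cached_fraction):
    """Dollar cost of one request's input when part of it is read from cache."""
    billable = cached_tokens * cached_fraction + fresh_tokens
    return billable * price_per_mtok / 1_000_000


system = [
    {"type": "text", "text": "REPO MAP ..."},         # stable across turns
    {"type": "text", "text": "ADDED FILE CONTENTS"},  # stable until files change
]
marked = add_cache_breakpoint(system)

# Hypothetical turn: 40,000 stable tokens, 400 new ones, $3 per million input,
# cached reads billed at 10% of the fresh price.
uncached = input_cost(0, 40_400, 3.0, 0.10)
cached = input_cost(40_000, 400, 3.0, 0.10)
print(round(uncached, 4), round(cached, 4))
```

&lt;p&gt;With these made-up numbers the uncached turn costs about $0.12 of input and the cached one about $0.013.&lt;/p&gt;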

&lt;p&gt;On a 4-hour Aider session against Claude Opus 4 or GPT-5, this single lever has cut my own input bill by &lt;strong&gt;~70%&lt;/strong&gt; before I even start tier-routing.&lt;/p&gt;

&lt;p&gt;Lynkr enables it automatically when the upstream provider supports it. No Aider config change.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env&lt;/span&gt;
&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
&lt;span class="nv"&gt;PROMPT_CACHE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;    &lt;span class="c"&gt;# default on, but explicit is good&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lever 2: MCP Code Mode — collapse N tool calls into 1
&lt;/h2&gt;

&lt;p&gt;Aider doesn't use tool calls itself (it parses code blocks from plain Markdown). But the moment you start composing Aider with other MCP tools — file search, web fetch, sandboxed execution — the round-trip cost explodes. Every tool call is a full request/response cycle through the LLM.&lt;/p&gt;

&lt;p&gt;Lynkr's &lt;strong&gt;MCP Code Mode&lt;/strong&gt; (borrowed from Cloudflare's pattern) flips this. Instead of advertising each MCP tool as a separate function the model can call, Lynkr exposes them as a small TypeScript API that the model writes a single program against. The program runs in a sandbox, hits all the tools it needs, and returns the result in one LLM round trip.&lt;/p&gt;

&lt;p&gt;Example: "find every file that imports &lt;code&gt;redis&lt;/code&gt;, check if any still use the v3 API, and print a migration TODO list."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool-call mode (default everywhere else):&lt;/strong&gt; 5 file_search calls + 12 file_read calls + 1 grep call = 18 round trips. Each round trip re-sends the conversation history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP Code Mode (Lynkr):&lt;/strong&gt; model writes ~20 lines of TS using &lt;code&gt;mcp.fileSearch()&lt;/code&gt; and &lt;code&gt;mcp.fileRead()&lt;/code&gt;, executes once, returns the result.&lt;/li&gt;
&lt;/ul&gt;
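&lt;p&gt;A back-of-the-envelope model of the two modes, as a sketch. All token sizes are hypothetical, and the code-mode figure conservatively counts a second request that reads the program's output, which is an assumption of this example.&lt;/p&gt;

```python
def tool_call_mode_tokens(base_context, result_tokens, round_trips):
    """Input tokens when every tool call is its own LLM request that
    re-sends the base context plus all earlier tool results."""
    total = 0
    for trip in range(round_trips):
        total += base_context + trip * result_tokens
    return total


def code_mode_tokens(base_context, program_tokens, final_result_tokens):
    """Input tokens for one request that writes the program, plus one
    follow-up request that sees the program and its final output."""
    first = base_context
    second = base_context + program_tokens + final_result_tokens
    return first + second


# Hypothetical sizes: 20,000-token context, 300 tokens per tool result,
# 18 round trips as in the example above.
tool_mode = tool_call_mode_tokens(20_000, 300, 18)
code_mode = code_mode_tokens(20_000, 400, 600)
print(tool_mode, code_mode, round(tool_mode / code_mode, 1))
```

&lt;p&gt;With these illustrative sizes the tool-call path sends 405,900 input tokens against 41,000 for code mode, roughly a 10x difference. The exact ratio depends heavily on context size and the number of calls.&lt;/p&gt;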

&lt;p&gt;For coding-heavy sessions where Aider is composed with other MCP tools, this is a 5–15x reduction in tokens spent on tool plumbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lever 3: Tier routing — match model to task
&lt;/h2&gt;

&lt;p&gt;Aider's &lt;a href="https://aider.chat/docs/leaderboards/" rel="noopener noreferrer"&gt;own polyglot leaderboard&lt;/a&gt; in May 2026:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;% correct&lt;/th&gt;
&lt;th&gt;Copilot cost ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Opus 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;89.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5 (high reasoning)&lt;/td&gt;
&lt;td&gt;88.0%&lt;/td&gt;
&lt;td&gt;1×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;o3-pro (high)&lt;/td&gt;
&lt;td&gt;84.9%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Pro (32k think)&lt;/td&gt;
&lt;td&gt;83.1%&lt;/td&gt;
&lt;td&gt;1×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1×&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.1&lt;/td&gt;
&lt;td&gt;82.1%&lt;/td&gt;
&lt;td&gt;10×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok 4 (high)&lt;/td&gt;
&lt;td&gt;79.6%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3.2 Reasoner&lt;/td&gt;
&lt;td&gt;74.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
&lt;td&gt;73.5%&lt;/td&gt;
&lt;td&gt;0.33×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4o&lt;/td&gt;
&lt;td&gt;72.9%&lt;/td&gt;
&lt;td&gt;0×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.5 (no-think)&lt;/td&gt;
&lt;td&gt;70.7%&lt;/td&gt;
&lt;td&gt;3×&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3.2 Chat&lt;/td&gt;
&lt;td&gt;70.2%&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two things actually worth knowing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Claude Sonnet 4.5 at 82.4% is the practical pick.&lt;/strong&gt; It's within 7 points of the absolute top at 1× Copilot pricing — i.e. one-third the cost of Opus 4.5 for ~92% of the capability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V3.2 Reasoner at 74% is the budget workhorse.&lt;/strong&gt; Costs a fraction of any Claude tier, still beats GPT-4o on Aider's own bench.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You don't need Opus 4.5 to rename a variable. You need Sonnet 4.5 for almost everything, Opus 4.5 for the hardest 10% (multi-file architecture, refactor planning), and Haiku 4.5 or local Ollama for the trivial 30% (commit messages, repo map summarization).&lt;/p&gt;

&lt;p&gt;Lynkr's tier routing splits the work by prompt complexity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aider call type&lt;/th&gt;
&lt;th&gt;Routes to&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Repo map summarization, commit messages&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;qwen2.5-coder:7b&lt;/code&gt; (Ollama, local)&lt;/td&gt;
&lt;td&gt;Free, runs on your laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-file edits, small diffs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-haiku-4-5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;73.5% on Aider, 0.33× Copilot cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Default coding workhorse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;claude-sonnet-4-5&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82.4% on Aider, 1× Copilot cost&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardest 10%: architecture, multi-file refactor&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;claude-opus-4-5&lt;/code&gt; or &lt;code&gt;gpt-5&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Used sparingly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env additions&lt;/span&gt;
&lt;span class="nv"&gt;TIER_SIMPLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama:qwen2.5-coder:7b
&lt;span class="nv"&gt;TIER_MEDIUM&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic:claude-haiku-4-5
&lt;span class="nv"&gt;TIER_COMPLEX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic:claude-sonnet-4-5
&lt;span class="nv"&gt;TIER_REASONING&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic:claude-opus-4-5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point Aider at &lt;code&gt;--model lynkr-auto&lt;/code&gt; and Lynkr scores each prompt before picking the tier.&lt;/p&gt;
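&lt;p&gt;The tier values follow a &lt;code&gt;provider:model&lt;/code&gt; shape. Here's a sketch of how such values could be parsed and looked up, assuming the first colon separates provider from model. That is an assumption for this example, not documented Lynkr behavior, and &lt;code&gt;parse_tier&lt;/code&gt; and &lt;code&gt;resolve&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```python
# Example values copied from the .env fragment above.
ENV = {
    "TIER_SIMPLE": "ollama:qwen2.5-coder:7b",
    "TIER_MEDIUM": "anthropic:claude-haiku-4-5",
    "TIER_COMPLEX": "anthropic:claude-sonnet-4-5",
    "TIER_REASONING": "anthropic:claude-opus-4-5",
}


def parse_tier(value):
    """Split 'provider:model' on the first colon only, because model
    names such as 'qwen2.5-coder:7b' contain colons themselves."""
    provider, model = value.split(":", 1)
    return provider, model


def resolve(tier_name, env=ENV):
    """Look up a tier such as 'simple' and return (provider, model)."""
    return parse_tier(env["TIER_" + tier_name.upper()])


print(resolve("simple"))
print(resolve("reasoning"))
```

&lt;p&gt;Splitting only once matters for the Ollama entry, whose model tag carries its own colon.&lt;/p&gt;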

&lt;h2&gt;
  
  
  Stacking the three levers
&lt;/h2&gt;

&lt;p&gt;Each lever on its own is meaningful. Stacked, they compound:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Caching alone:&lt;/strong&gt; ~70% input-token cut on a stable session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;+ Tier routing:&lt;/strong&gt; another ~40% by pushing routine calls to Haiku/Ollama&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;+ MCP Code Mode&lt;/strong&gt; (if you compose with other MCP tools): another 5–15x on tool-plumbing tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In my own Aider workflow — heavy refactors against a 200k-LOC monorepo — this combination has dropped a session that used to cost ~$8 in Claude calls down to under $1.50. Not because Claude got cheaper. Because most of the work is now happening on cached prefixes, free local models, or in-sandbox code execution.&lt;/p&gt;
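
&lt;p&gt;As a rough sanity check on those figures, the sketch below multiplies the first two reductions together. It assumes the caching and routing cuts compound multiplicatively and that input tokens dominate session cost; both are simplifications, and &lt;code&gt;estimate_session_cost&lt;/code&gt; is a hypothetical helper, not a Lynkr command. MCP Code Mode is left out because its effect depends on how much tool plumbing a session has.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Back-of-envelope estimate only; the multiplicative model is an assumption.
estimate_session_cost() {
  # $1 = baseline cost in dollars
  # $2 = fraction saved by caching (0 to 1)
  # $3 = fraction saved by tier routing (0 to 1)
  awk -v base="$1" -v cache="$2" -v route="$3" \
    'BEGIN { printf "%.2f\n", base * (1 - cache) * (1 - route) }'
}

# Numbers taken from the list and paragraph above.
estimate_session_cost 8 0.70 0.40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the ~$8 baseline, a 70% caching cut, and a 40% routing cut, this prints &lt;code&gt;1.44&lt;/code&gt;, which is in line with the "under $1.50" figure above.&lt;/p&gt;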

&lt;h2&gt;
  
  
  Configuration walkthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1 — Install and start Lynkr
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx lynkr@latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First run creates a &lt;code&gt;.env&lt;/code&gt; file. Minimal config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic
&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
&lt;span class="nv"&gt;PROMPT_CACHE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a fully local, free setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;MODEL_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nv"&gt;OLLAMA_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434
&lt;span class="nv"&gt;OLLAMA_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qwen2.5-coder:latest
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pull the model so Ollama can serve it: &lt;code&gt;ollama pull qwen2.5-coder:latest&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 — Point Aider at the gateway
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_BASE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:8081/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;dummy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



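&lt;p&gt;Add those to your shell rc file (for example &lt;code&gt;~/.bashrc&lt;/code&gt; or &lt;code&gt;~/.zshrc&lt;/code&gt;) so they persist across sessions.&lt;/p&gt;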

&lt;h3&gt;
  
  
  Step 3 — Pick a model (or let Lynkr pick)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Direct pass-through&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; claude-sonnet-4-5

&lt;span class="c"&gt;# Or let Lynkr tier-route&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; lynkr-auto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4 — Verify
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:8081/v1/models | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool | &lt;span class="nb"&gt;head&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start Lynkr with &lt;code&gt;LOG_LEVEL=info&lt;/code&gt; and watch the cache-hit lines on your second Aider request — that's where the savings show up.&lt;/p&gt;
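
&lt;p&gt;Listing models only proves the server is up. To exercise the full request path without Aider in the loop, a minimal smoke test might look like the sketch below. It assumes Lynkr serves the standard OpenAI-style &lt;code&gt;/v1/chat/completions&lt;/code&gt; route on the port configured earlier; &lt;code&gt;build_chat_payload&lt;/code&gt; and &lt;code&gt;smoke_test&lt;/code&gt; are hypothetical helper names, not part of Lynkr or Aider.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Build a minimal OpenAI-style chat request body.
build_chat_payload() {
  # $1 = model name, $2 = prompt text (keep it free of quotes and backslashes)
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

# Send the request to the gateway; needs Lynkr running on port 8081.
smoke_test() {
  # $1 = model name, defaults to the tier-routing alias used in this post
  build_chat_payload "${1:-lynkr-auto}" "Reply with the word ok" |
    curl -s http://localhost:8081/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer dummy" \
      -d @-
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run &lt;code&gt;smoke_test&lt;/code&gt; for the routed path or &lt;code&gt;smoke_test claude-sonnet-4-5&lt;/code&gt; for direct pass-through. Running the same call twice gives you a small, repeatable request to compare in the logs.&lt;/p&gt;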

&lt;h2&gt;
  
  
  Aider-specific gotchas
&lt;/h2&gt;

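&lt;p&gt;&lt;strong&gt;Weak model for commits / summarization.&lt;/strong&gt; Aider uses a cheaper "weak" model for non-code calls such as commit messages and chat-history summaries; with an OpenAI main model like &lt;code&gt;gpt-4o&lt;/code&gt; the default is &lt;code&gt;gpt-4o-mini&lt;/code&gt;. Override it with a free local one:&lt;br&gt;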
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--model&lt;/span&gt; openai/gpt-4o &lt;span class="nt"&gt;--weak-model&lt;/span&gt; ollama/qwen2.5-coder:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Long context.&lt;/strong&gt; Local Ollama models will run out of memory on 200k+ token repo maps. Either disable the repo map with &lt;code&gt;--map-tokens 0&lt;/code&gt;, or route long-context calls to a Gemini Flash model with a 1M-token context window by pointing the &lt;code&gt;TIER_REASONING&lt;/code&gt; line above at it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming.&lt;/strong&gt; Aider expects streaming responses, and Lynkr streams by default. If you're on a non-streaming Databricks endpoint, set &lt;code&gt;STREAM_PASSTHROUGH=false&lt;/code&gt; and Lynkr buffers the full response, then replays it to Aider as a simulated stream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache hit rate.&lt;/strong&gt; Prompt caching only fires when the prefix is byte-identical across requests. If your repo map changes (for example, you edit a file you added with &lt;code&gt;/add&lt;/code&gt;), the cache for that block invalidates and rebuilds. Lynkr logs cache-hit ratios per session, so watch them: if the hit rate is below 60%, something in your workflow is busting the prefix.&lt;/p&gt;
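
&lt;p&gt;To make "byte-identical" concrete, the sketch below fingerprints two prompt prefixes with a hash. It only illustrates why a one-character change counts as a different prefix; it is not how Lynkr or any provider actually keys its cache, and &lt;code&gt;prefix_fingerprint&lt;/code&gt; is a hypothetical helper. It assumes &lt;code&gt;sha256sum&lt;/code&gt; is installed (on macOS, &lt;code&gt;shasum -a 256&lt;/code&gt; is the usual equivalent).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hash the exact bytes of a prompt prefix.
prefix_fingerprint() {
  # printf '%s' avoids appending a newline that would change the bytes
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

a=$(prefix_fingerprint "system prompt + repo map v1")
b=$(prefix_fingerprint "system prompt + repo map v1")
c=$(prefix_fingerprint "system prompt + repo map v2")

# Same bytes give the same fingerprint; one changed character does not.
if [ "$a" = "$b" ]; then echo "identical prefix: cache can hit"; fi
if [ "$a" != "$c" ]; then echo "prefix changed: cache rebuilds"; fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The practical takeaway is to keep anything volatile, such as timestamps or frequently edited files, out of the early part of the prompt where possible.&lt;/p&gt;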

&lt;h2&gt;
  
  
  Quickref
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aider env var or flag&lt;/th&gt;
&lt;th&gt;Lynkr role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENAI_API_BASE=http://localhost:8081/v1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Where Lynkr listens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENAI_API_KEY=dummy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Required by Aider, ignored by Lynkr&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--model claude-sonnet-4-5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Forwarded as-is to the configured upstream&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--model lynkr-auto&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Triggers Lynkr's complexity-based tier routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--weak-model ollama/qwen2.5-coder:7b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Free local model for commit messages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The default Aider setup pays full price for the same repo-map bytes on every turn. The fix isn't "use a cheaper model" — it's:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cache the repetitive prefix&lt;/strong&gt; (prompt caching).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collapse tool plumbing into one call&lt;/strong&gt; (MCP Code Mode).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Match model size to task complexity&lt;/strong&gt; (tier routing).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Stacked, those three levers have taken my Aider sessions from ~$8 to ~$1.50 without changing how I work. Lynkr is one gateway that does all three; it's Apache 2.0, single Node binary, drop-in OpenAI base URL.&lt;/p&gt;

&lt;p&gt;Aider's GitHub: &lt;a href="https://github.com/Aider-AI/aider" rel="noopener noreferrer"&gt;https://github.com/Aider-AI/aider&lt;/a&gt;&lt;br&gt;
Lynkr's GitHub: &lt;a href="https://github.com/Fast-Editor/Lynkr" rel="noopener noreferrer"&gt;https://github.com/Fast-Editor/Lynkr&lt;/a&gt;. Star the repo to follow upcoming integration writeups (OpenHands, Vercel AI SDK, and Open Interpreter are queued).&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
