<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ad Man</title>
    <description>The latest articles on DEV Community by Ad Man (@ad_man_cf946186dc71743c9b).</description>
    <link>https://dev.to/ad_man_cf946186dc71743c9b</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3934562%2F47fb34e9-ec85-4bd6-ab9c-3ba5f8f4c16e.png</url>
      <title>DEV Community: Ad Man</title>
      <link>https://dev.to/ad_man_cf946186dc71743c9b</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ad_man_cf946186dc71743c9b"/>
    <language>en</language>
    <item>
      <title>I Stopped Paying GPT-4 for Simple Queries — Here's the Router I Built</title>
      <dc:creator>Ad Man</dc:creator>
      <pubDate>Sun, 17 May 2026 07:52:30 +0000</pubDate>
      <link>https://dev.to/ad_man_cf946186dc71743c9b/i-stopped-paying-gpt-4-for-simple-queries-heres-the-router-i-built-2f78</link>
      <guid>https://dev.to/ad_man_cf946186dc71743c9b/i-stopped-paying-gpt-4-for-simple-queries-heres-the-router-i-built-2f78</guid>
      <description>&lt;h1&gt;
  
  
  I Stopped Paying GPT-4 for Simple Queries
&lt;/h1&gt;

&lt;p&gt;You know that feeling when you send "What's 2+2?" to GPT-4 and watch $0.03 vanish?&lt;/p&gt;

&lt;p&gt;I was burning through my OpenAI budget like it was 2021 crypto. So I built something that fixed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Most AI apps use &lt;strong&gt;one model for everything&lt;/strong&gt; — whether that's GPT-4, Claude, or Gemini, the same premium model handles simple lookups, code reviews, and translations alike.&lt;/p&gt;

&lt;p&gt;That's like using a Ferrari to deliver pizza.&lt;/p&gt;

&lt;p&gt;I tracked my LLM spending for 2 weeks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query Type&lt;/th&gt;
&lt;th&gt;% of Queries&lt;/th&gt;
&lt;th&gt;Model Used&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Optimal Model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;45%&lt;/td&gt;
&lt;td&gt;GPT-4&lt;/td&gt;
&lt;td&gt;$0.03/req&lt;/td&gt;
&lt;td&gt;Groq (free)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code review&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;GPT-4&lt;/td&gt;
&lt;td&gt;$0.03/req&lt;/td&gt;
&lt;td&gt;Claude Sonnet ($0.015)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Creative writing&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;td&gt;GPT-4&lt;/td&gt;
&lt;td&gt;$0.03/req&lt;/td&gt;
&lt;td&gt;GPT-4o-mini ($0.00015)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex analysis&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;GPT-4&lt;/td&gt;
&lt;td&gt;$0.03/req&lt;/td&gt;
&lt;td&gt;GPT-4 ($0.03)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;80% of my queries didn't need GPT-4.&lt;/strong&gt; But I was paying for it anyway.&lt;/p&gt;
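
&lt;p&gt;A quick sanity check on that table — blending the listed per-request prices by query share (treating Groq's free tier as $0):&lt;/p&gt;

```javascript
// Blended cost per request, using the shares and prices from the table above.
// Groq's free tier is treated as $0 for this estimate.
const mix = [
  { share: 0.45, current: 0.03, optimal: 0 },        // simple QA → Groq
  { share: 0.20, current: 0.03, optimal: 0.015 },    // code review → Claude Sonnet
  { share: 0.15, current: 0.03, optimal: 0.00015 },  // creative → GPT-4o-mini
  { share: 0.20, current: 0.03, optimal: 0.03 },     // complex → GPT-4
];

const blended = (key) => mix.reduce((sum, row) => sum + row.share * row[key], 0);

console.log(blended('current').toFixed(4)); // 0.0300 — everything on GPT-4
console.log(blended('optimal').toFixed(4)); // 0.0090 — roughly 70% cheaper
```

&lt;p&gt;That's the theoretical ceiling; real savings come in lower because routing isn't perfect.&lt;/p&gt;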

&lt;h2&gt;
  
  
  The Solution: A3M Router
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;A&lt;/strong&gt;daptive &lt;strong&gt;M&lt;/strong&gt;emory &lt;strong&gt;M&lt;/strong&gt;ulti-&lt;strong&gt;M&lt;/strong&gt;odel Router — an open-source LLM router that automatically picks the right model for each query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createA3MRouter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adaptive-memory-multi-model-router&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createA3MRouter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// Learns from your patterns&lt;/span&gt;
  &lt;span class="na"&gt;costBudget&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;     &lt;span class="c1"&gt;// Max cost per request&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Simple query → fast + cheap model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cheap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;What is 2+2?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;qa&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// → Provider: groq, Cost: $0.00000, Latency: 89ms&lt;/span&gt;

&lt;span class="c1"&gt;// Complex query → best model&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;complex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Debug this 10k line Python codebase&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;coding&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;python&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// → Provider: openai, Cost: $0.003, Latency: 1200ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The router &lt;strong&gt;learns&lt;/strong&gt;. After a few hundred requests, it knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple Q&amp;amp;A → Groq/Cerebras (free, fast)&lt;/li&gt;
&lt;li&gt;Code review → Claude/GPT-4 (best quality)&lt;/li&gt;
&lt;li&gt;Summarization → GPT-4o-mini (cheap, good enough)&lt;/li&gt;
&lt;/ul&gt;
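
&lt;p&gt;Conceptually, that learning loop is a per-query-type success table. Here's a toy sketch of the idea — the names and thresholds are made up for illustration, not the router's actual internals:&lt;/p&gt;

```javascript
// Toy sketch of memory-based routing: track per-(type, provider) success
// rates and pick the cheapest provider with a good track record.
// Hypothetical names and threshold — illustrative only.
const stats = new Map(); // "type:provider" -> { wins, tries }

function record(type, provider, ok) {
  const key = `${type}:${provider}`;
  const s = stats.get(key) ?? { wins: 0, tries: 0 };
  s.tries += 1;
  if (ok) s.wins += 1;
  stats.set(key, s);
}

function pick(type, providers) {
  // providers ordered cheapest-first; take the first with a proven record
  for (const p of providers) {
    const s = stats.get(`${type}:${p}`);
    if (s && s.wins / s.tries > 0.9) return p;
  }
  return providers[providers.length - 1]; // fall back to the strongest model
}

record('qa', 'groq', true);
record('qa', 'groq', true);
console.log(pick('qa', ['groq', 'gpt-4o-mini', 'gpt-4'])); // "groq"
```

&lt;p&gt;Cheapest-first with a quality threshold means the expensive model is only the fallback — which is where the savings come from.&lt;/p&gt;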

&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query → Memory Tree → RouteLLM Scoring → Provider Selection → Response
              ↑                                         ↓
         Learns from                            Records result
         past queries                           in memory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three research-backed techniques make it work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;RouteLLM&lt;/strong&gt; (arXiv:2404.06035) — Cost-quality routing that balances price vs performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RadixAttention&lt;/strong&gt; (arXiv:2312.07104) — Prefix caching for repeated prompt patterns (5-10x speedup)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Compression&lt;/strong&gt; (arXiv:2403.12968) — Compresses context before sending (20-40% fewer tokens)&lt;/li&gt;
&lt;/ol&gt;
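
&lt;p&gt;To make the prefix-caching idea concrete, here's a deliberately simplified sketch — a flat map keyed on a fixed-length prompt prefix. The real RadixAttention uses a radix tree over token sequences, so treat this as an analogy, not the implementation:&lt;/p&gt;

```javascript
// Simplified illustration of prefix caching: prompts sharing a cached
// prefix reuse the stored result instead of recomputing.
const prefixCache = new Map();

function cacheKey(prompt, prefixLen = 32) {
  return prompt.slice(0, prefixLen);
}

function lookup(prompt) {
  return prefixCache.get(cacheKey(prompt)) ?? null;
}

function store(prompt, result) {
  prefixCache.set(cacheKey(prompt), result);
}

store('Summarize the following article: A', 'summary-A');
// A different article with the same instruction prefix hits the cache:
console.log(lookup('Summarize the following article: B') !== null); // true
```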

&lt;h2&gt;
  
  
  Real Numbers
&lt;/h2&gt;

&lt;p&gt;I ran A3M Router for 2 weeks on my production workload:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Daily cost&lt;/td&gt;
&lt;td&gt;$12.40&lt;/td&gt;
&lt;td&gt;$7.20&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;42% reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg latency&lt;/td&gt;
&lt;td&gt;1800ms&lt;/td&gt;
&lt;td&gt;650ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;64% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Queries/day&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;~400&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Failed requests&lt;/td&gt;
&lt;td&gt;23/day&lt;/td&gt;
&lt;td&gt;0.4/day&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;98% reduction&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The latency improvement surprised me most. When you route simple queries to Groq (which runs Llama at 800 tok/s), the average drops dramatically.&lt;/p&gt;
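
&lt;p&gt;The arithmetic backs that up. With illustrative figures — say ~80% of traffic lands on ~100 ms providers while the hard 20% stays on GPT-4 at ~1800 ms:&lt;/p&gt;

```javascript
// Rough blended-latency estimate; the shares and per-provider latencies
// here are illustrative assumptions, not measured values.
const fastShare = 0.8, fastMs = 100;   // Groq/Cerebras-class queries
const slowShare = 0.2, slowMs = 1800;  // GPT-4-class queries

const blendedMs = fastShare * fastMs + slowShare * slowMs;
console.log(blendedMs); // 440
```

&lt;p&gt;Same ballpark as the measured 650 ms average.&lt;/p&gt;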

&lt;h2&gt;
  
  
  14 Providers, 116 Integrations
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// All providers available out of the box&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createA3MRouter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// GPT-4o, GPT-4o-mini&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// Claude 3.5 Sonnet&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;groq&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;        &lt;span class="c1"&gt;// Llama-3.3-70B (fastest)&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cerebras&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// Llama-3.3-70B (ultra-fast)&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;google&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// Gemini Pro/Flash&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;deepseek&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// Coding/Math specialist&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;// Local models (free)&lt;/span&gt;
    &lt;span class="c1"&gt;// ... 7 more&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Plus 116 integrations for GitHub, Slack, Stripe, Pinecone, Notion, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;adaptive-memory-multi-model-router
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createA3MRouter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;adaptive-memory-multi-model-router&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createA3MRouter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Your prompt here&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// The response&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;    &lt;span class="c1"&gt;// Which model was chosen&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;        &lt;span class="c1"&gt;// How much it cost&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx a3m-router route &lt;span class="s2"&gt;"Explain quantum computing"&lt;/span&gt;
npx a3m-router parallel &lt;span class="s2"&gt;"task1"&lt;/span&gt; &lt;span class="s2"&gt;"task2"&lt;/span&gt; &lt;span class="s2"&gt;"task3"&lt;/span&gt;
npx a3m-router cost  &lt;span class="c"&gt;# Show cost tracking&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Honest Limitations
&lt;/h2&gt;

&lt;p&gt;A3M Router isn't perfect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory is local&lt;/strong&gt; — no distributed memory sharing between instances yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First requests are slower&lt;/strong&gt; — it needs ~50 queries to learn your patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No streaming support&lt;/strong&gt; — working on it for v2.0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression is lossy&lt;/strong&gt; — it can cut context by up to ~80% at aggressive settings, but some nuance gets lost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm actively working on all of these. PRs welcome.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;v2.0&lt;/strong&gt;: Streaming support, distributed memory, WebSocket provider updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark dashboard&lt;/strong&gt;: Real-time cost/latency/quality tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangChain integration&lt;/strong&gt;: Drop-in replacement for their router&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📦 &lt;a href="https://www.npmjs.com/package/adaptive-memory-multi-model-router" rel="noopener noreferrer"&gt;npm: adaptive-memory-multi-model-router&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;a href="https://github.com/Das-rebel/adaptive-memory-multi-model-router" rel="noopener noreferrer"&gt;GitHub: Das-rebel/adaptive-memory-multi-model-router&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;⭐ Star the repo if this is useful!&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;What's your LLM routing strategy? Do you manually pick models, or do you use something automated? I'd love to hear what others are doing — drop a comment below.&lt;/em&gt; 👇&lt;/p&gt;

</description>
      <category>programming</category>
    </item>
  </channel>
</rss>
