<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Devansh</title>
    <description>The latest articles on DEV Community by Devansh (@devansh365).</description>
    <link>https://dev.to/devansh365</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F679755%2F9dc6ebfe-a1d9-4613-8192-f2854324ea75.png</url>
      <title>DEV Community: Devansh</title>
      <link>https://dev.to/devansh365</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devansh365"/>
    <language>en</language>
    <item>
      <title>I built an OpenAI-compatible gateway that routes across 5 free LLM providers</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Mon, 06 Apr 2026 20:22:07 +0000</pubDate>
      <link>https://dev.to/devansh365/i-built-an-openai-compatible-gateway-that-routes-across-5-free-llm-providers-6jo</link>
      <guid>https://dev.to/devansh365/i-built-an-openai-compatible-gateway-that-routes-across-5-free-llm-providers-6jo</guid>
      <description>&lt;p&gt;Every LLM provider has a free tier.&lt;/p&gt;

&lt;p&gt;Groq gives you 30 requests per minute. Gemini gives you 15. Cerebras gives you 30. Mistral gives you 5.&lt;/p&gt;

&lt;p&gt;Combined, that's about 80 requests per minute. Enough for prototyping, internal tools, and side projects where you don't want to pay for API access yet.&lt;/p&gt;

&lt;p&gt;The problem: each provider has its own SDK, its own rate limits, its own auth, and its own downtime. You end up writing provider-switching logic, catching 429 errors, and managing API keys across five different dashboards.&lt;/p&gt;

&lt;p&gt;I got tired of this while building &lt;a href="https://trymetis.app" rel="noopener noreferrer"&gt;Metis&lt;/a&gt;, an AI stock analysis tool. Kept hitting Groq's limits while Gemini had capacity sitting idle. So I built FreeLLM.&lt;/p&gt;

&lt;h2&gt;What FreeLLM does&lt;/h2&gt;

&lt;p&gt;One endpoint. Five providers. Twenty models. All free.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:3000/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model": "free-fast", "messages": [{"role": "user", "content": "Hello!"}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your existing OpenAI SDK code works. Just change the base URL. That's the whole migration.&lt;/p&gt;

&lt;h2&gt;How the routing works&lt;/h2&gt;

&lt;p&gt;When a request comes in, FreeLLM:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Checks which providers are healthy (circuit breakers track this automatically)&lt;/li&gt;
&lt;li&gt;Picks the best available provider based on your model choice&lt;/li&gt;
&lt;li&gt;If that provider returns a 429 or fails, it tries the next one&lt;/li&gt;
&lt;li&gt;You get a response&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Three meta-models handle routing:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;free-fast   → lowest latency (usually Groq or Cerebras)
free-smart  → most capable model (usually Gemini 2.5)
free        → maximum availability across all providers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;Providers and their free tiers&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Models&lt;/th&gt;
&lt;th&gt;Free Tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Groq&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B, Llama 4 Scout, Qwen3 32B&lt;/td&gt;
&lt;td&gt;~30 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;2.5 Flash, 2.5 Pro, 2.0 Flash&lt;/td&gt;
&lt;td&gt;~15 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cerebras&lt;/td&gt;
&lt;td&gt;Llama 3.1 8B, Qwen3 235B, GPT-OSS 120B&lt;/td&gt;
&lt;td&gt;~30 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral&lt;/td&gt;
&lt;td&gt;Small, Medium, Nemo&lt;/td&gt;
&lt;td&gt;~5 req/min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;Any local model&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwx65ruk33vv4zqkl0q5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwx65ruk33vv4zqkl0q5.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What's under the hood&lt;/h2&gt;

&lt;p&gt;This isn't a simple round-robin proxy. The routing layer handles real production concerns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sliding-window rate limiter.&lt;/strong&gt; Each provider's limits are tracked independently. FreeLLM knows how many requests you've sent to Groq in the last 60 seconds and won't send another if you're near the cap.&lt;/p&gt;
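
&lt;p&gt;A minimal version of such a limiter, assuming a 60-second window and a fixed per-provider cap (the class and its defaults are illustrative, not FreeLLM's implementation):&lt;/p&gt;

```typescript
// Toy sliding-window counter: one instance per provider, capped at that
// provider's free-tier rate.
class SlidingWindow {
  private hits: number[] = [];
  constructor(private cap: number, private windowMs = 60_000) {}

  allow(now = Date.now()): boolean {
    // Keep only timestamps still inside the window.
    this.hits = this.hits.filter((t) => t > now - this.windowMs);
    if (this.hits.length >= this.cap) return false; // at the cap: route elsewhere
    this.hits.push(now);
    return true;
  }
}
```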

&lt;p&gt;&lt;strong&gt;Circuit breakers.&lt;/strong&gt; If Gemini starts returning 500s, FreeLLM pulls it from rotation. Every 30 seconds, it sends a test request. When the provider recovers, it goes back in.&lt;/p&gt;
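
&lt;p&gt;The breaker behavior described here can be sketched like so. The failure threshold and 30-second retry interval below are placeholders; FreeLLM's actual values and probe logic may differ.&lt;/p&gt;

```typescript
// Toy circuit breaker: opens after `threshold` consecutive failures,
// then allows probe traffic once `retryMs` has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  constructor(private threshold = 3, private retryMs = 30_000) {}

  healthy(now = Date.now()): boolean {
    // While open, only allow traffic once the retry interval has passed.
    if (this.failures >= this.threshold) return now - this.openedAt >= this.retryMs;
    return true;
  }

  onSuccess() { this.failures = 0; } // a good probe closes the breaker
  onFailure(now = Date.now()) {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```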

&lt;p&gt;&lt;strong&gt;Per-client rate limiting.&lt;/strong&gt; If you expose this to a team, each client gets their own limit. Admin auth protects the config endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zod validation.&lt;/strong&gt; Every request is validated before it hits any provider. Bad payloads fail fast with clear error messages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-time dashboard.&lt;/strong&gt; React frontend showing provider health, request logs, and latency. You can see which providers are healthy at a glance.&lt;/p&gt;

&lt;h2&gt;Get it running in 30 seconds&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/devansh-365/freellm.git
&lt;span class="nb"&gt;cd &lt;/span&gt;freellm
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env   &lt;span class="c"&gt;# add your free API keys&lt;/span&gt;
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API on &lt;code&gt;localhost:3000&lt;/code&gt;. Dashboard on &lt;code&gt;localhost:3000/dashboard&lt;/code&gt;. Done.&lt;/p&gt;

&lt;h2&gt;Using it with the OpenAI SDK&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;http://localhost:3000/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;not-needed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;free-fast&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain circuit breakers in 2 sentences&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No new SDK to learn. No migration effort.&lt;/p&gt;

&lt;h2&gt;Why I built this&lt;/h2&gt;

&lt;p&gt;I was building Metis and kept running into the same pattern: burn through Groq's free tier in 20 minutes of testing, switch to Gemini manually, hit their limit, switch to Mistral. Repeat.&lt;/p&gt;

&lt;p&gt;Wrote a quick proxy to automate the switching. Added failover because providers go down randomly. Added circuit breakers because I didn't want to wait for timeouts. Added a dashboard because I wanted to see what was happening.&lt;/p&gt;

&lt;p&gt;It grew into a proper tool. Open-sourced it because every developer prototyping with LLMs has this exact problem.&lt;/p&gt;

&lt;h2&gt;Stack&lt;/h2&gt;

&lt;p&gt;TypeScript, Express 5, React 19, Zod, Docker. MIT licensed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/devansh-365/freellm" rel="noopener noreferrer"&gt;github.com/devansh-365/freellm&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>typescript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>react native animation</title>
      <dc:creator>Devansh</dc:creator>
      <pubDate>Wed, 10 Sep 2025 16:11:18 +0000</pubDate>
      <link>https://dev.to/devansh365/react-native-animation-46f2</link>
      <guid>https://dev.to/devansh365/react-native-animation-46f2</guid>
      <description></description>
      <category>reactnative</category>
      <category>react</category>
      <category>animation</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
