The Problem Nobody Talks About
Every major AI provider now offers a free tier. Gemini, Groq, Mistral, Cerebras: they all give you a few million tokens a month and a few thousand requests a day.
On paper, that's generous. In practice, you end up juggling 14 different SDKs, 14 rate limits, and 14 places a request can silently fail.
FreeLLMAPI solves exactly that.
What It Does
It's a self-hosted proxy that aggregates the free tiers of 14 providers behind a single /v1/chat/completions endpoint, fully compatible with the OpenAI SDK.
Supported providers include:
| Provider | Notable Models |
|---|---|
| Google Gemini | 2.5 Pro / Flash |
| Groq | Llama 4, Qwen, Kimi |
| Cerebras | Llama 3.3, Qwen |
| SambaNova | Llama 3.3 70B |
| NVIDIA NIM | Full catalog |
| Mistral | La Plateforme |
| OpenRouter | Free-tier models |
| GitHub Models | GPT-4o, Llama, Phi |
| Hugging Face | Inference Providers |
| Cloudflare | Workers AI |
| Zhipu | GLM-4 series |
| Moonshot | Kimi |
| MiniMax | abab / hailuo |
Combined: roughly 800M tokens per month across all providers.
Zero Code Changes
Point your existing OpenAI SDK at localhost:3001/v1:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)

# with_raw_response keeps the HTTP response, so its headers are readable
raw = client.chat.completions.with_raw_response.create(
    model="auto",  # router picks the best available
    messages=[{"role": "user", "content": "Summarise the fall of Rome in one sentence."}],
)
resp = raw.parse()  # the usual ChatCompletion object
print(resp.choices[0].message.content)
print("Routed via:", raw.headers.get("x-routed-via"))
```
That's it. Every response includes an X-Routed-Via header so you know which provider actually served the request.
Technical Highlights
Automatic failover: on a 429, timeout, or 5xx, the router puts the key on cooldown and retries the next provider in your chain, up to 20 attempts.
Sticky sessions: multi-turn conversations stay on the same model for 30 minutes. This matters more than it sounds, because switching models mid-conversation can cause subtle spikes in hallucination.
Per-key rate tracking: RPM, RPD, TPM, and TPD counters per (platform, model, key). The router always picks a key that is still under its caps.
Encrypted key storage: keys are encrypted with AES-256-GCM before they are written to SQLite, and upstream provider keys are stored only on your machine.
Admin dashboard: a React + Vite UI to manage keys, reorder the fallback chain, inspect analytics, and test prompts in a playground.
Lightweight: runs on a Raspberry Pi 4 at about 40 MB of RAM when idle.
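The three routing behaviours above (per-key caps, cooldown-and-retry failover, and 30-minute sticky sessions) fit together in a small loop. The sketch below is not FreeLLMAPI's code; it is a minimal Python illustration under stated assumptions: fixed one-minute windows for RPM/TPM, daily counters keyed to the UTC date, a 60-second cooldown, a stickiness window that slides on every successful request, and a hypothetical `UpstreamError` standing in for a 429, timeout, or 5xx. `Caps`, `KeyState`, `Router`, `fake_call`, and the model names are all invented for illustration.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Caps:
    rpm: int  # requests per minute
    rpd: int  # requests per day
    tpm: int  # tokens per minute
    tpd: int  # tokens per day


@dataclass
class KeyState:
    """Usage counters for one (platform, model, key) triple (hypothetical)."""
    platform: str
    model: str
    key: str
    caps: Caps
    minute: int = -1
    day: str = ""
    req_min: int = 0
    req_day: int = 0
    tok_min: int = 0
    tok_day: int = 0
    cooldown_until: float = 0.0

    def _roll(self, now: float) -> None:
        # Assumption: per-minute counters reset each minute, daily ones at UTC midnight.
        minute = int(now // 60)
        if minute != self.minute:
            self.minute, self.req_min, self.tok_min = minute, 0, 0
        day = datetime.fromtimestamp(now, tz=timezone.utc).date().isoformat()
        if day != self.day:
            self.day, self.req_day, self.tok_day = day, 0, 0

    def available(self, now: float, tokens: int) -> bool:
        self._roll(now)
        c = self.caps
        return (now >= self.cooldown_until
                and self.req_min < c.rpm and self.req_day < c.rpd
                and self.tok_min + tokens <= c.tpm
                and self.tok_day + tokens <= c.tpd)

    def record(self, now: float, tokens: int) -> None:
        self._roll(now)
        self.req_min += 1
        self.req_day += 1
        self.tok_min += tokens
        self.tok_day += tokens


class UpstreamError(Exception):
    """Hypothetical stand-in for a 429, timeout, or 5xx from a provider."""


class Router:
    def __init__(self, chain, cooldown_s=60.0, max_attempts=20, sticky_s=30 * 60):
        self.chain = chain          # list[KeyState], highest priority first
        self.cooldown_s = cooldown_s
        self.max_attempts = max_attempts
        self.sticky_s = sticky_s
        self.sessions = {}          # session id -> (model, expiry timestamp)

    def route(self, session, tokens, call, now=None):
        now = time.time() if now is None else now
        pinned = self.sessions.get(session)
        preferred = pinned[0] if pinned and pinned[1] > now else None
        # Stable sort: keys serving the pinned model first, then chain order.
        order = sorted(self.chain, key=lambda k: k.model != preferred)
        attempts = 0
        for ks in order:
            if attempts >= self.max_attempts:
                break
            if not ks.available(now, tokens):
                continue  # over a cap or cooling down: skip, no attempt used
            attempts += 1
            try:
                result = call(ks)
            except UpstreamError:
                ks.cooldown_until = now + self.cooldown_s  # cool the key down
                continue
            ks.record(now, tokens)
            self.sessions[session] = (ks.model, now + self.sticky_s)  # (re)pin
            return ks, result
        raise RuntimeError("no key under its caps answered successfully")


caps = Caps(rpm=2, rpd=100, tpm=10_000, tpd=100_000)
router = Router([KeyState("gemini", "model-a", "key-1", caps),
                 KeyState("groq", "model-b", "key-2", caps)])

def fake_call(ks):
    if ks.platform == "gemini":
        raise UpstreamError("429")  # simulate a rate-limited upstream
    return "served by " + ks.platform

ks, out = router.route("chat-1", 500, fake_call, now=1_000_000.0)
print(out)  # prints "served by groq"
```

Skipping keys that are already over a cap without spending an attempt is a design choice in this sketch: it reserves the 20-attempt budget for real upstream calls. Once `chat-1` is pinned to `model-b`, later requests in that session try `model-b` first even after `model-a` comes off cooldown.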
Setup in 3 Lines
```bash
git clone https://github.com/tashfeenahmed/freellmapi
cd freellmapi && npm install
cp .env.example .env && npm run dev
```
Open localhost:5173, add your provider API keys, grab your unified key → done.
The Honest Part
The README is upfront about a few limitations you should know before you start:
Intelligence degrades throughout the day. Gemini 2.5 Pro and GPT-4o (via GitHub Models) have the lowest daily caps. Once those caps are exhausted, the router falls back to smaller models, so expect effective quality to drop later in the day and recover when the caps reset at UTC midnight.
Tool calling and vision are not yet supported. Text-only for now. PRs are welcome.
Latency is unpredictable. Cerebras and Groq are extremely fast. Others are not. You get whichever one is available.
Personal use only. No multi-tenant auth. Don't expose this to the internet.
Free tiers change without notice. When a provider tightens its limits, you'll see 429s until the project's model catalog is updated.
Who This Is For
✅ Developers building AI agents or coding assistants who want to prototype without spending money upfront
✅ Researchers and students who hit rate limits on one provider and want seamless fallback
✅ Anyone tired of maintaining multiple SDK integrations
❌ Production workloads: use a paid API with an SLA
Quick ToS Note
The project includes a detailed review of each provider's terms. Most are fine for single-user personal use. Notable exceptions: Cohere's trial ToS explicitly forbids personal/household use, and NVIDIA NIM's free tier is scoped to evaluation only.
Read the full table in the README before adding keys.
FreeLLMAPI is MIT licensed and actively welcoming contributors — especially for adding embeddings, tool calling, and new providers.