These LLM APIs offer permanently free tiers for text inference: no trials or one-time signup credits, only tiers that stay free. Rate limits below are given as RPM (requests per minute), RPD (requests per day), and TPM (tokens per minute).
Provider APIs
APIs run by the companies that train or fine-tune the models themselves.
- Cohere 🇺🇸 - Command A, Command R+, Aya Expanse 32B +9 more. 20 RPM, 1K requests/mo.
- Google Gemini 🇺🇸 - Gemini 2.5 Pro, Flash, Flash-Lite +4 more. 5-15 RPM, 100-1K RPD.
- Mistral AI 🇪🇺 - Mistral Large 3, Small 3.1, Ministral 8B +3 more. 1 req/s, 1B tok/mo.
- Zhipu AI 🇨🇳 - GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash. Limits undocumented.
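Most of the providers above and below expose OpenAI-compatible `/chat/completions` endpoints, so one small helper covers many of them. A minimal sketch using only the standard library, assuming Groq's base URL and model ID as the example (check each provider's docs for the real values; the `GROQ_API_KEY` variable name is an assumption):

```python
import json
import os
import urllib.request

def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example endpoint/model (assumed; verify against the provider's docs).
req = build_chat_request(
    "https://api.groq.com/openai/v1",
    os.environ.get("GROQ_API_KEY", "demo-key"),
    "llama-3.3-70b-versatile",
    "Say hello in one word.",
)

# Only actually send the request when a real key is configured.
if os.environ.get("GROQ_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping providers is usually just a different `base_url`, key, and model ID, which makes it easy to fall back from one free tier to another when you hit a daily cap.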
Inference providers
Third-party platforms that host open-weight models from various sources.
- Cerebras 🇺🇸 - Llama 3.3 70B, Qwen3 235B, GPT-OSS-120B +3 more. 30 RPM, 14,400 RPD.
- Cloudflare Workers AI 🇺🇸 - Llama 3.3 70B, Qwen QwQ 32B +47 more. 10K neurons/day.
- GitHub Models 🇺🇸 - GPT-4o, Llama 3.3 70B, DeepSeek-R1 +more. 10-15 RPM, 50-150 RPD.
- Groq 🇺🇸 - Llama 3.3 70B, Llama 4 Scout, Kimi K2 +17 more. 30 RPM, 1K RPD (14,400 for Llama 3.1 8B).
- Hugging Face 🇺🇸 - Llama 3.3 70B, Qwen2.5 72B, Mistral 7B +many more. $0.10/mo in free credits.
- Kluster AI 🇺🇸 - DeepSeek-R1, Llama 4 Maverick, Qwen3-235B +2 more. Limits undocumented.
- LLM7.io 🇬🇧 - DeepSeek R1, Flash-Lite, Qwen2.5 Coder +27 more. 30 RPM (120 with token).
- NVIDIA NIM 🇺🇸 - Llama 3.3 70B, Mistral Large, Qwen3 235B +more. 40 RPM.
- Ollama Cloud 🇺🇸 - DeepSeek-V3.2, Qwen3.5, Kimi-K2.5 +17 more. 1 concurrent model, light usage.
- OpenRouter 🇺🇸 - DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B +29 more. 20 RPM, 50 RPD (1K with $10+ in purchased credits).
- SiliconFlow 🇨🇳 - Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, GLM-4.1V-9B-Thinking +10 more. 1K RPM, 50K TPM.
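Since every free tier above is defined by an RPM or RPD cap, a client-side throttle keeps you from burning through a daily quota or getting 429 responses. A minimal sliding-window sketch (providers enforce their limits server-side; this only helps you stay under them):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: allow at most max_requests per window."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self.timestamps = deque()  # send times of recent requests

    def acquire(self):
        """Block until a request slot is free, then claim it."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            # Sleep until the oldest request leaves the window.
            time.sleep(self.window - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

# e.g. a 30 RPM cap like Groq's or Cerebras's:
limiter = RateLimiter(max_requests=30)
# call limiter.acquire() before each API request
```

For RPD caps the same class works with `window_seconds=86400`, though for long windows you would want to persist the timestamps to disk between runs.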
Contributing
This list changes fast. Star the GitHub repo to get notified when we add providers, and open a PR if you spot one we missed.
Cheers!