DEV Community

Alex Spinov

Originally published at blog.spinov.online

9 Free LLM APIs in 2026 You Can Use Without a Credit Card

A year ago I rounded up 11 free AI APIs you could use without paying OpenAI. It became the most-read thing I've published. So this is the 2026 re-run, with a stricter filter and a different question.

Not "is there a free tier?" Everyone has a free tier now. The real question, the one that decides whether you can actually prototype on a Saturday without reaching for a card: does it sign you up with zero payment details, and does it survive a real job?

I checked nine of them by reading the provider docs line by line on 2026-05-30 and pinging every endpoint. Two things shifted since last year. Mistral still doesn't ask for a card, but it now asks for a phone number. And Google quietly tightened the Gemini free tier: the old 1,500-requests-a-day Flash number people still quote is not what you get on 2.5 Flash today. Details below, with the receipts.

A word on where I'm coming from, because it matters for one of the columns. I run data-extraction pipelines for a living: 32 published scrapers, 2,190 production runs lifetime across them, the busiest one (a Trustpilot review scraper) sitting at 962 runs (apify.com/knotless_cadence, raw lifetime counter as of May 2026). That's not an LLM benchmark. It's HTTP scraping. But it taught me one thing that every "best free LLM API" listicle skips: a published rate limit and your real throughput are different animals. More on that in the limitations section, honestly labeled.

TL;DR

  • Want speed? Groq and Cerebras. Both OpenAI-compatible, both card-free.
  • Want a big context window for stuffing a whole HTML page in? Gemini Flash — 1M tokens, still the only free one at that size.
  • Want many models behind one key? OpenRouter. Its :free suffix gives you a pile of models, but only 50 requests/day until you add credits.
  • Worried about your data? Read the "What I won't pretend" section below. Most free tiers may train on your prompts. Gemini's free tier says so out loud.
  • The full comparison table is below. Bookmark that, skip the prose if you're in a hurry.

How I checked this

No vibes. For each provider I did three things on 2026-05-30:

  1. Opened the official rate-limits / pricing page and copied the exact number — not a number from someone's blog.
  2. Confirmed the OpenAI-compatible base URL is reachable (every one returned a live HTTP status — 401/403 without a key counts as alive).
  3. Checked the signup flow's stated requirement: card, phone, or nothing.
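
Step 2 can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the exact script used for the article: the rule that "alive" means a 2xx or a 401/403 is an assumption read off the description above, and both function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional

# Assumption: "alive" means the server answered with a 2xx, or with 401/403
# because no API key was sent. Anything else (404, 5xx, no answer) is not.
AUTH_REJECTED = {401, 403}


def is_alive(status: Optional[int]) -> bool:
    """Classify an HTTP status code; None means no response at all."""
    if status is None:
        return False
    return 200 <= status < 300 or status in AUTH_REJECTED


def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status of a keyless GET, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # a 4xx/5xx still proves a server answered
    except OSError:
        return None  # DNS failure, refused connection, timeout
```

A call would look like `is_alive(probe(base_url + "/models"))`. The `/models` path is an assumption here; the article only names the base URLs.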

Every number in the table links to the doc page it came from. Any provider that had moved to requiring a card would have been cut and replaced. None of the nine did, though two now want a phone number or a GitHub account instead, which I've flagged.

One caveat I can't wave away: free tiers change. This is a snapshot dated 2026-05-30, not a contract. Check the provider's own page before you build anything on top of it.

The table

| Provider | Free limit | Credit card? | OpenAI-compatible? | Best for |
|---|---|---|---|---|
| Google Gemini (AI Studio) | ~10 RPM, ~250 RPD on 2.5 Flash; 250K tokens/min; 1M-token context | No | Partial (/v1beta/openai/) | Long inputs: whole pages, whole docs |
| Groq | 30 RPM, 14,400 RPD, 500K tokens/day (llama-3.1-8b-instant) | No | Yes | Low latency, instant signup |
| Cerebras | 5 RPM, 30K tokens/min, 1M tokens/day (gpt-oss-120b) | No | Yes | Raw speed on short prompts |
| OpenRouter | 20 RPM; 50 RPD with no credits (1,000 if you add $10) | No | Yes | Many models, one key |
| Mistral La Plateforme | ~1B tokens/month (Experiment tier), rate-limited | No (phone required) | Yes | EU hosting, codegen (Codestral) |
| GitHub Models | Preview; ~50 RPD on high-tier models, more on low | No (GitHub account) | Yes | If you already live in GitHub |
| Cohere | 20 req/min chat, 1,000 calls/month; rerank 10/min; embed 2,000 inputs/min | No (trial key) | Partial (/v2/) | Rerank + embeddings in a pipeline |
| Hugging Face Inference | Monthly credit allowance on free accounts | No (HF token) | Yes (router.huggingface.co/v1) | One router across many providers |
| Together AI | Trial credit at signup (finite) | No | Yes | Trying 200+ open models before committing |

Sources, one per row, in the provider sections below.

The nine, with the catch on each

1. Google Gemini (AI Studio) — the big-context one

Gemini's free tier is still the only card-free option giving you a 1M-token context window, which is the whole reason I keep it at the top for extraction work. You can drop an entire messy HTML page into the prompt and not think about chunking.

The catch, and this is the biggest change since last year, is that the limits got tighter. The "1,500 requests/day" figure people copy-paste was for older Flash. On Gemini 2.5 Flash today you're looking at roughly 10 RPM and about 250 requests per day, with a 250K tokens-per-minute ceiling shared across models. The numbers vary by exact model and Google adjusts them, so check your project's quota in AI Studio. No card. (rate limits doc)

OpenAI-compatible? Partially — via generativelanguage.googleapis.com/v1beta/openai/. Your OpenAI client works with a base-URL swap, but a few features don't map cleanly.

2. Groq — the one I'd reach for first

Groq is the easiest yes on this list. Signup takes a minute, no card, and the API is a true drop-in for the OpenAI SDK. For llama-3.1-8b-instant the free tier is 30 RPM, 14,400 requests/day, 500,000 tokens/day — copied straight from their docs on 2026-05-30. (rate limits doc)
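
To make those two request caps concrete, here is a small arithmetic sketch. It uses only the 30 RPM and 14,400 RPD figures quoted above; the function names are hypothetical, and real limits are enforced server-side in ways this does not model.

```python
# Free-tier request limits for llama-3.1-8b-instant, as quoted above.
GROQ_RPM = 30
GROQ_RPD = 14_400


def min_interval_seconds(rpm: int) -> float:
    """Smallest gap between requests that stays inside a per-minute cap."""
    return 60.0 / rpm


def hours_until_daily_cap(rpd: int, rpm: int) -> float:
    """How long a client running flat out at `rpm` takes to use up `rpd`."""
    return rpd / rpm / 60.0


print(min_interval_seconds(GROQ_RPM))             # 2.0
print(hours_until_daily_cap(GROQ_RPD, GROQ_RPM))  # 8.0
```

In other words, a client paced at one call every two seconds would reach the daily request cap after eight hours, before the token budget is even considered.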

Groq's whole pitch is speed. Their custom inference hardware is what gets quoted at hundreds of tokens per second. I'm not going to print a tok/s number here as if I'd benchmarked it under my own load — I haven't, and I'll explain why that distinction matters below. But for a free, card-free endpoint that returns fast on short prompts, this is the one I demo with. Which is exactly what I do at the end of this post.

3. Cerebras — fast, but watch the context cap

Cerebras gives you 1 million tokens per day for free, no card, on models like gpt-oss-120b. The per-minute side is tighter than it looks: their docs list the free trial at 5 RPM, 30K tokens/min, 1M tokens/day. (rate limits doc)

Two honest notes. The "1M tokens/day" sounds huge until you remember a single long HTML page can be tens of thousands of tokens — token-based budgets vanish faster than request-based ones when your inputs are big. And free-tier context length has been capped well below the headline on some models while they expand infrastructure. OpenAI-compatible at api.cerebras.ai/v1. Great for short, fast calls; less ideal for stuffing whole documents.
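
A rough way to see how fast a token budget drains, as a sketch under stated assumptions: the 4-characters-per-token ratio is a crude rule of thumb rather than any provider's tokenizer, and the 100,000-character page is hypothetical. Only the 1M/day and 30K/min figures come from the docs quoted above.

```python
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer
CEREBRAS_TOKENS_PER_DAY = 1_000_000
CEREBRAS_TOKENS_PER_MIN = 30_000


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def pages_per_day(tokens_per_page: int, daily_budget: int) -> int:
    """Whole pages that fit in a daily token budget (input tokens only)."""
    return daily_budget // tokens_per_page


page = "x" * 100_000  # hypothetical 100,000-character HTML page
tokens = estimate_tokens(page)
print(tokens)                                          # 25000
print(pages_per_day(tokens, CEREBRAS_TOKENS_PER_DAY))  # 40
print(tokens <= CEREBRAS_TOKENS_PER_MIN)               # True
```

Under those assumptions the daily budget covers about 40 pages, and the 30K tokens/min cap lets through only one such page per minute, since two would need 50K.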

4. OpenRouter — many models, one key, one footnote

OpenRouter is the convenience play: one OpenAI-compatible key, and any model with a :free suffix costs nothing. On 2026-05-30 there were over 350 models in the catalog with a couple dozen carrying that :free tag — DeepSeek, Gemma, Qwen variants, and more — so it's a cheap way to A/B models without juggling keys.

The footnote that trips people up: free models are 20 RPM but only 50 requests per day if your account has under $10 in credits. Add $10 once and that jumps to 1,000/day. No card to start — the cap is the price of staying free. (docs) (free models list)
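
The footnote boils down to a threshold, which can be sketched as follows. The function names and the placeholder model id are hypothetical; the 50, 1,000 and $10 figures are the ones quoted above, and how OpenRouter actually counts credits is not modelled here.

```python
FREE_SUFFIX = ":free"
CREDIT_THRESHOLD_USD = 10


def is_free_model(model_id: str) -> bool:
    """True if the model id carries the :free suffix."""
    return model_id.endswith(FREE_SUFFIX)


def daily_request_cap(credits_usd: float) -> int:
    """Daily cap on :free models, given the credits on the account."""
    return 1_000 if credits_usd >= CREDIT_THRESHOLD_USD else 50


def prompts_per_model(credits_usd: float, n_models: int) -> int:
    """Requests each model gets per day when A/B testing n_models evenly."""
    return daily_request_cap(credits_usd) // n_models


# "example/some-model:free" is a placeholder, not a real catalog entry.
print(is_free_model("example/some-model:free"))  # True
print(prompts_per_model(0, 10))                  # 5
print(prompts_per_model(10, 10))                 # 100
```

So comparing ten models on a zero-credit account leaves five prompts per model per day, which is enough for a smoke test and not much more.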

5. Mistral La Plateforme — now wants your phone, not your card

Last year Mistral was a clean "no card." It still is. But the Experiment tier now requires a verified phone number to activate. That's the trade. In exchange you get a genuinely generous ~1 billion tokens/month, rate-limited, across all their models including Codestral for code. OpenAI-compatible at api.mistral.ai/v1. French hosting, which matters if EU data residency is on your list. (tier docs)

6. GitHub Models — free if you already have a GitHub account

If you write code you already have the credential. GitHub Models is in public preview, free, no card, OpenAI-compatible at models.github.ai/inference, and carries a real spread — GPT, Llama, Phi, DeepSeek, Mistral, Cohere all behind one token. Rate limits are tiered: roughly 50 requests/day on the heavier models, more on the lighter ones, and they're explicit that preview limits can change. (docs) Good for wiring AI into a repo or Action without leaving the ecosystem.

7. Cohere — the rerank-and-embed specialist

Cohere isn't where you'd run a chatbot, but it's the one I'd keep around for a pipeline. Its trial key is card-free and the rerank and embedding endpoints are genuinely strong for the retrieval half of an extraction system. Limits on the trial: 20 chat req/min, 1,000 API calls/month, rerank at 10/min, embed at 2,000 inputs/min. (rate limits doc) Partial OpenAI compatibility via their /v2/ API.

8. Hugging Face Inference — one router, many backends

Hugging Face's Inference Providers put a single OpenAI-compatible router in front of a dozen backends (Groq, Cerebras, Together, SambaNova, and more) at router.huggingface.co/v1. Free accounts get a monthly credit allowance; you authenticate with an HF token, no card. (docs) The neat part is automatic provider selection: append :fastest or :cheapest to a model id and it routes for you. The free credit is modest, so treat it as a sampler, not a fountain.
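
The suffix trick is plain string handling on the model id. A minimal sketch, assuming the two suffixes named above are the only ones you want to allow; `with_policy` is a hypothetical helper and the model id is a placeholder.

```python
ROUTING_POLICIES = ("fastest", "cheapest")  # suffixes named in the text above


def with_policy(model_id: str, policy: str) -> str:
    """Return `model_id` with a routing suffix, replacing any existing one."""
    if policy not in ROUTING_POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    base = model_id.split(":", 1)[0]  # drop whatever suffix was there
    return f"{base}:{policy}"


# "example-org/example-model" is a placeholder, not a real model id.
print(with_policy("example-org/example-model", "fastest"))
```

The returned string is what would go in the `model` field of a chat completion request sent to the router.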

9. Together AI — the trial-credit honest mention

I'm including Together AI with an asterisk, because it's a different kind of free. The first eight give you a recurring free tier. Together gives you a finite trial credit at signup (the amount has bounced around between promotions) with no card, OpenAI-compatible, and access to 200+ open-weight models — Llama 4, DeepSeek, Qwen, Mixtral. (rate limits doc) Once the credit's gone, it's gone until you pay. So: brilliant for a weekend of trying models, not a standing free tap. Calling that out because pretending otherwise is how these lists lose your trust.

What to pick, by what you're actually doing

Skip the "it depends." Here's the decision I'd make:

  • I need it fast and I need it now → Groq. Instant key, drop-in SDK.
  • I need to feed in a whole page or document → Gemini Flash. The 1M context is the only free one that fits.
  • I want to compare ten models cheaply → OpenRouter :free, one key. Just know it's 50/day until you drop $10.
  • My data has to stay in the EU → Mistral (FR hosting). Phone number is the cost of entry.
  • I need rerank or embeddings inside a retrieval pipeline → Cohere trial, or Hugging Face's router.
  • I live in GitHub already → GitHub Models. The credential's in your pocket.
  • I just want to taste 200 open models before paying → Together AI's trial credit.

What I won't pretend (the limitations)

This is the section the generic listicles skip, so it's the one I'd read.

"No card today" is not "no card forever." Free tiers move. Mistral added a phone gate. Google tightened Gemini's request caps. Both happened in the last twelve months. Everything above is true as of 2026-05-30 and I'd re-check the provider's own page before I built a product on it.

Most free tiers may train on your prompts. Gemini's free tier states plainly that free-tier content can be used to improve Google's products — the paid tier turns that off. Others have similar clauses buried in the terms. If you're extracting client data or anything sensitive, a free tier is the wrong tool. Read each provider's data policy, not just its rate limit.

A request limit is a ceiling, not a throughput. This is where my scraping years actually inform the LLM question. When you run extraction across hundreds of pages, the published "14,400 requests/day" and what you can sustain on a long-input job are different numbers: the daily cap tends to bind exactly when the task becomes useful. Token-based budgets like Cerebras's 1M/day disappear even faster, because one long HTML page is tens of thousands of tokens, not one "request." I'm being careful here on purpose: I have not run a controlled tok/s benchmark of these nine providers, and I won't quote one as if I had. The speed numbers vendors advertise are measured on short outputs; a long-context extraction job behaves differently. Treat free tiers as a prototype and burst lever, not a production foundation.
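
The gap between a published cap and sustainable throughput can be worked out on paper. This sketch uses Groq's quoted 14,400 requests/day and 500K tokens/day; the 25,000-token page and the 20-token prompt are hypothetical sizes, and output tokens are ignored, so treat the result as an upper bound.

```python
from typing import Tuple


def sustainable_calls_per_day(
    rpd: int, tokens_per_day: int, tokens_per_call: int
) -> Tuple[int, str]:
    """Return (calls per day, name of the cap that binds first)."""
    by_tokens = tokens_per_day // tokens_per_call
    if by_tokens < rpd:
        return by_tokens, "tokens"
    return rpd, "requests"


GROQ_RPD = 14_400
GROQ_TOKENS_PER_DAY = 500_000

# Hypothetical long-input job: one 25,000-token HTML page per call.
print(sustainable_calls_per_day(GROQ_RPD, GROQ_TOKENS_PER_DAY, 25_000))
# (20, 'tokens')

# Hypothetical short-prompt job: 20 tokens per call.
print(sustainable_calls_per_day(GROQ_RPD, GROQ_TOKENS_PER_DAY, 20))
# (14400, 'requests')
```

With long inputs the token budget binds at 20 pages a day, three orders of magnitude below the headline request cap.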

Free models rotate and get deprecated. OpenRouter's :free list and the preview providers (GitHub Models especially) change models without much notice. The list above is correct for the date on it, not for next quarter.

If you want zero cloud at all, that's a different post. I've covered Cloudflare Workers AI and running models locally with Ollama elsewhere. Edge and local are the bonus track, not part of this nine.

Try it in 60 seconds

Here's the smallest useful thing: ask a free, card-free model to turn a scrap of HTML into clean JSON, the exact job an extraction pipeline does all day. Groq, because signup is instant and the SDK swap is trivial.

import json
import os
from openai import OpenAI

# Same SDK as OpenAI — only base_url and api_key change.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],  # from console.groq.com/keys, no card
)

HTML = """
<div class="product">
  <h1>Aeropress Go Travel Coffee Press</h1>
  <span class="price">$39.95</span>
  <span class="stock">In stock</span>
  <div class="rating" data-score="4.7">4.7 out of 5 (2,317 reviews)</div>
</div>
"""

resp = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    response_format={"type": "json_object"},  # forces valid JSON out
    messages=[
        {"role": "system", "content": (
            "Extract product data from HTML. Return JSON with keys: "
            "name, price_usd (number), in_stock (bool), rating (number), "
            "review_count (int)."
        )},
        {"role": "user", "content": HTML},
    ],
    temperature=0,
)

print(json.dumps(json.loads(resp.choices[0].message.content), indent=2))

To run it, install the SDK with pip install openai, export your key as GROQ_API_KEY, and run the script with python. You get back structured JSON parsed straight from raw HTML: no regex, no scraping rules, on a tier that never asked for a card.

I'm deliberately not pasting a captured response here. Two reasons. The exact bytes depend on your key, and on whichever model the provider happens to be routing that day — free tiers swap weights without telling you. So a screenshot of my JSON would be theater. Run the snippet yourself and you'll get back a clean object with the same five keys — name, price_usd, in_stock, rating, review_count. That's the claim. Not "look at my output," but "this actually runs on a tier that never asked for a card."
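
If you want to check the five-key claim on your own run instead of taking it on trust, a small validator is enough. This is a sketch: the expected types are read off the system prompt in the snippet, `schema_problems` is a hypothetical helper, and no model output is reproduced here.

```python
from typing import List

# Expected type per key, taken from the system prompt in the snippet.
EXPECTED_TYPES = {
    "name": str,
    "price_usd": (int, float),
    "in_stock": bool,
    "rating": (int, float),
    "review_count": int,
}


def schema_problems(obj: dict) -> List[str]:
    """Return a list of problems; an empty list means the object looks right."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in obj:
            problems.append(f"missing key: {key}")
            continue
        value = obj[key]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_stray_bool = expected is not bool and isinstance(value, bool)
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems
```

Calling `schema_problems(json.loads(resp.choices[0].message.content))` after the snippet should return an empty list when the model followed the prompt. JSON mode, as far as I know, only promises syntactically valid JSON, so the keys and types are worth checking separately.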

That response_format={"type": "json_object"} line is the one that earns its place: it forces the model to return parseable JSON, so you skip the brittle cleanup step. It's an OpenAI-API feature, and it works here precisely because Groq is OpenAI-compatible — which is the whole reason these nine are interchangeable in the first place.
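
The interchangeability is easiest to see as a lookup table. A minimal sketch, assuming https:// in front of the hosts named in the provider sections; `client_kwargs` and `chat_completions_url` are hypothetical helpers, and the table covers only the providers whose base URL appears in this post.

```python
# Base URLs as named in the provider sections; the https:// scheme is assumed.
BASE_URLS = {
    "groq": "https://api.groq.com/openai/v1",
    "cerebras": "https://api.cerebras.ai/v1",
    "mistral": "https://api.mistral.ai/v1",
    "github": "https://models.github.ai/inference",
    "huggingface": "https://router.huggingface.co/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai/",
}


def client_kwargs(provider: str, api_key: str) -> dict:
    """Keyword arguments for an OpenAI-style client pointed at `provider`."""
    return {"base_url": BASE_URLS[provider], "api_key": api_key}


def chat_completions_url(provider: str) -> str:
    """Full URL the client would POST chat requests to."""
    return BASE_URLS[provider].rstrip("/") + "/chat/completions"


print(chat_completions_url("gemini"))
# https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
```

In the snippet above, swapping providers would then come down to `OpenAI(**client_kwargs("cerebras", key))` plus a model name that provider actually serves. Features such as `response_format` may not map cleanly on the partially compatible endpoints.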

The one I'm still unsure about

Here's my open question, and I'd genuinely like other people's data on it: across these free tiers, which one actually holds up under a sustained extraction batch — long inputs, hundreds of pages — before the daily cap or a quiet 429 wall stops you? The vendor tok/s charts don't answer that, and I haven't run a clean head-to-head yet. If you have, I want to hear the numbers.

Everything here was verified on 2026-05-30. Free tiers drift, so the snapshot has a date stamp for a reason.


Follow for the next batch. I'm planning an actual head-to-head of these free tiers under a real extraction load. And tell me in the comments: which free LLM tier have you hit a wall on, and at what point did it break?


Written by Alexey Spinov. I run production data-extraction pipelines — 32 published scrapers, 2,190 lifetime runs, 962 on one Trustpilot source (apify.com/knotless_cadence). Every free-tier number here was read off the provider's own docs on 2026-05-30, not someone else's blog. Drafted with AI assistance and edited/verified by me — I don't publish numbers or output I haven't checked myself, which is exactly why there's no fabricated API response in the demo above.
