Free Whisper API: Groq, Deepgram, AssemblyAI Compared
OpenAI’s Whisper changed speech-to-text the same way Llama changed open chat models: a frontier-grade ASR model the entire industry could host, fine-tune, and run on commodity hardware. Two years later, the question for most developers is no longer which model to use — it is which hosted API gives me Whisper-quality transcription without a bill.
Three providers dominate the answer in 2026: Groq, Deepgram, and AssemblyAI. All three give you Whisper (or a Whisper-class model) behind a hosted API with a free path to first transcription. None of them require you to spin up a GPU instance, manage CUDA drivers, or fight a Python audio dependency tree. But the meaning of “free” varies wildly between them, and the right pick depends entirely on what you are building.
This guide compares the three on the metrics that actually matter — real free-tier ceilings, per-hour cost once you pass them, supported languages, latency, file-size limits, and the engineering trade-offs you will hit when traffic grows. Every number cited links back to the provider’s own pricing or docs page; nothing here is fabricated benchmark theatre.
The 30-Second Answer
| Provider | Free path | Whisper model | Paid rate (cheapest) | Best for |
|---|---|---|---|---|
| Groq | True free tier, no card | whisper-large-v3 + turbo | $0.04/hr (turbo) | Fast batch transcription, hackathons, side projects |
| Deepgram | $200 signup credit | Whisper Cloud (whisper-large) | ~$0.48/hr Whisper · $0.258/hr Nova-3 | Production transcription with diarization and SLAs |
| AssemblyAI | $50 signup credit | Whisper-Streaming | $0.30/hr Whisper · $0.15/hr Universal | Production pipelines that need Whisper + summary/sentiment in one call |
If you want a no-strings, no-card free tier you can ship a real side project on, Groq is the only one that fits. If you want a high-quality production transcription stack with $200 of runway to evaluate it on, Deepgram wins. If you want Whisper plus a stack of additional NLP features (chapter detection, sentiment, entity extraction, summarization) in the same request, AssemblyAI is the cleanest single-API choice.
The rest of this article unpacks why.
Why “Free Whisper API” Is Worth Searching For
The official OpenAI Whisper API costs $0.006 per minute of audio, which works out to $0.36 per hour. That sounds cheap until you do the math on a real workload:
- A podcast transcription tool processing 1,000 hours/month = $360/month on OpenAI
- A meeting-bot SaaS averaging 50 hours/customer/month at 200 customers = $3,600/month
- A user-generated content platform with 10,000 hours of audio/month = $3,600/month
Self-hosting Whisper on your own GPU is cheaper at scale, but only if you actually have the GPU, the DevOps capacity to keep it running, and a workload large enough that the instance never sits idle. For the 90% of projects that don’t, the question becomes: which hosted API offers the cheapest entry path? That is exactly what the providers below compete on.
What “free” actually means in this market
There are two distinct shapes of “free Whisper API” on offer in 2026:
- Genuine free tier: A permanently free quota every account gets, refilled daily or monthly, no credit card required. Groq is the only major provider doing this for speech-to-text.
- Free credits at signup: A one-time wallet of credits ($50–$200) you spend down at paid rates. Once gone, you pay or stop. Deepgram and AssemblyAI use this model.
Both are useful — they just suit different stages of a project. A free-tier API is ideal for a personal tool, a demo, or a workload with predictable low volume. Free credits are better for prototypes that need higher concurrency or premium features (diarization, summarization) up front, with a clean ramp into paid usage when the product is real.
Groq Whisper API: The Only True Free Tier
Groq built its reputation around Language Processing Units (LPUs) that serve Llama and DeepSeek faster than any GPU cloud. In 2025 they extended that infrastructure to OpenAI’s Whisper models — and unlike every other Whisper host, they gave it a real, no-card free tier that anyone with an email address can use.
Models on offer
| Model ID | Paid price | Description |
|---|---|---|
whisper-large-v3 |
$0.111/hour | OpenAI’s flagship Whisper checkpoint, highest accuracy |
whisper-large-v3-turbo |
$0.04/hour | Distilled, ~8× faster, small accuracy drop on long audio |
Both models are multilingual (99+ languages for transcription), and both support a separate translation endpoint that returns English text from any source language. The minimum billed length is 10 seconds — even a 2-second clip charges as 10.
Free-tier ceiling (the real one)
Groq’s published rate limits for the free tier on either Whisper model are:
- 20 requests per minute
- 2,000 requests per day
- 7,200 audio seconds per hour (2 hours of audio every hour)
- 28,800 audio seconds per day (8 hours of audio every day)
- 25 MB max file size on free tier, 100 MB on the paid Dev tier
That ceiling is unusually generous for a “free” tier. Eight hours of transcribed audio per day, every day, with no card and no expiry, is enough to run a real podcast transcription side project or a daily meeting-notes tool for one person indefinitely. If you cross the 25 MB file limit, chunk the audio with ffmpeg before sending; Groq’s docs include a recommended chunking snippet.
Code: transcribe a file with Groq
curl https://api.groq.com/openai/v1/audio/transcriptions \
-H "Authorization: Bearer $GROQ_API_KEY" \
-F "file=@meeting.mp3" \
-F "model=whisper-large-v3-turbo" \
-F "response_format=verbose_json"
Python with the OpenAI SDK (Groq is OpenAI-compatible on this endpoint):
from openai import OpenAI
client = OpenAI(
base_url="https://api.groq.com/openai/v1",
api_key=os.environ["GROQ_API_KEY"],
)
with open("meeting.mp3", "rb") as audio:
result = client.audio.transcriptions.create(
model="whisper-large-v3-turbo",
file=audio,
response_format="verbose_json",
timestamp_granularities=["segment"],
)
print(result.text)
for seg in result.segments:
print(f"[{seg.start:.1f} – {seg.end:.1f}] {seg.text}")
The verbose_json response includes word- or segment-level timestamps you can use for captions, search indexing, or feeding into LLM summarization. If you only need the transcript string, response_format=text drops the JSON envelope.
Where Groq is a poor fit
- No built-in speaker diarization. Whisper itself doesn’t predict speaker turns; Deepgram and AssemblyAI run a separate diarization model alongside transcription. If you need “Speaker 1 / Speaker 2” output, plug pyannote.audio or a hosted diarizer in front of Groq, or pick a different provider.
- No long-running async jobs. Every request is synchronous. For files over ~60 minutes, chunk and merge yourself.
- No production SLA on the free tier. Limits change occasionally; production workloads should sit on the paid Dev tier.
Deepgram Whisper Cloud: The $200 Production Path
Deepgram has been one of the dominant production speech-to-text vendors since well before Whisper existed. They run their own ASR model family (Nova-3, the current flagship; Nova-2; and the real-time Flux model) and also host Whisper as a managed product called Whisper Cloud. Whisper Cloud sits alongside their proprietary models behind one API key, so you can A/B both on the same audio and pick whichever wins for your data.
The free path: $200 of credit
Deepgram gives every new account $200 of API credit at signup, no card required. Their pricing page describes it as “free $200 credit, then pay as you go.” There is no fixed expiry on the credit, which is unusual — most competitors expire credits at 30–90 days.
At Whisper Cloud’s published rate (~$0.0048/minute, or roughly $0.288/hour at the time of writing, with concurrency capped at 5 streams on the free tier), $200 of credit gives you something like ~700 hours of Whisper transcription to evaluate the product before you commit. If you decide Deepgram’s own Nova-3 model is good enough — and for English audio it usually is — $200 stretches further because Nova-3 is cheaper per minute and faster.
Whisper Cloud vs Nova-3: the trade-off Deepgram wants you to make
Whisper Cloud is positioned as a compatibility option for teams who already pipe through Whisper and want a hosted replacement for self-hosted inference. Deepgram’s real recommendation for new builds is Nova-3, because:
- Nova-3 is cheaper per minute
- Nova-3 has built-in speaker diarization, smart formatting, language detection, and profanity filtering in the same request
- Nova-3 supports real-time streaming as a first-class feature; Whisper is fundamentally batch
For most production English transcription pipelines in 2026, Nova-3 is the better answer — and if you arrived here searching “free Whisper API,” it’s worth pricing both before you commit. Whisper Cloud remains the right pick if you specifically need Whisper’s multilingual behavior or you’re benchmarking a model swap.
Code: transcribe with Deepgram (Whisper or Nova)
curl -X POST \
-H "Authorization: Token $DEEPGRAM_API_KEY" \
-H "Content-Type: audio/wav" \
--data-binary @meeting.wav \
"https://api.deepgram.com/v1/listen?model=whisper-large&punctuate=true"
Swap the model to Nova-3 by changing model=whisper-large to model=nova-3. The Python SDK is a thin wrapper:
from deepgram import DeepgramClient, PrerecordedOptions
dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
with open("meeting.wav", "rb") as f:
payload = {"buffer": f.read()}
options = PrerecordedOptions(
model="whisper-large", # or "nova-3"
punctuate=True,
diarize=True, # Nova-3 only; ignored on whisper-large
smart_format=True,
)
response = dg.listen.rest.v("1").transcribe_file(payload, options)
print(response.results.channels[0].alternatives[0].transcript)
Where Deepgram is a poor fit
- Once the $200 runs out, you’re paying. No free tier waits behind it. Budget the runway accordingly.
- Higher concurrency requires paid plans. The five-stream cap on the trial is enough to evaluate, not to ship a real concurrent batch pipeline.
- Whisper Cloud is not Deepgram’s strategic priority. Expect Nova to get the new features first; Whisper Cloud is a compatibility-and-evaluation product.
AssemblyAI: Whisper Plus the Full NLP Stack
AssemblyAI takes a different approach. Instead of competing on “we host Whisper cheaply,” they sell a layered speech intelligence platform where transcription is the foundation and the value is everything stacked on top — chapter detection, sentiment analysis, named-entity extraction, content moderation, summarization, topic classification. All available in the same request that produces the transcript.
The free path: $50 of credit
AssemblyAI gives new accounts $50 of credit on signup, no credit card required. The two relevant models:
- Universal-3 Pro (Async) — their current flagship pre-recorded model, $0.15/hr at the time of writing. Recommended for new builds.
- Whisper-Streaming — the open-source Whisper model hosted on AssemblyAI’s infrastructure, $0.30/hr, supports 99+ languages.
$50 of credit covers roughly 166 hours of Whisper-Streaming or 333 hours of Universal-3 Pro — plenty to prototype, demo, or transcribe a backlog of meeting recordings before you have to pay.
Why pick AssemblyAI’s Whisper over Groq’s
The answer is almost always: because you also want the layered features. If you only need transcript text, Groq’s free tier is strictly better — same model family, no card, no credit clock. The reason to buy AssemblyAI is that adding sentiment_analysis: true or auto_chapters: true to a single API call returns:
- Per-sentence sentiment (positive / negative / neutral with confidence)
- Auto-generated chapter boundaries with headlines for long-form audio
- Named entities (PERSON, ORG, LOCATION, etc.) with timestamps
- Topic categories from the IAB taxonomy
- PII redaction in the transcript
Reproducing that stack on top of Groq means a second LLM call, your own entity-extraction prompt, and your own chaptering logic. For one project that’s fine. For a SaaS product, the integration cost of doing it yourself rapidly exceeds the price difference per hour.
Code: transcribe with AssemblyAI
AssemblyAI’s API is two-step (upload + transcribe) rather than a single multipart POST:
import os, requests, time
API_KEY = os.environ["ASSEMBLYAI_API_KEY"]
headers = {"Authorization": API_KEY}
# 1. Upload audio
with open("meeting.mp3", "rb") as f:
upload = requests.post(
"https://api.assemblyai.com/v2/upload",
headers=headers,
data=f,
).json()
audio_url = upload["upload_url"]
# 2. Request transcription (with optional features)
job = requests.post(
"https://api.assemblyai.com/v2/transcript",
headers=headers,
json={
"audio_url": audio_url,
"speech_model": "universal", # or "whisper-streaming"
"speaker_labels": True,
"auto_chapters": True,
"sentiment_analysis": True,
},
).json()
# 3. Poll until done
while True:
status = requests.get(
f"https://api.assemblyai.com/v2/transcript/{job['id']}",
headers=headers,
).json()
if status["status"] in ("completed", "error"):
break
time.sleep(3)
print(status["text"])
for chapter in status.get("chapters", []):
print(f"[{chapter['start']/1000:.0f}s] {chapter['headline']}")
Where AssemblyAI is a poor fit
- Free credit runs out fast on heavy workloads. $50 is roughly a quarter of Deepgram’s $200.
- Two-step upload adds latency. Bigger files take longer to upload than to transcribe in some cases.
- Universal-3 Pro is not Whisper. If your codebase or contracts specifically mandate Whisper output, choose Whisper-Streaming explicitly, accept the higher per-hour rate, and don’t drift toward Universal “because it’s cheaper.”
Honorable Mentions: Other Ways to Get Free Whisper
The three above are the practical answers for hosted Whisper in 2026, but a few alternatives are worth knowing about.
Self-host with faster-whisper or whisper.cpp
If you already have a GPU box (or even a recent MacBook), faster-whisper (CTranslate2) and whisper.cpp deliver real-time-or-better transcription on hardware you already own. Truly free at the marginal level. The catch: you own the operational complexity (driver updates, OOM crashes, queueing). For a personal tool this is fine; for a SaaS, the time it costs you is rarely worth the API savings until volume passes ~500 hours/month.
Hugging Face Inference API
Hugging Face’s free Inference API can call OpenAI Whisper checkpoints, but rate limits are aggressive and request latency on the free tier is unpredictable. Useful for one-off testing in a notebook; not a production option.
Cloudflare Workers AI Whisper
Cloudflare Workers AI includes Whisper among its 47+ free models, billed in “neurons” rather than minutes. If you already run your stack on Cloudflare Workers, it integrates very cleanly and the free daily neuron quota is generous. Less compelling as a standalone choice if you’re not on Cloudflare.
The official OpenAI Whisper API
$0.006/minute, billed against your OpenAI usage. Not free, but worth listing as the reference price every other provider competes against. If you already have OpenAI usage running and don’t want a third API key in your codebase, it’s the path of least integration friction.
Side-by-Side Spec Sheet
| Feature | Groq | Deepgram | AssemblyAI |
|---|---|---|---|
| Free tier shape | Permanent free tier, no card | $200 signup credit | $50 signup credit |
| Whisper model | large-v3, large-v3-turbo | whisper-large (Whisper Cloud) | Whisper-Streaming |
| Native non-Whisper model | — | Nova-3, Nova-2, Flux | Universal-3 Pro, Universal-2 |
| Cheapest paid rate | $0.04/hr (turbo) | ~$0.258/hr (Nova-3) | $0.15/hr (Universal-2) |
| Speaker diarization | No | Yes (Nova-3) | Yes |
| Real-time streaming | No | Yes (Flux, Nova) | Yes |
| Summarization / chapters | No (DIY via LLM) | Limited | Yes (auto-chapters) |
| Sentiment / entities | No | Limited | Yes |
| Max file size (single request) | 25 MB free / 100 MB dev | 2 GB | 2.2 GB (URL) / 5 GB (upload) |
| API style | Synchronous, OpenAI-compatible | Synchronous + streaming | Async upload + poll |
| Languages | 99+ (Whisper) | 30+ (Nova) / 99+ (Whisper) | 99+ (Whisper) / 17+ (Universal) |
Decision Tree: Which One Should You Pick?
Run through this list top to bottom. The first row that matches your situation is your answer.
- I am building a side project / hackathon entry / personal tool. → Groq. No card, real free tier, fastest to first transcription.
- I need speaker diarization (who said what) in the output. → Deepgram Nova-3 if production-bound, AssemblyAI if you also need chapters/summary.
- I need Whisper specifically — same model my self-hosted setup uses now — as a hosted swap. → Deepgram Whisper Cloud, then evaluate Nova-3 as a downgrade test.
- I need transcript + sentiment + chapters + entities from one API call. → AssemblyAI. The integration cost saved is worth the higher per-hour rate.
- I need real-time streaming transcription for a voice agent. → Deepgram Flux/Nova or AssemblyAI Universal Streaming. Groq is batch-only.
- I have heavy multilingual audio (Spanish, Mandarin, Hindi, Arabic, etc.). → Groq whisper-large-v3 for cost, AssemblyAI Whisper-Streaming for accuracy + post-processing.
- I already run my backend on Cloudflare Workers. → Cloudflare Workers AI Whisper — integration savings beat per-hour savings here.
- I already have an OpenAI key wired in and don’t want a third vendor. → Official OpenAI Whisper API, $0.006/min. Don’t optimize what you don’t need to.
Combining Free Whisper with a Free LLM
The real productivity unlock isn’t transcription on its own — it’s transcription plus an LLM pass on the resulting text. A reasonable free stack for a side-project transcription tool in 2026 looks like:
-
Audio in: Groq
whisper-large-v3-turbo(free, fast). - LLM pass on the transcript: Groq Llama 3.3 70B, Cohere Command R+, or Together AI Llama 3.3 70B Free for summarization, action-item extraction, or speaker attribution via prompt.
- Embedding for search: Cohere Embed v3 or another free embedding tier.
Three free API keys, zero cards, end-to-end speech-to-search. The same architecture that costs $0.36/min on commercial offerings can run free as long as you stay within each provider’s daily ceiling.
FAQ
Is OpenAI’s Whisper actually free?
The Whisper model weights are MIT-licensed and free to self-host. The OpenAI Whisper API ($0.006/min) is not free — there is no free tier and you need a credit card on file. When people say “free Whisper API” they almost always mean a third-party host (Groq, Deepgram, AssemblyAI) that runs Whisper for you with a free path in.
Which Whisper API is the most accurate?
All three host the same underlying whisper-large-v3 checkpoint (or a distilled variant of it), so transcription accuracy on identical audio is comparable. Differences in real-world output come from preprocessing (audio normalization, VAD), post-processing (smart formatting, punctuation), and whether diarization is layered on top. Groq runs the cleanest “raw Whisper” output; Deepgram and AssemblyAI add post-processing that usually helps for English business audio.
Can I use these APIs for real-time transcription?
Whisper itself is a batch model — it ingests a complete audio file and returns a transcript. Groq is batch-only. Deepgram offers real-time streaming via Nova and the Flux model (not Whisper). AssemblyAI offers Universal Streaming and Whisper-Streaming for real-time use. For voice-agent latency budgets, Nova-3 and AssemblyAI Universal Streaming are the practical picks; Whisper itself is not ideal for sub-second response.
What’s the difference between whisper-large-v3 and whisper-large-v3-turbo?
Turbo is a distilled version of large-v3 — fewer decoder layers, ~8× faster, and substantially cheaper to serve. The accuracy gap on standard benchmarks is small (a few percent WER) and only meaningful on long, noisy, or accented audio. For most use cases turbo is the right default; reach for large-v3 only when you’ve benchmarked turbo on your data and found it lacking.
Can I use the free tier commercially?
Groq permits commercial use on the free tier within the published rate limits; their paid Dev tier exists to lift those limits and add SLA, not to gate commercial access. Deepgram and AssemblyAI credits are usable for any purpose — they’re paid usage you didn’t pay for yet. Always re-read each provider’s TOS before deploying commercially; it changes.
How do I handle audio files larger than 25 MB on Groq?
Chunk the audio before sending. The simplest reliable approach is ffmpeg -i input.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3 to split into 10-minute pieces, transcribe each, and concatenate the resulting text. Groq’s docs include a more aggressive recipe that downsamples to 16 kHz mono first, which both reduces file size and matches Whisper’s training audio format.
Which one has the best multilingual support?
Anywhere you see “whisper-large-v3” you get OpenAI’s published 99+ language coverage. Groq, Deepgram Whisper Cloud, and AssemblyAI Whisper-Streaming are all equivalent there. Deepgram Nova-3 supports a smaller set (around 30+ languages) but is faster and cheaper for the languages it does support — primarily English with strong coverage of Spanish, French, German, Portuguese, Italian, Dutch, Hindi, Japanese, Korean, and Mandarin.
Do any of these offer free real-time streaming?
Not at production volume. Deepgram and AssemblyAI both bill streaming minutes against their respective free credits ($200 and $50). Groq doesn’t offer streaming at all. If real-time is core to your product, plan to pay; the free credits are useful for evaluation, not for shipping a public voice product.
Related Reads
- Which Free Text-to-Speech API Should You Use in 2026? — the speech-to-text half meets its text-to-speech counterpart: free TTS APIs compared for the speak-back leg of a voice pipeline.
- Groq API: The Fastest Free AI API in 2026 — full breakdown of Groq’s LLM offering, which pairs directly with their Whisper endpoint.
- Cohere Free API: Embedding and Rerank for RAG — the embedding half of a free transcription-to-search pipeline.
- Together AI Free API: Llama, DeepSeek, FLUX — another free LLM source for post-transcription summarization.
- Cloudflare Workers AI — also hosts Whisper, billed in neurons; relevant if your stack is already on Cloudflare.
- 10 Best Free AI APIs in 2026 — the wider context for which provider does what.
Originally published at toolfreebie.com.
Top comments (0)