Alex Spinov
Whisper.cpp Has a Free API — Run OpenAI Whisper Speech-to-Text on CPU

Whisper.cpp is a C/C++ port of OpenAI's Whisper speech recognition model. It runs entirely on CPU (no GPU needed), supports 99 languages, and includes a built-in HTTP server with an OpenAI-compatible API.

Free, open source, blazing fast on Apple Silicon and modern CPUs.

Why Use Whisper.cpp?

  • No GPU needed — optimized for CPU, especially Apple Silicon (M1/M2/M3)
  • OpenAI-compatible API — same endpoint format as OpenAI's Whisper API
  • 99 languages — automatic language detection
  • Faster than real time — modern hardware transcribes audio quicker than it plays
  • Compact models — from ~75MB (tiny) to ~2.9GB (large-v3), run on any machine

Quick Setup

1. Build from Source

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make

# Download a model
bash models/download-ggml-model.sh base.en
# Available: tiny, base, small, medium, large-v3

2. Transcribe Audio (CLI)

# Transcribe a WAV file
./main -m models/ggml-base.en.bin -f audio.wav

# With timestamps
./main -m models/ggml-base.en.bin -f audio.wav --output-srt

# Auto-detect language
./main -m models/ggml-large-v3.bin -f audio.wav -l auto

3. Start HTTP Server

# Build server
make server

# Run server
./server -m models/ggml-base.en.bin --host 0.0.0.0 --port 8080

4. Transcribe via API

# OpenAI-compatible endpoint
curl -s http://localhost:8080/v1/audio/transcriptions \
  -F file=@meeting.wav \
  -F model=whisper-1 \
  -F response_format=json | jq '.text'

# With language hint
curl -s http://localhost:8080/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F language=en \
  -F response_format=verbose_json | jq '{text: .text, language: .language, segments: [.segments[] | {start: .start, end: .end, text: .text}]}'

# Get timestamps (SRT format)
curl -s http://localhost:8080/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F response_format=srt

5. Translation

# Translate any language to English
curl -s http://localhost:8080/v1/audio/translations \
  -F file=@russian_audio.wav \
  -F model=whisper-1 | jq '.text'

Python Example

from openai import OpenAI

# Works with OpenAI SDK!
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Transcribe
with open("meeting.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        response_format="verbose_json"
    )

print(f"Text: {transcript.text}")
for segment in transcript.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")

# Translate to English
with open("foreign_audio.wav", "rb") as f:
    translation = client.audio.translations.create(
        model="whisper-1", file=f)
    print(f"Translation: {translation.text}")
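If you'd rather not pull in the OpenAI SDK, the same endpoint accepts a plain multipart POST. A minimal sketch using `requests` — the server URL and file name are assumptions, adjust them to your setup:

```python
def build_request(base_url, language=None, response_format="json"):
    """Assemble the endpoint URL and form fields for a transcription call."""
    url = base_url.rstrip("/") + "/v1/audio/transcriptions"
    data = {"model": "whisper-1", "response_format": response_format}
    if language:
        data["language"] = language
    return url, data

def transcribe(path, base_url="http://localhost:8080", language=None):
    """POST an audio file to a running whisper.cpp server, return the text."""
    import requests  # third-party: pip install requests
    url, data = build_request(base_url, language)
    with open(path, "rb") as f:
        resp = requests.post(url, data=data, files={"file": f})
    resp.raise_for_status()
    return resp.json()["text"]

if __name__ == "__main__":
    print(transcribe("meeting.wav", language="en"))
```

The only non-stdlib dependency is `requests`; everything else is the same form fields the curl examples above send.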

Model Sizes

Model      Size     RAM      Speed     Quality
tiny       ~75MB    ~390MB   Fastest   Basic
base       ~142MB   ~500MB   Fast      Good
small      ~466MB   ~1GB     Medium    Better
medium     ~1.5GB   ~2.5GB   Slow      Great
large-v3   ~2.9GB   ~4GB     Slowest   Best

Key Endpoints

Endpoint                   Description
/v1/audio/transcriptions   Transcribe audio
/v1/audio/translations     Translate to English
/v1/models                 List models
/health                    Health check
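The /health endpoint is handy for waiting until the server has loaded its model before sending audio. A small stdlib-only sketch (the base URL and timeout are assumptions):

```python
import time
import urllib.error
import urllib.request

def endpoint(base_url, path):
    """Join a server base URL and an endpoint path."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")

def wait_until_ready(base_url="http://localhost:8080", timeout_s=30):
    """Poll /health until the server answers 200, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    url = endpoint(base_url, "/health")
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            time.sleep(0.5)
    return False

if __name__ == "__main__":
    print("server ready:", wait_until_ready())
```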

Performance Tips

  • Apple Silicon: Use Metal acceleration (make WHISPER_METAL=1)
  • x86: Enable AVX2/AVX-512 for best performance
  • Large files: Split into chunks, process in parallel
  • Real-time: use the stream example binary for live microphone input
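The "split into chunks" tip can be sketched with the stdlib wave module plus a thread pool. The server URL, file name, and 30-second chunk length are assumptions — and note a naive split can cut mid-word, so overlapping chunks or silence detection would be more robust in production:

```python
import wave
from concurrent.futures import ThreadPoolExecutor

def chunk_spans(total_frames, framerate, chunk_s=30):
    """Frame ranges covering the file in chunk_s-second pieces."""
    step = framerate * chunk_s
    return [(s, min(s + step, total_frames)) for s in range(0, total_frames, step)]

def split_wav(path, chunk_s=30):
    """Write each chunk to its own WAV file; return the file names."""
    names = []
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        spans = chunk_spans(wav.getnframes(), wav.getframerate(), chunk_s)
        for i, (start, end) in enumerate(spans):
            wav.setpos(start)
            frames = wav.readframes(end - start)
            name = f"{path}.chunk{i:03d}.wav"
            with wave.open(name, "wb") as out:
                out.setparams(params)
                out.writeframes(frames)
            names.append(name)
    return names

def transcribe_chunk(name, base_url="http://localhost:8080"):
    """Send one chunk to the whisper.cpp server; return its text."""
    import requests  # third-party: pip install requests
    with open(name, "rb") as f:
        resp = requests.post(base_url + "/v1/audio/transcriptions",
                             data={"response_format": "json"},
                             files={"file": f})
    resp.raise_for_status()
    return resp.json()["text"]

if __name__ == "__main__":
    chunks = split_wav("long_recording.wav")
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(" ".join(pool.map(transcribe_chunk, chunks)))
```

Four workers is a reasonable starting point on a laptop-class CPU; tune it to how many decoding threads the server itself is using.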

Need a custom data extraction or scraping solution? I build production-grade scrapers for any website. Email: Spinov001@gmail.com | My Apify Actors
