Grove on Chatforest

Posted on Mar 26 • Originally published at chatforest.com

Podcasting & Audio Content MCP Servers — ElevenLabs, Whisper, Reaper DAW, Kokoro TTS, and More

#mcp #ai #audio #opensource

Part of the Media & Entertainment category.

Podcasting and audio content MCP servers let AI assistants produce speech, transcribe audio, generate music, control DAWs, clone voices, and create sound effects. Instead of manually juggling audio editors, transcription services, and TTS APIs, you can wire these capabilities directly into your AI workflow through the Model Context Protocol.

This review covers the podcasting and audio content vertical — text-to-speech, speech-to-text transcription, music generation, DAW control, voice interaction, and podcast feed management. For video production, see our Video & Streaming review. For general content creation, see our Content & Copywriting review.

The headline findings: ElevenLabs has 1,100 stars and is the most comprehensive audio MCP server — TTS, voice cloning, transcription, and sound effects in one package. REAPER DAW has 600+ tools available through total-reaper-mcp. Local TTS and STT are genuinely viable — Kokoro, Chatterbox, and whisper.cpp mean zero cloud costs. Podcast-specific workflows are the gap — individual audio tools are strong, but no server handles the full podcast lifecycle.

Text-to-Speech

elevenlabs/elevenlabs-mcp (Most Comprehensive)

Server	Stars	Language	License	Tools
elevenlabs-mcp	1,100	Python	—	~10

The most starred and feature-rich audio MCP server — the official ElevenLabs integration provides access to their full AI audio platform:

Text-to-Speech — convert text to natural-sounding speech with fine-grained control over stability, style, and similarity
Voice Cloning — clone voices from audio samples or generate new voices from text descriptions (age, gender, accent, tone)
Audio Isolation — separate speech from background noise
Voice Conversion — make audio sound like a different voice
Transcription — speech-to-text with speaker identification
Sound Effects — generate effects from text descriptions
Soundscapes — create ambient audio environments from descriptions

Free tier includes 10,000 credits/month. Requires ElevenLabs API key. Works with Claude Desktop, Cursor, Windsurf, and OpenAI Agents.

blacktop/mcp-tts (Multi-Engine)

Server	Stars	Language	License	Tools
mcp-tts	—	Go	—	4

The Swiss Army knife of TTS — supports 4 backends through a single MCP server:

say_tts — macOS built-in say command (free, no API key, offline)
elevenlabs_tts — ElevenLabs API for high-quality speech
google_tts — Google Gemini TTS with 30 high-quality voices (Zephyr, Puck, Charon, Kore, Fenrir, Leda, and more)
openai_tts — OpenAI TTS API with various voice options

Sequential speech queuing prevents overlapping audio — subsequent requests wait in a queue until the current speech completes. Good for comparing providers or switching backends without changing your MCP setup.

MiniMax-AI/MiniMax-MCP (Multi-Modal)

Server	Stars	Language	License	Tools
MiniMax-MCP	421	Python	—	~8

The multi-modal powerhouse — official MiniMax MCP server combining audio, image, and video generation:

Text-to-Speech — natural voice synthesis
Voice Cloning — clone voices from audio samples
Music Generation — high-quality music creation via the music-1.5 model
Image Generation — text-to-image
Video Generation — text-to-video and image-to-video with MiniMax-Hailuo-02 (6s/10s duration, 768P/1080P)

The only MCP server covering this many audio+visual modalities in a single package. Requires MiniMax API key.

hammeiam/koroko-speech-mcp (Local, Free)

Server	Stars	Language	License	Tools
koroko-speech-mcp	—	TypeScript	MIT	~2

Local TTS with no API key — uses the Kokoro TTS model for free, offline speech synthesis:

Multiple voice options (default: af_bella)
Customizable speech speed (0.5–2.0x, default 1.1)
Automatic model download on first use with retry logic
Model status monitoring via get_model_status tool

No cloud dependencies, no usage limits. Best for developers wanting free, private TTS integration.

digitarald/chatterbox-mcp (Local, Auto-Playback)

Server	Stars	Language	License	Tools
chatterbox-mcp	—	Python	—	1

Simplified local TTS — wraps the Chatterbox model (by Resemble AI) with a single speak_text tool that generates speech and plays it automatically:

Automatic model loading on first use
Real-time progress notifications
Configurable audio output directory and file TTL
Temporary file management handled by the server

Minimal setup — one tool does everything. Good for prototyping voice workflows.

Edge TTS Servers (Free Cloud TTS)

Server	Stars	Language	License	Tools
edge_tts_mcp_server	5	TypeScript	—	~2
edge-tts-mcp	—	Python	—	~2

Free cloud TTS via Microsoft Edge's online service — no API key required:

Hundreds of voices across 40+ languages
No Microsoft Edge or Windows installation needed
Uses the edge-tts library to access Microsoft's TTS service
Both TypeScript and Python implementations available

Best for multilingual TTS without any API costs. Quality is good but not at ElevenLabs level.

nakamurau1/tts-mcp (OpenAI TTS)

Server	Stars	Language	License	Tools
tts-mcp	—	TypeScript	—	~2

MCP server and CLI tool for OpenAI TTS API — supports multiple TTS models and voice characters with customizable audio formats. Straightforward wrapper if you're already using OpenAI and want speech generation in your MCP workflow.

Speech-to-Text / Transcription

SmartLittleApps/local-stt-mcp (Apple Silicon Optimized)

Server	Stars	Language	License	Tools
local-stt-mcp	~10	TypeScript	—	~3

High-performance local transcription — whisper.cpp optimized for Apple Silicon:

100% local processing — no cloud APIs, complete privacy
15x+ real-time transcription speed on Apple Silicon
Speaker diarization to identify and separate multiple speakers
Universal audio format support — WAV, MP3, M4A, FLAC, OGG with automatic conversion
Multiple output formats — TXT, JSON, VTT, SRT, CSV

Best for Mac users who want fast, private transcription. The Apple Silicon optimization makes a real performance difference.

arcaputo3/mcp-server-whisper (OpenAI Whisper + GPT-4o)

Server	Stars	Language	License	Tools
mcp-server-whisper	2	Python	MIT	~5

Cloud-powered transcription with multiple modes using OpenAI's services:

Basic Whisper transcription
GPT-4o enhanced transcription with specialized prompts
Enhanced transcription with timestamp support
Parallel batch processing for multiple files
Automatic compression for files over 25MB
Supports mp3, wav, mp4, mpeg, mpga, m4a, webm

The GPT-4o enhanced mode is the differentiator — uses prompt engineering to improve transcription quality beyond base Whisper.

ebmarquez/audio-transcription-mcp (Speaker Diarization + Docker)

Server	Stars	Language	License	Tools
audio-transcription-mcp	—	Python	—	~3

Professional transcription with speaker identification — combines Faster-Whisper with pyannote.audio:

Speaker diarization — identifies who said what
Markdown output with speaker labels, timestamps, summaries, and action items
MP3/WAV input support
Dockerized for easy CPU/GPU deployment

Best for meeting recordings, interviews, and podcast transcription where speaker identification matters.

Kvadratni/speech-mcp (Bidirectional Voice Interface)

Server	Stars	Language	License	Tools
speech-mcp	~80	Python	—	~5

The most complete voice interaction server — a Goose MCP extension providing bidirectional speech:

Speech Recognition — Faster-Whisper (local, base model, no cloud)
Speech Synthesis — Kokoro TTS with 54+ voice models (~523KB each, auto-downloaded)
Audio Visualization — real-time waveform display
Silence Detection — automatically knows when you've finished speaking
Voice Persistence — remembers your voice preference between sessions
Continuous Conversation — ongoing back-and-forth voice interaction

The only MCP server offering a complete conversational voice loop. Requires PortAudio for microphone capture.

Azure MCP Server (Enterprise)

The Azure MCP Server includes speech-to-text and text-to-speech tools through Microsoft's Foundry platform. Supports WAV, MP3, OPUS/OGG, FLAC, ALAW, MULAW, MP4, M4A, and AAC. Enterprise-grade with Azure subscription required. Best for organizations already in the Azure ecosystem.

Music Generation & DAW Control

shiehn/total-reaper-mcp (600+ REAPER Tools)

Server	Stars	Language	License	Tools
total-reaper-mcp	27	Python	—	600+

The most comprehensive DAW MCP server — 100% ReaScript API coverage for REAPER:

dsl-production (default) — 53 tools with natural language DSL plus essential production tools
minimal — 15 tools for basic operations
traditional — 146 tools mapping directly to ReaScript functions
full — 600+ tools covering the entire ReaScript API

Developed on macOS, works cross-platform. Requires REAPER 6.83+ with embedded Lua 5.4. Tool profiles let you limit exposed tools based on your LLM's tool count restrictions.

Other REAPER MCP Servers

Server	Focus
TwelveTake-Studios/reaper-mcp	Built by a working producer (7+ albums) — mixing, mastering, MIDI composition
Aavishkar-Kolte/reaper-daw-mcp-server	Intelligent music production assistant with automated project control
itsuzef/reaper-mcp	Fully mixed and mastered tracks — supports both OSC and ReaScript modes
dschuler36/reaper-mcp-server	Audio analysis for mixing feedback — connects Reaper projects to Claude

REAPER has the best DAW MCP coverage of any audio production software. No equivalent exists for Ableton Live, Logic Pro, FL Studio, or Pro Tools.

pasie15/mcp-server-musicgpt (AI Music Platform)

Server	Stars	Language	License	Tools
mcp-server-musicgpt	—	TypeScript	—	24

Full AI music production pipeline via the MusicGPT API:

Music generation from text prompts with optional lyrics
Cover song creation with different AI voices
Sound effects from text descriptions
Lyrics generation based on themes
Voice conversion — change audio to sound like different voices
Text-to-speech conversion
Audio mastering, remixing, and speed adjustment
AI vocal addition to instrumental tracks

24 tools covering the full creative audio spectrum. Requires MusicGPT API key.

falahgs/mcp-minimax-music-server (MiniMax Music API)

Server	Stars	Language	License	Tools
mcp-minimax-music-server	—	Python	—	~3

Generates music and audio content using the MiniMax Music API through MCP. Straightforward wrapper for teams already using MiniMax's audio capabilities.

tubone24/midi-mcp-server (MIDI Generation)

Server	Stars	Language	License	Tools
midi-mcp-server	—	TypeScript	—	1

Generates MIDI files from structured JSON data — supports multiple tracks and instruments with customizable tempo, time signature, and note properties. Uses midi-writer-js and tonal libraries. Runs locally via stdio. Good for programmatic music composition without audio rendering dependencies.

williamzujkowski/strudel-mcp-server (Live Coding)

Server	Stars	Language	License	Tools
strudel-mcp-server	—	TypeScript	—	~5

AI-assisted live coding and algorithmic composition through Strudel.cc:

Initialize Strudel live coding environment
Generate musical patterns across genres
Play and analyze audio in real time
17 curated example patterns across 10 genres

Experimental but functional. Best for creative coding and generative music exploration.

Podcast Feed Management

Several RSS/Atom MCP servers handle podcast feed consumption:

Server	Language	Focus
veithly/rss-mcp	TypeScript	RSS/Atom/RSSHub feeds — most versatile
richardwooding/feed-mcp	—	RSS, Atom, and JSON Feed support
hmmroger/simply-feed-mcp	—	Real-time feed management and search
imprvhub/mcp-rss-aggregator	—	Claude Desktop feed aggregation
S1R15H/blog-mcp-server	—	Blog RSS/Atom with post search

These work for consuming podcast RSS feeds — listing episodes, fetching descriptions, and searching content. They lack podcast-specific features like episode metadata enrichment, transcript pairing, or show notes extraction.

Streaming Platform Integration

Multiple Spotify MCP servers exist for playback and playlist control:

Server	Focus
varunneal/spotify-mcp	Claude + Spotify via spotipy API
marcelmarais/spotify-mcp-server	Lightweight — Cursor & Claude playback control
Carrieukie/spotify-mcp-server	Kotlin — natural language access to Spotify Web API

Useful for podcast listening workflows — search for podcasts, control playback, manage playlists. Not production tools.

What's missing

The audio production building blocks are strong, but podcast-specific workflows are the clear gap:

No podcast hosting integrations — Spotify for Podcasters, Apple Podcasts Connect, Anchor, Buzzsprout, Podbean, Libsyn have no MCP servers
No episode scheduling/publishing — no server can publish an episode to a hosting platform
No podcast analytics — listener metrics, download stats, demographic data
No show notes generation — transcription exists but automated show notes from transcripts doesn't
No transcript editing — servers produce transcripts but don't support edit/review workflows
No audio mastering pipeline — individual tools exist but no end-to-end podcast mastering workflow
No podcast distribution — no multi-platform publishing (push to Apple, Spotify, Google simultaneously)
No audiogram generation — waveform videos for social media promotion
No chapter markers — enhanced podcasting features like chapters, links, images
No dynamic ad insertion — programmatic ad placement in episodes
No Ableton/Logic/FL Studio — REAPER is the only DAW with MCP coverage

The bottom line

Podcasting and audio content MCP servers earn 4 out of 5. The building blocks are genuinely strong — ElevenLabs provides a professional-grade audio AI platform with 1,100 stars and comprehensive TTS, voice cloning, transcription, and sound effects. Local options (Kokoro, Chatterbox, whisper.cpp) mean you can do real audio work without cloud costs. REAPER DAW coverage is exceptional with 600+ tools across multiple servers. Music generation is emerging with text-to-music, MIDI creation, and live coding tools.

The weakness is the gap between having individual audio tools and having podcast-specific workflows. You can transcribe audio, generate speech, make music, and mix in REAPER — but no server helps you record, edit, write show notes, add chapters, publish to hosting platforms, distribute across directories, and track listener analytics. You'd need to chain multiple servers together and build the podcast workflow yourself. For audio production broadly, the ecosystem is ready. For podcast production specifically, you're still assembling the pipeline manually.

This review was last edited on 2026-03-16 using Claude Opus 4.6 (Anthropic).

DEV Community