Fish Audio S2 API is a production-grade text-to-speech REST API powered by a 4-billion-parameter model trained on 10 million hours of audio. It supports voice cloning, streaming, and 50+ languages. To use the Fish Audio S2 API efficiently—including sending requests, managing references, and running unit tests—Apidog is the fastest way to explore, document, and validate every endpoint.
Introduction
Modern TTS models have evolved: they're no longer robotic; they whisper, laugh, and shift tone. Fish Audio S2 API is part of this new wave—a 4B-parameter model trained on 10M+ hours of multilingual audio, producing speech almost indistinguishable from a real human.
If you're building podcast automation, interactive assistants, or real-time dubbing, integrating Fish Audio S2 API requires more than a POST request. You must handle authentication, reference audio, streaming, and robust unit testing to ensure production reliability.
💡 Tip: Before your first Fish Audio S2 API call, download Apidog for free. Visually test emotion tags, streaming chunks, voice cloning payloads, and binary audio responses—no code needed. Mock, validate, and listen inline so your TTS integration works from day one.
What Is Fish Audio S2 API?
Fish Audio S2 API is the HTTP interface to Fish Speech S2-Pro, an open-source TTS system using a Dual-Autoregressive (Dual-AR) architecture. The model splits semantic generation (4B params, slow AR over time) from residual codebook generation (400M params, fast AR over depth), enabling high-quality synthesis at a real-time factor (RTF) of 0.195 on a single NVIDIA H200.
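To make the RTF figure concrete, here is a small sketch of the arithmetic. It assumes the usual definition of real-time factor (compute time divided by audio duration). The function names are illustrative and not part of any Fish Audio tooling.

```python
def audio_seconds_per_compute_second(rtf: float) -> float:
    """Seconds of audio produced per second of compute (the inverse of RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def estimated_compute_seconds(audio_seconds: float, rtf: float) -> float:
    """Compute time needed to synthesize a clip of the given duration."""
    return audio_seconds * rtf


RTF_H200 = 0.195  # figure quoted above for a single NVIDIA H200

throughput = audio_seconds_per_compute_second(RTF_H200)
one_minute = estimated_compute_seconds(60.0, RTF_H200)
print(f"{throughput:.2f} s of audio per second of compute")
print(f"60 s clip needs about {one_minute:.1f} s of compute")
```

On other hardware the RTF will differ, so treat these numbers as a way to reason about capacity rather than a guarantee.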
Key Capabilities:
| Feature | Detail |
|---|---|
| Languages | ~50 (English, Chinese, Japanese, Korean, Arabic, French, German, etc.) |
| Voice cloning | 10–30 second reference audio, no fine-tuning required |
| Inline emotion control | Natural-language tags: [laugh], [whispers], [super happy] |
| Multi-speaker generation | Native `<\ |
| Streaming | Real-time chunked audio via "streaming": true |
| Output formats | WAV, MP3, PCM |
| Authentication | Bearer token (Authorization: Bearer YOUR_API_KEY) |
The API base URL after local deployment is http://127.0.0.1:8080; all endpoints are under the /v1/ namespace.
Getting Started with Fish Audio S2 API and Apidog
Prerequisites for Fish Audio S2 API
You'll need:
- A running Fish Speech S2-Pro server
- An API client capable of handling binary audio
Start the server:
python tools/api_server.py \
--llama-checkpoint-path checkpoints/s2-pro \
--decoder-checkpoint-path checkpoints/s2-pro/codec.pth \
--listen 0.0.0.0:8080 \
--compile \
--half \
--api-key YOUR_API_KEY \
--workers 4
- --compile enables torch.compile for ~10x faster inference (with a one-time warmup).
- --half uses FP16 for reduced GPU memory.
Health check:
curl http://127.0.0.1:8080/v1/health
# {"status":"ok"}
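Because the server may need some time before it answers (for example during the one-time warmup mentioned above), a script can poll the health endpoint before sending real requests. The following is a minimal sketch using only the standard library. The helper names, attempt count, and delay are assumptions chosen for illustration.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8080"


def probe_health(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /v1/health answers with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/health", timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body: not ready yet.
        return False
    return isinstance(body, dict) and body.get("status") == "ok"


def wait_until_ready(
    probe: Callable[[], bool] = probe_health,
    attempts: int = 30,
    delay_s: float = 2.0,
) -> bool:
    """Poll the probe until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling `wait_until_ready()` at the top of a script or test session returns `False` instead of raising, so the caller decides whether to abort or skip.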
Setting Up Fish Audio S2 API in Apidog
- Download Apidog for free and create a new HTTP project.
- Set the base URL to http://127.0.0.1:8080 under Environments.
- Add a global header: Authorization: Bearer YOUR_API_KEY
This global API key applies to all requests—no manual header setup per request. Easily switch between dev, staging, and production environments.
Making Your First Fish Audio S2 API Request in Apidog
Testing the Fish Audio S2 API Text-to-Speech Endpoint
- Endpoint: POST /v1/tts
- In Apidog: Create a new POST request, set the URL, and use this JSON body:
{
"text": "Hello! This is a test of the Fish Audio S2 API.",
"format": "wav",
"streaming": false,
"temperature": 0.8,
"top_p": 0.8,
"repetition_penalty": 1.1,
"max_new_tokens": 1024
}
Full TTS request schema:
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | required | Text to synthesize |
| format | string | "wav" | Output format: wav, mp3, pcm |
| chunk_length | int | 200 | Synthesis chunk size (100–300) |
| seed | int | null | Fix seed for reproducibility |
| streaming | bool | false | Return audio in real-time chunks |
| max_new_tokens | int | 1024 | Max tokens to generate |
| temperature | float | 0.8 | Sampling randomness (0.1–1.0) |
| top_p | float | 0.8 | Nucleus sampling threshold (0.1–1.0) |
| repetition_penalty | float | 1.1 | Penalize repeated sequences (0.9–2.0) |
| use_memory_cache | string | "off" | Cache reference encoding in memory |
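The ranges in the schema can also be checked on the client before a request is sent. The sketch below is an assumption about how such a pre-flight check might look. The function name and error messages are hypothetical, and the server's own validation remains authoritative.

```python
from typing import Any

# Inclusive (min, max) ranges copied from the schema table above.
NUMERIC_RANGES = {
    "chunk_length": (100, 300),
    "temperature": (0.1, 1.0),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (0.9, 2.0),
}
ALLOWED_FORMATS = ("wav", "mp3", "pcm")


def validate_tts_payload(payload: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text is required and must be a non-empty string")
    fmt = payload.get("format", "wav")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"format must be one of {ALLOWED_FORMATS}, got {fmt!r}")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in payload:
            continue  # the server default applies
        value = payload[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems
```

For example, `validate_tts_payload({"text": "test", "temperature": 99.0})` reports one problem for `temperature`, which avoids a round trip for an obviously invalid request.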
In Apidog: Click Send. The API returns raw audio bytes (audio/wav). Apidog will display an inline audio player so you can listen to the output instantly.
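The same request can be sent from a script. The sketch below uses only the Python standard library, and the helper names are illustrative. The WAV check assumes a non-streaming response that begins with a standard RIFF/WAVE header.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"
API_KEY = "YOUR_API_KEY"


def build_tts_request(
    payload: dict, base_url: str = BASE_URL, api_key: str = API_KEY
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tts request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def looks_like_wav(data: bytes) -> bool:
    """Check for the RIFF/WAVE magic bytes at the start of a WAV file."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def synthesize_to_file(payload: dict, path: str, timeout: float = 60.0) -> int:
    """Send the request, write the audio bytes to path, return the byte count."""
    with urllib.request.urlopen(build_tts_request(payload), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

With the server running, `synthesize_to_file({"text": "Hello!", "format": "wav"}, "hello.wav")` would write the response to disk, and `looks_like_wav` offers a cheap sanity check on the bytes.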
Voice Cloning with Fish Audio S2 API
Uploading a Reference Audio to Fish Audio S2 API via Apidog
The API supports zero-shot voice cloning via the references field in TTS requests. Provide a base64-encoded audio sample and its transcript.
Upload a reference:
- Endpoint: POST /v1/references/add
{
"id": "my-voice-clone",
"text": "This is the reference transcription matching the audio.",
"audio": "<base64-encoded-wav-bytes>"
}
In Apidog: Use Binary or Form Data body type to upload the audio file and text together.
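When calling the endpoint from code with the JSON body shown above, the audio bytes need to be base64-encoded into a string first. A minimal sketch, with hypothetical helper names:

```python
import base64


def build_reference_payload(reference_id: str, transcript: str, audio_bytes: bytes) -> dict:
    """Build the JSON body for POST /v1/references/add from raw audio bytes."""
    if not audio_bytes:
        raise ValueError("audio_bytes is empty")
    return {
        "id": reference_id,
        "text": transcript,
        # b64encode returns bytes; decode to str so the payload is JSON-serializable.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    }


def load_reference_payload(reference_id: str, transcript: str, wav_path: str) -> dict:
    """Read an audio file from disk and wrap it in the payload above."""
    with open(wav_path, "rb") as f:
        return build_reference_payload(reference_id, transcript, f.read())
```

Keeping the encoding step in one place makes it easy to reuse the same payload builder in scripts and tests.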
Sample response:
{
"success": true,
"message": "Reference added successfully",
"reference_id": "my-voice-clone"
}
Use the reference in TTS:
{
"text": "This sentence will be spoken in the cloned voice.",
"reference_id": "my-voice-clone",
"format": "mp3"
}
Apidog's Reference Management lets you save these as templates—just change the reference_id to test other voices.
How to Unit Test Fish Audio S2 API Integrations
Why Unit Tests Matter
Fish Audio S2 API integrations can fail if reference IDs expire, parameters go out of range, streaming payloads are mishandled, or formats mismatch. Automated unit tests catch these regressions before they impact users.
Writing Unit Tests for Fish Audio S2 API with Python
Below is a pytest suite using httpx for core endpoint coverage:
import base64

import httpx
import pytest

BASE_URL = "http://127.0.0.1:8080"
API_KEY = "YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


class TestFishAudioS2API:
    def test_health_check(self):
        response = httpx.get(f"{BASE_URL}/v1/health", headers=HEADERS)
        assert response.status_code == 200
        assert response.json()["status"] == "ok"

    def test_tts_basic_request(self):
        payload = {
            "text": "Unit test: verifying Fish Audio S2 API TTS output.",
            "format": "wav",
            "seed": 42,  # fixed seed for reproducible output
        }
        response = httpx.post(
            f"{BASE_URL}/v1/tts",
            json=payload,
            headers=HEADERS,
            timeout=60,
        )
        assert response.status_code == 200
        assert response.headers["content-type"] == "audio/wav"
        # A real WAV response is far larger than a bare header.
        assert len(response.content) > 1000

    def test_tts_invalid_temperature_raises_error(self):
        # 99.0 is outside the documented 0.1-1.0 range.
        payload = {"text": "test", "temperature": 99.0}
        response = httpx.post(
            f"{BASE_URL}/v1/tts",
            json=payload,
            headers=HEADERS,
            timeout=30,
        )
        assert response.status_code == 422

    def test_reference_add_and_list(self):
        with open("test_reference.wav", "rb") as f:
            audio_b64 = base64.b64encode(f.read()).decode()
        add_response = httpx.post(
            f"{BASE_URL}/v1/references/add",
            json={
                "id": "unit-test-voice",
                "text": "This is a unit test reference audio.",
                "audio": audio_b64,
            },
            headers=HEADERS,
            timeout=60,
        )
        try:
            assert add_response.json()["success"] is True
            list_response = httpx.get(
                f"{BASE_URL}/v1/references/list", headers=HEADERS
            )
            assert "unit-test-voice" in list_response.json()["reference_ids"]
        finally:
            # Clean up even if an assertion fails, so reruns start fresh.
            httpx.request(
                "DELETE",
                f"{BASE_URL}/v1/references/delete",
                json={"reference_id": "unit-test-voice"},
                headers=HEADERS,
            )
Run tests:
pytest test_fish_audio_s2_api.py -v
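The suite above needs a live server. For CI machines without a GPU, one option is a small stub that imitates the documented responses so client code can still be exercised. The sketch below is an assumption about how such a stub could look. It mimics only the health check and a fixed audio/wav reply, and it is not a substitute for testing against the real model.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# A tiny placeholder body with WAV magic bytes; it is not playable audio.
FAKE_WAV = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE" + b"\x00" * 32


class StubHandler(BaseHTTPRequestHandler):
    """Imitates only two documented behaviours; everything else is 404."""

    def _send(self, status: int, content_type: str, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/v1/health":
            self._send(200, "application/json", b'{"status":"ok"}')
        else:
            self._send(404, "application/json", b"{}")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the JSON body
        if self.path == "/v1/tts":
            self._send(200, "audio/wav", FAKE_WAV)
        else:
            self._send(404, "application/json", b"{}")

    def log_message(self, *args):
        pass  # keep test output quiet


def start_stub_server() -> tuple[ThreadingHTTPServer, str]:
    """Start the stub on an ephemeral port; return the server and its base URL."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"
```

A pytest fixture could call `start_stub_server()`, point `BASE_URL` at the returned URL, and call `server.shutdown()` on teardown.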
Running Fish Audio S2 API Unit Tests with Apidog
Apidog's Test Scenarios feature automates endpoint checks visually:
- Open your Fish Audio S2 API collection
- Click Test Scenarios → New Scenario
- Add requests: health check → TTS → reference add → reference list
- In Assertions for the TTS request, add:
  - Status = 200
  - Header content-type contains audio
  - Response time < 30000ms
- Click Run to execute
Apidog provides pass/fail reports, response timings, and diffs. Export or schedule runs on CI—no code or test framework setup required.
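For teams that also keep checks in code, the three scenario assertions can be written as a plain function. This is a sketch with hypothetical names, independent of any HTTP client. The caller fills in the status, content type, and elapsed time from whatever library it uses.

```python
from dataclasses import dataclass


@dataclass
class ResponseSummary:
    status: int
    content_type: str
    elapsed_ms: float


def check_tts_response(summary: ResponseSummary, max_ms: float = 30000) -> list[str]:
    """Apply the three scenario assertions; return the names of any that fail."""
    failures = []
    if summary.status != 200:
        failures.append("status")
    if "audio" not in summary.content_type.lower():
        failures.append("content-type")
    if summary.elapsed_ms >= max_ms:
        failures.append("response-time")
    return failures
```

An empty list means all three checks passed, which keeps the function easy to assert on in a test.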
Advanced Fish Audio S2 API Features
Streaming Audio from Fish Audio S2 API
For real-time playback, set "streaming": true:
import httpx

with httpx.stream(
    "POST",
    "http://127.0.0.1:8080/v1/tts",
    json={
        "text": "Streaming audio from the Fish Audio S2 API in real time.",
        "format": "wav",
        "streaming": True,
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=None,
) as response:
    with open("streamed_output.wav", "wb") as audio_file:
        for chunk in response.iter_bytes(chunk_size=4096):
            audio_file.write(chunk)
The API returns the first audio bytes within ~100ms, making it suitable for live voice apps.
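To check the latency figure on your own hardware, you can time the arrival of the first non-empty chunk. The helper below is a sketch with an illustrative name. It works on any iterable of byte chunks, so it is not tied to a particular HTTP client.

```python
import time
from typing import Iterable, Optional


def time_to_first_audio(
    chunks: Iterable[bytes], started_at: float
) -> tuple[Optional[float], int]:
    """Return (seconds from started_at to the first non-empty chunk, total bytes).

    started_at should be a time.perf_counter() reading taken just before the
    request is sent, so connection setup is included in the measurement.
    """
    first = None
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty keep-alive chunks
        if first is None:
            first = time.perf_counter() - started_at
        total += len(chunk)
    return first, total
```

With the streaming example, capture `started_at = time.perf_counter()` right before `httpx.stream(...)` and pass `response.iter_bytes()` as `chunks`. The result includes network and client overhead, so it may be higher than the server-side figure.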
Inline Emotion Control via Fish Audio S2 API
Pass emotion tags within the text:
{
"text": "[whispers] The secret is hidden here. [super happy] I found it!",
"format": "wav"
}
No extra parameters needed. Supported tags include [laugh], [cough], [pitch up], [professional broadcast tone], [whisper in small voice].
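When logging or reviewing prompts, it can help to see which tags a text contains and what the plain text looks like without them. The sketch below assumes tags are any text inside square brackets, as in the examples above. Literal brackets in ordinary text would also be treated as tags.

```python
import re

# Matches one bracketed tag such as [whispers] or [super happy].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(text: str) -> list[str]:
    """Return the bracketed tags in order of appearance, without brackets."""
    return [tag.strip() for tag in TAG_PATTERN.findall(text)]


def strip_tags(text: str) -> str:
    """Remove tags and collapse the whitespace they leave behind."""
    return " ".join(TAG_PATTERN.sub(" ", text).split())


sample = "[whispers] The secret is hidden here. [super happy] I found it!"
print(extract_tags(sample))
print(strip_tags(sample))
```

These helpers only inspect the text on the client side. How the model interprets a given tag is decided by the server.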
Conclusion
Fish Audio S2 API exposes a production-grade TTS engine via clean REST endpoints. With support for voice cloning, streaming, and flexible emotion control, it fits a wide range of real-world voice generation tasks. For robust integration:
- Set sampling parameters (temperature, top_p, repetition_penalty)
- Manage reference audio lifecycles
- Maintain a unit test suite for endpoints
Apidog streamlines your workflow: send your first API request in minutes, listen to outputs inline, auto-generate client code, and run automated endpoint tests—no setup required. When sharing specs or documenting for your team, Apidog keeps everything synchronized.
Download Apidog for free and import the Fish Audio S2 API collection to start testing now.
FAQ
What is the Fish Audio S2 API?
The Fish Audio S2 API is the REST interface to Fish Speech S2-Pro, a 4B-parameter text-to-speech model trained on 10 million hours of audio. It supports voice cloning, streaming, emotion control, and 50+ languages via HTTP endpoints under /v1/.
How do I authenticate with the Fish Audio S2 API?
Send a Bearer token in every request header: Authorization: Bearer YOUR_API_KEY. The API key is set at server startup with --api-key. Apidog lets you store this at the environment level for all requests.
Can I unit test Fish Audio S2 API integrations without writing code?
Yes. Apidog's Test Scenarios feature lets you visually build and run unit tests against any endpoint. Define assertions (status, response time, headers) and Apidog executes on demand or CI—no test framework needed.
What audio formats does the Fish Audio S2 API support?
WAV, MP3, or PCM. Set the format with the "format" field in your TTS request. Default is WAV.
How does voice cloning work in Fish Audio S2 API?
Upload a 10–30 second reference audio and its transcript using POST /v1/references/add. Pass the reference ID in TTS requests via "reference_id". No fine-tuning required.
What is the real-time factor of the Fish Audio S2 API?
On a single NVIDIA H200, RTF is 0.195 with streaming enabled—about 5 seconds of audio per second of compute. Time-to-first-audio is ~100ms.
How do I test Fish Audio S2 API responses in Apidog?
When the API returns binary audio, Apidog renders an inline audio player. Listen, inspect headers, and add assertions—no file saving required.