Wanda

Posted on Mar 18 • Originally published at apidog.com

How to Use Fish Audio S2 API: A Complete Guide with Apidog

Fish Audio S2 API is a production-grade text-to-speech REST API powered by a 4-billion-parameter model trained on 10 million hours of audio. It supports voice cloning, streaming, and 50+ languages. To use the Fish Audio S2 API efficiently—including sending requests, managing references, and running unit tests—Apidog is the fastest way to explore, document, and validate every endpoint.


Introduction

Modern TTS models have evolved: they're no longer robotic; they whisper, laugh, and shift tone. Fish Audio S2 API is part of this new wave—a 4B-parameter model trained on 10M+ hours of multilingual audio, producing speech almost indistinguishable from a real human.

If you're building podcast automation, interactive assistants, or real-time dubbing, integrating Fish Audio S2 API requires more than a POST request. You must handle authentication, reference audio, streaming, and robust unit testing to ensure production reliability.

💡 Tip: Before your first Fish Audio S2 API call, download Apidog for free. Visually test emotion tags, streaming chunks, voice cloning payloads, and binary audio responses—no code needed. Mock, validate, and listen inline so your TTS integration works from day one.

What Is Fish Audio S2 API?

Fish Audio S2 API is the HTTP interface to Fish Speech S2-Pro, an open-source TTS system using a Dual-Autoregressive (Dual-AR) architecture. The model splits semantic generation (4B params, slow AR over time) from residual codebook generation (400M params, fast AR over depth), enabling high-quality synthesis at a real-time factor (RTF) of 0.195 on a single NVIDIA H200.
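Real-time factor is compute time divided by the length of the audio produced, so anything below 1.0 is faster than real time. The sketch below shows the arithmetic and a way to measure it yourself. The helper names are hypothetical, and it assumes a complete, non-streaming WAV response so that the frame count in the header is accurate.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback length of a complete WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time / audio duration; values below 1.0 beat real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


# Worked example with the published figure: RTF 0.195 means 1 s of audio takes
# 0.195 s to generate, i.e. about 1 / 0.195 seconds of audio per second of compute.
print(round(1 / 0.195, 2))  # 5.13
```

To measure it, wrap an `httpx.post` call to /v1/tts in `time.perf_counter()` and pass the elapsed seconds and `wav_duration_seconds(response.content)` to `real_time_factor`. Results on other GPUs will likely differ from the H200 figure.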

Key Capabilities:

Feature	Detail
Languages	~50 (English, Chinese, Japanese, Korean, Arabic, French, German, etc.)
Voice cloning	10–30 second reference audio, no fine-tuning required
Inline emotion control	Natural-language tags: `[laugh]`, `[whispers]`, `[super happy]`
Multi-speaker generation	Native `<\
Streaming	Real-time chunked audio via `"streaming": true`
Output formats	WAV, MP3, PCM
Authentication	Bearer token (`Authorization: Bearer YOUR_API_KEY`)

The API base URL after local deployment is http://127.0.0.1:8080; all endpoints are under the /v1/ namespace.

Getting Started with Fish Audio S2 API and Apidog

Prerequisites for Fish Audio S2 API

You'll need:

A running Fish Speech S2-Pro server
An API client capable of handling binary audio

Start the server:

python tools/api_server.py \
  --llama-checkpoint-path checkpoints/s2-pro \
  --decoder-checkpoint-path checkpoints/s2-pro/codec.pth \
  --listen 0.0.0.0:8080 \
  --compile \
  --half \
  --api-key YOUR_API_KEY \
  --workers 4

--compile enables torch.compile for roughly 10x faster inference, at the cost of a one-time warmup.
--half uses FP16 for reduced GPU memory.

Health check:

curl http://127.0.0.1:8080/v1/health -H "Authorization: Bearer YOUR_API_KEY"
# {"status":"ok"}

Setting Up Fish Audio S2 API in Apidog

Download Apidog for free and create a new HTTP project.
Set the base URL to http://127.0.0.1:8080 under Environments.
Add a global header:

   Authorization: Bearer YOUR_API_KEY

This header then applies to every request in the project, so there is no per-request setup. Use separate environments to switch between dev, staging, and production.

Making Your First Fish Audio S2 API Request in Apidog

Testing the Fish Audio S2 API Text-to-Speech Endpoint

Endpoint: POST /v1/tts
In Apidog: Create a new POST request, set the URL, and use this JSON body:

{
  "text": "Hello! This is a test of the Fish Audio S2 API.",
  "format": "wav",
  "streaming": false,
  "temperature": 0.8,
  "top_p": 0.8,
  "repetition_penalty": 1.1,
  "max_new_tokens": 1024
}

Core TTS request parameters:

Parameter	Type	Default	Description
`text`	string	required	Text to synthesize
`format`	string	`"wav"`	Output format: `wav`, `mp3`, `pcm`
`chunk_length`	int	200	Synthesis chunk size (100–300)
`seed`	int	null	Fix seed for reproducibility
`streaming`	bool	false	Return audio in real-time chunks
`max_new_tokens`	int	1024	Max tokens to generate
`temperature`	float	0.8	Sampling randomness (0.1–1.0)
`top_p`	float	0.8	Nucleus sampling threshold (0.1–1.0)
`repetition_penalty`	float	1.1	Penalize repeated sequences (0.9–2.0)
`use_memory_cache`	string	`"off"`	Cache reference encoding in memory
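Checking parameters before sending saves a round trip and gives clearer errors than a generic 422. The sketch below mirrors the table above and assumes its ranges are inclusive bounds. `TTSRequest` is a hypothetical client-side class, and the server's own validation remains authoritative.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Ranges copied from the parameter table; the server is still the source of truth.
_RANGES = {
    "chunk_length": (100, 300),
    "temperature": (0.1, 1.0),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (0.9, 2.0),
}
_FORMATS = {"wav", "mp3", "pcm"}


@dataclass
class TTSRequest:
    text: str
    format: str = "wav"
    chunk_length: int = 200
    seed: Optional[int] = None
    streaming: bool = False
    max_new_tokens: int = 1024
    temperature: float = 0.8
    top_p: float = 0.8
    repetition_penalty: float = 1.1
    use_memory_cache: str = "off"

    def to_payload(self) -> dict:
        """Validate client-side, then return a JSON-ready dict for POST /v1/tts."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.format not in _FORMATS:
            raise ValueError(f"format must be one of {sorted(_FORMATS)}")
        for name, (low, high) in _RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")
        payload = asdict(self)
        if payload["seed"] is None:
            payload.pop("seed")  # omit an unset seed; the documented default is null
        return payload
```

Usage: `httpx.post(url, json=TTSRequest(text="Hello!", seed=42).to_payload(), headers=HEADERS)`.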

In Apidog: Click Send. The API returns raw audio bytes (audio/wav). Apidog will display an inline audio player so you can listen to the output instantly.

Voice Cloning with Fish Audio S2 API

Uploading a Reference Audio to Fish Audio S2 API via Apidog

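The API supports zero-shot voice cloning in two ways. You can pass a base64-encoded audio sample and its transcript inline in the references field of a TTS request. Alternatively, register the sample once with /v1/references/add and reuse it by reference_id.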
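For one-off cloning without registering anything on the server, the reference can travel with the request. This sketch assumes references accepts a list of objects, each holding base64 audio and a matching text transcript; check the request model of your server version. `inline_reference_payload` is a hypothetical helper.

```python
import base64


def inline_reference_payload(text: str, ref_audio: bytes, ref_transcript: str, fmt: str = "wav") -> dict:
    """Build a /v1/tts body that clones a voice from audio sent with the request."""
    return {
        "text": text,
        "format": fmt,
        # Assumed shape: [{"audio": <base64 string>, "text": <transcript of that audio>}]
        "references": [
            {
                "audio": base64.b64encode(ref_audio).decode("ascii"),
                "text": ref_transcript,
            }
        ],
    }
```

Usage: `inline_reference_payload("New sentence.", Path("voice.wav").read_bytes(), "What the clip says.")`. This suits quick experiments. A stored reference_id avoids re-uploading the clip on every request.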

Upload a reference:

Endpoint: POST /v1/references/add

{
  "id": "my-voice-clone",
  "text": "This is the reference transcription matching the audio.",
  "audio": "<base64-encoded-wav-bytes>"
}

In Apidog: Send the JSON body above with the audio base64-encoded, or switch the body type to Form Data to attach the audio file and its transcript together.
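The capabilities table recommends 10–30 seconds of reference audio, so it can help to check a clip before uploading it. This sketch uses Python's standard wave module and assumes an uncompressed WAV input. `build_reference_upload` is a hypothetical helper that produces the JSON body shown above.

```python
import base64
import io
import wave


def build_reference_upload(
    ref_id: str,
    transcript: str,
    wav_bytes: bytes,
    min_s: float = 10.0,
    max_s: float = 30.0,
) -> dict:
    """Check a WAV clip's length, then build the JSON body for POST /v1/references/add."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if not min_s <= duration <= max_s:
        raise ValueError(f"reference is {duration:.1f}s; aim for {min_s:.0f}-{max_s:.0f}s")
    if not transcript.strip():
        raise ValueError("transcript must match the spoken audio and cannot be empty")
    return {
        "id": ref_id,
        "text": transcript,
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
    }
```

The transcript check is deliberately shallow. It cannot verify that the text matches what is spoken, and a mismatch is a common cause of poor clones.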

Sample response:

{
  "success": true,
  "message": "Reference added successfully",
  "reference_id": "my-voice-clone"
}

Use the reference in TTS:

{
  "text": "This sentence will be spoken in the cloned voice.",
  "reference_id": "my-voice-clone",
  "format": "mp3"
}

Apidog's Reference Management lets you save these as templates—just change the reference_id to test other voices.

How to Unit Test Fish Audio S2 API Integrations

Why Unit Tests Matter

Fish Audio S2 API integrations can fail if reference IDs expire, parameters go out of range, streaming payloads are mishandled, or formats mismatch. Automated unit tests catch these regressions before they impact users.

Writing Unit Tests for Fish Audio S2 API with Python

Below is a pytest suite using httpx for core endpoint coverage:

import base64
from pathlib import Path

import httpx
import pytest

BASE_URL = "http://127.0.0.1:8080"
API_KEY = "YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
# A 10–30 s WAV clip stored next to this test file; its transcript is below.
REFERENCE_WAV = Path(__file__).with_name("test_reference.wav")


class TestFishAudioS2API:
    def test_health_check(self):
        response = httpx.get(f"{BASE_URL}/v1/health", headers=HEADERS)
        assert response.status_code == 200
        assert response.json()["status"] == "ok"

    def test_tts_basic_request(self):
        # A fixed seed keeps the output reproducible across runs.
        payload = {
            "text": "Unit test: verifying Fish Audio S2 API TTS output.",
            "format": "wav",
            "seed": 42,
        }
        response = httpx.post(
            f"{BASE_URL}/v1/tts",
            json=payload,
            headers=HEADERS,
            timeout=60,
        )
        assert response.status_code == 200
        # startswith tolerates parameters appended to the header value.
        assert response.headers["content-type"].startswith("audio/wav")
        # Real speech is far larger than a bare WAV header.
        assert len(response.content) > 1000

    def test_tts_invalid_temperature_raises_error(self):
        # temperature is documented as 0.1–1.0, so 99.0 should fail validation.
        payload = {"text": "test", "temperature": 99.0}
        response = httpx.post(
            f"{BASE_URL}/v1/tts",
            json=payload,
            headers=HEADERS,
            timeout=30,
        )
        assert response.status_code == 422

    def test_reference_add_and_list(self):
        if not REFERENCE_WAV.exists():
            pytest.skip(f"missing fixture audio: {REFERENCE_WAV}")
        audio_b64 = base64.b64encode(REFERENCE_WAV.read_bytes()).decode()

        try:
            add_response = httpx.post(
                f"{BASE_URL}/v1/references/add",
                json={
                    "id": "unit-test-voice",
                    # Must match what the fixture clip actually says.
                    "text": "This is a unit test reference audio.",
                    "audio": audio_b64,
                },
                headers=HEADERS,
                timeout=60,  # encoding the reference can exceed httpx's 5 s default
            )
            assert add_response.status_code == 200
            assert add_response.json()["success"] is True

            list_response = httpx.get(
                f"{BASE_URL}/v1/references/list", headers=HEADERS
            )
            assert "unit-test-voice" in list_response.json()["reference_ids"]
        finally:
            # Always remove the test voice, even when an assertion above fails.
            httpx.request(
                "DELETE",
                f"{BASE_URL}/v1/references/delete",
                json={"reference_id": "unit-test-voice"},
                headers=HEADERS,
            )

Run tests:

pytest test_fish_audio_s2_api.py -v

Running Fish Audio S2 API Unit Tests with Apidog

Apidog's Test Scenarios feature automates these endpoint checks visually:

Open your Fish Audio S2 API collection
Click Test Scenarios → New Scenario
Add requests: health check → TTS → reference add → reference list
In Assertions for the TTS request, add:
- Status = 200
- Header content-type contains audio
- Response time < 30000ms
Click Run to execute

Apidog provides pass/fail reports, response timings, and diffs. Export or schedule runs on CI—no code or test framework setup required.

Advanced Fish Audio S2 API Features

Streaming Audio from Fish Audio S2 API

For real-time playback, set "streaming": true:

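import httpx

with httpx.stream(
    "POST",
    "http://127.0.0.1:8080/v1/tts",
    json={
        "text": "Streaming audio from the Fish Audio S2 API in real time.",
        "format": "wav",
        "streaming": True,
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=None,  # long texts can stream for longer than httpx's 5 s default
) as response:
    response.raise_for_status()  # fail fast instead of saving an error body as .wav
    with open("streamed_output.wav", "wb") as audio_file:
        # Write each chunk as it arrives rather than buffering the whole file.
        for chunk in response.iter_bytes(chunk_size=4096):
            audio_file.write(chunk)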

With streaming enabled, the first audio bytes arrive within about 100 ms (time-to-first-audio), which makes streaming suitable for live voice apps.

Inline Emotion Control via Fish Audio S2 API

Pass emotion tags within the text:

{
  "text": "[whispers] The secret is hidden here. [super happy] I found it!",
  "format": "wav"
}

No extra parameters needed. Supported tags include [laugh], [cough], [pitch up], [professional broadcast tone], [whisper in small voice].
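When tagged text is assembled programmatically, for example dialogue lines that each carry a mood, a small helper keeps the syntax consistent. This is a sketch, and `tag_segments` is hypothetical. Because tags are free-form natural language, it only enforces the bracket format, not which tags the model understands.

```python
def tag_segments(segments):
    """Join (tag, text) pairs into one string with inline [tag] markers; tag=None leaves text untagged."""
    parts = []
    for tag, text in segments:
        text = text.strip()
        if tag:
            if "[" in tag or "]" in tag:
                raise ValueError(f"tag {tag!r} must not contain brackets")
            parts.append(f"[{tag}] {text}")
        else:
            parts.append(text)
    return " ".join(parts)


print(tag_segments([("whispers", "The secret is hidden here."), ("super happy", "I found it!")]))
# [whispers] The secret is hidden here. [super happy] I found it!
```

Pass the result as the "text" field of a normal /v1/tts request body.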

Conclusion

Fish Audio S2 API exposes a production-grade TTS engine via clean REST endpoints. With support for voice cloning, streaming, and flexible emotion control, it fits a wide range of real-world voice generation tasks. For robust integration:

Set sampling parameters (temperature, top_p, repetition_penalty)
Manage reference audio lifecycles
Maintain a unit test suite for endpoints

Apidog streamlines your workflow: send your first API request in minutes, listen to outputs inline, auto-generate client code, and run automated endpoint tests—no setup required. When sharing specs or documenting for your team, Apidog keeps everything synchronized.

Download Apidog for free and import the Fish Audio S2 API collection to start testing now.

FAQ

What is the Fish Audio S2 API?

The Fish Audio S2 API is the REST interface to Fish Speech S2-Pro, a 4B-parameter text-to-speech model trained on 10 million hours of audio. It supports voice cloning, streaming, emotion control, and 50+ languages via HTTP endpoints under /v1/.

How do I authenticate with the Fish Audio S2 API?

Send a Bearer token in every request header: Authorization: Bearer YOUR_API_KEY. The API key is set at server startup with --api-key. Apidog lets you store this at the environment level for all requests.

Can I unit test Fish Audio S2 API integrations without writing code?

Yes. Apidog's Test Scenarios feature lets you visually build and run unit tests against any endpoint. Define assertions (status, response time, headers) and Apidog executes on demand or CI—no test framework needed.

What audio formats does the Fish Audio S2 API support?

WAV, MP3, or PCM. Set the format with the "format" field in your TTS request. Default is WAV.

How does voice cloning work in Fish Audio S2 API?

Upload a 10–30 second reference audio and its transcript using POST /v1/references/add. Pass the reference ID in TTS requests via "reference_id". No fine-tuning required.

What is the real-time factor of the Fish Audio S2 API?

On a single NVIDIA H200, RTF is 0.195 with streaming enabled—about 5 seconds of audio per second of compute. Time-to-first-audio is ~100ms.

How do I test Fish Audio S2 API responses in Apidog?

When the API returns binary audio, Apidog renders an inline audio player. Listen, inspect headers, and add assertions—no file saving required.