Wanda

Posted on • Originally published at apidog.com

How to Use Fish Audio S2 API: A Complete Guide with Apidog

Fish Audio S2 API is a production-grade text-to-speech REST API powered by a 4-billion-parameter model trained on 10 million hours of audio. It supports voice cloning, streaming, and 50+ languages. To use the Fish Audio S2 API efficiently—including sending requests, managing references, and running unit tests—Apidog is the fastest way to explore, document, and validate every endpoint.

Try Apidog today

Introduction

Modern TTS models have evolved: they're no longer robotic; they whisper, laugh, and shift tone. Fish Audio S2 API is part of this new wave—a 4B-parameter model trained on 10M+ hours of multilingual audio, producing speech almost indistinguishable from a real human.

If you're building podcast automation, interactive assistants, or real-time dubbing, integrating Fish Audio S2 API requires more than a POST request. You must handle authentication, reference audio, streaming, and robust unit testing to ensure production reliability.

💡 Tip: Before your first Fish Audio S2 API call, download Apidog for free. Visually test emotion tags, streaming chunks, voice cloning payloads, and binary audio responses—no code needed. Mock, validate, and listen inline so your TTS integration works from day one.

What Is Fish Audio S2 API?

Fish Audio S2 API is the HTTP interface to Fish Speech S2-Pro, an open-source TTS system using a Dual-Autoregressive (Dual-AR) architecture. The model splits semantic generation (4B params, slow AR over time) from residual codebook generation (400M params, fast AR over depth), enabling high-quality synthesis at a real-time factor (RTF) of 0.195 on a single NVIDIA H200.
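To make the real-time factor concrete: RTF is compute time divided by the duration of the audio produced, so any value below 1 means synthesis runs faster than playback. The sketch below only does the arithmetic for the 0.195 figure quoted above; the function names are illustrative and not part of any Fish Audio tooling.

```python
def audio_seconds_per_compute_second(rtf: float) -> float:
    """Invert the real-time factor: seconds of audio produced per second of compute."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return 1.0 / rtf


def compute_seconds_for(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time needed to synthesize a clip of the given length."""
    return audio_seconds * rtf


throughput = audio_seconds_per_compute_second(0.195)
one_minute_cost = compute_seconds_for(60.0, 0.195)
print(f"{throughput:.2f} s of audio per second of compute")
print(f"{one_minute_cost:.1f} s of compute for a 60 s clip")
```

At RTF 0.195 this works out to a little over 5 seconds of audio per second of compute, or under 12 seconds of compute for a one-minute clip on the quoted hardware.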

Key Capabilities:

| Feature | Detail |
| --- | --- |
| Languages | ~50 (English, Chinese, Japanese, Korean, Arabic, French, German, etc.) |
| Voice cloning | 10–30 second reference audio, no fine-tuning required |
| Inline emotion control | Natural-language tags: [laugh], [whispers], [super happy] |
| Multi-speaker generation | Native `<\ |
| Streaming | Real-time chunked audio via "streaming": true |
| Output formats | WAV, MP3, PCM |
| Authentication | Bearer token (Authorization: Bearer YOUR_API_KEY) |

The API base URL after local deployment is http://127.0.0.1:8080; all endpoints are under the /v1/ namespace.

Getting Started with Fish Audio S2 API and Apidog

Prerequisites for Fish Audio S2 API

You'll need:

  • A running Fish Speech S2-Pro server
  • An API client capable of handling binary audio

Start the server:

python tools/api_server.py \
  --llama-checkpoint-path checkpoints/s2-pro \
  --decoder-checkpoint-path checkpoints/s2-pro/codec.pth \
  --listen 0.0.0.0:8080 \
  --compile \
  --half \
  --api-key YOUR_API_KEY \
  --workers 4
  • --compile enables torch.compile for ~10x faster inference (with one-time warmup).
  • --half uses FP16 for reduced GPU memory.

Health check:

curl http://127.0.0.1:8080/v1/health
# {"status":"ok"}
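Because --compile adds a one-time warmup, the server may not answer immediately after launch. A minimal sketch of a readiness poll against the health endpoint, using only the standard library; the helper names, retry count, and delay are assumptions for illustration, and the Authorization header is sent on the assumption that the server checks the key on every route.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8080"
API_KEY = "YOUR_API_KEY"


def fetch_health(base_url: str = BASE_URL, api_key: str = API_KEY) -> dict:
    """GET /v1/health and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{base_url}/v1/health",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll until the server reports {"status": "ok"} or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: not ready yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling wait_until_healthy() before the first TTS request, for example at the start of a test session, avoids spurious connection errors while the model is still loading.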

Setting Up Fish Audio S2 API in Apidog

  1. Download Apidog for free and create a new HTTP project.
  2. Set the base URL to http://127.0.0.1:8080 under Environments.
  3. Add a global header:
   Authorization: Bearer YOUR_API_KEY

This global API key applies to all requests—no manual header setup per request. Easily switch between dev, staging, and production environments.
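The same idea of one base URL and one key per environment carries over to client code. The sketch below is one way to model it in Python; the Environment class and the FISH_<NAME>_URL / FISH_<NAME>_KEY variable names are hypothetical, chosen only for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """One deployment target: where the API lives and which key to send."""
    name: str
    base_url: str
    api_key: str

    def url(self, path: str) -> str:
        """Join the base URL and an endpoint path such as "/v1/tts"."""
        return f"{self.base_url.rstrip('/')}/{path.lstrip('/')}"

    def headers(self) -> dict:
        """Bearer-token header sent with every request."""
        return {"Authorization": f"Bearer {self.api_key}"}


def load_environment(name: str) -> Environment:
    """Read FISH_<NAME>_URL and FISH_<NAME>_KEY (hypothetical variable names)."""
    prefix = f"FISH_{name.upper()}"
    return Environment(
        name=name,
        # Fall back to the local deployment described in this guide.
        base_url=os.environ.get(f"{prefix}_URL", "http://127.0.0.1:8080"),
        api_key=os.environ.get(f"{prefix}_KEY", "YOUR_API_KEY"),
    )
```

With this in place, load_environment("dev").url("/v1/tts") and load_environment("staging").url("/v1/tts") differ only in the values read from the process environment, which keeps keys out of source control.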

Making Your First Fish Audio S2 API Request in Apidog

Testing the Fish Audio S2 API Text-to-Speech Endpoint

  • Endpoint: POST /v1/tts
  • In Apidog: Create a new POST request, set the URL, and use this JSON body:
{
  "text": "Hello! This is a test of the Fish Audio S2 API.",
  "format": "wav",
  "streaming": false,
  "temperature": 0.8,
  "top_p": 0.8,
  "repetition_penalty": 1.1,
  "max_new_tokens": 1024
}

Full TTS request schema:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to synthesize |
| format | string | "wav" | Output format: wav, mp3, pcm |
| chunk_length | int | 200 | Synthesis chunk size (100–300) |
| seed | int | null | Fix seed for reproducibility |
| streaming | bool | false | Return audio in real-time chunks |
| max_new_tokens | int | 1024 | Max tokens to generate |
| temperature | float | 0.8 | Sampling randomness (0.1–1.0) |
| top_p | float | 0.8 | Nucleus sampling threshold (0.1–1.0) |
| repetition_penalty | float | 1.1 | Penalize repeated sequences (0.9–2.0) |
| use_memory_cache | string | "off" | Cache reference encoding in memory |
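The ranges in the schema can also be checked on the client before a request is sent, which gives clearer errors than waiting for the server to reject the payload. A minimal sketch, assuming the ranges listed above are inclusive; validate_tts_payload is a hypothetical helper, not part of the API.

```python
# (min, max) ranges copied from the parameter table; assumed inclusive.
RANGES = {
    "chunk_length": (100, 300),
    "temperature": (0.1, 1.0),
    "top_p": (0.1, 1.0),
    "repetition_penalty": (0.9, 2.0),
}
FORMATS = {"wav", "mp3", "pcm"}


def validate_tts_payload(payload: dict) -> list:
    """Return human-readable problems; an empty list means the payload looks valid."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text is required and must be a non-empty string")
    if payload.get("format", "wav") not in FORMATS:
        problems.append(f"format must be one of {sorted(FORMATS)}")
    for name, (low, high) in RANGES.items():
        # Only check parameters the caller actually set; defaults are in range.
        if name in payload and not (low <= payload[name] <= high):
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

For example, validate_tts_payload({"text": "Hello", "temperature": 99.0}) reports the temperature problem without touching the network.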

In Apidog: Click Send. The API returns raw audio bytes (audio/wav). Apidog will display an inline audio player so you can listen to the output instantly.
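The same request can be sent from a script. The sketch below uses only the Python standard library so it runs without extra dependencies; build_tts_request and synthesize_to_file are illustrative names, and the URL, key, and fields are the ones used in this guide.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"
API_KEY = "YOUR_API_KEY"


def build_tts_request(text: str, audio_format: str = "wav", **options) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/tts request with a JSON body."""
    payload = {"text": text, "format": audio_format, "streaming": False, **options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text: str, path: str, **options) -> int:
    """Send the request and write the returned audio bytes to disk.

    Returns the number of bytes written. Requires a running server.
    """
    request = build_tts_request(text, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

With the server running, synthesize_to_file("Hello! This is a test.", "hello.wav", temperature=0.8) saves the response body, which is the raw audio, to hello.wav.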

Voice Cloning with Fish Audio S2 API

Uploading a Reference Audio to Fish Audio S2 API via Apidog

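The API supports zero-shot voice cloning: you supply a reference audio sample and its transcript, and the model synthesizes new text in that voice. References can be passed via the references field in TTS requests, or uploaded once and then reused by ID, which is the workflow shown here. In both cases the audio is sent base64-encoded together with its transcript.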

Upload a reference:

  • Endpoint: POST /v1/references/add
{
  "id": "my-voice-clone",
  "text": "This is the reference transcription matching the audio.",
  "audio": "<base64-encoded-wav-bytes>"
}

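In Apidog: pick the body type that matches how you send the reference. For the JSON payload shown above, choose JSON and paste the base64 string into "audio". If your server build accepts a multipart upload instead, choose Form Data and attach the audio file alongside the id and text fields.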
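Producing the base64 string by hand is error-prone, so it helps to build the payload in code. The sketch below encodes a WAV clip and checks its length with the standard wave module first. Treating the 10–30 second guidance as a hard limit is this example's assumption, not a documented server rule, and the helper names are hypothetical.

```python
import base64
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of an uncompressed PCM WAV clip, read from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def build_reference_payload(ref_id: str, transcript: str, wav_bytes: bytes) -> dict:
    """Build the JSON body for POST /v1/references/add."""
    duration = wav_duration_seconds(wav_bytes)
    # Assumption: reject clips outside the recommended 10-30 second window.
    if not 10.0 <= duration <= 30.0:
        raise ValueError(f"reference clip is {duration:.1f}s; expected 10-30 seconds")
    return {
        "id": ref_id,
        "text": transcript,
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
    }
```

Reading the file with open("test_reference.wav", "rb").read() and passing the bytes to build_reference_payload yields a dictionary that can be sent as the JSON body of the upload request.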

Sample response:

{
  "success": true,
  "message": "Reference added successfully",
  "reference_id": "my-voice-clone"
}

Use the reference in TTS:

{
  "text": "This sentence will be spoken in the cloned voice.",
  "reference_id": "my-voice-clone",
  "format": "mp3"
}

Apidog's Reference Management lets you save these as templates—just change the reference_id to test other voices.

How to Unit Test Fish Audio S2 API Integrations

Why Unit Tests Matter

Fish Audio S2 API integrations can fail if reference IDs expire, parameters go out of range, streaming payloads are mishandled, or formats mismatch. Automated unit tests catch these regressions before they impact users.

Writing Unit Tests for Fish Audio S2 API with Python

Below is a pytest suite using httpx for core endpoint coverage:

import pytest
import httpx
import base64

BASE_URL = "http://127.0.0.1:8080"
API_KEY = "YOUR_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

class TestFishAudioS2API:
    def test_health_check(self):
        response = httpx.get(f"{BASE_URL}/v1/health", headers=HEADERS)
        assert response.status_code == 200
        assert response.json()["status"] == "ok"

    def test_tts_basic_request(self):
        payload = {
            "text": "Unit test: verifying Fish Audio S2 API TTS output.",
            "format": "wav",
            "seed": 42,
        }
        response = httpx.post(
            f"{BASE_URL}/v1/tts",
            json=payload,
            headers=HEADERS,
            timeout=60,
        )
        assert response.status_code == 200
        # Prefix match tolerates extra parameters appended to the media type.
        assert response.headers["content-type"].startswith("audio/wav")
        assert len(response.content) > 1000

    def test_tts_invalid_temperature_raises_error(self):
        payload = {"text": "test", "temperature": 99.0}
        response = httpx.post(
            f"{BASE_URL}/v1/tts",
            json=payload,
            headers=HEADERS,
            timeout=30,
        )
        assert response.status_code == 422

    def test_reference_add_and_list(self):
        # Requires a local test_reference.wav whose speech matches the transcript below.
        with open("test_reference.wav", "rb") as f:
            audio_b64 = base64.b64encode(f.read()).decode()

        add_response = httpx.post(
            f"{BASE_URL}/v1/references/add",
            json={
                "id": "unit-test-voice",
                "text": "This is a unit test reference audio.",
                "audio": audio_b64,
            },
            headers=HEADERS,
            timeout=60,  # httpx defaults to 5 s, which may be too short for encoding
        )
        try:
            assert add_response.json()["success"] is True

            list_response = httpx.get(
                f"{BASE_URL}/v1/references/list", headers=HEADERS
            )
            assert "unit-test-voice" in list_response.json()["reference_ids"]
        finally:
            # Always clean up so a failed assertion does not leave the reference behind.
            httpx.request(
                "DELETE",
                f"{BASE_URL}/v1/references/delete",
                json={"reference_id": "unit-test-voice"},
                headers=HEADERS,
            )

Run tests:

pytest test_fish_audio_s2_api.py -v
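The suite above needs a live server with the model loaded. For client-side logic that should run on machines without a GPU, one option is a small stub that imitates the endpoints. The sketch below is a hypothetical stand-in built on the standard library's http.server; it returns canned responses only, and its 422 for a missing text field is a simplification that does not replicate the real server's validation.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubFishHandler(BaseHTTPRequestHandler):
    """Canned /v1/health and /v1/tts responses for offline client tests."""

    def _send(self, status: int, content_type: str, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/v1/health":
            self._send(200, "application/json", json.dumps({"status": "ok"}).encode())
        else:
            self._send(404, "application/json", b"{}")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/v1/tts" and payload.get("text"):
            # Fake audio: a RIFF tag plus padding, NOT a playable WAV file.
            self._send(200, "audio/wav", b"RIFF" + b"\x00" * 2000)
        else:
            self._send(422, "application/json", b'{"detail": "invalid request"}')

    def log_message(self, *args):
        pass  # keep test output quiet


def start_stub_server() -> HTTPServer:
    """Start the stub on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StubFishHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing BASE_URL at f"http://127.0.0.1:{server.server_port}" lets the health and basic TTS tests run against the stub. Tests that depend on real validation or real audio, such as the invalid-temperature check, still need the actual server.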

Running Fish Audio S2 API Unit Tests with Apidog

Apidog's Test Scenarios feature automates endpoint checks visually:

  1. Open your Fish Audio S2 API collection
  2. Click Test Scenarios → New Scenario
  3. Add requests: health check → TTS → reference add → reference list
  4. In Assertions for the TTS request, add:
    • Status = 200
    • Header content-type contains audio
    • Response time < 30000ms
  5. Click Run to execute

Apidog provides pass/fail reports, response timings, and diffs. Export or schedule runs on CI—no code or test framework setup required.
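For teams that also want the three TTS assertions from step 4 in code, they translate directly into a small check. A minimal sketch with hypothetical helper names; the thresholds mirror the scenario settings listed above.

```python
import time


def timed_call(fn):
    """Run fn() and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn()
    return result, (time.perf_counter() - start) * 1000.0


def check_tts_response(status: int, content_type: str, elapsed_ms: float) -> list:
    """Apply the three scenario assertions; return the names of any that fail."""
    failures = []
    if status != 200:
        failures.append("status")
    if "audio" not in content_type.lower():
        failures.append("content-type")
    if elapsed_ms >= 30000:
        failures.append("response-time")
    return failures
```

Wrapping the request in timed_call and passing the status code, Content-Type header, and elapsed time to check_tts_response gives the same pass/fail signal as the visual scenario: an empty list means all three assertions hold.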

Advanced Fish Audio S2 API Features

Streaming Audio from Fish Audio S2 API

For real-time playback, set "streaming": true:

import httpx

with httpx.stream(
    "POST",
    "http://127.0.0.1:8080/v1/tts",
    json={
        "text": "Streaming audio from the Fish Audio S2 API in real time.",
        "format": "wav",
        "streaming": True,
    },
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=None,
) as response:
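    response.raise_for_status()  # fail fast instead of saving an error body as audio
    with open("streamed_output.wav", "wb") as audio_file:
        # Write each chunk as it arrives rather than buffering the whole response.
        for chunk in response.iter_bytes(chunk_size=4096):
            audio_file.write(chunk)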

With streaming enabled, the first audio bytes arrive within ~100 ms, which makes the API suitable for live voice apps.
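Latency to the first chunk is worth measuring on your own hardware rather than taking on trust. The sketch below times any iterator of byte chunks, so it works with response.iter_bytes() from the streaming example; measure_stream is a hypothetical helper, and the injectable clock exists only to make it testable.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
    started_at: Optional[float] = None,
) -> Tuple[Optional[float], int]:
    """Consume a chunk iterator.

    Returns (seconds until the first non-empty chunk, total bytes received).
    The first value is None if no audio arrived at all.
    """
    start = clock() if started_at is None else started_at
    first_chunk_after = None
    total_bytes = 0
    for chunk in chunks:
        if chunk and first_chunk_after is None:
            first_chunk_after = clock() - start
        total_bytes += len(chunk)
    return first_chunk_after, total_bytes
```

By default the timer starts when iteration begins, after the response headers have already arrived. To include connection and request time, record time.perf_counter() before opening the stream and pass it as started_at.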

Inline Emotion Control via Fish Audio S2 API

Pass emotion tags within the text:

{
  "text": "[whispers] The secret is hidden here. [super happy] I found it!",
  "format": "wav"
}

No extra parameters needed. Supported tags include [laugh], [cough], [pitch up], [professional broadcast tone], [whisper in small voice].
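Since tags are plain text inside the "text" field, a few string helpers are enough to compose tagged scripts and to recover a clean transcript for display. A minimal sketch; the helper names are hypothetical, and the code does not check whether the model recognizes a given tag.

```python
import re

# Matches one bracketed tag such as [whispers] or [super happy].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def with_tag(tag: str, sentence: str) -> str:
    """Prefix a sentence with an inline tag."""
    return f"[{tag.strip()}] {sentence.strip()}"


def extract_tags(text: str) -> list:
    """List the bracketed tags in a text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def strip_tags(text: str) -> str:
    """Remove tags and collapse whitespace, e.g. to show a plain transcript."""
    return " ".join(TAG_PATTERN.sub(" ", text).split())


script = " ".join([
    with_tag("whispers", "The secret is hidden here."),
    with_tag("super happy", "I found it!"),
])
print(script)
print(extract_tags(script))
print(strip_tags(script))
```

The composed script is the same string used in the JSON example above, so it can be placed directly in the "text" field of a TTS request.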

Conclusion

Fish Audio S2 API exposes a production-grade TTS engine via clean REST endpoints. With support for voice cloning, streaming, and flexible emotion control, it fits a wide range of real-world voice generation tasks. For robust integration:

  • Set sampling parameters (temperature, top_p, repetition_penalty)
  • Manage reference audio lifecycles
  • Maintain a unit test suite for endpoints

Apidog streamlines your workflow: send your first API request in minutes, listen to outputs inline, auto-generate client code, and run automated endpoint tests—no setup required. When sharing specs or documenting for your team, Apidog keeps everything synchronized.

Download Apidog for free and import the Fish Audio S2 API collection to start testing now.

FAQ

What is the Fish Audio S2 API?

The Fish Audio S2 API is the REST interface to Fish Speech S2-Pro, a 4B-parameter text-to-speech model trained on 10 million hours of audio. It supports voice cloning, streaming, emotion control, and 50+ languages via HTTP endpoints under /v1/.

How do I authenticate with the Fish Audio S2 API?

Send a Bearer token in every request header: Authorization: Bearer YOUR_API_KEY. The API key is set at server startup with --api-key. Apidog lets you store this at the environment level for all requests.

Can I unit test Fish Audio S2 API integrations without writing code?

Yes. Apidog's Test Scenarios feature lets you visually build and run tests against any endpoint. Define assertions (status, response time, headers) and Apidog runs them on demand or in CI, with no test framework needed.

What audio formats does the Fish Audio S2 API support?

WAV, MP3, or PCM. Set the format with the "format" field in your TTS request. Default is WAV.

How does voice cloning work in Fish Audio S2 API?

Upload a 10–30 second reference audio and its transcript using POST /v1/references/add. Pass the reference ID in TTS requests via "reference_id". No fine-tuning required.

What is the real-time factor of the Fish Audio S2 API?

On a single NVIDIA H200, RTF is 0.195 with streaming enabled—about 5 seconds of audio per second of compute. Time-to-first-audio is ~100ms.

How do I test Fish Audio S2 API responses in Apidog?

When the API returns binary audio, Apidog renders an inline audio player. Listen, inspect headers, and add assertions—no file saving required.
