WonderLab

Open Source Project of the Day (Part 11): Supertonic - Lightning-Fast On-Device Multilingual TTS

Introduction

"What if speech synthesis could run on your device at 1000+ characters per second, completely offline, in 5 languages?"

This is Part 11 of the "Open Source Project of the Day" series. Today we explore Supertonic (GitHub).

Traditional TTS systems either rely on cloud APIs (with latency and privacy concerns) or run on-device with slow synthesis and poor quality. Supertonic uses ONNX Runtime to deliver fast, high-quality, fully on-device speech synthesis: 1000+ characters/second on an M1 Mac, 5 supported languages, and built-in text normalization, so raw text needs no preprocessing. Speech synthesis truly "flies."

What You'll Learn

  • Supertonic's core architecture and technical characteristics
  • How to use Supertonic for TTS across various platforms
  • The advantages and implementation of ONNX Runtime
  • How built-in text normalization works
  • Streaming processing and real-time speech synthesis
  • Comparative analysis with other TTS systems
  • How to start building applications with Supertonic

Prerequisites

  • Basic understanding of TTS (Text-to-Speech)
  • Familiarity with at least one programming language (Python, JavaScript, Swift, Java, etc.)
  • Basic understanding of ONNX concepts (optional)
  • Basic knowledge of on-device AI (optional)

Project Background

Project Introduction

Supertonic is a lightning-fast, on-device, multilingual Text-to-Speech (TTS) system designed for ultimate performance and minimal computational overhead. Running on ONNX Runtime, it operates entirely on-device — no cloud, no API calls, no privacy concerns.

Core problems the project solves:

  • Cloud TTS has latency and privacy issues
  • Traditional on-device TTS is slow and low quality
  • Lack of multilingual support
  • Text normalization requires preprocessing
  • Different platforms need different implementations

Target user groups:

  • Mobile app developers needing on-device TTS
  • Desktop app developers needing offline speech synthesis
  • Developers with privacy requirements
  • Internationalized app developers needing multilingual TTS
  • Developers requiring extreme performance

Author/Team Introduction

Team: Supertone Inc.

  • Background: Technology company focused on voice technology and AI
  • Contributors: 4 contributors, including the core development team
  • Philosophy: Build a blazing-fast, high-quality, fully on-device TTS system

Project creation date: 2024 (inferred from GitHub activity; the project is actively maintained)

Project Stats

  • ⭐ GitHub Stars: 2.6k+ (and growing)
  • 🍴 Forks: 232+
  • 📦 Version: v2.0.0 (latest version, released January 6, 2026)
  • 📄 License: MIT (code), OpenRAIL-M (model)
  • 🌐 Demo: Hugging Face Spaces
  • 📚 Documentation: GitHub README includes complete usage guides
  • 💬 Community: Active GitHub Issues

Project development history:

  • 2024: Project created, v1 released
  • 2024-2025: Continuous optimization, multilingual support added
  • 2026: v2.0.0 released (January 6) with significant performance improvements; ongoing iteration and growing community activity

Main Features

Core Purpose

Supertonic's core purpose is to provide a blazing-fast, high-quality, fully on-device TTS system, with main features including:

  1. Blazing-fast speech synthesis: Reaches 1000+ characters/second on an M1 Mac
  2. Multilingual support: Supports 5 languages: English, Chinese, Korean, Spanish, and Portuguese
  3. Intelligent text normalization: Built-in text normalization requiring no preprocessing
  4. Streaming processing: Supports streaming TTS for real-time speech synthesis
  5. Fully offline: No cloud required, runs entirely on-device

Use Cases

  1. Mobile applications

    • Reading assistant apps
    • Voice navigation apps
    • Accessibility apps
  2. Desktop applications

    • E-book readers
    • Document reading tools
    • Voice assistants
  3. Web applications

    • Browser extensions
    • Online speech synthesis services
    • Voice chat applications
  4. IoT devices

    • Smart speakers
    • Voice interaction devices
    • Edge computing devices

Quick Start

Installation

Supertonic supports multiple programming languages and platforms:

Python:

# Install the Python package (run this line in a shell, not in Python)
pip install supertonic

# Usage example (Python)
from supertonic import SupertonicTTS

tts = SupertonicTTS()
audio = tts.synthesize("Hello, world!")

JavaScript/Node.js:

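# Install the npm package (run this line in a shell)
npm install supertonic

// Usage example (Node.js, CommonJS)
const { SupertonicTTS } = require('supertonic');

// Top-level await is not available in CommonJS, so wrap the call in an async function
(async () => {
    const tts = new SupertonicTTS();
    const audio = await tts.synthesize("Hello, world!");
})();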

Other platforms:

  • C++: Use the implementation in the cpp directory
  • Swift: Use the implementation in the swift directory
  • Java: Use the implementation in the java directory
  • C#: Use the implementation in the csharp directory
  • Go: Use the implementation in the go directory
  • Rust: Use the implementation in the rust directory
  • Flutter: Use the implementation in the flutter directory
  • Web: Use the implementation in the web directory

Simplest Usage Examples

Python example:

from supertonic import SupertonicTTS

# Initialize TTS engine
tts = SupertonicTTS()

# Synthesize speech
text = "Supertonic is a lightning-fast, on-device TTS system."
audio = tts.synthesize(text)

# Save audio file (this assumes synthesize() returns WAV-encoded bytes)
with open("output.wav", "wb") as f:
    f.write(audio)

JavaScript example:

const { SupertonicTTS } = require('supertonic');

async function synthesize() {
    const tts = new SupertonicTTS();
    const audio = await tts.synthesize("Supertonic is lightning-fast!");
    // Process audio data
    console.log("Audio generated:", audio.length, "bytes");
}

synthesize().catch(console.error);

Core Features

  • Blazing-fast performance: 1000+ characters/second on M1 Mac, far surpassing traditional TTS systems
  • Multilingual support: Supports 5 major international languages
  • Intelligent text normalization: Built-in text normalization handles numbers, dates, abbreviations, and complex expressions
  • Streaming processing: Supports streaming TTS for real-time speech synthesis
  • Fully offline: No cloud required, runs entirely on-device, protecting privacy
  • Cross-platform support: Supports C++, Swift, JavaScript, Java, C#, Go, Rust, Flutter, Web, and more
  • ONNX Runtime: Based on ONNX Runtime for efficient inference
  • High-quality speech: Generates natural, clear speech
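
To check a characters-per-second figure on your own hardware, a small timing helper is enough. The sketch below is a minimal example, not part of Supertonic: `chars_per_second` and `fake_synthesize` are hypothetical names, and the stand-in synthesizer only sleeps so the code runs anywhere. Swap in a real TTS call to get a meaningful number.

```python
import time
from typing import Callable


def chars_per_second(synthesize: Callable[[str], object], text: str, runs: int = 3) -> float:
    """Return the best observed throughput in input characters per second."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from a very coarse clock.
    return len(text) / max(best, 1e-9)


def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a real TTS call; it only simulates work."""
    time.sleep(0.01)
    return b"\x00" * len(text)


if __name__ == "__main__":
    sample = "Supertonic is a lightning-fast, on-device TTS system. " * 10
    print(f"{chars_per_second(fake_synthesize, sample):.0f} chars/sec")
```

Taking the best of several runs reduces noise from model warm-up and other processes; report the median instead if you care about typical rather than peak speed.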

Project Advantages

| Comparison | Supertonic | Cloud TTS | Traditional On-Device TTS |
|---|---|---|---|
| Speed | ✅ 1000+ chars/sec | ⚠️ Network-dependent | ❌ Slow |
| Privacy | ✅ Fully local | ❌ Data uploaded | ✅ Local |
| Latency | ✅ Ultra-low | ❌ Network latency | ⚠️ Moderate |
| Multilingual | ✅ 5 languages | ✅ Supported | ⚠️ Limited |
| Text normalization | ✅ Built-in intelligent processing | ⚠️ Preprocessing required | ❌ Preprocessing required |
| Offline use | ✅ Fully offline | ❌ Requires network | ✅ Offline |
| Cost | ✅ Free and open source | ❌ API fees | ✅ Free |

Why choose Supertonic?

Compared to cloud TTS and traditional on-device TTS, Supertonic provides blazing-fast performance, full offline capability, intelligent text normalization, and multilingual support — making it the ideal choice for on-device TTS.


Detailed Project Analysis

Architecture Design

Supertonic uses ONNX Runtime as its inference engine for efficient on-device TTS.

Core Architecture

Supertonic TTS System
├── Text Normalization
│   ├── Number processing
│   ├── Date/time processing
│   ├── Abbreviation expansion
│   └── Multilingual support
├── Text-to-Latent
│   ├── Flow Matching model
│   ├── Length-Aware RoPE
│   └── Text-speech alignment
├── Latent-to-Speech
│   ├── Speech Autoencoder
│   ├── Streaming processing
│   └── Audio generation
└── ONNX Runtime (inference engine)
    ├── Model optimization
    ├── Hardware acceleration
    └── Cross-platform support

ONNX Runtime Advantages

ONNX Runtime provides the following advantages:

  • Cross-platform: Unified model format, supports multiple platforms
  • Hardware acceleration: Supports GPU, NPU, and other hardware acceleration
  • Model optimization: Automatically optimizes model inference performance
  • Easy deployment: Models can be deployed directly after export
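
As an illustration of the hardware-acceleration point, ONNX Runtime lets you pass an ordered list of execution providers when creating a session. The sketch below is a generic example rather than Supertonic's own loading code: the preference order, the `pick_providers` and `load_session` helpers, and the model path you would pass in are all assumptions.

```python
from typing import List, Sequence

# Assumed preference order for illustration: accelerators first, CPU last.
PREFERRED = ("CoreMLExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str], preferred: Sequence[str] = PREFERRED) -> List[str]:
    """Keep the preferred providers that this machine actually offers."""
    chosen = [p for p in preferred if p in available]
    # Always fall back to the CPU provider if nothing else matched.
    return chosen or ["CPUExecutionProvider"]


def load_session(model_path: str):
    """Create an inference session for a local .onnx file (path is hypothetical)."""
    import onnxruntime as ort  # imported lazily so pick_providers works without it

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

ONNX Runtime tries providers in the order given, so listing the CPU provider last keeps the same code working on machines without a GPU or NPU.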

Text Normalization

Supertonic has built-in intelligent text normalization that handles:

  • Numbers: 123 → "one hundred twenty-three"
  • Dates: 2024-01-01 → "January first, twenty twenty-four"
  • Times: 2:30 → "two thirty"
  • Abbreviations: Dr. → "Doctor"
  • Units: 30kph → "thirty kilometers per hour"
  • Technical abbreviations: h → "hours"

Advantages:

  • No preprocessing required, directly handles raw text
  • Intelligently recognizes context for correct abbreviation expansion
  • Supports multiple languages, each with dedicated normalization rules
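
To make the idea concrete, here is a toy rule-based normalizer. It is only a sketch of what normalization does, not Supertonic's implementation, which the article describes as built in and context-aware. The `ABBREVIATIONS` table, the function names, and the 0 to 999 range are all assumptions chosen to keep the example short.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical abbreviation table; a real system needs context to disambiguate.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister"}


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand known abbreviations, then spell out standalone 1-3 digit integers."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)


if __name__ == "__main__":
    print(normalize("Dr. Kim has 123 patients."))
```

Even this small version shows why normalization is hard to do with rules alone: dates, times, and units such as "30kph" need their own patterns, and "Dr." can also mean "Drive".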

Streaming Processing

Supertonic supports streaming TTS for real-time speech synthesis:

Workflow:

  1. Text chunking
  2. Audio generation chunk by chunk
  3. Real-time audio stream output
  4. Low-latency response

Advantages:

  • Low latency, suitable for real-time applications
  • Low memory usage, suitable for mobile devices
  • Great user experience, fast response
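
The workflow above can be sketched as a Python generator that splits text into sentences and yields one audio chunk per sentence. This is a minimal illustration under stated assumptions: sentence-level chunking is a guess at a reasonable chunk size, and `echo_synthesize` is a hypothetical stand-in for a real synthesis call.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_tts(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio chunk by chunk so playback can start before the full text is done."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def echo_synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in: returns the text as bytes instead of audio."""
    return sentence.encode("utf-8")


if __name__ == "__main__":
    for chunk in stream_tts("Hello there. How are you? Fine!", echo_synthesize):
        print(len(chunk), "bytes ready for playback")
```

Because the generator holds only one chunk at a time, memory use stays flat regardless of input length, and the time to first audio depends on the first sentence rather than the whole document.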

Multilingual Support

Supertonic supports 5 languages:

English, Chinese, Korean, Spanish, and Portuguese

Each language has dedicated:

  • Text normalization rules
  • Speech models
  • Pronunciation dictionaries

Performance Optimization

Supertonic achieves blazing-fast performance through multiple techniques:

Model Optimization

  • Model compression: Reduce model size, improve inference speed
  • Quantization: Use INT8 quantization to boost speed while maintaining quality
  • Operator fusion: Merge multiple operators to reduce computational overhead
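
For readers new to quantization, the arithmetic behind INT8 can be shown in a few lines. The sketch below uses symmetric per-tensor quantization, a common generic scheme; whether Supertonic's models use this exact scheme is not stated in the project description, so treat it as an illustration only.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    original = [0.5, -1.0, 0.25]
    q, scale = quantize_int8(original)
    print(q, scale, dequantize(q, scale))
```

Each weight is stored in one byte instead of four, and the rounding error per weight is at most half the scale, which is why quality usually holds up when the weight range is not dominated by outliers.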

Hardware Acceleration

  • GPU acceleration: Leverage GPU parallel computing capabilities
  • NPU acceleration: Supports NPU hardware acceleration (e.g., Apple Neural Engine)
  • CPU optimization: SIMD optimization for CPUs

Inference Optimization

  • Batch processing: Process multiple requests in batches
  • Caching: Cache audio results for frequently used text
  • Preloading: Preload models into memory
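
Caching is the easiest of these to try in application code. A minimal sketch using Python's standard `functools.lru_cache` follows; `expensive_synthesize` is a hypothetical stand-in for a real TTS call, and the cache size of 256 is an arbitrary choice.

```python
from functools import lru_cache

# Counts how often the underlying synthesis actually runs.
CALLS = {"count": 0}


def expensive_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a real TTS call."""
    CALLS["count"] += 1
    return text.encode("utf-8")


@lru_cache(maxsize=256)
def cached_synthesize(text: str) -> bytes:
    """Return cached audio for text seen before; synthesize only on a miss."""
    return expensive_synthesize(text)


if __name__ == "__main__":
    cached_synthesize("Welcome back!")
    cached_synthesize("Welcome back!")  # served from the cache
    print("synthesis calls:", CALLS["count"])
```

This pays off for fixed prompts such as navigation phrases or UI messages. Free-form text rarely repeats, so the cache mostly costs memory there.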

Application Cases

Multiple projects are built on Supertonic:

  1. TLDRL: Chrome extension, free on-device TTS that can read any webpage aloud
  2. Read Aloud: Open-source TTS browser extension supporting Chrome and Edge
  3. PageEcho: iOS e-book reader app
  4. VoiceChat: On-device voice-to-voice LLM chatbot in the browser
  5. OmniAvatar: Generate talking avatar videos from photos and voice
  6. CopiloTTS: Kotlin multiplatform TTS SDK
  7. Voice Mixer: PyQt5 tool for mixing and modifying voice styles
  8. Supertonic MNN: Lightweight library based on MNN (fp32/fp16/int8)
  9. Transformers.js: Hugging Face's JS library with Supertonic support
  10. Pinokio: One-click local cloud for Mac, Windows, and Linux

Technical Papers

Supertonic is based on three core papers:

  1. SupertonicTTS: Main Architecture

    • Introduces the overall architecture of SupertonicTTS
    • Includes the speech autoencoder and Flow Matching-based text-to-latent module
    • Efficient design choices
  2. Length-Aware RoPE: Text-Speech Alignment

    • Proposes Length-Aware Rotary Position Embedding (LARoPE)
    • Improves text-speech alignment in cross-attention mechanisms
  3. Self-Purifying Flow Matching: Training with Noisy Labels

    • Describes the self-purification technique
    • Robust training of Flow Matching models using noisy or unreliable labels
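
For intuition about the Flow Matching module mentioned above, sampling from such a model amounts to integrating a velocity field from noise at t = 0 to data at t = 1. The sketch below is a generic Euler integrator with a toy velocity field in place of a learned network. It does not reproduce the papers' architecture, and all names in it are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int = 8) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Toy field for a straight-line path: it points from x toward a fixed target.

    A trained model would predict this velocity from text conditioning instead.
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


if __name__ == "__main__":
    print(euler_sample(make_toy_velocity([1.0, -2.0]), [0.0, 0.0]))
```

The number of steps is the main speed lever in this family of models: fewer steps mean fewer network evaluations per utterance, at some cost in accuracy when the learned paths are not straight.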

Who Should Use This

Supertonic is especially suitable for: Mobile app developers needing on-device TTS, desktop app developers needing offline speech synthesis, developers with privacy requirements, internationalized app developers needing multilingual TTS, developers requiring extreme performance, and developers needing real-time speech synthesis.

Not suitable for: Users who only need cloud TTS, scenarios that don't require multilingual support, extreme edge cases with strict model size constraints.


Visit my personal homepage for more useful knowledge and interesting products.
