Hi everyone! I'm a developer passionate about making AI accessible on low-resource devices. I've been working on speech synthesis for a while, and I wanted to share a project I've been building: TinyTTS.
The idea started from a simple frustration — I needed text-to-speech in a Node.js app, but every option either required Python, called a cloud API, or shipped a massive model. I thought: what if TTS could be as easy as npm install and just work offline?
So I built one from scratch.
## TL;DR
- 1.6M parameters — smallest TTS model I know of that still sounds natural
- ~3.4 MB ONNX model (auto-downloaded on first use)
- 44.1 kHz output, ~53x real-time on a laptop CPU
- Zero Python dependency — pure Node.js + ONNX Runtime
- Grapheme-to-phoneme (G2P) output is a 100% match with the Python version
```bash
npm install tiny-tts
```

```js
const TinyTTS = require('tiny-tts');

const tts = new TinyTTS();
await tts.speak('Hello world!', { output: 'hello.wav' });
```
## The Problem
Most TTS solutions for Node.js fall into one of these categories:
| Approach | Downside |
|---|---|
| Cloud APIs (Google, AWS, Azure) | Requires internet, costs money, privacy concerns |
| Python wrapper (Coqui, Bark, etc.) | Need Python installed, 100MB–1GB models |
| System TTS (say.js, espeak) | Robotic quality, platform-dependent |
| WebSocket to Python server | Extra infra, latency, complexity |
I wanted something that's `npm install` and done. Run it on a $5 VPS, a Raspberry Pi, or in a CI pipeline: no cloud, no Python, no hassle.
## The Architecture
TinyTTS is an end-to-end VITS-based model compressed down to just 1.62 million parameters:
```
Text → G2P → Phoneme IDs → ONNX Model → 44.1 kHz WAV
```
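To make the first stage concrete, here is a minimal sketch of what a dictionary-based G2P lookup can look like. The lexicon and phoneme IDs below are purely illustrative, not TinyTTS's actual inventory:

```js
// Illustrative G2P stage: words → phonemes → integer IDs.
// This lexicon and ID table are made up for the example;
// TinyTTS's real phoneme inventory differs.
const LEXICON = {
  hello: ['HH', 'AH', 'L', 'OW'],
  world: ['W', 'ER', 'L', 'D'],
};
const PHONEME_IDS = { HH: 1, AH: 2, L: 3, OW: 4, W: 5, ER: 6, D: 7 };

function textToPhonemeIds(text) {
  const words = text
    .toLowerCase()
    .replace(/[^a-z\s]/g, '') // strip punctuation
    .split(/\s+/)
    .filter(Boolean);
  const phonemes = words.flatMap((w) => LEXICON[w] ?? []);
  return phonemes.map((p) => PHONEME_IDS[p]);
}

console.log(textToPhonemeIds('Hello world!')); // [1, 2, 3, 4, 5, 6, 3, 7]
```

The resulting integer sequence is what gets fed into the ONNX model, which emits the waveform directly.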
### How small is 1.6M params?
| Model | Parameters | Size |
|---|---|---|
| TinyTTS | 1.6M | ~3.4 MB |
| Piper | ~63M | ~63 MB |
| Kokoro | 82M | ~330 MB |
| Coqui XTTS | 467M | ~1.8 GB |
### Benchmark (CPU only, same machine)
| Engine | Synthesis Time | Audio Duration | RTFx |
|---|---|---|---|
| TinyTTS (ONNX) | 92 ms | 4.88s | ~53x |
| Piper (ONNX) | 112 ms | 2.91s | ~26x |
| Kokoro ONNX | 933 ms | 3.16s | ~3x |
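RTFx here is simply audio duration divided by synthesis time, i.e. how many seconds of audio you get per second of compute. A tiny helper makes the table easy to verify:

```js
// Real-time factor: seconds of audio produced per second of compute.
// Higher is faster; 1x means synthesis takes as long as playback.
function rtfx(synthesisMs, audioSeconds) {
  return (audioSeconds * 1000) / synthesisMs;
}

console.log(rtfx(92, 4.88).toFixed(1));  // 53.0 (TinyTTS row)
console.log(rtfx(112, 2.91).toFixed(1)); // 26.0 (Piper row)
console.log(rtfx(933, 3.16).toFixed(1)); // 3.4  (Kokoro row)
```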
## Usage
### API
```js
const TinyTTS = require('tiny-tts');

const tts = new TinyTTS();

// Basic synthesis to a WAV file
await tts.speak('Hello world!', { output: 'hello.wav' });

// Adjust the speaking rate
await tts.speak('This is faster.', {
  output: 'fast.wav',
  speed: 1.5
});

// Release resources when done
await tts.dispose();
```
### CLI
```bash
npx tiny-tts "The weather is nice today." -o weather.wav
npx tiny-tts "Quick test" -o test.wav --speed 1.3
```
### Python
Also available on PyPI with identical output:
```bash
pip install tiny-tts
```

```python
from tiny_tts import TinyTTS

tts = TinyTTS()
tts.speak("Hello world!", output_path="hello.wav")
```
## What's Next
This is just the beginning. Here's what I'm working on:
- Improve voice quality — better prosody, more natural intonation, reduce artifacts while keeping the model tiny
- More voices — different speakers, genders, and speaking styles
- Multi-language support — expanding beyond English to other languages
## Links
- npm: npmjs.com/package/tiny-tts
- PyPI: pypi.org/project/tiny-tts
- GitHub: github.com/tronghieuit/tiny-tts
- Live Demo: huggingface.co/spaces/backtracking/tiny-tts-demo
If you've read this far — try it out and let me know what you think! I'm especially curious about edge use cases: IoT, CI/CD audio generation, accessibility tools, game dev, etc.