DEV Community

tronghieuit

I Built a 1.6M-Parameter Offline Text-to-Speech Engine for Node.js — Here's How

Hi everyone! I'm a developer passionate about making AI accessible on low-resource devices. I've been working on speech synthesis for a while, and I wanted to share a project I've been building: TinyTTS.

The idea started from a simple frustration — I needed text-to-speech in a Node.js app, but every option either required Python, called a cloud API, or shipped a massive model. I thought: what if TTS could be as easy as npm install and just work offline?

So I built one from scratch.

TL;DR

  • 1.6M parameters — smallest TTS model I know of that still sounds natural
  • ~3.4 MB ONNX model (auto-downloaded on first use)
  • 44.1 kHz output, ~53x real-time on a laptop CPU
  • Zero Python dependency — pure Node.js + ONNX Runtime
  • 100% G2P match with the Python version
```shell
npm install tiny-tts
```

```javascript
const TinyTTS = require('tiny-tts');
const tts = new TinyTTS();
await tts.speak('Hello world!', { output: 'hello.wav' });
```

The Problem

Most TTS solutions for Node.js fall into one of these categories:

| Approach | Downside |
| --- | --- |
| Cloud APIs (Google, AWS, Azure) | Requires internet, costs money, privacy concerns |
| Python wrappers (Coqui, Bark, etc.) | Needs Python installed, 100 MB–1 GB models |
| System TTS (say.js, espeak) | Robotic quality, platform-dependent |
| WebSocket to a Python server | Extra infra, latency, complexity |

I wanted something that's npm install and done. Run on a $5 VPS, a Raspberry Pi, or in a CI pipeline — no cloud, no Python, no hassle.


The Architecture

TinyTTS is an end-to-end VITS-based model compressed down to just 1.62 million parameters:

```
Text → G2P → Phoneme IDs → ONNX Model → 44.1 kHz WAV
```
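To make the phoneme-ID stage of this pipeline concrete, here's a toy sketch of that step. The phoneme inventory and ID table below are made up for illustration; the real model ships its own mapping.

```javascript
// Toy illustration of the "Phoneme IDs" stage: map phoneme symbols
// (output of G2P) to the integer IDs a TTS model consumes.
// This table is hypothetical, not the real tiny-tts inventory.
const PHONEME_IDS = { _: 0, h: 1, ə: 2, l: 3, oʊ: 4 };

function phonemesToIds(phonemes) {
  // Unknown symbols fall back to the padding ID (0) in this sketch.
  return phonemes.map((p) => PHONEME_IDS[p] ?? 0);
}

console.log(phonemesToIds(['h', 'ə', 'l', 'oʊ'])); // → [1, 2, 3, 4]
```

The resulting integer array is what gets fed to the ONNX session as the model's input tensor.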

How small is 1.6M params?

| Model | Parameters | Size |
| --- | --- | --- |
| TinyTTS | 1.6M | ~3.4 MB |
| Piper | ~63M | ~63 MB |
| Kokoro | 82M | ~330 MB |
| Coqui XTTS | 467M | ~1.8 GB |

Benchmark (CPU only, same machine)

| Engine | Synthesis Time | Audio Duration | RTFx |
| --- | --- | --- | --- |
| TinyTTS (ONNX) | 92 ms | 4.88 s | ~53x |
| Piper (ONNX) | 112 ms | 2.91 s | ~26x |
| Kokoro (ONNX) | 933 ms | 3.16 s | ~3x |
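For reference, the RTFx column is just audio duration divided by synthesis time (higher means faster than real time). A quick sanity check on the numbers above:

```javascript
// RTFx (real-time factor): seconds of audio produced per second of compute.
function rtfx(audioSeconds, synthSeconds) {
  return audioSeconds / synthSeconds;
}

// Numbers from the benchmark table (92 ms → 0.092 s, etc.)
console.log(rtfx(4.88, 0.092).toFixed(0) + 'x'); // "53x"
console.log(rtfx(2.91, 0.112).toFixed(0) + 'x'); // "26x"
```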

Usage

API

```javascript
const TinyTTS = require('tiny-tts');

const tts = new TinyTTS();

// Basic synthesis to a WAV file
await tts.speak('Hello world!', { output: 'hello.wav' });

// Adjust the speaking rate
await tts.speak('This is faster.', {
  output: 'fast.wav',
  speed: 1.5
});

// Release the ONNX session when done
await tts.dispose();
```
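If you need many clips (say, narrating a list of strings), one pattern is to reuse a single instance and synthesize sequentially instead of spinning up a new engine per call. Here's a self-contained sketch; `speak` below is a stand-in for `tts.speak` so the snippet runs on its own:

```javascript
// Stand-in for tts.speak: pretend we wrote a WAV and return its path.
// In real code, replace this with the tiny-tts instance's speak method.
async function speak(text, opts) {
  return opts.output;
}

async function synthesizeAll(lines) {
  const files = [];
  for (let i = 0; i < lines.length; i++) {
    // Sequential awaits keep memory flat and avoid contending for the CPU.
    files.push(await speak(lines[i], { output: `clip-${i}.wav` }));
  }
  return files;
}

synthesizeAll(['First line.', 'Second line.']).then(console.log);
// → ['clip-0.wav', 'clip-1.wav']
```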

CLI

```shell
npx tiny-tts "The weather is nice today." -o weather.wav
npx tiny-tts "Quick test" -o test.wav --speed 1.3
```

Python

Also available on PyPI with identical output:

```shell
pip install tiny-tts
```

```python
from tiny_tts import TinyTTS

tts = TinyTTS()
tts.speak("Hello world!", output_path="hello.wav")
```

What's Next

This is just the beginning. Here's what I'm working on:

  • Improve voice quality — better prosody, more natural intonation, reduce artifacts while keeping the model tiny
  • More voices — different speakers, genders, and speaking styles
  • Multi-language support — expanding beyond English to other languages

Links


If you've read this far — try it out and let me know what you think! I'm especially curious about edge use cases: IoT, CI/CD audio generation, accessibility tools, game dev, etc.
