Voice cloning models, measured across five languages

ivan-digital — Thu, 02 Jul 2026 20:06:47 +0000

I benchmarked local voice-cloning models across English, German, Modern Standard Arabic, Spanish, and Mandarin Chinese.

Models:

OmniVoice int8
Chatterbox Multilingual fp16
VoxCPM2 bf16
Fish Audio S2 Pro fp16

The benchmark uses Google FLEURS references. Each row includes reference audio, generated audio, speaker similarity, word or character error rate (WER/CER), generated audio length, and real-time factor (RTF).
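For readers unfamiliar with these metrics, here is a minimal sketch of how they are commonly computed. It is not the benchmark's actual code: the function names are hypothetical, the transcripts are assumed to come from a separate ASR step, and speaker similarity is assumed to be cosine similarity between speaker embeddings. CER is the usual choice for Mandarin because the text has no whitespace word boundaries.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """WER when unit == "word", CER when unit == "char"."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def real_time_factor(synthesis_seconds, audio_seconds):
    """Processing time divided by audio length; below 1.0 is faster than real time."""
    return synthesis_seconds / audio_seconds


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Real evaluations usually normalize case and punctuation before scoring; that step is left out here to keep the sketch short.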

Main result in this run: OmniVoice had the strongest all-around results. VoxCPM2 bf16 was especially strong on Arabic speaker match. Fish Audio S2 Pro showed strong German and Arabic similarity but a slower RTF. Chatterbox Multilingual was competitive on Arabic and Spanish.

This is not a human MOS study. It is an engineering benchmark for comparing model behavior inside one local speech stack.

Full post with the table and audio samples:

https://www.soniqo.audio/blog/voice-cloning-benchmarks

Speech Studio, the desktop app using the same stack:

https://www.soniqo.audio/speech-studio

Building an NLP Pipeline to Classify 225,000 Central Bank Sentences

ivan-digital — Thu, 09 Apr 2026 21:21:13 +0000

The Problem

Central banks communicate through dense, jargon-heavy documents — policy statements, meeting minutes, press conferences. A single Fed statement is 1,500+ words. The ECB publishes minutes in 10,000+ word documents. Multiply that by 26 central banks, each publishing monthly or quarterly, and you have an impossible amount of text to track manually.

I wanted to answer a simple question: which central banks are turning hawkish and which are turning dovish — right now?

The Approach

Instead of summarizing entire documents, I break them into individual sentences and classify each one. Every sentence gets two labels:

Sentiment (what policy direction does it signal?):

rate_hike, rate_cut, rate_hold
guidance_hawkish, guidance_dovish
dissent_hawkish, dissent_dovish
liquidity_easing, liquidity_tightening
neutral, irrelevant

Topic (what economic area?):

mp_inflation, mp_interest_rate, mp_economic_activity
mp_labor_market, mp_exchange_rate, mp_credit
financial_stability, fiscal_policy, governance

This gives a granular view — not just "the Fed is hawkish" but "the Fed's inflation language is hawkish while its labor market language is turning dovish."
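The two-label scheme can be sketched as a pair of enums plus a record type. The label strings are the ones listed above; the class names, the field names, and the `parse_labels` helper are hypothetical and not taken from the actual pipeline.

```python
from dataclasses import dataclass
from enum import Enum


class Sentiment(str, Enum):
    RATE_HIKE = "rate_hike"
    RATE_CUT = "rate_cut"
    RATE_HOLD = "rate_hold"
    GUIDANCE_HAWKISH = "guidance_hawkish"
    GUIDANCE_DOVISH = "guidance_dovish"
    DISSENT_HAWKISH = "dissent_hawkish"
    DISSENT_DOVISH = "dissent_dovish"
    LIQUIDITY_EASING = "liquidity_easing"
    LIQUIDITY_TIGHTENING = "liquidity_tightening"
    NEUTRAL = "neutral"
    IRRELEVANT = "irrelevant"


class Topic(str, Enum):
    MP_INFLATION = "mp_inflation"
    MP_INTEREST_RATE = "mp_interest_rate"
    MP_ECONOMIC_ACTIVITY = "mp_economic_activity"
    MP_LABOR_MARKET = "mp_labor_market"
    MP_EXCHANGE_RATE = "mp_exchange_rate"
    MP_CREDIT = "mp_credit"
    FINANCIAL_STABILITY = "financial_stability"
    FISCAL_POLICY = "fiscal_policy"
    GOVERNANCE = "governance"


@dataclass(frozen=True)
class LabeledSentence:
    bank: str
    text: str
    sentiment: Sentiment
    topic: Topic


def parse_labels(bank, text, sentiment, topic):
    """Validate raw label strings; an unknown label raises ValueError."""
    return LabeledSentence(bank, text, Sentiment(sentiment), Topic(topic))
```

Validating against a closed set like this means a misspelled or invented label from the model fails loudly instead of silently entering the dataset.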

Architecture

The pipeline has four stages:

1. Crawling

Each central bank has a custom async crawler (Python + aiohttp). Some banks publish clean HTML, others publish only PDFs, and a few require Playwright for JavaScript-rendered pages. The crawlers run daily via Airflow.

Sources per bank:

Policy statements and decisions
Meeting minutes
Press conference transcripts
Speeches (for some banks)
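A minimal sketch of the async fetch step, assuming a plain HTML source. The `crawl` function, the concurrency limit, and the 30-second timeout are illustrative assumptions; the real crawlers are bank-specific and also handle PDFs and Playwright, which this sketch does not. The fetcher is passed in as a parameter so the logic can be exercised without network access.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Fetcher = Callable[[str], Awaitable[str]]


async def fetch_with_aiohttp(url: str) -> str:
    """Fetch one page as text. aiohttp is third-party and imported lazily."""
    import aiohttp

    timeout = aiohttp.ClientTimeout(total=30)  # assumed value
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.text()


async def crawl(
    urls: List[str], fetch: Fetcher, max_concurrency: int = 4
) -> Dict[str, Optional[str]]:
    """Fetch all URLs concurrently; a failed URL maps to None."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str):
        async with semaphore:  # cap simultaneous requests per bank
            try:
                return url, await fetch(url)
            except Exception:
                return url, None

    pairs = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(pairs)
```

In use, something like `asyncio.run(crawl(urls, fetch_with_aiohttp))` would return a mapping from URL to page text. Opening one session per request keeps the sketch short; a longer-running crawler would normally share a single session.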

2. Sentence Splitting

Documents are split into sentences using rule-based splitting tuned for central bank language. This matters because naive splitting breaks on abbreviations like "Fed." or "Q4." or numbered lists common in policy documents.
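To illustrate the idea, here is a small rule-based splitter that protects abbreviations and numbered-list markers. It is a sketch only: the abbreviation set and both regular expressions are assumptions, not the rules used in the actual pipeline.

```python
import re

# Illustrative placeholder set; a real list would be much longer.
ABBREVIATIONS = {"fed.", "no.", "vs.", "approx.", "e.g.", "i.e."}

# Candidate boundary: sentence-ending punctuation, whitespace, then a
# capital letter, digit, quote, or opening parenthesis.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'(])")

# One- or two-digit list markers such as "1." or "12."
_LIST_MARKER = re.compile(r"^\d{1,2}\.$")


def split_sentences(text):
    text = text.strip()
    if not text:
        return []
    sentences = []
    buffer = ""
    for piece in _BOUNDARY.split(text):
        buffer = f"{buffer} {piece}" if buffer else piece
        last_token = buffer.split()[-1].lower()
        # A piece ending in an abbreviation or list marker is not a real
        # sentence end, so keep accumulating.
        if last_token in ABBREVIATIONS or _LIST_MARKER.match(last_token):
            continue
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences
```

With these rules, "The Fed. Chair spoke today. Rates were held." yields two sentences rather than three, and a list marker stays attached to the item it introduces.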

3. Classification

Each sentence is classified by an LLM with bank-specific prompt rules. The key insight: central bank language is domain-specific enough that generic sentiment analysis fails badly.

Examples that trip up generic classifiers:

Sentence	Naive Classification	Correct
"Future monetary policy decisions will be conditional on the inflation outlook"	guidance_hawkish	neutral (boilerplate)
"The member voted against the rate increase"	dissent_hawkish	dissent_dovish (wanted lower rates)
"Average interest rate on ruble loans rose to 8.5%"	rate_hike	neutral (market rate description, not policy)

To catch these errors, each sentence is classified twice at different temperatures (0.0 and 0.1). Disagreements are flagged for review.
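The double-classification check can be sketched as follows. The `classify` callable stands in for whatever LLM call the pipeline makes; its signature, the `Verdict` type, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# (sentence, temperature) -> label; a stand-in for the real LLM call.
Classifier = Callable[[str, float], str]


@dataclass(frozen=True)
class Verdict:
    sentence: str
    label: Optional[str]        # None when the runs disagree
    needs_review: bool
    candidates: Tuple[str, ...]


def classify_twice(sentence, classify, temperatures=(0.0, 0.1)):
    """Run the classifier once per temperature and flag any disagreement."""
    candidates = tuple(classify(sentence, t) for t in temperatures)
    agreed = len(set(candidates)) == 1
    return Verdict(
        sentence=sentence,
        label=candidates[0] if agreed else None,
        needs_review=not agreed,
        candidates=candidates,
    )
```

Keeping both candidate labels on the record lets a reviewer see exactly where the two runs diverged.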

4. Aggregation

Sentence-level classifications are aggregated into document-level and bank-level metrics:

Hawk/dove ratio per document
Stance shifts over time
Dissent tracking (who dissented and in which direction)
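One way to turn sentence labels into a document-level number is sketched below. The post does not define the hawk/dove ratio, so both the grouping of labels into hawkish and dovish sets and the net-score formula are assumptions made for illustration.

```python
from collections import Counter

# Assumed grouping of the sentiment labels; not confirmed by the source.
HAWKISH = frozenset(
    {"rate_hike", "guidance_hawkish", "dissent_hawkish", "liquidity_tightening"}
)
DOVISH = frozenset(
    {"rate_cut", "guidance_dovish", "dissent_dovish", "liquidity_easing"}
)


def hawk_dove_score(labels):
    """Net score in [-1, 1]; None when no sentence carries a direction."""
    counts = Counter(labels)
    hawk = sum(counts[label] for label in HAWKISH)
    dove = sum(counts[label] for label in DOVISH)
    if hawk + dove == 0:
        return None
    return (hawk - dove) / (hawk + dove)


def stance_shift(previous_score, current_score):
    """Positive means the newer document reads as more hawkish."""
    return current_score - previous_score
```

Labels such as neutral, irrelevant, and rate_hold fall in neither set, so they do not dilute the score in this version.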

What I Learned

Dissent direction is counterintuitive. If the majority voted to hike rates and one member dissented, that dissent is dovish — the dissenter wanted lower rates. This seems obvious in retrospect, but getting the prompts right took several iterations.
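The rule itself is compact when written as code. In the pipeline it lives in the prompts, so the function below is only a statement of the logic; the name and the basis-point parameters are hypothetical.

```python
def dissent_direction(majority_change_bp, preferred_change_bp):
    """Label a dissent by comparing the dissenter's preferred rate change
    with the majority decision, both in basis points."""
    if preferred_change_bp == majority_change_bp:
        raise ValueError("a dissent must differ from the majority decision")
    if preferred_change_bp > majority_change_bp:
        return "dissent_hawkish"  # wanted higher rates than the majority
    return "dissent_dovish"       # wanted lower rates than the majority
```

For example, `dissent_direction(25, 0)` describes a member who preferred a hold when the majority hiked, and it returns `dissent_dovish`.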

Boilerplate is the enemy. Every central bank repeats the same conditional phrases meeting after meeting: "future decisions will depend on incoming data." These aren't signals — they're filler. The classifier needed explicit examples of common boilerplate to avoid false positives.

Bank-specific rules matter. The PBOC communicates completely differently from the Fed. PBOC statements are short and formulaic. Fed minutes are discursive, with extensive debate. The Bank of Russia's quarterly reviews describe market conditions that look like policy decisions but aren't. Each bank required tailored prompt rules.
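A sketch of how boilerplate examples and bank-specific rules might be assembled into a prompt. The two boilerplate sentences are quoted from this post; the rule wording, the dictionary layout, and `build_prompt` are hypothetical and do not reproduce the actual prompts.

```python
BOILERPLATE_EXAMPLES = (
    "Future decisions will depend on incoming data.",
    "Future monetary policy decisions will be conditional on the inflation outlook.",
)

# Hypothetical rule text, paraphrasing the observations above.
BANK_RULES = {
    "CBR": [
        "Descriptions of market interest rates are not policy decisions; label them neutral.",
    ],
    "PBOC": [
        "Statements are short and formulaic; do not infer guidance from stock phrases.",
    ],
    "Fed": [
        "Minutes report debate; label the view expressed in the sentence itself.",
    ],
}


def build_prompt(bank, sentence):
    """Assemble a classification prompt with shared and bank-specific rules."""
    lines = [
        "Classify the sentence from a central bank document.",
        "Conditional boilerplate such as the following is neutral:",
    ]
    lines += [f'- "{example}"' for example in BOILERPLATE_EXAMPLES]
    rules = BANK_RULES.get(bank, [])
    if rules:
        lines.append(f"Rules specific to {bank}:")
        lines += [f"- {rule}" for rule in rules]
    lines.append(f'Sentence: "{sentence}"')
    return "\n".join(lines)
```

Keeping the rules in data rather than in the prompt template makes it easier to add a bank without touching the shared instructions.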

Current Scale

26 central banks, including Fed, ECB, BOJ, BoE, PBOC, RBI, BCB, BoC, RBA, TCMB, SNB, CBR, BoK, Banxico, SARB, CBN, MAS, BoI, NBP, Norges, Riksbank, RBNZ, MNB, NBS
225,000+ classified sentences
12 sentiment classes, 9 topic categories
Daily updates via Airflow

The Divergences Right Now

Some current policy stances that stand out:

Bank	Rate	Stance	Notable
BOJ	0.75%	Cautiously hawkish	Normalizing after decades at zero
SNB	0.00%	Neutral	Back to the floor
TCMB	37%	Hawkish	Emergency tightening
PBOC	3.00%	Dovish	Supporting growth
BCB	14.75%	Hawkish	Among highest G20
Fed	3.75%	Mixed	Cutting but cautious language

Live Dashboard

The full dashboard is at monetary.live — each bank has its own page with statement history, sentiment breakdowns, and policy metrics.

I'm also tracking tech trends with a separate pipeline at pulsar.ivan.digital (arXiv papers, GitHub repos, Reddit discussions).

Tech Stack

Python async crawlers (aiohttp, Playwright)
LLM classification with self-validation
SQLite for storage
Airflow for orchestration
Firebase Hosting for the dashboard
structlog for logging
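For the storage layer, a minimal SQLite sketch might look like this. The table name, column names, and index are assumptions; the post does not describe the actual schema.

```python
import sqlite3

# Hypothetical schema for classified sentences.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sentences (
    id INTEGER PRIMARY KEY,
    bank TEXT NOT NULL,
    published_on TEXT NOT NULL,        -- ISO date, e.g. 2026-04-09
    text TEXT NOT NULL,
    sentiment TEXT NOT NULL,
    topic TEXT NOT NULL,
    needs_review INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_sentences_bank_date
    ON sentences (bank, published_on);
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sentence(conn, bank, published_on, text, sentiment, topic):
    conn.execute(
        "INSERT INTO sentences (bank, published_on, text, sentiment, topic) "
        "VALUES (?, ?, ?, ?, ?)",
        (bank, published_on, text, sentiment, topic),
    )


def sentiment_counts(conn, bank):
    """Return {sentiment: count} for one bank."""
    rows = conn.execute(
        "SELECT sentiment, COUNT(*) FROM sentences WHERE bank = ? GROUP BY sentiment",
        (bank,),
    )
    return dict(rows.fetchall())
```

A single file-backed table like this is usually comfortable at a few hundred thousand rows, which matches the scale described above.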

Would love feedback on the methodology. If you work with central bank text data or NLP for finance, I'd be curious to hear what approaches you've tried.

DEV Community: ivan-digital