Yahya Saleh

Originally published at voicefrom.ai

Benchmarking five live translation systems with an open-source eval harness (including OpenAI's GPT-Realtime-Translate)

We built an open-source evaluation harness for live speech-to-speech translation and used it to benchmark five platforms head-to-head. This post walks through the methodology (GEMBA-MQM v2 for accuracy, Ear-Voice Span for latency) and the results.
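As a rough illustration of the latency side, Ear-Voice Span is commonly understood as the lag between when the speaker says something and when the corresponding translated speech begins. The sketch below assumes that reading and computes it per segment from onset timestamps. The function name, the segment format, and the sample timestamps are hypothetical and are not taken from the harness, which may align at a finer granularity or define the span differently.

```python
from statistics import mean, median


def ear_voice_spans(segments):
    """Return the lag in seconds between source onset and translation onset.

    `segments` is a list of (source_onset, translation_onset) pairs in seconds.
    This per-segment onset definition is an assumption made for illustration.
    """
    return [out_onset - src_onset for src_onset, out_onset in segments]


# Hypothetical timestamps, not measurements from any benchmarked system.
segments = [(0.0, 2.0), (4.0, 6.5), (9.0, 10.5)]

spans = ear_voice_spans(segments)
summary = {"mean": mean(spans), "median": median(spans), "max": max(spans)}
print(spans, summary)
```

Reporting the median and maximum alongside the mean helps because a single long stall can dominate how laggy a live translation feels, even when the average looks acceptable.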

Eval harness: https://github.com/VoiceFrom/live-s2st-eval
