We built an open-source evaluation harness for live speech-to-speech translation and used it to benchmark five platforms head-to-head. This post walks through the methodology (GEMBA-MQM v2 for accuracy, Ear-Voice Span for latency) and the results.
Eval harness: https://github.com/VoiceFrom/live-s2st-eval
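
For readers new to the latency metric: Ear-Voice Span is commonly described as the lag between the moment a source segment starts being spoken and the moment its translation starts being spoken. The sketch below illustrates that idea only, assuming source and target segments have already been aligned and timestamped. The names `AlignedSegment` and `ear_voice_span` are hypothetical and are not taken from the harness, which may compute the metric differently.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class AlignedSegment:
    """One source segment paired with its translation (hypothetical structure)."""
    source_onset_s: float  # when the speaker starts the segment, in seconds
    target_onset_s: float  # when the translated audio for it starts, in seconds


def ear_voice_span(segments: list[AlignedSegment]) -> dict[str, float]:
    """Summarize per-segment lag (target onset minus source onset) in seconds."""
    if not segments:
        raise ValueError("need at least one aligned segment")
    lags = [s.target_onset_s - s.source_onset_s for s in segments]
    return {"mean_s": mean(lags), "median_s": median(lags), "max_s": max(lags)}


# Illustrative timestamps only, not measured results.
example = [
    AlignedSegment(source_onset_s=0.0, target_onset_s=2.0),
    AlignedSegment(source_onset_s=5.0, target_onset_s=8.0),
    AlignedSegment(source_onset_s=10.0, target_onset_s=12.5),
]
summary = ear_voice_span(example)
```

Reporting the median and maximum alongside the mean is a design choice in this sketch: a single long stall can dominate perceived latency even when the average lag looks acceptable.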
