DEV Community

Apoorv Darshan


Lip-syncing a VRM character to ElevenLabs/OpenAI TTS

To make Scowld's avatar feel alive, the VRM model's mouth tracks the TTS audio. The native side fetches speech from ElevenLabs or OpenAI using your own API key (BYOK) and hands the audio to the web layer, where three-vrm drives the mouth blendshapes as the audio plays.
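One common way to drive a mouth from playback is amplitude-based lip sync: measure the loudness of each audio frame and map it to a mouth-open weight. The sketch below illustrates that general technique and is not necessarily how Scowld does it. `AmplitudeLipSync`, `ExpressionTarget`, and the `gain`, `noiseFloor`, and `smoothing` values are hypothetical names and tuning guesses; the `"aa"` expression name assumes a VRM 1.0 model.

```typescript
// Minimal shape of the object being driven. In three-vrm v1+,
// `vrm.expressionManager` has a compatible `setValue(name, weight)` method;
// whether Scowld uses it this way is an assumption.
interface ExpressionTarget {
  setValue(name: string, weight: number): void;
}

// Root-mean-square loudness of one frame of time-domain samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

// Map loudness to a 0..1 mouth-open weight.
// `gain` and `noiseFloor` are illustrative tuning values, not measured ones.
function mouthWeight(level: number, gain = 4, noiseFloor = 0.01): number {
  if (level < noiseFloor) return 0;
  return Math.min(1, level * gain);
}

class AmplitudeLipSync {
  private current = 0;
  private readonly target: ExpressionTarget;
  private readonly expression: string;
  private readonly smoothing: number;

  constructor(target: ExpressionTarget, expression = "aa", smoothing = 0.5) {
    this.target = target;
    this.expression = expression;
    this.smoothing = smoothing;
  }

  // Call once per rendered frame with the latest audio samples.
  // Moves the weight part of the way toward the new goal to avoid jitter.
  update(samples: Float32Array): number {
    const goal = mouthWeight(rms(samples));
    this.current += (goal - this.current) * this.smoothing;
    this.target.setValue(this.expression, this.current);
    return this.current;
  }
}
```

In a browser, the samples would typically come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, read once per animation frame while the TTS clip plays. Amplitude alone only opens and closes the mouth; matching vowel shapes would need viseme or phoneme data.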

Paired with on-device speech-to-text, this gives you a real voice loop: speak → transcribe → LLM → speak back, lip-synced.
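The loop above can be sketched as one async function that chains the four stages. This is only an outline of the flow: `VoiceLoopStages`, `runVoiceTurn`, and every stage name are hypothetical placeholders, and the real stages in Scowld live partly on the native side.

```typescript
// Each stage is injected so the flow can be read (and tested) on its own.
// All four names are illustrative, not Scowld's actual API.
interface VoiceLoopStages {
  transcribe(mic: Float32Array): Promise<string>;     // on-device speech-to-text
  complete(prompt: string): Promise<string>;          // LLM reply
  synthesize(text: string): Promise<ArrayBuffer>;     // ElevenLabs or OpenAI TTS
  playWithLipSync(audio: ArrayBuffer): Promise<void>; // playback driving the mouth
}

// Run one turn of the conversation and return the reply text.
async function runVoiceTurn(mic: Float32Array, stages: VoiceLoopStages): Promise<string> {
  const heard = await stages.transcribe(mic);
  if (heard.trim() === "") return ""; // nothing recognized, skip the turn
  const reply = await stages.complete(heard);
  const audio = await stages.synthesize(reply);
  await stages.playWithLipSync(audio);
  return reply;
}
```

Keeping the stages behind an interface makes it easy to swap TTS providers without touching the loop itself.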

All open source: https://github.com/apoorvdarshan/scowld
