To make Scowld's avatar feel alive, the VRM model's mouth tracks the text-to-speech (TTS) audio. The native side fetches speech from ElevenLabs or OpenAI using your own API key (BYOK), hands the audio to the web layer, and three-vrm drives the mouth blendshapes from the playback.
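One common way to drive a mouth from playback is amplitude-based lip sync: measure the loudness of each audio frame and map it onto an "open mouth" expression. The sketch below assumes that approach. It is not necessarily what Scowld does, and it ignores phonemes entirely. `TimeDomainSource` is a minimal stand-in that a Web Audio `AnalyserNode` satisfies, and `ExpressionTarget` mirrors the `setValue(name, weight)` method on three-vrm's `vrm.expressionManager`. The class name `AmplitudeLipSync` and the `gain`, `noiseFloor`, and `smoothing` values are hypothetical and would need tuning by ear.

```typescript
// Minimal structural type: a Web Audio AnalyserNode satisfies this.
interface TimeDomainSource {
  fftSize: number;
  getFloatTimeDomainData(array: Float32Array): void;
}

// Minimal structural type mirroring three-vrm's expressionManager.setValue.
interface ExpressionTarget {
  setValue(name: string, weight: number): void;
}

// Root-mean-square loudness of one window of samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Hypothetical helper: maps playback loudness onto the VRM "aa" (open mouth) expression.
class AmplitudeLipSync {
  private weight = 0;
  private readonly buffer: Float32Array;

  constructor(
    private readonly source: TimeDomainSource,
    private readonly target: ExpressionTarget,
    private readonly gain = 8, // assumed: scales typical speech RMS into [0, 1]
    private readonly noiseFloor = 0.02, // assumed: below this level the mouth stays closed
    private readonly smoothing = 0.5, // 0 = no smoothing, values near 1 = sluggish mouth
  ) {
    this.buffer = new Float32Array(source.fftSize);
  }

  // Call once per animation frame, before vrm.update(delta) applies expressions.
  update(): number {
    this.source.getFloatTimeDomainData(this.buffer);
    const level = rms(this.buffer);
    const goal = level < this.noiseFloor ? 0 : Math.min(1, (level - this.noiseFloor) * this.gain);
    // Exponential moving average so the jaw does not jitter frame to frame.
    this.weight = this.smoothing * this.weight + (1 - this.smoothing) * goal;
    this.target.setValue("aa", this.weight);
    return this.weight;
  }
}
```

In a browser you would route the decoded TTS audio through an analyser (`ctx.createAnalyser()`, connected between the audio source and `ctx.destination`), construct `new AmplitudeLipSync(analyser, vrm.expressionManager!)`, and call `update()` in the render loop before `vrm.update(delta)`. Mapping volume to a single vowel is cheap and provider-agnostic, which suits audio coming from either ElevenLabs or OpenAI; a viseme-based approach would need timing data that not every TTS API returns.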
Paired with on-device speech-to-text, you get a real voice loop: speak → transcribe → LLM → speak back, lip-synced.
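The voice loop can be sketched as one async "turn" that chains the four stages. Everything here is hypothetical: the stage signatures, `ChatMessage`, and `runVoiceTurn` are illustrative names, not Scowld's actual interfaces. The stages are injected as plain functions, so the same loop works whichever STT, LLM, or TTS provider sits behind them.

```typescript
// Hypothetical message shape for the running conversation.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical stage signatures; each provider is plugged in as a function.
interface VoiceLoopStages {
  transcribe: (audio: ArrayBuffer) => Promise<string>; // on-device speech-to-text
  complete: (history: ChatMessage[]) => Promise<string>; // LLM reply
  synthesize: (text: string) => Promise<ArrayBuffer>; // TTS (e.g. ElevenLabs or OpenAI)
  play: (speech: ArrayBuffer) => Promise<void>; // playback that also drives lip sync
}

// One turn: speak → transcribe → LLM → speak back. Mutates `history` in place.
async function runVoiceTurn(
  stages: VoiceLoopStages,
  history: ChatMessage[],
  micAudio: ArrayBuffer,
): Promise<string> {
  const userText = (await stages.transcribe(micAudio)).trim();
  if (userText === "") return ""; // nothing heard: skip the LLM and TTS calls
  history.push({ role: "user", content: userText });

  const reply = await stages.complete(history);
  history.push({ role: "assistant", content: reply });

  const speech = await stages.synthesize(reply);
  await stages.play(speech); // resolves when playback (and mouth movement) ends
  return reply;
}
```

Awaiting `play` before returning keeps turns from overlapping, so the avatar never starts listening while it is still talking.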
All open source: https://github.com/apoorvdarshan/scowld