
Building Duolingo-Style AI Video Call Characters Using Rive

A Developer-First Guide to Real-Time Conversational Animation

Written by **Praneeth Kawya Thathsara**

*Rive Expert · Full-Time Rive Animator · Mascot & Interactive Animation Specialist*

AI voice conversation apps are rapidly becoming a core interaction model for:

  • language learning
  • interview practice
  • coaching
  • education platforms

One of the most well-known examples is Duolingo’s “Video Call with Lily” — a real-time, responsive character that listens, reacts, and speaks naturally.

Many teams assume this is built with video playback or complex 3D pipelines.

In reality, experiences like this are best implemented using state-driven 2D animation, with Rive acting as the real-time character engine.

This article explains:

  • how Duolingo-style AI characters actually work
  • what a Rive animator builds
  • what developers need on the backend
  • how both sides integrate cleanly

*(Image: conversational character overview)*


High-Level Architecture Overview

At a system level, a real-time conversational character consists of four parts:

  • user voice input
  • AI processing (STT → LLM → TTS)
  • real-time animation control
  • frontend rendering

Rive sits at the animation control layer — not the AI layer.

It receives signals from the app and renders character behavior in real time.
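
To make that boundary concrete, here is a minimal TypeScript sketch of the control surface the app hands to the character. The names (`CharacterSignals`, `CharacterController`) are illustrative, not part of the Rive API:

```typescript
// Signals the app pushes into the animation layer. The field names are
// illustrative; the real contract is whatever inputs the Rive file exposes.
interface CharacterSignals {
  mode: "idle" | "listening" | "thinking" | "speaking";
  emotion: number;    // e.g. 0 = neutral, 1 = happy, 2 = confused, 3 = serious
  intensity: number;  // 0–1 expressiveness
  talkAmount: number; // 0–1, derived from TTS audio amplitude
}

// The animation control layer translates app state into Rive inputs.
// It never runs STT, LLM, or TTS logic itself.
interface CharacterController {
  update(signals: Partial<CharacterSignals>): void;
  fire(trigger: "nod" | "smile" | "confused" | "praise" | "correct" | "interrupt"): void;
}
```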


This Is Not Normal Animation

This is not timeline animation or video playback.

Instead, the character is a live state machine:

  • continuously running
  • driven by parameters from the app
  • reacting instantly to user and AI signals

Think of the character as a UI component, not a cutscene.


What the Rive Animator Builds

*(Image: Rive character state machine)*

A professional Rive animator delivers a production-ready character system, not just visual assets.


1. State Machine Design

Typical high-level states include:

  • Idle — breathing, blinking, micro-movement
  • Listening — user is speaking
  • Thinking — AI processing response
  • Speaking — AI voice output
  • Reacting — emotion, correction, praise, confusion

State transitions are triggered entirely by app logic.
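
On the app side, these states are usually mirrored as a small enum. A sketch in TypeScript, assuming the numeric values match whatever the Rive file's mode input expects (Reacting is typically layered on top via triggers rather than given its own mode):

```typescript
// App-side mirror of the character's high-level states. The numbers are
// illustrative and must match the values wired into the Rive state machine.
enum CharacterMode {
  Idle = 0,      // breathing, blinking, micro-movement
  Listening = 1, // user is speaking
  Thinking = 2,  // AI processing response
  Speaking = 3,  // AI voice output
}
```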


2. Exposed Inputs for Developers

This is the most important part for developers.

A clean Rive setup exposes clear, predictable inputs.

Core State Inputs

  • mode (Number / Enum): Idle, Listening, Thinking, Speaking
  • emotion (Number): neutral → happy → confused → serious
  • intensity (0–1): controls expressiveness

Voice-Driven Inputs

  • talkAmount (0–1): driven by audio amplitude
  • viseme (Number): phoneme-based lip sync, as an alternative to talkAmount
  • userIsSpeaking (Boolean)
  • aiIsSpeaking (Boolean)

Triggers

  • nod
  • smile
  • confused
  • praise
  • correct
  • interrupt

These inputs are documented and handed off to the dev team.
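
With the official web runtime (@rive-app/canvas), looking up and driving those inputs looks roughly like this. The file name, state machine name, and canvas id are illustrative stand-ins for whatever the handoff document specifies:

```typescript
import { Rive } from "@rive-app/canvas";

const rive = new Rive({
  src: "character.riv", // illustrative file name
  canvas: document.getElementById("avatar") as HTMLCanvasElement,
  stateMachines: "Character", // illustrative state machine name
  autoplay: true,
  onLoad: () => {
    // Index the exposed inputs by name, per the handoff doc.
    const inputs = rive.stateMachineInputs("Character");
    const byName = new Map(inputs.map((i) => [i.name, i]));

    const mode = byName.get("mode");
    const emotion = byName.get("emotion");
    if (mode) mode.value = 1;       // e.g. Listening
    if (emotion) emotion.value = 0; // e.g. neutral

    byName.get("nod")?.fire();      // triggers are one-shot
  },
});
```

From here on, the app only ever writes these values; it never touches timelines or keyframes.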


3. Lip Sync Strategy (Two Options)

Option A — Audio Amplitude (Most Common)

  • use TTS audio amplitude
  • drive mouth / jaw movement via talkAmount
  • lightweight, reliable, easy to integrate

Most production apps start here.
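
A minimal sketch of that loop using the Web Audio API, assuming the TTS output plays through an `<audio>` element and that `talkAmountInput` was looked up from the state machine as in the earlier snippet:

```typescript
import type { StateMachineInput } from "@rive-app/canvas";

// Looked up via rive.stateMachineInputs(...) as shown earlier.
declare const talkAmountInput: StateMachineInput;

const audioCtx = new AudioContext();
const ttsAudio = document.getElementById("tts-audio") as HTMLAudioElement;
const source = audioCtx.createMediaElementSource(ttsAudio);
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 256;
source.connect(analyser);
analyser.connect(audioCtx.destination); // keep the TTS audible

const samples = new Uint8Array(analyser.fftSize);

function tick() {
  analyser.getByteTimeDomainData(samples);
  // RMS of the waveform; byte samples are centered on 128.
  let sum = 0;
  for (const s of samples) {
    const v = (s - 128) / 128;
    sum += v * v;
  }
  const rms = Math.sqrt(sum / samples.length);
  // The ×4 gain is a tuning value, not a standard; clamp into 0–1.
  talkAmountInput.value = Math.min(1, rms * 4);
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);
```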


Option B — Viseme-Based Lip Sync

  • phoneme → viseme mapping
  • higher realism
  • requires TTS or middleware that outputs visemes

Teams often upgrade to this later.
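
The mapping side is a lookup table. A sketch, assuming the TTS stream emits timed phoneme events; real phoneme sets and viseme counts vary by vendor and by how many mouth shapes the rig actually has:

```typescript
import type { StateMachineInput } from "@rive-app/canvas";

// Looked up from the state machine as in the earlier snippet.
declare const visemeInput: StateMachineInput;

// Illustrative phoneme → viseme index table.
const VISEME_BY_PHONEME: Record<string, number> = {
  p: 0, b: 0, m: 0,             // closed lips
  f: 1, v: 1,                   // teeth on lip
  a: 2, e: 3, i: 4, o: 5, u: 6, // open vowels
  sil: 7,                       // rest / silence
};

// Called for each timed phoneme event emitted alongside the TTS audio.
function onPhonemeEvent(phoneme: string) {
  visemeInput.value = VISEME_BY_PHONEME[phoneme] ?? 7;
}
```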


The Most Important Part: Listening Behavior

From a UX perspective, the listening animation matters more than the speaking animation.

While the user is talking, the character should:

  • maintain eye contact
  • perform subtle nods
  • react to pauses
  • show attention and understanding

These are implemented using micro-loops and triggers, not random animation spam.

This is what makes the experience feel like a video call, not a chatbot.
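
As one example, a pause-reactive nod can be driven by simple mic metering. In this sketch, `micLevel()` and `nodTrigger` are assumptions standing in for your own audio pipeline and the looked-up Rive trigger:

```typescript
import type { StateMachineInput } from "@rive-app/canvas";

declare const nodTrigger: StateMachineInput;
declare function micLevel(): number; // 0–1, e.g. RMS of the mic stream

const PAUSE_THRESHOLD = 0.05; // below this, treat the mic as silent
const PAUSE_MS = 600;         // silence long enough to read as a pause
const NOD_COOLDOWN_MS = 3000; // keep nods occasional, not constant

let silentSince: number | null = null;
let lastNodAt = 0;

setInterval(() => {
  const now = Date.now();
  if (micLevel() < PAUSE_THRESHOLD) {
    silentSince ??= now;
    const pausedLongEnough = now - silentSince > PAUSE_MS;
    const cooledDown = now - lastNodAt > NOD_COOLDOWN_MS;
    if (pausedLongEnough && cooledDown) {
      nodTrigger.fire(); // a subtle acknowledgement, not animation spam
      lastNodAt = now;
    }
  } else {
    silentSince = null;
  }
}, 100);
```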


What Developers Need to Implement

1. Voice Pipeline

  • microphone input
  • speech-to-text (STT)
  • LLM response generation
  • text-to-speech (TTS)
  • audio stream + amplitude or viseme data
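
Wired together, the round trip might look like the sketch below. Every function here is a hypothetical wrapper for whichever STT / LLM / TTS providers you choose:

```typescript
// Hypothetical provider wrappers; swap in your actual services.
declare function speechToText(audio: Blob): Promise<string>;
declare function generateReply(userText: string): Promise<string>;
declare function textToSpeech(text: string): Promise<Blob>;
declare function playAudio(audio: Blob): Promise<void>; // feeds the amplitude loop
declare function setMode(mode: "idle" | "listening" | "thinking" | "speaking"): void;

async function handleUserUtterance(recording: Blob) {
  setMode("thinking"); // AI is processing
  const userText = await speechToText(recording);
  const replyText = await generateReply(userText);
  const replyAudio = await textToSpeech(replyText);

  setMode("speaking"); // amplitude or visemes now drive the mouth
  await playAudio(replyAudio);
  setMode("idle");     // back to breathing and blinking
}
```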

2. State Control Logic

Your app decides:

  • when the user is speaking → Listening
  • when AI is processing → Thinking
  • when AI is speaking → Speaking
  • which emotion to apply

The animation system stays dumb.

Logic stays in code.
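
One way to enforce that split is a reducer-style function that owns every mode decision; a sketch with illustrative event names:

```typescript
type Mode = "idle" | "listening" | "thinking" | "speaking";

type AppEvent =
  | { type: "userStartedSpeaking" }
  | { type: "userStoppedSpeaking" }
  | { type: "aiAudioStarted" }
  | { type: "aiAudioEnded" };

// The Rive file never decides anything; this function does.
function nextMode(event: AppEvent): Mode {
  switch (event.type) {
    case "userStartedSpeaking": return "listening";
    case "userStoppedSpeaking": return "thinking"; // STT + LLM in flight
    case "aiAudioStarted":      return "speaking";
    case "aiAudioEnded":        return "idle";
  }
}
```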


3. Rive Integration

Rive integrates cleanly with:

  • Flutter
  • Web
  • React Native
  • iOS / Android

Developers simply set input values.

No animation math required.


4. Animator–Developer Contract

A good Rive animator provides:

  • clearly named inputs
  • defined ranges and behavior
  • test controls inside the Rive file
  • integration notes for dev teams

This avoids trial-and-error integration.
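
If the team works in TypeScript, that contract can even be encoded as a type, so a renamed input becomes a compile error instead of a silent no-op. A sketch based on the input list above:

```typescript
// Illustrative encoding of the handoff document. Names and ranges must
// mirror what the Rive file actually exposes.
interface CharacterContract {
  stateMachine: "Character";
  numbers: {
    mode: 0 | 1 | 2 | 3;  // Idle, Listening, Thinking, Speaking
    emotion: number;      // 0 = neutral … 3 = serious
    intensity: number;    // 0–1 expressiveness
    talkAmount: number;   // 0–1 audio amplitude
  };
  booleans: {
    userIsSpeaking: boolean;
    aiIsSpeaking: boolean;
  };
  triggers: "nod" | "smile" | "confused" | "praise" | "correct" | "interrupt";
}
```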


Why Use Rive for Conversational Characters?

  • runs at 60fps
  • tiny file sizes
  • no video streaming cost
  • fully real-time
  • cross-platform
  • developer-friendly API

Rive is effectively a real-time animation runtime, not just a design tool.


About the Author

Praneeth Kawya Thathsara is a:

  • Rive Expert
  • Full-Time Rive Animator
  • Specialist in AI-driven interactive mascots
  • Working remotely with global product teams

He focuses on production-ready Rive systems for real applications, not demos.


Contact

📧 uiuxanimation@gmail.com

📧 riveanimator@gmail.com

Available for:

  • AI conversation apps
  • language learning platforms
  • educational tools
  • mascot & character systems
  • developer-ready Rive integrations
