Building Duolingo-Style AI Video Call Characters Using Rive

A Developer-First Guide to Real-Time Conversational Animation

Written by Praneeth Kawya Thathsara
Rive Expert · Rive Animator · Full-Time Mascot & Interactive Animation Specialist

AI voice conversation apps are rapidly becoming a core interaction model for language learning, interview practice, coaching, and education platforms. One of the most well-known examples is Duolingo’s “Video Call with Lily”—a real-time, responsive character that listens, reacts, and speaks naturally.

Many teams assume this is built with video or complex 3D pipelines. In reality, experiences like this are best implemented using state-driven 2D animation, with Rive acting as the real-time character engine.

This article explains:

How Duolingo-style AI characters actually work

What the Rive animator builds

What developers need on the backend

How both sides integrate cleanly

High-Level Architecture Overview

At a system level, this experience consists of four parts:

User voice input

AI processing (STT → LLM → TTS)

Real-time animation control

Frontend rendering

Rive sits at the animation control layer, not the AI layer.
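To make that split concrete, here is a minimal TypeScript sketch of the events the animation control layer might consume. The event names and shapes are illustrative, not a fixed API:

```typescript
// Illustrative event types for the four layers above (names are hypothetical).
type PipelineEvent =
  | { type: 'userSpeechStart' }                   // voice input layer
  | { type: 'userSpeechEnd'; transcript: string } // STT result
  | { type: 'aiResponseReady'; text: string }     // LLM output
  | { type: 'ttsAudioChunk'; amplitude: number }  // TTS playback + amplitude
  | { type: 'ttsFinished' };

// The animation control layer consumes these events and writes Rive inputs.
// It never performs STT, LLM, or TTS work itself.
```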

What Makes This Different from Normal Animation

This is not timeline animation or video playback.

Instead, the character is a live state machine:

Continuously running

Driven by parameters from the app

Reacting instantly to user and AI signals

Think of the character as a UI component, not a cutscene.

What the Rive Animator Builds

A professional Rive Animator delivers a production-ready character system, not just assets.

1. State Machine Design

Typical states include:

Idle – breathing, blinking, micro-movement

Listening – user is speaking

Thinking – AI processing response

Speaking – AI voice output

Reacting – emotion, correction, praise, confusion

State transitions are triggered entirely by app logic.
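As a sketch, the `mode` input can be encoded as a plain number; the exact values are an assumption you agree on with the animator:

```typescript
// Hypothetical Number encoding for the `mode` input, matching the states above.
enum CharacterMode {
  Idle = 0,
  Listening = 1,
  Thinking = 2,
  Speaking = 3,
}
```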

2. Exposed Inputs for Developers

The most important part for developers is input control.
Common inputs include:

Core State

mode (Number or Enum): Idle / Listening / Thinking / Speaking

emotion (Number): neutral → happy → confused → serious

intensity (0–1): controls expressiveness

Voice-Driven

talkAmount (0–1): driven by audio amplitude

OR viseme (Number): phoneme-based lip sync

userIsSpeaking (Boolean)

aiIsSpeaking (Boolean)

Triggers

nod

smile

confused

praise

correct

interrupt

These inputs are documented and handed off to dev teams.
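For example, with the official Rive web runtime (`@rive-app/canvas`), wiring these inputs looks roughly like this; the file name, state machine name, and input names are placeholders for whatever the animator's handoff documents:

```typescript
import { Rive } from '@rive-app/canvas';

// Assumes an artboard with a state machine named "Character" that exposes
// the inputs described above (all names come from the animator's contract).
const rive = new Rive({
  src: 'character.riv',
  canvas: document.getElementById('avatar') as HTMLCanvasElement,
  stateMachines: 'Character',
  autoplay: true,
  onLoad: () => {
    const inputs = rive.stateMachineInputs('Character');
    const mode = inputs.find((i) => i.name === 'mode');
    const talkAmount = inputs.find((i) => i.name === 'talkAmount');
    const nod = inputs.find((i) => i.name === 'nod');

    if (mode) mode.value = 1;             // e.g. switch to Listening
    if (talkAmount) talkAmount.value = 0; // mouth closed until TTS plays
    nod?.fire();                          // triggers are fired, not assigned
  },
});
```

The other runtimes (Flutter, React Native, iOS, Android) follow the same pattern: find an input by name, then set a value or fire a trigger.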

3. Lip Sync Strategy (Two Options)

Option A – Audio Amplitude (Most Common)

Use TTS audio amplitude

Drive mouth/jaw movement with talkAmount

Lightweight, reliable, easy to integrate

Option B – Viseme-Based Lip Sync

Phoneme → viseme mapping

Higher realism

Requires TTS or middleware that outputs visemes

Most production apps start with Option A and upgrade later.
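Here is a minimal Option A sketch using the Web Audio API, assuming the TTS audio plays through an `<audio>` element and that `talkAmountInput` is the Rive input found earlier:

```typescript
import type { StateMachineInput } from '@rive-app/canvas';

declare const ttsAudio: HTMLAudioElement;         // element playing the TTS stream
declare const talkAmountInput: StateMachineInput; // from stateMachineInputs()

const audioCtx = new AudioContext();
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 256;

// Route the TTS audio through the analyser on its way to the speakers.
const source = audioCtx.createMediaElementSource(ttsAudio);
source.connect(analyser);
analyser.connect(audioCtx.destination);

const samples = new Uint8Array(analyser.fftSize);

function updateMouth() {
  analyser.getByteTimeDomainData(samples);
  // RMS amplitude of the waveform, normalized to roughly 0–1.
  let sum = 0;
  for (const s of samples) {
    const v = (s - 128) / 128;
    sum += v * v;
  }
  const rms = Math.sqrt(sum / samples.length);
  talkAmountInput.value = Math.min(1, rms * 4); // gain factor is tuned by ear
  requestAnimationFrame(updateMouth);
}
updateMouth();
```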

The Most Important Part: Listening Behavior

From a UX perspective, listening animation matters more than speaking animation.

While the user is talking, the character should:

Maintain eye contact

Perform subtle nods

React to pauses

Show attention and understanding

These are implemented as micro-loops and triggers, not random animation spam.

This is what makes the experience feel like a video call, not a chatbot.
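As one example of a listening micro-behavior, a simple pause heuristic on the user's microphone amplitude can fire the `nod` trigger; the threshold and timing below are guesses to tune per character:

```typescript
import type { StateMachineInput } from '@rive-app/canvas';

declare const nodTrigger: StateMachineInput; // the `nod` trigger from the inputs list

let lastVoiceActivity = performance.now();

// Call this with the user's mic amplitude (0–1) on every analysis frame.
function onUserAmplitude(amplitude: number) {
  const now = performance.now();
  if (amplitude > 0.05) {
    lastVoiceActivity = now;
  } else if (now - lastVoiceActivity > 800) {
    // The user paused: acknowledge with a nod, then reset the timer
    // so the trigger doesn't fire repeatedly during one silence.
    nodTrigger.fire();
    lastVoiceActivity = now;
  }
}
```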

What Developers Need to Implement (Backend & Frontend)

1. Voice Pipeline

Microphone input

Speech-to-Text (STT)

LLM response generation

Text-to-Speech (TTS)

Audio stream + amplitude or viseme data
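A hedged sketch of how those stages chain together; `sttTranscribe`, `llmReply`, and `ttsSynthesize` are stand-ins for whichever providers you actually use:

```typescript
// Hypothetical provider functions; swap in your STT/LLM/TTS of choice.
declare function sttTranscribe(audio: Blob): Promise<string>;
declare function llmReply(transcript: string): Promise<string>;
declare function ttsSynthesize(text: string): Promise<HTMLAudioElement>;

async function respondTo(userAudio: Blob): Promise<HTMLAudioElement> {
  const transcript = await sttTranscribe(userAudio); // STT
  const reply = await llmReply(transcript);          // LLM
  return ttsSynthesize(reply);                       // TTS → audio to analyse
}
```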

2. State Control Logic

Your app decides:

When the user is speaking → Listening

When AI is processing → Thinking

When AI is speaking → Speaking

Which emotion to apply

The animation system stays dumb; the logic stays in code.
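Tying it together with the event type and mode enum from the earlier sketches (a `setMode` helper that writes the Rive `mode` input is assumed):

```typescript
declare function setMode(mode: CharacterMode): void; // writes the Rive `mode` input

// The app owns every state decision; Rive only receives the result.
function applyPipelineEvent(event: PipelineEvent) {
  switch (event.type) {
    case 'userSpeechStart': setMode(CharacterMode.Listening); break;
    case 'userSpeechEnd':   setMode(CharacterMode.Thinking);  break;
    // 'aiResponseReady' keeps Thinking until TTS audio actually starts.
    case 'ttsAudioChunk':   setMode(CharacterMode.Speaking);  break;
    case 'ttsFinished':     setMode(CharacterMode.Idle);      break;
  }
}
```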

3. Rive Integration

Rive integrates cleanly with:

Flutter

Web

React Native

iOS / Android

Developers simply set values on inputs—no animation math required.

4. Animator–Developer Contract

A good Rive Animator provides:

Clearly named inputs

Defined ranges and behavior

Test controls inside the Rive file

Integration notes for dev teams

This avoids trial-and-error integration.

Why Use Rive for This?

Runs at 60fps

Tiny file sizes

No video streaming cost

Fully real-time

Cross-platform

Developer-friendly API

Rive is effectively a real-time animation runtime, not just a design tool.

About the Author

Praneeth Kawya Thathsara is:

Rive Expert

Full-time Rive Animator

Specialist in AI-driven interactive mascots

Working remotely with teams globally

He focuses on production-ready Rive systems for real applications—not demos.

Contact

📧 uiuxanimation@gmail.com

📧 riveanimator@gmail.com

Available for:

AI conversation apps
Language learning platforms
Educational tools
Mascot & character systems
Developer-ready Rive integrations
