
Building Duolingo-Style AI Video Call Characters Using Rive

A Developer-First Guide to Real-Time Conversational Animation

Written by **Praneeth Kawya Thathsara**

*Rive Expert · Full-Time Rive Animator · Mascot & Interactive Animation Specialist*

AI voice conversation apps are rapidly becoming a core interaction model for:

  • language learning
  • interview practice
  • coaching
  • education platforms

One of the most well-known examples is Duolingo’s “Video Call with Lily” — a real-time, responsive character that listens, reacts, and speaks naturally.

Many teams assume this is built with video playback or complex 3D pipelines.

In reality, experiences like this are best implemented using state-driven 2D animation, with Rive acting as the real-time character engine.

This article explains:

  • how Duolingo-style AI characters actually work
  • what a Rive animator builds
  • what developers need on the backend
  • how both sides integrate cleanly

*(Image: conversational character overview)*


High-Level Architecture Overview

At a system level, a real-time conversational character consists of four parts:

  • user voice input
  • AI processing (STT → LLM → TTS)
  • real-time animation control
  • frontend rendering

Rive sits at the animation control layer — not the AI layer.

It receives signals from the app and renders character behavior in real time.
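
To make that boundary concrete, here is a minimal TypeScript sketch of the control surface the app hands to the character. The names (`CharacterSignals`, `CharacterController`) are illustrative, not part of the Rive API:

```typescript
// Signals the app pushes into the animation layer. The field names are
// illustrative; the real contract is whatever inputs the Rive file exposes.
interface CharacterSignals {
  mode: "idle" | "listening" | "thinking" | "speaking";
  emotion: number;    // e.g. 0 = neutral, 1 = happy, 2 = confused, 3 = serious
  intensity: number;  // 0–1 expressiveness
  talkAmount: number; // 0–1, derived from TTS audio amplitude
}

// The animation control layer translates app state into Rive inputs.
// It never runs STT, LLM, or TTS logic itself.
interface CharacterController {
  update(signals: Partial<CharacterSignals>): void;
  fire(trigger: "nod" | "smile" | "confused" | "praise" | "correct" | "interrupt"): void;
}
```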


This Is Not Normal Animation

This is not timeline animation or video playback.

Instead, the character is a live state machine:

  • continuously running
  • driven by parameters from the app
  • reacting instantly to user and AI signals

Think of the character as a UI component, not a cutscene.


What the Rive Animator Builds

*(Image: Rive character state machine)*

A professional Rive animator delivers a production-ready character system, not just visual assets.


1. State Machine Design

Typical high-level states include:

  • Idle — breathing, blinking, micro-movement
  • Listening — user is speaking
  • Thinking — AI processing response
  • Speaking — AI voice output
  • Reacting — emotion, correction, praise, confusion

State transitions are triggered entirely by app logic.
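
On the app side, these states are usually mirrored as a small enum. A sketch in TypeScript, assuming the numeric values match whatever the Rive file's mode input expects (Reacting is typically layered on top via triggers rather than given its own mode):

```typescript
// App-side mirror of the character's high-level states. The numbers are
// illustrative and must match the values wired into the Rive state machine.
enum CharacterMode {
  Idle = 0,      // breathing, blinking, micro-movement
  Listening = 1, // user is speaking
  Thinking = 2,  // AI processing response
  Speaking = 3,  // AI voice output
}
```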


2. Exposed Inputs for Developers

This is the most important part for developers.

A clean Rive setup exposes clear, predictable inputs.

Core State Inputs

  • mode (Number / Enum): Idle, Listening, Thinking, Speaking
  • emotion (Number): neutral → happy → confused → serious
  • intensity (0–1): controls expressiveness

Voice-Driven Inputs

  • talkAmount (0–1): driven by audio amplitude
  • viseme (Number): phoneme-based lip sync, as an alternative to talkAmount
  • userIsSpeaking (Boolean)
  • aiIsSpeaking (Boolean)

Triggers

  • nod
  • smile
  • confused
  • praise
  • correct
  • interrupt

These inputs are documented and handed off to the dev team.
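
With the official web runtime (@rive-app/canvas), looking up and driving those inputs looks roughly like this. The file name, state machine name, and canvas id are illustrative stand-ins for whatever the handoff document specifies:

```typescript
import { Rive } from "@rive-app/canvas";

const rive = new Rive({
  src: "character.riv", // illustrative file name
  canvas: document.getElementById("avatar") as HTMLCanvasElement,
  stateMachines: "Character", // illustrative state machine name
  autoplay: true,
  onLoad: () => {
    // Index the exposed inputs by name, per the handoff doc.
    const inputs = rive.stateMachineInputs("Character");
    const byName = new Map(inputs.map((i) => [i.name, i]));

    const mode = byName.get("mode");
    const emotion = byName.get("emotion");
    if (mode) mode.value = 1;       // e.g. Listening
    if (emotion) emotion.value = 0; // e.g. neutral

    byName.get("nod")?.fire();      // triggers are one-shot
  },
});
```

From here on, the app only ever writes these values; it never touches timelines or keyframes.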


3. Lip Sync Strategy (Two Options)

Option A — Audio Amplitude (Most Common)

  • use TTS audio amplitude
  • drive mouth / jaw movement via talkAmount
  • lightweight, reliable, easy to integrate

Most production apps start here.
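
A minimal sketch of that loop using the Web Audio API, assuming the TTS output plays through an `<audio>` element and that `talkAmountInput` was looked up from the state machine as in the earlier snippet:

```typescript
import type { StateMachineInput } from "@rive-app/canvas";

// Looked up via rive.stateMachineInputs(...) as shown earlier.
declare const talkAmountInput: StateMachineInput;

const audioCtx = new AudioContext();
const ttsAudio = document.getElementById("tts-audio") as HTMLAudioElement;
const source = audioCtx.createMediaElementSource(ttsAudio);
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 256;
source.connect(analyser);
analyser.connect(audioCtx.destination); // keep the TTS audible

const samples = new Uint8Array(analyser.fftSize);

function tick() {
  analyser.getByteTimeDomainData(samples);
  // RMS of the waveform; byte samples are centered on 128.
  let sum = 0;
  for (const s of samples) {
    const v = (s - 128) / 128;
    sum += v * v;
  }
  const rms = Math.sqrt(sum / samples.length);
  // The ×4 gain is a tuning value, not a standard; clamp into 0–1.
  talkAmountInput.value = Math.min(1, rms * 4);
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);
```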


Option B — Viseme-Based Lip Sync

  • phoneme → viseme mapping
  • higher realism
  • requires TTS or middleware that outputs visemes

Teams often upgrade to this later.
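
The mapping side is a lookup table. A sketch, assuming the TTS stream emits timed phoneme events; real phoneme sets and viseme counts vary by vendor and by how many mouth shapes the rig actually has:

```typescript
import type { StateMachineInput } from "@rive-app/canvas";

// Looked up from the state machine as in the earlier snippet.
declare const visemeInput: StateMachineInput;

// Illustrative phoneme → viseme index table.
const VISEME_BY_PHONEME: Record<string, number> = {
  p: 0, b: 0, m: 0,             // closed lips
  f: 1, v: 1,                   // teeth on lip
  a: 2, e: 3, i: 4, o: 5, u: 6, // open vowels
  sil: 7,                       // rest / silence
};

// Called for each timed phoneme event emitted alongside the TTS audio.
function onPhonemeEvent(phoneme: string) {
  visemeInput.value = VISEME_BY_PHONEME[phoneme] ?? 7;
}
```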


The Most Important Part: Listening Behavior

From a UX perspective, the listening animation matters more than the speaking animation.

While the user is talking, the character should:

  • maintain eye contact
  • perform subtle nods
  • react to pauses
  • show attention and understanding

These are implemented using micro-loops and triggers, not random animation spam.

This is what makes the experience feel like a video call, not a chatbot.
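
As one example, a pause-reactive nod can be driven by simple mic metering. In this sketch, `micLevel()` and `nodTrigger` are assumptions standing in for your own audio pipeline and the looked-up Rive trigger:

```typescript
import type { StateMachineInput } from "@rive-app/canvas";

declare const nodTrigger: StateMachineInput;
declare function micLevel(): number; // 0–1, e.g. RMS of the mic stream

const PAUSE_THRESHOLD = 0.05; // below this, treat the mic as silent
const PAUSE_MS = 600;         // silence long enough to read as a pause
const NOD_COOLDOWN_MS = 3000; // keep nods occasional, not constant

let silentSince: number | null = null;
let lastNodAt = 0;

setInterval(() => {
  const now = Date.now();
  if (micLevel() < PAUSE_THRESHOLD) {
    silentSince ??= now;
    const pausedLongEnough = now - silentSince > PAUSE_MS;
    const cooledDown = now - lastNodAt > NOD_COOLDOWN_MS;
    if (pausedLongEnough && cooledDown) {
      nodTrigger.fire(); // a subtle acknowledgement, not animation spam
      lastNodAt = now;
    }
  } else {
    silentSince = null;
  }
}, 100);
```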


What Developers Need to Implement

1. Voice Pipeline

  • microphone input
  • speech-to-text (STT)
  • LLM response generation
  • text-to-speech (TTS)
  • audio stream + amplitude or viseme data
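
Wired together, the round trip might look like the sketch below. Every function here is a hypothetical wrapper for whichever STT / LLM / TTS providers you choose:

```typescript
// Hypothetical provider wrappers; swap in your actual services.
declare function speechToText(audio: Blob): Promise<string>;
declare function generateReply(userText: string): Promise<string>;
declare function textToSpeech(text: string): Promise<Blob>;
declare function playAudio(audio: Blob): Promise<void>; // feeds the amplitude loop
declare function setMode(mode: "idle" | "listening" | "thinking" | "speaking"): void;

async function handleUserUtterance(recording: Blob) {
  setMode("thinking"); // AI is processing
  const userText = await speechToText(recording);
  const replyText = await generateReply(userText);
  const replyAudio = await textToSpeech(replyText);

  setMode("speaking"); // amplitude or visemes now drive the mouth
  await playAudio(replyAudio);
  setMode("idle");     // back to breathing and blinking
}
```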

2. State Control Logic

Your app decides:

  • when the user is speaking → Listening
  • when AI is processing → Thinking
  • when AI is speaking → Speaking
  • which emotion to apply

The animation system stays dumb.

Logic stays in code.
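
One way to enforce that split is a reducer-style function that owns every mode decision; a sketch with illustrative event names:

```typescript
type Mode = "idle" | "listening" | "thinking" | "speaking";

type AppEvent =
  | { type: "userStartedSpeaking" }
  | { type: "userStoppedSpeaking" }
  | { type: "aiAudioStarted" }
  | { type: "aiAudioEnded" };

// The Rive file never decides anything; this function does.
function nextMode(event: AppEvent): Mode {
  switch (event.type) {
    case "userStartedSpeaking": return "listening";
    case "userStoppedSpeaking": return "thinking"; // STT + LLM in flight
    case "aiAudioStarted":      return "speaking";
    case "aiAudioEnded":        return "idle";
  }
}
```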


3. Rive Integration

Rive integrates cleanly with:

  • Flutter
  • Web
  • React Native
  • iOS / Android

Developers simply set input values.

No animation math required.


4. Animator–Developer Contract

A good Rive animator provides:

  • clearly named inputs
  • defined ranges and behavior
  • test controls inside the Rive file
  • integration notes for dev teams

This avoids trial-and-error integration.
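
If the team works in TypeScript, that contract can even be encoded as a type, so a renamed input becomes a compile error instead of a silent no-op. A sketch based on the input list above:

```typescript
// Illustrative encoding of the handoff document. Names and ranges must
// mirror what the Rive file actually exposes.
interface CharacterContract {
  stateMachine: "Character";
  numbers: {
    mode: 0 | 1 | 2 | 3;  // Idle, Listening, Thinking, Speaking
    emotion: number;      // 0 = neutral … 3 = serious
    intensity: number;    // 0–1 expressiveness
    talkAmount: number;   // 0–1 audio amplitude
  };
  booleans: {
    userIsSpeaking: boolean;
    aiIsSpeaking: boolean;
  };
  triggers: "nod" | "smile" | "confused" | "praise" | "correct" | "interrupt";
}
```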


Why Use Rive for Conversational Characters?

  • runs at 60fps
  • tiny file sizes
  • no video streaming cost
  • fully real-time
  • cross-platform
  • developer-friendly API

Rive is effectively a real-time animation runtime, not just a design tool.


About the Author

Praneeth Kawya Thathsara is a:

  • Rive Expert
  • Full-Time Rive Animator
  • Specialist in AI-driven interactive mascots
  • Working remotely with global product teams

He focuses on production-ready Rive systems for real applications, not demos.


Contact

📧 uiuxanimation@gmail.com

📧 riveanimator@gmail.com

Available for:

  • AI conversation apps
  • language learning platforms
  • educational tools
  • mascot & character systems
  • developer-ready Rive integrations
