AI voice agents are great at talking but not at connecting. Voice alone can’t express emotion, empathy, or trust. In this blog, we’ll explore how you can give your AI a face and personality using VideoSDK’s Simli Avatar integration to make every interaction more lifelike and engaging.
Let’s answer what most builders are wondering:
- How can I make my AI agent feel more human and expressive?
- What are avatars, and how do they actually work with voice agents?
- How do I integrate an avatar into my VideoSDK pipeline?
- What are the best practices for creating realistic, reliable avatars?
What is an AI Avatar?
An AI avatar is a real-time visual representation of your voice-based AI agent, showing facial expressions and mimicking natural movement. Using tools like Simli, avatars render in real time, giving your AI a relatable, human-like presence.
Let's create a talking AI avatar
Project structure
├── main.py # Main agent implementation
├── requirements.txt # Python dependencies
├── mcp_joke.py # Joke MCP server
├── .env.example # Environment variables template
└── README.md # Project documentation
Prerequisites
- Python 3.12 or higher
- A Simli API key (available from your Simli dashboard)
- A VideoSDK Auth Token (generated from your VideoSDK dashboard)
Create a .env file
VIDEOSDK_AUTH_TOKEN=""
SIMLI_API_KEY=""
GOOGLE_API_KEY=""
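Once the dependencies below are installed, a quick way to confirm these variables actually load is a minimal check like this (check_env.py is a hypothetical helper, not part of the project files):
# check_env.py — hypothetical helper to fail fast on missing keys
import os
from dotenv import load_dotenv

load_dotenv(override=True)

# Verify the three keys used throughout this tutorial are present
for key in ("VIDEOSDK_AUTH_TOKEN", "SIMLI_API_KEY", "GOOGLE_API_KEY"):
    if not os.getenv(key):
        raise SystemExit(f"Missing {key} in .env")
print("All environment variables loaded.")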
Create and Activate the Virtual Environment
python -m venv .venv
# On Windows
.venv\Scripts\activate
# On macOS/Linux
source .venv/bin/activate
Install the dependencies
pip install videosdk-agents videosdk-plugins-google videosdk-plugins-simli python-dotenv fastmcp requests httpx
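The project structure above also lists a requirements.txt. A minimal version matching the install command might look like this (requests and httpx are included because main.py and mcp_joke.py import them):
videosdk-agents
videosdk-plugins-google
videosdk-plugins-simli
python-dotenv
fastmcp
requests
httpx
You can then install everything at once with pip install -r requirements.txt.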
Create a main.py file
import asyncio
import sys
from pathlib import Path
import requests
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob, MCPServerStdio
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from videosdk.plugins.simli import SimliAvatar, SimliConfig
from dotenv import load_dotenv
import os
load_dotenv(override=True)
def get_room_id(auth_token: str) -> str:
url = "https://api.videosdk.live/v2/rooms"
headers = {
"Authorization": auth_token
}
response = requests.post(url, headers=headers)
response.raise_for_status()
return response.json()["roomId"]
class MyVoiceAgent(Agent):
def __init__(self):
        mcp_script_joke = Path(__file__).parent / "mcp_joke.py"
        super().__init__(
            instructions="You are VideoSDK's AI Avatar Voice Agent with real-time capabilities. You are a helpful virtual assistant with a visual avatar that can tell jokes and help with other tasks in real time.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(mcp_script_joke)],
                    session_timeout=30
                )
            ]
        )
async def on_enter(self) -> None:
await self.session.say("Hello! I'm your real-time AI avatar assistant. How can I help you today?")
async def on_exit(self) -> None:
await self.session.say("Goodbye! It was great talking with you!")
async def start_session(context: JobContext):
# Initialize Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # The key is read from GOOGLE_API_KEY in your .env, so there is no need to pass api_key here
        config=GeminiLiveConfig(
            voice="Leda",  # Other options: Puck, Charon, Kore, Fenrir, Aoede, Orus, and Zephyr
            response_modalities=["AUDIO"]
        )
    )
    # Initialize Simli Avatar with the API key from .env
    simli_config = SimliConfig(
        apiKey=os.getenv("SIMLI_API_KEY"),
        faceId="0c2b8b04-5274-41f1-a21c-d5c98322efa9"  # default face
    )
simli_avatar = SimliAvatar(config=simli_config)
# Create pipeline with avatar
pipeline = RealTimePipeline(
model=model,
avatar=simli_avatar
)
session = AgentSession(
agent=MyVoiceAgent(),
pipeline=pipeline
)
try:
await context.connect()
await session.start()
await asyncio.Event().wait()
finally:
await session.close()
await context.shutdown()
def make_context() -> JobContext:
auth_token = os.getenv("VIDEOSDK_AUTH_TOKEN")
room_id = get_room_id(auth_token)
room_options = RoomOptions(
room_id=room_id,
auth_token=auth_token,
name="Simli Avatar Realtime Agent",
playground=True
)
return JobContext(room_options=room_options)
if __name__ == "__main__":
job = WorkerJob(entrypoint=start_session, jobctx=make_context)
job.start()
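Before launching the full agent, you can sanity-check your auth token against the same rooms endpoint that get_room_id calls. A minimal standalone sketch (verify_token.py is a hypothetical helper, not part of the project files):
# verify_token.py — hypothetical check against the v2/rooms endpoint used in main.py
import os
import requests
from dotenv import load_dotenv

load_dotenv(override=True)

# Create a room directly, exactly as get_room_id does
response = requests.post(
    "https://api.videosdk.live/v2/rooms",
    headers={"Authorization": os.getenv("VIDEOSDK_AUTH_TOKEN")},
)
response.raise_for_status()
print("Room created:", response.json()["roomId"])
If this prints a room ID, your token is valid and the agent will be able to create rooms.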
Create an mcp_joke.py file
from fastmcp import FastMCP
import httpx
mcp = FastMCP("JokeServer")
@mcp.tool()
async def get_random_joke() -> str:
"""
Fetch a random joke from the Official Joke API and format it for voice response.
"""
JOKE_API_URL = "https://official-joke-api.appspot.com/random_joke"
async with httpx.AsyncClient() as client:
try:
response = await client.get(JOKE_API_URL, timeout=10)
response.raise_for_status()
joke_data = response.json()
setup = joke_data.get("setup", "Hmm... I forgot the joke setup!")
punchline = joke_data.get("punchline", "Oh wait, I forgot the punchline too!")
# Voice-friendly response (add pauses for TTS)
return f"Here's a joke for you! {setup} ... {punchline}"
except httpx.RequestError as e:
return f"Oops! I couldn’t fetch a joke right now. Network error: {e}"
except Exception as e:
return f"Something went wrong while getting a joke: {e}"
if __name__ == "__main__":
mcp.run(transport="stdio")
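To confirm the joke API is reachable before wiring it into the agent, you can call it directly. A quick standalone check (test_joke_api.py is a hypothetical helper, not part of the project files):
# test_joke_api.py — hypothetical direct call to the Official Joke API
import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        # Same endpoint and timeout the MCP tool uses
        response = await client.get(
            "https://official-joke-api.appspot.com/random_joke", timeout=10
        )
        response.raise_for_status()
        data = response.json()
        print(f"{data['setup']} ... {data['punchline']}")

asyncio.run(main())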
Run your agent
python main.py
You can dive deeper into the playground and agent capabilities in the VideoSDK AI Playground documentation.
We’d love to hear from you!
- Have you tried creating your own AI-powered Simli Avatar with VideoSDK?
- What challenges did you face while integrating real-time voice and avatar rendering?
- Are you more excited about building expressive AI companions or practical voice assistants?
- How do you see voice-interactive avatars changing the way people connect, learn, and play?
👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community. We’re excited to learn from your journey and help you build even better AI-powered communication tools!