What if you could turn dull, static text and audio-based content into exciting videos with the help of AI? With AI video avatar generators, you can easily create high-quality videos that grab your audience's attention starting from simple text or audio.
These AI video generators can serve several purposes. Deployed as a customer support agent in your application, an avatar can give more engaging support and improve customer satisfaction. Used as an educational tool, it can engage students in interactive learning environments. Avatars can also act as virtual assistants that guide users through getting started with a product or tool without needing to go through the documentation.
This tutorial will guide you in setting up and implementing real-time AI video avatars using Simli, an AI video avatar generator. Simli provides developers with a speech-to-video API to create lip-synced AI avatars with lifelike, animated characters, realistic head movements, and synchronized speech.
Following this guide, you will learn how to quickly create a video avatar from voice inputs, ready to be deployed in interactive projects. So, let's get started right away!
The complete source code for the project is available on GitHub.
Prerequisites
You should have:
- A basic understanding of JavaScript and React.
- Node and Node Package Manager (NPM) installed on your computer.
Before setting up the API environment and then moving on to creating a real-time AI video avatar using the Simli API, let's briefly look at the steps needed to create an AI video avatar with Simli.
Steps to Create an AI Video Avatar with Simli:
- Obtain the API key
- Choose a face ID
- Initialize the Simli client
- Call the simliClient.start() function to set up the WebRTC connection
- Stream audio using sendAudioData()
Set Up Your API Environment in Minutes
Start by signing up on Simli to retrieve your API key. For a quick sign-in, you can choose Google.
Once you’ve successfully created an account, you will be redirected to the user profile dashboard, where you can generate your API key and track your API usage.
Copy your API key and store it securely. After retrieving your API key, select an avatar to display on the frontend.
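One way to store the key outside your source files is an environment file. The sketch below is only an illustration: the variable name NEXT_PUBLIC_SIMLI_API_KEY and the helper requireApiKey are hypothetical, not something Simli prescribes. Next.js exposes only variables with the NEXT_PUBLIC_ prefix to browser code, and only when they are referenced statically, which also means the key is visible to anyone who opens the page.

```javascript
// .env.local (hypothetical variable name, not prescribed by Simli):
//   NEXT_PUBLIC_SIMLI_API_KEY=your-api-key

// Hypothetical helper: return the trimmed key, or fail early if it is missing.
function requireApiKey(value) {
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error("Simli API key is missing; set it in .env.local");
  }
  return value.trim();
}

// Usage inside a client component. The reference must be written out in full
// so that Next.js can inline it at build time:
//   const apiKey = requireApiKey(process.env.NEXT_PUBLIC_SIMLI_API_KEY);
```

Failing early with a clear message tends to be easier to debug than a WebRTC connection that silently never starts.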
Choose Your AI Avatar
Simli provides sample AI avatars that can be accessed through its available faces, with new avatars being added constantly.
Here are a few of the available faces:
To get the ID for each face, copy the random text after the name. For example, the ID for Jenna will be tmp9i8bbq7c.
If you don’t want to use any available avatars, Simli has a create avatar tool that lets you create custom avatars simply by uploading images. However, this tutorial will use an existing avatar.
Now that you have the face ID and the Simli API key, let’s create a Next.js app.
Create a Next.js App
To bootstrap a Next.js application, open your terminal, cd into the directory where you would like to create the application, and run this command:
npx create-next-app@latest simli-demo
This command will prompt a few questions about configuring the Next.js application. Here’s what you should respond to each question:
Select the response for each question as shown above by pressing Enter.
Installing Dependencies
Next, install the simli-client and standardized-audio-context packages by running this command:
npm install simli-client standardized-audio-context
The SimliClient, also known as Simli’s WebRTC frontend client, is a tool to integrate real-time video and audio streaming capabilities into web applications using WebRTC. This will enable you to avoid the manual WebRTC setup.
The AudioContext from standardized-audio-context is used to downsample the audio and convert it into chunks that the SimliClient can process.
Initialize the SimliClient in Your Project
In your Next.js application, navigate to the page.js file and paste the following code:
// src/app/page.js
'use client'; // hooks such as useRef only work in client components

import { useRef, useEffect } from 'react';
...
function Home() {
  // Declare the video and audio refs
  const videoRef = useRef(null);
  const audioRef = useRef(null);

  return (
    <div>
      <video ref={videoRef} autoPlay playsInline></video>
      <audio ref={audioRef} autoPlay></audio>
    </div>
  );
}
...
In the code above, videoRef and audioRef were created using the useRef hook to access the <video> and <audio> HTML elements in the component. The SimliClient SDK uses videoRef and audioRef to attach live WebRTC video and audio streams to these HTML elements. The <video> and <audio> elements will be used to render the video and audio data from the remote streams on the client side.
The next step is to configure SimliClient and pass in the video and audio refs. To do so, paste the following code inside page.js:
// src/app/page.js
...
import { SimliClient } from 'simli-client';
...
// Configure the Simli client
const simliClient = new SimliClient();

const simliConfig = {
  apiKey: "your api key",   // the key from your Simli dashboard
  faceID: "tmp9i8bbq7c",    // the face ID chosen earlier
  handleSilence: true,
  maxSessionLength: 3600,   // seconds
  maxIdleTime: 600,         // seconds
  videoRef: videoRef,
  audioRef: audioRef,
};
...
This block of code creates a new instance of the SimliClient and a simliConfig object. Let’s break down each part of the simliConfig object:
- apiKey: The unique key generated when you create an account with Simli.
- faceID: The ID of the avatar face that will be rendered in the video stream. Simli provides different avatars; you can choose one using its face ID.
- handleSilence: A boolean that indicates whether the client should handle silent moments in the audio stream (e.g., muting or pausing the video if no audio is detected).
- maxSessionLength: The maximum session length (in seconds). Here, it's set to 1 hour (3600 seconds), limiting the duration of any single connection session.
- maxIdleTime: The maximum idle time (in seconds). The session will disconnect after 600 seconds (10 minutes) without activity.
- videoRef and audioRef: References to the video and audio elements where the media streams will be displayed in the browser. Passing these refs lets SimliClient connect the WebRTC streams directly to these elements.
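The fields described above can be gathered in a small helper that fills in the defaults used in this tutorial and checks the required values before the client is initialized. This is a minimal sketch: buildSimliConfig is a hypothetical helper name, and the checks are this sketch's own sanity checks rather than rules enforced by Simli.

```javascript
// Hypothetical helper: assemble the config object described above.
function buildSimliConfig({
  apiKey,
  faceID,
  videoRef,
  audioRef,
  handleSilence = true,
  maxSessionLength = 3600, // seconds (1 hour)
  maxIdleTime = 600,       // seconds (10 minutes)
}) {
  if (!apiKey) throw new Error("apiKey is required");
  if (!faceID) throw new Error("faceID is required");
  if (!videoRef || !audioRef) {
    throw new Error("videoRef and audioRef are required");
  }
  if (!(maxSessionLength > 0) || !(maxIdleTime > 0)) {
    throw new Error("maxSessionLength and maxIdleTime must be positive");
  }
  return {
    apiKey,
    faceID,
    handleSilence,
    maxSessionLength,
    maxIdleTime,
    videoRef,
    audioRef,
  };
}
```

Inside the component, it could be called as buildSimliConfig({ apiKey, faceID: "tmp9i8bbq7c", videoRef, audioRef }) and the result passed to the client in place of the literal object.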
Start Real-time Streaming with AI Video Avatar
Once you have successfully configured SimliClient, the next step is establishing the WebRTC connection.
But before that, you need to create a function that will downsample the audio to 16 kHz and break it into smaller pulse-code modulation (PCM) chunks. This guide will use a prerecorded MP3 audio file that will be sent to the SimliClient. You can download and use any audio of your choice.
Paste the following code inside the page.js file to create the downsampleAndChunkAudio function:
// src/app/page.js
...
import { AudioContext } from 'standardized-audio-context';
...
// Downsample the audio and split it into PCM chunks
const downsampleAndChunkAudio = async (audioUrl, chunkSizeInMs = 100) => {
  // Create an AudioContext with a target sample rate of 16 kHz;
  // decodeAudioData resamples the decoded audio to this rate
  const audioContext = new AudioContext({ sampleRate: 16000 });

  // Fetch and decode the audio file
  const response = await fetch(audioUrl);
  const arrayBuffer = await response.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

  // Extract PCM data from the first channel (assuming mono audio for simplicity)
  const rawPCM = audioBuffer.getChannelData(0);

  // Calculate the chunk size in samples (100 ms at 16 kHz is 1600 samples)
  const chunkSizeInSamples = Math.round((chunkSizeInMs / 1000) * 16000);
  const pcmChunks = [];

  // Loop through the raw PCM data and create chunks
  for (let i = 0; i < rawPCM.length; i += chunkSizeInSamples) {
    const chunk = rawPCM.subarray(i, i + chunkSizeInSamples);

    // Convert each float sample in [-1, 1] to a clamped 16-bit integer
    const int16Chunk = new Int16Array(chunk.length);
    for (let j = 0; j < chunk.length; j++) {
      int16Chunk[j] = Math.max(-32768, Math.min(32767, chunk[j] * 32768));
    }
    pcmChunks.push(int16Chunk);
  }
  return pcmChunks;
};
This downsampleAndChunkAudio function takes the URL of an audio file as an argument, downsamples the audio to 16 kHz, and breaks it into smaller PCM chunks. This format is required for audio to be sent to the SimliClient.
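The arithmetic inside downsampleAndChunkAudio can be checked in isolation, without a browser or an audio file. The sketch below pulls the two calculations into standalone helpers; the names samplesPerChunk, floatToInt16, and toPcm16 are hypothetical and exist only for this illustration.

```javascript
const TARGET_SAMPLE_RATE = 16000; // Hz, the rate used in this tutorial

// Number of samples in a chunk of the given duration.
function samplesPerChunk(chunkSizeInMs, sampleRate = TARGET_SAMPLE_RATE) {
  return Math.round((chunkSizeInMs / 1000) * sampleRate);
}

// Scale a float sample in [-1, 1] to the 16-bit integer range and clamp it.
// Math.trunc mirrors the truncation that happens when a number is stored
// in an Int16Array.
function floatToInt16(sample) {
  return Math.trunc(Math.max(-32768, Math.min(32767, sample * 32768)));
}

// Convert a whole Float32Array of samples to 16-bit PCM.
function toPcm16(floatSamples) {
  const out = new Int16Array(floatSamples.length);
  for (let i = 0; i < floatSamples.length; i++) {
    out[i] = floatToInt16(floatSamples[i]);
  }
  return out;
}

console.log(samplesPerChunk(100));                       // 1600
console.log(toPcm16(new Float32Array(1600)).byteLength); // 3200
console.log(floatToInt16(1), floatToInt16(-1));          // 32767 -32768
```

A 100 ms chunk at 16 kHz therefore holds 1600 samples, or 3200 bytes at two bytes per sample. The clamp matters at the top of the range: 1.0 × 32768 would overflow a signed 16-bit integer, so it is capped at 32767.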
Next, you have to initialize SimliClient and establish the WebRTC connection. To do so, paste the following code inside the page.js file:
// src/app/page.js
...
// Initialize the Simli client
...
async function initializeClient() {
  try {
    // Apply the configuration, then open the WebRTC connection
    simliClient.Initialize(simliConfig);
    await simliClient.start();

    // Downsample the audio and send it chunk by chunk.
    // audioUrl is not defined in this snippet; it must hold the URL of your audio file.
    const pcmChunks = await downsampleAndChunkAudio(audioUrl);
    const interval = setInterval(() => {
      const chunk = pcmChunks.shift();
      if (chunk) simliClient.sendAudioData(chunk);
      // Stop the timer once every chunk has been sent
      if (!pcmChunks.length) clearInterval(interval);
    }, 120);
  } catch (error) {
    alert(error);
  }
}
The initializeClient function initializes the SimliClient with the simliConfig object that was declared earlier and starts the connection. It then calls the downsampleAndChunkAudio function to break the audio into chunks of type PCM16 before sending them to the Simli client.
Note: The audio data should be of PCM16 type and have a sample rate of 16 kHz.
PCM16 is a standard audio format ideal for voice processing. When you send this audio format to Simli's API, it helps maintain synchronization between the audio and the avatar's lip movements. This enhances the viewer experience, as it mimics natural speaking in real-time.
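The send loop in initializeClient can also be written as a reusable function that reports when it has finished. The sketch below is one possible shape, not part of the Simli SDK: streamChunks and deliveryRate are hypothetical helpers, and the only thing assumed about the client is that it has a sendAudioData method, as used earlier in this guide. Note that sending 100 ms chunks every 120 ms delivers audio at roughly 0.83 times real-time speed; whether that matters depends on how the service buffers audio, which this guide does not cover.

```javascript
// Send one chunk per tick; the promise resolves with the number of chunks sent.
function streamChunks(client, chunks, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    let index = 0;
    const timer = setInterval(() => {
      try {
        if (index >= chunks.length) {
          clearInterval(timer);
          resolve(index);
          return;
        }
        client.sendAudioData(chunks[index]);
        index += 1;
      } catch (error) {
        // Stop sending if the client throws, and surface the error
        clearInterval(timer);
        reject(error);
      }
    }, intervalMs);
  });
}

// Ratio of audio duration sent per tick to the wall-clock time between ticks.
// 1 means real time; below 1 means audio is delivered slower than it plays.
function deliveryRate(chunkMs, intervalMs) {
  return chunkMs / intervalMs;
}

console.log(deliveryRate(100, 120).toFixed(2)); // "0.83"
```

Unlike the shift()-based loop, this version leaves the chunks array untouched, so the same audio can be streamed again without decoding it a second time.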
Render and Integrate the AI Avatar on the Frontend
Now that you have finished building the application, let’s render it in the browser. To do so, open your terminal and run this command:
npm run dev
This command will start a local development server on http://localhost:3000.
Watch the application in action through this video.
Check out this GitHub repository for a hands-on example of how to integrate Simli's API to build interactive AI avatars.
Conclusion
This quick guide showed how to create a real-time AI video avatar using the Simli API. While this article covered only the basics—such as sending prerecorded audio to the Simli API—Simli offers capabilities that extend far beyond this scope.
To unlock Simli's full potential, you can enhance your AI video avatars by integrating additional tools like OpenAI for language models and Deepgram or ElevenLabs for converting text to speech. Combined with Simli, these tools can create more engaging and interactive video experiences.
Check this tutorial for a more advanced use case of Simli. Sign up on Simli today to get started!