Building conversational AI for Android is becoming one of the most in-demand capabilities in mobile development. With the rapid rise of real-time voice assistants, interactive agents, and AI-driven communication, developers are increasingly looking for efficient ways to bring natural voice interaction into Android apps.
In the past, implementing conversational AI required several different components working together. Speech recognition, large language models, text-to-speech, and low-latency audio processing all needed to be manually connected, which led to long development cycles and complex engineering work.
ZEGOCLOUD simplifies this entire workflow by offering ready-to-use components designed for real-time conversational AI. In this tutorial, you will learn how to create a voice-driven AI bot for Android using ZEGOCLOUD. By the end, you will have a working prototype that supports natural, continuous voice conversations and can serve as the foundation for a production app.
How to Build a Conversational AI on Android
Download Complete Source Code
| Component | Repository |
|---|---|
| Backend Server + Web Client | https://github.com/ZEGOCLOUD/blog-aiagent-server-and-web |
| Android Client | https://github.com/ZEGOCLOUD/blog-aiagent-android |
Prerequisites
Before starting the implementation, prepare the following:
- Android Studio (latest stable version)
- A ZEGOCLOUD account
- Some familiarity with Kotlin
- A backend service for token generation and AI agent control. This tutorial uses a Next.js server deployed on Vercel
Once everything is ready, you can start building.
Architecture Overview
ZEGOCLOUD’s Conversational AI relies on both server and client components.
Why Both Are Needed
ZEGOCLOUD processes audio in the cloud. It performs:
1. Speech recognition (ASR)
2. LLM response generation
3. Text-to-speech (TTS) audio output
The Android client streams audio and plays back AI results. The backend securely stores API keys and generates authentication tokens.
| Component | Responsibilities |
|---|---|
| Backend Server | Stores credentials, generates tokens, manages AI agent lifecycle |
| Android Client | Captures voice, streams audio, plays AI responses, displays subtitles |
| ZEGOCLOUD | Runs the ASR → LLM → TTS pipeline in real time |
Steps to Build a Conversational AI on Android
Step 1: Set Up Your Backend Server
This example uses a simple Next.js backend deployed to Vercel.
1.1 Environment Variables
Create .env.local:
# ZEGO Configuration
NEXT_PUBLIC_ZEGO_APP_ID=your_app_id
ZEGO_SERVER_SECRET=your_server_secret_32_chars
# AI Agent Configuration
ZEGO_AGENT_ID=aiAgent1
ZEGO_AGENT_NAME=AI Assistant
# System Prompt
SYSTEM_PROMPT="You are my best friend whom I can talk to about anything. You're warm, understanding, and always there for me."
# LLM Configuration
LLM_URL=https://your-llm-provider.com/api/chat/completions
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name
# TTS Configuration
TTS_VENDOR=ByteDance
TTS_APP_ID=zego_test
TTS_TOKEN=zego_test
TTS_CLUSTER=volcano_tts
TTS_VOICE_TYPE=zh_female_wanwanxiaohe_moon_bigtts
1.2 Token Generation API
// app/api/zego/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import crypto from 'crypto';
function generateToken(appId: number, userId: string, secret: string,
effectiveTimeInSeconds: number): string {
const tokenInfo = {
app_id: appId,
user_id: userId,
nonce: Math.floor(Math.random() * 2147483647),
ctime: Math.floor(Date.now() / 1000),
expire: Math.floor(Date.now() / 1000) + effectiveTimeInSeconds,
payload: ''
};
const plainText = JSON.stringify(tokenInfo);
const nonce = crypto.randomBytes(12);
const cipher = crypto.createCipheriv('aes-256-gcm', secret, nonce);
const encrypted = Buffer.concat([cipher.update(plainText, 'utf8'),
cipher.final(), cipher.getAuthTag()]);
// writeBigInt64BE returns an offset (a number), not the buffer,
// so write into a named buffer first instead of chaining it inline
const expireBuf = Buffer.alloc(8);
expireBuf.writeBigInt64BE(BigInt(tokenInfo.expire), 0);
const buf = Buffer.concat([
  expireBuf,
  Buffer.from([0, 12]), nonce,
  Buffer.from([encrypted.length >> 8, encrypted.length & 0xff]), encrypted,
  Buffer.from([1])
]);
return '04' + buf.toString('base64');
}
export async function POST(request: NextRequest) {
const { userId } = await request.json();
const token = generateToken(
parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!),
userId,
process.env.ZEGO_SERVER_SECRET!,
3600
);
return NextResponse.json({ token });
}
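To sanity-check what the route above emits, the token header can be decoded without the server secret, since only the payload is encrypted. The following Kotlin sketch infers the layout from the TypeScript generator (8-byte big-endian expire, 2-byte nonce length, nonce, 2-byte ciphertext length, ciphertext, trailing mode byte, all base64-encoded behind a "04" prefix); it is a debugging aid, not part of the official SDK.

```kotlin
import java.nio.ByteBuffer
import java.util.Base64

// Reads the plaintext expire field from a "04"-prefixed token produced by
// the generator above. It does not (and cannot) decrypt the payload.
fun tokenExpire(token: String): Long {
    require(token.startsWith("04")) { "not a version-04 token" }
    val raw = Base64.getDecoder().decode(token.substring(2))
    // First 8 bytes are the big-endian expiry timestamp (seconds since epoch)
    return ByteBuffer.wrap(raw, 0, 8).long
}
```

Comparing the decoded expiry against `Date.now() / 1000 + 3600` is a quick way to confirm your deployed backend is issuing tokens with the intended lifetime.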
1.3 AI Agent API Signature
// app/api/zego/utils.ts
import crypto from 'crypto';
export function generateSignature(appId: number, signatureNonce: string,
serverSecret: string, timestamp: number): string {
const str = appId.toString() + signatureNonce + serverSecret + timestamp.toString();
return crypto.createHash('md5').update(str).digest('hex');
}
export async function sendZegoRequest<T>(action: string, body: object): Promise<T> {
const appId = parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!);
const serverSecret = process.env.ZEGO_SERVER_SECRET!;
const signatureNonce = crypto.randomBytes(8).toString('hex');
const timestamp = Math.floor(Date.now() / 1000);
const signature = generateSignature(appId, signatureNonce, serverSecret, timestamp);
const url = new URL('https://aigc-aiagent-api.zegotech.cn');
url.searchParams.set('Action', action);
url.searchParams.set('AppId', appId.toString());
url.searchParams.set('SignatureNonce', signatureNonce);
url.searchParams.set('Timestamp', timestamp.toString());
url.searchParams.set('Signature', signature);
url.searchParams.set('SignatureVersion', '2.0');
const response = await fetch(url.toString(), {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body)
});
const result = await response.json();
return result.Data as T;
}
1.4 Deploy to Vercel
1. Push the repo to GitHub
2. Import the repo into Vercel
3. Add the environment variables
4. Deploy
Step 2: Build the Android Client
2.1 Create Android Project
- Language: Kotlin
- Minimum SDK: API 24
2.2 Integrate ZEGOCLOUD AI Agent Optimized SDK
The optimized SDK delivers subtitle messages through the onRecvExperimentalAPI callback, which the public Maven release does not support, so it must be integrated manually.
Manual Integration Steps
- Download the AI Agent optimized SDK
- Copy these files into your project:
app/libs/ZegoExpressEngine.jar
app/libs/arm64-v8a/libZegoExpressEngine.so
app/libs/armeabi-v7a/libZegoExpressEngine.so
app/build.gradle
android {
defaultConfig {
ndk {
abiFilters 'armeabi-v7a', 'arm64-v8a'
}
}
sourceSets {
main {
jniLibs.srcDirs = ['libs']
}
}
}
dependencies {
implementation files('libs/ZegoExpressEngine.jar')
implementation 'com.squareup.okhttp3:okhttp:4.12.0'
implementation 'com.google.code.gson:gson:2.10.1'
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3'
}
2.3 Permissions
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
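Declaring RECORD_AUDIO in the manifest is not enough: it is a dangerous permission, so on API 23+ (and the minimum SDK here is 24) the user must also grant it at runtime before the microphone can be opened. A hedged sketch of the wiring in a hypothetical MainActivity, using the AndroidX Activity Result API:

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat

class MainActivity : AppCompatActivity() {

    // Registered once at construction; delivers the user's grant decision
    private val micPermissionLauncher =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) startCall() else showMicRationale()
        }

    // Call this from the "Start Call" button instead of starting the call directly
    private fun ensureMicPermission() {
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
            == PackageManager.PERMISSION_GRANTED
        ) {
            startCall()
        } else {
            micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
        }
    }

    private fun startCall() { /* hand off to the call flow in section 2.9 */ }
    private fun showMicRationale() { /* explain why the microphone is needed */ }
}
```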
2.4 App Configuration
object AppConfig {
const val APP_ID: Long = 123456789L // replace with your ZEGOCLOUD AppID
const val SERVER_URL = "https://your-project.vercel.app" // your deployed backend
fun generateUserId(): String = "user${System.currentTimeMillis() % 100000}"
fun generateRoomId(): String = "room${System.currentTimeMillis() % 100000}"
}
2.5 Initialize ZEGO Engine
class ZegoExpressManager(private val application: Application) {
private var engine: ZegoExpressEngine? = null
fun initEngine() {
val profile = ZegoEngineProfile().apply {
appID = AppConfig.APP_ID
scenario = ZegoScenario.HIGH_QUALITY_CHATROOM
application = this@ZegoExpressManager.application
}
engine = ZegoExpressEngine.createEngine(profile, eventHandler)
engine?.apply {
enableAGC(true)
enableANS(true)
setANSMode(ZegoANSMode.MEDIUM)
}
}
}
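initEngine() above passes an eventHandler that is not shown. The following members belong inside ZegoExpressManager; the callback names follow the ZEGO Express Android SDK, but verify the exact signatures against the SDK version bundled in app/libs, as this is a sketch rather than a verbatim excerpt:

```kotlin
// Set by the Activity (see the subtitle section) to receive experimental-API payloads
var onRecvExperimentalAPI: ((String) -> Unit)? = null

private val eventHandler = object : IZegoEventHandler() {
    // The optimized SDK delivers subtitle messages through this callback;
    // forward the raw JSON to whoever registered a listener
    override fun onRecvExperimentalAPI(content: String) {
        this@ZegoExpressManager.onRecvExperimentalAPI?.invoke(content)
    }

    // New remote streams (e.g. the AI agent's audio) show up here; playing
    // them on ADD is a safety net alongside the explicit startPlaying() call
    override fun onRoomStreamUpdate(
        roomID: String,
        updateType: ZegoUpdateType,
        streamList: ArrayList<ZegoStream>,
        extendedData: JSONObject
    ) {
        if (updateType == ZegoUpdateType.ADD) {
            streamList.forEach { engine?.startPlayingStream(it.streamID) }
        }
    }
}
```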
2.6 Room Login, Publishing and Playing Streams
fun loginRoom(roomId: String, userId: String, token: String, callback: (Int) -> Unit) {
    val user = ZegoUser(userId)
    val config = ZegoRoomConfig().apply {
        this.token = token
    }
    engine?.loginRoom(roomId, user, config) { errorCode, _ ->
        callback(errorCode)
    }
}
// Suspending convenience wrapper so callers inside a coroutine (see the
// complete call flow in section 2.9) can await the login result directly
suspend fun loginRoom(roomId: String, userId: String, token: String): Int =
    suspendCancellableCoroutine { cont ->
        loginRoom(roomId, userId, token) { errorCode -> cont.resume(errorCode) }
    }
fun startPublishing(streamId: String) {
    engine?.startPublishingStream(streamId)
}
fun startPlaying(streamId: String) {
    engine?.startPlayingStream(streamId)
}
fun logoutRoom(roomId: String) {
    engine?.logoutRoom(roomId)
}
2.7 Subtitle Display
private val audioChatMessageParser = AudioChatMessageParser()
audioChatMessageParser.setAudioChatMessageListListener(object :
AudioChatMessageParser.AudioChatMessageListListener {
override fun onMessageListUpdated(messagesList: MutableList<AudioChatMessage>) {
runOnUiThread {
binding.messageList.onMessageListUpdated(messagesList)
}
}
override fun onAudioChatStateUpdate(statusMessage: AudioChatAgentStatusMessage) {}
})
zegoManager.onRecvExperimentalAPI = { content ->
try {
val json = JSONObject(content)
if (json.getString("method") == "liveroom.room.on_recive_room_channel_message") {
val msgContent = json.getJSONObject("params").getString("msg_content")
audioChatMessageParser.parseAudioChatMessage(msgContent)
}
} catch (e: Exception) { e.printStackTrace() }
}
2.8 UI Layout
<ConstraintLayout>
<LinearLayout android:id="@+id/topSection">
<TextView android:id="@+id/tvStatus" />
<Button android:id="@+id/btnCall" android:text="Start Call" />
</LinearLayout>
<com.zegocloud.aiagent.subtitle.AIChatListView
android:id="@+id/messageList" />
</ConstraintLayout>
2.9 Complete Call Flow
class MainViewModel(private val zegoManager: ZegoExpressManager) : ViewModel() {
    // Backing state observed by the UI
    private val _isLoading = MutableStateFlow(false)
    val isLoading: StateFlow<Boolean> = _isLoading
    private val _isConnected = MutableStateFlow(false)
    val isConnected: StateFlow<Boolean> = _isConnected
    private val _error = MutableStateFlow<String?>(null)
    val error: StateFlow<String?> = _error
    private var _currentRoomId: String? = null
fun startCall() {
viewModelScope.launch {
_isLoading.value = true
try {
val roomId = AppConfig.generateRoomId()
val userId = AppConfig.generateUserId()
val userStreamId = "${roomId}_${userId}_main"
zegoManager.initEngine()
val token = ApiService.getToken(userId)
?: throw Exception("Failed to get token")
val loginResult = zegoManager.loginRoom(roomId, userId, token)
if (loginResult != 0) throw Exception("Failed to login room")
zegoManager.startPublishing(userStreamId)
val agentStreamId = ApiService.startAgent(roomId, userId, userStreamId)
?: throw Exception("Failed to start agent")
zegoManager.startPlaying(agentStreamId)
_isConnected.value = true
_currentRoomId = roomId
} catch (e: Exception) {
_error.value = e.message
} finally {
_isLoading.value = false
}
}
}
fun endCall() {
viewModelScope.launch {
_currentRoomId?.let { roomId ->
ApiService.stopAgent(roomId)
zegoManager.logoutRoom(roomId)
}
_isConnected.value = false
_currentRoomId = null
}
}
}
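The ViewModel above calls ApiService, which this section does not define. Here is a minimal, dependency-free sketch: only the /api/zego/token route is defined earlier in this tutorial, so the /api/zego/start and /api/zego/stop paths and the agentStreamId response field are assumptions modeled on the companion backend repo. A production app should use the OkHttp and Gson dependencies already declared in build.gradle and run these calls on Dispatchers.IO.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical ApiService matching the calls made by MainViewModel
object ApiService {
    private const val BASE_URL = "https://your-project.vercel.app" // your Vercel deployment

    // JSON bodies built by hand to keep the sketch self-contained
    internal fun tokenBody(userId: String) = """{"userId":"$userId"}"""
    internal fun startBody(roomId: String, userId: String, userStreamId: String) =
        """{"roomId":"$roomId","userId":"$userId","userStreamId":"$userStreamId"}"""

    // Naive extraction of a top-level string field from a JSON response;
    // fine for a sketch, use a real JSON parser in production
    internal fun extractField(json: String, field: String): String? =
        Regex("\"$field\"\\s*:\\s*\"([^\"]+)\"").find(json)?.groupValues?.get(1)

    // Blocking POST; returns the response body, or null on any failure
    private fun post(path: String, body: String): String? = try {
        val conn = URL(BASE_URL + path).openConnection() as HttpURLConnection
        conn.requestMethod = "POST"
        conn.setRequestProperty("Content-Type", "application/json")
        conn.doOutput = true
        conn.outputStream.use { it.write(body.toByteArray()) }
        conn.inputStream.bufferedReader().use { it.readText() }
    } catch (e: Exception) {
        null
    }

    fun getToken(userId: String): String? =
        post("/api/zego/token", tokenBody(userId))?.let { extractField(it, "token") }

    fun startAgent(roomId: String, userId: String, userStreamId: String): String? =
        post("/api/zego/start", startBody(roomId, userId, userStreamId))
            ?.let { extractField(it, "agentStreamId") }

    fun stopAgent(roomId: String) {
        post("/api/zego/stop", """{"roomId":"$roomId"}""")
    }
}
```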
Step 3: Run the Demo
You can now test real-time conversational AI on a real Android device.
https://youtu.be/TSEVDlY_l4M
Conclusion
You have built a complete conversational AI system on Android that includes:
- A secure backend
- A Kotlin-based Android client
- Real-time speech processing and AI responses
- Live subtitle rendering
You can continue by refining the system prompt, adding avatar animations, or integrating richer UI components. This setup works well for AI assistants, customer support bots, AI companions, learning tutors, and more.
If you want to keep exploring, ZEGOCLOUD's conversational AI platform provides enough resources for extensive prototyping.
