Voice-first AI applications are reshaping mobile experiences. Users expect more than text input—they want natural, spoken conversations with their apps. If you're looking to build an AI chatbot for Android, you've come to the right place.
This tutorial walks you through creating a fully functional Android AI chatbot from scratch. We'll use ZEGOCLOUD's Conversational AI platform to handle speech recognition, language processing, and text-to-speech—all through a clean, integrated SDK. By the end, your Android app will have real-time voice conversation capabilities that rival the big tech apps.
Let's get started.
Quick Links
| Resource | Repository |
|---|---|
| Backend Server + Web Client | github.com/ZEGOCLOUD/blog-aiagent-server-and-web |
| Android Client | github.com/ZEGOCLOUD/blog-aiagent-android |
Prerequisites
Before diving in, ensure you have:
- Android Studio (latest stable version)
- Kotlin programming experience
- ZEGOCLOUD Account — Create free account
- Backend deployment — We'll use Next.js on Vercel (one-click deploy)
Architecture Overview
Understanding the Components
ZEGOCLOUD's Conversational AI runs as a cloud service handling three core functions:
| Service | Function |
|---|---|
| ASR (Automatic Speech Recognition) | Converts voice to text |
| LLM (Large Language Model) | Processes and generates responses |
| TTS (Text-to-Speech) | Converts text back to natural speech |
System Flow
sequenceDiagram
participant User as Android Client
participant Server as Backend Server
participant ZEGO as ZEGO Cloud
participant LLM as LLM Service
participant TTS as TTS Service
User->>Server: 1. Request Token
Server-->>User: 2. Return Token
User->>ZEGO: 3. Login Room & Publish Audio Stream
User->>Server: 4. Request to Start AI Agent
Server->>ZEGO: 5. Register Agent (LLM/TTS config)
Server->>ZEGO: 6. Create Agent Instance
ZEGO-->>Server: 7. Return Agent Instance ID
Server-->>User: 8. Return Agent Stream ID
User->>ZEGO: 9. Play Agent's Audio Stream
loop Conversation
User->>ZEGO: User speaks (audio stream)
ZEGO->>ZEGO: ASR: Speech to Text
ZEGO->>LLM: Send text to LLM
LLM-->>ZEGO: LLM response
ZEGO->>TTS: Text to Speech
TTS-->>ZEGO: Audio data
ZEGO-->>User: AI voice + Subtitles
end
User->>Server: 10. Request to Stop AI Agent
Server->>ZEGO: 11. Delete Agent Instance
Why Do We Need Both Server and Client?
A common question when building an AI chatbot for Android: why can't the client connect directly to ZEGOCLOUD? The answer lies in security and architectural design.
| Component | Role | Security Boundary |
|---|---|---|
| Backend Server | Token generation, AI agent management, API signing | Trusted environment (your control) |
| Android Client | Audio capture, stream publishing, UI display | Untrusted environment (user devices) |
| ZEGO Cloud | ASR, LLM, TTS processing | External service |
Key reasons for this architecture:
- Secret Protection: Your ZEGO_SERVER_SECRET and LLM API keys never leave your server. If these credentials were embedded in the Android app, attackers could decompile it and steal them.
- Token-Based Authentication: Clients request short-lived tokens (typically 1 hour) instead of using permanent credentials. This limits the damage if a token is compromised.
- Centralized AI Configuration: System prompts, LLM settings, and TTS voices are configured on your server, not in the client. This allows you to update AI behavior without app updates.
- Audit & Rate Limiting: Your server can log all AI agent requests, implement rate limiting, and monitor usage patterns.
This separation keeps your Android AI chatbot deployment secure and maintainable in production.
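To make the rate-limiting point concrete, here is a minimal fixed-window limiter sketch. It assumes a single server process with in-memory state; the names FixedWindowLimiter, limit, and windowMs are hypothetical and not part of any ZEGOCLOUD API. On a serverless host such as Vercel, memory is not shared between instances, so a shared store would likely be needed in practice.

```typescript
// Hypothetical fixed-window rate limiter keyed by user ID (in-memory sketch).
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the user is over the limit.
  allow(userId: string, now: number = Date.now()): boolean {
    const w = this.windows.get(userId);
    if (!w || now - w.start >= this.windowMs) {
      // First request, or the previous window has expired: start a new window.
      this.windows.set(userId, { start: now, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count += 1;
      return true;
    }
    return false;
  }
}

// Example: at most 3 agent starts per user per minute.
const agentStartLimiter = new FixedWindowLimiter(3, 60_000);
```

A route handler could call agentStartLimiter.allow(userId) before starting an agent and return HTTP 429 when it yields false.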
Step 1: Set Up the Backend
Your backend handles authentication and AI agent management. We use Next.js for simplicity.
1.1 Environment Configuration
Create a .env.local file in your server root:
# ZEGO Credentials (from Console: https://console.zegocloud.com/)
NEXT_PUBLIC_ZEGO_APP_ID=your_app_id
ZEGO_SERVER_SECRET=your_32_char_secret
# AI Agent Settings
ZEGO_AGENT_ID=aiAgent1
ZEGO_AGENT_NAME=AI Assistant
# Define AI Personality
SYSTEM_PROMPT="You are my best friend who I can talk to about anything. You're warm, understanding, and always there for me. Respond naturally like a close friend would."
# LLM Provider (OpenAI, Doubao, Claude, etc.)
LLM_URL=https://your-llm-provider.com/api/chat/completions
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name
# Text-to-Speech Configuration
TTS_VENDOR=ByteDance
TTS_APP_ID=zego_test
TTS_TOKEN=zego_test
TTS_CLUSTER=volcano_tts
TTS_VOICE_TYPE=zh_female_wanwanxiaohe_moon_bigtts
| Variable | Purpose | Where to Get |
|---|---|---|
| NEXT_PUBLIC_ZEGO_APP_ID | Application identifier | ZEGOCLOUD Console → Project Settings |
| ZEGO_SERVER_SECRET | 32-character secret key | ZEGOCLOUD Console → Project Settings |
| SYSTEM_PROMPT | AI behavior definition | Customize for your use case |
| LLM_* | Language model config | From your LLM provider |
| TTS_* | Voice synthesis settings | Use test values or your TTS service |
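A missing or malformed variable tends to surface later as a confusing token or signature error, so it can help to check the configuration up front. The sketch below is one way to do that; the helper names missingEnv and validateZegoEnv and the choice of which variables count as required are assumptions for illustration, not part of the tutorial's repository.

```typescript
// Hypothetical list of variables this tutorial's backend reads.
const REQUIRED_ENV = [
  'NEXT_PUBLIC_ZEGO_APP_ID',
  'ZEGO_SERVER_SECRET',
  'LLM_URL',
  'LLM_API_KEY',
  'LLM_MODEL',
];

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are missing or blank.
function missingEnv(env: Env, names: string[] = REQUIRED_ENV): string[] {
  return names.filter((name) => !env[name] || env[name]!.trim() === '');
}

// Returns a list of human-readable problems; an empty list means the config looks sane.
function validateZegoEnv(env: Env): string[] {
  const problems = missingEnv(env).map((name) => `${name} is not set`);
  const appId = env.NEXT_PUBLIC_ZEGO_APP_ID;
  if (appId && !/^\d+$/.test(appId)) {
    problems.push('NEXT_PUBLIC_ZEGO_APP_ID must be numeric');
  }
  const secret = env.ZEGO_SERVER_SECRET;
  if (secret && secret.length !== 32) {
    problems.push('ZEGO_SERVER_SECRET must be 32 characters');
  }
  return problems;
}
```

On the server this could be called once as validateZegoEnv(process.env), logging or throwing if the returned list is not empty.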
1.2 Token Generation API
// app/api/zego/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import crypto from 'crypto';
// Builds a "04" token: AES-256-GCM-encrypted JSON wrapped in a binary envelope
function generateToken(appId: number, userId: string, secret: string,
                       effectiveTimeInSeconds: number): string {
  const now = Math.floor(Date.now() / 1000);
  const tokenInfo = {
    app_id: appId,
    user_id: userId,
    nonce: Math.floor(Math.random() * 2147483647),
    ctime: now,
    expire: now + effectiveTimeInSeconds,
    payload: ''
  };
  const plainText = JSON.stringify(tokenInfo);
  const nonce = crypto.randomBytes(12);
  // The 32-character secret is used directly as the 256-bit AES key
  const cipher = crypto.createCipheriv('aes-256-gcm', secret, nonce);
  const encrypted = Buffer.concat([cipher.update(plainText, 'utf8'),
                                   cipher.final(), cipher.getAuthTag()]);
  // writeBigInt64BE returns an offset, not the buffer, so fill the buffer first
  const expireBuf = Buffer.alloc(8);
  expireBuf.writeBigInt64BE(BigInt(tokenInfo.expire), 0);
  const buf = Buffer.concat([
    expireBuf,
    Buffer.from([0, 12]), nonce,
    Buffer.from([encrypted.length >> 8, encrypted.length & 0xff]), encrypted,
    Buffer.from([1]) // GCM mode
  ]);
  return '04' + buf.toString('base64');
}
export async function POST(request: NextRequest) {
  const { userId } = await request.json();
  if (typeof userId !== 'string' || userId.length === 0) {
    return NextResponse.json(
      { code: 400, data: null, message: 'userId is required' }, { status: 400 });
  }
  const token = generateToken(
    parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!, 10),
    userId,
    process.env.ZEGO_SERVER_SECRET!,
    3600 // token lifetime: 1 hour
  );
  // Shape matches the TokenResponse data class used by the Android client
  return NextResponse.json({ code: 0, data: { token }, message: 'ok' });
}
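Because the expiry is stored unencrypted in the first 8 bytes of the token body, it can be read back without the secret. The sketch below relies only on the layout visible in generateToken above ("04" prefix, then base64 of an 8-byte big-endian expiry followed by the rest); the helper names readTokenExpiry and isExpired are hypothetical, and this is a debugging aid, not a validation of the token.

```typescript
// Reads the expiry (seconds since epoch) from a token laid out as in generateToken:
// "04" + base64([8-byte big-endian expire | nonce length | nonce | ...]).
function readTokenExpiry(token: string): number {
  if (!token.startsWith('04')) {
    throw new Error('unexpected token version');
  }
  const buf = Buffer.from(token.slice(2), 'base64');
  if (buf.length < 8) {
    throw new Error('token too short');
  }
  return Number(buf.readBigInt64BE(0));
}

// True when the token's expiry is at or before the given time (defaults to now).
function isExpired(token: string, nowSeconds: number = Math.floor(Date.now() / 1000)): boolean {
  return readTokenExpiry(token) <= nowSeconds;
}
```

A client or test could use this to decide when to request a fresh token instead of waiting for a login failure.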
1.3 AI Agent Signature Utility
// app/api/zego/utils.ts
import crypto from 'crypto';
// Signature = MD5(AppId + SignatureNonce + ServerSecret + Timestamp), hex-encoded
export function generateSignature(appId: number, signatureNonce: string,
                                  serverSecret: string, timestamp: number): string {
  const str = appId.toString() + signatureNonce + serverSecret + timestamp.toString();
  return crypto.createHash('md5').update(str).digest('hex');
}
// Sends a signed POST request to the AI Agent API and returns the Data field
export async function sendZegoRequest<T>(action: string, body: object): Promise<T> {
  const appId = parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!, 10);
  const serverSecret = process.env.ZEGO_SERVER_SECRET!;
  const signatureNonce = crypto.randomBytes(8).toString('hex');
  const timestamp = Math.floor(Date.now() / 1000);
  const signature = generateSignature(appId, signatureNonce, serverSecret, timestamp);
  // Common parameters travel in the query string; the action payload goes in the body
  const url = new URL('https://aigc-aiagent-api.zegotech.cn');
  url.searchParams.set('Action', action);
  url.searchParams.set('AppId', appId.toString());
  url.searchParams.set('SignatureNonce', signatureNonce);
  url.searchParams.set('Timestamp', timestamp.toString());
  url.searchParams.set('Signature', signature);
  url.searchParams.set('SignatureVersion', '2.0');
  const response = await fetch(url.toString(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });
  const result = await response.json();
  return result.Data as T;
}
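The Android client later calls /api/zego/start and /api/zego/stop, which build on this utility. The sketch below shows roughly how steps 5 to 7 and 11 of the system flow could be wired up. It is not the repository's implementation: the action names (RegisterAgent, CreateAgentInstance, DeleteAgentInstance), the body field names, and the agent ID scheme are assumptions that should be checked against the ZEGOCLOUD AI Agent API reference. The transport is passed in as a parameter so the logic can be exercised without network access.

```typescript
// Same call shape as sendZegoRequest(action, body) above.
type Send = <T>(action: string, body: object) => Promise<T>;

interface StartParams { roomId: string; userId: string; userStreamId: string; }
interface StartResult { agentInstanceId: string; agentUserId: string; agentStreamId: string; }

// ASSUMPTION: action names and body fields are illustrative; verify them in the API docs.
async function startAgent(send: Send, agentId: string, p: StartParams): Promise<StartResult> {
  // Hypothetical naming scheme for the agent's user and stream IDs.
  const agentUserId = `agent_${p.roomId}`;
  const agentStreamId = `agent_stream_${p.roomId}`;

  // Step 5: register the agent. A real handler would also pass the LLM and TTS
  // settings from the environment and may need to tolerate "already registered".
  await send('RegisterAgent', { AgentId: agentId });

  // Steps 6-7: create an instance bound to the room and the user's audio stream.
  const data = await send<{ AgentInstanceId: string }>('CreateAgentInstance', {
    AgentId: agentId,
    UserId: p.userId,
    RTC: {
      RoomId: p.roomId,
      AgentUserId: agentUserId,
      AgentStreamId: agentStreamId,
      UserStreamId: p.userStreamId,
    },
  });
  return { agentInstanceId: data.AgentInstanceId, agentUserId, agentStreamId };
}

// Step 11: delete the instance when the call ends.
async function stopAgent(send: Send, agentInstanceId: string): Promise<void> {
  await send('DeleteAgentInstance', { AgentInstanceId: agentInstanceId });
}
```

In a route handler, sendZegoRequest would be passed as the send argument, and the returned values map onto the agentInstanceId, agentUserId, and agentStreamId fields the Android client expects.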
1.4 Deploy to Vercel
- Push code to GitHub
- Import repository in Vercel
- Configure environment variables
- Click Deploy
Your server will be live at https://your-project.vercel.app.
Step 2: Build the Android Client
Now we'll build the Android client in Kotlin using the ZEGOCLOUD SDK.
2.1 Create New Project
Start a new Android project:
- Language: Kotlin
- Minimum SDK: API 24
2.2 Add Dependencies
Important: Use the AI Agent-optimized ZEGO SDK. The standard Maven version won't support subtitle callbacks (onRecvExperimentalAPI).
Download from: ZEGO AI Agent SDK
Integration steps:
- Extract and copy files:
  - ZegoExpressEngine.jar → app/libs/
  - libZegoExpressEngine.so (arm64) → app/libs/arm64-v8a/
  - libZegoExpressEngine.so (v7a) → app/libs/armeabi-v7a/
- Update app/build.gradle:
android {
    defaultConfig {
        ndk {
            abiFilters 'armeabi-v7a', 'arm64-v8a'
        }
    }
    sourceSets {
        main {
            jniLibs.srcDirs = ['libs']
        }
    }
}
dependencies {
    // ZEGO Express SDK (AI Agent version)
    implementation files('libs/ZegoExpressEngine.jar')
    // Networking & JSON
    implementation 'com.squareup.okhttp3:okhttp:4.12.0'
    implementation 'com.google.code.gson:gson:2.10.1'
    // Coroutines
    implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3'
}
2.3 Add Permissions
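Declare the following in AndroidManifest.xml. RECORD_AUDIO is a runtime permission, so on Android 6.0 (API 23) and later the app must also request it from the user before publishing audio: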
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
2.4 Configuration Class
Centralize your app settings:
// config/AppConfig.kt
object AppConfig {
// Must match NEXT_PUBLIC_ZEGO_APP_ID in backend
const val APP_ID: Long = 1234567890L
// Your Vercel deployment URL
const val SERVER_URL = "https://your-project.vercel.app"
// Helper functions for unique IDs
fun generateUserId(): String = "user${System.currentTimeMillis() % 100000}"
fun generateRoomId(): String = "room${System.currentTimeMillis() % 100000}"
}
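Note: When testing on the emulator against a server running on your development machine, use http://10.0.2.2:3000, which the emulator maps to the host's localhost. Cleartext HTTP is blocked by default on Android 9 and later, so debug builds also need android:usesCleartextTraffic="true" or a network security config.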
2.5 API Service Layer
Handle all backend communication:
// api/ApiService.kt
import com.google.gson.Gson
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import java.io.IOException
import java.util.concurrent.TimeUnit
object ApiService {
    private val client = OkHttpClient.Builder()
        .connectTimeout(30, TimeUnit.SECONDS)
        .readTimeout(30, TimeUnit.SECONDS)
        .build()
    private val gson = Gson()
    private val JSON = "application/json; charset=utf-8".toMediaType()
    // POST a JSON body to the backend and parse the JSON response into type T
    private suspend fun <T : Any> post(path: String, body: Map<String, String>, type: Class<T>): T =
        withContext(Dispatchers.IO) {
            val request = Request.Builder()
                .url("${AppConfig.SERVER_URL}$path")
                .post(gson.toJson(body).toRequestBody(JSON))
                .build()
            // use {} closes the response body even if parsing throws
            client.newCall(request).execute().use { response ->
                gson.fromJson(response.body?.string(), type)
                    ?: throw IOException("Empty response from $path")
            }
        }
    // Get authentication token
    suspend fun getToken(userId: String): TokenResponse =
        post("/api/zego/token", mapOf("userId" to userId), TokenResponse::class.java)
    // Start AI agent instance
    suspend fun startAgent(roomId: String, userId: String, userStreamId: String): AgentResponse =
        post(
            "/api/zego/start",
            mapOf("roomId" to roomId, "userId" to userId, "userStreamId" to userStreamId),
            AgentResponse::class.java
        )
    // Stop AI agent
    suspend fun stopAgent(agentInstanceId: String): StopResponse =
        post("/api/zego/stop", mapOf("agentInstanceId" to agentInstanceId), StopResponse::class.java)
}
// Response data classes
data class TokenResponse(val code: Int?, val data: TokenData?, val message: String?)
data class TokenData(val token: String?)
data class AgentResponse(val code: Int?, val data: AgentData?, val message: String?)
data class AgentData(val agentInstanceId: String?, val agentUserId: String?, val agentStreamId: String?)
data class StopResponse(val code: Int?, val message: String?)
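These data classes imply that every backend route answers with the same { code, data, message } envelope. On the server side, that contract could be captured with a pair of small helpers, sketched below. The names ApiEnvelope, ok, and fail are hypothetical, and treating code 0 as success is an assumption rather than something the article states.

```typescript
// Envelope mirroring the Kotlin data classes: { code, data, message }.
interface ApiEnvelope<T> {
  code: number;
  data: T | null;
  message: string;
}

// Success envelope; code 0 is assumed to mean "no error".
function ok<T>(data: T): ApiEnvelope<T> {
  return { code: 0, data, message: 'ok' };
}

// Error envelope with a non-zero code and no data.
function fail(code: number, message: string): ApiEnvelope<never> {
  return { code, data: null, message };
}

// What Gson would parse into TokenResponse on the Android side.
const tokenReply = JSON.stringify(ok({ token: 'abc' }));
```

Routes can then return NextResponse.json(ok(...)) or NextResponse.json(fail(...)), so the client's nullable fields are filled consistently.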
2.6 Initialize ZEGO Engine
Set up audio-optimized RTC:
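import android.app.Application
import im.zego.zegoexpress.ZegoExpressEngine
import im.zego.zegoexpress.callback.IZegoEventHandler
import im.zego.zegoexpress.constants.ZegoANSMode
import im.zego.zegoexpress.constants.ZegoScenario
import im.zego.zegoexpress.entity.ZegoEngineProfile
class ZegoExpressManager(private val application: Application) {
    private var engine: ZegoExpressEngine? = null
    // Set by the UI layer to receive raw subtitle/status messages (used in 2.8)
    var onRecvExperimentalAPI: ((String) -> Unit)? = null
    // Forwards SDK events to the callback above
    private val eventHandler = object : IZegoEventHandler() {
        override fun onRecvExperimentalAPI(content: String) {
            this@ZegoExpressManager.onRecvExperimentalAPI?.invoke(content)
        }
    }
    fun initEngine() {
        val profile = ZegoEngineProfile().apply {
            appID = AppConfig.APP_ID
            scenario = ZegoScenario.HIGH_QUALITY_CHATROOM
            application = this@ZegoExpressManager.application
        }
        engine = ZegoExpressEngine.createEngine(profile, eventHandler)
        // Audio optimization
        engine?.apply {
            enableAGC(true) // Automatic Gain Control
            enableANS(true) // Noise Suppression
            setANSMode(ZegoANSMode.MEDIUM)
        }
    }
}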
2.7 Room & Stream Management
// These functions are members of ZegoExpressManager
// Login to RTC room; the callback receives 0 on success or an SDK error code
fun loginRoom(roomId: String, userId: String, token: String, callback: (Int) -> Unit) {
    val user = ZegoUser(userId)
    val config = ZegoRoomConfig().apply { this.token = token }
    engine?.loginRoom(roomId, user, config) { errorCode, _ ->
        callback(errorCode)
    }
}
// Publish user's audio stream
fun startPublishing(streamId: String) {
    engine?.startPublishingStream(streamId)
}
// Play AI agent's audio stream
fun startPlaying(streamId: String) {
    engine?.startPlayingStream(streamId)
}
// Leave the room when the call ends (used by endCall in 2.10)
fun logoutRoom(roomId: String) {
    engine?.logoutRoom(roomId)
}
2.8 Subtitle Display
ZEGO provides official components for subtitle parsing:
// Create parser instance
private val audioChatMessageParser = AudioChatMessageParser()
// Set up listener
audioChatMessageParser.setAudioChatMessageListListener(object : AudioChatMessageParser.AudioChatMessageListListener {
    override fun onMessageListUpdated(messagesList: MutableList<AudioChatMessage>) {
        runOnUiThread {
            binding.messageList.onMessageListUpdated(messagesList)
        }
    }
    override fun onAudioChatStateUpdate(statusMessage: AudioChatAgentStatusMessage) {
        // Handle status updates
    }
})
// Parse incoming messages
zegoManager.onRecvExperimentalAPI = { content ->
    try {
        val json = JSONObject(content)
        if (json.getString("method") == "liveroom.room.on_recive_room_channel_message") {
            val msgContent = json.getJSONObject("params").getString("msg_content")
            audioChatMessageParser.parseAudioChatMessage(msgContent)
        }
    } catch (e: Exception) { e.printStackTrace() }
}
Download AudioChatMessageParser.java and AIChatListView.java from the ZEGO Subtitle Guide.
2.9 Layout Design
A simple two-section UI. The snippet is abbreviated: layout_width, layout_height, and constraint attributes are omitted.
<ConstraintLayout>
<!-- Top: Status + Controls -->
<LinearLayout android:id="@+id/topSection">
<TextView android:id="@+id/tvStatus" />
<Button android:id="@+id/btnCall" android:text="Start Call" />
</LinearLayout>
<!-- Bottom: Subtitles -->
<com.zegocloud.aiagent.subtitle.AIChatListView
android:id="@+id/messageList" />
</ConstraintLayout>
2.10 Complete Call Flow (ViewModel)
Orchestrate the entire conversation flow:
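import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.launch
import kotlin.coroutines.resume
import kotlin.coroutines.suspendCoroutine
class MainViewModel(private val zegoManager: ZegoExpressManager) : ViewModel() {
    // UI state. MutableStateFlow is an assumption; the original snippet omits these declarations
    private val _isLoading = MutableStateFlow(false)
    private val _isConnected = MutableStateFlow(false)
    private val _error = MutableStateFlow<String?>(null)
    private var _currentRoomId: String? = null
    private var _agentInstanceId: String? = null
    fun startCall() {
        viewModelScope.launch {
            _isLoading.value = true
            try {
                val roomId = AppConfig.generateRoomId()
                val userId = AppConfig.generateUserId()
                val userStreamId = "${roomId}_${userId}_main"
                zegoManager.initEngine()
                // 1. Get token (unwrap the response envelope)
                val token = ApiService.getToken(userId).data?.token
                    ?: throw Exception("Failed to get token")
                // 2. Login room (bridge the callback-based loginRoom into a suspending call)
                val loginResult = suspendCoroutine<Int> { cont ->
                    zegoManager.loginRoom(roomId, userId, token) { code -> cont.resume(code) }
                }
                if (loginResult != 0) throw Exception("Login failed: $loginResult")
                // 3. Publish stream
                zegoManager.startPublishing(userStreamId)
                // 4. Start AI agent
                val agent = ApiService.startAgent(roomId, userId, userStreamId).data
                val agentStreamId = agent?.agentStreamId
                    ?: throw Exception("Failed to start agent")
                // 5. Play agent stream
                zegoManager.startPlaying(agentStreamId)
                _agentInstanceId = agent?.agentInstanceId
                _currentRoomId = roomId
                _isConnected.value = true
            } catch (e: Exception) {
                _error.value = e.message
            } finally {
                _isLoading.value = false
            }
        }
    }
    fun endCall() {
        viewModelScope.launch {
            try {
                // Stop the agent by its instance ID, not the room ID
                _agentInstanceId?.let { ApiService.stopAgent(it) }
            } catch (e: Exception) {
                _error.value = e.message
            } finally {
                // Leave the room even if the stop request failed
                _currentRoomId?.let { zegoManager.logoutRoom(it) }
                _isConnected.value = false
                _currentRoomId = null
                _agentInstanceId = null
            }
        }
    }
}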
| Step | Action | Method |
|---|---|---|
| 1-2 | Authentication | ApiService.getToken() |
| 3 | Connect & Publish | loginRoom() + startPublishing() |
| 4-8 | Launch AI | ApiService.startAgent() |
| 9 | Receive Response | startPlaying() |
| 10-11 | Cleanup | ApiService.stopAgent() |
Conclusion
You now have a working AI chatbot for Android with full voice conversation capabilities and a server-side design suited to production. Here's what you've built:
- Secure token-based authentication
- Real-time voice streaming via ZEGOCLOUD
- AI-powered responses with natural TTS
- Live subtitle display for accessibility
Next steps to consider:
- Customize the SYSTEM_PROMPT to match your brand voice
- Add visual avatars for enhanced engagement
- Test on physical devices for optimal audio performance
The same architecture scales to customer service bots, language tutors, AI companions, and more. The foundation is solid—now it's time to innovate.
Ready to ship? Start with ZEGOCLOUD and deploy your Android AI chatbot today.
Questions or feedback? Reach out to the ZEGOCLOUD community for support and inspiration.