Stephen568hub

How to Build an AI Chatbot on Android

Voice-first AI applications are reshaping mobile experiences. Users expect more than text input—they want natural, spoken conversations with their apps. If you're looking to build an AI chatbot for Android, you've come to the right place.

This tutorial walks you through creating a fully functional Android AI chatbot from scratch. We'll use ZEGOCLOUD's Conversational AI platform to handle speech recognition, language processing, and text-to-speech—all through a clean, integrated SDK. By the end, your Android app will have real-time voice conversation capabilities that rival the big tech apps.

Let's get started.

Quick Links

| Resource | Repository |
| --- | --- |
| Backend Server + Web Client | github.com/ZEGOCLOUD/blog-aiagent-server-and-web |
| Android Client | github.com/ZEGOCLOUD/blog-aiagent-android |

Prerequisites

Before diving in, ensure you have:

  • Android Studio (latest stable version)
  • Kotlin programming experience
  • ZEGOCLOUD account (create a free account)
  • Backend deployment — We'll use Next.js on Vercel (one-click deploy)

Architecture Overview

Understanding the Components

ZEGOCLOUD's Conversational AI runs as a cloud service handling three core functions:

| Service | Function |
| --- | --- |
| ASR (Automatic Speech Recognition) | Converts voice to text |
| LLM (Large Language Model) | Processes and generates responses |
| TTS (Text-to-Speech) | Converts text back to natural speech |

System Flow

sequenceDiagram
    participant User as Android Client
    participant Server as Backend Server
    participant ZEGO as ZEGO Cloud
    participant LLM as LLM Service
    participant TTS as TTS Service

    User->>Server: 1. Request Token
    Server-->>User: 2. Return Token

    User->>ZEGO: 3. Login Room & Publish Audio Stream
    User->>Server: 4. Request to Start AI Agent
    Server->>ZEGO: 5. Register Agent (LLM/TTS config)
    Server->>ZEGO: 6. Create Agent Instance
    ZEGO-->>Server: 7. Return Agent Instance ID
    Server-->>User: 8. Return Agent Stream ID

    User->>ZEGO: 9. Play Agent's Audio Stream

    loop Conversation
        User->>ZEGO: User speaks (audio stream)
        ZEGO->>ZEGO: ASR: Speech to Text
        ZEGO->>LLM: Send text to LLM
        LLM-->>ZEGO: LLM response
        ZEGO->>TTS: Text to Speech
        TTS-->>ZEGO: Audio data
        ZEGO-->>User: AI voice + Subtitles
    end

    User->>Server: 10. Request to Stop AI Agent
    Server->>ZEGO: 11. Delete Agent Instance

Why Do We Need Both Server and Client?

A common question when building an AI chatbot for Android is why the client can't manage the AI agent by talking to ZEGOCLOUD directly. The answer lies in security and architectural design.

| Component | Role | Security Boundary |
| --- | --- | --- |
| Backend Server | Token generation, AI agent management, API signing | Trusted environment (your control) |
| Android Client | Audio capture, stream publishing, UI display | Untrusted environment (user devices) |
| ZEGO Cloud | ASR, LLM, TTS processing | External service |

Key reasons for this architecture:

  1. Secret Protection — Your ZEGO_SERVER_SECRET and LLM API keys never leave your server. If these credentials were embedded in the Android app, attackers could decompile it and steal them.

  2. Token-Based Authentication — Clients request short-lived tokens (typically 1 hour) instead of using permanent credentials. This limits the damage if a token is compromised.

  3. Centralized AI Configuration — System prompts, LLM settings, and TTS voices are configured on your server, not in the client. This allows you to update AI behavior without app updates.

  4. Audit & Rate Limiting — Your server can log all AI agent requests, implement rate limiting, and monitor usage patterns.

This separation keeps your Android AI chatbot deployment secure and maintainable in production.
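
To make reason 4 concrete, here is a minimal sketch of a per-user rate limiter that a backend route could consult before starting an agent. It is not part of the ZEGOCLOUD sample: the class name `FixedWindowLimiter`, the limit of 5 requests per minute, and keying by user ID are all assumptions chosen for illustration.

```typescript
// Hypothetical fixed-window rate limiter keyed by user ID.
// State lives in memory, so this only illustrates the idea.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the caller is over the limit.
  allow(key: string, nowMs: number = Date.now()): boolean {
    const current = this.windows.get(key);
    // Start a fresh window on first use or once the previous window has elapsed.
    if (!current || nowMs - current.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}

// Example: at most 5 agent starts per user per minute (assumed numbers).
const agentStartLimiter = new FixedWindowLimiter(5, 60000);
```

Because serverless functions can be restarted at any time, in-memory counters like this reset unpredictably. A deployment on Vercel would likely need a shared store for the counters instead.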

Step 1: Set Up the Backend

Your backend handles authentication and AI agent management. We use Next.js for simplicity.

1.1 Environment Configuration

Create a .env.local file in your server root:

# ZEGO Credentials (from Console: https://console.zegocloud.com/)
NEXT_PUBLIC_ZEGO_APP_ID=your_app_id
ZEGO_SERVER_SECRET=your_32_char_secret

# AI Agent Settings
ZEGO_AGENT_ID=aiAgent1
ZEGO_AGENT_NAME=AI Assistant

# Define AI Personality
SYSTEM_PROMPT="You are my best friend who I can talk to about anything. You're warm, understanding, and always there for me. Respond naturally like a close friend would."

# LLM Provider (OpenAI, Doubao, Claude, etc.)
LLM_URL=https://your-llm-provider.com/api/chat/completions
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name

# Text-to-Speech Configuration
TTS_VENDOR=ByteDance
TTS_APP_ID=zego_test
TTS_TOKEN=zego_test
TTS_CLUSTER=volcano_tts
TTS_VOICE_TYPE=zh_female_wanwanxiaohe_moon_bigtts

| Variable | Purpose | Where to Get |
| --- | --- | --- |
| NEXT_PUBLIC_ZEGO_APP_ID | Application identifier | ZEGOCLOUD Console → Project Settings |
| ZEGO_SERVER_SECRET | 32-character secret key | ZEGOCLOUD Console → Project Settings |
| SYSTEM_PROMPT | AI behavior definition | Customize for your use case |
| LLM_* | Language model config | From your LLM provider |
| TTS_* | Voice synthesis settings | Use test values or your TTS service |
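
A missing or malformed variable tends to surface later as a confusing token or signature error, so it can help to check the configuration once at startup. The sketch below is an assumption, not part of the sample repository: the function name `validateEnv` and the specific rules (numeric app ID, 32-character secret) are illustrative and based only on the table above.

```typescript
// Names mirror the .env.local file above; the validation rules are assumptions.
const REQUIRED_VARS = [
  "NEXT_PUBLIC_ZEGO_APP_ID",
  "ZEGO_SERVER_SECRET",
  "LLM_URL",
  "LLM_API_KEY",
  "LLM_MODEL",
];

type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the config looks usable.
function validateEnv(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    if (!env[name]) problems.push(`${name} is missing`);
  }
  const appId = env["NEXT_PUBLIC_ZEGO_APP_ID"];
  if (appId && !/^\d+$/.test(appId)) {
    problems.push("NEXT_PUBLIC_ZEGO_APP_ID must be numeric");
  }
  const secret = env["ZEGO_SERVER_SECRET"];
  if (secret && secret.length !== 32) {
    problems.push("ZEGO_SERVER_SECRET must be 32 characters");
  }
  return problems;
}
```

On the server you would pass `process.env` to `validateEnv` and log or throw if the returned list is not empty.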

1.2 Token Generation API

// app/api/zego/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import crypto from 'crypto';

function generateToken(appId: number, userId: string, secret: string,
                       effectiveTimeInSeconds: number): string {
  const tokenInfo = {
    app_id: appId,
    user_id: userId,
    nonce: Math.floor(Math.random() * 2147483647),
    ctime: Math.floor(Date.now() / 1000),
    expire: Math.floor(Date.now() / 1000) + effectiveTimeInSeconds,
    payload: ''
  };

  const plainText = JSON.stringify(tokenInfo);
  const nonce = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', secret, nonce);
  const encrypted = Buffer.concat([cipher.update(plainText, 'utf8'),
                                   cipher.final(), cipher.getAuthTag()]);

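  // Pack the expiry as a big-endian 64-bit integer. writeBigInt64BE returns
  // an offset, not the buffer, so the buffer has to be created first.
  const expireBuf = Buffer.alloc(8);
  expireBuf.writeBigInt64BE(BigInt(tokenInfo.expire), 0);

  const buf = Buffer.concat([
    expireBuf,
    Buffer.from([0, 12]), nonce,
    Buffer.from([encrypted.length >> 8, encrypted.length & 0xff]), encrypted,
    Buffer.from([1]) // GCM mode
  ]);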
  return '04' + buf.toString('base64');
}

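export async function POST(request: NextRequest) {
  const { userId } = await request.json();
  if (typeof userId !== 'string' || userId.length === 0) {
    return NextResponse.json({ code: 400, message: 'userId is required' }, { status: 400 });
  }
  const token = generateToken(
    parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!, 10),
    userId,
    process.env.ZEGO_SERVER_SECRET!,
    3600 // token lifetime: 1 hour
  );
  // Envelope matches the TokenResponse data class used by the Android client
  return NextResponse.json({ code: 0, data: { token }, message: 'success' });
}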
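
The token produced above is a small binary frame, and reading it back is a quick way to confirm the layout while debugging. The helper below is a sketch under assumptions: the name `readTokenHeader` is hypothetical, it only reads the framing that `generateToken` writes (it does not decrypt anything), and it expects the caller to have already stripped the `04` prefix and base64-decoded the rest.

```typescript
interface TokenHeader {
  expire: number;       // Unix time in seconds
  nonceLength: number;  // 12 for the GCM nonce used above
  cipherLength: number; // ciphertext plus the 16-byte auth tag
  mode: number;         // 1 marks GCM in the code above
}

// Hypothetical helper: reads the framing fields from the decoded token bytes.
// Layout (big-endian): 8-byte expiry | 2-byte nonce length | nonce |
// 2-byte ciphertext length | ciphertext | 1-byte mode.
function readTokenHeader(bytes: Uint8Array): TokenHeader {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 0;
  // Read the 64-bit expiry as two 32-bit halves to avoid needing BigInt.
  const expire = view.getUint32(offset, false) * 4294967296 + view.getUint32(offset + 4, false);
  offset += 8;
  const nonceLength = view.getUint16(offset, false);
  offset += 2 + nonceLength; // skip the length field and the nonce itself
  const cipherLength = view.getUint16(offset, false);
  offset += 2 + cipherLength; // skip the length field and the ciphertext
  const mode = view.getUint8(offset);
  if (offset + 1 !== bytes.byteLength) {
    throw new Error("Unexpected trailing bytes");
  }
  return { expire, nonceLength, cipherLength, mode };
}
```

In Node, `readTokenHeader(Buffer.from(token.slice(2), "base64"))` should work, because `Buffer` is a `Uint8Array` subclass. Comparing `expire` with the current time shows whether a client is still holding a valid token.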

1.3 AI Agent Signature Utility

// app/api/zego/utils.ts
import crypto from 'crypto';

export function generateSignature(appId: number, signatureNonce: string,
                                  serverSecret: string, timestamp: number): string {
  const str = appId.toString() + signatureNonce + serverSecret + timestamp.toString();
  return crypto.createHash('md5').update(str).digest('hex');
}

export async function sendZegoRequest<T>(action: string, body: object): Promise<T> {
  const appId = parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!);
  const serverSecret = process.env.ZEGO_SERVER_SECRET!;
  const signatureNonce = crypto.randomBytes(8).toString('hex');
  const timestamp = Math.floor(Date.now() / 1000);
  const signature = generateSignature(appId, signatureNonce, serverSecret, timestamp);

  const url = new URL('https://aigc-aiagent-api.zegotech.cn');
  url.searchParams.set('Action', action);
  url.searchParams.set('AppId', appId.toString());
  url.searchParams.set('SignatureNonce', signatureNonce);
  url.searchParams.set('Timestamp', timestamp.toString());
  url.searchParams.set('Signature', signature);
  url.searchParams.set('SignatureVersion', '2.0');

  const response = await fetch(url.toString(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });

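  const result = await response.json();
  // Surface API-level failures instead of silently returning undefined data
  // (assumes the response carries Code and Message fields alongside Data)
  if (result.Code !== 0) {
    throw new Error(`ZEGO ${action} failed: ${result.Code} ${result.Message}`);
  }
  return result.Data as T;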
}
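
The Android client later calls `/api/zego/start`, which this article does not list. As a rough sketch of how steps 6 to 8 of the diagram might use `sendZegoRequest`, consider the code below. Everything specific in it is an assumption: the action name `CreateAgentInstance`, the body field names, and the ID naming scheme are illustrative and should be checked against the ZEGOCLOUD API reference and the linked repository. The sender is passed in as a parameter so the sketch stands alone.

```typescript
// Same shape as sendZegoRequest above; injected so the sketch is self-contained.
type Sender = (action: string, body: object) => Promise<unknown>;

interface StartParams {
  agentId: string;
  roomId: string;
  userId: string;
  userStreamId: string;
}

// Hypothetical naming scheme for the agent's user and stream IDs.
function agentIdsFor(roomId: string): { agentUserId: string; agentStreamId: string } {
  return { agentUserId: `agent_${roomId}`, agentStreamId: `${roomId}_agent_main` };
}

// Builds the body for the assumed "CreateAgentInstance" action.
// Field names are illustrative, not confirmed against the ZEGOCLOUD reference.
function buildCreateInstanceBody(p: StartParams) {
  const ids = agentIdsFor(p.roomId);
  return {
    AgentId: p.agentId,
    UserId: p.userId,
    RTC: {
      RoomId: p.roomId,
      AgentUserId: ids.agentUserId,
      AgentStreamId: ids.agentStreamId,
      UserStreamId: p.userStreamId,
    },
  };
}

// Steps 6-8: create the instance, then hand the IDs back to the client.
async function startAgent(send: Sender, p: StartParams) {
  const body = buildCreateInstanceBody(p);
  const data = (await send("CreateAgentInstance", body)) as { AgentInstanceId?: string };
  if (!data || !data.AgentInstanceId) throw new Error("Agent instance was not created");
  return {
    agentInstanceId: data.AgentInstanceId,
    agentUserId: body.RTC.AgentUserId,
    agentStreamId: body.RTC.AgentStreamId,
  };
}
```

The returned object uses the same three field names as the `AgentData` class on the Android side, so a route handler could wrap it in the response the client expects. Agent registration (step 5) would run before this call and is left out here.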

1.4 Deploy to Vercel

  1. Push code to GitHub
  2. Import repository in Vercel
  3. Configure environment variables
  4. Click Deploy

Your server will be live at https://your-project.vercel.app.


Step 2: Build the Android Client

Now we'll create an AI chatbot in Android using Kotlin and the ZEGOCLOUD SDK.

2.1 Create New Project

Start a new Android project:

  • Language: Kotlin
  • Minimum SDK: API 24

2.2 Add Dependencies

Important: Use the AI Agent-optimized ZEGO SDK. The standard Maven version won't support subtitle callbacks (onRecvExperimentalAPI).

Download from: ZEGO AI Agent SDK

Integration steps:

  1. Extract and copy files:

    • ZegoExpressEngine.jar → app/libs/
    • libZegoExpressEngine.so (arm64) → app/libs/arm64-v8a/
    • libZegoExpressEngine.so (v7a) → app/libs/armeabi-v7a/
  2. Update app/build.gradle:

android {
    defaultConfig {
        ndk {
            abiFilters 'armeabi-v7a', 'arm64-v8a'
        }
    }

    sourceSets {
        main {
            jniLibs.srcDirs = ['libs']
        }
    }
}

dependencies {
    // ZEGO Express SDK (AI Agent version)
    implementation files('libs/ZegoExpressEngine.jar')

    // Networking & JSON
    implementation 'com.squareup.okhttp3:okhttp:4.12.0'
    implementation 'com.google.code.gson:gson:2.10.1'

    // Coroutines
    implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3'
}

2.3 Add Permissions

In AndroidManifest.xml:

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

2.4 Configuration Class

Centralize your app settings:

// config/AppConfig.kt
object AppConfig {
    // Must match NEXT_PUBLIC_ZEGO_APP_ID in backend
    const val APP_ID: Long = 1234567890L

    // Your Vercel deployment URL
    const val SERVER_URL = "https://your-project.vercel.app"

    // Helper functions for unique IDs
    fun generateUserId(): String = "user${System.currentTimeMillis() % 100000}"
    fun generateRoomId(): String = "room${System.currentTimeMillis() % 100000}"
}

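Note: When testing on the emulator against a local server, set SERVER_URL to http://10.0.2.2:3000, which maps to localhost on the host machine. Because this is plain HTTP, Android 9 (API 28) and later block it by default. Allow cleartext traffic for debug builds, for example with android:usesCleartextTraffic="true" in the manifest.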

2.5 API Service Layer

Handle all backend communication:

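// api/ApiService.kt
import com.google.gson.Gson
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import java.util.concurrent.TimeUnit
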
object ApiService {
    private val client = OkHttpClient.Builder()
        .connectTimeout(30, TimeUnit.SECONDS)
        .readTimeout(30, TimeUnit.SECONDS)
        .build()
    private val gson = Gson()
    private val JSON = "application/json; charset=utf-8".toMediaType()

    // Get authentication token
    suspend fun getToken(userId: String): TokenResponse = withContext(Dispatchers.IO) {
        val body = mapOf("userId" to userId)
        val request = Request.Builder()
            .url("${AppConfig.SERVER_URL}/api/zego/token")
            .post(gson.toJson(body).toRequestBody(JSON))
            .build()

        val response = client.newCall(request).execute()
        val responseBody = response.body?.string()
        gson.fromJson(responseBody, TokenResponse::class.java)
    }

    // Start AI agent instance
    suspend fun startAgent(roomId: String, userId: String, userStreamId: String): AgentResponse =
        withContext(Dispatchers.IO) {
            val body = mapOf(
                "roomId" to roomId,
                "userId" to userId,
                "userStreamId" to userStreamId
            )
            val request = Request.Builder()
                .url("${AppConfig.SERVER_URL}/api/zego/start")
                .post(gson.toJson(body).toRequestBody(JSON))
                .build()

            val response = client.newCall(request).execute()
            val responseBody = response.body?.string()
            gson.fromJson(responseBody, AgentResponse::class.java)
        }

    // Stop AI agent
    suspend fun stopAgent(agentInstanceId: String): StopResponse = withContext(Dispatchers.IO) {
        val body = mapOf("agentInstanceId" to agentInstanceId)
        val request = Request.Builder()
            .url("${AppConfig.SERVER_URL}/api/zego/stop")
            .post(gson.toJson(body).toRequestBody(JSON))
            .build()

        val response = client.newCall(request).execute()
        val responseBody = response.body?.string()
        gson.fromJson(responseBody, StopResponse::class.java)
    }
}

// Response data classes
data class TokenResponse(val code: Int?, val data: TokenData?, val message: String?)
data class TokenData(val token: String?)

data class AgentResponse(val code: Int?, val data: AgentData?, val message: String?)
data class AgentData(val agentInstanceId: String?, val agentUserId: String?, val agentStreamId: String?)

data class StopResponse(val code: Int?, val message: String?)
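
The data classes above imply that every backend route answers with the same JSON envelope: a `code`, an optional `data` object, and a `message`. A small server-side helper can keep the routes consistent with that shape. This is a sketch, not code from the sample repository: the helper names `ok` and `fail` are hypothetical, and treating `code: 0` as success is an assumption.

```typescript
// Envelope shape implied by TokenResponse, AgentResponse, and StopResponse above.
interface ApiEnvelope<T> {
  code: number; // assumed convention: 0 means success
  data?: T;
  message: string;
}

function ok<T>(data: T): ApiEnvelope<T> {
  return { code: 0, data, message: "success" };
}

function fail(code: number, message: string): ApiEnvelope<never> {
  return { code, message };
}

// What the Android client's getToken() would receive on success.
const tokenBody = JSON.stringify(ok({ token: "04AAAA" }));
```

Gson leaves missing fields as null, which is why every property in the Kotlin data classes is nullable. A failure built with `fail` therefore deserializes cleanly, with `data` set to null.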

2.6 Initialize ZEGO Engine

Set up audio-optimized RTC:

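class ZegoExpressManager(private val application: Application) {
    private var engine: ZegoExpressEngine? = null

    // Callback consumed by the subtitle parser in section 2.8
    var onRecvExperimentalAPI: ((String) -> Unit)? = null

    // Forwards experimental API messages (which carry subtitles) to the callback above
    private val eventHandler = object : IZegoEventHandler() {
        override fun onRecvExperimentalAPI(content: String) {
            this@ZegoExpressManager.onRecvExperimentalAPI?.invoke(content)
        }
    }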

    fun initEngine() {
        val profile = ZegoEngineProfile().apply {
            appID = AppConfig.APP_ID
            scenario = ZegoScenario.HIGH_QUALITY_CHATROOM
            application = this@ZegoExpressManager.application
        }

        engine = ZegoExpressEngine.createEngine(profile, eventHandler)

        // Audio optimization
        engine?.apply {
            enableAGC(true)  // Automatic Gain Control
            enableANS(true)  // Noise Suppression
            setANSMode(ZegoANSMode.MEDIUM)
        }
    }
}

2.7 Room & Stream Management

// These functions are members of ZegoExpressManager.

// Login to RTC room
fun loginRoom(roomId: String, userId: String, token: String, callback: (Int) -> Unit) {
    val user = ZegoUser(userId)
    val config = ZegoRoomConfig().apply { this.token = token }
    engine?.loginRoom(roomId, user, config) { errorCode, _ ->
        callback(errorCode)
    }
}

// Publish user's audio stream
fun startPublishing(streamId: String) {
    engine?.startPublishingStream(streamId)
}

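// Play AI agent's audio stream
fun startPlaying(streamId: String) {
    engine?.startPlayingStream(streamId)
}

// Leave the room when the call ends (used by endCall in section 2.10)
fun logoutRoom(roomId: String) {
    engine?.logoutRoom(roomId)
}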

2.8 Subtitle Display

ZEGO provides official components for subtitle parsing:

// Create parser instance
private val audioChatMessageParser = AudioChatMessageParser()

// Set up listener
audioChatMessageParser.setAudioChatMessageListListener(object : AudioChatMessageParser.AudioChatMessageListListener {
    override fun onMessageListUpdated(messagesList: MutableList<AudioChatMessage>) {
        runOnUiThread {
            binding.messageList.onMessageListUpdated(messagesList)
        }
    }

    override fun onAudioChatStateUpdate(statusMessage: AudioChatAgentStatusMessage) {
        // Handle status updates
    }
})

// Parse incoming messages
zegoManager.onRecvExperimentalAPI = { content ->
    try {
        val json = JSONObject(content)
        if (json.getString("method") == "liveroom.room.on_recive_room_channel_message") {
            val msgContent = json.getJSONObject("params").getString("msg_content")
            audioChatMessageParser.parseAudioChatMessage(msgContent)
        }
    } catch (e: Exception) { e.printStackTrace() }
}

Download AudioChatMessageParser.java and AIChatListView.java from the ZEGO Subtitle Guide and add them to your project.

2.9 Layout Design

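A simple two-section UI. The snippet is abbreviated: the XML namespace declarations and the required layout_width, layout_height, and constraint attributes are omitted for brevity.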

<ConstraintLayout>
    <!-- Top: Status + Controls -->
    <LinearLayout android:id="@+id/topSection">
        <TextView android:id="@+id/tvStatus" />
        <Button android:id="@+id/btnCall" android:text="Start Call" />
    </LinearLayout>

    <!-- Bottom: Subtitles -->
    <com.zegocloud.aiagent.subtitle.AIChatListView
        android:id="@+id/messageList" />
</ConstraintLayout>

2.10 Complete Call Flow (ViewModel)

Orchestrate the entire conversation flow:

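class MainViewModel(private val zegoManager: ZegoExpressManager) : ViewModel() {

    // UI state observed by the Activity
    private val _isLoading = MutableStateFlow(false)
    private val _isConnected = MutableStateFlow(false)
    private val _error = MutableStateFlow<String?>(null)

    // Remembered so endCall() can stop the agent and leave the room
    private var currentRoomId: String? = null
    private var agentInstanceId: String? = null

    fun startCall() {
        viewModelScope.launch {
            _isLoading.value = true
            try {
                val roomId = AppConfig.generateRoomId()
                val userId = AppConfig.generateUserId()
                val userStreamId = "${roomId}_${userId}_main"

                zegoManager.initEngine()

                // 1. Get token (unwrap the response envelope)
                val token = ApiService.getToken(userId).data?.token
                    ?: throw Exception("Failed to get token")

                // 2. Login room. loginRoom() from 2.7 is callback-based, so wrap it
                //    in a suspending call (needs import kotlin.coroutines.resume).
                val loginResult = suspendCancellableCoroutine<Int> { cont ->
                    zegoManager.loginRoom(roomId, userId, token) { code ->
                        if (cont.isActive) cont.resume(code)
                    }
                }
                if (loginResult != 0) throw Exception("Login failed: $loginResult")
                currentRoomId = roomId

                // 3. Publish stream
                zegoManager.startPublishing(userStreamId)

                // 4. Start AI agent
                val agent = ApiService.startAgent(roomId, userId, userStreamId).data
                val agentStreamId = agent?.agentStreamId
                    ?: throw Exception("Failed to start agent")
                agentInstanceId = agent?.agentInstanceId

                // 5. Play agent stream
                zegoManager.startPlaying(agentStreamId)

                _isConnected.value = true
            } catch (e: Exception) {
                _error.value = e.message
            } finally {
                _isLoading.value = false
            }
        }
    }

    fun endCall() {
        viewModelScope.launch {
            try {
                // stopAgent expects the agent instance ID, not the room ID
                agentInstanceId?.let { ApiService.stopAgent(it) }
            } catch (e: Exception) {
                _error.value = e.message
            } finally {
                currentRoomId?.let { zegoManager.logoutRoom(it) }
                _isConnected.value = false
                currentRoomId = null
                agentInstanceId = null
            }
        }
    }
}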

| Step | Action | Method |
| --- | --- | --- |
| 1-2 | Authentication | ApiService.getToken() |
| 3 | Connect & Publish | loginRoom() + startPublishing() |
| 4-8 | Launch AI | ApiService.startAgent() |
| 9 | Receive Response | startPlaying() |
| 10-11 | Cleanup | ApiService.stopAgent() |

Conclusion

You now have a working AI chatbot for Android with full voice conversation capabilities, and a foundation you can harden for production. Here's what you've built:

  • Secure token-based authentication
  • Real-time voice streaming via ZEGOCLOUD
  • AI-powered responses with natural TTS
  • Live subtitle display for accessibility

Next steps to consider:

  • Customize the SYSTEM_PROMPT to match your brand voice
  • Add visual avatars for enhanced engagement
  • Test on physical devices for optimal audio performance

The same architecture scales to customer service bots, language tutors, AI companions, and more. The foundation is solid—now it's time to innovate.

Ready to ship? Start with ZEGOCLOUD and deploy your Android AI chatbot today.

Questions or feedback? Reach out to the ZEGOCLOUD community for support and inspiration.
