Building conversational AI for Android is becoming one of the most in-demand capabilities in mobile development. With the rapid rise of real-time voice assistants, interactive agents, and AI-driven communication, developers are increasingly looking for efficient ways to bring natural voice interaction into Android apps.
In the past, implementing conversational AI required several different components working together. Speech recognition, large language models, text-to-speech, and low-latency audio processing all needed to be manually connected, which led to long development cycles and complex engineering work.
ZEGOCLOUD simplifies this entire workflow by offering ready-to-use components designed for real-time conversational AI. In this tutorial, you will learn how to create a voice-driven AI bot for Android using ZEGOCLOUD. By the end, you will have a working prototype that supports natural, continuous voice conversations and can serve as the foundation for a production app.
How to Build a Conversational AI on Android
Download Complete Source Code
| Component | Repository |
|---|---|
| Backend Server + Web Client | https://github.com/ZEGOCLOUD/blog-aiagent-server-and-web |
| Android Client | https://github.com/ZEGOCLOUD/blog-aiagent-android |
Prerequisites
Before starting the implementation, prepare the following:
- Android Studio (latest stable version)
- A ZEGOCLOUD account
- Some familiarity with Kotlin
- A backend service for token generation and AI agent control. This tutorial uses a Next.js server deployed on Vercel
Once everything is ready, you can start building.
Architecture Overview
ZEGOCLOUD’s Conversational AI relies on both server and client components.
Why Both Are Needed
ZEGOCLOUD processes audio in the cloud. It performs:
1. Speech recognition (ASR)
2. LLM response generation
3. Text-to-speech (TTS) audio output
The Android client streams audio and plays back AI results. The backend securely stores API keys and generates authentication tokens.
| Component | Responsibilities |
|---|---|
| Backend Server | Stores credentials, generates tokens, manages AI agent lifecycle |
| Android Client | Captures voice, streams audio, plays AI responses, displays subtitles |
| ZEGOCLOUD | Runs the ASR → LLM → TTS pipeline in real time |
Steps to Build a Conversational AI on Android
Step 1: Set Up Your Backend Server
This example uses a simple Next.js backend deployed to Vercel.
1.1 Environment Variables
Create .env.local:
# ZEGO Configuration
NEXT_PUBLIC_ZEGO_APP_ID=your_app_id
ZEGO_SERVER_SECRET=your_server_secret_32_chars
# AI Agent Configuration
ZEGO_AGENT_ID=aiAgent1
ZEGO_AGENT_NAME=AI Assistant
# System Prompt
SYSTEM_PROMPT="You are my best friend whom I can talk to about anything. You're warm, understanding, and always there for me."
# LLM Configuration
LLM_URL=https://your-llm-provider.com/api/chat/completions
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name
# TTS Configuration
TTS_VENDOR=ByteDance
TTS_APP_ID=zego_test
TTS_TOKEN=zego_test
TTS_CLUSTER=volcano_tts
TTS_VOICE_TYPE=zh_female_wanwanxiaohe_moon_bigtts
1.2 Token Generation API
// app/api/zego/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import crypto from 'crypto';
function generateToken(appId: number, userId: string, secret: string,
effectiveTimeInSeconds: number): string {
const tokenInfo = {
app_id: appId,
user_id: userId,
nonce: Math.floor(Math.random() * 2147483647),
ctime: Math.floor(Date.now() / 1000),
expire: Math.floor(Date.now() / 1000) + effectiveTimeInSeconds,
payload: ''
};
const plainText = JSON.stringify(tokenInfo);
const nonce = crypto.randomBytes(12);
const cipher = crypto.createCipheriv('aes-256-gcm', secret, nonce);
const encrypted = Buffer.concat([cipher.update(plainText, 'utf8'),
cipher.final(), cipher.getAuthTag()]);
// writeBigInt64BE returns an offset (a number), not the buffer,
// so write into a named buffer first instead of chaining it inline
const expireBuf = Buffer.alloc(8);
expireBuf.writeBigInt64BE(BigInt(tokenInfo.expire), 0);
const buf = Buffer.concat([
  expireBuf,
  Buffer.from([0, 12]), nonce,
  Buffer.from([encrypted.length >> 8, encrypted.length & 0xff]), encrypted,
  Buffer.from([1])
]);
return '04' + buf.toString('base64');
}
export async function POST(request: NextRequest) {
const { userId } = await request.json();
const token = generateToken(
parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!),
userId,
process.env.ZEGO_SERVER_SECRET!,
3600
);
return NextResponse.json({ token });
}
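To sanity-check what the route above emits, the token header can be decoded without the server secret, since only the payload is encrypted. The following Kotlin sketch infers the layout from the TypeScript generator (8-byte big-endian expire, 2-byte nonce length, nonce, 2-byte ciphertext length, ciphertext, trailing mode byte, all base64-encoded behind a "04" prefix); it is a debugging aid, not part of the official SDK.

```kotlin
import java.nio.ByteBuffer
import java.util.Base64

// Reads the plaintext expire field from a "04"-prefixed token produced by
// the generator above. It does not (and cannot) decrypt the payload.
fun tokenExpire(token: String): Long {
    require(token.startsWith("04")) { "not a version-04 token" }
    val raw = Base64.getDecoder().decode(token.substring(2))
    // First 8 bytes are the big-endian expiry timestamp (seconds since epoch)
    return ByteBuffer.wrap(raw, 0, 8).long
}
```

Comparing the decoded expiry against `Date.now() / 1000 + 3600` is a quick way to confirm your deployed backend is issuing tokens with the intended lifetime.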
1.3 AI Agent API Signature
// app/api/zego/utils.ts
import crypto from 'crypto';
export function generateSignature(appId: number, signatureNonce: string,
serverSecret: string, timestamp: number): string {
const str = appId.toString() + signatureNonce + serverSecret + timestamp.toString();
return crypto.createHash('md5').update(str).digest('hex');
}
export async function sendZegoRequest<T>(action: string, body: object): Promise<T> {
const appId = parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!);
const serverSecret = process.env.ZEGO_SERVER_SECRET!;
const signatureNonce = crypto.randomBytes(8).toString('hex');
const timestamp = Math.floor(Date.now() / 1000);
const signature = generateSignature(appId, signatureNonce, serverSecret, timestamp);
const url = new URL('https://aigc-aiagent-api.zegotech.cn');
url.searchParams.set('Action', action);
url.searchParams.set('AppId', appId.toString());
url.searchParams.set('SignatureNonce', signatureNonce);
url.searchParams.set('Timestamp', timestamp.toString());
url.searchParams.set('Signature', signature);
url.searchParams.set('SignatureVersion', '2.0');
const response = await fetch(url.toString(), {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body)
});
const result = await response.json();
return result.Data as T;
}
1.4 Deploy to Vercel
1. Push the repo to GitHub
2. Import the repo into Vercel
3. Add the environment variables
4. Deploy
Step 2: Build the Android Client
2.1 Create Android Project
- Language: Kotlin
- Minimum SDK: API 24
2.2 Integrate ZEGOCLOUD AI Agent Optimized SDK
The optimized SDK delivers subtitle messages through the onRecvExperimentalAPI callback, which the public Maven release does not support, so it must be integrated manually.
Manual Integration Steps
- Download the AI Agent optimized SDK
- Copy these files into your project:
app/libs/ZegoExpressEngine.jar
app/libs/arm64-v8a/libZegoExpressEngine.so
app/libs/armeabi-v7a/libZegoExpressEngine.so
app/build.gradle
android {
defaultConfig {
ndk {
abiFilters 'armeabi-v7a', 'arm64-v8a'
}
}
sourceSets {
main {
jniLibs.srcDirs = ['libs']
}
}
}
dependencies {
implementation files('libs/ZegoExpressEngine.jar')
implementation 'com.squareup.okhttp3:okhttp:4.12.0'
implementation 'com.google.code.gson:gson:2.10.1'
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3'
}
2.3 Permissions
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
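Declaring RECORD_AUDIO in the manifest is not enough: it is a dangerous permission, so on API 23+ (and the minimum SDK here is 24) the user must also grant it at runtime before the microphone can be opened. A hedged sketch of the wiring in a hypothetical MainActivity, using the AndroidX Activity Result API:

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.content.ContextCompat

class MainActivity : AppCompatActivity() {

    // Registered once at construction; delivers the user's grant decision
    private val micPermissionLauncher =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) startCall() else showMicRationale()
        }

    // Call this from the "Start Call" button instead of starting the call directly
    private fun ensureMicPermission() {
        if (ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
            == PackageManager.PERMISSION_GRANTED
        ) {
            startCall()
        } else {
            micPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
        }
    }

    private fun startCall() { /* hand off to the call flow in section 2.9 */ }
    private fun showMicRationale() { /* explain why the microphone is needed */ }
}
```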
2.4 App Configuration
object AppConfig {
const val APP_ID: Long = 123456789L // replace with your ZEGOCLOUD AppID
const val SERVER_URL = "https://your-project.vercel.app" // your deployed backend
fun generateUserId(): String = "user${System.currentTimeMillis() % 100000}"
fun generateRoomId(): String = "room${System.currentTimeMillis() % 100000}"
}
2.5 Initialize ZEGO Engine
class ZegoExpressManager(private val application: Application) {
private var engine: ZegoExpressEngine? = null
fun initEngine() {
val profile = ZegoEngineProfile().apply {
appID = AppConfig.APP_ID
scenario = ZegoScenario.HIGH_QUALITY_CHATROOM
application = this@ZegoExpressManager.application
}
engine = ZegoExpressEngine.createEngine(profile, eventHandler)
engine?.apply {
enableAGC(true)
enableANS(true)
setANSMode(ZegoANSMode.MEDIUM)
}
}
}
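initEngine() above passes an eventHandler that is not shown. The following members belong inside ZegoExpressManager; the callback names follow the ZEGO Express Android SDK, but verify the exact signatures against the SDK version bundled in app/libs, as this is a sketch rather than a verbatim excerpt:

```kotlin
// Set by the Activity (see the subtitle section) to receive experimental-API payloads
var onRecvExperimentalAPI: ((String) -> Unit)? = null

private val eventHandler = object : IZegoEventHandler() {
    // The optimized SDK delivers subtitle messages through this callback;
    // forward the raw JSON to whoever registered a listener
    override fun onRecvExperimentalAPI(content: String) {
        this@ZegoExpressManager.onRecvExperimentalAPI?.invoke(content)
    }

    // New remote streams (e.g. the AI agent's audio) show up here; playing
    // them on ADD is a safety net alongside the explicit startPlaying() call
    override fun onRoomStreamUpdate(
        roomID: String,
        updateType: ZegoUpdateType,
        streamList: ArrayList<ZegoStream>,
        extendedData: JSONObject
    ) {
        if (updateType == ZegoUpdateType.ADD) {
            streamList.forEach { engine?.startPlayingStream(it.streamID) }
        }
    }
}
```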
2.6 Room Login, Publishing and Playing Streams
fun loginRoom(roomId: String, userId: String, token: String, callback: (Int) -> Unit) {
    val user = ZegoUser(userId)
    val config = ZegoRoomConfig().apply {
        this.token = token
    }
    engine?.loginRoom(roomId, user, config) { errorCode, _ ->
        callback(errorCode)
    }
}
// Suspending convenience wrapper so callers inside a coroutine (see the
// complete call flow in section 2.9) can await the login result directly
suspend fun loginRoom(roomId: String, userId: String, token: String): Int =
    suspendCancellableCoroutine { cont ->
        loginRoom(roomId, userId, token) { errorCode -> cont.resume(errorCode) }
    }
fun startPublishing(streamId: String) {
    engine?.startPublishingStream(streamId)
}
fun startPlaying(streamId: String) {
    engine?.startPlayingStream(streamId)
}
fun logoutRoom(roomId: String) {
    engine?.logoutRoom(roomId)
}
2.7 Subtitle Display
private val audioChatMessageParser = AudioChatMessageParser()
audioChatMessageParser.setAudioChatMessageListListener(object :
AudioChatMessageParser.AudioChatMessageListListener {
override fun onMessageListUpdated(messagesList: MutableList<AudioChatMessage>) {
runOnUiThread {
binding.messageList.onMessageListUpdated(messagesList)
}
}
override fun onAudioChatStateUpdate(statusMessage: AudioChatAgentStatusMessage) {}
})
zegoManager.onRecvExperimentalAPI = { content ->
try {
val json = JSONObject(content)
if (json.getString("method") == "liveroom.room.on_recive_room_channel_message") {
val msgContent = json.getJSONObject("params").getString("msg_content")
audioChatMessageParser.parseAudioChatMessage(msgContent)
}
} catch (e: Exception) { e.printStackTrace() }
}
2.8 UI Layout
<ConstraintLayout>
<LinearLayout android:id="@+id/topSection">
<TextView android:id="@+id/tvStatus" />
<Button android:id="@+id/btnCall" android:text="Start Call" />
</LinearLayout>
<com.zegocloud.aiagent.subtitle.AIChatListView
android:id="@+id/messageList" />
</ConstraintLayout>
2.9 Complete Call Flow
class MainViewModel(private val zegoManager: ZegoExpressManager) : ViewModel() {
    // Backing state observed by the UI
    private val _isLoading = MutableStateFlow(false)
    val isLoading: StateFlow<Boolean> = _isLoading
    private val _isConnected = MutableStateFlow(false)
    val isConnected: StateFlow<Boolean> = _isConnected
    private val _error = MutableStateFlow<String?>(null)
    val error: StateFlow<String?> = _error
    private var _currentRoomId: String? = null
fun startCall() {
viewModelScope.launch {
_isLoading.value = true
try {
val roomId = AppConfig.generateRoomId()
val userId = AppConfig.generateUserId()
val userStreamId = "${roomId}_${userId}_main"
zegoManager.initEngine()
val token = ApiService.getToken(userId)
?: throw Exception("Failed to get token")
val loginResult = zegoManager.loginRoom(roomId, userId, token)
if (loginResult != 0) throw Exception("Failed to login room")
zegoManager.startPublishing(userStreamId)
val agentStreamId = ApiService.startAgent(roomId, userId, userStreamId)
?: throw Exception("Failed to start agent")
zegoManager.startPlaying(agentStreamId)
_isConnected.value = true
_currentRoomId = roomId
} catch (e: Exception) {
_error.value = e.message
} finally {
_isLoading.value = false
}
}
}
fun endCall() {
viewModelScope.launch {
_currentRoomId?.let { roomId ->
ApiService.stopAgent(roomId)
zegoManager.logoutRoom(roomId)
}
_isConnected.value = false
_currentRoomId = null
}
}
}
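The ViewModel above calls ApiService, which this section does not define. Here is a minimal, dependency-free sketch: only the /api/zego/token route is defined earlier in this tutorial, so the /api/zego/start and /api/zego/stop paths and the agentStreamId response field are assumptions modeled on the companion backend repo. A production app should use the OkHttp and Gson dependencies already declared in build.gradle and run these calls on Dispatchers.IO.

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical ApiService matching the calls made by MainViewModel
object ApiService {
    private const val BASE_URL = "https://your-project.vercel.app" // your Vercel deployment

    // JSON bodies built by hand to keep the sketch self-contained
    internal fun tokenBody(userId: String) = """{"userId":"$userId"}"""
    internal fun startBody(roomId: String, userId: String, userStreamId: String) =
        """{"roomId":"$roomId","userId":"$userId","userStreamId":"$userStreamId"}"""

    // Naive extraction of a top-level string field from a JSON response;
    // fine for a sketch, use a real JSON parser in production
    internal fun extractField(json: String, field: String): String? =
        Regex("\"$field\"\\s*:\\s*\"([^\"]+)\"").find(json)?.groupValues?.get(1)

    // Blocking POST; returns the response body, or null on any failure
    private fun post(path: String, body: String): String? = try {
        val conn = URL(BASE_URL + path).openConnection() as HttpURLConnection
        conn.requestMethod = "POST"
        conn.setRequestProperty("Content-Type", "application/json")
        conn.doOutput = true
        conn.outputStream.use { it.write(body.toByteArray()) }
        conn.inputStream.bufferedReader().use { it.readText() }
    } catch (e: Exception) {
        null
    }

    fun getToken(userId: String): String? =
        post("/api/zego/token", tokenBody(userId))?.let { extractField(it, "token") }

    fun startAgent(roomId: String, userId: String, userStreamId: String): String? =
        post("/api/zego/start", startBody(roomId, userId, userStreamId))
            ?.let { extractField(it, "agentStreamId") }

    fun stopAgent(roomId: String) {
        post("/api/zego/stop", """{"roomId":"$roomId"}""")
    }
}
```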
Step 3: Run the Demo
You can now test real-time conversational AI on a real Android device.
https://youtu.be/TSEVDlY_l4M
Conclusion
You have built a complete conversational AI system on Android that includes:
- A secure backend
- A Kotlin-based Android client
- Real-time speech processing and AI responses
- Live subtitle rendering
You can continue by refining the system prompt, adding avatar animations, or integrating richer UI components. This setup works well for AI assistants, customer support bots, AI companions, learning tutors, and more.
If you want to keep exploring, ZEGOCLOUD's conversational AI platform provides enough resources for extensive prototyping.
