DEV Community

Stephen568hub

How to Build a Conversational AI on Android

Conversational AI is becoming one of the most in-demand capabilities in Android development. With the rapid rise of real-time voice assistants, interactive agents, and AI-driven communication, developers are increasingly looking for efficient ways to bring natural voice interaction into their apps.

In the past, implementing conversational AI required several different components working together. Speech recognition, large language models, text-to-speech, and low-latency audio processing all needed to be manually connected, which led to long development cycles and complex engineering work.

ZEGOCLOUD simplifies this entire workflow by offering ready-to-use components designed for real-time conversational AI. In this tutorial, you will learn how to create a voice-driven AI bot for Android using ZEGOCLOUD. By the end, you will have a working prototype that supports natural, continuous voice conversations and that you can extend toward production use.

Download Complete Source Code

Prerequisites

Before starting the implementation, prepare the following:

  • Android Studio (latest stable version)
  • A ZEGOCLOUD account
  • Some familiarity with Kotlin
  • A backend service for handling AI logic. This tutorial uses a Next.js server deployed on Vercel

Once everything is ready, you can start building.

Architecture Overview

ZEGOCLOUD’s Conversational AI relies on both server and client components.

Why Both Are Needed

ZEGOCLOUD processes audio in the cloud. It performs:

  1. Speech recognition
  2. LLM response generation
  3. Text-to-speech audio output

The Android client streams audio and plays back AI results. The backend securely stores API keys and generates authentication tokens.

Component Responsibilities

  • Backend Server: Stores credentials, generates tokens, manages AI agent lifecycle
  • Android Client: Captures voice, streams audio, plays AI responses, displays subtitles
  • ZEGOCLOUD: Runs ASR to LLM to TTS pipeline in real time

Steps to Build a Conversational AI on Android

Step 1: Set Up Your Backend Server

This example uses a simple Next.js backend deployed to Vercel.

1.1 Environment Variables

Create .env.local:

# ZEGO Configuration
NEXT_PUBLIC_ZEGO_APP_ID=your_app_id
ZEGO_SERVER_SECRET=your_server_secret_32_chars

# AI Agent Configuration
ZEGO_AGENT_ID=aiAgent1
ZEGO_AGENT_NAME=AI Assistant

# System Prompt
SYSTEM_PROMPT="You are my best friend whom I can talk to about anything. You're warm, understanding, and always there for me."

# LLM Configuration
LLM_URL=https://your-llm-provider.com/api/chat/completions
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name

# TTS Configuration
TTS_VENDOR=ByteDance
TTS_APP_ID=zego_test
TTS_TOKEN=zego_test
TTS_CLUSTER=volcano_tts
TTS_VOICE_TYPE=zh_female_wanwanxiaohe_moon_bigtts
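
A missing or malformed variable here usually only shows up later as a failed token or signature. As a minimal sketch, the backend could validate the ZEGO settings once at startup. The helper name `loadZegoConfig` and the `ZegoConfig` shape are hypothetical and not part of the original project; the 32-character check is an assumption based on the `your_server_secret_32_chars` placeholder.

```typescript
// Hypothetical helper: loadZegoConfig and ZegoConfig are illustrative names,
// not part of the original project.
export interface ZegoConfig {
  appId: number;
  serverSecret: string;
  agentId: string;
}

type Env = Record<string, string | undefined>;

// Returns the variable's value or fails with a message naming the variable.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

export function loadZegoConfig(env: Env): ZegoConfig {
  const appId = Number(requireVar(env, 'NEXT_PUBLIC_ZEGO_APP_ID'));
  if (!Number.isInteger(appId) || appId <= 0) {
    throw new Error('NEXT_PUBLIC_ZEGO_APP_ID must be a positive integer');
  }

  const serverSecret = requireVar(env, 'ZEGO_SERVER_SECRET');
  // Assumption: the secret is 32 ASCII characters, as the placeholder suggests.
  if (serverSecret.length !== 32) {
    throw new Error('ZEGO_SERVER_SECRET must be exactly 32 characters');
  }

  return { appId, serverSecret, agentId: requireVar(env, 'ZEGO_AGENT_ID') };
}
```

Calling `loadZegoConfig(process.env)` at the top of a route module would turn a misconfigured deployment into an immediate, readable error.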


1.2 Token Generation API

// app/api/zego/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import crypto from 'crypto';

function generateToken(appId: number, userId: string, secret: string,
                       effectiveTimeInSeconds: number): string {
  const tokenInfo = {
    app_id: appId,
    user_id: userId,
    nonce: Math.floor(Math.random() * 2147483647),
    ctime: Math.floor(Date.now() / 1000),
    expire: Math.floor(Date.now() / 1000) + effectiveTimeInSeconds,
    payload: ''
  };

  const plainText = JSON.stringify(tokenInfo);
  const nonce = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', secret, nonce);
  const encrypted = Buffer.concat([cipher.update(plainText, 'utf8'),
                                   cipher.final(), cipher.getAuthTag()]);

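  // writeBigInt64BE returns a byte offset, not the buffer, so fill the
  // 8-byte expiry field first and then concatenate it.
  const expireBuf = Buffer.alloc(8);
  expireBuf.writeBigInt64BE(BigInt(tokenInfo.expire), 0);

  const buf = Buffer.concat([
    expireBuf,
    Buffer.from([0, 12]), nonce,
    Buffer.from([encrypted.length >> 8, encrypted.length & 0xff]), encrypted,
    Buffer.from([1])
  ]);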
  return '04' + buf.toString('base64');
}

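export async function POST(request: NextRequest) {
  const { userId } = await request.json();
  // Reject requests without a usable user ID before signing anything
  if (typeof userId !== 'string' || userId.length === 0) {
    return NextResponse.json({ error: 'userId is required' }, { status: 400 });
  }
  const token = generateToken(
    parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!, 10),
    userId,
    process.env.ZEGO_SERVER_SECRET!,
    3600 // token lifetime in seconds (1 hour)
  );
  return NextResponse.json({ token });
}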
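
To make the byte layout of the token easier to follow, here is a round-trip sketch that packs a payload in the same order as `generateToken` and then reverses it. The names `packToken` and `unpackToken` are hypothetical, and the layout (8-byte expiry, 2-byte nonce length, nonce, 2-byte sealed length, ciphertext plus GCM tag, 1-byte mode) is read off the code above rather than taken from ZEGOCLOUD's specification.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'crypto';

// Assumed layout, mirroring generateToken above:
// '04' + base64(expire:int64 | nonceLen:uint16 | nonce | sealedLen:uint16 | ciphertext+tag | mode:uint8)
export function packToken(payloadJson: string, secret: string, expire: number): string {
  const nonce = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', secret, nonce);
  const sealed = Buffer.concat([cipher.update(payloadJson, 'utf8'), cipher.final(), cipher.getAuthTag()]);

  const expireBuf = Buffer.alloc(8);
  expireBuf.writeBigInt64BE(BigInt(expire), 0);
  const nonceLen = Buffer.alloc(2);
  nonceLen.writeUInt16BE(nonce.length, 0);
  const sealedLen = Buffer.alloc(2);
  sealedLen.writeUInt16BE(sealed.length, 0);

  const body = Buffer.concat([expireBuf, nonceLen, nonce, sealedLen, sealed, Buffer.from([1])]);
  return '04' + body.toString('base64');
}

export function unpackToken(token: string, secret: string): { expire: number; payload: unknown } {
  if (!token.startsWith('04')) throw new Error('Unsupported token version');
  const buf = Buffer.from(token.slice(2), 'base64');

  let offset = 0;
  const expire = Number(buf.readBigInt64BE(offset));
  offset += 8;
  const nonceLen = buf.readUInt16BE(offset);
  offset += 2;
  const nonce = buf.subarray(offset, offset + nonceLen);
  offset += nonceLen;
  const sealedLen = buf.readUInt16BE(offset);
  offset += 2;
  const sealed = buf.subarray(offset, offset + sealedLen);

  // packToken appended the 16-byte GCM auth tag after the ciphertext.
  const tag = sealed.subarray(sealed.length - 16);
  const ciphertext = sealed.subarray(0, sealed.length - 16);

  const decipher = createDecipheriv('aes-256-gcm', secret, nonce);
  decipher.setAuthTag(tag);
  // final() throws if the secret is wrong or the token was tampered with.
  const plain = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return { expire, payload: JSON.parse(plain.toString('utf8')) };
}
```

A round trip like this can serve as a local unit test for the token route. It does not replace validating a real token against ZEGOCLOUD.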

1.3 AI Agent API Signature

// app/api/zego/utils.ts
import crypto from 'crypto';

export function generateSignature(appId: number, signatureNonce: string,
                                  serverSecret: string, timestamp: number): string {
  const str = appId.toString() + signatureNonce + serverSecret + timestamp.toString();
  return crypto.createHash('md5').update(str).digest('hex');
}

export async function sendZegoRequest<T>(action: string, body: object): Promise<T> {
  const appId = parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!);
  const serverSecret = process.env.ZEGO_SERVER_SECRET!;
  const signatureNonce = crypto.randomBytes(8).toString('hex');
  const timestamp = Math.floor(Date.now() / 1000);
  const signature = generateSignature(appId, signatureNonce, serverSecret, timestamp);

  const url = new URL('https://aigc-aiagent-api.zegotech.cn');
  url.searchParams.set('Action', action);
  url.searchParams.set('AppId', appId.toString());
  url.searchParams.set('SignatureNonce', signatureNonce);
  url.searchParams.set('Timestamp', timestamp.toString());
  url.searchParams.set('Signature', signature);
  url.searchParams.set('SignatureVersion', '2.0');

  const response = await fetch(url.toString(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });

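  // Surface HTTP-level failures instead of silently returning undefined
  if (!response.ok) {
    throw new Error(`ZEGO request ${action} failed with HTTP ${response.status}`);
  }

  const result = await response.json();
  return result.Data as T;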
}
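
Because `sendZegoRequest` generates its nonce and timestamp internally, the signed URL is hard to unit-test. One possible refactor, sketched below, moves URL construction into a pure function that receives both values as arguments. `buildSignedUrl` is a hypothetical name, and the action string in the usage note is a placeholder, not a real ZEGOCLOUD action.

```typescript
import { createHash } from 'crypto';

// Hypothetical refactor: buildSignedUrl is not in the original code. The nonce
// and timestamp are parameters so the output is deterministic in tests.
export function buildSignedUrl(
  baseUrl: string,
  action: string,
  appId: number,
  serverSecret: string,
  signatureNonce: string,
  timestamp: number
): URL {
  // Same recipe as generateSignature: md5(appId + nonce + secret + timestamp)
  const signature = createHash('md5')
    .update(`${appId}${signatureNonce}${serverSecret}${timestamp}`)
    .digest('hex');

  const url = new URL(baseUrl);
  url.searchParams.set('Action', action);
  url.searchParams.set('AppId', appId.toString());
  url.searchParams.set('SignatureNonce', signatureNonce);
  url.searchParams.set('Timestamp', timestamp.toString());
  url.searchParams.set('Signature', signature);
  url.searchParams.set('SignatureVersion', '2.0');
  return url;
}
```

For example, `buildSignedUrl('https://aigc-aiagent-api.zegotech.cn', 'ExampleAction', appId, secret, nonce, timestamp)` yields the same query parameters that `sendZegoRequest` sets. The server secret itself never appears in the URL, only its hash.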

1.4 Deploy to Vercel

  1. Push repo to GitHub
  2. Import repo into Vercel
  3. Add environment variables
  4. Deploy
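
After the deploy finishes, it can help to confirm that the token route responds before touching the Android client. The sketch below is one way to do that. `checkTokenEndpoint` and the `FetchLike` type are hypothetical names; the `/api/zego/token` path is inferred from the route file location in step 1.2, and the fetch function is passed in so the check can run against a fake in tests.

```typescript
// Minimal shape of the fetch function this sketch relies on, so that a fake
// can be injected instead of hitting the network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical smoke test: requests a token and checks its basic shape.
export async function checkTokenEndpoint(
  baseUrl: string,
  userId: string,
  fetchImpl: FetchLike
): Promise<string> {
  const response = await fetchImpl(`${baseUrl}/api/zego/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ userId })
  });
  if (!response.ok) {
    throw new Error(`Token endpoint returned HTTP ${response.status}`);
  }

  const data = (await response.json()) as { token?: unknown };
  // generateToken prefixes every token with the version string '04'.
  if (typeof data.token !== 'string' || !data.token.startsWith('04')) {
    throw new Error('Response did not contain a version-04 token');
  }
  return data.token;
}
```

On a recent Node.js version it should be possible to pass the global `fetch` as the third argument and your Vercel URL as the first.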

Step 2: Build the Android Client

2.1 Create Android Project

  • Language: Kotlin
  • Minimum SDK: API 24

2.2 Integrate ZEGOCLOUD AI Agent Optimized SDK

The AI Agent optimized build of the SDK delivers subtitle messages through the onRecvExperimentalAPI callback, which the public Maven release does not support, so it has to be integrated manually.

Manual Integration Steps

  • Download the AI Agent optimized SDK
  • Copy these files into your project:
app/libs/ZegoExpressEngine.jar
app/libs/arm64-v8a/libZegoExpressEngine.so
app/libs/armeabi-v7a/libZegoExpressEngine.so

build.gradle

android {
    defaultConfig {
        ndk {
            abiFilters 'armeabi-v7a', 'arm64-v8a'
        }
    }
    sourceSets {
        main {
            jniLibs.srcDirs = ['libs']
        }
    }
}

dependencies {
    implementation files('libs/ZegoExpressEngine.jar')
    implementation 'com.squareup.okhttp3:okhttp:4.12.0'
    implementation 'com.google.code.gson:gson:2.10.1'
    implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.7.3'
}

2.3 Permissions

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

2.4 App Configuration

object AppConfig {
    const val APP_ID: Long = 123456789L
    const val SERVER_URL = "https://your-project.vercel.app"

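    // Demo-only IDs: the numeric suffix wraps every 100 seconds, so collisions
    // are possible. Use a stable account ID or a UUID in a real app.
    fun generateUserId(): String = "user${System.currentTimeMillis() % 100000}"
    fun generateRoomId(): String = "room${System.currentTimeMillis() % 100000}"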
}

2.5 Initialize ZEGO Engine

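class ZegoExpressManager(private val application: Application) {
    private var engine: ZegoExpressEngine? = null

    // Set by the UI layer to receive raw experimental-API messages (used for subtitles in 2.7)
    var onRecvExperimentalAPI: ((String) -> Unit)? = null

    // The original snippet references eventHandler without defining it; this
    // minimal handler forwards the SDK callback to the lambda above.
    private val eventHandler = object : IZegoEventHandler() {
        override fun onRecvExperimentalAPI(content: String) {
            this@ZegoExpressManager.onRecvExperimentalAPI?.invoke(content)
        }
    }
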
    fun initEngine() {
        val profile = ZegoEngineProfile().apply {
            appID = AppConfig.APP_ID
            scenario = ZegoScenario.HIGH_QUALITY_CHATROOM
            application = this@ZegoExpressManager.application
        }

        engine = ZegoExpressEngine.createEngine(profile, eventHandler)

        engine?.apply {
            enableAGC(true)
            enableANS(true)
            setANSMode(ZegoANSMode.MEDIUM)
        }
    }
}

2.6 Room Login, Publishing and Playing Streams

fun loginRoom(roomId: String, userId: String, token: String, callback: (Int) -> Unit) {
    val user = ZegoUser(userId)
    val config = ZegoRoomConfig().apply {
        this.token = token
    }
    engine?.loginRoom(roomId, user, config) { errorCode, _ ->
        callback(errorCode)
    }
}

fun startPublishing(streamId: String) {
    engine?.startPublishingStream(streamId)
}

fun startPlaying(streamId: String) {
    engine?.startPlayingStream(streamId)
}

2.7 Subtitle Display

private val audioChatMessageParser = AudioChatMessageParser()

audioChatMessageParser.setAudioChatMessageListListener(object :
    AudioChatMessageParser.AudioChatMessageListListener {
    override fun onMessageListUpdated(messagesList: MutableList<AudioChatMessage>) {
        runOnUiThread {
            binding.messageList.onMessageListUpdated(messagesList)
        }
    }
    override fun onAudioChatStateUpdate(statusMessage: AudioChatAgentStatusMessage) {}
})

zegoManager.onRecvExperimentalAPI = { content ->
    try {
        val json = JSONObject(content)
        if (json.getString("method") == "liveroom.room.on_recive_room_channel_message") {
            val msgContent = json.getJSONObject("params").getString("msg_content")
            audioChatMessageParser.parseAudioChatMessage(msgContent)
        }
    } catch (e: Exception) { e.printStackTrace() }
}

2.8 UI Layout

<ConstraintLayout>
    <LinearLayout android:id="@+id/topSection">
        <TextView android:id="@+id/tvStatus" />
        <Button android:id="@+id/btnCall" android:text="Start Call" />
    </LinearLayout>

    <com.zegocloud.aiagent.subtitle.AIChatListView
        android:id="@+id/messageList" />
</ConstraintLayout>

2.9 Complete Call Flow

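import kotlinx.coroutines.flow.MutableStateFlow

class MainViewModel(private val zegoManager: ZegoExpressManager) : ViewModel() {

    // State used below. The original snippet omits these declarations, so
    // MutableStateFlow is an assumption (MutableLiveData would work the same way).
    private val _isLoading = MutableStateFlow(false)
    private val _isConnected = MutableStateFlow(false)
    private val _error = MutableStateFlow<String?>(null)
    private var _currentRoomId: String? = null
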
    fun startCall() {
        viewModelScope.launch {
            _isLoading.value = true
            try {
                val roomId = AppConfig.generateRoomId()
                val userId = AppConfig.generateUserId()
                val userStreamId = "${roomId}_${userId}_main"

                zegoManager.initEngine()

                val token = ApiService.getToken(userId)
                    ?: throw Exception("Failed to get token")

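                // loginRoom in 2.6 reports its result through a callback, so
                // bridge it into this coroutine before checking the error code.
                val loginResult = kotlin.coroutines.suspendCoroutine<Int> { cont ->
                    zegoManager.loginRoom(roomId, userId, token) { errorCode ->
                        cont.resumeWith(Result.success(errorCode))
                    }
                }
                if (loginResult != 0) throw Exception("Failed to login room")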

                zegoManager.startPublishing(userStreamId)

                val agentStreamId = ApiService.startAgent(roomId, userId, userStreamId)
                    ?: throw Exception("Failed to start agent")

                zegoManager.startPlaying(agentStreamId)

                _isConnected.value = true
                _currentRoomId = roomId
            } catch (e: Exception) {
                _error.value = e.message
            } finally {
                _isLoading.value = false
            }
        }
    }

    fun endCall() {
        viewModelScope.launch {
            _currentRoomId?.let { roomId ->
                ApiService.stopAgent(roomId)
                zegoManager.logoutRoom(roomId)
            }
            _isConnected.value = false
            _currentRoomId = null
        }
    }
}

Step 3: Run the Demo

You can now test real-time conversational AI on a real Android device.
https://youtu.be/TSEVDlY_l4M

Conclusion

You have built a complete conversational AI system on Android that includes:

  • A secure backend
  • A Kotlin-based Android client
  • Real-time speech processing and AI responses
  • Live subtitle rendering

You can continue by refining the system prompt, adding avatar animations, or integrating richer UI components. This setup works well for AI assistants, customer support bots, AI companions, learning tutors, and more.

If you want to continue exploring, you can start with ZEGOCLOUD’s conversational AI, which provides enough resources for extensive prototyping.
