DEV Community

Stephen568hub

How to Build an AI Voice Assistant on Android with Flutter

Voice interfaces represent the next evolution in mobile human-computer interaction. Users increasingly expect natural, hands-free communication with their applications rather than traditional touch input. For developers seeking to build an AI voice assistant for Android using Flutter, ZEGOCLOUD provides a comprehensive cross-platform solution.

This technical guide presents a systematic approach to integrating voice AI capabilities into Flutter applications. Through ZEGOCLOUD's Conversational AI platform, developers can implement automatic speech recognition, large language model processing, and neural text-to-speech within a unified Dart SDK architecture.

ZEGOCLOUD Platform Overview

ZEGOCLOUD delivers real-time communication services through cloud-based infrastructure. The Conversational AI product consolidates three distinct AI services:

| Service | Technology | Output |
|---|---|---|
| ASR | DeepSpeech-based recognition | Transcribed text |
| LLM | Large language model inference | Contextual responses |
| TTS | Neural voice synthesis | Natural audio output |

The platform abstracts complexity associated with managing separate ASR, LLM, and TTS providers while maintaining sub-second response latency suitable for conversational applications.

System Architecture

Component Interaction

sequenceDiagram
    participant C as Flutter Client
    participant B as Backend Server
    participant Z as ZEGO Cloud
    participant A as AI Services

    C->>B: Request auth token
    B-->>C: Return token

    C->>Z: Join RTC room + publish stream
    C->>B: Create AI agent request
    B->>Z: Register agent with LLM/TTS
    B->>Z: Instantiate agent
    Z-->>B: Agent stream identifier
    B-->>C: Agent stream ID

    C->>Z: Subscribe to agent audio

    Note over C,Z: Conversation loop active
    C->>Z: User audio frames
    Z->>A: ASR transcription
    A->>Z: Text payload
    Z->>A: LLM inference request
    A->>Z: Response text
    Z->>A: TTS synthesis request
    A->>Z: Audio payload
    Z-->>C: AI voice + subtitle data

    C->>B: Terminate agent
    B->>Z: Delete agent instance

Data Flow Summary

| Phase | Action | Endpoint |
|---|---|---|
| Authentication | Token request | /api/zego/token |
| Session Init | Room login + publish | ZEGO RTC |
| Agent Creation | AI agent registration | /api/zego/start |
| Media Playback | Stream subscription | ZEGO RTC |
| Conversation | Bidirectional audio | ZEGO Cloud |
| Termination | Agent cleanup | /api/zego/stop |
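The request and response shapes implied by this flow can be captured as TypeScript types. The field names below follow the handlers shown later in this guide; treat them as a sketch and adjust to match your actual backend:

```typescript
// Hypothetical contract types for the three backend endpoints.
// Field names follow the handlers in this guide; adjust to your backend.

interface TokenRequest { userId: string }
interface TokenResponse { code: number; data: { token: string } }

interface StartAgentRequest { roomId: string; userId: string; userStreamId: string }
interface StartAgentResponse {
  code: number;
  data: { agentInstanceId: string; agentStreamId: string };
}

interface StopAgentRequest { agentInstanceId: string }

// Example: a well-formed start response the Flutter client can parse
const startResponse: StartAgentResponse = {
  code: 0,
  data: { agentInstanceId: 'inst_123', agentStreamId: 's123_agent' },
};
```

Pinning these shapes down early helps keep the Next.js handlers and the Dart parsing code in agreement.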

Understanding the Server-Client Separation

Building a Flutter AI voice assistant requires a two-tier architecture rather than direct client-to-service communication. This design decision stems from fundamental security and operational requirements.

Architectural Roles:

| Layer | Primary Function | Credentials Stored |
|---|---|---|
| Backend Server | Authentication, agent lifecycle, API signing | Full access (server secret, LLM keys) |
| Flutter Client | Audio I/O, stream management, UI | None (receives short-lived tokens only) |
| ZEGO Cloud | ASR, LLM, TTS processing | N/A (external service) |

Rationale for Separation:

  1. Credential Security — Server secrets (ZEGO_SERVER_SECRET, LLM API keys) remain in your controlled infrastructure. Mobile applications can be decompiled, making them unsuitable for permanent credential storage.

  2. Token Lifecycle Management — Clients authenticate using time-limited tokens (default 1 hour). This approach provides revocable access without exposing long-term credentials.

  3. Dynamic AI Configuration — System prompts, model selections, and voice profiles are managed server-side. Changes deploy instantly without requiring client app updates through app stores.

  4. Operational Control — Centralized logging, rate limiting, usage analytics, and abuse detection operate at the server layer, providing a single control point for production monitoring.
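To illustrate point 2, a client can schedule a token refresh shortly before expiry rather than waiting for an authentication failure. A minimal sketch (the 5-minute margin is an arbitrary choice here, not a ZEGO requirement):

```typescript
// Compute when to refresh a short-lived token: `marginSec` before it expires.
function refreshDelayMs(ttlSec: number, marginSec = 300): number {
  // Clamp at zero so very short TTLs never produce a negative delay
  return Math.max(0, (ttlSec - marginSec) * 1000);
}

// With the default 1-hour TTL, refresh after 55 minutes
const delay = refreshDelayMs(3600);
// A real client would then do: setTimeout(fetchNewToken, delay)
```

The same idea ports directly to Dart with a `Timer` in the Flutter client.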

When you build a voice AI assistant for Android using Flutter, this architecture ensures security, maintainability, and operational visibility across both Android and iOS platforms.

Implementation Guide

Phase 1: Backend Infrastructure

Deploy a Next.js server to handle authentication and agent lifecycle management.

Environment Setup:

# .env.local configuration
NEXT_PUBLIC_ZEGO_APP_ID=<your_app_id>
ZEGO_SERVER_SECRET=<32_character_secret>

# AI Agent parameters
ZEGO_AGENT_ID=voice_assistant_01
SYSTEM_PROMPT="You are a helpful voice assistant. Provide concise, spoken-friendly responses."

# External AI services
LLM_URL=https://api.provider.com/v1/chat/completions
LLM_API_KEY=<your_api_key>
LLM_MODEL=gpt-4

# TTS configuration
TTS_VENDOR=ByteDance
TTS_VOICE_TYPE=zh_female_wanwanxiaohe_moon_bigtts
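A missing variable from this list typically surfaces later as an opaque runtime failure, so it pays to validate the environment at server startup. A small sketch (variable names match the .env.local above):

```typescript
// Fail fast at server startup if a required environment variable is missing.
function requireEnv(names: string[]): Record<string, string> {
  const missing = names.filter((n) => !process.env[n]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((n) => [n, process.env[n] as string]));
}

// Example usage at module load:
// const env = requireEnv(['NEXT_PUBLIC_ZEGO_APP_ID', 'ZEGO_SERVER_SECRET', 'LLM_API_KEY']);
```

Calling this once at boot converts a confusing mid-request error into an explicit configuration message.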

Token Generation Handler:

// app/api/zego/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import crypto from 'crypto';

const generateToken = (appId: number, userId: string, secret: string, ttl: number): string => {
  const ctime = Math.floor(Date.now() / 1000);
  const payload = {
    app_id: appId,
    user_id: userId,
    nonce: Math.floor(Math.random() * 2147483647),
    ctime,
    expire: ctime + ttl,
    payload: ''
  };

  // AES-256-GCM needs a 32-byte key (the ZEGO server secret) and a random IV;
  // the IV must be packed alongside the ciphertext so the server can decrypt.
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', Buffer.from(secret), iv);
  const encrypted = Buffer.concat([
    cipher.update(JSON.stringify(payload), 'utf8'),
    cipher.final(),
    cipher.getAuthTag()
  ]);

  // Pack: 8-byte big-endian expiry, 2-byte IV length, IV, 2-byte data length, ciphertext.
  // Note that writeBigInt64BE returns the next offset, not the Buffer, so write first,
  // then concatenate the buffers.
  const expireBuf = Buffer.alloc(8);
  expireBuf.writeBigInt64BE(BigInt(payload.expire));
  const ivLen = Buffer.alloc(2);
  ivLen.writeUInt16BE(iv.length);
  const dataLen = Buffer.alloc(2);
  dataLen.writeUInt16BE(encrypted.length);

  // Simplified token04-style layout; for production, prefer ZEGO's official
  // server-side token generation library, which implements the exact format.
  return '04' + Buffer.concat([expireBuf, ivLen, iv, dataLen, encrypted]).toString('base64');
};

export async function POST(req: NextRequest) {
  const { userId } = await req.json();
  const token = generateToken(
    parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!),
    userId,
    process.env.ZEGO_SERVER_SECRET!,
    3600
  );
  // Wrap in { code, data } so the Flutter client's response parsing succeeds
  return NextResponse.json({ code: 0, data: { token } });
}
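The 8-byte expiry header in the token layout is a common place to introduce a bug, because Buffer.writeBigInt64BE returns the next write offset rather than the Buffer itself. A quick round-trip check of the packing step:

```typescript
// Round-trip the 8-byte big-endian expiry header used in the token format.
const expire = Math.floor(Date.now() / 1000) + 3600;

const buf = Buffer.alloc(8);
buf.writeBigInt64BE(BigInt(expire)); // returns an offset (8), NOT the Buffer

const decoded = Number(buf.readBigInt64BE(0));
// decoded now equals expire
```

Writing into a pre-allocated Buffer and then concatenating is the safe pattern; chaining the write call into Buffer.concat silently passes a number where a Buffer is expected.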

Agent Management API:

// app/api/zego/start/route.ts
import { sendZegoRequest } from '@/lib/zego-utils';

export async function POST(req: Request) {
  const { roomId, userId, userStreamId } = await req.json();

  const agentData = await sendZegoRequest('CreateAIAgent', {
    agentId: process.env.ZEGO_AGENT_ID,
    roomId,
    userId,
    userStreamId,
    llmConfig: {
      model: process.env.LLM_MODEL,
      systemPrompt: process.env.SYSTEM_PROMPT
    },
    ttsConfig: {
      vendor: process.env.TTS_VENDOR,
      voiceType: process.env.TTS_VOICE_TYPE
    }
  });

  // Wrap in { code, data } so the Flutter client's response parsing succeeds
  return Response.json({
    code: 0,
    data: {
      agentInstanceId: agentData.instanceId,
      agentStreamId: agentData.streamId
    }
  });
}
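The /api/zego/stop endpoint that the client calls during teardown is not shown above. A minimal sketch, assuming the same sendZegoRequest helper and a DeleteAgentInstance action (both the helper body and the action name are placeholders here; substitute your actual ZEGO server API call):

```typescript
// app/api/zego/stop/route.ts (sketch)
// `sendZegoRequest` and 'DeleteAgentInstance' are assumptions for illustration;
// replace them with your real ZEGO server API helper and action name.
async function sendZegoRequest(action: string, params: Record<string, unknown>): Promise<void> {
  // Placeholder: the real helper signs and forwards the request to ZEGO Cloud.
}

export async function POST(req: Request) {
  const { agentInstanceId } = await req.json();
  if (!agentInstanceId) {
    return Response.json({ code: 400, message: 'agentInstanceId is required' }, { status: 400 });
  }

  await sendZegoRequest('DeleteAgentInstance', { agentInstanceId });
  return Response.json({ code: 0 });
}
```

Cleaning up agents on session end matters: orphaned agent instances continue to consume platform resources until they time out.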

Phase 2: Flutter Client Implementation

With the backend infrastructure complete, we now build the Flutter client. This phase covers project configuration, network communication, ZEGOCLOUD engine initialization, and UI integration for both Android and iOS platforms.

Step 2.1: Project Configuration

Create a new Flutter project and add the required dependencies to your pubspec.yaml:

# pubspec.yaml
name: ai_voice_assistant
description: Cross-platform AI voice assistant using ZEGOCLOUD

environment:
  sdk: ^3.8.1

dependencies:
  flutter:
    sdk: flutter

  # ZEGO Express SDK
  zego_express_engine: ^3.22.0

  # HTTP client for API calls
  http: ^1.2.0

  # Permission handling
  permission_handler: ^11.3.0

  # State management (optional)
  provider: ^6.1.1

After updating dependencies, run flutter pub get to fetch packages.

Step 2.2: Platform Permissions

Configure permissions for both Android and iOS platforms:

<!-- android/app/src/main/AndroidManifest.xml -->
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<!-- ios/Runner/Info.plist -->
<key>NSMicrophoneUsageDescription</key>
<string>This app requires microphone access for voice assistant functionality</string>

Note: Without these permissions, the app cannot capture audio for voice interaction. On Android 6.0+ the RECORD_AUDIO permission must also be requested at runtime; the permission_handler package added earlier handles this.

Step 2.3: Application Configuration

Create a centralized configuration class to store app-wide constants and utility functions:

// lib/config/app_config.dart
class AppConfig {
  static const int appId = 1234567890;
  static const String serverUrl = 'https://your-deployment.vercel.app';

  static String generateSessionId() {
    return 's${DateTime.now().millisecondsSinceEpoch % 1000000}';
  }

  static String generateUserId(String sessionId) {
    return 'u$sessionId';
  }

  static String getUserStreamId(String sessionId) {
    return '${sessionId}_user';
  }
}

Important: For Android emulator testing with a local backend, use http://10.0.2.2:3000 instead of localhost to access your development server.

Step 2.4: Network Service Layer

The API service handles all HTTP communication with your backend server. It manages three core operations: authentication, agent creation, and agent cleanup.

// lib/services/api_service.dart
import 'dart:convert';
import 'package:flutter/foundation.dart'; // provides debugPrint
import 'package:http/http.dart' as http;
import '../config/app_config.dart';

class ApiService {
  static final _client = http.Client();

  static Future<String?> authenticate(String userId) async {
    try {
      final response = await _client.post(
        Uri.parse('${AppConfig.serverUrl}/api/zego/token'),
        headers: {'Content-Type': 'application/json'},
        body: jsonEncode({'userId': userId}),
      );

      if (response.statusCode == 200) {
        final data = jsonDecode(response.body);
        if (data['code'] == 0 && data['data'] != null) {
          return data['data']['token'] as String?;
        }
      }
      return null;
    } catch (e) {
      debugPrint('[ApiService] Authentication error: $e');
      return null;
    }
  }

  static Future<String?> createAgent({
    required String roomId,
    required String userId,
    required String userStreamId,
  }) async {
    try {
      final response = await _client.post(
        Uri.parse('${AppConfig.serverUrl}/api/zego/start'),
        headers: {'Content-Type': 'application/json'},
        body: jsonEncode({
          'roomId': roomId,
          'userId': userId,
          'userStreamId': userStreamId,
        }),
      );

      if (response.statusCode == 200) {
        final data = jsonDecode(response.body);
        if (data['code'] == 0 && data['data'] != null) {
          return data['data']['agentStreamId'] as String?;
        }
      }
      return null;
    } catch (e) {
      debugPrint('[ApiService] Create agent error: $e');
      return null;
    }
  }

  static Future<bool> destroyAgent(String agentInstanceId) async {
    try {
      final response = await _client.post(
        Uri.parse('${AppConfig.serverUrl}/api/zego/stop'),
        headers: {'Content-Type': 'application/json'},
        body: jsonEncode({'agentInstanceId': agentInstanceId}),
      );
      return response.statusCode == 200;
    } catch (e) {
      debugPrint('[ApiService] Destroy agent error: $e');
      return false;
    }
  }
}

Implementation notes:

  • All methods are static for simple access without instantiation
  • Error handling returns null or false on failure — calling code should handle these cases
  • The http.Client is reused across requests for efficiency
  • Debug print statements help with development debugging
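During development, you can point AppConfig.serverUrl at a local mock that returns the { code, data } envelope this service expects. A throwaway Node sketch (not part of the production backend; the canned IDs are placeholders):

```typescript
import { createServer } from 'http';

// Minimal mock of the three backend endpoints, returning canned responses
// in the { code, data } envelope the Flutter ApiService parses.
const server = createServer((req, res) => {
  res.setHeader('Content-Type', 'application/json');
  if (req.url === '/api/zego/token') {
    res.end(JSON.stringify({ code: 0, data: { token: 'mock-token' } }));
  } else if (req.url === '/api/zego/start') {
    res.end(JSON.stringify({
      code: 0,
      data: { agentInstanceId: 'mock-agent', agentStreamId: 'mock-stream' },
    }));
  } else if (req.url === '/api/zego/stop') {
    res.end(JSON.stringify({ code: 0 }));
  } else {
    res.statusCode = 404;
    res.end(JSON.stringify({ code: 404 }));
  }
});

server.listen(3000, () => console.log('Mock backend on http://localhost:3000'));
```

Run it with `npx tsx mock-server.ts`, then set serverUrl to http://10.0.2.2:3000 on the Android emulator as noted earlier.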

Step 2.5: RTC Engine Manager

The ZegoEngineManager class encapsulates all ZEGOCLOUD Express SDK operations, providing a clean interface for the rest of your application. It uses a singleton pattern to ensure only one engine instance exists:

// lib/services/zego_engine_manager.dart
import 'package:zego_express_engine/zego_express_engine.dart';
import '../config/app_config.dart';

class ZegoEngineManager {
  static final ZegoEngineManager _instance = ZegoEngineManager._internal();
  factory ZegoEngineManager() => _instance;
  ZegoEngineManager._internal();

  Function(String, ZegoRoomStateChangedReason, int)? onRoomStateChanged;
  bool _isInitialized = false;

  Future<void> initialize() async {
    if (_isInitialized) return;

    // Configure engine
    final config = ZegoEngineConfig(advancedConfig: {
      'set_audio_volume_ducking_mode': '1',
      'enable_rnd_volume_adaptive': 'true',
    });
    await ZegoExpressEngine.setEngineConfig(config);

    // Create engine with profile
    final profile = ZegoEngineProfile(
      AppConfig.appId,
      ZegoScenario.HighQualityChatroom,
      enablePlatformView: false,
    );
    await ZegoExpressEngine.createEngineWithProfile(profile);

    // Setup event handlers
    _setupEventHandlers();

    // Audio optimization for voice interaction
    await _configureAudioSettings();

    _isInitialized = true;
  }

  void _setupEventHandlers() {
    ZegoExpressEngine.onRoomStateChanged = (roomID, reason, errorCode, _) {
      onRoomStateChanged?.call(roomID, reason, errorCode);
    };
  }

  Future<void> _configureAudioSettings() async {
    final engine = ZegoExpressEngine.instance;
    await engine.enableAGC(true);      // Gain control
    await engine.enableANS(true);      // Noise suppression
    await engine.setANSMode(ZegoANSMode.Medium);
    await engine.enableAEC(true);      // Echo cancellation
    await engine.setAECMode(ZegoAECMode.AIBalanced);
  }

  Future<int> joinRoom(String roomId, String userId, String token) async {
    final user = ZegoUser(userId, userId);
    final config = ZegoRoomConfig(0, true, token);
    final result = await ZegoExpressEngine.instance.loginRoom(
      roomId, user, config: config,
    );
    return result.errorCode;
  }

  Future<void> publishAudio(String streamId) async {
    await ZegoExpressEngine.instance.muteMicrophone(false);
    await ZegoExpressEngine.instance.startPublishingStream(streamId);
  }

  Future<void> playAudio(String streamId) async {
    await ZegoExpressEngine.instance.startPlayingStream(streamId);
  }

  Future<void> leaveRoom(String roomId) async {
    await ZegoExpressEngine.instance.stopPublishingStream();
    await ZegoExpressEngine.instance.logoutRoom(roomId);
  }

  Future<void> destroy() async {
    if (!_isInitialized) return;
    await ZegoExpressEngine.destroyEngine();
    _isInitialized = false;
  }
}

Audio optimization explanation:

| Setting | Purpose | Recommended Value |
|---|---|---|
| AGC | Automatic Gain Control (normalizes volume levels) | Enabled |
| ANS | Automatic Noise Suppression (reduces background noise) | Medium mode |
| AEC | Acoustic Echo Cancellation (prevents echo feedback) | AI Balanced mode |

Step 2.6: Subtitle Integration

ZEGOCLOUD provides official subtitle components for Flutter that handle ASR and LLM message parsing. These components are part of the AI Agent SDK.

Download required files from ZEGO Flutter Subtitle Guide:

  • lib/audio/subtitles/model.dart
  • lib/audio/subtitles/view.dart
  • lib/audio/subtitles/message_protocol.dart
  • lib/audio/subtitles/message_dispatcher.dart

Add these files to your project structure, then create a handler wrapper:

// lib/services/subtitle_handler.dart
import '../audio/subtitles/model.dart';
import '../audio/subtitles/message_protocol.dart';
import '../audio/subtitles/message_dispatcher.dart';

class SubtitleHandler implements ZegoSubtitlesEventHandler {
  final ZegoSubtitlesViewModel model;

  SubtitleHandler(this.model) {
    ZegoSubtitlesMessageDispatcher().registerEventHandler(this);
  }

  @override
  void onRecvAsrChatMsg(ZegoSubtitlesMessageProtocol message) {
    model.handleRecvAsrMessage(message);
  }

  @override
  void onRecvLLMChatMsg(ZegoSubtitlesMessageProtocol message) {
    model.handleRecvLLMMessage(message);
  }

  void onExperimentalAPI(String content) {
    ZegoSubtitlesMessageDispatcher.handleExpressExperimentalAPIContent(content);
  }

  void dispose() {
    ZegoSubtitlesMessageDispatcher().unregisterEventHandler(this);
  }
}

How subtitles work:

  1. The onRecvExperimentalAPI callback receives raw room channel messages
  2. ZegoSubtitlesMessageDispatcher parses and routes messages based on type
  3. SubtitleHandler forwards ASR/LLM messages to the ZegoSubtitlesViewModel
  4. The ZegoSubtitlesView widget displays messages with proper formatting
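The routing step in the list above can be illustrated with a simplified dispatcher. The message shape here (a Cmd field distinguishing ASR from LLM payloads, with assumed numbering) is an illustration only; the real protocol lives in the downloaded message_protocol.dart:

```typescript
// Simplified illustration of subtitle message routing.
// The { Cmd, Data } shape and the Cmd numbering are assumptions for illustration.
type SubtitleMessage = { Cmd: number; Data: { Text: string } };

type Handlers = {
  onAsr: (text: string) => void;
  onLlm: (text: string) => void;
};

function dispatch(raw: string, handlers: Handlers): void {
  const msg: SubtitleMessage = JSON.parse(raw);
  if (msg.Cmd === 1) handlers.onAsr(msg.Data.Text);       // user transcription
  else if (msg.Cmd === 2) handlers.onLlm(msg.Data.Text);  // AI response text
}

// Example: an ASR message routes to the ASR handler only
const received: string[] = [];
dispatch(JSON.stringify({ Cmd: 1, Data: { Text: 'hello' } }), {
  onAsr: (t) => received.push(`asr:${t}`),
  onLlm: (t) => received.push(`llm:${t}`),
});
```

The ZEGO dispatcher does the same kind of type-based fan-out, plus ordering and partial-message handling that this sketch omits.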

Step 2.7: UI Integration

Finally, create the main voice assistant page that ties all components together:

// lib/ui/voice_assistant_page.dart
import 'package:flutter/material.dart';
import 'package:zego_express_engine/zego_express_engine.dart';
import '../services/zego_engine_manager.dart';
import '../services/api_service.dart';
import '../services/subtitle_handler.dart';
import '../audio/subtitles/model.dart';
import '../audio/subtitles/view.dart';
import '../config/app_config.dart';

class VoiceAssistantPage extends StatefulWidget {
  const VoiceAssistantPage({super.key});

  @override
  State<VoiceAssistantPage> createState() => _VoiceAssistantPageState();
}

class _VoiceAssistantPageState extends State<VoiceAssistantPage> {
  final _engineManager = ZegoEngineManager();
  late final SubtitleHandler _subtitleHandler;
  late final ZegoSubtitlesViewModel _subtitleModel;

  String? _sessionId;
  String? _userId;
  String? _agentInstanceId;
  bool _isInSession = false;
  bool _isLoading = false;

  @override
  void initState() {
    super.initState();
    _subtitleModel = ZegoSubtitlesViewModel();
    _subtitleHandler = SubtitleHandler(_subtitleModel);
    _initializeEngine();
  }

  Future<void> _initializeEngine() async {
    await _engineManager.initialize();
    _engineManager.onRoomStateChanged = _onRoomStateChanged;
    ZegoExpressEngine.onRecvExperimentalAPI = _subtitleHandler.onExperimentalAPI;
  }

  void _onRoomStateChanged(String roomId, ZegoRoomStateChangedReason reason, int errorCode) {
    if (reason == ZegoRoomStateChangedReason.Logined && !_isInSession) {
      setState(() => _isInSession = true);
    } else if (reason == ZegoRoomStateChangedReason.Logout ||
               reason == ZegoRoomStateChangedReason.KickOut) {
      setState(() => _isInSession = false);
    }
  }

  Future<void> _startSession() async {
    if (_isLoading) return;

    setState(() { _isLoading = true; });

    try {
      _sessionId = AppConfig.generateSessionId();
      _userId = AppConfig.generateUserId(_sessionId!);
      final userStreamId = AppConfig.getUserStreamId(_sessionId!);

      // Authenticate
      final token = await ApiService.authenticate(_userId!);
      if (token == null) { _showError('Authentication failed'); return; }

      // Join room
      final loginCode = await _engineManager.joinRoom(_sessionId!, _userId!, token);
      if (loginCode != 0) { _showError('Room join failed: $loginCode'); return; }

      // Publish audio
      await _engineManager.publishAudio(userStreamId);

      // Create agent
      final agentStreamId = await ApiService.createAgent(
        roomId: _sessionId!,
        userId: _userId!,
        userStreamId: userStreamId,
      );
      if (agentStreamId == null) { _showError('Agent creation failed'); return; }

      // The sample backend returns the agent's stream ID, which this example also
      // uses as the handle sent to /api/zego/stop. If your backend returns a
      // separate agentInstanceId, store and send that instead.
      _agentInstanceId = agentStreamId;
      await _engineManager.playAudio(agentStreamId);

      setState(() { _isInSession = true; });
    } catch (e) {
      _showError('Error: ${e.toString()}');
    } finally {
      setState(() { _isLoading = false; });
    }
  }

  Future<void> _endSession() async {
    if (_isLoading) return;

    setState(() { _isLoading = true; });

    try {
      if (_agentInstanceId != null) {
        await ApiService.destroyAgent(_agentInstanceId!);
      }
      if (_sessionId != null) {
        await _engineManager.leaveRoom(_sessionId!);
      }

      _sessionId = null;
      _userId = null;
      _agentInstanceId = null;
      setState(() { _isInSession = false; });
    } catch (e) {
      _showError('Error: ${e.toString()}');
    } finally {
      setState(() { _isLoading = false; });
    }
  }

  void _showError(String message) {
    if (!mounted) return;
    ScaffoldMessenger.of(context).showSnackBar(
      SnackBar(content: Text(message), backgroundColor: Colors.red),
    );
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: SafeArea(
        child: Column(
          children: [
            // Control panel
            _buildControlPanel(),
            const Divider(height: 1),
            // Subtitle display
            Expanded(child: ZegoSubtitlesView(model: _subtitleModel)),
          ],
        ),
      ),
    );
  }

  Widget _buildControlPanel() {
    return Container(
      padding: const EdgeInsets.symmetric(vertical: 40, horizontal: 20),
      color: Colors.grey[100],
      child: Column(
        children: [
          const Text('AI Voice Assistant',
              style: TextStyle(fontSize: 24, fontWeight: FontWeight.bold)),
          const SizedBox(height: 20),
          // Status indicator
          Row(
            mainAxisAlignment: MainAxisAlignment.center,
            children: [
              Container(
                width: 12, height: 12,
                decoration: BoxDecoration(
                  shape: BoxShape.circle,
                  color: _isInSession ? Colors.green : Colors.grey,
                ),
              ),
              const SizedBox(width: 8),
              Text(_isInSession ? 'Connected' : 'Disconnected',
                  style: TextStyle(
                    fontSize: 16,
                    color: _isInSession ? Colors.green : Colors.grey,
                  )),
            ],
          ),
          const SizedBox(height: 20),
          // Call button
          SizedBox(
            width: 120, height: 120,
            child: ElevatedButton(
              onPressed: _isLoading ? null : (_isInSession ? _endSession : _startSession),
              style: ElevatedButton.styleFrom(
                shape: const CircleBorder(),
                backgroundColor: _isInSession ? Colors.red : Colors.green,
                foregroundColor: Colors.white,
              ),
              child: _isLoading
                  ? const CircularProgressIndicator(color: Colors.white)
                  : Icon(_isInSession ? Icons.call_end : Icons.call, size: 48),
            ),
          ),
        ],
      ),
    );
  }

  @override
  void dispose() {
    _subtitleHandler.dispose();
    _engineManager.destroy();
    super.dispose();
  }
}

Session flow summary:

| Step | Action | Component |
|---|---|---|
| 1 | Generate session ID | AppConfig |
| 2 | Request auth token | ApiService.authenticate() |
| 3 | Join RTC room | ZegoEngineManager.joinRoom() |
| 4 | Publish user audio | ZegoEngineManager.publishAudio() |
| 5 | Create AI agent | ApiService.createAgent() |
| 6 | Play AI audio | ZegoEngineManager.playAudio() |
| 7 | End session | destroyAgent() + leaveRoom() |

Conclusion

This guide demonstrated how to build an AI voice assistant for Android using Flutter and ZEGOCLOUD's integrated platform. The architecture separates concerns between backend authentication, agent orchestration, and client-side media handling. Key implementation points include proper token management, RTC room coordination, and subtitle message parsing.

Developers can extend this foundation with custom wake words, multi-language support, or domain-specific conversation flows. The same architectural pattern applies across customer service, education, accessibility, and IoT control scenarios requiring natural voice interaction.

For production deployment, consider implementing connection retry logic, offline fallbacks, and comprehensive error handling based on your specific use case requirements. The Flutter approach additionally provides iOS deployment with minimal code changes.
