This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
EmpathyAI is a real-time voice-powered mental health support application that provides compassionate AI-driven conversations for individuals experiencing emotional distress. The system processes spoken input through advanced speech recognition, analyzes emotional content using AI, and responds with empathetic voice-based support.
Demo
GitHub Repository
React frontend app
https://github.com/vpjigin/EmpathyAIReact.git
Spring Boot backend
https://github.com/vpjigin/EmpathyAISpringBoot.git
AssemblyAI Universal-Streaming Technology
This application demonstrates advanced real-time audio processing powered by AssemblyAI’s Universal-Streaming API. The system delivers low-latency, turn-based, secure transcription, which powers emotionally intelligent AI conversations.
Core Architecture
The architecture follows a multi-layered streaming pipeline:
Client Audio → WebSocket Handler → AssemblyAI Streaming → AI Processing → Response
AssemblyAI Streaming Implementation
- Real-time WebSocket Connection: The backend creates a persistent WebSocket connection to AssemblyAI’s streaming endpoint:
```java
private static final String ASSEMBLYAI_STREAMING_URL = "wss://streaming.assemblyai.com/v3/ws";

public CompletableFuture<StreamingSession> createStreamingSession(String sessionId, TranscriptCallback callback) {
    String connectionUrl = ASSEMBLYAI_STREAMING_URL + "?sample_rate=16000&format_turns=true";
    URI serverUri = URI.create(connectionUrl);
    Map<String, String> headers = new HashMap<>();
    headers.put("Authorization", apiKey);
    WebSocketClient client = new WebSocketClient(serverUri, headers) {
        @Override
        public void onMessage(String message) {
            try {
                JsonNode jsonMessage = objectMapper.readTree(message);
                String messageType = jsonMessage.get("type").asText();
                if ("Turn".equals(messageType)) {
                    String transcript = jsonMessage.get("transcript").asText();
                    boolean isFormatted = jsonMessage.get("turn_is_formatted").asBoolean();
                    if (isFormatted) {
                        callback.onTranscript(transcript, true);
                    }
                }
            } catch (JsonProcessingException e) {
                // Malformed messages are logged and skipped
            }
        }
        // onOpen, onClose, and onError overrides omitted for brevity
    };
    // client.connect() and future completion omitted for brevity
}
```
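Since the connection URL requests sample_rate=16000, the raw audio forwarded to AssemblyAI must be 16-bit PCM. Browser audio APIs typically capture Float32 samples, so a conversion step is needed somewhere in the pipeline. The sketch below shows one way to do this on the server; the class and method names are my own, not part of the project:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Pcm16Converter {
    // Convert Float32 samples in [-1.0, 1.0] to 16-bit little-endian PCM,
    // the raw audio format the streaming endpoint expects at sample_rate=16000.
    public static byte[] floatToPcm16(float[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (float sample : samples) {
            // Clamp to avoid overflow if the client sends values slightly outside [-1, 1]
            float clamped = Math.max(-1.0f, Math.min(1.0f, sample));
            buffer.putShort((short) (clamped * Short.MAX_VALUE));
        }
        return buffer.array();
    }
}
```

In EmpathyAI the React client may already do this conversion before sending binary frames, in which case the backend can forward the bytes untouched.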
- Audio Streaming Handler: The AudioStreamingWebSocketHandler component bridges client-side audio to the AssemblyAI session:
```java
@Component
public class AudioStreamingWebSocketHandler implements WebSocketHandler {

    @Autowired
    private AssemblyAIStreamingServiceV2 assemblyAIStreamingService;

    // One AssemblyAI streaming session per client WebSocket session
    private final Map<String, StreamingSessionV2> assemblyAISessions = new ConcurrentHashMap<>();

    private void handleBinaryMessage(WebSocketSession session, BinaryMessage message) {
        StreamingSessionV2 assemblySession = assemblyAISessions.get(session.getId());
        if (assemblySession != null) {
            ByteBuffer audioData = message.getPayload();
            byte[] audioBytes = new byte[audioData.remaining()];
            audioData.get(audioBytes);
            assemblySession.sendAudioData(audioBytes);
        }
    }

    private void startStreaming(WebSocketSession session, String conversationUuid) {
        assemblyAIStreamingService.createStreamingSession(session.getId(), new TranscriptCallback() {
            @Override
            public void onTranscript(String text, boolean isFinal) {
                if (isFinal) {
                    handleFinalTranscript(session, conversationUuid, text);
                }
            }
        });
    }
}
```
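For the @Component handler above to receive client connections, it has to be registered with Spring's WebSocket infrastructure. A minimal sketch of that wiring, assuming a hypothetical "/ws/audio" endpoint path (the actual path in the project may differ):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.web.socket.config.annotation.EnableWebSocket;
import org.springframework.web.socket.config.annotation.WebSocketConfigurer;
import org.springframework.web.socket.config.annotation.WebSocketHandlerRegistry;

@Configuration
@EnableWebSocket
public class WebSocketConfig implements WebSocketConfigurer {

    private final AudioStreamingWebSocketHandler audioHandler;

    public WebSocketConfig(AudioStreamingWebSocketHandler audioHandler) {
        this.audioHandler = audioHandler;
    }

    @Override
    public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
        // Browser clients connect here and stream binary PCM frames
        registry.addHandler(audioHandler, "/ws/audio")
                .setAllowedOrigins("*"); // tighten for production
    }
}
```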
- Advanced Features Utilized
- Turn-based Transcription: format_turns=true yields punctuated, human-like conversational turns
- 16 kHz Audio: sample_rate=16000 balances speech clarity with bandwidth
- TLS/SSL Security: The wss:// connection is secured with valid certificates
- Concurrent Streaming: Multiple client sessions are supported simultaneously
- Message Type Handling: Supports the "Begin", "Turn", and "Termination" message types
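The three message types reduce to a simple dispatch once the JSON fields have been parsed (as in the onMessage callback earlier). A sketch of that routing logic; the class name and return values are illustrative, not from the project:

```java
public class StreamingMessageRouter {
    // Dispatch on the "type" field of a v3 streaming message.
    // Returns a short label describing the action taken, for illustration.
    public static String route(String messageType, String transcript, boolean turnIsFormatted) {
        switch (messageType) {
            case "Begin":
                return "session-started";
            case "Turn":
                // Only formatted (final) turns are forwarded to the AI pipeline
                return turnIsFormatted ? "final:" + transcript : "partial:" + transcript;
            case "Termination":
                return "session-ended";
            default:
                return "ignored";
        }
    }
}
```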
Dual Implementation Strategy
I implemented two parallel streaming strategies:
AssemblyAIStreamingService: Uses Java-WebSocket for low-level WebSocket handling
AssemblyAIStreamingServiceV2: Uses Spring’s StandardWebSocketClient for seamless Spring Boot integration
```java
// Spring-based implementation
public CompletableFuture<StreamingSessionV2> createStreamingSession(String sessionId, TranscriptCallback callback) {
    StandardWebSocketClient client = new StandardWebSocketClient();
    WebSocketHttpHeaders headers = new WebSocketHttpHeaders();
    headers.add("Authorization", apiKey);
    URI serverUri = URI.create(ASSEMBLYAI_STREAMING_URL + "?sample_rate=16000&format_turns=true");
    WebSocketHandler handler = new TextWebSocketHandler() {
        @Override
        protected void handleTextMessage(WebSocketSession session, TextMessage message) {
            // Handle "Begin", "Turn", and "Termination" messages via the Spring WebSocket framework
        }
    };
    try {
        client.doHandshake(handler, headers, serverUri).get();
        // Session wrapping and future completion omitted for brevity
    } catch (Exception e) {
        // Connection failures complete the returned future exceptionally
    }
}
```
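Whichever implementation is used, the binary audio forwarded over the connection should be split into small fixed-size frames rather than one large buffer. A sketch of that chunking for 16 kHz, 16-bit mono PCM; the 50 ms frame duration is an assumption, not a value from the project:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AudioChunker {
    // 16 kHz, 16-bit mono PCM: 16000 samples/s * 2 bytes = 32000 bytes/s,
    // so a 50 ms frame (an assumed duration) is 1600 bytes.
    public static final int FRAME_BYTES = 16000 * 2 / 20;

    public static List<byte[]> chunk(byte[] pcm) {
        List<byte[]> frames = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += FRAME_BYTES) {
            int end = Math.min(offset + FRAME_BYTES, pcm.length);
            frames.add(Arrays.copyOfRange(pcm, offset, end));
        }
        return frames;
    }
}
```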
Technical Capabilities Leveraged
1. Real-time Binary Audio Streaming
2. Low-latency (<1s) Transcription
3. Turn-based Conversation Context
4. Error Recovery & Retry Mechanism
5. Scalable Concurrent Sessions
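The error recovery and retry mechanism listed above can be sketched as exponential backoff on reconnect attempts. The specific base delay, cap, and doubling schedule here are assumptions for illustration, not the project's actual values:

```java
public class ReconnectBackoff {
    // Exponential backoff for re-establishing the AssemblyAI WebSocket:
    // 500 ms base, doubling per attempt, capped at 8 s (all values assumed).
    public static long delayMillis(int attempt) {
        long base = 500L;
        long delay = base << Math.min(attempt, 4); // 500, 1000, 2000, 4000, 8000
        return Math.min(delay, 8000L);
    }
}
```

The caller would sleep for delayMillis(attempt) before each reconnect, resetting the attempt counter once a session is re-established.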
Project Structure (Brief)
├── controller/
├── service/
├── websocket/
├── model/
├── config/