This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
Supportly is a plug-and-play real-time voice & video support module that developers can integrate into any web application. It falls under the following challenge categories:
- Business Automation – The voice agent records interactions between support agents and customers, saving them to a database. After each session, it generates a summary of the conversation, which is automatically sent to the customer's email address.
- Real-Time Performance – Provides live transcription during support calls.
The project empowers support teams to offer on-demand human assistance while using AssemblyAI's streaming API to transcribe conversations live.
Demo
https://supportly-zzsu.onrender.com
GitHub Repository
https://github.com/GoldenThrust/Supportly
Supportly - Video Support Call Scheduling Platform
A modern video call customer support application built with React Router v7, TypeScript, and Tailwind CSS. This platform allows customers to easily schedule video calls with support teams to resolve issues and get product assistance.
🚀 Features
Customer Features
- Easy Session Booking: Schedule video support sessions with a simple form
- Real-time Video Calls: High-quality video calls with screen sharing capabilities
- Session Management: View upcoming and completed sessions
- Profile Management: Update personal information and preferences
- Session History: Track all past sessions with ratings and feedback
Admin/Support Team Features
- Admin Dashboard: Comprehensive overview of all support sessions
- Team Management: Manage support team members and their availability
- Schedule Management: Set available time slots and manage bookings
- Session Analytics: Track performance metrics and customer satisfaction
Technical Features
- 🎥 Video Call Integration: Browser-based video calls (no additional software…
Technical Implementation & AssemblyAI Integration
The Supportly application uses AssemblyAI's streaming transcription service to provide real-time speech-to-text functionality during video support sessions. The integration involves:
- Audio Processing: Capturing audio from the user's microphone using the Web Audio API
- Real-time Streaming: Sending audio chunks to AssemblyAI via WebSocket
- Live Transcription: Receiving and displaying transcripts in real-time
- Multi-user Support: Managing separate transcription sessions for each user
Architecture Components
1. AssemblyAI Configuration (config/assembyai.js)
The main configuration class that handles the AssemblyAI streaming connection:
import { AssemblyAI } from "assemblyai";

class AssemblyAIConfig {
  constructor() {
    try {
      this.client = new AssemblyAI({
        apiKey: process.env.ASSEMBLYAI_API_KEY,
      });
      this.transcriber = null;
      this.isConnected = false;
      this.isConnecting = false;
    } catch (error) {
      console.error(error);
    }
  }

  async run() {
    try {
      // Prevent multiple concurrent connection attempts
      if (this.isConnecting || this.isConnected) {
        console.log('Connection already in progress or established...');
        return;
      }

      this.isConnecting = true;
      this.transcriber = this.client.streaming.transcriber({
        sampleRate: 16_000,
        formatTurns: true
      });

      // Set up event handlers
      this.transcriber.on("open", ({ id }) => {
        console.log(`Session opened with ID: ${id}`);
        this.isConnected = true;
        this.isConnecting = false;
      });

      this.transcriber.on("error", (error) => {
        console.error("Transcriber error:", error);
        this.isConnected = false;
        this.isConnecting = false;
      });

      await this.transcriber.connect();
      console.log("Starting streaming...");
    } catch (error) {
      console.error('Error in run():', error);
      this.isConnected = false;
      this.isConnecting = false;
    }
  }

  transcribe(callBack) {
    // Forward every non-empty turn transcript to the caller
    this.transcriber.on("turn", (turn) => {
      if (!turn.transcript) {
        return;
      }
      callBack(turn.transcript);
    });
  }
}
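The WebSocket manager in the next section also calls a safeClose() method on this class during cleanup. Its body is not part of the excerpt above; a minimal sketch of what such a method could look like, assuming the SDK transcriber's close() method, is:

// Hypothetical sketch – the actual safeClose() in config/assembyai.js may differ
async safeClose() {
  try {
    if (this.transcriber && (this.isConnected || this.isConnecting)) {
      await this.transcriber.close();
    }
  } catch (error) {
    console.error("Error closing transcriber:", error);
  } finally {
    this.transcriber = null;
    this.isConnected = false;
    this.isConnecting = false;
  }
}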
2. WebSocket Manager (config/websocket.js)
Manages the connection between clients and handles AssemblyAI instances for each user:
class WebSocketManager {
  constructor() {
    this.io = null;
    this.userTranscribers = new Map(); // Store AssemblyAI instance per user
  }

  async connect(io) {
    io.on("connection", async (socket) => {
      // Create a new AssemblyAI instance for this user
      // (AssemblyAIConfigClass is the class from config/assembyai.js shown above)
      const assemblyai = new AssemblyAIConfigClass();
      this.userTranscribers.set(socket.id, assemblyai);

      socket.on("start-transcription", async () => {
        // socket.user is populated by the authentication middleware (not shown here)
        console.log(`Starting transcription for ${socket.user.email}`);
        const assemblyai = this.userTranscribers.get(socket.id);
        if (assemblyai) {
          // Check if already running to prevent duplicate starts
          if (assemblyai.isConnected || assemblyai.isConnecting) {
            console.log('Transcription already running or starting...');
            return;
          }
          try {
            await assemblyai.run();
            assemblyai.transcribe((transcript) => {
              console.log(`Transcription for ${socket.user.email}:`, transcript);
              // Emit transcription to all users in the session
              // (sessionId is the room this socket joined; join handling not shown in this excerpt)
              socket.to(sessionId).emit("transcription", transcript);
            });
            console.log('Transcription started successfully');
          } catch (error) {
            console.error('Error starting transcription:', error);
          }
        }
      });

      socket.on('audio-chunk', async (audioBlob) => {
        const assemblyai = this.userTranscribers.get(socket.id);
        if (assemblyai) {
          try {
            assemblyai.transcriber.sendAudio(Buffer.from(audioBlob));
          } catch (error) {
            console.error('Error processing audio chunk:', error);
          }
        }
      });

      socket.on("disconnect", async () => {
        // Clean up transcription when user disconnects
        const assemblyai = this.userTranscribers.get(socket.id);
        if (assemblyai) {
          await assemblyai.safeClose();
          this.userTranscribers.delete(socket.id);
        }
      });
    });
  }
}
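For context, here is a minimal sketch of how such a manager could be wired into a Socket.IO server. The repository's actual server entry point is not shown in this post, so the import path, export name, and CORS settings below are assumptions:

import { createServer } from "http";
import { Server } from "socket.io";
import WebSocketManager from "./config/websocket.js"; // export name assumed

const websocketManager = new WebSocketManager();

const httpServer = createServer();
const io = new Server(httpServer, {
  cors: { origin: "*" }, // restrict this in production
});

// Register the connection, transcription and audio-chunk handlers shown above
await websocketManager.connect(io);

httpServer.listen(3000, () => console.log("Socket server listening on :3000"));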
3. Audio Processing (public/audio-processor.js)
Web Audio API worklet for processing audio in real-time:
const MAX_16BIT_INT = 32767

class AudioProcessor extends AudioWorkletProcessor {
  process(inputs) {
    try {
      const input = inputs[0]
      if (!input) throw new Error('No input')

      const channelData = input[0]
      if (!channelData) throw new Error('No channelData')

      // Convert Float32 audio data to Int16 for AssemblyAI
      const float32Array = Float32Array.from(channelData)
      const int16Array = Int16Array.from(
        float32Array.map((n) => n * MAX_16BIT_INT)
      )
      const buffer = int16Array.buffer

      // Send processed audio to main thread
      this.port.postMessage({ audio_data: buffer })
      return true
    } catch (error) {
      console.error(error)
      return false
    }
  }
}

registerProcessor('audio-processor', AudioProcessor)
4. Frontend Integration (app/routes/video-call.$sessionId.tsx)
The React component that handles the UI and audio processing:
export default function VideoCall() {
  // localStreamRef, audioContextRef and socketRef are created elsewhere in the component
  // (not shown in this excerpt)
  const audioWorkletNodeRef = useRef<AudioWorkletNode | null>(null);
  const audioBufferQueueRef = useRef<Int16Array>(new Int16Array(0));
  const [transcripts, setTranscripts] = useState<Array<{
    id: number;
    text: string;
    timestamp: Date;
    speaker: string;
  }>>([]);
  const [currentTranscript, setCurrentTranscript] = useState("");

  // Setup audio processor for real-time transcription
  const setupAudioProcessor = async () => {
    try {
      if (!localStreamRef.current) return;

      // Create audio context with 16kHz sample rate (required by AssemblyAI)
      audioContextRef.current = new AudioContext({
        sampleRate: 16000,
        latencyHint: "balanced",
      });

      // Load audio processor worklet
      await audioContextRef.current.audioWorklet.addModule(
        "/audio-processor.js"
      );

      // Create audio worklet node
      audioWorkletNodeRef.current = new AudioWorkletNode(
        audioContextRef.current,
        "audio-processor"
      );

      // Handle processed audio data
      audioWorkletNodeRef.current.port.onmessage = (event) => {
        const { audio_data } = event.data;

        // Merge with previous buffer
        const newBuffer = new Int16Array(audio_data);
        audioBufferQueueRef.current = mergeBuffers(
          audioBufferQueueRef.current,
          newBuffer
        );

        // Send audio chunks when buffer reaches sufficient size
        const CHUNK_SIZE = 1600; // 100ms at 16kHz
        while (audioBufferQueueRef.current.length >= CHUNK_SIZE) {
          const chunk = audioBufferQueueRef.current.slice(0, CHUNK_SIZE);
          audioBufferQueueRef.current = audioBufferQueueRef.current.slice(CHUNK_SIZE);

          // Send to server via WebSocket
          socketRef.current?.emit('audio-chunk', chunk.buffer);
        }
      };

      // Connect audio source to processor
      const source = audioContextRef.current.createMediaStreamSource(
        localStreamRef.current
      );
      source.connect(audioWorkletNodeRef.current);
      audioWorkletNodeRef.current.connect(audioContextRef.current.destination);

      // Start transcription
      socketRef.current?.emit("start-transcription");
      console.log("Audio processor setup completed");
    } catch (error) {
      console.error("Error setting up audio processor:", error);
    }
  };

  // Handle incoming transcriptions
  useEffect(() => {
    if (socketRef.current) {
      socketRef.current.on("transcription", (transcript: string) => {
        console.log("Received transcription:", transcript);

        // Update current live transcript
        setCurrentTranscript(transcript);

        // Add to transcript history if it's a complete sentence
        if (transcript.trim().endsWith('.') ||
            transcript.trim().endsWith('?') ||
            transcript.trim().endsWith('!')) {
          setTranscripts(prev => [...prev, {
            id: Date.now(),
            text: transcript,
            timestamp: new Date(),
            speaker: "Speaker" // Could be enhanced to identify speakers
          }]);
          setCurrentTranscript(""); // Clear current transcript
        }
      });
    }
  }, []);

  function mergeBuffers(lhs: Int16Array, rhs: Int16Array) {
    const merged = new Int16Array(lhs.length + rhs.length);
    merged.set(lhs, 0);
    merged.set(rhs, lhs.length);
    return merged;
  }

  // ... JSX for the call UI and transcript panel omitted from this excerpt ...
}
Data Flow
- Audio Capture: The user's microphone audio is captured via getUserMedia()
- Audio Processing: Raw audio is processed through Web Audio API worklet
- Format Conversion: Float32 audio is converted to Int16 format at 16kHz sample rate
- Chunking: Audio is buffered and sent in chunks via WebSocket
- Server Processing: Node.js server receives audio chunks and forwards to AssemblyAI
- Transcription: AssemblyAI processes audio and returns transcripts
- Broadcasting: Transcripts are broadcast to all participants in the session
- UI Update: Frontend displays live and completed transcripts (a condensed client-side sketch of this loop follows below)
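To make the pipeline concrete, here is a condensed client-side sketch of the same loop, simplified from the React component above. The server URL is a placeholder and the helper name sendChunk is illustrative, not part of the repository:

import { io } from "socket.io-client";

const socket = io("http://localhost:3000"); // placeholder URL

// Ask the server to open an AssemblyAI streaming session for this socket
socket.emit("start-transcription");

// Receive live transcripts broadcast to the session
socket.on("transcription", (text) => console.log("Transcript:", text));

// Stream 100 ms chunks of 16 kHz Int16 PCM produced by the audio worklet
function sendChunk(chunk) {
  socket.emit("audio-chunk", chunk.buffer);
}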
Key Features
Real-time Transcription
- Live Updates: Transcripts appear as users speak
- Turn-based: Uses AssemblyAI's formatTurns: true for better sentence structure
- Low Latency: Optimized audio processing for minimal delay
Multi-user Support
- Isolated Sessions: Each user gets their own AssemblyAI transcriber instance
- Concurrent Processing: Multiple users can speak simultaneously
- Session Management: Proper cleanup when users disconnect
Audio Optimization
- 16kHz Sample Rate: Optimized for speech recognition
- Chunk-based Processing: Efficient real-time streaming (see the chunk-size arithmetic below)
- Buffer Management: Prevents audio loss during processing
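The 1600-sample chunk size used in the frontend follows directly from the 16 kHz sample rate; the snippet below is just an illustration of that arithmetic:

const SAMPLE_RATE = 16_000; // samples per second expected by AssemblyAI streaming
const CHUNK_MS = 100;       // target chunk duration in milliseconds

const CHUNK_SAMPLES = (SAMPLE_RATE * CHUNK_MS) / 1000; // 1600 samples
const CHUNK_BYTES = CHUNK_SAMPLES * 2;                 // Int16 = 2 bytes per sample -> 3200 bytes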
Configuration
Environment Variables
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
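On the server, the key only needs to be present in process.env before the AssemblyAI client is constructed. A minimal sketch, assuming the dotenv package is used (the repository may load environment variables differently):

import "dotenv/config"; // loads .env into process.env

if (!process.env.ASSEMBLYAI_API_KEY) {
  throw new Error("ASSEMBLYAI_API_KEY is not set");
}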
AssemblyAI Settings
this.transcriber = this.client.streaming.transcriber({
  sampleRate: 16_000, // 16kHz for optimal speech recognition
  formatTurns: true   // Better sentence formatting
});
Error Handling
The integration includes comprehensive error handling:
- Connection Management: Prevents duplicate connections
- Graceful Cleanup: Proper resource disposal on disconnect
- Error Recovery: Automatic reconnection attempts (a simple retry sketch is shown below)
- State Tracking: Connection status monitoring
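As an illustration of the error-recovery idea, a retry wrapper around run() could look roughly like the sketch below. startTranscriptionWithRetry is a hypothetical helper, not a function in the repository:

// Hypothetical helper: retry opening the streaming session a few times before giving up
async function startTranscriptionWithRetry(config, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await config.run();
    // give the "open" event a moment to flip isConnected
    await new Promise((resolve) => setTimeout(resolve, 500));
    if (config.isConnected) return true;
    console.warn(`Transcription connect attempt ${attempt} failed, retrying...`);
  }
  return false;
}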
Usage in Video Calls
- Start Call: User joins video session
- Enable Transcription: Audio processor automatically starts
- Live Transcripts: Real-time transcripts appear in the UI
- Session History: Completed transcripts are stored during the session
- End Call: Resources are cleaned up when the call ends (a minimal cleanup sketch follows below)
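When the call ends, the client-side resources need to be released as well. A minimal cleanup sketch, using the ref names from the component excerpt above (the actual end-call handler in the repository may differ):

const endCall = () => {
  // Stop the audio pipeline feeding the transcription
  audioWorkletNodeRef.current?.disconnect();
  audioContextRef.current?.close();

  // Stop camera and microphone tracks
  localStreamRef.current?.getTracks().forEach((track) => track.stop());

  // Disconnecting the socket triggers the server's "disconnect" handler,
  // which closes this user's AssemblyAI transcriber via safeClose()
  socketRef.current?.disconnect();
};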
This integration provides a seamless real-time transcription experience that enhances accessibility and documentation for support sessions.
🔐 Tech Stack
Frontend: React + TailwindCSS
Video Calls: Socket.io and Simple Peer JS
Voice Streaming: AssemblyAI + Mic stream
Backend: Node.js + WebSocket + Mongoose
AI/NLP: AssemblyAI + Gemini