This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
I built a Voice Appointment Scheduler, a business automation voice agent that books appointments through natural voice commands. It addresses the Business Automation Voice Agent prompt by automating a process that businesses handle every day.
The agent handles real-world scenarios like:
- "Schedule appointment with Dr. Nidal tomorrow at 3 PM"
- "Book meeting with Lubaba Radwan next Monday at 2 o'clock"
- "List my appointments"
- "Cancel my appointment"
It's well suited to medical offices, service businesses, sales teams, and support centers that need efficient appointment management without manual data entry.
Demo
Live Demo: https://lubabazwadi2.github.io/VoiceChallenge/
Key Features in Action:
- Responsive voice recognition built on AssemblyAI's streaming speech-to-text (advertised latency of about 300 ms)
- Intelligent appointment parsing from natural speech
- Real-time visual feedback and voice confirmations
- Professional business terminology recognition (Dr., Eng., appointment times, etc.)
GitHub Repository
Voice Appointment Scheduler - AssemblyAI Challenge
A simple but functional voice agent for scheduling business appointments using AssemblyAI's Universal-Streaming technology.
🎯 Challenge Category
Business Automation Voice Agent - Automates appointment scheduling for businesses with voice commands.
✨ Features
- Real-time voice recognition using browser Speech API + AssemblyAI integration
- Natural language processing for appointment extraction
- Voice feedback with text-to-speech responses
- Appointment management (schedule, list, cancel)
- Business terminology recognition (Dr., appointment times, etc.)
- Ultra-low latency design for responsive interactions
How It Works
- User clicks microphone button to start voice input
- AssemblyAI Universal-Streaming processes audio in real-time (300ms latency)
- Voice commands are parsed for appointment details (who, when)
- System schedules appointment and provides voice confirmation
- All appointments are displayed in real-time
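The flow above routes each final transcript to one of the three actions the app supports: schedule, list, and cancel. Below is a minimal sketch of that routing. It assumes simple keyword matching and an in-memory list. `classifyCommand` and `AppointmentBook` are hypothetical names, not the project's actual code.

```javascript
// Hypothetical command router: keyword matching is an illustrative assumption.
function classifyCommand(transcript) {
  const text = transcript.toLowerCase();
  // Check "cancel" first so "cancel the meeting I booked" is not read as a booking.
  if (/\b(cancel|delete|remove)\b/.test(text)) return 'cancel';
  if (/\b(list|show|what are)\b/.test(text)) return 'list';
  if (/\b(schedule|book|set up)\b/.test(text)) return 'schedule';
  return 'unknown';
}

// Minimal in-memory store standing in for real persistence.
class AppointmentBook {
  constructor() {
    this.appointments = [];
    this.nextId = 1;
  }

  schedule(name, time) {
    const appointment = { id: this.nextId++, name, time };
    this.appointments.push(appointment);
    return appointment;
  }

  list() {
    return [...this.appointments]; // copy so callers cannot mutate the store
  }

  // With no id, cancels the most recent booking: one plausible reading of
  // "Cancel my appointment". Returns the removed appointment or null.
  cancel(id) {
    const index = id === undefined
      ? this.appointments.length - 1
      : this.appointments.findIndex((a) => a.id === id);
    if (index < 0) return null;
    return this.appointments.splice(index, 1)[0];
  }
}
```

A real deployment would ask which appointment to cancel when several exist, rather than assuming the latest one.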
💼 Business Use Cases
- Medical offices: Schedule patient appointments
- Service businesses: Book consultations and services
- Sales teams: Schedule follow-up calls
- Support centers: Book callback appointments
Setup
…
Technical Implementation & AssemblyAI Integration
AssemblyAI Universal-Streaming Integration
The core of this voice agent leverages AssemblyAI's Universal-Streaming API for ultra-low latency transcription:
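```javascript
class AssemblyAIStreaming {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.socket = null;
  }

  async startStreaming() {
    // Exchange the API key for a short-lived streaming token.
    // In production, request this token from your own backend so the
    // API key is never shipped to the browser.
    const tokenResponse = await fetch('https://api.assemblyai.com/v2/realtime/token', {
      method: 'POST',
      headers: {
        'authorization': this.apiKey,
        'content-type': 'application/json'
      },
      body: JSON.stringify({ expires_in: 3600 })
    });
    if (!tokenResponse.ok) {
      throw new Error(`Token request failed: ${tokenResponse.status}`);
    }
    const { token } = await tokenResponse.json();

    // WebSocket connection for real-time streaming of 16 kHz audio.
    // Note: this is AssemblyAI's v2 real-time endpoint; Universal-Streaming
    // is served by the newer v3 streaming API, which uses different URLs
    // and message types.
    this.socket = new WebSocket(
      `wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000&token=${encodeURIComponent(token)}`
    );
    this.socket.onmessage = (message) => {
      const res = JSON.parse(message.data);
      // Act only on finalized, non-empty text; PartialTranscript messages are interim.
      if (res.message_type === 'FinalTranscript' && res.text) {
        this.processTranscript(res.text);
      }
    };
    this.socket.onerror = (error) => console.error('AssemblyAI socket error:', error);
  }

  // Hook for the appointment parser; replace with the app's handler.
  processTranscript(text) {
    console.log('Final transcript:', text);
  }
}
```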
Real-Time Voice Processing Pipeline
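- Audio Capture: The browser captures the user's voice. This design uses MediaRecorder, but AssemblyAI's streaming endpoint expects raw 16-bit PCM at the declared sample rate, so MediaRecorder's compressed chunks must be converted before sending.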
- Streaming: Audio chunks sent to AssemblyAI Universal-Streaming
- Transcription: 300ms latency transcription with intelligent endpointing
- NLP Processing: Custom appointment entity extraction
- Business Logic: Appointment validation and scheduling
- Voice Feedback: Text-to-speech confirmation
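MediaRecorder emits compressed audio chunks, while the WebSocket URL above declares raw audio with `sample_rate=16000`. Below is a minimal sketch of the conversion step. It assumes Float32 frames from the Web Audio API (for example, from an AudioWorklet) at the browser's native rate.

`downsample` and `floatTo16BitPCM` are hypothetical helpers. Window averaging is a deliberately simple resampling choice, not a proper low-pass filter.

```javascript
// Downsample Float32 samples (range -1..1) to the target rate by averaging
// each output window. No upsampling: if the input rate is already at or
// below the target, a copy is returned unchanged.
function downsample(input, inputRate, outputRate = 16000) {
  if (outputRate >= inputRate) return Float32Array.from(input);
  const ratio = inputRate / outputRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const start = Math.floor(i * ratio);
    const end = Math.min(Math.floor((i + 1) * ratio), input.length);
    let sum = 0;
    for (let j = start; j < end; j++) sum += input[j];
    output[i] = end > start ? sum / (end - start) : 0;
  }
  return output;
}

// Convert Float32 samples to signed 16-bit little-endian PCM.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}
```

Each returned `ArrayBuffer` is what would go over `this.socket`. Check AssemblyAI's streaming documentation to see whether your endpoint version expects raw binary frames or a JSON wrapper.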
Intelligent Appointment Parsing
```javascript
function extractAppointmentInfo(transcript) {
  // Leverages AssemblyAI's accuracy with business terminology
  const words = transcript.trim().split(/\s+/);
  const appointmentData = {
    name: null,
    time: null,
    type: 'Business Meeting'
  };

  // Extract names (Dr., Mr., Ms., business contacts).
  // Strip punctuation first so "Dr." matches the "dr" indicator.
  const nameIndicators = ['with', 'dr', 'doctor', 'mr', 'mrs', 'ms'];
  for (let i = 0; i < words.length - 1; i++) {
    const word = words[i].toLowerCase().replace(/[.,]/g, '');
    if (nameIndicators.includes(word)) {
      // extractBusinessName and extractBusinessTime are project helpers not shown in this post.
      appointmentData.name = extractBusinessName(words, i);
      break;
    }
  }

  // Extract time with business hour context
  appointmentData.time = extractBusinessTime(words);
  return appointmentData;
}
```
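The two helpers called above are not shown in the post. What follows is one possible implementation, offered as a hypothetical sketch:

- `extractBusinessName` collects an optional title plus the capitalized words after the indicator.
- `extractBusinessTime` uses regular expressions to find relative days and clock times.

Two assumptions are built in. A bare hour from 1 to 7 ("2 o'clock") is treated as PM, as a business-hours rule. The result is a readable phrase, not a resolved calendar date.

```javascript
const TITLES = new Map([['dr', 'Dr.'], ['doctor', 'Dr.'], ['mr', 'Mr.'], ['mrs', 'Mrs.'], ['ms', 'Ms.']]);
const STOP_WORDS = new Set(['today', 'tomorrow', 'next', 'this', 'at', 'on', 'for', 'in',
  'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']);

// Lowercase and drop punctuation, so "Dr." becomes "dr".
const clean = (w) => w.toLowerCase().replace(/[^a-z0-9:]/g, '');

function extractBusinessName(words, index) {
  // If the indicator is itself a title ("Dr."), start there; otherwise start after "with".
  let i = TITLES.has(clean(words[index])) ? index : index + 1;
  const parts = [];
  for (; i < words.length; i++) {
    const key = clean(words[i]);
    if (TITLES.has(key) && parts.length === 0) {
      parts.push(TITLES.get(key)); // normalize "doctor"/"dr" to "Dr."
      continue;
    }
    // Stop at date/time words or the first lowercase word.
    if (STOP_WORDS.has(key) || !/^[A-Z]/.test(words[i])) break;
    parts.push(words[i].replace(/[.,]$/, ''));
  }
  return parts.length ? parts.join(' ') : null;
}

const DAY_PATTERN = /\b(today|tomorrow|(?:next |this )?(?:monday|tuesday|wednesday|thursday|friday|saturday|sunday))\b/;
// "at" + hour, optional ":mm", optional suffix; (?!\d) rejects numbers like 2024.
const TIME_PATTERN = /\bat (\d{1,2})(?::(\d{2}))?(?!\d)\s*(am|pm|a\.m\.|p\.m\.|o'clock)?/;

function extractBusinessTime(words) {
  const text = words.join(' ').toLowerCase();
  const day = text.match(DAY_PATTERN);
  const time = text.match(TIME_PATTERN);
  if (!day && !time) return null;
  let result = day ? day[1] : '';
  if (time) {
    let hour = Number(time[1]);
    const minute = time[2] || '00';
    const suffix = (time[3] || '').replace(/\./g, '');
    if (suffix === 'pm' && hour < 12) hour += 12;
    if (suffix === 'am' && hour === 12) hour = 0;
    // Assumed business-hours rule: a bare 1 to 7 means afternoon.
    if ((suffix === '' || suffix === "o'clock") && hour >= 1 && hour <= 7) hour += 12;
    result += `${result ? ' at ' : 'at '}${String(hour).padStart(2, '0')}:${minute}`;
  }
  return result;
}
```

With these helpers, "Schedule appointment with Dr. Nidal tomorrow at 3 PM" yields the name `Dr. Nidal` and the time `tomorrow at 15:00`.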
AssemblyAI Features Utilized
- Ultra-Low Latency: 300ms response time critical for natural conversation flow
- Intelligent Endpointing: Knows when user finished speaking vs. pausing
- Business Terminology Recognition: Handles proper nouns, titles (Dr., CEO), company names
- Multi-step Workflow Support: Clean, turn-by-turn final transcripts let the app keep context across appointment booking steps (the context itself is tracked in application code)
- Professional Audio Quality: Works in office environments with background noise
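Speech-to-text returns transcripts. Tracking which booking details are still missing across turns is the application's job. Below is a minimal slot-filling sketch. It assumes a parser that returns `{ name, time }`, with `null` for anything it could not find (such as `extractAppointmentInfo` above). `BookingDialog` is a hypothetical name.

```javascript
// Hypothetical slot-filling dialog: remembers missing details across turns.
class BookingDialog {
  constructor(parse) {
    this.parse = parse; // function(transcript) -> { name, time }
    this.pending = { name: null, time: null };
  }

  // Feed one final transcript; returns the next prompt or a completed booking.
  handle(transcript) {
    const parsed = this.parse(transcript);
    // Keep earlier answers; only fill slots that are still empty.
    this.pending.name = this.pending.name || parsed.name;
    this.pending.time = this.pending.time || parsed.time;
    if (!this.pending.name) {
      return { done: false, prompt: 'Who is the appointment with?' };
    }
    if (!this.pending.time) {
      return { done: false, prompt: `When should I book ${this.pending.name}?` };
    }
    const booking = { ...this.pending };
    this.pending = { name: null, time: null }; // reset for the next booking
    return { done: true, booking };
  }
}
```

The returned `prompt` can be passed straight to text-to-speech, so the agent asks a follow-up question instead of failing the whole command.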
Performance Optimizations
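```javascript
// Web Speech API fallback: continuous streaming for a seamless experience
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.continuous = true;     // keep listening across pauses
recognition.interimResults = true; // deliver partial results while the user speaks

// Speak a confirmation with the browser's speech synthesis
function speak(text) {
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
}

// Real-time UI updates without blocking
function updateAppointmentUI(appointment) {
  // Batch DOM work into the next frame; renderAppointment is the app's render helper.
  requestAnimationFrame(() => renderAppointment(appointment));
  // Speech synthesis is already asynchronous, so it does not need to wait for a frame.
  speak(`Scheduled ${appointment.name} for ${appointment.time}`);
}
```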
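With `interimResults` enabled, each `onresult` event mixes partial and finalized text. The sketch below separates the two. Only final text reaches the appointment parser, while interim text drives live feedback.

`splitResults` and the commented-out helpers are hypothetical. The function relies only on the Web Speech API's `isFinal`, `transcript`, and `resultIndex` fields.

```javascript
// results: a SpeechRecognitionResultList (or any array-like of results);
// each result has an isFinal flag and alternatives whose [0] is the best guess.
// resultIndex: the first result that changed in this event.
function splitResults(results, resultIndex) {
  let interim = '';
  const finals = [];
  for (let i = resultIndex; i < results.length; i++) {
    const text = results[i][0].transcript.trim();
    if (results[i].isFinal) finals.push(text);
    else interim += (interim ? ' ' : '') + text;
  }
  return { interim, finals };
}

// Browser wiring (not run here):
// recognition.onresult = (event) => {
//   const { interim, finals } = splitResults(event.results, event.resultIndex);
//   showInterimText(interim);            // hypothetical UI helper
//   finals.forEach(handleFinalCommand);  // hypothetical command handler
// };
```

Starting at `resultIndex` avoids re-processing results that earlier events already handled.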
Business Integration Ready
The architecture supports real-world deployment needs:
- Calendar API Integration: Structured appointment objects map directly onto Google Calendar or Outlook events
- CRM Integration: Structured data format for Salesforce, HubSpot
- Database Persistence: Appointments are plain JSON objects that any database can store
- Multi-tenant Support: Easily extendable for multiple businesses
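As one illustration of the calendar point, here is a hypothetical mapping from an appointment object to the JSON body of a Google Calendar API v3 `events.insert` request. It makes three assumptions:

- `appointment.start` has already been resolved to a concrete `Date`.
- Appointments default to 30 minutes.
- The time zone defaults to UTC.

`toGoogleCalendarEvent` is an illustrative name.

```javascript
// Hypothetical mapper: scheduler appointment -> Google Calendar event resource.
function toGoogleCalendarEvent(appointment, { durationMinutes = 30, timeZone = 'UTC' } = {}) {
  const start = appointment.start; // assumed to be a Date
  const end = new Date(start.getTime() + durationMinutes * 60 * 1000);
  return {
    summary: `${appointment.type}: ${appointment.name}`,
    start: { dateTime: start.toISOString(), timeZone }, // RFC 3339 timestamp
    end: { dateTime: end.toISOString(), timeZone },
  };
}
```

The returned object would be sent as the JSON body of an authenticated `POST` to `https://www.googleapis.com/calendar/v3/calendars/primary/events`. OAuth setup is out of scope here.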
Why AssemblyAI Universal-Streaming?
This project showcases AssemblyAI's strengths in business automation:
- Speed: 300ms latency enables natural conversation flow
- Accuracy: Critical for capturing proper nouns and business terminology
- Reliability: Intelligent endpointing prevents missed commands
- Professional Grade: Handles real business communication patterns
The combination creates a voice agent that feels responsive and professional, which is essential in business environments where every appointment matters.
Developer's Journey & Honest Reflections
Full transparency: I discovered this competition just a few hours before the deadline! As someone who just joined the DEV community after hearing about this challenge, I was excited to try something completely new.
With the time constraint, I focused on:
- ✅ Choosing a solid idea that solves real business problems
- ✅ Building a functional application that demonstrates the concept
- ⏰ Getting something working rather than perfecting every detail
Current Limitations & Learning Experience
Voice Recognition Accuracy: The current implementation sometimes requires multiple attempts to detect commands properly. This is partly due to:
- Limited time to fully explore AssemblyAI's advanced features
- Using the browser's Web Speech API as a fallback for demo purposes
- Not having enough time to fine-tune the natural language processing
What I'd Improve With More Time:
- Deeper integration with AssemblyAI's Universal-Streaming WebSocket API
- Better command parsing and context understanding
- More robust error handling and user feedback
- Enhanced business terminology recognition
Why I Still Submitted
Even with these limitations, this project demonstrates:
- Real problem solving: Appointment scheduling is a genuine business need
- Technical foundation: Architecture ready for AssemblyAI integration
- Functional prototype: Actually works for basic appointment booking
- Growth mindset: Learning new technology under pressure
Sometimes the best learning happens when you jump in with both feet! This challenge pushed me to explore voice AI, join an amazing developer community, and build something functional in record time.
Built with ❤️ (and a late night) for the AssemblyAI Voice Agents Challenge: a testament to what's possible when you discover something cool just hours before the deadline!
Special thanks to the DEV community for being so welcoming to newcomers like me.