This is a submission for the AssemblyAI Voice Agents Challenge
💡 What I Built
A real-time, AI-powered academic listening coach designed to help you:
- Grasp any concept
- Articulate it in your own words
- Get real-time feedback from an AI mentor trained to respond like a domain-specific educator
Imagine a personalized Listening Toastmasters for academics, one that:
✅ Listens while you speak
✅ Transcribes in real time
✅ Analyzes your response
✅ Gives constructive feedback
✅ Grades your clarity, tone, and structure, not you 😉
Perfect for viva prep, thesis defense, placement interviews, getting a firmer grasp of a concept or topic, or simply explaining tough concepts out loud.
✨ Why I Built It
I've always craved a mentor who could truly adapt to me:
One who listens without judgment.
Who cares how I speak, not just what I say.
Who waits when I pause, and helps me find the words when I blank out.
As a student juggling placements, exams, hackathons, and life, I often find myself:
- Mumbling under pressure
- Rambling mid-response
- Or going completely blank during interviews
So, I built this for that version of me.
The nervous student. The silent developer.
The person who knows the answer, but just can't say it clearly.
This is more than a tool.
It's a gentle, nerdy best friend in your laptop, reminding you:
"You've got this. Just speak, I'll help you shape it."
Oh, and if you're overusing filler words?
It'll lovingly 😏 roast you:
"Bestie, you just said 'umm' 27 times. Let's fix that."
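A filler-word roast like that only needs a simple counter over the transcript. Here is a minimal sketch; the filler list, the threshold, and the function names are my own illustrative choices, not the app's actual logic:

```python
import re
from collections import Counter

# Hypothetical filler list, for illustration only; the app's real
# detection rules may differ.
FILLERS = {"um", "umm", "uh", "er", "like"}

def count_fillers(transcript: str) -> Counter:
    """Count filler words in a transcript, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(w for w in words if w in FILLERS)

def roast(transcript: str, threshold: int = 10) -> str:
    """Return a playful nudge when fillers pile up past the threshold."""
    counts = count_fillers(transcript)
    if sum(counts.values()) >= threshold:
        worst, n = counts.most_common(1)[0]
        return f"Bestie, you just said '{worst}' {n} times. Let's fix that."
    return "Clean delivery! Barely any fillers."
```
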
🛠️ Features

1️⃣ 🎙️ Mic On, Brain On
Live voice input straight from your browser (no app install needed!)

2️⃣ ✍️ Real-Time Whispering
Instant speech-to-text via ⚡ AssemblyAI's Streaming API

3️⃣ 📊 Instant Report Card
Get scored out of 10 on key communication metrics:
- 🗣️ Fluency
- 🧩 Coherence
- 🔁 Redundancy
- 🧠 Technical Depth
- 💪 Confidence Markers
Delivered in real time: your growth, visualized.
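The report card above could be bundled like this. The metric names mirror the list, but the actual per-metric scores come from the AI evaluator, so this only sketches how they might be aggregated:

```python
from statistics import mean

# Metric names mirror the report card above; aggregating by plain
# average is my own assumption, not the app's real formula.
METRICS = ("fluency", "coherence", "redundancy", "technical_depth", "confidence")

def build_report(scores: dict) -> dict:
    """Bundle per-metric scores (out of 10) into a single report card."""
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise ValueError(f"missing metrics: {missing}")
    return {
        "scores": dict(scores),
        "overall": round(mean(scores[m] for m in METRICS), 1),
    }
```
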
4️⃣ 🎓 AI Educator Mode
Your speech gets evaluated as if you're explaining to a domain expert (powered by Groq)

5️⃣ 🔁 Retry Until It's Right
Stumble? Speak again. Smarter each time.

6️⃣ 🎯 Focused Solo Practice
A quiet dojo to train your mind-mouth connection 🧘

7️⃣ 🧪 Built for the Serious Learners
Ideal for:
- 🧬 Viva / thesis prep
- 🧑‍💻 Tech interviews
- 🎓 Academic presentations
- 🗣️ Fluency drills
- ✨ Better grasping any topic

8️⃣ 💻 Minimal UI, Max Results
No distractions. Just you, your thoughts, and your growth 🔥
🎬 Demo

Here's my live project: Grasp Articulate Refine

⚠️ Works best in Chrome. Firefox sulks. Brave is brave. Safari is... shy.

You can check me showcasing my project here:

GitHub Repository
👇👇

🧠✨🎯 Grasp Articulate Refine

Your smart study coach, powered by AI, designed to help you truly understand what you learn, speak it with confidence, and get thoughtful feedback so you grow smarter, faster.

My project at a glance:

Check it out here live: Grasp Articulate Refine
✨ Features

- Adaptive Content Generation: Creates 2000-3000 word educational content tailored to your academic level
- Voice-Based Assessment: Uses AssemblyAI for speech-to-text transcription
- AI-Powered Analysis: Acts as a globally renowned educator providing detailed feedback
- Intelligent Grading: Grades responses out of 10 with detailed explanations
- Progress Tracking: Students must score 9+ to advance to the next topic
- Celebration System: 3-second emoji overlay for excellent performance (🥳🎉🎊)
- Mobile Responsive: Darker blue theme with high-contrast design
- Real References: Provides working, relevant reference links for the explanation provided
- Multiple Academic Levels: High School, Undergraduate, Graduate, Professional
- Custom Subject…
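The progress-tracking rule (score 9+ to advance) boils down to a tiny gate. A sketch of that rule, with function names of my own invention:

```python
PASS_MARK = 9  # from the feature list: students must score 9+ to advance

def can_advance(grade: float, pass_mark: float = PASS_MARK) -> bool:
    """Progress-tracking gate; assumes grades are out of 10."""
    return grade >= pass_mark

def next_topic_index(current: int, grade: float) -> int:
    # Move to the next topic only on a passing grade; otherwise retry.
    return current + 1 if can_advance(grade) else current
```
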
You can check out my repo above if you're more of a code person, or if you want to analyse my code 🤓, get inspiration, fork it, clone it, and run it locally on your device.
Technical Implementation & AssemblyAI Integration
Here are the code snippets demonstrating the technical implementation and the AssemblyAI integration in this project:
🎯 1. AssemblyAI Initialization & Configuration

```python
# utils/voice_manager.py - AssemblyAI Setup
from typing import Dict

try:
    import assemblyai as aai
    ASSEMBLYAI_AVAILABLE = True
except ImportError:  # SDK may not be installed
    ASSEMBLYAI_AVAILABLE = False


class VoiceManager:
    def __init__(self, api_keys: Dict[str, str]):
        self.api_keys = api_keys
        self.assemblyai_available = False
        self._init_assemblyai()

    def _init_assemblyai(self):
        if ASSEMBLYAI_AVAILABLE and self.api_keys.get('ASSEMBLYAI_API_KEY'):
            try:
                aai.settings.api_key = self.api_keys['ASSEMBLYAI_API_KEY']
                # Build a config once to verify the SDK accepts these options
                test_config = aai.TranscriptionConfig(
                    language_detection=True,
                    punctuate=True,
                    format_text=True,
                    speaker_labels=False,
                    auto_highlights=False
                )
                self.assemblyai_available = True
                print("✅ AssemblyAI initialized successfully")
            except Exception as e:
                print(f"❌ AssemblyAI initialization failed: {e}")
                self.assemblyai_available = False
```
Sets up the AssemblyAI SDK with the API key and configures transcription settings, including language detection, punctuation, and text formatting. Initializes the VoiceManager class with enhanced features optimized for educational content transcription.
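The constructor expects a plain dict of keys. One way that dict could be assembled is from environment variables; the `ASSEMBLYAI_API_KEY` name matches the snippets, but env-var loading (and the `GROQ_API_KEY` name) is my assumption about the app's configuration:

```python
import os

# Hypothetical helper: builds the api_keys dict the VoiceManager
# constructor expects. Env-var loading is assumed, not app code.
def load_api_keys() -> dict:
    return {
        "ASSEMBLYAI_API_KEY": os.environ.get("ASSEMBLYAI_API_KEY", ""),
        "GROQ_API_KEY": os.environ.get("GROQ_API_KEY", ""),
    }
```
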
🎤 2. Core Audio Transcription Implementation

```python
# utils/voice_manager.py - core transcription (method of VoiceManager)
def transcribe_audio(self, audio_file_path: str) -> str:
    if not os.path.exists(audio_file_path):
        return "❌ Audio file not found"

    # Primary method: AssemblyAI SDK
    if self.assemblyai_available:
        try:
            print("🔄 Trying AssemblyAI SDK...")
            config = aai.TranscriptionConfig(
                language_detection=True,
                punctuate=True,
                format_text=True,
                speaker_labels=False,
                auto_highlights=False
            )
            transcriber = aai.Transcriber(config=config)
            transcript = transcriber.transcribe(audio_file_path)
            if transcript.status == "completed":
                print("✅ AssemblyAI SDK transcription successful")
                return self._clean_transcription(transcript.text)
            elif transcript.status == "error":
                print(f"❌ AssemblyAI SDK error: {transcript.error}")
                return f"❌ Transcription error: {transcript.error}"
        except Exception as e:
            print(f"❌ AssemblyAI SDK error: {e}")

    # Fallback method: direct API
    if self.api_keys.get('ASSEMBLYAI_API_KEY'):
        try:
            print("🔄 Trying AssemblyAI Direct API...")
            result = self._transcribe_with_api(audio_file_path)
            if result and not result.startswith("❌"):
                print("✅ AssemblyAI API transcription successful")
                return self._clean_transcription(result)
        except Exception as e:
            print(f"❌ AssemblyAI API error: {e}")

    return "❌ Transcription failed. Please check API configuration."
```
The main transcription function uses a dual-mode approach: the AssemblyAI SDK is the primary method, with a fallback to the direct API. It handles audio file validation, processes transcription with the enhanced configuration, and includes comprehensive error handling for reliable speech-to-text conversion.
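The SDK-first, direct-API-second pattern generalizes to trying callables in priority order and keeping the first usable result. A minimal sketch of that shape (the method names are illustrative, not the app's real callables):

```python
# Generic sketch of the fallback pattern above, not the app's code.
def try_in_order(methods) -> str:
    """Run (name, callable) transcription methods in priority order,
    returning the first non-empty result."""
    for name, method in methods:
        try:
            result = method()
            if result:
                return result
        except Exception as e:
            print(f"{name} failed: {e}")  # fall through to the next method
    return "Transcription failed. Please check API configuration."
```
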
🔧 3. Direct API Implementation with Enhanced Features

```python
# utils/voice_manager.py - Direct API Implementation
import time

import requests


def _transcribe_with_api(self, audio_file_path: str) -> str:
    """Direct AssemblyAI API implementation with robust error handling."""
    try:
        headers = {'authorization': self.api_keys['ASSEMBLYAI_API_KEY']}

        # Step 1: upload the audio file
        print("📤 Uploading audio file...")
        with open(audio_file_path, 'rb') as f:
            response = requests.post(
                'https://api.assemblyai.com/v2/upload',
                headers=headers,
                files={'file': f},
                timeout=60
            )
        if response.status_code != 200:
            return f"❌ Upload failed: {response.status_code} - {response.text}"
        upload_url = response.json()['upload_url']
        print(f"✅ File uploaded: {upload_url}")

        # Step 2: request the transcription
        print("🔄 Requesting transcription...")
        data = {
            'audio_url': upload_url,
            'language_detection': True,
            'punctuate': True,
            'format_text': True,
            'speaker_labels': False,
            'auto_highlights': False
        }
        response = requests.post(
            'https://api.assemblyai.com/v2/transcript',
            headers=headers,
            json=data,
            timeout=30
        )
        if response.status_code != 200:
            return f"❌ Transcription request failed: {response.status_code}"
        transcript_id = response.json()['id']
        print(f"🆔 Transcription ID: {transcript_id}")

        # Step 3: poll until the transcript is ready
        print("⏳ Waiting for transcription to complete...")
        max_attempts = 60  # 2-minute timeout at a 2-second polling interval
        attempt = 0
        while attempt < max_attempts:
            response = requests.get(
                f'https://api.assemblyai.com/v2/transcript/{transcript_id}',
                headers=headers,
                timeout=30
            )
            if response.status_code != 200:
                return f"❌ Status check failed: {response.status_code}"
            result = response.json()
            status = result['status']
            if status == 'completed':
                print("✅ Transcription completed")
                return result['text'] or "❌ No text in transcription result"
            elif status == 'error':
                error_msg = result.get('error', 'Unknown error')
                return f"❌ Transcription error: {error_msg}"
            elif status in ['queued', 'processing']:
                print(f"⏳ Status: {status} (attempt {attempt + 1}/{max_attempts})")
                time.sleep(2)  # 2-second polling interval
                attempt += 1
            else:
                return f"❌ Unknown status: {status}"
        return "❌ Transcription timeout - took too long to process"
    except requests.exceptions.Timeout:
        return "❌ Request timeout - please try again"
    except Exception as e:
        return f"❌ Unexpected error: {str(e)}"
```
Implements direct AssemblyAI API calls as the fallback method. Handles file upload, the transcription request with enhanced features, and intelligent polling with a 2-minute timeout. Provides robust error handling for network issues and API failures.
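The polling loop boils down to a reusable pattern: call a status function until it reports `completed` or `error`, with the same 2-second by 60-attempt budget. A generic sketch (the function names here are mine, not the app's):

```python
import time

# Generic sketch of the polling loop above; get_status is any callable
# returning a (status, payload) pair, standing in for the API call.
def poll_until_done(get_status, interval: float = 2.0, max_attempts: int = 60):
    for _ in range(max_attempts):
        status, payload = get_status()
        if status == "completed":
            return payload
        if status == "error":
            raise RuntimeError(payload or "unknown error")
        time.sleep(interval)  # 'queued' or 'processing': wait and retry
    raise TimeoutError("transcription took too long to process")
```
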
🧹 4. Advanced Text Processing & Cleaning

```python
# utils/voice_manager.py - Text Processing
import os
import re
from typing import Any, Dict


def _clean_transcription(self, text: str) -> str:
    if not text:
        return "❌ Empty transcription result"
    text = text.strip()
    text = re.sub(r'\s+', ' ', text)  # collapse repeated whitespace
    # Capitalize the first letter after sentence-ending punctuation
    text = re.sub(r'([.!?])\s*([a-z])',
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)
    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]
    if text and text[-1] not in '.!?':
        text += '.'
    return text


def validate_audio_file(self, file_path: str) -> Dict[str, Any]:
    if not os.path.exists(file_path):
        return {
            'valid': False,
            'error': 'File does not exist',
            'file_size': 0
        }
    file_size = os.path.getsize(file_path)
    max_size = 100 * 1024 * 1024  # 100MB limit
    if file_size > max_size:
        return {
            'valid': False,
            'error': f'File too large: {file_size / (1024*1024):.1f}MB (max 100MB)',
            'file_size': file_size
        }
    if file_size < 1000:  # minimum 1KB
        return {
            'valid': False,
            'error': 'File too small - may be empty or corrupted',
            'file_size': file_size
        }
    return {
        'valid': True,
        'error': None,
        'file_size': file_size,
        'file_size_mb': file_size / (1024 * 1024)
    }
```
Post-processes transcription results with text cleaning, whitespace normalization, sentence-capitalization fixes, and proper punctuation. Includes audio file validation that checks size limits and file integrity for optimal transcription quality.
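To see the cleaner's effect, here is the same logic as a standalone function with a sample input:

```python
import re

# Standalone copy of the cleaning steps above, so the effect is easy to see.
def clean_transcription(text: str) -> str:
    text = re.sub(r'\s+', ' ', text.strip())          # collapse whitespace
    text = re.sub(r'([.!?])\s*([a-z])',               # capitalize sentence starts
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)
    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]
    if text and text[-1] not in '.!?':
        text += '.'
    return text

# e.g. "  so   recursion is. a function calling itself"
#   -> "So recursion is. A function calling itself."
```
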
🚀 5. Flask Integration & API Endpoints

```python
# app.py - Flask Integration
@app.route('/transcribe_audio', methods=['POST'])
def transcribe_audio():
    try:
        audio_file = request.files.get('audio')
        if not audio_file:
            return jsonify({'error': 'No audio file provided'}), 400

        # Save the upload to a session-scoped temp file
        session_id = session.get('session_id', 'unknown')
        temp_path = f"temp/audio_{session_id}.wav"
        os.makedirs('temp', exist_ok=True)
        audio_file.save(temp_path)
        print(f"🎤 Starting transcription of {temp_path}")

        # Validate before spending an API call
        validation = voice_manager.validate_audio_file(temp_path)
        if not validation['valid']:
            if os.path.exists(temp_path):
                os.remove(temp_path)
            return jsonify({'error': validation['error']}), 400

        transcription = voice_manager.transcribe_audio(temp_path)

        # Clean up the temp file regardless of outcome
        if os.path.exists(temp_path):
            os.remove(temp_path)
        print(f"✅ Transcription result: {transcription[:100]}...")
        return jsonify({
            'success': True,
            'transcription': transcription,
            'file_size_mb': validation.get('file_size_mb', 0)
        })
    except Exception as e:
        print(f"❌ Transcription error: {e}")
        return jsonify({'error': f'Transcription failed: {str(e)}'}), 500


@app.route('/voice_status')
def voice_status():
    return jsonify(voice_manager.get_voice_status())
```
Flask endpoints for audio transcription with session-based temporary file handling. Includes comprehensive error handling, file validation, and cleanup, plus a voice-status endpoint for real-time feature availability monitoring and diagnostics.
📊 6. Status Monitoring & Diagnostics

```python
# utils/voice_manager.py - Status Monitoring
def get_voice_status(self) -> Dict[str, bool]:
    return {
        'assemblyai_available': self.assemblyai_available,
        'voice_recording_available': self.assemblyai_available,
        'transcription_available': self.assemblyai_available,
        'api_key_configured': bool(self.api_keys.get('ASSEMBLYAI_API_KEY')),
        'sdk_available': ASSEMBLYAI_AVAILABLE,
        'direct_api_available': bool(self.api_keys.get('ASSEMBLYAI_API_KEY'))
    }


def _print_status(self):
    print("\n🎤 VOICE FEATURES STATUS:")
    print(f"   AssemblyAI Available: {'✅' if self.assemblyai_available else '❌'}")
    print(f"   Voice Recording: {'✅' if self.assemblyai_available else '❌'}")
    print(f"   Audio Transcription: {'✅' if self.assemblyai_available else '❌'}")
    print(f"   API Key Configured: {'✅' if self.api_keys.get('ASSEMBLYAI_API_KEY') else '❌'}")
    print()
```
A comprehensive system for monitoring AssemblyAI feature availability, including SDK status, API key configuration, and transcription capabilities. Provides detailed status reporting for troubleshooting and system health monitoring.
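A consumer of that status dict usually just wants one boolean. Here is one way to collapse it into a readiness flag; which keys must be true is my own reading of the snippet, not app code:

```python
# Hypothetical helper: collapse the status dict above into one flag.
def voice_ready(status: dict) -> bool:
    """Transcription is usable when a key is configured and at least
    one transport (SDK or direct API) is available."""
    return status.get("api_key_configured", False) and (
        status.get("sdk_available", False)
        or status.get("direct_api_available", False)
    )
```
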
🔒 7. HTTPS Configuration for Microphone Access

```python
def create_self_signed_cert():
    try:
        import datetime
        import ipaddress

        from cryptography import x509
        from cryptography.x509.oid import NameOID
        from cryptography.hazmat.primitives import hashes, serialization
        from cryptography.hazmat.primitives.asymmetric import rsa

        private_key = rsa.generate_private_key(
            public_exponent=65537,
            key_size=2048,
        )
        subject = issuer = x509.Name([
            x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
            x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "Local"),
            x509.NameAttribute(NameOID.LOCALITY_NAME, "Local"),
            x509.NameAttribute(NameOID.ORGANIZATION_NAME, "AI Learning Platform"),
            x509.NameAttribute(NameOID.COMMON_NAME, "localhost"),
        ])
        cert = (
            x509.CertificateBuilder()
            .subject_name(subject)
            .issuer_name(issuer)
            .public_key(private_key.public_key())
            .serial_number(x509.random_serial_number())
            .not_valid_before(datetime.datetime.utcnow())
            .not_valid_after(datetime.datetime.utcnow() + datetime.timedelta(days=365))
            .add_extension(
                x509.SubjectAlternativeName([
                    x509.DNSName("localhost"),
                    x509.DNSName("127.0.0.1"),
                    x509.IPAddress(ipaddress.IPv4Address("127.0.0.1")),
                ]),
                critical=False,
            )
            .sign(private_key, hashes.SHA256())
        )

        # Write certificate and key to disk
        with open("cert.pem", "wb") as f:
            f.write(cert.public_bytes(serialization.Encoding.PEM))
        with open("key.pem", "wb") as f:
            f.write(private_key.private_bytes(
                encoding=serialization.Encoding.PEM,
                format=serialization.PrivateFormat.PKCS8,
                encryption_algorithm=serialization.NoEncryption()
            ))
        print("✅ Self-signed certificate created")
        return True
    except Exception as e:
        print(f"❌ Failed to create certificate: {e}")
        return False
```
Creates the self-signed SSL certificate required for browser microphone access. Generates a certificate and key for localhost with proper domain configuration, enabling secure audio recording in web browsers for the educational platform.
Tech Stack Used
Backend
- Python + Flask - For handling sessions and inference
- AssemblyAI - Speech-to-text transcription (SDK with a direct-API fallback, as shown above)
- Groq (LLaMA3-8B) - For instant feedback
Frontend
- JavaScript - Audio streaming + Web Audio API
- HTML/CSS - Minimal, responsive, focused on clarity
🏁 Final Thoughts

This wasn't just a submission.
This was a love letter 💌 to every shy, nerdy student who ever wished their thoughts could come out clearer.

It's funny.
We spend years learning things, but no one ever bothered teaching us how to say them well.
This project is my way of fixing that, with code, care, and a mic.

Would I build more on top of this? Absolutely.
Would I cry if I win? Probably.
Would I still keep improving it if I lose? Without a question. 🥹

🫶 Thank You for Listening (Literally)

To the judges, mentors, and every dev reading this:
Let's speak better. Let's build louder.
And maybe… let's stutter a little less along the way.

🎤💙
Divya Singh

Thank you for reading till the end!