DEV Community

Cover image for ๐ŸŽ“๐Ÿง  Grasp, Articulate & Refine: Your Real-Time Voice Coach for Smarter Recall & Academic Mastery ๐ŸŽค๐Ÿ“šโšก
Divya
Divya Subscriber

Posted on

๐ŸŽ“๐Ÿง  Grasp, Articulate & Refine: Your Real-Time Voice Coach for Smarter Recall & Academic Mastery ๐ŸŽค๐Ÿ“šโšก

AssemblyAI Voice Agents Challenge: Real-Time

This is a submission for the AssemblyAI Voice Agents Challenge

๐Ÿ’ก What I Built

A real-time, AI-powered academic listening coach designed to help you:

  • Grasp any concept
  • Articulate it in your own words
  • Get real-time feedback from an AI mentor trained to respond like a domain-specific educator

project landing page

Imagine a personalized Listening Toastmasters for academics, one that:

โœ… Listens while you speak
โœ… Transcribes in real-time
โœ… Analyzes your response
โœ… Gives constructive feedback
โœ… Grades your clarity, tone, and structure, not you ๐Ÿ˜…

Perfect for viva prep, thesis defense, placement interviews, just better grasping a concept or topic, or even explaining tough concepts out loud.


โœจ Why I Built It

I've always craved a mentor who could truly adapt to me -
One who listens without judgment.
Who cares how I speak, not just what I say.
Who waits when I pause. And helps me find the words when I blank out.

As a student juggling placements, exams,hackathons and life, I often find myself:

  • Mumbling under pressure
  • Rambling mid-response
  • Or going completely blank during interviews

So, I built this for that version of me.
The nervous student. The silent developer.
The person who knows the answer, but just canโ€™t say it clearly.

This is more than a tool.
Itโ€™s a gentle, nerdy best friend in your laptop, reminding you:

โ€œYouโ€™ve got this. Just speak, Iโ€™ll help you shape it.โ€

Oh, and if you're overusing filler words?
Itโ€™ll lovingly ๐Ÿ’˜ roast you:

โ€œBestie, you just said โ€˜ummโ€™ 27 times. Letโ€™s fix that.โ€


๐Ÿ› ๏ธ Features

1๏ธโƒฃ ๐ŸŽ™๏ธ Mic On, Brain On

โ€ƒLive voice input straight from your browser (no app install needed!)

2๏ธโƒฃ โœ๏ธ Real-Time Whispering

โ€ƒInstant speech-to-text via โšก AssemblyAIโ€™s Streaming API

3๏ธโƒฃ ๐Ÿ“Š Instant Report Card

Get scored out of 10 on key communication metrics:

  • ๐Ÿ—ฃ๏ธ Fluency
  • ๐Ÿงฉ Coherence
  • ๐Ÿ” Redundancy
  • ๐Ÿง  Technical Depth
  • ๐Ÿ’ช Confidence Markers

๐Ÿ“ˆ Delivered in real-time โ†’ your growth, visualized.

4๏ธโƒฃ ๐ŸŽ“ AI Educator Mode

โ€ƒYour speech gets evaluated like you're explaining to a domain expert (Groq)

5๏ธโƒฃ ๐Ÿ” Retry Until Itโ€™s Right

โ€ƒStumble? Speak again. Smarter each time. ๐Ÿ”

6๏ธโƒฃ ๐ŸŽฏ Focused Solo Practice

โ€ƒA quiet dojo to train your mind-mouth connection ๐Ÿค๐Ÿง˜โ€โ™€๏ธ

7๏ธโƒฃ ๐Ÿงช Built for the Serious Learners

โ€ƒIdeal for:

โ€ƒโ€ƒ- ๐Ÿงฌ Viva / Thesis prep

โ€ƒโ€ƒ- ๐Ÿง‘โ€๐Ÿ’ป Tech interviews

โ€ƒโ€ƒ- ๐Ÿ“š Academic presentations

โ€ƒโ€ƒ- ๐ŸŽค Fluency drills

โ€ƒโ€ƒ- โœจ Better grasping any topic

8๏ธโƒฃ ๐Ÿ’ป Minimal UI, Max Results

โ€ƒNo distractions. Just you, your thoughts, and your growth ๐Ÿ’ฅ


๐ŸŽฌ Demo

Here's my live project:- Grasp Articulate Refine

โš ๏ธ Works best in Chrome. Firefox sulks. Brave is brave. Safari is... shy.

You can check me showcasing my project here:-


GitHub Repository

๐Ÿ‘‰๐Ÿ‘‡

๐Ÿง โœจ๐Ÿ“ˆ Grasp Articulate Refine

Your smart study coach, powered by AI - designed to help you truly understand what you learn, speak it with confidence, and get thoughtful feedback so you grow smarter, faster.

My project at a glimpse:-

screencapture-localhost-5000-2025-07-28-02_44_24

Check it out here live:- Grasp Articulate Refine


โœจ Features

  • Adaptive Content Generation: Creates 2000-3000 word educational content tailored to your academic level
  • Voice-Based Assessment: Uses Assembly AI for speech-to-text transcription
  • AI-Powered Analysis: Acts as a globally renowned educator providing detailed feedback
  • Intelligent Grading: Grades responses out of 10 with detailed explanations
  • Progress Tracking: Students must score 9+ to advance to next topics
  • Celebration System: 3-second emoji overlay for excellent performance (๐Ÿฅณ๐ŸŽ‰๐ŸŽŠ)
  • Mobile Responsive: Darker blue theme with high contrast design
  • Real References: Provides working, relevant reference links for the explanation provided
  • Multiple Academic Levels: High School, Undergraduate, Graduate, Professional
  • Custom Subjectโ€ฆ

You can check out my repo above if you are more of a code person, or want to analyse my code ๐Ÿค”, get inspiration, fork it, clone it, and work on it on your device locally.


Technical Implementation & AssemblyAI Integration

Here are the code snippets demonstrating the technical implementation and AssemblyAI integration in this project:-

๐ŸŽฏ 1. AssemblyAI Initialization & Configuration

python
# utils/voice_manager.py - AssemblyAI Setup
import assemblyai as aai

class VoiceManager:
    def __init__(self, api_keys: Dict[str, str]):
        self.api_keys = api_keys
        self.assemblyai_available = False
        self._init_assemblyai()

    def _init_assemblyai(self):
        if ASSEMBLYAI_AVAILABLE and self.api_keys.get('ASSEMBLYAI_API_KEY'):
            try:
                aai.settings.api_key = self.api_keys['ASSEMBLYAI_API_KEY']

                test_config = aai.TranscriptionConfig(
                    language_detection=True,   
                    punctuate=True,           
                    format_text=True,          
                    speaker_labels=False,      
                    auto_highlights=False      
                )

                self.assemblyai_available = True
                print("โœ… AssemblyAI initialized successfully")

            except Exception as e:
                print(f"โŒ AssemblyAI initialization failed: {e}")
                self.assemblyai_available = False
Enter fullscreen mode Exit fullscreen mode

Sets up AssemblyAI SDK with API key and configures transcription settings including language detection, punctuation, and text
formatting. Initializes the VoiceManager class with enhanced features optimized for educational content transcription.

๐ŸŽค 2. Core Audio Transcription Implementation

python
def transcribe_audio(self, audio_file_path: str) -> str:

    if not os.path.exists(audio_file_path):
        return "โŒ Audio file not found"

    # Primary Method: AssemblyAI SDK
    if self.assemblyai_available:
        try:
            print("๐Ÿ”„ Trying AssemblyAI SDK...")

            config = aai.TranscriptionConfig(
                language_detection=True,    
                punctuate=True,            
                format_text=True,          
                speaker_labels=False,      
                auto_highlights=False                  )

            transcriber = aai.Transcriber(config=config)
            transcript = transcriber.transcribe(audio_file_path)

            if transcript.status == "completed":
                print("โœ… AssemblyAI SDK transcription successful")
                return self._clean_transcription(transcript.text)
            elif transcript.status == "error":
                print(f"โŒ AssemblyAI SDK error: {transcript.error}")
                return f"โŒ Transcription error: {transcript.error}"

        except Exception as e:
            print(f"โŒ AssemblyAI SDK error: {e}")

    # Fallback Method: Direct API
    if self.api_keys.get('ASSEMBLYAI_API_KEY'):
        try:
            print("๐Ÿ”„ Trying AssemblyAI Direct API...")
            result = self._transcribe_with_api(audio_file_path)
            if result and not result.startswith("โŒ"):
                print("โœ… AssemblyAI API transcription successful")
                return self._clean_transcription(result)
        except Exception as e:
            print(f"โŒ AssemblyAI API error: {e}")

    return "โŒ Transcription failed. Please check API configuration."
Enter fullscreen mode Exit fullscreen mode

Main transcription function using dual-mode approach: primary AssemblyAI SDK method with fallback to direct API. Handles audio file
validation, processes transcription with enhanced configuration, and includes comprehensive error handling for reliable speech-to-
text conversion.

๐Ÿ”ง 3. Direct API Implementation with Enhanced Features

python
# utils/voice_manager.py - Direct API Implementation
def _transcribe_with_api(self, audio_file_path: str) -> str:
    """
    Direct AssemblyAI API implementation with robust error handling
    """
    try:
        headers = {'authorization': self.api_keys['ASSEMBLYAI_API_KEY']}

        print("๐Ÿ“ค Uploading audio file...")
        with open(audio_file_path, 'rb') as f:
            response = requests.post(
                'https://api.assemblyai.com/v2/upload',
                headers=headers,
                files={'file': f},
                timeout=60
            )

        if response.status_code != 200:
            return f"โŒ Upload failed: {response.status_code} - {response.text}"

        upload_url = response.json()['upload_url']
        print(f"โœ… File uploaded: {upload_url}")

        print("๐Ÿ”„ Requesting transcription...")
        data = {
            'audio_url': upload_url,
            'language_detection': True,    
            'punctuate': True,             
            'format_text': True,           
            'speaker_labels': False,       
            'auto_highlights': False       
        }

        response = requests.post(
            'https://api.assemblyai.com/v2/transcript',
            headers=headers,
            json=data,
            timeout=30
        )

        if response.status_code != 200:
            return f"โŒ Transcription request failed: {response.status_code}"

        transcript_id = response.json()['id']
        print(f"๐Ÿ”„ Transcription ID: {transcript_id}")

        print("โณ Waiting for transcription to complete...")
        max_attempts = 60  # 2-minute timeout
        attempt = 0

        while attempt < max_attempts:
            response = requests.get(
                f'https://api.assemblyai.com/v2/transcript/{transcript_id}',
                headers=headers,
                timeout=30
            )

            if response.status_code != 200:
                return f"โŒ Status check failed: {response.status_code}"

            result = response.json()
            status = result['status']

            if status == 'completed':
                print("โœ… Transcription completed")
                return result['text'] or "โŒ No text in transcription result"
            elif status == 'error':
                error_msg = result.get('error', 'Unknown error')
                return f"โŒ Transcription error: {error_msg}"
            elif status in ['queued', 'processing']:
                print(f"โณ Status: {status} (attempt {attempt + 1}/{max_attempts})")
                import time
                time.sleep(2)  # 2-second polling interval
                attempt += 1
            else:
                return f"โŒ Unknown status: {status}"

        return "โŒ Transcription timeout - took too long to process"

    except requests.exceptions.Timeout:
        return "โŒ Request timeout - please try again"
    except Exception as e:
        return f"โŒ Unexpected error: {str(e)}"
Enter fullscreen mode Exit fullscreen mode

Implements direct AssemblyAI API calls as fallback method. Handles file upload, transcription request with enhanced features, and
intelligent polling with 2-minute timeout. Provides robust error handling for network issues and API failures.

๐Ÿงน 4. Advanced Text Processing & Cleaning

python
# utils/voice_manager.py - Text Processing
def _clean_transcription(self, text: str) -> str:
    if not text:
        return "โŒ Empty transcription result"

    text = text.strip()

    text = re.sub(r'\s+', ' ', text)

    text = re.sub(r'([.!?])\s*([a-z])',
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)

    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]

    if text and text[-1] not in '.!?':
        text += '.'

    return text

def validate_audio_file(self, file_path: str) -> Dict[str, any]:
    if not os.path.exists(file_path):
        return {
            'valid': False,
            'error': 'File does not exist',
            'file_size': 0
        }

    file_size = os.path.getsize(file_path)
    max_size = 100 * 1024 * 1024  # 100MB limit

    if file_size > max_size:
        return {
            'valid': False,
            'error': f'File too large: {file_size / (1024*1024):.1f}MB (max 100MB)',
            'file_size': file_size
        }

    if file_size < 1000:  # Minimum 1KB
        return {
            'valid': False,
            'error': 'File too small - may be empty or corrupted',
            'file_size': file_size
        }

    return {
        'valid': True,
        'error': None,
        'file_size': file_size,
        'file_size_mb': file_size / (1024 * 1024)
    }
Enter fullscreen mode Exit fullscreen mode

Post-processes transcription results with text cleaning, whitespace normalization, sentence capitalization fixes, and proper
punctuation. Includes audio file validation checking size limits and file integrity for optimal transcription quality.

๐ŸŒ 5. Flask Integration & API Endpoints

python
# app.py - Flask Integration
@app.route('/transcribe_audio', methods=['POST'])
def transcribe_audio():
    try:
        audio_file = request.files.get('audio')
        if not audio_file:
            return jsonify({'error': 'No audio file provided'}), 400

        session_id = session.get('session_id', 'unknown')
        temp_path = f"temp/audio_{session_id}.wav"
        os.makedirs('temp', exist_ok=True)
        audio_file.save(temp_path)

        print(f"๐Ÿ”„ Starting transcription of {temp_path}")

        validation = voice_manager.validate_audio_file(temp_path)
        if not validation['valid']:
            if os.path.exists(temp_path):
                os.remove(temp_path)
            return jsonify({'error': validation['error']}), 400

        transcription = voice_manager.transcribe_audio(temp_path)

        if os.path.exists(temp_path):
            os.remove(temp_path)

        print(f"โœ… Transcription result: {transcription[:100]}...")

        return jsonify({
            'success': True,
            'transcription': transcription,
            'file_size_mb': validation.get('file_size_mb', 0)
        })

    except Exception as e:
        print(f"โŒ Transcription error: {e}")
        return jsonify({'error': f'Transcription failed: {str(e)}'}), 500

@app.route('/voice_status')
def voice_status():
    return jsonify(voice_manager.get_voice_status())
Enter fullscreen mode Exit fullscreen mode

Flask endpoints for audio transcription with session-based temporary file handling. Includes comprehensive error handling, file
validation, and cleanup. Provides voice status endpoint for real-time feature availability monitoring and diagnostics.

๐Ÿ“Š 6. Status Monitoring & Diagnostics

python
# utils/voice_manager.py - Status Monitoring
def get_voice_status(self) -> Dict[str, bool]:
    return {
        'assemblyai_available': self.assemblyai_available,
        'voice_recording_available': self.assemblyai_available,
        'transcription_available': self.assemblyai_available,
        'api_key_configured': bool(self.api_keys.get('ASSEMBLYAI_API_KEY')),
        'sdk_available': ASSEMBLYAI_AVAILABLE,
        'direct_api_available': bool(self.api_keys.get('ASSEMBLYAI_API_KEY'))
    }

def _print_status(self):
    print("\n๐ŸŽค VOICE FEATURES STATUS:")
    print(f"   AssemblyAI Available: {'โœ…' if self.assemblyai_available else 'โŒ'}")
    print(f"   Voice Recording: {'โœ…' if self.assemblyai_available else 'โŒ'}")
    print(f"   Audio Transcription: {'โœ…' if self.assemblyai_available else 'โŒ'}")
    print(f"   API Key Configured: {'โœ…' if self.api_keys.get('ASSEMBLYAI_API_KEY') else 'โŒ'}")
    print()
Enter fullscreen mode Exit fullscreen mode

Comprehensive system for monitoring AssemblyAI feature availability including SDK status, API key configuration, and transcription
capabilities. Provides detailed status reporting for troubleshooting and system health monitoring.

๐Ÿ”’ 7. HTTPS Configuration for Microphone Access

python
def create_self_signed_cert():
    try:
        from cryptography import x509
        from cryptography.x509.oid import NameOID
        from cryptography.hazmat.primitives import hashes
        from cryptography.hazmat.primitives.asymmetric import rsa
        from cryptography.hazmat.primitives import serialization
        import datetime
        import ipaddress

        private_key = rsa.generate_private_key(
            public_exponent=65537,
            key_size=2048,
        )

        subject = issuer = x509.Name([
            x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
            x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "Local"),
            x509.NameAttribute(NameOID.LOCALITY_NAME, "Local"),
            x509.NameAttribute(NameOID.ORGANIZATION_NAME, "AI Learning Platform"),
            x509.NameAttribute(NameOID.COMMON_NAME, "localhost"),
        ])

        cert = x509.CertificateBuilder().subject_name(
            subject
        ).issuer_name(
            issuer
        ).public_key(
            private_key.public_key()
        ).serial_number(
            x509.random_serial_number()
        ).not_valid_before(
            datetime.datetime.utcnow()
        ).not_valid_after(
            datetime.datetime.utcnow() + datetime.timedelta(days=365)
        ).add_extension(
            x509.SubjectAlternativeName([
                x509.DNSName("localhost"),
                x509.DNSName("127.0.0.1"),
                x509.IPAddress(ipaddress.IPv4Address("127.0.0.1")),
            ]),
        ).sign(private_key, hashes.SHA256()) certificate and key
        with open("cert.pem", "wb") as f:
            f.write(cert.public_bytes(serialization.Encoding.PEM))

        with open("key.pem", "wb") as f:.private_bytes(
                encoding=serialization.Encoding.PEM,
                format=serialization.PrivateFormat.PKCS8,
                encryption_algorithm=serialization.NoEncryption()
            ))

        print("โœ… Self-signed certificate created")
        return True

    except Exception as e:
        print(f"โŒ Failed to create certificate: {e}")
        return False
Enter fullscreen mode Exit fullscreen mode

Creates self-signed SSL certificates required for browser microphone access. Generates cryptographic certificates for localhost with
proper domain configuration, enabling secure audio recording in web browsers for the educational platform.


Tech Stack Used

Backend

  • Python + Flask - For handling sessions and inference
  • AssemblyAI - Real-time transcription (Streaming API)
  • Groq (LLaMA3-8B) - For instant feedback

Frontend

  • JavaScript - Audio streaming + Web Audio API
  • HTML/CSS - Minimal, responsive, focused on clarity

๐Ÿ’ญ Final Thoughts

This wasnโ€™t just a submission.
This was a love letter ๐Ÿ’Œ๐Ÿ’Œ to every shy, nerdy student who ever wished their thoughts could come out clearer.

Itโ€™s funny.
We spend years learning things, but no one ever bothered teaching us how to say them well.
This project is my way of fixing that - with code, care, and a mic.

Would I build more on top of this? Absolutely.
Would I cry if I win? Probably.
Would I still keep improving it if I lose? Without a question. ๐Ÿฅน


๐Ÿซถ Thank You for Listening (Literally)

To the judges, mentors, and every dev reading this โ€”

Letโ€™s speak better. Letโ€™s build louder.
And maybeโ€ฆ letโ€™s stutter a little less along the way.

๐ŸŽค๐Ÿ’™
Divya Singh

Thank you for reading till the end

bow gif

Top comments (21)

Collapse
 
fm profile image
Fayaz

Nice!

You don't miss any dev Challenge, do you! ๐Ÿ˜‡

All the best ๐Ÿฅณ

Collapse
 
divyasinghdev profile image
Divya

Not 100% of them, i just create multiple submissions for a single challenge mostly ๐Ÿ˜…

Thank you ๐Ÿฅน

Collapse
 
fm profile image
Fayaz

That's a great strategy!

I barely ever get time to submit one project, and that too only on some challenges. ๐Ÿ˜’

You may like my last submission though. ๐Ÿ˜

Thread Thread
 
divyasinghdev profile image
Divya

It seems useful, but is the github repo completely updated?

Thread Thread
 
fm profile image
Fayaz

I'll add some more instructions and polishing, but it already does what is advertised on the post. YES!

Thread Thread
 
divyasinghdev profile image
Divya

I will check it out then ๐Ÿ˜

Collapse
 
rohan_sharma profile image
Rohan Sharma

I hope you win this one!

Collapse
 
divyasinghdev profile image
Divya

I hope so as well ๐Ÿ˜…

Thank you ๐Ÿ™

Collapse
 
anmolbaranwal profile image
Anmol Baranwal

this is really cool ๐Ÿ”ฅ

Collapse
 
divyasinghdev profile image
Divya

Thank you for checking it out ๐Ÿ˜Š๐Ÿ˜Š

Collapse
 
meenakshi052003 profile image
Meenakshi Agarwal

Nice work! Is there a way to run the above app/demonstrate the functionality in a non-metered environment?

Collapse
 
divyasinghdev profile image
Divya

Non- metered as in without the api?

Collapse
 
meenakshi052003 profile image
Meenakshi Agarwal • Edited

Smart minds always get the right meaning, yes but without the api-key to be precise or with some public demo key....

Thread Thread
 
divyasinghdev profile image
Divya

The project's main feature is listening, understanding and then analysis of it, and feedback for the learner.

It needs these 2 apis , or any 2 ig, for the audio part and the analysis + feedback part.

Collapse
 
dummy001 profile image
dummy

you are a rockstar, completed 3 challenges and all are awesome.
Great work, liked all of your three challenges. โค๏ธ
Wish you all the very best for these challengesโœจโœจโœจ

Collapse
 
divyasinghdev profile image
Divya

Thank you Mr Ninja ๐Ÿ˜

Not that awesome, plus it was a last minute rush, but yup, ultimately submitted it all before the deadline.

Collapse
 
techboss profile image
Robert Thomas

amazing๐Ÿ”ฅ

Collapse
 
divyasinghdev profile image
Divya

Glad you liked it ๐Ÿ˜Š

Collapse
 
techboss profile image
Robert Thomas

thanks

Some comments may only be visible to logged-in visitors. Sign in to view all comments.