πŸŽ“πŸ§  Grasp, Articulate & Refine: Your Real-Time Voice Coach for Smarter Recall & Academic Mastery πŸŽ€πŸ“šβš‘

This is a submission for the AssemblyAI Voice Agents Challenge: Real-Time

πŸ’‘ What I Built

A real-time, AI-powered academic listening coach designed to help you:

  • Grasp any concept
  • Articulate it in your own words
  • Get real-time feedback from an AI mentor trained to respond like a domain-specific educator

project landing page

Imagine a personalized Listening Toastmasters for academics, one that:

βœ… Listens while you speak
βœ… Transcribes in real-time
βœ… Analyzes your response
βœ… Gives constructive feedback
βœ… Grades your clarity, tone, and structure, not you πŸ˜…

Perfect for viva prep, thesis defense, placement interviews, getting a better grasp of a concept or topic, or simply explaining tough concepts out loud.


✨ Why I Built It

I've always craved a mentor who could truly adapt to me -
One who listens without judgment.
Who cares how I speak, not just what I say.
Who waits when I pause. And helps me find the words when I blank out.

As a student juggling placements, exams, hackathons and life, I often find myself:

  • Mumbling under pressure
  • Rambling mid-response
  • Or going completely blank during interviews

So, I built this for that version of me.
The nervous student. The silent developer.
The person who knows the answer, but just can’t say it clearly.

This is more than a tool.
It’s a gentle, nerdy best friend in your laptop, reminding you:

β€œYou’ve got this. Just speak, I’ll help you shape it.”

Oh, and if you're overusing filler words?
It’ll lovingly πŸ’˜ roast you:

β€œBestie, you just said β€˜umm’ 27 times. Let’s fix that.”


πŸ› οΈ Features

1️⃣ πŸŽ™οΈ Mic On, Brain On

 Live voice input straight from your browser (no app install needed!)

2️⃣ ✍️ Real-Time Whispering

 Instant speech-to-text via ⚑ AssemblyAI’s Streaming API

3️⃣ πŸ“Š Instant Report Card

Get scored out of 10 on key communication metrics:

  • πŸ—£οΈ Fluency
  • 🧩 Coherence
  • πŸ” Redundancy
  • 🧠 Technical Depth
  • πŸ’ͺ Confidence Markers

πŸ“ˆ Delivered in real-time β†’ your growth, visualized.

4️⃣ πŸŽ“ AI Educator Mode

 Your speech gets evaluated as if you're explaining to a domain expert (powered by Groq)

5️⃣ πŸ” Retry Until It’s Right

 Stumble? Speak again. Smarter each time. πŸ”

6️⃣ 🎯 Focused Solo Practice

 A quiet dojo to train your mind-mouth connection πŸ€πŸ§˜β€β™€οΈ

7️⃣ πŸ§ͺ Built for Serious Learners

 Ideal for:

  - 🧬 Viva / Thesis prep

  - πŸ§‘β€πŸ’» Tech interviews

  - πŸ“š Academic presentations

  - 🎀 Fluency drills

  - ✨ Better grasping any topic

8️⃣ πŸ’» Minimal UI, Max Results

 No distractions. Just you, your thoughts, and your growth πŸ’₯


🎬 Demo

Here's my live project: Grasp Articulate Refine

⚠️ Works best in Chrome. Firefox sulks. Brave is brave. Safari is... shy.

You can watch me showcase my project here:


GitHub Repository

πŸ‘‰πŸ‘‡

πŸ§ βœ¨πŸ“ˆ Grasp Articulate Refine

Your smart study coach, powered by AI - designed to help you truly understand what you learn, speak it with confidence, and get thoughtful feedback so you grow smarter, faster.

My project at a glance:

Screenshot: the app running locally

Check it out live here: Grasp Articulate Refine


✨ Features

  • Adaptive Content Generation: Creates 2000-3000 word educational content tailored to your academic level
  • Voice-Based Assessment: Uses AssemblyAI for speech-to-text transcription
  • AI-Powered Analysis: Acts as a globally renowned educator providing detailed feedback
  • Intelligent Grading: Grades responses out of 10 with detailed explanations
  • Progress Tracking: Students must score 9+ to advance to the next topic
  • Celebration System: 3-second emoji overlay for excellent performance (πŸ₯³πŸŽ‰πŸŽŠ)
  • Mobile Responsive: Darker blue theme with high contrast design
  • Real References: Provides working, relevant reference links for the explanation provided
  • Multiple Academic Levels: High School, Undergraduate, Graduate, Professional
  • Custom Subject…

You can check out my repo above if you are more of a code person, or if you want to analyse my code πŸ€”, get inspiration, fork it, clone it, and run it locally on your device.


Technical Implementation & AssemblyAI Integration

Here are the code snippets demonstrating the technical implementation and AssemblyAI integration in this project:

🎯 1. AssemblyAI Initialization & Configuration

```python
# utils/voice_manager.py - AssemblyAI Setup
import os
import re
import requests
from typing import Any, Dict

# Guarded SDK import so the app can still start when the package is missing
try:
    import assemblyai as aai
    ASSEMBLYAI_AVAILABLE = True
except ImportError:
    ASSEMBLYAI_AVAILABLE = False

class VoiceManager:
    def __init__(self, api_keys: Dict[str, str]):
        self.api_keys = api_keys
        self.assemblyai_available = False
        self._init_assemblyai()

    def _init_assemblyai(self):
        if ASSEMBLYAI_AVAILABLE and self.api_keys.get('ASSEMBLYAI_API_KEY'):
            try:
                aai.settings.api_key = self.api_keys['ASSEMBLYAI_API_KEY']

                # Build a config once to confirm the SDK accepts these settings
                test_config = aai.TranscriptionConfig(
                    language_detection=True,   # auto-detect the spoken language
                    punctuate=True,            # restore punctuation
                    format_text=True,          # casing and number formatting
                    speaker_labels=False,      # single speaker, no diarization
                    auto_highlights=False      # key-phrase highlights not needed
                )

                self.assemblyai_available = True
                print("βœ… AssemblyAI initialized successfully")

            except Exception as e:
                print(f"❌ AssemblyAI initialization failed: {e}")
                self.assemblyai_available = False
```

Sets up the AssemblyAI SDK with the API key and configures transcription settings including language detection, punctuation, and text formatting, initializing the VoiceManager class with settings tuned for educational content transcription.
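
For context, here is a minimal sketch of how this class might get wired up at startup; the environment-variable names and the `voice_manager` variable are illustrative assumptions, not the project's exact code:

```python
# Hypothetical startup wiring - key names are assumptions for illustration
import os

api_keys = {
    'ASSEMBLYAI_API_KEY': os.environ.get('ASSEMBLYAI_API_KEY', ''),
    'GROQ_API_KEY': os.environ.get('GROQ_API_KEY', ''),
}

voice_manager = VoiceManager(api_keys)

if not voice_manager.assemblyai_available:
    print("⚠️ AssemblyAI not available - voice features will be disabled")
```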

🎀 2. Core Audio Transcription Implementation

```python
def transcribe_audio(self, audio_file_path: str) -> str:

    if not os.path.exists(audio_file_path):
        return "❌ Audio file not found"

    # Primary Method: AssemblyAI SDK
    if self.assemblyai_available:
        try:
            print("πŸ”„ Trying AssemblyAI SDK...")

            config = aai.TranscriptionConfig(
                language_detection=True,
                punctuate=True,
                format_text=True,
                speaker_labels=False,
                auto_highlights=False
            )

            transcriber = aai.Transcriber(config=config)
            transcript = transcriber.transcribe(audio_file_path)

            if transcript.status == "completed":
                print("βœ… AssemblyAI SDK transcription successful")
                return self._clean_transcription(transcript.text)
            elif transcript.status == "error":
                print(f"❌ AssemblyAI SDK error: {transcript.error}")
                return f"❌ Transcription error: {transcript.error}"

        except Exception as e:
            print(f"❌ AssemblyAI SDK error: {e}")

    # Fallback Method: Direct API
    if self.api_keys.get('ASSEMBLYAI_API_KEY'):
        try:
            print("πŸ”„ Trying AssemblyAI Direct API...")
            result = self._transcribe_with_api(audio_file_path)
            if result and not result.startswith("❌"):
                print("βœ… AssemblyAI API transcription successful")
                return self._clean_transcription(result)
        except Exception as e:
            print(f"❌ AssemblyAI API error: {e}")

    return "❌ Transcription failed. Please check API configuration."
```

The main transcription function uses a dual-mode approach: the AssemblyAI SDK as the primary method, with the direct REST API as a fallback. It validates the audio file, transcribes with the enhanced configuration, and wraps everything in error handling for reliable speech-to-text conversion.
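
Calling it is a one-liner; since failures come back as strings prefixed with ❌, the caller can branch on that prefix. A quick sketch (the file path is hypothetical):

```python
# Hypothetical caller - relies on the ❌ prefix convention shown above
text = voice_manager.transcribe_audio("temp/audio_demo.wav")
if text.startswith("❌"):
    print(f"Transcription failed: {text}")
else:
    print(f"Student said: {text}")
```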

πŸ”§ 3. Direct API Implementation with Enhanced Features

```python
# utils/voice_manager.py - Direct API Implementation
def _transcribe_with_api(self, audio_file_path: str) -> str:
    """
    Direct AssemblyAI API implementation with robust error handling
    """
    try:
        headers = {'authorization': self.api_keys['ASSEMBLYAI_API_KEY']}

        print("πŸ“€ Uploading audio file...")
        with open(audio_file_path, 'rb') as f:
            response = requests.post(
                'https://api.assemblyai.com/v2/upload',
                headers=headers,
                files={'file': f},
                timeout=60
            )

        if response.status_code != 200:
            return f"❌ Upload failed: {response.status_code} - {response.text}"

        upload_url = response.json()['upload_url']
        print(f"βœ… File uploaded: {upload_url}")

        print("πŸ”„ Requesting transcription...")
        data = {
            'audio_url': upload_url,
            'language_detection': True,    
            'punctuate': True,             
            'format_text': True,           
            'speaker_labels': False,       
            'auto_highlights': False       
        }

        response = requests.post(
            'https://api.assemblyai.com/v2/transcript',
            headers=headers,
            json=data,
            timeout=30
        )

        if response.status_code != 200:
            return f"❌ Transcription request failed: {response.status_code}"

        transcript_id = response.json()['id']
        print(f"πŸ”„ Transcription ID: {transcript_id}")

        print("⏳ Waiting for transcription to complete...")
        max_attempts = 60  # 2-minute timeout
        attempt = 0

        while attempt < max_attempts:
            response = requests.get(
                f'https://api.assemblyai.com/v2/transcript/{transcript_id}',
                headers=headers,
                timeout=30
            )

            if response.status_code != 200:
                return f"❌ Status check failed: {response.status_code}"

            result = response.json()
            status = result['status']

            if status == 'completed':
                print("βœ… Transcription completed")
                return result['text'] or "❌ No text in transcription result"
            elif status == 'error':
                error_msg = result.get('error', 'Unknown error')
                return f"❌ Transcription error: {error_msg}"
            elif status in ['queued', 'processing']:
                print(f"⏳ Status: {status} (attempt {attempt + 1}/{max_attempts})")
                import time
                time.sleep(2)  # 2-second polling interval
                attempt += 1
            else:
                return f"❌ Unknown status: {status}"

        return "❌ Transcription timeout - took too long to process"

    except requests.exceptions.Timeout:
        return "❌ Request timeout - please try again"
    except Exception as e:
        return f"❌ Unexpected error: {str(e)}"
```

Implements direct AssemblyAI API calls as the fallback method: file upload, the transcription request with the same enhanced settings, and polling with a 2-minute timeout, plus robust error handling for network issues and API failures.

🧹 4. Advanced Text Processing & Cleaning

```python
# utils/voice_manager.py - Text Processing
def _clean_transcription(self, text: str) -> str:
    if not text:
        return "❌ Empty transcription result"

    text = text.strip()

    text = re.sub(r'\s+', ' ', text)

    text = re.sub(r'([.!?])\s*([a-z])',
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)

    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]

    if text and text[-1] not in '.!?':
        text += '.'

    return text

def validate_audio_file(self, file_path: str) -> Dict[str, Any]:
    if not os.path.exists(file_path):
        return {
            'valid': False,
            'error': 'File does not exist',
            'file_size': 0
        }

    file_size = os.path.getsize(file_path)
    max_size = 100 * 1024 * 1024  # 100MB limit

    if file_size > max_size:
        return {
            'valid': False,
            'error': f'File too large: {file_size / (1024*1024):.1f}MB (max 100MB)',
            'file_size': file_size
        }

    if file_size < 1000:  # Minimum 1KB
        return {
            'valid': False,
            'error': 'File too small - may be empty or corrupted',
            'file_size': file_size
        }

    return {
        'valid': True,
        'error': None,
        'file_size': file_size,
        'file_size_mb': file_size / (1024 * 1024)
    }
```

Post-processes transcription results with whitespace normalization, sentence capitalization fixes, and closing punctuation. Also includes audio file validation that checks size limits and basic integrity before transcription.
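
To make the cleaning step concrete, here is roughly what it does to a raw transcript (illustrative input and output, not captured from the live app):

```python
raw = "  so photosynthesis   converts light energy. it happens in chloroplasts  "
print(voice_manager._clean_transcription(raw))
# -> "So photosynthesis converts light energy. It happens in chloroplasts."
```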

🌐 5. Flask Integration & API Endpoints

```python
# app.py - Flask Integration
@app.route('/transcribe_audio', methods=['POST'])
def transcribe_audio():
    try:
        audio_file = request.files.get('audio')
        if not audio_file:
            return jsonify({'error': 'No audio file provided'}), 400

        session_id = session.get('session_id', 'unknown')
        temp_path = f"temp/audio_{session_id}.wav"
        os.makedirs('temp', exist_ok=True)
        audio_file.save(temp_path)

        print(f"πŸ”„ Starting transcription of {temp_path}")

        validation = voice_manager.validate_audio_file(temp_path)
        if not validation['valid']:
            if os.path.exists(temp_path):
                os.remove(temp_path)
            return jsonify({'error': validation['error']}), 400

        transcription = voice_manager.transcribe_audio(temp_path)

        if os.path.exists(temp_path):
            os.remove(temp_path)

        print(f"βœ… Transcription result: {transcription[:100]}...")

        return jsonify({
            'success': True,
            'transcription': transcription,
            'file_size_mb': validation.get('file_size_mb', 0)
        })

    except Exception as e:
        print(f"❌ Transcription error: {e}")
        return jsonify({'error': f'Transcription failed: {str(e)}'}), 500

@app.route('/voice_status')
def voice_status():
    return jsonify(voice_manager.get_voice_status())
```

Flask endpoints for audio transcription with session-based temporary file handling, input validation, cleanup, and error handling. A separate /voice_status endpoint reports real-time feature availability for diagnostics.
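
You can exercise the endpoint without the frontend; a quick smoke test with `requests` might look like this (the URL, port, and file name are assumptions, and `verify=False` is only because of the self-signed localhost certificate described in section 7):

```python
# Hypothetical smoke test for the /transcribe_audio endpoint
import requests

with open("sample_answer.wav", "rb") as f:
    resp = requests.post(
        "https://localhost:5000/transcribe_audio",
        files={"audio": f},   # field name matches request.files.get('audio')
        verify=False,         # self-signed localhost certificate
        timeout=180,
    )

print(resp.status_code, resp.json())
```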

πŸ“Š 6. Status Monitoring & Diagnostics

```python
# utils/voice_manager.py - Status Monitoring
def get_voice_status(self) -> Dict[str, bool]:
    return {
        'assemblyai_available': self.assemblyai_available,
        'voice_recording_available': self.assemblyai_available,
        'transcription_available': self.assemblyai_available,
        'api_key_configured': bool(self.api_keys.get('ASSEMBLYAI_API_KEY')),
        'sdk_available': ASSEMBLYAI_AVAILABLE,
        'direct_api_available': bool(self.api_keys.get('ASSEMBLYAI_API_KEY'))
    }

def _print_status(self):
    print("\n🎀 VOICE FEATURES STATUS:")
    print(f"   AssemblyAI Available: {'βœ…' if self.assemblyai_available else '❌'}")
    print(f"   Voice Recording: {'βœ…' if self.assemblyai_available else '❌'}")
    print(f"   Audio Transcription: {'βœ…' if self.assemblyai_available else '❌'}")
    print(f"   API Key Configured: {'βœ…' if self.api_keys.get('ASSEMBLYAI_API_KEY') else '❌'}")
    print()
```

Monitors AssemblyAI feature availability - SDK status, API key configuration, and transcription capability - and prints a detailed status report for troubleshooting and system health checks.
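
A tiny client-side health check against `/voice_status` could then look like this (again just a sketch with an assumed local URL):

```python
import requests

status = requests.get("https://localhost:5000/voice_status",
                      verify=False, timeout=10).json()
if not status.get("transcription_available"):
    print("⚠️ Transcription unavailable - check ASSEMBLYAI_API_KEY")
```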

πŸ”’ 7. HTTPS Configuration for Microphone Access

```python
def create_self_signed_cert():
    try:
        from cryptography import x509
        from cryptography.x509.oid import NameOID
        from cryptography.hazmat.primitives import hashes
        from cryptography.hazmat.primitives.asymmetric import rsa
        from cryptography.hazmat.primitives import serialization
        import datetime
        import ipaddress

        private_key = rsa.generate_private_key(
            public_exponent=65537,
            key_size=2048,
        )

        subject = issuer = x509.Name([
            x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
            x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "Local"),
            x509.NameAttribute(NameOID.LOCALITY_NAME, "Local"),
            x509.NameAttribute(NameOID.ORGANIZATION_NAME, "AI Learning Platform"),
            x509.NameAttribute(NameOID.COMMON_NAME, "localhost"),
        ])

        cert = x509.CertificateBuilder().subject_name(
            subject
        ).issuer_name(
            issuer
        ).public_key(
            private_key.public_key()
        ).serial_number(
            x509.random_serial_number()
        ).not_valid_before(
            datetime.datetime.utcnow()
        ).not_valid_after(
            datetime.datetime.utcnow() + datetime.timedelta(days=365)
        ).add_extension(
            x509.SubjectAlternativeName([
                x509.DNSName("localhost"),
                x509.DNSName("127.0.0.1"),
                x509.IPAddress(ipaddress.IPv4Address("127.0.0.1")),
            ]),
            critical=False,
        ).sign(private_key, hashes.SHA256())

        # Write the certificate and private key to disk
        with open("cert.pem", "wb") as f:
            f.write(cert.public_bytes(serialization.Encoding.PEM))

        with open("key.pem", "wb") as f:
            f.write(private_key.private_bytes(
                encoding=serialization.Encoding.PEM,
                format=serialization.PrivateFormat.PKCS8,
                encryption_algorithm=serialization.NoEncryption()
            ))

        print("βœ… Self-signed certificate created")
        return True

    except Exception as e:
        print(f"❌ Failed to create certificate: {e}")
        return False
```

Creates the self-signed SSL certificate required for browser microphone access: a localhost certificate with the proper domain configuration, enabling secure audio recording in web browsers for the educational platform.
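
Once `cert.pem` and `key.pem` exist, pointing Flask at them is a one-liner; here is a minimal sketch of the entry point (host and port are assumptions):

```python
# Hypothetical entry point - Flask passes the (cert, key) tuple to Werkzeug as ssl_context
if __name__ == "__main__":
    if create_self_signed_cert():
        app.run(host="0.0.0.0", port=5000, ssl_context=("cert.pem", "key.pem"))
    else:
        # Fall back to plain HTTP; browsers may block microphone access
        app.run(host="0.0.0.0", port=5000)
```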


Tech Stack Used

Backend

  • Python + Flask - For handling sessions and inference
  • AssemblyAI - Real-time transcription (Streaming API)
  • Groq (LLaMA3-8B) - For instant feedback (a sketch of this call appears below)

Frontend

  • JavaScript - Audio streaming + Web Audio API
  • HTML/CSS - Minimal, responsive, focused on clarity
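
The Groq side of the pipeline isn't shown in the snippets above, but conceptually it is a single chat-completion call on the transcript. Here is a minimal sketch using the official groq Python client - the model id, prompt wording, and variable names are my assumptions, not the project's exact code:

```python
# Hypothetical "AI educator" feedback call - prompt and model id are illustrative
from groq import Groq

client = Groq(api_key=api_keys["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[
        {"role": "system", "content": (
            "You are a globally renowned educator. Grade the student's spoken "
            "explanation out of 10 on fluency, coherence, redundancy, technical "
            "depth and confidence markers, then give constructive feedback."
        )},
        {"role": "user", "content": transcription},
    ],
)

print(completion.choices[0].message.content)
```

Keeping the rubric in the system prompt is what makes it easy to swap the educator persona for different subjects and academic levels.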

πŸ’­ Final Thoughts

This wasn’t just a submission.
This was a love letter πŸ’ŒπŸ’Œ to every shy, nerdy student who ever wished their thoughts could come out clearer.

It’s funny.
We spend years learning things, but no one ever bothered teaching us how to say them well.
This project is my way of fixing that - with code, care, and a mic.

Would I build more on top of this? Absolutely.
Would I cry if I win? Probably.
Would I still keep improving it if I lose? Without question. πŸ₯Ή


🫢 Thank You for Listening (Literally)

To the judges, mentors, and every dev reading this β€”

Let’s speak better. Let’s build louder.
And maybe… let’s stutter a little less along the way.

πŸŽ€πŸ’™
Divya Singh

Thank you for reading till the end

bow gif

Top comments (21)

Fayaz:

Nice!

You don't miss any dev Challenge, do you! πŸ˜‡

All the best πŸ₯³

Divya:

Not 100% of them, I just create multiple submissions for a single challenge mostly πŸ˜…

Thank you πŸ₯Ή

Fayaz:

That's a great strategy!

I barely ever get time to submit one project, and that too only on some challenges. πŸ˜’

You may like my last submission though. 😁

Divya:

It seems useful, but is the GitHub repo completely updated?

Fayaz:

I'll add some more instructions and polishing, but it already does what is advertised on the post. YES!

Divya:

I will check it out then 😁

Rohan Sharma:

I hope you win this one!

Divya:

I hope so as well πŸ˜…

Thank you πŸ™

Anmol Baranwal:

this is really cool πŸ”₯

Divya:

Thank you for checking it out 😊😊

Meenakshi Agarwal:

Nice work! Is there a way to run the above app/demonstrate the functionality in a non-metered environment?

Divya:

Non-metered, as in without the API?

Meenakshi Agarwal:

Smart minds always get the right meaning - yes, but without the API key to be precise, or with some public demo key...

Divya:

The project's main feature is listening to you, understanding what you say, analysing it, and giving feedback to the learner.

It needs these two APIs, or any two I guess, for the audio part and the analysis + feedback part.

dummy:

You are a rockstar - completed 3 challenges, and all are awesome.
Great work, I liked all three of your challenges. ❀️
Wishing you all the very best for these challenges ✨✨✨

Divya:

Thank you Mr Ninja 😁

Not that awesome, plus it was a last-minute rush, but yup, I ultimately submitted it all before the deadline.

Robert Thomas:

amazingπŸ”₯

Divya:

Glad you liked it 😊

Robert Thomas:

thanks
