πŸŽ“πŸ§  Grasp, Articulate & Refine: Your Real-Time Voice Coach for Smarter Recall & Academic Mastery πŸŽ€πŸ“šβš‘

This is a submission for the AssemblyAI Voice Agents Challenge: Real-Time

πŸ’‘ What I Built

A real-time, AI-powered academic listening coach designed to help you:

  • Grasp any concept
  • Articulate it in your own words
  • Get real-time feedback from an AI mentor trained to respond like a domain-specific educator

project landing page

Imagine a personalized Listening Toastmasters for academics, one that:

βœ… Listens while you speak
βœ… Transcribes in real-time
βœ… Analyzes your response
βœ… Gives constructive feedback
βœ… Grades your clarity, tone, and structure, not you πŸ˜…

Perfect for viva prep, thesis defense, placement interviews, getting a better grasp of a concept or topic, or simply explaining tough concepts out loud.


✨ Why I Built It

I've always craved a mentor who could truly adapt to me -
One who listens without judgment.
Who cares how I speak, not just what I say.
Who waits when I pause. And helps me find the words when I blank out.

As a student juggling placements, exams, hackathons and life, I often find myself:

  • Mumbling under pressure
  • Rambling mid-response
  • Or going completely blank during interviews

So, I built this for that version of me.
The nervous student. The silent developer.
The person who knows the answer, but just can’t say it clearly.

This is more than a tool.
It’s a gentle, nerdy best friend in your laptop, reminding you:

β€œYou’ve got this. Just speak, I’ll help you shape it.”

Oh, and if you're overusing filler words?
It’ll lovingly πŸ’˜ roast you:

β€œBestie, you just said β€˜umm’ 27 times. Let’s fix that.”


πŸ› οΈ Features

1️⃣ πŸŽ™οΈ Mic On, Brain On

 Live voice input straight from your browser (no app install needed!)

2️⃣ ✍️ Real-Time Whispering

 Instant speech-to-text via ⚑ AssemblyAI’s Streaming API

3️⃣ πŸ“Š Instant Report Card

Get scored out of 10 on key communication metrics:

  • πŸ—£οΈ Fluency
  • 🧩 Coherence
  • πŸ” Redundancy
  • 🧠 Technical Depth
  • πŸ’ͺ Confidence Markers

πŸ“ˆ Delivered in real-time β†’ your growth, visualized.

4️⃣ πŸŽ“ AI Educator Mode

 Your speech gets evaluated as if you're explaining to a domain expert (powered by Groq)

5️⃣ πŸ” Retry Until It’s Right

 Stumble? Speak again. Smarter each time. πŸ”

6️⃣ 🎯 Focused Solo Practice

 A quiet dojo to train your mind-mouth connection πŸ€πŸ§˜β€β™€οΈ

7️⃣ πŸ§ͺ Built for Serious Learners

 Ideal for:

  - 🧬 Viva / Thesis prep

  - πŸ§‘β€πŸ’» Tech interviews

  - πŸ“š Academic presentations

  - 🎀 Fluency drills

  - ✨ Better grasping any topic

8️⃣ πŸ’» Minimal UI, Max Results

 No distractions. Just you, your thoughts, and your growth πŸ’₯


🎬 Demo

Here's my live project: Grasp Articulate Refine

⚠️ Works best in Chrome. Firefox sulks. Brave is brave. Safari is... shy.

You can watch me showcase my project here:


GitHub Repository

πŸ‘‰πŸ‘‡

πŸ§ βœ¨πŸ“ˆ Grasp Articulate Refine

Your smart study coach, powered by AI - designed to help you truly understand what you learn, speak it with confidence, and get thoughtful feedback so you grow smarter, faster.

My project at a glance:

Screenshot: the app running locally

Check it out live here: Grasp Articulate Refine


✨ Features

  • Adaptive Content Generation: Creates 2000-3000 word educational content tailored to your academic level
  • Voice-Based Assessment: Uses AssemblyAI for speech-to-text transcription
  • AI-Powered Analysis: Acts as a globally renowned educator providing detailed feedback
  • Intelligent Grading: Grades responses out of 10 with detailed explanations
  • Progress Tracking: Students must score 9+ to advance to the next topic
  • Celebration System: 3-second emoji overlay for excellent performance (πŸ₯³πŸŽ‰πŸŽŠ)
  • Mobile Responsive: Darker blue theme with high contrast design
  • Real References: Provides working, relevant reference links for the explanation provided
  • Multiple Academic Levels: High School, Undergraduate, Graduate, Professional
  • Custom Subject…

You can check out my repo above if you are more of a code person, or if you want to analyse my code πŸ€”, get inspiration, fork it, clone it, and run it locally on your device.


Technical Implementation & AssemblyAI Integration

Here are the code snippets demonstrating the technical implementation and AssemblyAI integration in this project:

🎯 1. AssemblyAI Initialization & Configuration

```python
# utils/voice_manager.py - AssemblyAI Setup
import os
import re
import requests
from typing import Any, Dict

# Guarded SDK import so the app can still start when the package is missing
try:
    import assemblyai as aai
    ASSEMBLYAI_AVAILABLE = True
except ImportError:
    ASSEMBLYAI_AVAILABLE = False

class VoiceManager:
    def __init__(self, api_keys: Dict[str, str]):
        self.api_keys = api_keys
        self.assemblyai_available = False
        self._init_assemblyai()

    def _init_assemblyai(self):
        if ASSEMBLYAI_AVAILABLE and self.api_keys.get('ASSEMBLYAI_API_KEY'):
            try:
                aai.settings.api_key = self.api_keys['ASSEMBLYAI_API_KEY']

                # Build a config once to confirm the SDK accepts these settings
                test_config = aai.TranscriptionConfig(
                    language_detection=True,   # auto-detect the spoken language
                    punctuate=True,            # restore punctuation
                    format_text=True,          # casing and number formatting
                    speaker_labels=False,      # single speaker, no diarization
                    auto_highlights=False      # key-phrase highlights not needed
                )

                self.assemblyai_available = True
                print("βœ… AssemblyAI initialized successfully")

            except Exception as e:
                print(f"❌ AssemblyAI initialization failed: {e}")
                self.assemblyai_available = False
```

Sets up the AssemblyAI SDK with the API key and configures transcription settings including language detection, punctuation, and text formatting, initializing the VoiceManager class with settings tuned for educational content transcription.
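
For context, here is a minimal sketch of how this class might get wired up at startup; the environment-variable names and the `voice_manager` variable are illustrative assumptions, not the project's exact code:

```python
# Hypothetical startup wiring - key names are assumptions for illustration
import os

api_keys = {
    'ASSEMBLYAI_API_KEY': os.environ.get('ASSEMBLYAI_API_KEY', ''),
    'GROQ_API_KEY': os.environ.get('GROQ_API_KEY', ''),
}

voice_manager = VoiceManager(api_keys)

if not voice_manager.assemblyai_available:
    print("⚠️ AssemblyAI not available - voice features will be disabled")
```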

🎀 2. Core Audio Transcription Implementation

```python
def transcribe_audio(self, audio_file_path: str) -> str:

    if not os.path.exists(audio_file_path):
        return "❌ Audio file not found"

    # Primary Method: AssemblyAI SDK
    if self.assemblyai_available:
        try:
            print("πŸ”„ Trying AssemblyAI SDK...")

            config = aai.TranscriptionConfig(
                language_detection=True,
                punctuate=True,
                format_text=True,
                speaker_labels=False,
                auto_highlights=False
            )

            transcriber = aai.Transcriber(config=config)
            transcript = transcriber.transcribe(audio_file_path)

            if transcript.status == "completed":
                print("βœ… AssemblyAI SDK transcription successful")
                return self._clean_transcription(transcript.text)
            elif transcript.status == "error":
                print(f"❌ AssemblyAI SDK error: {transcript.error}")
                return f"❌ Transcription error: {transcript.error}"

        except Exception as e:
            print(f"❌ AssemblyAI SDK error: {e}")

    # Fallback Method: Direct API
    if self.api_keys.get('ASSEMBLYAI_API_KEY'):
        try:
            print("πŸ”„ Trying AssemblyAI Direct API...")
            result = self._transcribe_with_api(audio_file_path)
            if result and not result.startswith("❌"):
                print("βœ… AssemblyAI API transcription successful")
                return self._clean_transcription(result)
        except Exception as e:
            print(f"❌ AssemblyAI API error: {e}")

    return "❌ Transcription failed. Please check API configuration."
```

The main transcription function uses a dual-mode approach: the AssemblyAI SDK as the primary method, with the direct REST API as a fallback. It validates the audio file, transcribes with the enhanced configuration, and wraps everything in error handling for reliable speech-to-text conversion.
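
Calling it is a one-liner; since failures come back as strings prefixed with ❌, the caller can branch on that prefix. A quick sketch (the file path is hypothetical):

```python
# Hypothetical caller - relies on the ❌ prefix convention shown above
text = voice_manager.transcribe_audio("temp/audio_demo.wav")
if text.startswith("❌"):
    print(f"Transcription failed: {text}")
else:
    print(f"Student said: {text}")
```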

πŸ”§ 3. Direct API Implementation with Enhanced Features

```python
# utils/voice_manager.py - Direct API Implementation
def _transcribe_with_api(self, audio_file_path: str) -> str:
    """
    Direct AssemblyAI API implementation with robust error handling
    """
    try:
        headers = {'authorization': self.api_keys['ASSEMBLYAI_API_KEY']}

        print("πŸ“€ Uploading audio file...")
        with open(audio_file_path, 'rb') as f:
            response = requests.post(
                'https://api.assemblyai.com/v2/upload',
                headers=headers,
                files={'file': f},
                timeout=60
            )

        if response.status_code != 200:
            return f"❌ Upload failed: {response.status_code} - {response.text}"

        upload_url = response.json()['upload_url']
        print(f"βœ… File uploaded: {upload_url}")

        print("πŸ”„ Requesting transcription...")
        data = {
            'audio_url': upload_url,
            'language_detection': True,    
            'punctuate': True,             
            'format_text': True,           
            'speaker_labels': False,       
            'auto_highlights': False       
        }

        response = requests.post(
            'https://api.assemblyai.com/v2/transcript',
            headers=headers,
            json=data,
            timeout=30
        )

        if response.status_code != 200:
            return f"❌ Transcription request failed: {response.status_code}"

        transcript_id = response.json()['id']
        print(f"πŸ”„ Transcription ID: {transcript_id}")

        print("⏳ Waiting for transcription to complete...")
        max_attempts = 60  # 2-minute timeout
        attempt = 0

        while attempt < max_attempts:
            response = requests.get(
                f'https://api.assemblyai.com/v2/transcript/{transcript_id}',
                headers=headers,
                timeout=30
            )

            if response.status_code != 200:
                return f"❌ Status check failed: {response.status_code}"

            result = response.json()
            status = result['status']

            if status == 'completed':
                print("βœ… Transcription completed")
                return result['text'] or "❌ No text in transcription result"
            elif status == 'error':
                error_msg = result.get('error', 'Unknown error')
                return f"❌ Transcription error: {error_msg}"
            elif status in ['queued', 'processing']:
                print(f"⏳ Status: {status} (attempt {attempt + 1}/{max_attempts})")
                import time
                time.sleep(2)  # 2-second polling interval
                attempt += 1
            else:
                return f"❌ Unknown status: {status}"

        return "❌ Transcription timeout - took too long to process"

    except requests.exceptions.Timeout:
        return "❌ Request timeout - please try again"
    except Exception as e:
        return f"❌ Unexpected error: {str(e)}"
```

Implements direct AssemblyAI API calls as the fallback method: file upload, the transcription request with the same enhanced settings, and polling with a 2-minute timeout, plus robust error handling for network issues and API failures.

🧹 4. Advanced Text Processing & Cleaning

```python
# utils/voice_manager.py - Text Processing
def _clean_transcription(self, text: str) -> str:
    if not text:
        return "❌ Empty transcription result"

    text = text.strip()

    text = re.sub(r'\s+', ' ', text)

    text = re.sub(r'([.!?])\s*([a-z])',
                  lambda m: m.group(1) + ' ' + m.group(2).upper(), text)

    if text and not text[0].isupper():
        text = text[0].upper() + text[1:]

    if text and text[-1] not in '.!?':
        text += '.'

    return text

def validate_audio_file(self, file_path: str) -> Dict[str, Any]:
    if not os.path.exists(file_path):
        return {
            'valid': False,
            'error': 'File does not exist',
            'file_size': 0
        }

    file_size = os.path.getsize(file_path)
    max_size = 100 * 1024 * 1024  # 100MB limit

    if file_size > max_size:
        return {
            'valid': False,
            'error': f'File too large: {file_size / (1024*1024):.1f}MB (max 100MB)',
            'file_size': file_size
        }

    if file_size < 1000:  # Minimum 1KB
        return {
            'valid': False,
            'error': 'File too small - may be empty or corrupted',
            'file_size': file_size
        }

    return {
        'valid': True,
        'error': None,
        'file_size': file_size,
        'file_size_mb': file_size / (1024 * 1024)
    }
```

Post-processes transcription results with whitespace normalization, sentence capitalization fixes, and closing punctuation. Also includes audio file validation that checks size limits and basic integrity before transcription.
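
To make the cleaning step concrete, here is roughly what it does to a raw transcript (illustrative input and output, not captured from the live app):

```python
raw = "  so photosynthesis   converts light energy. it happens in chloroplasts  "
print(voice_manager._clean_transcription(raw))
# -> "So photosynthesis converts light energy. It happens in chloroplasts."
```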

🌐 5. Flask Integration & API Endpoints

```python
# app.py - Flask Integration
@app.route('/transcribe_audio', methods=['POST'])
def transcribe_audio():
    try:
        audio_file = request.files.get('audio')
        if not audio_file:
            return jsonify({'error': 'No audio file provided'}), 400

        session_id = session.get('session_id', 'unknown')
        temp_path = f"temp/audio_{session_id}.wav"
        os.makedirs('temp', exist_ok=True)
        audio_file.save(temp_path)

        print(f"πŸ”„ Starting transcription of {temp_path}")

        validation = voice_manager.validate_audio_file(temp_path)
        if not validation['valid']:
            if os.path.exists(temp_path):
                os.remove(temp_path)
            return jsonify({'error': validation['error']}), 400

        transcription = voice_manager.transcribe_audio(temp_path)

        if os.path.exists(temp_path):
            os.remove(temp_path)

        print(f"βœ… Transcription result: {transcription[:100]}...")

        return jsonify({
            'success': True,
            'transcription': transcription,
            'file_size_mb': validation.get('file_size_mb', 0)
        })

    except Exception as e:
        print(f"❌ Transcription error: {e}")
        return jsonify({'error': f'Transcription failed: {str(e)}'}), 500

@app.route('/voice_status')
def voice_status():
    return jsonify(voice_manager.get_voice_status())
```

Flask endpoints for audio transcription with session-based temporary file handling, input validation, cleanup, and error handling. A separate /voice_status endpoint reports real-time feature availability for diagnostics.
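
You can exercise the endpoint without the frontend; a quick smoke test with `requests` might look like this (the URL, port, and file name are assumptions, and `verify=False` is only because of the self-signed localhost certificate described in section 7):

```python
# Hypothetical smoke test for the /transcribe_audio endpoint
import requests

with open("sample_answer.wav", "rb") as f:
    resp = requests.post(
        "https://localhost:5000/transcribe_audio",
        files={"audio": f},   # field name matches request.files.get('audio')
        verify=False,         # self-signed localhost certificate
        timeout=180,
    )

print(resp.status_code, resp.json())
```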

πŸ“Š 6. Status Monitoring & Diagnostics

```python
# utils/voice_manager.py - Status Monitoring
def get_voice_status(self) -> Dict[str, bool]:
    return {
        'assemblyai_available': self.assemblyai_available,
        'voice_recording_available': self.assemblyai_available,
        'transcription_available': self.assemblyai_available,
        'api_key_configured': bool(self.api_keys.get('ASSEMBLYAI_API_KEY')),
        'sdk_available': ASSEMBLYAI_AVAILABLE,
        'direct_api_available': bool(self.api_keys.get('ASSEMBLYAI_API_KEY'))
    }

def _print_status(self):
    print("\n🎀 VOICE FEATURES STATUS:")
    print(f"   AssemblyAI Available: {'βœ…' if self.assemblyai_available else '❌'}")
    print(f"   Voice Recording: {'βœ…' if self.assemblyai_available else '❌'}")
    print(f"   Audio Transcription: {'βœ…' if self.assemblyai_available else '❌'}")
    print(f"   API Key Configured: {'βœ…' if self.api_keys.get('ASSEMBLYAI_API_KEY') else '❌'}")
    print()
```

Monitors AssemblyAI feature availability - SDK status, API key configuration, and transcription capability - and prints a detailed status report for troubleshooting and system health checks.
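
A tiny client-side health check against `/voice_status` could then look like this (again just a sketch with an assumed local URL):

```python
import requests

status = requests.get("https://localhost:5000/voice_status",
                      verify=False, timeout=10).json()
if not status.get("transcription_available"):
    print("⚠️ Transcription unavailable - check ASSEMBLYAI_API_KEY")
```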

πŸ”’ 7. HTTPS Configuration for Microphone Access

```python
def create_self_signed_cert():
    try:
        from cryptography import x509
        from cryptography.x509.oid import NameOID
        from cryptography.hazmat.primitives import hashes
        from cryptography.hazmat.primitives.asymmetric import rsa
        from cryptography.hazmat.primitives import serialization
        import datetime
        import ipaddress

        private_key = rsa.generate_private_key(
            public_exponent=65537,
            key_size=2048,
        )

        subject = issuer = x509.Name([
            x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
            x509.NameAttribute(NameOID.STATE_OR_PROVINCE_NAME, "Local"),
            x509.NameAttribute(NameOID.LOCALITY_NAME, "Local"),
            x509.NameAttribute(NameOID.ORGANIZATION_NAME, "AI Learning Platform"),
            x509.NameAttribute(NameOID.COMMON_NAME, "localhost"),
        ])

        cert = x509.CertificateBuilder().subject_name(
            subject
        ).issuer_name(
            issuer
        ).public_key(
            private_key.public_key()
        ).serial_number(
            x509.random_serial_number()
        ).not_valid_before(
            datetime.datetime.utcnow()
        ).not_valid_after(
            datetime.datetime.utcnow() + datetime.timedelta(days=365)
        ).add_extension(
            x509.SubjectAlternativeName([
                x509.DNSName("localhost"),
                x509.DNSName("127.0.0.1"),
                x509.IPAddress(ipaddress.IPv4Address("127.0.0.1")),
            ]),
            critical=False,
        ).sign(private_key, hashes.SHA256())

        # Write the certificate and private key to disk
        with open("cert.pem", "wb") as f:
            f.write(cert.public_bytes(serialization.Encoding.PEM))

        with open("key.pem", "wb") as f:
            f.write(private_key.private_bytes(
                encoding=serialization.Encoding.PEM,
                format=serialization.PrivateFormat.PKCS8,
                encryption_algorithm=serialization.NoEncryption()
            ))

        print("βœ… Self-signed certificate created")
        return True

    except Exception as e:
        print(f"❌ Failed to create certificate: {e}")
        return False
```

Creates the self-signed SSL certificate required for browser microphone access: a localhost certificate with the proper domain configuration, enabling secure audio recording in web browsers for the educational platform.
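
Once `cert.pem` and `key.pem` exist, pointing Flask at them is a one-liner; here is a minimal sketch of the entry point (host and port are assumptions):

```python
# Hypothetical entry point - Flask passes the (cert, key) tuple to Werkzeug as ssl_context
if __name__ == "__main__":
    if create_self_signed_cert():
        app.run(host="0.0.0.0", port=5000, ssl_context=("cert.pem", "key.pem"))
    else:
        # Fall back to plain HTTP; browsers may block microphone access
        app.run(host="0.0.0.0", port=5000)
```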


Tech Stack Used

Backend

  • Python + Flask - For handling sessions and inference
  • AssemblyAI - Real-time transcription (Streaming API)
  • Groq (LLaMA3-8B) - For instant feedback (a sketch of this call appears below)

Frontend

  • JavaScript - Audio streaming + Web Audio API
  • HTML/CSS - Minimal, responsive, focused on clarity
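
The Groq side of the pipeline isn't shown in the snippets above, but conceptually it is a single chat-completion call on the transcript. Here is a minimal sketch using the official groq Python client - the model id, prompt wording, and variable names are my assumptions, not the project's exact code:

```python
# Hypothetical "AI educator" feedback call - prompt and model id are illustrative
from groq import Groq

client = Groq(api_key=api_keys["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[
        {"role": "system", "content": (
            "You are a globally renowned educator. Grade the student's spoken "
            "explanation out of 10 on fluency, coherence, redundancy, technical "
            "depth and confidence markers, then give constructive feedback."
        )},
        {"role": "user", "content": transcription},
    ],
)

print(completion.choices[0].message.content)
```

Keeping the rubric in the system prompt is what makes it easy to swap the educator persona for different subjects and academic levels.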

πŸ’­ Final Thoughts

This wasn’t just a submission.
This was a love letter πŸ’ŒπŸ’Œ to every shy, nerdy student who ever wished their thoughts could come out clearer.

It’s funny.
We spend years learning things, but no one ever bothered teaching us how to say them well.
This project is my way of fixing that - with code, care, and a mic.

Would I build more on top of this? Absolutely.
Would I cry if I win? Probably.
Would I still keep improving it if I lose? Without question. πŸ₯Ή


🫢 Thank You for Listening (Literally)

To the judges, mentors, and every dev reading this β€”

Let’s speak better. Let’s build louder.
And maybe… let’s stutter a little less along the way.

πŸŽ€πŸ’™
Divya Singh

Thank you for reading till the end

bow gif

Top comments (21)

Fayaz:

Nice!

You don't miss any dev Challenge, do you! πŸ˜‡

All the best πŸ₯³

Divya:

Not 100% of them, I just create multiple submissions for a single challenge mostly πŸ˜…

Thank you πŸ₯Ή

Fayaz:

That's a great strategy!

I barely ever get time to submit one project, and that too only on some challenges. πŸ˜’

You may like my last submission though. 😁

Divya:

It seems useful, but is the GitHub repo completely updated?

Fayaz:

I'll add some more instructions and polishing, but it already does what is advertised on the post. YES!

Divya:

I will check it out then 😁

Rohan Sharma:

I hope you win this one!

Divya:

I hope so as well πŸ˜…

Thank you πŸ™

Anmol Baranwal:

this is really cool πŸ”₯

Divya:

Thank you for checking it out 😊😊

Meenakshi Agarwal:

Nice work! Is there a way to run the above app/demonstrate the functionality in a non-metered environment?

Divya:

Non-metered, as in without the API?

Meenakshi Agarwal:

Smart minds always get the right meaning - yes, but without the API key to be precise, or with some public demo key...

Divya:

The project's main feature is listening to you, understanding what you say, analysing it, and giving feedback to the learner.

It needs these two APIs, or any two I guess, for the audio part and the analysis + feedback part.

dummy:

You are a rockstar - completed 3 challenges, and all are awesome.
Great work, I liked all three of your challenges. ❀️
Wishing you all the very best for these challenges ✨✨✨

Divya:

Thank you Mr Ninja 😁

Not that awesome, plus it was a last-minute rush, but yup, I ultimately submitted it all before the deadline.

Robert Thomas:

amazingπŸ”₯

Divya:

Glad you liked it 😊

Robert Thomas:

thanks
