DEV Education Track Submission Draft
What I Built
I built VoxVerify AI, a web application that analyzes audio samples and detects whether a voice is AI-generated or human using Google Gemini and OpenAI audio models.
The app allows users to:
- Record audio in real time
- Upload audio files
- Analyze speech across multiple languages
- View forensic metrics and confidence scores
To generate parts of the application and analysis logic, I used prompt-based audio analysis workflows in Google AI Studio, focusing on:
- Speech naturalness detection
- Pitch and prosody evaluation
- Structured output for confidence scoring
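The workflow above can be sketched as a request-body builder for the Gemini `generateContent` REST endpoint: one audio part plus a text prompt that asks for JSON only. The prompt wording, field names, and model configuration here are illustrative assumptions, not VoxVerify's actual prompts.

```typescript
// Sketch: a generateContent request body that pairs an audio clip with a
// prompt asking Gemini for structured JSON output. Prompt text and result
// fields are illustrative assumptions, not the app's real prompts.

interface GeminiRequestBody {
  contents: Array<{
    parts: Array<
      | { text: string }
      | { inlineData: { mimeType: string; data: string } }
    >;
  }>;
  generationConfig: { responseMimeType: string };
}

const ANALYSIS_PROMPT = [
  "Analyze this audio sample and decide whether the voice is AI-generated or human.",
  "Respond with JSON only, using this shape:",
  '{ "verdict": "ai" | "human", "confidence": number between 0 and 1, "explanation": string }',
].join("\n");

function buildAnalysisRequest(base64Audio: string, mimeType: string): GeminiRequestBody {
  return {
    contents: [
      {
        parts: [
          { inlineData: { mimeType, data: base64Audio } },
          { text: ANALYSIS_PROMPT },
        ],
      },
    ],
    // Ask the API for JSON output rather than free-form prose.
    generationConfig: { responseMimeType: "application/json" },
  };
}
```

Keeping the prompt and the expected JSON shape in one place makes it easier to evolve the scoring fields without breaking the parser downstream.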
I also implemented a visual dashboard to display forensic audio metrics such as spectral purity, prosodic fluidity, and pitch variance.
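The metric names above come from the dashboard; how they are scaled for display is not shown in the post, so the 0–100 clamping below is an assumed presentation step, sketched for illustration.

```typescript
// Sketch: clamping raw forensic metrics into a 0-100 display range for the
// dashboard. Metric names are from the app; the scaling is an assumption.

interface ForensicMetrics {
  spectralPurity: number;   // assumed 0-1 from the model
  prosodicFluidity: number; // assumed 0-1
  pitchVariance: number;    // assumed 0-1
}

function toDisplayScore(value: number): number {
  // Clamp to [0, 1] first so out-of-range model output cannot break the chart.
  const clamped = Math.min(1, Math.max(0, value));
  return Math.round(clamped * 100);
}

function toDashboardRows(m: ForensicMetrics): Array<{ label: string; score: number }> {
  return [
    { label: "Spectral purity", score: toDisplayScore(m.spectralPurity) },
    { label: "Prosodic fluidity", score: toDisplayScore(m.prosodicFluidity) },
    { label: "Pitch variance", score: toDisplayScore(m.pitchVariance) },
  ];
}
```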
Demo
Features demonstrated:
- Real-time audio recording
- Upload and analysis workflow
- AI-generated vs Human classification
- Confidence scoring and explanations
- Metrics visualization dashboard
Screenshots:
- Recording interface
- Results dashboard
- Confidence and explanation output
Live Demo: https://voxverify-ai-voice-detection.vercel.app/
GitHub Repo: https://github.com/YOGARATHNAM-S/voxverify-ai-voice-detection.git
My Experience
Working through the Build Apps with Google AI Studio track was a valuable learning experience.
Key Takeaways
1. Prompt Engineering for Structured Outputs
I learned how to design prompts that return structured and interpretable results instead of plain text responses. This was important for generating metrics and confidence scores.
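A minimal sketch of what "structured and interpretable" means in practice: validating the model's JSON reply into a typed result before it reaches the UI. The field names are assumptions for illustration.

```typescript
// Sketch: validating the model's JSON reply into a typed result instead of
// trusting free-form text. Field names are illustrative assumptions.

interface AnalysisResult {
  verdict: "ai" | "human";
  confidence: number; // expected in [0, 1]
  explanation: string;
}

function parseAnalysisResult(raw: string): AnalysisResult {
  const data = JSON.parse(raw) as Partial<AnalysisResult>;
  if (data.verdict !== "ai" && data.verdict !== "human") {
    throw new Error(`Unexpected verdict: ${String(data.verdict)}`);
  }
  if (typeof data.confidence !== "number" || data.confidence < 0 || data.confidence > 1) {
    throw new Error("Confidence must be a number in [0, 1]");
  }
  return {
    verdict: data.verdict,
    confidence: data.confidence,
    explanation: typeof data.explanation === "string" ? data.explanation : "",
  };
}
```

Rejecting malformed replies at this boundary keeps bad model output from silently producing wrong metrics or confidence scores.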
2. Multimodal AI Capabilities
It was interesting to see how Gemini can process not only text but also audio inputs, enabling real-world applications like speech analysis.
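Gemini accepts audio as base64-encoded inline data sent alongside text parts. In the browser the bytes would come from a recorded Blob via FileReader; the Node version below uses Buffer purely to illustrate the same part shape.

```typescript
// Sketch: packaging raw audio bytes as a base64 inlineData part, the shape
// Gemini expects for audio input. A browser app would obtain the bytes from
// a recorded Blob; Buffer is used here only for illustration.

function toInlineAudioPart(bytes: Uint8Array, mimeType: string) {
  return {
    inlineData: {
      mimeType,                                    // e.g. "audio/wav" or "audio/webm"
      data: Buffer.from(bytes).toString("base64"), // raw samples -> base64
    },
  };
}
```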
3. Integrating AI into Real Applications
One of the most useful lessons was understanding how to integrate AI APIs into a full-stack application using:
- React
- TypeScript
- REST API workflows
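To illustrate the REST workflow from that stack, here is a thin typed client for a hypothetical `/api/analyze` backend route; the route name and response shape are assumptions, and the fetch function is injected so the wrapper can be unit-tested with a stub.

```typescript
// Sketch: a thin typed client for a hypothetical /api/analyze backend route.
// The route and response shape are assumptions; fetch is injected so the
// wrapper can be exercised with a stub instead of a live server.

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function analyzeAudio(
  base64Audio: string,
  mimeType: string,
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl("/api/analyze", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ audio: base64Audio, mimeType }),
  });
  if (!res.ok) {
    throw new Error(`Analysis request failed with status ${res.status}`);
  }
  return res.json();
}
```

Routing the Gemini call through a backend endpoint like this also keeps the API key out of the browser bundle.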
4. UX Matters in AI Apps
Presenting AI results clearly through charts and explanations significantly improves usability and trust.
What Surprised Me
What surprised me most was how effective AI models can be at identifying subtle speech characteristics like:
- Prosody patterns
- Pitch irregularities
- Acoustic consistency
This opened my perspective on how AI can assist in digital forensics and misinformation detection.
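The model performs the actual analysis, but the kind of statistic behind a "pitch irregularity" signal can be as simple as the variance of frame-level pitch estimates; this sketch is illustrative, not VoxVerify's implementation.

```typescript
// Sketch: population variance of frame-level pitch estimates (in Hz).
// Synthetic voices often show unusually low pitch variance; this is an
// illustrative statistic, not the app's actual detection logic.

function pitchVariance(pitchesHz: number[]): number {
  if (pitchesHz.length === 0) return 0;
  const mean = pitchesHz.reduce((a, b) => a + b, 0) / pitchesHz.length;
  const squaredDeviations = pitchesHz.reduce((a, b) => a + (b - mean) ** 2, 0);
  return squaredDeviations / pitchesHz.length;
}
```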
What I'd Improve Next
If I continue developing this project, I plan to add:
- Real-time streaming detection
- Speaker fingerprinting
- Model-assisted waveform and MFCC visualization
- Batch audio analysis
Closing Thoughts
This track helped me understand how to move from:
Experimenting with AI → Building real applications with AI
That shift from experimentation to engineering was the most valuable part of the experience.