DEV Community

Cover image for Vergilian - The speech coach
Kjue


Vergilian - The speech coach

This is a submission for the Google AI Studio Multimodal Challenge

What I Built

I'm thrilled to introduce Vergilian, a grading game designed to change the way you learn and master new languages. This isn't just another flashcard app: Vergilian analyzes your spoken language and gives you feedback right away. Bring your own text, for example by reading a comic book aloud to it, and check that you understand the story. Once you know the story, retry your pronunciation until you get it right. You get an instant score from 0 to 100 reflecting the quality of your pronunciation, along with an English translation for clarity. You also hear an audio sample of how the sentence should sound, giving you a clear goal to aim for. No vague advice, just clear, direct feedback.

I named this app in honor of Publius Vergilius Maro (Virgil), the legendary Roman poet known for his masterful use of language. Virgil understood that true command of a language wasn't just about knowing the words; it was about speaking them with grace and power. Vergilian is built to continue his legacy, making expert pronunciation accessible to everyone.

You can replay any of your previous attempts, switch difficulty levels on the fly, and even listen back to your own voice to track your progress. It's about empowering you to take control of your language journey with a powerful, intuitive tool!

Demo

Ready to hear the difference? Dive in and try Vergilian for yourself!

Vergilian - The speech coach

Hope you enjoy it! Tell your friends too!

How to Use It:

  1. Choose Your Language: Select the language you're eager to practice.
  2. Speak Your Phrase: Say a word or phrase into the microphone. Don't worry about perfection on your first try – that's what the app is for!
  3. Get Instant Feedback: Receive your score, the English translation, and hear a perfectly pronounced version of your input.
  4. Practice & Improve: Re-record your attempt, adjust the difficulty, and listen to your progress!
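The practice loop in step 4 (re-record, change difficulty, listen back) could be backed by a small attempt history. The following is a minimal Python sketch, not Vergilian's actual code: the `Attempt` and `Session` classes, their field names, and the `"normal"` default difficulty are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    phrase: str         # text the learner tried to say
    score: int          # 0-100 score returned for this attempt
    difficulty: str     # difficulty level active when it was recorded
    audio: bytes = b""  # the learner's own recording, kept for playback


@dataclass
class Session:
    language: str
    difficulty: str = "normal"  # hypothetical default level
    attempts: List[Attempt] = field(default_factory=list)

    def record(self, phrase: str, score: int, audio: bytes = b"") -> Attempt:
        """Store a graded attempt under the current difficulty."""
        attempt = Attempt(phrase, score, self.difficulty, audio)
        self.attempts.append(attempt)
        return attempt

    def history(self, phrase: str) -> List[Attempt]:
        """All attempts at one phrase, oldest first, for replay."""
        return [a for a in self.attempts if a.phrase == phrase]

    def best(self, phrase: str) -> Optional[Attempt]:
        """Highest-scoring attempt at a phrase, or None if never tried."""
        return max(self.history(phrase), key=lambda a: a.score, default=None)

    def improvement(self, phrase: str) -> int:
        """Score change between the first and the latest attempt."""
        tries = self.history(phrase)
        return tries[-1].score - tries[0].score if tries else 0
```

Because the difficulty is copied into each `Attempt` when it is recorded, switching `session.difficulty` between tries does not rewrite earlier results, so older attempts can still be replayed in their original context.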

Main application UI

How I Used Google AI Studio

Building an app with this kind of spoken-language analysis might sound like a monumental task, but with Google AI Studio and Gemini 2.5 Pro it was an exhilarating journey! Gemini 2.5 Pro is the core intelligence behind Vergilian. I prompted it to generate the foundational code and logic for the application, focusing on its multimodal understanding and generation capabilities. This let me concentrate on the user experience while Gemini handled the heavy lifting of language processing. I worked iteratively, refining my prompts several times as my requirements for the grading system, translation, and audio generation evolved, and Gemini adapted to each change.

Multimodal Features

The "multimodal" aspect of Vergilian is what truly sets it apart, and it's where Google AI Studio's capabilities shone brightest.

  1. Speech-to-Text & Language Understanding: The app listens to your spoken input (audio), processes it, and interprets what was said. This audio input is what makes it possible to evaluate your pronunciation and translate the phrase.
  2. Text Generation (Translation & Scoring): Based on your input, the app generates text output in two forms: a precise English translation and a numerical score (0-100). This provides immediate visual feedback.
  3. Text-to-Speech (Pronunciation Model): This is perhaps the most impactful multimodal feature. The app takes the corrected text of the phrase (generated internally based on the meaning of your input) and converts it back into high-quality spoken audio. This lets you immediately compare your pronunciation with an ideal version, fostering rapid improvement.
  4. Image Generation: The cover image for this post was generated with the Gemini app, in a theme suggested by the application's name.
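The text-generation step in item 2 can be illustrated with a small Python sketch. It assumes, purely for illustration, that the model is asked to reply with a JSON object containing the keys `transcription`, `translation`, and `score`. The prompt wording, the key names, and the `Feedback`, `build_prompt`, and `parse_feedback` helpers are all hypothetical; the post does not describe the app's real prompt or response format. The actual Gemini call is deliberately left out.

```python
import json
from dataclasses import dataclass


@dataclass
class Feedback:
    transcription: str  # what the model heard
    translation: str    # English translation of the phrase
    score: int          # pronunciation score, clamped to 0-100


def build_prompt(language: str, difficulty: str) -> str:
    """Text instruction to send alongside the recorded audio (assumed wording)."""
    return (
        f"The attached audio is a learner speaking {language}. "
        f"Grade the pronunciation at {difficulty} difficulty. "
        "Reply with one JSON object with the keys transcription, "
        "translation (English) and score (integer from 0 to 100)."
    )


def parse_feedback(raw_reply: str) -> Feedback:
    """Extract the JSON object from a model reply and validate the score."""
    # Tolerate extra text around the JSON by slicing from the first "{"
    # to the last "}".
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the model reply")
    data = json.loads(raw_reply[start : end + 1])
    # Never trust the model to stay in range: clamp to 0-100.
    score = max(0, min(100, round(float(data["score"]))))
    return Feedback(str(data["transcription"]), str(data["translation"]), score)
```

Clamping the score on the client side keeps the 0-100 contract shown in the UI even if the model occasionally returns an out-of-range or non-integer value.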

These features combine to create an incredibly rich and immersive learning experience. By seamlessly integrating audio input, text analysis, and audio output, Vergilian provides a comprehensive feedback loop that is far more engaging and effective than traditional methods. It’s not just about reading or writing; it’s about truly hearing, understanding, and speaking the language with confidence!
