
Amine ZAMANI


✨Vocabia: Multilingual Story-Based Vocabulary Learning with Gemini 2.5 Flash & Imagen 4.0

What I Built

I built Vocabia, an app that helps Primary, Middle School, and High School students enrich their vocabulary in three languages: English, French, and Spanish.

Instead of passively reading, learners actively rebuild stories sentence by sentence:

Each sentence is broken down into words.

Learners must connect the words in the correct order.

They can flip any word to see its translation in the other two languages.

If the word represents a physical object, they can view a flashcard-style image of it.

A timer adds challenge and motivation.

Once the story is completed, Vocabia reads it aloud and generates a single illustration that represents the whole narrative.

Vocabia transforms vocabulary building into an interactive, visual, and multilingual journey.
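To make the rebuild step concrete, here is a minimal sketch of how a sentence could be scrambled and a learner's attempt checked. The helper names `shuffleWords` and `isCorrectOrder` are hypothetical and only illustrate the idea; they are not taken from Vocabia's code.

```typescript
// Hypothetical helpers for illustration only.

/** Fisher-Yates shuffle that returns a new array; `rng` can be injected for tests. */
function shuffleWords(words: string[], rng: () => number = Math.random): string[] {
  const out = [...words];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

/** True when the learner's chain of words matches the target sentence exactly. */
function isCorrectOrder(target: string[], attempt: string[]): boolean {
  return target.length === attempt.length && target.every((word, i) => word === attempt[i]);
}

// Usage: scramble a sentence, then validate the order the learner connected.
const target = ["The", "cat", "sleeps"];
const scrambled = shuffleWords(target);
console.log(scrambled, isCorrectOrder(target, ["The", "cat", "sleeps"]));
```

Comparing by position rather than by set membership matters here, because sentences with repeated words would otherwise accept wrong orders.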

Demo

Here’s a walkthrough of Vocabia in action:

Choose a difficulty level (Primary / Middle School / High School).

Choose a language (English / Français / Español).

Connect the word nodes to form sentences.

Flip words to see translations in the other two languages.

If a word is an object, view its flashcard-style image.

Complete all sentences before time runs out.

Get the narrated story + a generated illustration as a reward.

Visual presentation

Word linking (React Flow graph)

Flip-to-translate feature

Object image pop-up

Confetti on success

Final story narration + illustration

Video Demo 🎥

Live demo 🚀

How I Used Google AI Studio

I leveraged Google AI Studio to integrate Gemini’s multimodal capabilities into Vocabia:

Story generation (Gemini 2.5 Flash) → Generates stories in structured JSON format (sentences → words → translations → object flag).

Per-word translation & classification → For each word, Gemini provides translations in the two other languages and identifies whether it’s a visualizable object.

Per-word images (Imagen 4.0) → Generates simple flashcard-like images of objects (clean white background, no text).

Final story illustration (Imagen 4.0) → Generates a single, colorful illustration summarizing the entire story.

Code generation & debugging → Gemini helped scaffold React components, manage state transitions, and integrate React Flow.

This mix of AI-driven structured outputs and image generation made Vocabia both engaging for learners and efficient to build.
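As an illustration of the structured output described above (sentences → words → translations → object flag), the sketch below defines one possible shape for the story JSON and validates a model response before the UI uses it. The field names (`language`, `sentences`, `words`, `text`, `translations`, `isObject`) and the language codes are assumptions; the post does not show the actual schema.

```typescript
// Assumed schema for illustration; the real field names may differ.
const LANGUAGES = ["en", "fr", "es"] as const;
type Language = (typeof LANGUAGES)[number];

interface StoryWord {
  text: string;
  translations: Partial<Record<Language, string>>; // the two other languages
  isObject: boolean; // true when the word can be shown as a flashcard image
}
interface StorySentence { words: StoryWord[]; }
interface Story { language: Language; sentences: StorySentence[]; }

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}
function isLanguage(v: unknown): v is Language {
  return typeof v === "string" && (LANGUAGES as readonly string[]).includes(v);
}

/** Parses the model's JSON text and throws a descriptive error if the shape is wrong. */
function parseStory(raw: string): Story {
  const data: unknown = JSON.parse(raw);
  if (!isRecord(data)) throw new Error("Story must be a JSON object");
  const { language, sentences } = data;
  if (!isLanguage(language) || !Array.isArray(sentences)) {
    throw new Error("Story needs a known language and a sentences array");
  }
  return {
    language,
    sentences: sentences.map((s: unknown, si: number) => {
      if (!isRecord(s) || !Array.isArray(s.words)) {
        throw new Error(`Sentence ${si}: missing words array`);
      }
      const words = s.words.map((w: unknown, wi: number): StoryWord => {
        if (!isRecord(w)) throw new Error(`Sentence ${si}, word ${wi}: not an object`);
        const { text, isObject, translations } = w;
        if (typeof text !== "string" || typeof isObject !== "boolean" || !isRecord(translations)) {
          throw new Error(`Sentence ${si}, word ${wi}: malformed word`);
        }
        const out: Partial<Record<Language, string>> = {};
        for (const lang of LANGUAGES) {
          if (lang === language) continue; // only the two other languages are required
          const t = translations[lang];
          if (typeof t !== "string") {
            throw new Error(`Sentence ${si}, word ${wi}: missing ${lang} translation`);
          }
          out[lang] = t;
        }
        return { text, translations: out, isObject };
      });
      return { words };
    }),
  };
}
```

Validating up front like this keeps a malformed response from reaching the word graph, where a missing translation would otherwise only surface when a learner flips a card.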

Multimodal Features

Vocabia enhances the learning experience through multimodality:

Text → Structured JSON: Gemini generates coherent stories broken down word by word.

Text → Image: Each object word gets a visual flashcard image, reinforcing memory.

Text → Full Story Illustration: Completed stories are turned into illustrations to reward learners.

Text-to-Speech: Stories are read aloud for pronunciation and listening practice.

By combining language learning, storytelling, visuals, and narration, Vocabia delivers an immersive, playful, and multilingual experience.
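The two image steps differ mainly in the prompt they send. The sketch below shows one way those prompts could be assembled from the constraints mentioned earlier (clean white background and no text for flashcards, a single colorful illustration for the full story). The exact wording and the function names `flashcardPrompt` and `storyIllustrationPrompt` are assumptions, not the prompts Vocabia actually uses.

```typescript
// Assumed prompt wording; only the constraints come from the post.

/** Prompt for a per-word flashcard image. */
function flashcardPrompt(word: string): string {
  return `A simple flashcard-style picture of "${word.trim()}", centered on a clean white background, no text or letters.`;
}

/** Prompt for the single illustration shown once the story is complete. */
function storyIllustrationPrompt(sentences: string[]): string {
  const story = sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join(" ");
  return `A single colorful illustration that summarizes this story: ${story}`;
}

// Usage: the resulting strings would be passed to whichever image-generation call the app uses.
console.log(flashcardPrompt("apple"));
console.log(storyIllustrationPrompt(["A cat finds an apple.", "It rolls away."]));
```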
