DEV Community

harshini hegde

Mentori: Turning Documents Into Interactive AI Tutors with Gemini Live


Disclosure: This article and the Mentori project were created for the purposes of entering the Google Gemini Live Agent Challenge hackathon.


The Problem We Wanted to Solve

Every student studies from documents: PDFs, lecture slides, research papers, technical documentation.

But the experience is usually the same.

You sit there staring at a page trying to understand a concept. If it doesn't make sense, you reread the same paragraph. And if it still doesn't make sense, you open another tab and start searching for explanations somewhere else.

That breaks the learning flow.

Modern AI assistants help a little, but they have another problem: they answer from general knowledge, not from the exact document you're studying.

That means explanations often don't match the terminology, context, or examples used in the material.

We started thinking: What if the document itself could become the tutor?
Not a chatbot. Not a search tool.
A tutor that understands the document and can teach, explain, and interview you on the material.

That's how 🐒 Mentori was born.


Meet 🐒 Mentori

Mentori turns any document into an interactive AI tutor.

Upload a textbook chapter, lecture notes, or research paper and Mentori will:

  • explain the material conversationally
  • answer questions grounded in the document
  • switch languages when needed for better understanding
  • interview you to test your knowledge

Instead of passively reading documents, students can talk to the material and learn interactively.

Mentori uses Retrieval-Augmented Generation (RAG) with Gemini models: relevant passages are retrieved from the uploaded document and passed to the model, so answers stay grounded in that document.

And with the Gemini Live API, the interaction becomes natural: students can speak to the tutor and receive responses in real time.


Why This Matters

Learning from documents is still the backbone of education.

But the process is inefficient:

  • Students reread sections repeatedly
  • Concepts remain unclear
  • Questions interrupt learning flow
  • Understanding is rarely tested properly

🐒 Mentori addresses these problems directly.

Passive Reading

Most studying is passive.
Mentori transforms reading into interactive conversation-based learning.

Context Loss

Traditional AI assistants don't know your document.
Mentori uses RAG so responses always come from the uploaded material.

No Feedback Loop

Reading alone doesn't confirm understanding.
Mentori's Interview Mode actively tests comprehension.


The Two Core Experiences

🐒 Mentori focuses on two key learning workflows.


Conversational Learning

After a document is uploaded, Mentori analyzes it and creates a structured learning session.

Instead of expecting the student to read everything alone, Mentori walks through the material and explains concepts step-by-step.

The interaction happens through real-time voice conversation powered by Gemini Live API.

Students can:

  • interrupt the tutor
  • ask follow-up questions
  • request clarification
  • ask for explanations in another language

For example:

"I didn't understand that. Can you explain it in Spanish?"

Mentori will immediately switch languages while still explaining the concept based on the document context.

The experience feels much closer to learning with a real tutor.


Interview Mode

🐒 Mentori also includes an Interview Mode designed to test understanding.

After processing the document, Mentori generates a curated set of important questions from the material.

The experience works like a real interview.

Mentori asks a question and the student answers using voice or text.

If the answer is incomplete or incorrect, Mentori guides the student toward the correct reasoning instead of immediately revealing the answer.

At the end of each question Mentori provides:

  • the correct answer
  • feedback on the response
  • suggestions for improvement

This turns studying into active knowledge reinforcement.
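The hint-before-answer flow described above can be sketched as a small state object. This is a minimal illustration, not Mentori's actual implementation: the `InterviewQuestion` class, its `key_terms` field, and the keyword-based check are all hypothetical stand-ins (a real system would presumably let the model judge the answer).

```python
from dataclasses import dataclass, field


@dataclass
class InterviewQuestion:
    """One interview question with staged hints (hypothetical structure)."""
    prompt: str
    answer: str
    key_terms: list[str]                      # hypothetical grading aid
    hints: list[str] = field(default_factory=list)
    attempts: int = 0

    def missing_terms(self, response: str) -> list[str]:
        # Naive substring check standing in for model-based grading.
        text = response.lower()
        return [t for t in self.key_terms if t.lower() not in text]

    def respond(self, response: str) -> dict:
        """Give a hint while the answer is incomplete and hints remain;
        otherwise close the question with answer, feedback, and a suggestion."""
        self.attempts += 1
        missing = self.missing_terms(response)
        if missing and self.attempts <= len(self.hints):
            return {"done": False, "hint": self.hints[self.attempts - 1]}
        return {
            "done": True,
            "correct_answer": self.answer,
            "feedback": "Complete." if not missing
            else f"Missing: {', '.join(missing)}.",
            "suggestion": None if not missing
            else "Review the sections covering the missing terms.",
        }
```

The point of the design is that the correct answer is only included once the question is closed, so an incomplete first attempt gets guidance rather than the solution.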


How 🐒 Mentori Works

Mentori is built as a real-time AI learning platform on Google Cloud combining document retrieval, conversational AI, and voice interaction.


Architecture Overview


Mentori Architecture: RAG-powered document tutoring using Gemini 2.5 Flash for reasoning and Gemini Live API for real-time conversational learning.

At a high level, Mentori combines document processing, vector retrieval, and live AI interaction to create a responsive tutoring experience.

The system consists of several key components:

  • React Frontend (Firebase Hosting): Handles document uploads, learning sessions, and real-time interaction with the AI tutor.

  • Cloud Run Services (Python / FastAPI): Backend APIs responsible for document ingestion, session management, and AI orchestration.

  • Gemini Live API: Enables low-latency conversational interaction and streaming voice responses.

  • Gemini 2.5 Flash: Generates grounded explanations using retrieved document context.

  • Vertex AI Embeddings: Converts document chunks into semantic vector representations.

  • Vertex AI Vector Search: Retrieves relevant sections of the document during tutoring conversations.

  • Cloud Storage: Stores uploaded documents.

  • Firestore: Maintains session metadata and document references.

  • WebSockets: Enable real-time communication between the frontend and AI services.
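To make the WebSocket layer a little more concrete, here is one way the messages between frontend and backend could be shaped. Everything in this sketch is an assumption for illustration: the `TutorMessage` class, the field names, and the four message types are hypothetical and are not taken from the Mentori codebase.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical message types covering the interactions described in this post.
ALLOWED_TYPES = {"question", "interrupt", "set_language", "answer_chunk"}


@dataclass(frozen=True)
class TutorMessage:
    """A JSON envelope exchanged over the socket (illustrative only)."""
    type: str
    session_id: str
    payload: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(raw: str) -> "TutorMessage":
        data = json.loads(raw)
        # Reject unknown types early so handlers only see valid messages.
        if data.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"unknown message type: {data.get('type')!r}")
        return TutorMessage(data["type"], data["session_id"], data.get("payload", ""))
```

A small, explicit envelope like this keeps "interrupt" and "switch language" as first-class events instead of burying them in free text.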


Document Processing (RAG Pipeline)

When a document is uploaded, Mentori processes it through an ingestion pipeline.

The system:

  1. stores the document in Cloud Storage
  2. splits it into semantic chunks
  3. generates embeddings using Vertex AI Embeddings
  4. indexes them in Vertex AI Vector Search

This creates a searchable knowledge base for the tutor.

The document is processed only once, and the session is stored so users can return later without uploading the document again.
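The ingestion steps above can be sketched in plain Python. This is a toy version under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in for Vertex AI Embeddings, `InMemoryIndex` is a stand-in for Vertex AI Vector Search, and using a content hash to skip re-processing is our guess at one way to achieve "processed only once". None of these names come from the project itself.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Split on blank lines, then pack paragraphs into chunks of roughly
    max_chars. A single oversized paragraph becomes its own chunk."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dims
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryIndex:
    """Stand-in for a vector index, keyed by a hash of the document text."""
    entries: dict[str, list[tuple[str, list[float]]]] = field(default_factory=dict)

    def ingest(self, document: str) -> tuple[str, bool]:
        """Return (doc_id, was_processed); a known document is not re-processed."""
        doc_id = hashlib.sha256(document.encode()).hexdigest()[:16]
        if doc_id in self.entries:
            return doc_id, False
        self.entries[doc_id] = [(c, toy_embed(c)) for c in chunk_text(document)]
        return doc_id, True
```

Calling `ingest` twice with the same text returns the same `doc_id` and skips the chunking and embedding work the second time.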


Real-Time Learning Interaction

When students interact with the tutor:

  1. Questions arrive through WebSockets
  2. Relevant document sections are retrieved from Vector Search
  3. Context is sent to Gemini 2.5 Flash
  4. Responses are streamed back through Gemini Live API

This enables low-latency conversational tutoring grounded in the document.
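Steps 2 and 3 of this flow (retrieve, then send context to the model) can be illustrated without any cloud services. This is a rough sketch, assuming a simple word-overlap cosine similarity in place of real embeddings and Vector Search; the `retrieve` and `build_prompt` helpers and the prompt wording are hypothetical, and the actual model call is left out.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words counts, standing in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = bow(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, bow(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str], language: str = "English") -> str:
    """Assemble a grounded prompt; the response language is a parameter so a
    mid-session language switch only changes one argument."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "You are a tutor. Answer using only the excerpts below. "
        "If they do not contain the answer, say so.\n"
        f"Respond in {language}.\n\n"
        f"Excerpts:\n{numbered}\n\n"
        f"Student question: {question}"
    )
```

Keeping the language as a prompt parameter means the retrieved context stays the same when a student asks for an explanation in another language; only the instruction changes.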


Technology Stack

Layer | Technology
Frontend | React
Backend | Python + FastAPI
AI Models | Gemini 2.5 Flash + Gemini Live API
Retrieval | Vertex AI Embeddings + Vector Search
Storage | Cloud Storage
Database | Firestore
Hosting | Google Cloud Run
Frontend Hosting | Firebase Hosting
Infrastructure | Terraform
CI/CD | GitHub + GitHub Actions

Lessons Learned

Context matters

AI responses become far more useful when grounded in the exact material a user is studying.

Voice interaction changes the learning experience

Talking to a tutor is much more natural than typing questions.

RAG improves reliability

Retrieval ensures responses stay aligned with the document instead of drifting into generic AI explanations.

Architecture matters

Combining retrieval, AI reasoning, and live interaction requires careful design, but when done well it creates powerful learning experiences.


What's Next

🐒 Mentori is just getting started.

Future improvements include:

  • learning progress tracking
  • personalized learning paths
  • diagram and chart understanding
  • collaborative study sessions
  • integration with learning platforms

The vision is simple:

every document should become a personalized learning experience.


Try It

Project repository
https://github.com/HarshiniHegde/Mentrova

Demo video


👥 Creators

Mentori was built by:

Harshini Hegde

🔗 LinkedIn: https://www.linkedin.com/in/harshini-hegde-9806797a/

Rishi Muruganandha

🔗 LinkedIn: https://www.linkedin.com/in/rishi-muruganandha/


Final Thoughts

We often think about AI as a tool that answers questions.
But in education, the real opportunity is something different.
Not an AI that answers questions.
An AI that teaches.
🐒 Mentori is a small step in that direction.
