Disclosure: This article and the Mentori project were created for the purposes of entering the Google Gemini Live Agent Challenge hackathon.
The Problem We Wanted to Solve
Every student studies from documents: PDFs, lecture slides, research papers, technical documentation.
But the experience is usually the same.
You sit there staring at a page trying to understand a concept. If it doesn't make sense, you reread the same paragraph again. And if it still doesn't make sense, you open another tab and start searching for explanations somewhere else.
That breaks the learning flow.
Modern AI assistants help a little, but they have another problem: they answer from general knowledge, not from the exact document you're studying.
That means explanations often don't match the terminology, context, or examples used in the material.
We started thinking: what if the document itself could become the tutor?
Not a chatbot. Not a search tool.
A tutor that understands the document and can teach, explain, and interview you on the material.
That's how Mentori was born.
Meet Mentori
Mentori turns any document into an interactive AI tutor.
Upload a textbook chapter, lecture notes, or research paper and Mentori will:
- explain the material conversationally
- answer questions grounded in the document
- switch languages when needed for better understanding
- interview you to test your knowledge
Instead of passively reading documents, students can talk to the material and learn interactively.
Mentori uses Retrieval-Augmented Generation (RAG) with Gemini models to keep answers grounded in the uploaded document.
And with the Gemini Live API, the interaction becomes natural: students can speak to the tutor and receive responses in real time.
Why This Matters
Learning from documents is still the backbone of education.
But the process is inefficient:
- Students reread sections repeatedly
- Concepts remain unclear
- Questions interrupt learning flow
- Understanding is rarely tested properly
Mentori addresses these problems directly.
Passive Reading
Most studying is passive.
Mentori transforms reading into interactive conversation-based learning.
Context Loss
Traditional AI assistants don't know your document.
Mentori uses RAG so responses always come from the uploaded material.
No Feedback Loop
Reading alone doesn't confirm understanding.
Mentori's Interview Mode actively tests comprehension.
The Two Core Experiences
Mentori focuses on two key learning workflows.
Conversational Learning
After a document is uploaded, Mentori analyzes it and creates a structured learning session.
Instead of expecting the student to read everything alone, Mentori walks through the material and explains concepts step-by-step.
The interaction happens through real-time voice conversation powered by Gemini Live API.
Students can:
- interrupt the tutor
- ask follow-up questions
- request clarification
- ask for explanations in another language
For example:
"I didn't understand that. Can you explain it in Spanish?"
Mentori will immediately switch languages while still explaining the concept based on the document context.
The experience feels much closer to learning with a real tutor.
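One way to picture the language switch is as a change to the tutor's instruction while the retrieved document excerpts stay the same. The sketch below is only an illustration under that assumption: `TutorSession`, `set_language`, and `instruction` are hypothetical names, and how Mentori actually detects and applies a language request is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TutorSession:
    """Hypothetical session state: document excerpts plus the current language."""
    passages: List[str]
    language: str = "English"

    def set_language(self, language: str) -> None:
        # Only the output language changes; the document context is untouched.
        self.language = language

    def instruction(self) -> str:
        """Build the instruction text that would accompany the next explanation."""
        context = "\n".join(f"- {p}" for p in self.passages)
        return (
            f"You are a tutor. Explain in {self.language}.\n"
            "Base every explanation only on these document excerpts:\n"
            f"{context}"
        )


if __name__ == "__main__":
    session = TutorSession(["Mitochondria produce ATP."])
    session.set_language("Spanish")
    print(session.instruction())
```

Keeping the excerpts separate from the language setting means a mid-conversation switch does not require retrieving the context again.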
Interview Mode
Mentori also includes an Interview Mode designed to test understanding.
After processing the document, Mentori generates a curated set of important questions from the material.
The experience works like a real interview.
Mentori asks a question and the student answers using voice or text.
If the answer is incomplete or incorrect, Mentori guides the student toward the correct reasoning instead of immediately revealing the answer.
At the end of each question Mentori provides:
- the correct answer
- feedback on the response
- suggestions for improvement
This turns studying into active knowledge reinforcement.
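The hint-before-reveal flow can be sketched as a small state machine. This is a minimal illustration, not Mentori's implementation: the `Question`, `Turn`, and `InterviewQuestion` names are hypothetical, and the keyword check is a simple stand-in for whatever answer evaluation the tutor really performs (presumably done by the model).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    prompt: str
    answer: str
    # Maps a key phrase a complete answer should mention to a guiding hint.
    key_points: Dict[str, str]


@dataclass
class Turn:
    done: bool
    hint: Optional[str] = None
    correct_answer: Optional[str] = None
    feedback: Optional[str] = None
    suggestion: Optional[str] = None


@dataclass
class InterviewQuestion:
    question: Question
    max_attempts: int = 2
    attempts: int = 0

    def submit(self, response: str) -> Turn:
        """Guide the student first; reveal the answer only when the question ends."""
        self.attempts += 1
        text = response.lower()
        missing: List[str] = [p for p in self.question.key_points if p not in text]
        if missing and self.attempts < self.max_attempts:
            # Incomplete answer with attempts left: give a hint, not the answer.
            return Turn(done=False, hint=self.question.key_points[missing[0]])
        total = len(self.question.key_points)
        covered = total - len(missing)
        suggestion = (
            "Review: " + ", ".join(missing)
            if missing
            else "Try explaining it in your own words."
        )
        return Turn(
            done=True,
            correct_answer=self.question.answer,
            feedback=f"You covered {covered} of {total} key points.",
            suggestion=suggestion,
        )
```

A question ends either when every key point is covered or when the attempts run out, and only then does the result carry the correct answer, feedback, and a suggestion.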
How Mentori Works
Mentori is built as a real-time AI learning platform on Google Cloud, combining document retrieval, conversational AI, and voice interaction.
Architecture Overview
Mentori Architecture: RAG-powered document tutoring using Gemini 2.5 Flash for reasoning and the Gemini Live API for real-time conversational learning.
At a high level, Mentori combines document processing, vector retrieval, and live AI interaction to create a responsive tutoring experience.
The system consists of several key components:
- React Frontend (Firebase Hosting): handles document uploads, learning sessions, and real-time interaction with the AI tutor.
- Cloud Run Services (Python / FastAPI): backend APIs responsible for document ingestion, session management, and AI orchestration.
- Gemini Live API: enables low-latency conversational interaction and streaming voice responses.
- Gemini 2.5 Flash: generates grounded explanations using retrieved document context.
- Vertex AI Embeddings: converts document chunks into semantic vector representations.
- Vertex AI Vector Search: retrieves relevant sections of the document during tutoring conversations.
- Cloud Storage: stores uploaded documents.
- Firestore: maintains session metadata and document references.
- WebSockets: enable real-time communication between the frontend and AI services.
Document Processing (RAG Pipeline)
When a document is uploaded, Mentori processes it through an ingestion pipeline.
The system:
- stores the document in Cloud Storage
- splits it into semantic chunks
- generates embeddings using Vertex AI Embeddings
- indexes them in Vertex AI Vector Search
This creates a searchable knowledge base for the tutor.
The document is processed only once, and the session is stored so users can return later without uploading the document again.
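The ingestion steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not the project's code: a bag-of-words vector stands in for Vertex AI Embeddings, an in-memory list stands in for Vertex AI Vector Search, a dictionary keyed by content hash stands in for the stored session, and every function name here is hypothetical.

```python
import hashlib
import math
import re
from typing import Dict, List, Tuple

Vector = Dict[str, float]
Index = List[Tuple[str, Vector]]


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Group paragraphs into chunks of roughly max_chars characters."""
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current} {para}".strip()
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Vector:
    """Toy stand-in for an embedding model: L2-normalised word counts."""
    counts: Vector = {}
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        counts[word] = counts.get(word, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {w: v / norm for w, v in counts.items()}


def similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two normalised sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


# Stand-in for persisted state; a real system would use a database and index.
_processed: Dict[str, Index] = {}


def ingest(document: str, max_chars: int = 200) -> Tuple[str, Index]:
    """Chunk, embed, and index a document once, keyed by its content hash."""
    doc_id = hashlib.sha256(document.encode("utf-8")).hexdigest()[:16]
    if doc_id not in _processed:
        _processed[doc_id] = [(c, embed(c)) for c in chunk_text(document, max_chars)]
    return doc_id, _processed[doc_id]


def search(index: Index, query: str, k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

Calling `ingest` a second time with the same text returns the cached index instead of reprocessing, which mirrors the process-once behaviour described above.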
Real-Time Learning Interaction
When students interact with the tutor:
- Questions arrive through WebSockets
- Relevant document sections are retrieved from Vector Search
- Context is sent to Gemini 2.5 Flash
- Responses are streamed back through Gemini Live API
This enables low-latency conversational tutoring grounded in the document.
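The four steps above can be sketched as a small asynchronous handler. This is a rough outline under assumptions: `retrieve`, `generate`, and `send` are hypothetical callables standing in for Vector Search, the Gemini call, and the WebSocket connection, and `stub_model_stream` simply echoes the first excerpt rather than calling any model.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved excerpts and the question into one grounded prompt."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the document excerpts below.\n"
        f"Excerpts:\n{context}\n"
        f"Question: {question}"
    )


async def stub_model_stream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming model: echoes the first excerpt."""
    _, _, rest = prompt.partition("[1] ")
    first_excerpt = rest.split("\n", 1)[0]
    for word in first_excerpt.split():
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield word + " "


async def handle_question(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], AsyncIterator[str]],
    send: Callable[[str], Awaitable[None]],
) -> None:
    """Retrieve context, build the prompt, and forward each piece as it arrives."""
    passages = retrieve(question)
    prompt = build_prompt(question, passages)
    async for piece in generate(prompt):
        await send(piece)


async def demo() -> str:
    sent: List[str] = []

    async def send(piece: str) -> None:
        sent.append(piece)  # a real handler would write to the socket here

    await handle_question(
        "What do mitochondria do?",
        retrieve=lambda q: ["Mitochondria produce ATP."],
        generate=stub_model_stream,
        send=send,
    )
    return "".join(sent)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Forwarding each piece as soon as it is produced, instead of waiting for the full answer, is what keeps the perceived latency low.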
Technology Stack
| Layer | Technology |
|---|---|
| Frontend | React |
| Backend | Python + FastAPI |
| AI Models | Gemini 2.5 Flash + Gemini Live API |
| Retrieval | Vertex AI Embeddings + Vector Search |
| Storage | Cloud Storage |
| Database | Firestore |
| Hosting | Google Cloud Run |
| Frontend Hosting | Firebase Hosting |
| Infrastructure | Terraform |
| CI/CD | GitHub + GitHub Actions |
Lessons Learned
Context matters
AI responses become far more useful when grounded in the exact material a user is studying.
Voice interaction changes the learning experience
Talking to a tutor is much more natural than typing questions.
RAG improves reliability
Retrieval ensures responses stay aligned with the document instead of drifting into generic AI explanations.
Architecture matters
Combining retrieval, AI reasoning, and live interaction requires careful design, but when done well it creates powerful learning experiences.
What's Next
Mentori is just getting started.
Future improvements include:
- learning progress tracking
- personalized learning paths
- diagram and chart understanding
- collaborative study sessions
- integration with learning platforms
The vision is simple:
every document should become a personalized learning experience.
Try It
Project repository
https://github.com/HarshiniHegde/Mentrova
Demo video
Creators
Mentori was built by:
Harshini Hegde
LinkedIn: https://www.linkedin.com/in/harshini-hegde-9806797a/
Rishi Muruganandha
LinkedIn: https://www.linkedin.com/in/rishi-muruganandha/
Final Thoughts
We often think about AI as a tool that answers questions.
But in education, the real opportunity is something different.
Not an AI that answers questions.
An AI that teaches.
Mentori is a small step in that direction.

