This is a submission for the Google AI Studio Multimodal Challenge
What I Built
LearnSphere AI is a comprehensive, multimodal AI learning companion that transforms traditional education from passive consumption into an interactive, personalized learning experience. Built on Google AI Studio and deployed on Cloud Run, it leverages Gemini's powerful multimodal capabilities to create a complete learning ecosystem that adapts to each student's academic level, curriculum, and learning preferences.
The platform acts as an intelligent personal tutor, seamlessly integrating image understanding, document processing, and content generation to provide students & learners with tailored educational materials and study tools.
Features
Intelligent Syllabus Processing
- Image-to-Structure Conversion: Upload syllabus photos; gemini-2.5-flash extracts all units and topics into structured JSON.
- Manual Entry Fallback: Structured text input with automatic parsing for accessibility.
- Smart Organization: AI automatically categorizes and structures course content.
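The post does not describe the text format that the manual entry fallback accepts, so the following is only a sketch under an assumed convention: a line ending in a colon opens a unit, and every other non-empty line is a topic of the current unit. `parse_manual_syllabus` is a hypothetical helper name, and the output mirrors the units-and-topics structure that the image path is said to produce.

```python
def parse_manual_syllabus(text):
    """Turn typed syllabus text into a units/topics structure.

    Assumed input convention (not taken from the app): a line ending in ':'
    starts a new unit; other non-empty lines are topics of the current unit.
    """
    units = []
    for raw_line in text.splitlines():
        # Drop surrounding whitespace and any leading list markers.
        line = raw_line.strip().lstrip("-*").strip()
        if not line:
            continue
        if line.endswith(":"):
            units.append({"title": line[:-1].strip(), "topics": []})
        elif units:
            units[-1]["topics"].append(line)
        else:
            # Topics typed before any unit heading go into a default unit.
            units.append({"title": "General", "topics": [line]})
    return {"units": units}
```

Returning the same shape for typed and photographed syllabi would let the rest of the app treat both entry paths identically.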
Active Learning Tools
- Interactive Quiz Engine: Auto-generated quizzes with SWOT analysis and personalized study suggestions.
- Flashcard System: Spaced-repetition testing with mastery tracking. In testing mode, the same flashcards reappear after a certain number of other cards until the user has truly mastered them.
- Feynman Technique Module: AI-powered evaluation of the user's own explanations to test deep understanding. The premise is that anyone who truly understands a topic can explain it clearly to anyone, even a young child, however complex it is.
- Progress Tracking: Visual progress indicators in flashcards.
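The flashcard testing mode described above can be pictured as a queue in which a card comes back a few positions later. This is a minimal sketch, not the app's code: it assumes that only cards the learner misses are re-queued, that one correct answer counts as mastery, and that the re-queue distance `gap` is 3. The callback `knows_card` is hypothetical.

```python
from collections import deque


def run_testing_mode(cards, knows_card, gap=3):
    """Show cards until each one has been answered correctly once.

    `knows_card(card)` is a hypothetical callback that returns True when the
    learner recalls the card; `gap` is an assumed re-queue distance.
    """
    queue = deque(cards)
    shown = []
    while queue:
        card = queue.popleft()
        shown.append(card)
        if not knows_card(card):
            # Re-insert the missed card a few positions later, or at the
            # end when fewer than `gap` cards remain.
            queue.insert(min(gap, len(queue)), card)
    return shown
```

The returned list is the order in which cards were presented, which makes the re-queue behaviour easy to inspect.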
Multimodal Content Generation
- 11 Content Types: In-depth explanations, study notes, flashcards, quizzes, mindmaps, case studies, and more.
- PDF Document Processing: Upload lecture notes, textbooks, or research papers (PDF files only); gemini-2.5-flash turns their content into educational material.
- Structured Learning: AI-generated text-based mindmaps using gemini-2.5-flash for organized concept visualization.
- Adaptive Complexity: Content automatically adjusts to the user's academic level (Primary to PhD).
AI Study Planner
- Automated Scheduling: Generate day-by-day study plans from the uploaded syllabus image.
- Customizable Timelines: Set exam dates and get optimized study schedules.
- Export Functionality: Download plans as CSV; the file is saved to your device.
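To make the CSV export concrete, here is a small sketch of a day-by-day plan written out as CSV. In the app the plan itself comes from Gemini; this example swaps in a simple even-spread rule purely as a stand-in, and the column names `date` and `topic` are assumptions.

```python
import csv
import io
from datetime import date, timedelta


def build_plan(topics, start, exam):
    """Spread topics evenly over the days from `start` to the day before `exam`."""
    days = (exam - start).days
    if days <= 0:
        raise ValueError("exam date must be after the start date")
    plan = []
    for i, topic in enumerate(topics):
        # Integer arithmetic maps topic i onto one of the available days.
        day_index = i * days // len(topics)
        plan.append((start + timedelta(days=day_index), topic))
    return plan


def plan_to_csv(plan):
    """Serialize (date, topic) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "topic"])
    for day, topic in plan:
        writer.writerow([day.isoformat(), topic])
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand keeps topics that contain commas correctly quoted.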
Iterative Learning
- Content Refinement: Users can give feedback on any generated content and have it revised in real time to fit their needs.
- Multiple Perspectives: Switch between content types for the same topic without starting over.
- Source Integration: Combine AI knowledge with user-provided materials.
Problem It Solves
Generic Study Tools Crisis
Traditional study applications offer one-size-fits-all solutions that fail to address individual learning needs, academic levels, or specific course materials. Students struggle with:
- Manual content creation and organization
- Lack of personalization for their specific curriculum
- Disconnected study tools that don't work together
- Inability to process their own course materials effectively
The LearnSphere Solution
LearnSphere AI creates a unified, intelligent learning ecosystem that:
- Personalizes Everything: Adapts content complexity and style to individual academic levels
- Processes Real Materials: Transforms syllabus images and PDF documents into interactive study tools
- Integrates Seamlessly: Connects planning, content generation, and active recall in one platform
- Scales Intelligently: Works for any subject, from primary school to PhD-level research
Ultimately, LearnSphere AI creates an experience where the student is in complete control, able to turn their own course materials into a suite of powerful, custom-built study aids.
Demo
Live Application
Check it out here: LearnSphere AI
Project Demo
Watch my project demo workflow here:
Project Snapshots
Snapshots of the project's sections and features:
Study Planner
LearnSphere AI Learning Studio
How I Used Google AI Studio
Google AI Studio served as the backbone of LearnSphere AI, transforming my vision into a fully functional multimodal applet. I acted as architect and tester, defining the flows, features, and design, while Google AI Studio handled code generation. Every step was validated, refined, and iterated through systematic prompt engineering to keep the results reliable and scalable.
Prompt Engineering
- Crafted expert educator personas with academic-level awareness.
- Designed prompts for 11+ content formats (quizzes, flashcards, mindmaps, study plans).
- Iteratively refined prompts for accuracy, consistency, and context alignment.
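The actual prompts are not shown in this post, so the following only illustrates the idea of an educator persona that is aware of the academic level. The wording of the prompt, the helper name `build_prompt`, and the intermediate levels between Primary and PhD are all assumptions.

```python
# Only "Primary" and "PhD" are named in the post; the levels in between
# are assumed for illustration.
LEVELS = ["Primary", "Secondary", "Undergraduate", "Postgraduate", "PhD"]


def build_prompt(topic, content_type, level):
    """Compose an educator-persona prompt for one topic and content type."""
    if level not in LEVELS:
        raise ValueError(f"unknown academic level: {level!r}")
    return (
        f"You are an expert educator teaching at the {level} level.\n"
        f"Create {content_type} on the topic: {topic}.\n"
        f"Match vocabulary, depth, and examples to a {level} student."
    )
```

Keeping the level as a parameter means the same template can serve every content format, with only `content_type` changing between requests.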
Structured Output Mastery
- Built robust JSON schemas for syllabus parsing, quizzes, flashcards, and schedules.
- Engineered complex quiz outputs with MCQs, subjective Qs, SWOT analysis, and grades.
- Perfected image-to-JSON conversion for course structure extraction.
- Enabled day-by-day AI study planning with schema-driven reliability.
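As a rough illustration of schema-driven output, the sketch below builds (but does not send) a request body for an image-to-JSON syllabus call. The field names follow the public Gemini REST `generateContent` format as I understand it; the schema itself, the prompt text, and the helper name `build_syllabus_request` are assumptions, since the post does not show the real ones.

```python
import base64

# Hypothetical schema: a list of units, each with a title and its topics.
SYLLABUS_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "units": {
            "type": "ARRAY",
            "items": {
                "type": "OBJECT",
                "properties": {
                    "title": {"type": "STRING"},
                    "topics": {"type": "ARRAY", "items": {"type": "STRING"}},
                },
                "required": ["title", "topics"],
            },
        }
    },
    "required": ["units"],
}


def build_syllabus_request(image_bytes, mime_type):
    """Build a JSON-serializable request body; sending it is left out."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [{
            "parts": [
                {"text": "Extract every unit and topic from this syllabus."},
                {"inlineData": {"mimeType": mime_type, "data": encoded}},
            ]
        }],
        "generationConfig": {
            "responseMimeType": "application/json",
            "responseSchema": SYLLABUS_SCHEMA,
        },
    }
```

Constraining the reply to a schema is what lets downstream features such as the study planner consume the result without free-text parsing.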
Multimodal Capabilities Implemented
- Image Understanding: Extracted typed or handwritten syllabi from images using Gemini 2.5 Flash.
- PDF Analysis: Parsed lecture notes & research papers into structured learning content.
- Mindmap Visualization: Generated dynamic, exportable PNG mindmaps for topic overviews.
- Fallback Handling: Built flows to handle poor image quality & broken PDFs.
Google AI Studio as My Development Partner
- Checkpoint Restore: Reverted to earlier builds when new features caused errors.
- Architectural Control: I directed the user journey, testing, and feature design.
- Iterative Enhancement: Adopted, refined, or rejected Studio's suggestions to keep the project true to my original vision while continuing to iterate on and enhance its features.
- Model Tuning: Adjusted temperature, tokens, and error-handling strategies for balance.
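The post does not give the tuned values or the error-handling strategy, so this is a generic sketch of what such tuning could look like: a generation config with placeholder numbers, and a retry wrapper with exponential backoff. `call_with_retry` and its parameters are hypothetical; `temperature` and `maxOutputTokens` are the REST field names as I understand them.

```python
import time

# Placeholder values for illustration only; the post does not state the
# settings that were actually used.
GENERATION_CONFIG = {"temperature": 0.3, "maxOutputTokens": 2048}


def call_with_retry(request_fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `request_fn`, retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return request_fn()
        except Exception as error:  # real code would catch specific errors
            last_error = error
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x ... the base delay between attempts.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting.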
Outcome
With only free-tier tools at hand, LearnSphere AI delivers:
- Image → JSON syllabus parsing
- PDF → contextual content generation
- Interactive quizzes, flashcards & SWOT feedback
- Exportable AI study plans
All powered by Google AI Studio's multimodal strengths, showing how human vision combined with AI execution can create a complete, impactful ecosystem.
Multimodal Features
Syllabus-to-Structure (Image Understanding)
- Model: gemini-2.5-flash
- Input: Syllabus images (JPG, PNG, GIF, WebP)
- Output: Structured JSON with units/topics
User Experience Enhancement
- Eliminates manual syllabus entry with simple photo upload.
- Instantly converts static documents into interactive curriculum.
- Works with handwritten, printed, or digital syllabi.
- Reduces errors compared to manual transcription.
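Before the extracted structure is used elsewhere, the model's reply would normally be checked. The sketch below assumes a hypothetical reply layout of `{"units": [{"title": ..., "topics": [...]}]}`; the post only says the output is structured JSON with units and topics, so the exact keys are a guess.

```python
import json
from dataclasses import dataclass


@dataclass
class Unit:
    title: str
    topics: list


def parse_syllabus(raw_json):
    """Validate a model reply against the assumed {"units": [...]} layout."""
    data = json.loads(raw_json)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    units = []
    for entry in data.get("units", []):
        title = str(entry.get("title", "")).strip()
        topics = [str(t).strip() for t in entry.get("topics", []) if str(t).strip()]
        if title and topics:  # skip units the model left empty
            units.append(Unit(title=title, topics=topics))
    if not units:
        raise ValueError("no usable units found in model output")
    return units
```

A failed check is a natural trigger for falling back to manual entry, for example when the photo was too blurry to read.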
PDF-to-Learning-Content (Document Processing)
- Model: gemini-2.5-flash
- Input: PDF documents up to 20MB
- Output: Educational content for explanations, quizzes, flashcards, and study notes
User Experience Enhancement
- Generates study aids from actual course materials (notes, textbooks, research papers).
- Ensures accuracy by reflecting professor-specific content.
- Transforms passive reading into active learning tools.
- Covers large documents for complete course coverage.
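A pre-upload check for the PDF path might look like the sketch below. It relies on two things: the 20MB limit stated above (read here as 20 MiB, which is an assumption) and the standard `%PDF-` signature at the start of a PDF file. `check_pdf_upload` is a hypothetical helper, not the app's code.

```python
MAX_PDF_BYTES = 20 * 1024 * 1024  # the stated 20MB limit, read as MiB


def check_pdf_upload(data, max_bytes=MAX_PDF_BYTES):
    """Raise ValueError when the upload is not a PDF or is too large."""
    # Every PDF file starts with the "%PDF-" signature.
    if not data.startswith(b"%PDF-"):
        raise ValueError("file does not look like a PDF")
    if len(data) > max_bytes:
        raise ValueError("PDF exceeds the size limit")
```

Rejecting bad files before calling the model avoids spending a request on input that cannot be processed.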
Concept-to-Mindmap (Structured Text Generation)
- Model: gemini-2.5-flash
- Input: Topic text + academic context
- Output: Hierarchically formatted text-based mindmaps
User Experience Enhancement
- Supports visual learners with organized, structured text layouts.
- Clarifies complex relationships through indentation and formatting.
- Boosts retention by organizing dense information into readable hierarchies.
- Provides copy-paste ready content for external mindmap tools.
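A text-based mindmap of the kind described here can be pictured as a tree rendered with indentation. This sketch assumes a hypothetical nested structure with `label` and `children` keys; how the app actually represents or formats its mindmaps is not shown in the post.

```python
def render_mindmap(node, depth=0):
    """Render a nested {"label", "children"} tree as indented text lines."""
    lines = ["  " * depth + "- " + node["label"]]
    for child in node.get("children", []):
        lines.extend(render_mindmap(child, depth + 1))
    return lines


tree = {
    "label": "Photosynthesis",
    "children": [
        {"label": "Light reactions", "children": [{"label": "ATP"}]},
        {"label": "Calvin cycle"},
    ],
}
print("\n".join(render_mindmap(tree)))
```

Two spaces per level keeps the output readable as plain text and easy to paste into outline-based mindmap tools.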
Cross-Modal Integration
Functionality
- Syllabus images → transformed into structured study plans
- PDFs → provide the knowledge base for quizzes, flashcards, and explanations
- Mindmaps → visually reinforce & complement text-based learning
User Experience Enhancement
- Creates a seamless workflow where each modality enhances the next.
- Maintains contextual continuity, so learning materials always stay relevant.
- Supports multiple input types (images, PDFs, text) for maximum flexibility.
- Delivers comprehensive outputs tailored to diverse learning styles.
Multimodal Models Used
Feature | Model | Input Type | Output Type |
---|---|---|---|
Syllabus Image Processing | gemini-2.5-flash | Image (JPG, PNG, GIF, WebP) | Structured JSON |
PDF Document Processing | gemini-2.5-flash | PDF (≤20MB) | Course-specific learning content |
Content Generation | gemini-2.5-flash | Text Prompts | Notes, quizzes, flashcards, study guides |
Mindmap Generation | gemini-2.5-flash | Topic text + context | Structured text-based mindmaps |
Study Planning | gemini-2.5-flash | Syllabus JSON + timeline | Day-by-day study schedule (CSV/PDF) |
LearnSphere AI integrates multimodal AI capabilities at critical points in the user journey to remove friction and create a seamless, personalized, and powerful learning experience.
Thank You
Thank you for checking out LearnSphere AI and taking the time to explore this project and article.
Your interest and support mean a lot; they are what drive builders like me to keep creating and improving.
Try it yourself, share your own learning journey, or drop any questions or thoughts in the comment section below.