Introduction
"Hand your textbooks and notes to AI — get quizzes, flashcards, Cornell notes, and podcasts. One platform handles both input and output."
This is Part 23 of the "Open Source Project of the Day" series. Today we explore PageLM (GitHub), open-sourced by CaviraOSS.
Google's NotebookLM turns documents into a "personal AI" you can converse with and generate audio from — but it's closed-source and tied to Google's ecosystem. PageLM is positioned as a community-driven, NotebookLM-style education platform: upload learning materials (PDF, DOCX, Markdown, TXT) and get contextual chat, SmartNotes, flashcards, quizzes, AI podcasts, plus voice transcription, assignment planning, exam simulation, debate practice, and a study companion. The backend supports multiple LLMs (Gemini, GPT, Claude, Grok, Ollama, OpenRouter) and multiple TTS engines (Edge TTS, ElevenLabs, Google TTS); the frontend is built with Vite + React + Tailwind — self-hostable, extensible, and suitable for students, teachers, and researchers.
What You'll Learn
- PageLM's positioning: an open-source, multi-modal "learning materials → interactive resources" platform
- Core capabilities: contextual chat, SmartNotes, flashcards, quizzes, AI podcasts, voice transcription, assignment planning, ExamLab, debate, study companion
- Tech stack: Node.js/TypeScript, LangChain/LangGraph, Vite/React, JSON or vector database storage
- How to run locally and deploy with Docker, plus key environment and configuration points
- Comparison with NotebookLM and similar education/notes tools
Prerequisites
- Basic understanding of RAG (Retrieval-Augmented Generation) and LLM APIs
- Experience with Node.js and npm/pnpm; familiarity with frontend/backend separated project structure
- For self-hosting, you'll need LLM/TTS API keys or a local Ollama instance
Project Background
Project Introduction
PageLM is an open-source, AI-driven education platform that transforms learning materials (PDF, DOCX, Markdown, TXT) into interactive learning resources: contextual Q&A, Cornell-style notes, flashcards, quizzes, and AI podcasts, along with voice transcription, assignment planning, exam simulation, debate practice, and a personalized study companion. Inspired by NotebookLM, it emphasizes "document as context" and multi-modal output (text + audio), and it supports multiple LLMs, multiple TTS engines, and JSON or vector database storage, which makes it easy to self-host and extend.
Core problems the project solves:
- Users want an "upload document + AI" workflow for notes, quizzes, and podcasts without depending on closed-source products
- Users need support for multiple document formats and multiple models/voice engines, so they can choose based on cost and scenario
- Educational institutions and individuals want to deploy in their own environment and keep control over data and compliance
- Developers want a complete, extensible reference implementation with full frontend and backend (RAG, streaming output, file storage, etc.)
Target user groups:
- Students: Review, take notes, do practice problems, listen to podcasts during commutes
- Teachers and course designers: Generate quizzes, flashcards, and supplementary materials from lecture notes
- Researchers: Literature organization, summarization, and Q&A
- Developers: Learn the complete stack implementation of RAG + multi-model + education scenarios
Author/Team Introduction
- Organization: CaviraOSS (GitHub), an open-source organization focused on educational and tool-type projects like PageLM
- Project creation date: August 2025 (GitHub shows created_at 2025-08-31)
- Community: Discord, GitHub Issues/Discussions, welcoming contributions and feedback
Project Stats
- ⭐ GitHub Stars: 1.3k+
- 🍴 Forks: 186+
- 📦 Version: No official version number; main branch is the trunk
- 📄 License: CaviraOSS Community License (free to use and modify for personal and educational use; commercial use or resale requires written permission from CaviraOSS — see LICENSE in repo for details)
- 🌐 Website: No independent website; primarily hosted on GitHub
- 💬 Community: Discord, GitHub Issues
Main Features
Core Purpose
PageLM's core purpose is to transform "static learning materials" into "interactive resources you can converse with, quiz yourself on, and listen to", completing upload, parsing, retrieval, and multi-modal generation on a single platform:
- Document upload and parsing: Supports PDF, DOCX, Markdown, TXT (using pdf-lib, mammoth, pdf-parse, etc.)
- Contextual Chat: RAG-based Q&A on uploaded documents with streaming output (WebSocket)
- SmartNotes: Automatically generate Cornell-style notes by topic or uploaded content
- Flashcards: Extract non-overlapping flashcards from content for spaced repetition
- Quizzes: Generate interactive quizzes with hints, explanations, and scoring
- AI Podcasts: Convert notes or topics to audio (Edge TTS, ElevenLabs, Google TTS) for commute learning
- Voice Transcription: Convert lecture recordings and voice notes to searchable text materials
- Assignment Planning, ExamLab, Debate, Study Companion: Plan assignments, simulate exams, practice debates, and get personalized study support
Use Cases
- Daily student learning: Upload textbooks or lecture notes, use contextual chat for Q&A, generate notes and flashcards, listen to AI podcasts during commutes
- Teacher lesson preparation and question creation: Generate quizzes and flashcards from course materials, or use SmartNotes for supplementary teaching content
- Meeting/lecture organization: Upload recordings or transcripts, transcribe, summarize, and do follow-up Q&A and note-taking
- Self-hosting and privacy: Data stays on your own server and can be combined with local models like Ollama to meet compliance and cost control needs
- Secondary development and integration: Based on LangChain/LangGraph and a clear frontend/backend structure, extend with new models, new question types, or integrate with existing systems
Quick Start
System requirements: Node.js v21.18+, npm or pnpm, ffmpeg (for podcast audio); Docker optional.
Local development:
```bash
git clone https://github.com/CaviraOSS/pagelm.git
cd pagelm

# Linux
chmod +x ./setup.sh
./setup.sh

# Or manually: install dependencies and configure
cd backend && npm install
cd ../frontend && npm install
cd ..
cp .env.example .env
# Edit .env, fill in LLM/TTS API keys, etc.

# Start separately (two terminals, each from the repository root)
# Terminal 1
cd backend && npm run dev
# Terminal 2
cd frontend && npm run dev
```
Open in browser: http://localhost:5173
Docker:
```bash
# Development
docker compose up --build

# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --build
```
- Frontend: port 5173 (dev) / 8080 (prod); Backend: port 5000
Core Features
- Multi-LLM support: Google Gemini, OpenAI GPT, Anthropic Claude, xAI Grok, Ollama (local), OpenRouter, switchable in configuration
- Multiple TTS and podcast generation: Edge TTS, ElevenLabs, Google TTS, used to convert notes/topics to audio content
- Multiple embedding and storage options: Embedding options include OpenAI, Gemini, Ollama; storage supports JSON (default) or vector database for easy extension
- WebSocket streaming output: Chat, notes, podcast generation, etc. support real-time streaming responses for a smoother experience
- Markdown and structured output: Notes and answers are primarily in Markdown, easy to export and further edit
- Modular configuration: Select LLM, TTS, database, upload limits, and more via environment variables; see .env.example
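To make the WebSocket streaming item more concrete, here is a minimal client-side sketch of how streamed frames could be folded into a growing answer. The event shapes (`token`, `done`, `error`) and the names `parseEvent` and `applyEvent` are assumptions made for illustration; PageLM's actual wire format is not described in this article and may differ.

```typescript
// Hypothetical event shapes for illustration; PageLM's real wire format may differ.
type StreamEvent =
  | { type: "token"; value: string }
  | { type: "done" }
  | { type: "error"; message: string };

type StreamState = {
  text: string;
  status: "streaming" | "done" | "error";
  error?: string;
};

const initialState: StreamState = { text: "", status: "streaming" };

// Parse one raw WebSocket text frame into a typed event, rejecting unknown shapes.
function parseEvent(raw: string): StreamEvent {
  const data: unknown = JSON.parse(raw);
  if (typeof data === "object" && data !== null) {
    const d = data as { type?: unknown; value?: unknown; message?: unknown };
    if (d.type === "token" && typeof d.value === "string") {
      return { type: "token", value: d.value };
    }
    if (d.type === "done") {
      return { type: "done" };
    }
    if (d.type === "error" && typeof d.message === "string") {
      return { type: "error", message: d.message };
    }
  }
  throw new Error("Unrecognized stream event: " + raw);
}

// Pure reducer: fold one event into the accumulated state.
function applyEvent(state: StreamState, event: StreamEvent): StreamState {
  if (state.status !== "streaming") return state; // ignore frames after the stream ended
  switch (event.type) {
    case "token":
      return { text: state.text + event.value, status: "streaming" };
    case "done":
      return { text: state.text, status: "done" };
    case "error":
      return { text: state.text, status: "error", error: event.message };
  }
}

// Simulated frames standing in for messages received over a socket.
const rawFrames = [
  '{"type":"token","value":"Hello, "}',
  '{"type":"token","value":"world"}',
  '{"type":"done"}',
];
const finalState = rawFrames.map(parseEvent).reduce(applyEvent, initialState);
console.log(finalState.status, finalState.text);
```

In a browser, the same reducer could be called from a `WebSocket` `onmessage` handler, so the UI re-renders the partial text after every frame. Keeping the reducer pure makes it straightforward to unit-test without a live connection.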
Project Advantages
| Comparison | PageLM | NotebookLM (Google) | General notes + single-point AI tools |
|---|---|---|---|
| Open source & deployment | Open source, self-hostable | Closed source, Google ecosystem only | Product-dependent |
| Document → quiz/podcast | Built-in quizzes, flashcards, podcasts, notes | Has chat and audio | Usually requires combining multiple tools |
| Models & TTS | Multiple LLMs, multiple TTS configurable | Fixed model and capabilities | Depends on each product |
| Data & privacy | Fully local possible (Ollama + self-hosted) | Data on Google | Product-dependent |
| Extension & development | LangChain/LangGraph, clear frontend/backend | Not extensible | Product-dependent |
Why choose PageLM?
- All-in-one education AI: Upload once and get chat, notes, flashcards, quizzes, podcasts, and more learning tools — no switching between multiple apps
- Open source and modifiable: Suitable for learning RAG/multi-model architecture, and for customizing to school or institution needs
- Multi-model and self-hostable: Choose cloud API or local Ollama based on cost and compliance; full data control
Detailed Project Analysis
Tech Stack Overview
- Backend: Node.js, TypeScript, LangChain, LangGraph; handles document parsing, embedding, retrieval, LLM calls, TTS, streaming responses, and persistence
- Frontend: Vite, React, TailwindCSS; provides upload, chat, notes, flashcards, quiz, and podcast playback interfaces
- Document parsing: pdf-lib, mammoth, pdf-parse, etc. for PDF/DOCX/Markdown/TXT
- Storage: JSON file persistence by default; a vector database can be connected for large-scale retrieval
- Deployment: Docker / Docker Compose, supporting both development and production configurations
Core Pipeline Overview
- Upload and parse: User uploads files; backend parses to text and optionally chunks and embeds, writing to JSON or vector database
- Retrieve and generate: Contextual chat, SmartNotes, flashcards, quizzes, etc. all use retrieval + LLM generation based on current documents (or selected topics); some results can be further converted to podcasts via TTS
- Streaming and persistence: WebSocket pushes generation process and results; generated content can be persisted at the project/document level for reuse and export
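The upload, chunk, embed, and retrieve steps above can be sketched in a few lines of TypeScript. This is a simplified illustration, not PageLM's code: the function names are hypothetical, the index is an in-memory array standing in for the JSON or vector store, and `embed` is a toy term-frequency vector that stands in for a real embedding model (OpenAI, Gemini, or Ollama).

```typescript
// Illustrative sketch only; names and data shapes are assumptions, not PageLM's API.
type Vector = Record<string, number>;
type IndexedChunk = { id: number; text: string; vector: Vector };

// Split text into windows of `size` words, each sharing `overlap` words with the previous one.
function chunkText(text: string, size = 200, overlap = 40): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = Math.max(1, size - overlap);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

// Toy stand-in for an embedding model: a term-frequency vector.
function embed(text: string): Vector {
  const vec: Vector = Object.create(null); // no prototype, so tokens like "constructor" are safe
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const token of tokens) {
    vec[token] = (vec[token] || 0) + 1;
  }
  return vec;
}

// Cosine similarity between two sparse vectors.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const k of Object.keys(a)) {
    normA += a[k] * a[k];
    dot += a[k] * (b[k] || 0);
  }
  for (const k of Object.keys(b)) {
    normB += b[k] * b[k];
  }
  return normA > 0 && normB > 0 ? dot / Math.sqrt(normA * normB) : 0;
}

// "Upload and parse": chunk the extracted text and embed every chunk.
function indexDocument(text: string, size?: number, overlap?: number): IndexedChunk[] {
  return chunkText(text, size, overlap).map((chunk, id) => ({
    id,
    text: chunk,
    vector: embed(chunk),
  }));
}

// "Retrieve": return the k chunks most similar to the question.
function retrieve(index: IndexedChunk[], question: string, k = 3): IndexedChunk[] {
  const q = embed(question);
  return index
    .map((chunk) => ({ chunk, score: cosine(q, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.chunk);
}

const docIndex = indexDocument(
  "Photosynthesis converts light into chemical energy. Mitochondria produce ATP for the cell.",
  6,
  0
);
const bestChunks = retrieve(docIndex, "What do mitochondria produce?", 1);
console.log(bestChunks[0].text); // prints "Mitochondria produce ATP for the cell."
```

In a real pipeline the retrieved chunks would be placed into the LLM prompt as context for chat, notes, flashcards, or quiz generation. Swapping `embed` for a model-backed embedding and the array for a vector database changes the quality and scale of retrieval, but not the overall flow.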
Configuration and Extension
- Environment variables: LLM provider, TTS engine, database backend, upload size/format limits, etc. are all configured in .env; see .env.example
- Extension directions: README lists contribution directions for AI model integration, mobile, performance, and accessibility; code structure makes it easy to add new tools (e.g., new question types, new export formats)
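As a rough sketch of environment-driven configuration, the snippet below validates provider choices at startup and fails fast on typos. The variable names (`LLM_PROVIDER`, `TTS_PROVIDER`, `DB_MODE`, `MAX_UPLOAD_MB`), the accepted values, and the defaults are all hypothetical placeholders chosen for this example; the authoritative names are the ones in the repository's `.env.example`.

```typescript
// Hypothetical variable names and defaults; check .env.example for the real ones.
type Env = Record<string, string | undefined>;

const LLM_PROVIDERS = ["gemini", "openai", "claude", "grok", "ollama", "openrouter"] as const;
const TTS_PROVIDERS = ["edge", "elevenlabs", "google"] as const;
const STORAGE_MODES = ["json", "vector"] as const;

type AppConfig = {
  llm: typeof LLM_PROVIDERS[number];
  tts: typeof TTS_PROVIDERS[number];
  storage: typeof STORAGE_MODES[number];
  maxUploadMb: number;
};

// Read one variable, normalize it, and check it against the allowed values.
function pickOption<T extends string>(
  env: Env,
  key: string,
  allowed: readonly T[],
  fallback: T
): T {
  const raw = env[key]?.trim().toLowerCase();
  if (!raw) return fallback; // unset or empty: use the default
  if ((allowed as readonly string[]).indexOf(raw) === -1) {
    throw new Error(key + " must be one of: " + allowed.join(", ") + " (got \"" + raw + "\")");
  }
  return raw as T;
}

function loadConfig(env: Env): AppConfig {
  const rawMax = env["MAX_UPLOAD_MB"];
  // 25 MB is an arbitrary default picked for this sketch.
  const maxUploadMb = rawMax === undefined || rawMax.trim() === "" ? 25 : Number(rawMax);
  if (!isFinite(maxUploadMb) || maxUploadMb <= 0) {
    throw new Error("MAX_UPLOAD_MB must be a positive number");
  }
  return {
    llm: pickOption(env, "LLM_PROVIDER", LLM_PROVIDERS, "gemini"),
    tts: pickOption(env, "TTS_PROVIDER", TTS_PROVIDERS, "edge"),
    storage: pickOption(env, "DB_MODE", STORAGE_MODES, "json"),
    maxUploadMb,
  };
}

// Example with an explicit object; in Node.js you would pass process.env instead.
const demoConfig = loadConfig({ LLM_PROVIDER: "Ollama", DB_MODE: "vector" });
console.log(demoConfig);
```

Validating once at startup means a misspelled provider name surfaces immediately as a clear error rather than as a failed request later in the pipeline.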
Important Notes
- License: CaviraOSS Community License — free to use and modify for personal and educational use; commercial use or resale requires prior written permission from CaviraOSS
- API costs: Using cloud LLM/TTS services incurs costs; using Ollama has zero API cost but requires local compute
- ffmpeg: Required to generate podcast audio
Project Resources
Official Resources
- 🌟 GitHub: github.com/CaviraOSS/pagelm
- 💬 Discord: Discord community
- 🐛 Issues: GitHub Issues
Related Resources
- NotebookLM (Google product that inspired PageLM's design)
Who Should Use This
- Students and self-learners: Want to use AI to turn textbooks and lecture notes into chat, notes, quizzes, and podcasts
- Teachers and course designers: Need to quickly generate quizzes, flashcards, and supplementary content from existing materials
- Developers: Want to learn or reuse a complete "document parsing + RAG + multi-LLM + TTS" implementation
- Institutions: Need a self-hostable, multi-model, customizable education AI platform