WonderLab

Posted on Mar 28

Open Source Project of the Day (Part 23): PageLM - Open-Source AI Education Platform, Turning Learning Materials into Interactive Resources

#ai #opensource #llm #langchain

Introduction

"Hand your textbooks and notes to AI — get quizzes, flashcards, Cornell notes, and podcasts. One platform handles both input and output."

This is Part 23 of the "Open Source Project of the Day" series. Today we explore PageLM (GitHub), open-sourced by CaviraOSS.

Google's NotebookLM turns documents into a "personal AI" you can converse with and generate audio from — but it's closed-source and tied to Google's ecosystem. PageLM is positioned as a community-driven, NotebookLM-style education platform: upload learning materials (PDF, DOCX, Markdown, TXT) and get contextual chat, SmartNotes, flashcards, quizzes, AI podcasts, plus voice transcription, assignment planning, exam simulation, debate practice, and a study companion. The backend supports multiple LLMs (Gemini, GPT, Claude, Grok, Ollama, OpenRouter) and multiple TTS engines (Edge TTS, ElevenLabs, Google TTS); the frontend is built with Vite + React + Tailwind — self-hostable, extensible, and suitable for students, teachers, and researchers.

What You'll Learn

PageLM's positioning: an open-source, multi-modal "learning materials → interactive resources" platform
Core capabilities: contextual chat, SmartNotes, flashcards, quizzes, AI podcasts, voice transcription, assignment planning, ExamLab, debate, study companion
Tech stack: Node.js/TypeScript, LangChain/LangGraph, Vite/React, JSON or vector database storage
How to run locally and deploy with Docker, plus key environment and configuration points
Comparison with NotebookLM and similar education/notes tools

Prerequisites

Basic understanding of RAG (Retrieval-Augmented Generation) and LLM APIs
Experience with Node.js and npm/pnpm; familiarity with frontend/backend separated project structure
For self-hosting, you'll need LLM/TTS API keys or a local Ollama instance

Project Background

Project Introduction

PageLM is an open-source, AI-driven education platform that transforms learning materials (PDF, DOCX, Markdown, TXT) into interactive learning resources: contextual Q&A, Cornell-style notes, flashcards, quizzes, and AI podcasts, along with voice transcription, assignment planning, exam simulation, debate practice, and personalized study companions. Inspired by NotebookLM, it emphasizes "document as context" and multi-modal output (text + audio). It supports multiple LLMs, multiple TTS engines, and either JSON or vector database storage, which makes it straightforward to self-host and extend.

Core problems the project solves:

Users want "upload document + AI" workflows for notes, quizzes, and podcasts without depending on closed-source products
Users need support for multiple document formats and multiple models/voice engines, with flexible choices based on cost and scenario
Educational institutions and individuals want to deploy in their own environment to control data and compliance
Developers want a complete, extensible reference implementation with full frontend and backend (RAG, streaming output, file storage, etc.)

Target user groups:

Students: Review, take notes, do practice problems, listen to podcasts during commutes
Teachers and course designers: Generate quizzes, flashcards, and supplementary materials from lecture notes
Researchers: Literature organization, summarization, and Q&A
Developers: Learn the complete stack implementation of RAG + multi-model + education scenarios

Author/Team Introduction

Organization: CaviraOSS (GitHub), an open-source organization focused on educational and tool-type projects like PageLM
Project creation date: August 2025 (GitHub shows created_at 2025-08-31)
Community: Discord, GitHub Issues/Discussions, welcoming contributions and feedback

Project Stats

⭐ GitHub Stars: 1.3k+
🍴 Forks: 186+
📦 Version: No official version number; main branch is the trunk
📄 License: CaviraOSS Community License (free to use and modify for personal and educational use; commercial use or resale requires written permission from CaviraOSS — see LICENSE in repo for details)
🌐 Website: No independent website; primarily hosted on GitHub
💬 Community: Discord, GitHub Issues

Main Features

Core Purpose

PageLM's core purpose is to transform "static learning materials" into "interactive resources you can converse with, quiz yourself on, and listen to", completing upload, parsing, retrieval, and multi-modal generation on a single platform:

Document upload and parsing: Supports PDF, DOCX, Markdown, TXT (using pdf-lib, mammoth, pdf-parse, etc.)
Contextual Chat: RAG-based Q&A on uploaded documents with streaming output (WebSocket)
SmartNotes: Automatically generate Cornell-style notes by topic or uploaded content
Flashcards: Extract non-overlapping flashcards from content for spaced repetition
Quizzes: Generate interactive quizzes with hints, explanations, and scoring
AI Podcasts: Convert notes or topics to audio (Edge TTS, ElevenLabs, Google TTS) for commute learning
Voice Transcription: Convert lecture recordings and voice notes to searchable text materials
Assignment Planning, ExamLab, Debate, Study Companion: Plan assignments, simulate exams, practice debates, and get personalized study support
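
To make the SmartNotes idea concrete, here is a minimal sketch of how a Cornell-style note (cues, notes, summary) could be modeled and rendered as Markdown. The CornellNote interface and renderCornellNote function are hypothetical names chosen for illustration; they are not taken from the PageLM codebase, whose internal data model may look quite different.

```typescript
// Hypothetical shape for a Cornell-style note; field names are illustrative only.
interface CornellNote {
  title: string;
  cues: string[];   // cue column: questions or keywords
  notes: string[];  // note-taking column: detailed points
  summary: string;  // summary section: short recap
}

// Render the note as Markdown with one section per Cornell region.
function renderCornellNote(note: CornellNote): string {
  const lines: string[] = ["# " + note.title, "", "## Cues"];
  for (const cue of note.cues) {
    lines.push("- " + cue);
  }
  lines.push("", "## Notes");
  for (const item of note.notes) {
    lines.push("- " + item);
  }
  lines.push("", "## Summary", note.summary);
  return lines.join("\n");
}

// Example usage with made-up content.
const exampleNote: CornellNote = {
  title: "Photosynthesis",
  cues: ["What is chlorophyll?"],
  notes: ["Chlorophyll absorbs light in the chloroplast."],
  summary: "Plants convert light into chemical energy.",
};
const exampleMarkdown = renderCornellNote(exampleNote);
```

Keeping the note as structured data and rendering to Markdown at the end matches the article's point that notes are easy to export and edit further.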

Use Cases

Daily student learning
- Upload textbooks or lecture notes, use contextual chat for Q&A, generate notes and flashcards, listen to AI podcasts during commutes
Teacher lesson preparation and question creation
- Generate quizzes and flashcards from course materials, or use SmartNotes for supplementary teaching content
Meeting/lecture organization
- Upload recordings or transcripts, transcribe, summarize, and do follow-up Q&A and note-taking
Self-hosting and privacy
- Keep data on your own server and pair it with local models such as Ollama to meet compliance and cost-control needs
Secondary development and integration
- Build on LangChain/LangGraph and the clear frontend/backend split to add new models or question types, or to integrate with existing systems

Quick Start

System requirements: Node.js v21.18+, npm or pnpm, ffmpeg (for podcast audio); Docker optional.

Local development:

git clone https://github.com/CaviraOSS/pagelm.git
cd pagelm

# Linux
chmod +x ./setup.sh
./setup.sh

# Or manually: install dependencies and configure
cd backend && npm install
cd ../frontend && npm install
cd ..
cp .env.example .env
# Edit .env, fill in LLM/TTS API keys, etc.

# Start backend and frontend separately, each from the repo root in its own terminal
# Terminal 1
cd backend && npm run dev
# Terminal 2
cd frontend && npm run dev

Open in browser: http://localhost:5173

Docker:

# Development
docker compose up --build

# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d --build

Frontend: port 5173 (dev) / 8080 (prod); Backend: port 5000

Core Features

Multi-LLM support
- Google Gemini, OpenAI GPT, Anthropic Claude, xAI Grok, Ollama (local), OpenRouter — switchable in configuration
Multiple TTS and podcast generation
- Edge TTS, ElevenLabs, Google TTS — convert notes/topics to audio content
Multiple embedding and storage options
- Embedding options include OpenAI, Gemini, Ollama; storage supports JSON (default) or vector database for easy extension
WebSocket streaming output
- Chat, notes, podcast generation, etc. support real-time streaming responses for a smoother experience
Markdown and structured output
- Notes and answers are primarily in Markdown, easy to export and further edit
Modular configuration
- Select LLM, TTS, database, upload limits, and more via environment variables; see .env.example
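
The streaming behavior described above can be pictured as a sequence of small messages that the client appends as they arrive. The sketch below is only an illustration of that pattern: the StreamMessage envelope and both function names are assumptions, PageLM's actual WebSocket wire format may differ, and no real socket is opened here.

```typescript
// Hypothetical message envelope for a streamed answer (not PageLM's real format).
type StreamMessage =
  | { type: "chunk"; text: string }
  | { type: "done" };

// Server side: split an answer into chunk messages followed by a final "done" message.
function toStreamMessages(answer: string, chunkSize: number): StreamMessage[] {
  const messages: StreamMessage[] = [];
  for (let i = 0; i < answer.length; i += chunkSize) {
    messages.push({ type: "chunk", text: answer.slice(i, i + chunkSize) });
  }
  messages.push({ type: "done" });
  return messages;
}

// Client side: append chunks in arrival order and stop at "done".
function assemble(messages: StreamMessage[]): string {
  let buffer = "";
  for (const message of messages) {
    if (message.type === "done") {
      break;
    }
    buffer += message.text;
  }
  return buffer;
}

const streamed = toStreamMessages("Hello, PageLM!", 4);
const rebuilt = assemble(streamed);
```

An explicit "done" message lets the UI tell a finished answer apart from a stalled connection, which is one common reason to frame streams this way.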

Project Advantages

Comparison	PageLM	NotebookLM (Google)	General notes + single-point AI tools
Open source & deployment	Open source, self-hostable	Closed source, Google ecosystem only	Product-dependent
Document → quiz/podcast	Built-in quizzes, flashcards, podcasts, notes	Has chat and audio	Usually requires combining multiple tools
Models & TTS	Multiple LLMs, multiple TTS configurable	Fixed model and capabilities	Depends on each product
Data & privacy	Can run fully locally (Ollama + self-hosting)	Data stored with Google	Product-dependent
Extension & development	LangChain/LangGraph, clear frontend/backend	Not extensible	Product-dependent

Why choose PageLM?

All-in-one education AI: Upload once and get chat, notes, flashcards, quizzes, podcasts, and more learning tools — no switching between multiple apps
Open source and modifiable: Suitable for learning RAG/multi-model architecture, and for customizing to school or institution needs
Multi-model and self-hostable: Choose cloud API or local Ollama based on cost and compliance; full data control

Detailed Project Analysis

Tech Stack Overview

Backend: Node.js, TypeScript, LangChain, LangGraph; handles document parsing, embedding, retrieval, LLM calls, TTS, streaming responses, and persistence
Frontend: Vite, React, TailwindCSS; provides upload, chat, notes, flashcards, quiz, and podcast playback interfaces
Document parsing: pdf-lib, mammoth, pdf-parse, etc. for PDF/DOCX/Markdown/TXT
Storage: JSON file persistence by default; can connect to a vector database for large-scale retrieval
Deployment: Docker / Docker Compose, supporting both development and production configurations

Core Pipeline Overview

Upload and parse: User uploads files; backend parses to text and optionally chunks and embeds, writing to JSON or vector database
Retrieve and generate: Contextual chat, SmartNotes, flashcards, quizzes, etc. all use retrieval + LLM generation based on current documents (or selected topics); some results can be further converted to podcasts via TTS
Streaming and persistence: WebSocket pushes generation process and results; generated content can be persisted at the project/document level for reuse and export
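
The "upload and parse" and "retrieve and generate" steps above can be sketched in a few lines. This is a toy illustration, not PageLM's implementation: the word-count embed function stands in for a real embedding model (OpenAI, Gemini, or Ollama in the article's list), fixed-size word chunking is just one possible strategy, and all function names are hypothetical.

```typescript
// Split text into chunks of at most `size` words (simple fixed-size chunking).
function chunkWords(text: string, size: number): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push(words.slice(i, i + size).join(" "));
  }
  return chunks;
}

// Toy "embedding": lowercase word counts. A real system would call an embedding model.
function embed(text: string): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts[word] = (counts[word] ?? 0) + 1;
  }
  return counts;
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a: Record<string, number>, b: Record<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const key in a) {
    normA += a[key] * a[key];
    if (key in b) {
      dot += a[key] * b[key];
    }
  }
  for (const key in b) {
    normB += b[key] * b[key];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k chunks most similar to the query; these would be passed to the LLM as context.
function retrieve(query: string, chunks: string[], k: number): string[] {
  const queryVector = embed(query);
  return chunks
    .map((chunk) => ({ chunk, score: cosine(queryVector, embed(chunk)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((result) => result.chunk);
}

const corpus = [
  "Photosynthesis converts light into chemical energy",
  "Mitochondria produce ATP through respiration",
  "The French Revolution began in 1789",
];
const topChunk = retrieve("How do plants turn light into energy?", corpus, 1)[0];
```

In a real deployment the retrieved chunks are inserted into the prompt, and the same retrieval step can feed chat, notes, flashcards, and quizzes alike.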

Configuration and Extension

Environment variables: LLM provider, TTS engine, database backend, upload size/format limits, etc. are all configured in .env — see .env.example
Extension directions: README lists contribution directions for AI model integration, mobile, performance, and accessibility; code structure makes it easy to add new tools (e.g., new question types, new export formats)
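
As a rough picture of environment-driven configuration, the sketch below reads a few settings, applies defaults, and validates them. Every variable name here (LLM_PROVIDER, TTS_PROVIDER, STORAGE_BACKEND, MAX_UPLOAD_MB) and every default except JSON storage is a hypothetical placeholder; the real names and values are in the repo's .env.example.

```typescript
// Hypothetical config shape; PageLM's actual settings live in .env.example.
interface AppConfig {
  llmProvider: string;
  ttsProvider: string;
  storage: "json" | "vector";
  maxUploadMb: number;
}

type Env = Record<string, string | undefined>;

// Build a validated config from an environment-like object.
function loadConfig(env: Env): AppConfig {
  // JSON storage is the default per the article; other defaults are illustrative.
  const storage = env["STORAGE_BACKEND"] ?? "json";
  if (storage !== "json" && storage !== "vector") {
    throw new Error("STORAGE_BACKEND must be 'json' or 'vector', got: " + storage);
  }
  const maxUploadMb = Number(env["MAX_UPLOAD_MB"] ?? "50");
  if (!isFinite(maxUploadMb) || maxUploadMb <= 0) {
    throw new Error("MAX_UPLOAD_MB must be a positive number");
  }
  return {
    llmProvider: env["LLM_PROVIDER"] ?? "ollama",
    ttsProvider: env["TTS_PROVIDER"] ?? "edge",
    storage,
    maxUploadMb,
  };
}

const defaultConfig = loadConfig({});
const geminiConfig = loadConfig({ LLM_PROVIDER: "gemini", STORAGE_BACKEND: "vector" });
```

In a Node.js backend the function would typically be called once at startup with process.env, so that a misconfigured deployment fails fast instead of partway through a request.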

Important Notes

License: CaviraOSS Community License — free to use and modify for personal and educational use; commercial use or resale requires prior written permission from CaviraOSS
API costs: Using cloud LLM/TTS services incurs costs; using Ollama has zero API cost but requires local compute
ffmpeg: Required to generate podcast audio

Project Resources

Official Resources

🌟 GitHub: github.com/CaviraOSS/pagelm
💬 Discord: Discord community
🐛 Issues: GitHub Issues

Related Resources

NotebookLM (Google product that inspired PageLM's design)

Who Should Use This

Students and self-learners: Want to use AI to turn textbooks and lecture notes into chat, notes, quizzes, and podcasts
Teachers and course designers: Need to quickly generate quizzes, flashcards, and supplementary content from existing materials
Developers: Want to learn or reuse a complete "document parsing + RAG + multi-LLM + TTS" implementation
Institutions: Need a self-hostable, multi-model, customizable education AI platform

Visit my personal homepage for more useful knowledge and interesting products

DEV Community