When watching long lecture videos, finding a specific concept later is difficult: you either have to rewatch the entire lecture or scrub through timestamps manually. I wanted a system where you could simply ask questions about a lecture video and get answers instantly.
So I built a Retrieval-Augmented Generation (RAG) based AI Teaching Assistant for video lectures.
The idea is simple: convert lecture videos into searchable knowledge.
Pipeline:
Video → Audio
The lecture video (MP4) is first converted into audio (MP3).
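The post doesn't name a conversion tool, but one common approach is to shell out to ffmpeg (assumed to be on the PATH here). A minimal sketch:

```python
import subprocess

def build_extract_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build the ffmpeg command that strips the video stream and
    encodes the remaining audio track as MP3."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", video_path,         # input lecture video (MP4)
        "-vn",                    # drop the video stream
        "-acodec", "libmp3lame",  # encode audio as MP3
        audio_path,
    ]

def extract_audio(video_path: str, audio_path: str) -> None:
    subprocess.run(build_extract_cmd(video_path, audio_path), check=True)
```

Keeping the command builder separate from the call makes the step easy to inspect and test without actually running ffmpeg.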
Audio → Transcript
The audio is transcribed into text so the system can understand the lecture content.
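A popular choice for local transcription is a Whisper model; that's an assumption here, not something the post specifies. Whisper-style tools return segments as dicts with `start`, `end`, and `text` keys, which a small helper can turn into timestamped transcript lines:

```python
def segments_to_transcript(segments: list[dict]) -> list[str]:
    """Format transcription segments (dicts with 'start', 'end', 'text',
    as returned by Whisper-style tools) into timestamped lines."""
    lines = []
    for seg in segments:
        start = int(seg["start"])                    # seconds into the lecture
        stamp = f"{start // 60:02d}:{start % 60:02d}"  # MM:SS
        lines.append(f"[{stamp}] {seg['text'].strip()}")
    return lines
```

Keeping timestamps alongside the text also makes it possible to point the user back to the exact moment in the video later.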
Chunking + Embeddings
The transcript is split into smaller chunks and converted into embeddings.
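The chunking half of this step can be sketched as a sliding window with overlap, so that a sentence cut at a chunk boundary still appears whole in the neighboring chunk. The sizes below are illustrative, not the post's actual parameters; the embeddings themselves would come from a separate model (e.g. a sentence-transformer or Ollama's embedding endpoint):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Each chunk is `size` characters long and shares `overlap`
    characters with the previous chunk.
    """
    chunks = []
    step = size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk would then be passed through the embedding model to get one vector per chunk.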
Vector Retrieval
The embeddings are stored in a vector index. When a question is asked, the system retrieves the most relevant lecture segments.
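At its core, retrieval is just nearest-neighbor search over the stored vectors. A real index would use a vector store or an ANN library, but the idea can be shown with plain cosine similarity over an in-memory list of (chunk, embedding) pairs:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_emb: list[float],
             index: list[tuple[str, list[float]]],
             top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query embedding."""
    ranked = sorted(index, key=lambda item: cosine(query_emb, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

The question is embedded with the same model as the chunks, and the highest-scoring lecture segments become the context for the next step.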
LLM Answer Generation
The retrieved lecture context is passed to a local LLM running with Ollama, which generates the final answer grounded in the lecture content.
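Ollama exposes a local HTTP API (by default on port 11434), so this step reduces to building a grounded prompt and POSTing it to `/api/generate`. The model name below is an assumption; any model pulled into Ollama would work:

```python
import json
import urllib.request

def build_prompt(question: str, contexts: list[str]) -> str:
    """Assemble a prompt that restricts the answer to retrieved excerpts."""
    context_block = "\n\n".join(contexts)
    return (
        "Answer the question using only the lecture excerpts below.\n\n"
        f"Lecture excerpts:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_ollama(question: str, contexts: list[str],
               model: str = "llama3") -> str:  # model name is an assumption
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(question, contexts),
        "stream": False,  # return one complete response instead of a stream
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the prompt tells the model to rely only on the supplied excerpts, the answer stays grounded in the lecture rather than the model's general knowledge.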