How I Built a Markdown-to-Video Lecture Generator (And What I Learned)

#ai #startup #buildinpublic #programming

The problem I was trying to solve

I kept watching educators do the same painful thing: write detailed notes in Markdown, manually convert them into slides, record themselves reading those slides, edit the video, and upload it. The whole process took three days for something that should take 10 minutes.
I wanted to build something that collapses that entire workflow into one step.

What LectureCraft does

Upload any .md file. The tool:
Parses the Markdown and splits it into logical sections (2–5 segments, max 5 minutes each)
Generates visual slides for each section automatically
Adds AI narration via TTS — no microphone needed
Renders everything into short, downloadable video lectures
No signup. No editing. No video software. Try it here: [LectureCraft]

The technical decisions

Markdown parsing and splitting

The first challenge was splitting a .md file into meaningful segments — not just by character count, but by logical structure. I split on heading levels (H2, H3) and applied a max-duration constraint. If a section would produce more than 5 minutes of narration, it gets split further at paragraph boundaries.

This keeps each video focused and digestible. It is a design decision backed by research on video length in online learning, which suggests that videos under about 6 minutes hold viewers' attention significantly better.
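
To make the splitting rule concrete, here is a minimal TypeScript sketch of heading-first, paragraph-second splitting. It is an illustration, not LectureCraft's actual code: the names `LectureSegment`, `estimateSeconds`, and `splitMarkdown` are hypothetical, and the 150 words-per-minute narration pace is an assumption used to turn text length into duration.

```typescript
// Hypothetical types and constants; none of these names come from LectureCraft.
interface LectureSegment {
  title: string;
  body: string;
  estimatedSeconds: number;
}

const WORDS_PER_MINUTE = 150; // assumed narration pace

function estimateSeconds(text: string): number {
  const words = text.split(/\s+/).filter(Boolean).length;
  return (words / WORDS_PER_MINUTE) * 60;
}

function splitMarkdown(markdown: string, maxSeconds = 300): LectureSegment[] {
  // Pass 1: cut at H2/H3 headings. Text before the first heading is kept
  // under a placeholder title; headings with no body text are dropped.
  // Simplification: "##" lines inside fenced code are not special-cased.
  const sections: { title: string; lines: string[] }[] = [];
  let current = { title: "Introduction", lines: [] as string[] };
  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^#{2,3}\s+(.*)$/.exec(line);
    if (heading) {
      if (current.lines.join("").trim()) sections.push(current);
      current = { title: (heading[1] ?? "").trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  if (current.lines.join("").trim()) sections.push(current);

  // Pass 2: split sections that exceed the cap at paragraph boundaries
  // (blank lines). A single paragraph longer than the cap stays whole.
  const segments: LectureSegment[] = [];
  for (const section of sections) {
    const paragraphs = section.lines
      .join("\n")
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter(Boolean);
    let chunk: string[] = [];
    let part = 1;
    const flush = (): void => {
      if (chunk.length === 0) return;
      const body = chunk.join("\n\n");
      const title = part === 1 ? section.title : `${section.title} (part ${part})`;
      segments.push({ title, body, estimatedSeconds: estimateSeconds(body) });
      part += 1;
      chunk = [];
    };
    for (const paragraph of paragraphs) {
      const candidate = [...chunk, paragraph].join("\n\n");
      if (chunk.length > 0 && estimateSeconds(candidate) > maxSeconds) flush();
      chunk.push(paragraph);
    }
    flush();
  }
  return segments;
}
```

Calling `splitMarkdown(text)` applies the 5-minute cap (300 seconds); passing a second argument such as `splitMarkdown(text, 180)` tightens it to 3 minutes.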

Slide generation

Rather than pulling in a heavy presentation library, I generate slides directly from the Markdown structure. Each heading becomes a slide title. Bullet points and paragraphs become body content. Code blocks get syntax-highlighted automatically.
The constraint of working from structured Markdown actually makes this easier: the document structure is already meaningful, so I'm just rendering it visually.
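
As a rough illustration of that mapping, the TypeScript sketch below turns one section's heading and body into a plain slide description. The `SlideModel` shape and the `buildSlide` function are hypothetical names chosen for this example, not LectureCraft's real data model. Syntax highlighting is out of scope here; the sketch only records each code block's language so a renderer could highlight it later.

```typescript
// Hypothetical slide shape for this example only.
interface SlideModel {
  title: string;
  bullets: string[];
  paragraphs: string[];
  codeBlocks: { language: string; code: string }[];
}

// Three backticks, built at runtime so this sketch can itself sit inside a fenced block.
const FENCE_MARK = "`".repeat(3);
const FENCE_OPEN = new RegExp("^" + FENCE_MARK + "(\\S*)\\s*$");

function buildSlide(title: string, body: string): SlideModel {
  const slide: SlideModel = { title, bullets: [], paragraphs: [], codeBlocks: [] };
  let code: { language: string; lines: string[] } | null = null;
  let paragraph: string[] = [];

  // Close the paragraph being collected, joining wrapped lines with spaces.
  const endParagraph = (): void => {
    if (paragraph.length > 0) slide.paragraphs.push(paragraph.join(" "));
    paragraph = [];
  };

  for (const line of body.split(/\r?\n/)) {
    const trimmed = line.trim();

    // Inside a fenced block: copy lines verbatim until the closing fence.
    if (code) {
      if (trimmed === FENCE_MARK) {
        slide.codeBlocks.push({ language: code.language, code: code.lines.join("\n") });
        code = null;
      } else {
        code.lines.push(line);
      }
      continue;
    }

    const opening = FENCE_OPEN.exec(trimmed);
    if (opening) {
      endParagraph();
      code = { language: opening[1] || "text", lines: [] };
      continue;
    }

    const bullet = /^\s*[-*+]\s+(.*)$/.exec(line);
    if (bullet) {
      endParagraph();
      slide.bullets.push((bullet[1] ?? "").trim());
    } else if (trimmed === "") {
      endParagraph();
    } else {
      paragraph.push(trimmed);
    }
  }

  endParagraph();
  // An unterminated fence is kept rather than silently dropped.
  if (code) slide.codeBlocks.push({ language: code.language, code: code.lines.join("\n") });
  return slide;
}
```

The output is deliberately renderer-agnostic: the same `SlideModel` could be drawn onto a canvas, turned into HTML, or inspected in a test.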

TTS narration

I used browser-native TTS for the narration layer: no external API calls, no cost, no latency from a third-party service. The tradeoff is voice quality, but for educational content where clarity matters more than naturalness, it works well. Future versions will support ElevenLabs or a similar service for higher-quality voices.
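
For readers who have not used browser TTS, here is a minimal TypeScript sketch built on the Web Speech API (`speechSynthesis` and `SpeechSynthesisUtterance`). It is an assumption about how such a narration layer could look, not LectureCraft's code: `VoiceLike`, `pickNarrationVoice`, and `narrate` are hypothetical names, and preferring voices with `localService` set is just one way to stay consistent with the "no external API calls" goal.

```typescript
// The subset of SpeechSynthesisVoice fields this sketch reads.
interface VoiceLike {
  name: string;
  lang: string;
  localService: boolean;
}

// Preference order (an assumption): exact language match on a local voice,
// exact match on any voice, same base language on a local voice,
// then any voice in the same base language.
function pickNarrationVoice<V extends VoiceLike>(voices: readonly V[], lang = "en-US"): V | undefined {
  const wanted = lang.toLowerCase();
  const base = wanted.split("-")[0] ?? wanted;
  const exact = voices.filter((v) => v.lang.toLowerCase() === wanted);
  const sameBase = voices.filter((v) => v.lang.toLowerCase().startsWith(base));
  return (
    exact.find((v) => v.localService) ??
    exact[0] ??
    sameBase.find((v) => v.localService) ??
    sameBase[0]
  );
}

// Speaks one section and resolves when the browser finishes.
// Browser globals are read through `globalThis as any` so the sketch
// also compiles in projects without DOM typings.
function narrate(text: string, lang = "en-US"): Promise<void> {
  const g = globalThis as any;
  return new Promise<void>((resolve, reject) => {
    if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) {
      reject(new Error("Web Speech API is not available in this environment"));
      return;
    }
    const utterance = new g.SpeechSynthesisUtterance(text);
    utterance.lang = lang;
    // getVoices() can return an empty list until the browser fires
    // "voiceschanged"; in that case the browser's default voice is used.
    const voice = pickNarrationVoice(g.speechSynthesis.getVoices(), lang);
    if (voice) utterance.voice = voice;
    utterance.onend = () => resolve();
    utterance.onerror = (event: any) => reject(new Error("Speech failed: " + event.error));
    g.speechSynthesis.speak(utterance);
  });
}
```

Because the available voices differ per browser and operating system, the same call can sound quite different from one machine to the next.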

Video rendering

This was the trickiest part. Combining slide frames with audio into a video file in the browser has real constraints. I used the Canvas API to render frames and MediaRecorder to capture the output stream. The result is a .webm file that plays in any modern browser.
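
The capture side of that pipeline can be sketched in TypeScript as follows. This is a simplified illustration under stated assumptions, not LectureCraft's implementation: `chooseWebmType`, `recordSlides`, the candidate MIME list, the 30 fps rate, and the fixed time per slide are all hypothetical. The post does not describe how narration audio is fed into the recorder, so the sketch records the canvas video track only.

```typescript
// Candidate WebM types, most preferred first (an assumed ordering).
const WEBM_CANDIDATES = ["video/webm;codecs=vp9", "video/webm;codecs=vp8", "video/webm"];

// Returns the first candidate the given support check accepts.
function chooseWebmType(isSupported: (type: string) => boolean): string | undefined {
  return WEBM_CANDIDATES.find((type) => isSupported(type));
}

// Draws each slide onto the canvas for `msPerSlide` milliseconds while
// MediaRecorder captures the canvas stream, then resolves with a WebM Blob.
// Browser objects are typed loosely so the sketch compiles without DOM typings.
function recordSlides(
  canvas: any, // expected to be an HTMLCanvasElement
  drawSlide: (index: number) => void, // hypothetical callback that paints slide `index`
  slideCount: number,
  msPerSlide: number,
): Promise<unknown> {
  const g = globalThis as any;
  return new Promise((resolve, reject) => {
    if (!g.MediaRecorder || typeof canvas?.captureStream !== "function") {
      reject(new Error("MediaRecorder or canvas.captureStream is unavailable"));
      return;
    }
    const mimeType = chooseWebmType((type) => g.MediaRecorder.isTypeSupported(type));
    if (!mimeType) {
      reject(new Error("This browser cannot record WebM"));
      return;
    }

    const stream = canvas.captureStream(30); // video track only, up to 30 fps
    const recorder = new g.MediaRecorder(stream, { mimeType });
    const chunks: unknown[] = [];
    recorder.ondataavailable = (event: any) => {
      if (event.data && event.data.size > 0) chunks.push(event.data);
    };
    recorder.onstop = () => resolve(new g.Blob(chunks, { type: mimeType }));
    recorder.onerror = () => reject(new Error("Recording failed"));
    recorder.start();

    // Step through the slides on a timer, then stop the recorder.
    let index = 0;
    const showNext = (): void => {
      if (index >= slideCount) {
        recorder.stop();
        return;
      }
      drawSlide(index);
      index += 1;
      setTimeout(showNext, msPerSlide);
    };
    showNext();
  });
}
```

In a real tool the per-slide duration would come from the narration length rather than a fixed value, and the resulting Blob would be offered to the user through an object URL for download.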

What I'd do differently

User testing earlier. I built the splitting algorithm based on my assumptions about what "logical sections" meant. Real educators split content differently than engineers do.
Quality settings. Some users want longer videos with more depth. The 5-minute max is opinionated — I'd make it configurable.
Voice selection. TTS voice quality varies significantly across browsers. I'd standardise this with a server-side TTS option early.

What's next

LectureCraft solves the video side. But the real bottleneck I keep seeing is earlier — educators struggle to write the course content in the first place, not just present it.
That's what I'm building next: VidhyaAI — an AI course content generator that produces full written learner guides, streams the output in real time, and exports to Word. It's what you use before LectureCraft.
If that sounds useful, join the early access waitlist here: VidhyaAI

Try LectureCraft free (no signup): LectureCraft
Follow me for updates on what I'm building: L.R.Sowmya

DEV Community