Mervin

Posted on Jun 11

The "open-source NotebookLM" lie — and the one repo that actually earns the label

#opensource #selfhosted #llm #notebooklm

TL;DR — lfnovo/open-notebook spins up in one Docker command, keeps every byte on your machine, and supports 18+ AI providers including Ollama. run-llama/notebookllama requires three paid API keys before you can ingest a single document. Both call themselves open-source. Only one earns it.

The gotcha nobody mentions

run-llama/notebookllama has a great README. It says "fully open-source." Then you look at the .env.example:

OPENAI_API_KEY=
LLAMACLOUD_API_KEY=
ELEVENLABS_API_KEY=

Three mandatory paid services. Document parsing and indexing run on LlamaCloud's servers — not locally. The UI runs on your machine. The intelligence layer doesn't.
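If you'd rather have notebookllama fail fast than stall halfway through a setup wizard, a small pre-flight check can confirm that the three keys from .env.example are actually filled in. This is a minimal sketch that assumes plain KEY=value lines in .env. The helper name check_required_keys is hypothetical and not part of either repo.

```bash
#!/usr/bin/env bash
# Pre-flight check (hypothetical helper): report every required key that is
# missing or empty in a .env file. Assumes simple KEY=value lines; quoted
# values, "export KEY=..." lines and multi-line values are not handled.
check_required_keys() {
  local env_file="$1"; shift
  local key value missing=0
  for key in "$@"; do
    # Use the last assignment of KEY and strip the "KEY=" prefix.
    value=$(grep -E "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_required_keys .env OPENAI_API_KEY LLAMACLOUD_API_KEY ELEVENLABS_API_KEY && docker compose up -d starts the stack only once all three keys are set.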

That's not a criticism of the project — it's genuinely useful reference code for the LlamaIndex ecosystem. But "self-hosted" it is not.

lfnovo/open-notebook makes the harder architectural choice. Here's how they actually compare.

Install story

open-notebook — 3 lines:

curl -O https://raw.githubusercontent.com/lfnovo/open-notebook/main/docker-compose.yml
export OPEN_NOTEBOOK_ENCRYPTION_KEY=$(openssl rand -hex 32)
docker compose up -d
# UI: http://localhost:8502 · API: http://localhost:5055

Done. Configure your AI provider from the UI. Pair it with Ollama and the whole stack costs $0 in API fees.
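docker compose up -d returns as soon as the containers start, which can be before the UI and API are actually serving. A short polling loop against the two ports above tells you when they respond. This is a sketch that assumes only that both ports speak plain HTTP. wait_for_http and its retry defaults are hypothetical, not something open-notebook ships.

```bash
#!/usr/bin/env bash
# Poll a URL until it returns any HTTP response, or give up after N tries.
# Hypothetical helper; the retry count and delay are arbitrary defaults.
wait_for_http() {
  local url="$1" tries="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= tries; i++)); do
    # -s: no progress output; -o /dev/null: discard the body.
    # Without -f, curl exits 0 for any HTTP status, including 404.
    if curl -s -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    # Skip the sleep after the final attempt.
    if ((i < tries)); then sleep "$delay"; fi
  done
  echo "gave up waiting for $url" >&2
  return 1
}
```

For example, wait_for_http http://localhost:8502 && wait_for_http http://localhost:5055 returns once both answer. Leaving out curl's -f flag is deliberate: any HTTP response, even a 404 on the root path, proves the server is listening.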

notebookllama — 5 steps:

git clone https://github.com/run-llama/notebookllama
cd notebookllama
cp .env.example .env
# fill in 3 API keys first

docker compose up -d
uv run tools/create_llama_extract_agent.py   # setup wizard: creates an extraction agent on LlamaCloud
uv run tools/create_llama_cloud_index.py     # setup wizard: creates an index on LlamaCloud
uv run src/notebookllama/server.py &
streamlit run src/notebookllama/Home.py
# UI: http://localhost:8501

Before you can index a document, you're running two setup wizards that provision pipelines on LlamaCloud's servers.

Architecture side-by-side

	open-notebook	notebookllama
Truly local?	✅ Yes (Ollama path)	⚠️ Indexing runs on LlamaCloud
Required paid APIs	0	3
Database	SurrealDB (vector + full-text)	Postgres (+ Jaeger for tracing)
AI providers	18+	OpenAI default
REST API	✅ Full (:5055)	⚠️ Limited
Podcast generation	✅ 1–4 speakers	✅ ElevenLabs
Citation quality	⚠️ Basic (being rebuilt)	✅ Solid
License	MIT	Apache 2.0
Best for	Production self-hosting	LlamaIndex exploration

One detail worth noting on SurrealDB: it handles both vector similarity search and full-text search in a single container. That means no second database to manage, which matters if you're self-hosting on a platform like Coolify or a Hetzner VPS.

Five things neither gets right yet

This is the more interesting part. Both projects are missing the same things — and they're all solvable.

1. Inline citation UI

NotebookLM's actual killer feature isn't the chat. It's the citation highlighting: click a claim, and the source passage lights up. Neither project does this well. open-notebook's maintainer has publicly acknowledged that the current implementation is placeholder-quality and is rebuilding it. notebookllama inherits LlamaCloud's parser, which is stronger, but its UI doesn't surface source passages with the same click-to-highlight interaction.

This is the single biggest UX gap between these projects and Google's product.

2. Graph layer on top of RAG

Both tools use standard RAG: embed chunks, retrieve by cosine similarity, inject into context. That's fine for "what does document X say about topic Y?" It breaks down for "how does concept A relate across documents B, C, and D?"
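The retrieve step in that pipeline is small enough to show whole. This toy version ranks pre-computed chunk embeddings against a query embedding and keeps the top k. The input format (one chunk per line: chunk_id, a tab, then comma-separated floats) and the rank_chunks name are assumptions for illustration; a real stack does this inside the vector store rather than in awk.

```bash
#!/usr/bin/env bash
# Toy top-k retrieval: score every chunk embedding by cosine similarity to
# the query embedding and print the k best chunk ids, best first.
# Assumed input (hypothetical format): "chunk_id<TAB>v1,v2,..." per line.
rank_chunks() {
  local query="$1" k="$2" file="$3"
  awk -F'\t' -v q="$query" '
    BEGIN {
      n = split(q, qv, ",")
      for (i = 1; i <= n; i++) qn += qv[i] * qv[i]   # squared norm of query
    }
    {
      m = split($2, cv, ","); dot = 0; cn = 0
      for (i = 1; i <= m; i++) { dot += qv[i] * cv[i]; cn += cv[i] * cv[i] }
      # cos(q, c) = (q . c) / (|q| |c|); zero vectors are skipped
      if (qn > 0 && cn > 0) printf "%.6f\t%s\n", dot / sqrt(qn * cn), $1
    }' "$file" | LC_ALL=C sort -t$'\t' -k1,1nr | head -n "$k" | cut -f2
}
```

Whatever falls outside the top k never reaches the model, which is one reason similarity ranking alone struggles with relationships spread across several documents.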

A graph layer — Neo4j, or SurrealDB's own native graph capabilities in open-notebook's case — sitting alongside the vector store would unlock cross-document reasoning that RAG alone can't deliver. The infrastructure is already there in open-notebook's stack. The wiring isn't.
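To make the difference concrete, here is a toy two-hop traversal over a document-concept graph. It starts from one concept, collects every document that mentions it, then counts which other concepts those documents also mention. It assumes an edge list already produced by some entity-extraction pass (doc_id, a tab, then a concept on each line, a hypothetical format). related_concepts is an illustrative name, and a graph database would express the same walk as a query instead of a two-pass awk script.

```bash
#!/usr/bin/env bash
# Toy two-hop traversal over a bipartite doc <-> concept edge list.
# Input (hypothetical format): "doc_id<TAB>concept" per line.
# Output: "doc_count<TAB>concept" for every concept that co-occurs with $1,
# sorted by the number of distinct linking documents, then by name.
related_concepts() {
  local concept="$1" file="$2"
  awk -F'\t' -v c="$concept" '
    NR == FNR { if ($2 == c) hit[$1] = 1; next }            # hop 1: concept -> docs
    ($1 in hit) && $2 != c && !seen[$1, $2]++ { n[$2]++ }  # hop 2: docs -> other concepts
    END { for (k in n) printf "%d\t%s\n", n[k], k }
  ' "$file" "$file" | LC_ALL=C sort -t$'\t' -k1,1nr -k2,2
}
```

Given the edges B-pricing, B-churn, C-pricing, C-churn, D-pricing and D-support, related_concepts pricing edges.tsv prints "2<tab>churn" and then "1<tab>support": churn links two of the pricing documents, support links one. No embedding similarity is involved; the answer comes from following edges.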

3. Multi-user workspaces

Both tools are single-user by design. NotebookLM added collaborative notebooks as a flagship feature. For teams, this is a dealbreaker.

open-notebook exposes a full REST API on :5055 — that's the foundation. The missing pieces are an auth layer and per-user notebook isolation. Not architecturally hard. Just not done yet.

4. Smarter document ingestion

Current ingestion pipeline: extract text → chunk → embed. For structured documents — financial reports, legal filings, spec sheets — this loses table structure, section hierarchy, and cross-reference relationships entirely.

LlamaParse handles this better. open-notebook's ingestion pipeline would benefit from a configurable parsing stage that understands document structure before chunking. The plugin surface is there; the parser isn't.
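A structure-aware stage doesn't have to be elaborate to help. The toy below splits a Markdown file at its # and ## headings and prefixes each chunk with its heading path, so a "Revenue" subsection stays tied to the report it came from when the chunk is embedded. It's a sketch under narrow assumptions (ATX-style headings only, one chunk per section). chunk_by_heading is a hypothetical name, and this is not how open-notebook or LlamaParse work internally. It still flattens tables onto one line, so it addresses section hierarchy only.

```bash
#!/usr/bin/env bash
# Toy structure-aware chunker: one chunk per "#"/"##" section of a Markdown
# file, emitted as "heading path<TAB>section text". Lines inside a section
# are joined with spaces, so tables are NOT preserved; only hierarchy is.
chunk_by_heading() {
  awk '
    function flush(   path) {
      if (body != "") {
        path = (h2 == "" ? h1 : h1 " > " h2)
        print path "\t" body
      }
      body = ""
    }
    /^# /  { flush(); h1 = substr($0, 3); h2 = ""; next }  # new top-level section
    /^## / { flush(); h2 = substr($0, 4); next }           # new subsection
    NF     { body = (body == "" ? $0 : body " " $0) }      # accumulate non-blank lines
    END    { flush() }
  ' "$1"
}
```

Embedding the heading path alongside the text is the cheapest way to keep section context attached to a chunk; keeping tables intact would need a real parser.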

5. Local TTS for podcasts

NotebookLM's audio overview is free because Google's TTS is built-in. Both open-source alternatives route audio through ElevenLabs by default — adding cost and a third-party dependency.

Local TTS has improved dramatically in 2025 (Kokoro, Parler-TTS, Coqui XTTS). Either project could add a local TTS backend that makes podcast generation genuinely free. The architecture supports it. The wiring just isn't there yet.

Which one should you use?

Use open-notebook if:

Data sovereignty matters (client work, proprietary research, anything sensitive)
You want to run against Claude, local models, or anything other than OpenAI
You're deploying to a VPS, NAS, or home server
You want to build on top of it via the REST API

Use notebookllama if:

You're learning the LlamaIndex/LlamaCloud ecosystem
You already have LlamaCloud credits
You need the best available open-source citation quality right now

Build on open-notebook if you're serious about a NotebookLM competitor. The architecture is sounder, the install story is better, the API surface is richer, and MIT carries even fewer obligations than Apache 2.0. The citation gap and single-user limitation are solvable engineering problems, not dead ends.

One honest observation

The real insight from comparing these two repos isn't about which is better. It's about what "open-source" means in this context.

notebookllama trades on the credibility of open source while keeping the intelligence layer proprietary via LlamaCloud. open-notebook makes the harder choice and actually earns the label.

That gap matters if you're building a product on top of one of them. It matters less if you just want a self-hosted research workspace for personal use.

Either way: open-notebook shipped v1.9.0 on June 2 and hit GitHub trending the same day. The right time to pay attention to this category is now.

Found an error or something that's changed? Drop a comment — I'll update.

DEV Community