Introduction
"AI needs to know where it learned something from."
This is the 77th article in the "One Open Source Project a Day" series. Today's project is notebooklm-py.
Google NotebookLM is one of the best personal knowledge + AI Q&A tools available — you upload documents, it summarizes, generates podcasts, builds slide decks, and cites every answer back to a specific source passage. But it has one major limitation: no official API. Everything requires manual operation in the web UI.
notebooklm-py solves that. It reverse-engineers NotebookLM's full set of undocumented internal APIs, giving you complete programmatic control via Python or CLI — including capabilities the web interface doesn't expose. It also ships a Claude Code Skill that lets Claude query your NotebookLM knowledge base in real time during a conversation, combining Gemini's grounded knowledge retrieval with Claude's reasoning.
9,500+ Stars, MIT license.
What You Will Learn
- The three usage modes of notebooklm-py (Python API / CLI / Agent Skill)
- What it can do that the NotebookLM web UI cannot
- How the Claude Code Skill integration enables cross-vendor AI collaboration (Claude + NotebookLM)
- Why "source-cited answers" is the most direct engineering solution to AI hallucination
- The cost of reverse-engineering undocumented APIs: capability vs. fragility
Prerequisites
- Familiarity with Python basics and pip package management
- Experience with Google NotebookLM (understanding its core capabilities)
- Claude Code experience (relevant for the Skill integration section)
Project Background
Project Introduction
notebooklm-py is an unofficial Python client built by developer Teng Lin. It reverse-engineers Google NotebookLM's internal APIs and issues the same HTTP requests the NotebookLM web client sends to Google's servers.
The project contains two closely related components:
notebooklm-py
├── Python Library / CLI ← Full NotebookLM API wrapper
└── Claude Code Skill ← Lets Claude query NotebookLM mid-session
The two components solve different problems: the Python library answers "how do I automate NotebookLM operations?"; the Claude Code Skill answers "how do I make an AI assistant's responses grounded in my actual documents rather than its training data?"
Author
- Primary Author: Teng Lin (GitHub: @teng-lin)
- Project nature: Unofficial, community-driven, no affiliation with Google
- Distribution: GitHub + PyPI (pip install notebooklm-py)
- Current version: v0.5.0 (Beta)
Project Data
- ⭐ GitHub Stars: 9,500+
- 📦 PyPI: notebooklm-py
- 📄 License: MIT
- 🐍 Python: ≥ 3.10
- 🖥️ Platforms: macOS, Linux, Windows
- ⚠️ Status: Beta (undocumented APIs — may break when Google changes internals)
- 🌐 Repository: teng-lin/notebooklm-py
Main Features
Core Utility
notebooklm-py covers the complete NotebookLM feature surface:
Notebook Management
├── Create, list, rename, delete, share
Source Import
├── URLs (web pages)
├── YouTube videos
├── PDF files
├── Google Drive documents
└── Pasted text
Knowledge Query
├── Q&A with conversation history (cited answers)
└── Citation pinning
Content Generation (Studio features)
├── Audio podcast (Audio Overview)
├── Video overview
├── Slide deck (PPTX)
├── Quiz
├── Flashcards
├── Mind map (JSON)
├── Report
└── Infographic
Research Agents
├── Web research (auto-import search results)
└── Drive research (auto-import cloud documents)
Three Usage Modes
Mode 1: Python API (Async, for automation pipelines)
import asyncio
from notebooklm import NotebookLM


async def main():
    async with NotebookLM() as nlm:
        # Create a notebook
        notebook = await nlm.create_notebook("My Research Project")

        # Import from multiple source types
        await notebook.add_source("https://arxiv.org/abs/2310.06825")
        await notebook.add_source("research.pdf")
        await notebook.add_source("https://youtu.be/dQw4w9WgXcQ")

        # Query the knowledge base (returns cited answer)
        response = await notebook.query(
            "What is the core contribution of this paper?",
            include_citations=True,
        )
        print(response.answer)
        print(response.citations)  # Each citation points to a specific source passage

        # Generate audio podcast
        podcast = await notebook.generate_audio_overview()
        podcast.save("summary.mp3")

        # Generate slide deck
        slides = await notebook.generate_slides()
        slides.save("presentation.pptx")


asyncio.run(main())
Mode 2: CLI (Shell scripts and CI/CD)
# Install
pip install notebooklm-py
# Basic operations
notebooklm notebook create "Product Docs Knowledge Base"
notebooklm source add --notebook "Product Docs Knowledge Base" https://docs.example.com
notebooklm source add --notebook "Product Docs Knowledge Base" ./spec-v2.pdf
# Query
notebooklm query --notebook "Product Docs Knowledge Base" \
    "What changed in the authentication API in v2?"
# Generate content
notebooklm generate audio --notebook "Product Docs Knowledge Base"
notebooklm generate slides --notebook "Product Docs Knowledge Base" --output slides.pptx
# Batch sync (auto-update knowledge base in CI/CD)
notebooklm source sync --notebook "Product Docs Knowledge Base" ./docs/
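One way to keep a batch sync like the last command cheap and repeatable in CI is to track content hashes and upload only files that are new or changed. The sketch below is a hedged illustration using only the standard library. `plan_sync`, the manifest layout, and the hashing approach are assumptions made for illustration, not a description of how notebooklm-py implements its sync.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_sync(docs_dir: Path, manifest: dict[str, str]) -> tuple[list[Path], dict[str, str]]:
    """Compare files under docs_dir with digests recorded at the last sync.

    `manifest` maps a relative POSIX path to its previously synced digest
    (a hypothetical format). Returns the files that are new or changed,
    plus the updated manifest to persist after a successful upload.
    """
    to_upload: list[Path] = []
    updated = dict(manifest)
    for path in sorted(p for p in docs_dir.rglob("*") if p.is_file()):
        key = path.relative_to(docs_dir).as_posix()
        digest = file_digest(path)
        if manifest.get(key) != digest:
            to_upload.append(path)
            updated[key] = digest
    return to_upload, updated
```

A CI job could store the manifest as a JSON file between runs and pass only the returned paths to the upload step, so unchanged documents are never re-sent.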
Mode 3: Claude Code Skill (Claude queries NotebookLM in real time)
This is the most interesting part of the whole project.
# Install Skill (method 1: via plugin marketplace)
/plugin marketplace add teng-lin/notebooklm-py
# Install Skill (method 2: direct install into skills directory)
notebooklm skill install
# First-time authentication (browser Google login, then auto-maintained)
notebooklm auth login
Once installed, in a Claude Code session:
You: Explain our system's authentication architecture —
I think it's documented in the architecture docs.
Claude: [internally invokes the NotebookLM Skill, queries your "System Architecture" notebook]
Claude: According to your architecture document (source: architecture-v3.md, page 4),
the system uses a JWT + Refresh Token dual-token mechanism...
[every claim is backed by a traceable source passage, not training data generalization]
Deep Dive
How the Claude Code Skill Works
Understanding the Skill requires understanding its two-layer architecture:
Layer 1: Browser Automation
NotebookLM has no public API. All operations require Google account authentication. notebooklm-py uses browser automation (Playwright) to maintain authentication state, capturing auth tokens and then making direct HTTP requests.
This means:
- ✅ Full feature parity with the web UI
- ✅ Local execution (all data stays on your device)
- ⚠️ Requires local Claude Code (cloud/sandbox environments cannot run a browser)
- ⚠️ Depends on Google's internal APIs — may break without notice
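The "log in once in a browser, then call HTTP directly" pattern can be sketched as follows. It assumes the session was saved in the JSON layout that Playwright's `storage_state()` produces (a top-level `cookies` list whose entries carry `name`, `value`, and `domain`). Whether notebooklm-py persists its session in exactly this form is an assumption, and `cookie_header_for` is a hypothetical helper, not part of the library.

```python
import json


def cookie_header_for(storage_state_json: str, host: str) -> str:
    """Build a Cookie header value from a Playwright-style storage state.

    Only cookies whose domain matches `host` exactly, or is a parent
    domain of it, are included.
    """
    state = json.loads(storage_state_json)
    pairs = []
    for cookie in state.get("cookies", []):
        domain = cookie["domain"].lstrip(".")
        if host == domain or host.endswith("." + domain):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Example with made-up cookie names and values
saved = json.dumps({
    "cookies": [
        {"name": "SID", "value": "abc", "domain": ".google.com", "path": "/"},
        {"name": "OTHER", "value": "zzz", "domain": "example.org", "path": "/"},
    ],
    "origins": [],
})
headers = {"Cookie": cookie_header_for(saved, "notebooklm.google.com")}
print(headers)  # {'Cookie': 'SID=abc'}
```

The resulting header can be attached to requests made with any HTTP client, which is why the browser is only needed for the login step.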
Layer 2: Skill Integration
Claude Code conversation
↓ Determines knowledge base context is needed
Skill triggered
↓ Constructs NotebookLM query
notebooklm-py Python library
↓ HTTP request to Google servers
NotebookLM (Gemini backend)
↓ Returns cited answer
Skill parses response
↓ Injects into Claude's context window
Claude's final answer (grounded in real sources, with citations)
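The flow above can be sketched end to end with a stubbed backend. Everything in this block is hypothetical and for illustration only: `Citation`, `CitedAnswer`, `fake_notebook_query`, and the context format are assumptions, not the Skill's actual code or notebooklm-py's types.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    source: str   # name of the uploaded document
    passage: str  # quoted passage that supports the answer


@dataclass
class CitedAnswer:
    answer: str
    citations: list[Citation]


def fake_notebook_query(question: str) -> CitedAnswer:
    """Stand-in for the real NotebookLM round trip (hypothetical)."""
    return CitedAnswer(
        answer="The system uses a JWT + refresh token dual-token mechanism.",
        citations=[
            Citation("architecture-v3.md", "Auth uses JWT access tokens and refresh tokens."),
        ],
    )


def to_context_block(question: str, result: CitedAnswer) -> str:
    """Format a cited answer as plain text to place in the assistant's context."""
    lines = [f"Question: {question}", f"Grounded answer: {result.answer}", "Sources:"]
    for i, c in enumerate(result.citations, start=1):
        lines.append(f'[{i}] {c.source}: "{c.passage}"')
    return "\n".join(lines)


question = "How does authentication work?"
context = to_context_block(question, fake_notebook_query(question))
print(context)
```

Keeping the quoted passages in the injected text, rather than only the answer, is what lets the final response point back to a specific source.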
Why "Source-Cited Answers" Is the Engineering Solution to Hallucination
This is a design philosophy worth sitting with.
The core problem with AI hallucination isn't "the AI said something wrong." It's "the AI said something that might be wrong with full apparent confidence." Two strategies exist:
Strategy A: Better training data and RLHF
→ Mitigates but cannot eliminate (the model can never know what it doesn't know)
Strategy B: Force every claim to come from a verifiable source
→ Structural fix (a claim with no supporting source passage has nothing to cite, so it can be caught and rejected)
The notebooklm-py Claude Code Skill takes Strategy B:
Traditional Claude answer:
"JWT is a token format that... (from training data, possibly outdated)"
Claude connected to NotebookLM:
"According to architecture-v3.md (page 4, paragraph 2),
your system's JWT implementation uses ECDSA signing...
(from your own uploaded document, fully traceable)"
This pattern is especially powerful for: codebase documentation, API specs, internal process handbooks, product requirements — knowledge that exists only within your organization, absent from any model's training data, retrievable only through document lookup.
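Strategy B can be approximated in code as a gate that keeps only claims whose quoted evidence actually appears in the cited source. This is a minimal sketch under stated assumptions: `Claim`, `verify_claims`, and the normalized substring check are illustrative choices, and a real system would match passages more loosely.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str    # the statement the assistant wants to make
    source: str  # name of the document it cites
    quote: str   # passage it claims to be quoting


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not matter."""
    return " ".join(s.lower().split())


def verify_claims(claims: list[Claim], sources: dict[str, str]) -> tuple[list[Claim], list[Claim]]:
    """Split claims into (supported, unsupported) by checking each quote against its source."""
    supported, unsupported = [], []
    for claim in claims:
        body = sources.get(claim.source)
        quote = normalize(claim.quote)
        # An empty quote or an unknown source never counts as support
        if body is not None and quote and quote in normalize(body):
            supported.append(claim)
        else:
            unsupported.append(claim)
    return supported, unsupported


sources = {"architecture-v3.md": "Tokens are signed\nwith ECDSA (ES256)."}
claims = [
    Claim("JWTs use ECDSA signing.", "architecture-v3.md", "signed with ECDSA"),
    Claim("Tokens expire after 5 minutes.", "architecture-v3.md", "expire after 5 minutes"),
    Claim("Sessions are stored in Redis.", "ops-handbook.md", "stored in Redis"),
]
ok, rejected = verify_claims(claims, sources)
print([c.text for c in ok])  # ['JWTs use ECDSA signing.']
print(len(rejected))         # 2
```

The gate does not make the model more accurate. It makes unsupported statements detectable, which is the property that source citation adds.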
Capabilities Beyond the Web Interface
notebooklm-py can do things the NotebookLM web UI cannot:
# Fragments: run inside `async with NotebookLM() as nlm:` with `notebook` already created

# Batch-create multiple notebooks (web requires manual clicking per notebook)
notebooks = await asyncio.gather(*[
    nlm.create_notebook(f"Research Topic {i}")
    for i in range(10)
])

# Programmatic citation pinning (a hidden web UI capability)
await notebook.pin_citation(citation_id="cit_abc123")

# Research agent: auto-search and import relevant web pages (beyond web UI)
await notebook.research_web(
    query="latest quantum computing developments 2026",
    auto_import=True,
    max_sources=15,
)

# Scheduled knowledge base updates (auto-sync docs in CI/CD)
async def sync_docs():
    notebook = await nlm.get_notebook("Product Docs")
    await notebook.sync_sources("./docs/")  # only adds new files, no duplicates
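Firing many requests at once with a bare `asyncio.gather`, as in the batch-creation fragment, may trip rate limits on an undocumented backend. Below is a hedged sketch of the same batch pattern with a concurrency cap. `create_notebook_stub` is a hypothetical stand-in for the real call, and the limit of 3 is an arbitrary assumption.

```python
import asyncio


async def create_notebook_stub(title: str) -> str:
    """Hypothetical stand-in for a network call that creates a notebook."""
    await asyncio.sleep(0.01)  # simulate latency
    return f"created:{title}"


async def create_many(titles: list[str], limit: int = 3) -> list[str]:
    """Create notebooks concurrently, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(title: str) -> str:
        async with semaphore:
            return await create_notebook_stub(title)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(t) for t in titles))


results = asyncio.run(create_many([f"Research Topic {i}" for i in range(10)]))
print(results[0], len(results))  # created:Research Topic 0 10
```

Swapping the stub for a real client call leaves the rest unchanged, and the cap can be tuned once the service's tolerance is known.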
Risks and Trade-offs
The most important technical limitations of this project deserve honest assessment:
| Risk | Impact | Mitigation |
|---|---|---|
| Undocumented APIs | Google can change internals at any time, breaking the tool | Monitor project updates, maintain fallback plan |
| Browser automation | Requires local environment; full features unavailable in headless CI/CD | CLI mode supports some headless operation |
| Unofficial client | Potential Google ToS violation risk | Author marks it for "personal use and prototype validation" |
| Beta status | API surface may change between versions | Pin version numbers, use cautiously in production |
The author is explicit in the documentation: this tool is designed for prototypes and personal projects, not production systems with strict reliability requirements.
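The "fallback plan" mitigation in the table might look like the wrapper below for a client that can break without notice. It is a sketch under assumptions: `grounded_query` and `fallback` are hypothetical callables supplied by the caller, and catching a broad `Exception` is a deliberate simplification.

```python
from typing import Callable


def query_with_fallback(
    question: str,
    grounded_query: Callable[[str], str],
    fallback: Callable[[str], str],
    retries: int = 2,
) -> tuple[str, bool]:
    """Try the grounded backend up to `retries` times, then fall back.

    Returns (answer, grounded) so callers can label ungrounded answers.
    """
    for _ in range(retries):
        try:
            return grounded_query(question), True
        except Exception:
            continue  # e.g. the upstream internal API changed or timed out
    return fallback(question), False


def broken_backend(question: str) -> str:
    """Simulates a backend whose response format changed."""
    raise RuntimeError("unexpected response shape")


answer, grounded = query_with_fallback(
    "What changed in v2?",
    grounded_query=broken_backend,
    fallback=lambda q: f"(ungrounded) no source available for: {q}",
)
print(grounded, answer)
```

Returning the `grounded` flag matters more than the retry itself: a silent fallback would reintroduce exactly the unlabeled, uncited answers the tool is meant to avoid.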
Project Links & Resources
Official Resources
- 🌟 GitHub: https://github.com/teng-lin/notebooklm-py
- 📦 PyPI: pypi.org/project/notebooklm-py
- 📖 Skill Setup Guide: see skills/README.md in the repository
Installation Quick Reference
# Basic install
pip install notebooklm-py
# With browser automation support (required for Skill integration)
pip install "notebooklm-py[browser]"
# Install Claude Code Skill
notebooklm skill install
notebooklm auth login # First-time authentication
Target Audience
- Researchers: Automate literature review workflows, batch-import papers and query knowledge bases
- Development teams: Load API specs and architecture docs into Claude Code for source-grounded coding assistance
- Content creators: Batch-generate podcasts, slide decks, and mind maps from documents
- AI tool builders: Exploring cross-vendor AI capability composition (Claude + NotebookLM/Gemini)
Summary
Key Takeaways
- Complete API coverage: Notebook CRUD, multi-type source import, RAG queries, Studio content generation (audio/video/slides/flashcards) — all programmatic
- Three usage modes: Python async API (pipelines), CLI (scripts/CI), Claude Code Skill (AI session integration)
- Cross-vendor AI collaboration: Claude's reasoning + NotebookLM/Gemini's knowledge retrieval — complementary strengths
- Anti-hallucination engineering: Every claim tied to a traceable source, rather than relying on model training data
- Beyond the web interface: Batch operations, programmatic citation pinning, research agents with auto-import — only possible via API
One-Line Review
notebooklm-py did what Google wouldn't — turned NotebookLM into a programmable knowledge engine. Its Claude Code Skill does something even more interesting: makes two AI systems from different vendors collaborate, each doing what it does best.