A while ago I realized my note-taking workflow was broken. I'd scrape a dozen articles, bookmark tutorials, download PDFs, and save YouTube summaries — only to never find them again when I actually needed them. My AI agent had no real memory beyond the current conversation, and even with a vector store, the cold start problem was painful: where does the knowledge come from in the first place?
I started building a personal knowledge pipeline, and it eventually grew into the Knowledge and Memory Management (KMM) project. It's an open-source extension layer for the hermes-memory-installer that tackles the full cycle: collect → analyze → store → sync.
The Architecture at a Glance
KMM is organized into three layers:
- Collection Layer (40+ tools) – web scraping, video/audio transcription, article extraction, document OCR, and even book auto-condensing.
- Analysis Layer – AI-powered note generation, knowledge graph extraction, NLI fact-checking, and discovery/recall.
- Storage Layer (Three-Tier Memory) – Hot (working memory via Memory tool), Warm (10K-node hindsight), Cold (11K-page gbrain).
Plus a cloud sync layer that wraps rclone and supports OneDrive, Google Drive, Dropbox, WebDAV, S3, and a dozen more providers.
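To make the three-tier idea concrete, here is a minimal sketch of how a hot/warm/cold lookup with promotion might behave. It is not KMM's actual code: the `TieredMemory` class, its `store` and `recall` methods, and the `hot_capacity` parameter are hypothetical names, and plain dictionaries stand in for the Memory tool, hindsight, and gbrain.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TieredMemory:
    """Toy three-tier store. Each tier is a plain dict (an assumption for illustration)."""

    hot_capacity: int = 2
    hot: Dict[str, str] = field(default_factory=dict)
    warm: Dict[str, str] = field(default_factory=dict)
    cold: Dict[str, str] = field(default_factory=dict)

    def store(self, key: str, value: str, tier: str = "cold") -> None:
        """Write an entry directly into the named tier."""
        tiers = {"hot": self.hot, "warm": self.warm, "cold": self.cold}
        tiers[tier][key] = value

    def recall(self, key: str) -> Optional[Tuple[str, str]]:
        """Search hot, then warm, then cold. Return (tier_found, value) or None."""
        for name, tier in (("hot", self.hot), ("warm", self.warm), ("cold", self.cold)):
            if key in tier:
                value = tier[key]
                if name != "hot":
                    # Copy recently used knowledge into working memory.
                    self._promote(key, value)
                return name, value
        return None

    def _promote(self, key: str, value: str) -> None:
        """Place an entry in the hot tier, demoting the oldest hot entry if full."""
        if len(self.hot) >= self.hot_capacity:
            oldest = next(iter(self.hot))  # dicts preserve insertion order
            self.warm[oldest] = self.hot.pop(oldest)
        self.hot[key] = value


if __name__ == "__main__":
    memory = TieredMemory(hot_capacity=1)
    memory.store("rclone-notes", "sync makes dest match source", tier="cold")
    print(memory.recall("rclone-notes"))  # first hit comes from the cold tier
    print(memory.recall("rclone-notes"))  # second hit comes from the hot tier
```

The point of the sketch is the lookup order: the cheapest tier is checked first, and anything found lower down is copied upward so the next recall is faster.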
Why I Built It
Existing tools solve only part of the problem. You can scrape with a browser plugin, but the data stays siloed. You can embed documents into a vector DB, but you still need to manually feed it. I wanted a unified pipeline that:
- Automatically collects from web, video, docs, and books
- Generates structured notes and extracts knowledge graphs
- Makes all that instantly retrievable via semantic search
- Keeps everything synced across my cloud drives
KMM doesn't replace the memory sidecar — it feeds it with high-quality, pre-processed knowledge.
Quick Start
After setting up hermes-memory-installer, install KMM:
git clone https://github.com/mage0535/Knowledge-and-Memory-Management.git
cd Knowledge-and-Memory-Management
export AGENT_HOME=/path/to/your/agent
The project uses portable paths, so nothing depends on hardcoded directories. Run a collection with one of the 40+ tools:
# Scrape a webpage and automatically generate a note
python src/knowledge_collector/web.py --url https://example.com --note
# Transcribe a YouTube video
python src/knowledge_collector/video.py --url "https://youtube.com/watch?v=..."
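To illustrate what portable paths can look like on the Python side, here is a minimal sketch that resolves every location relative to the AGENT_HOME variable exported above. The `agent_path` helper and the `notes/web` subdirectory are hypothetical; I am assuming, not quoting, how KMM's tools read the variable.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def agent_path(*parts: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Build a path under AGENT_HOME instead of hardcoding a directory.

    `env` defaults to the process environment; passing a mapping makes
    the helper easy to test. (Hypothetical helper, not part of KMM.)
    """
    env = os.environ if env is None else env
    home = env.get("AGENT_HOME")
    if not home:
        raise RuntimeError("AGENT_HOME is not set; export it before running the tools")
    return Path(home).expanduser().joinpath(*parts)


if __name__ == "__main__":
    # The directory layout below is an assumption for illustration.
    print(agent_path("notes", "web", env={"AGENT_HOME": "/path/to/your/agent"}))
```

Failing loudly when the variable is missing is a deliberate choice in this sketch: a silent fallback to the current directory would scatter notes wherever the script happened to run.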
For cloud sync, configure rclone and run:
python src/cloud_sync/sync.py --remote onedrive:MyNotes
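For readers curious what wrapping rclone can involve, here is a rough sketch of a thin wrapper around the real `rclone sync` command. The function names and the local `notes` directory are hypothetical, and I am not claiming this is how `sync.py` is implemented. Only `rclone sync SOURCE DEST` and the `--dry-run` flag come from rclone itself.

```python
import subprocess
from typing import List


def build_rclone_sync(local_dir: str, remote: str, dry_run: bool = True) -> List[str]:
    """Return the argument list for `rclone sync local_dir remote`.

    `rclone sync` makes the destination match the source, which can delete
    files on the destination, so this sketch defaults to --dry-run.
    """
    cmd = ["rclone", "sync", local_dir, remote]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_sync(local_dir: str, remote: str, dry_run: bool = True) -> int:
    """Run the command and return rclone's exit code (0 means success)."""
    result = subprocess.run(build_rclone_sync(local_dir, remote, dry_run), check=False)
    return result.returncode


if __name__ == "__main__":
    # Only prints the command; call run_sync() once rclone is configured.
    print(" ".join(build_rclone_sync("notes", "onedrive:MyNotes")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with remote names and paths that contain spaces.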
Everything flows into the three-tier memory, so your AI agent can recall it during conversations.
When to Use It (and When Not To)
- Great for: Developers building personal AI assistants, researchers who consume a lot of content, anyone tired of manual note-taking.
- Not for: Teams needing real-time collaboration (it's designed for single-agent setups), or users who want a zero-config SaaS product. This is a DIY pipeline.
Tech Stack
- Python 3.10+, yt-dlp, rclone, MarkItDown for document conversion, and a handful of AI APIs for analysis. The docs/tool-versions.md file lists all verified dependencies.
The Result
I no longer manually organize knowledge. When my AI agent needs context, it finds it — from yesterday's blog post, last week's PDF, or last month's YouTube playlist. The sync layer keeps everything backed up and portable.
If you're building a memory-enhanced agent and struggling with input sources, give KMM a look. It's MIT-licensed and PRs are welcome.
Check it out on GitHub: github.com/mage0535/Knowledge-and-Memory-Management