TL;DR — DocuFlow is an open-source MCP server that gives AI agents (Claude, Copilot, Cursor) a persistent, structured wiki about your codebase. Instead of re-explaining your project every session, your agent reads once, remembers forever, and builds on previous knowledge.
```bash
npm install -g @doquflow/cli && docuflow init
```
## The Problem: AI Agents Have Goldfish Memory
Here's a conversation most of us have had:
> **You:** "Add a rate limiter to the auth routes."
>
> **Agent:** "Sure! What authentication library are you using?"
>
> **You:** "...we've been using JWT for the last 6 sessions."
Every new conversation, your AI agent starts from scratch. It re-reads files, re-discovers patterns, re-asks the same questions. This gets worse as your codebase grows. By the time you've given enough context to be useful, your context window is half-burned.
The standard answer is RAG (Retrieval-Augmented Generation). Pull relevant files, chunk them, embed them, retrieve on demand. But RAG has a hidden cost: the LLM does the same extraction work on every single query. There's no accumulation. Knowledge doesn't compound.
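The trade-off is easy to sketch in a few lines of TypeScript (hypothetical names, not DocuFlow's code): with RAG, every query re-pays the extraction cost; with a persisted wiki, extraction runs once at ingest time and every later query reads the accumulated result.

```typescript
// Hypothetical sketch: per-query RAG extraction vs. a persistent wiki.
type Doc = { id: string; text: string };

// Stand-in for the expensive step: deriving structure from raw text.
function extract(doc: Doc): string {
  return `summary of ${doc.id}`;
}

let extractions = 0;

function ragQuery(sources: Doc[], _q: string): string[] {
  // RAG re-derives context from raw sources on every call.
  return sources.map((d) => { extractions++; return extract(d); });
}

const wiki = new Map<string, string>(); // persisted pages

function ingest(doc: Doc): void {
  extractions++;
  wiki.set(doc.id, extract(doc)); // pay the cost once
}

function wikiQuery(_q: string): string[] {
  return [...wiki.values()]; // reads compound prior work, no re-extraction
}

const docs: Doc[] = [{ id: "auth.md", text: "…" }, { id: "api.md", text: "…" }];

// Three RAG queries over two docs repeat the extraction six times.
ragQuery(docs, "q1"); ragQuery(docs, "q2"); ragQuery(docs, "q3");
console.log(extractions); // 6
```

Ingesting the same two docs once costs two extractions, and every query after that costs zero: that is the compounding the wiki pattern is after.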
## The LLM Wiki Pattern
DocuFlow implements a different approach called the LLM Wiki pattern:
```
Raw Sources (immutable)
      ↓ ingest once
Wiki Layer (LLM-maintained markdown pages)
      ↓ query anytime
Synthesized Answers + Citations
```
The key insight: let the LLM do the bookkeeping once, then compound that work.
When you add a new source document, DocuFlow reads it, extracts entities and concepts, integrates them into the existing wiki, updates cross-references, and flags contradictions. The next query is better because the wiki is richer — not because you did more chunking.
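A toy sketch of that bookkeeping (all names hypothetical; the real extraction is LLM-driven, here replaced by a trivial capitalized-phrase match):

```typescript
// Hypothetical sketch of the ingest step: extract entity names, merge each
// into an existing page or create one, and record cross-references between
// entities that co-occur in the same source document.
type WikiPage = { title: string; sources: string[]; links: Set<string> };

const wiki = new Map<string, WikiPage>();

// Stand-in for LLM extraction: runs of capitalized words, deduplicated.
function extractEntities(text: string): string[] {
  return [...new Set(text.match(/\b[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*\b/g) ?? [])];
}

function ingestSource(name: string, text: string): void {
  const entities = extractEntities(text);
  for (const title of entities) {
    const page = wiki.get(title) ?? { title, sources: [], links: new Set<string>() };
    if (!page.sources.includes(name)) page.sources.push(name); // integrate, don't overwrite
    for (const other of entities) if (other !== title) page.links.add(other);
    wiki.set(title, page);
  }
}

ingestSource("auth-security.md", "Access Token pairs with Refresh Token rotation.");
ingestSource("overview.md", "Refresh Token revocation uses Redis.");

// The "Refresh Token" page now cites both sources: knowledge accumulated.
console.log(wiki.get("Refresh Token")?.sources);
```

The point of the sketch is the merge: a second ingest enriches existing pages instead of replacing them, which is why later queries get better answers.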
## Introducing DocuFlow
DocuFlow is an MCP server (Model Context Protocol) that works with Claude, Copilot, Cursor, and any other MCP-compatible agent. It provides 15 tools organized into four groups:
| Category | Tools | What they do |
|---|---|---|
| Code Extraction | `read_module`, `list_modules`, `write_spec`, `read_specs` | Scan files → extract classes, endpoints, DB tables, deps |
| Wiki Pipeline | `ingest_source`, `query_wiki`, `wiki_search`, `save_answer_as_page`, … | Build and query the living wiki |
| Health | `lint_wiki`, `get_schema_guidance`, `preview_generation` | Keep the wiki accurate and complete |
| Dependency Graph | `generate_dependency_graph` | Visual map of imports, shared tables, and coupling |
Plus an 8-command CLI and a React web UI — all from a single npm install.
## A Real Example: The TaskFlow API
Let me walk through a real project. I have a TypeScript REST API called TaskFlow — JWT auth, RBAC, PostgreSQL, Express. The kind of project where every new developer asks the same 10 questions.
### Step 1: Initialize
```bash
cd taskflow-api
docuflow init
```
This creates .docuflow/ in your project, registers the MCP server in Claude/Copilot, and writes a CLAUDE.md that auto-loads when your agent starts.
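Under the hood, that registration is a standard MCP server entry. In a client config like Claude Desktop's `claude_desktop_config.json`, the entry would look roughly like this (the exact command and args here are an assumption for illustration, not DocuFlow's verbatim output):

```json
{
  "mcpServers": {
    "docuflow": {
      "command": "npx",
      "args": ["-y", "@doquflow/server"]
    }
  }
}
```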
### Step 2: Write source documents
DocuFlow's wiki starts from your curated docs — markdown files you drop into .docuflow/sources/. These are the authoritative descriptions you write once:
```
.docuflow/sources/
├── overview.md        # What this project is, tech stack, env vars
├── auth-security.md   # JWT lifecycle, bcrypt, RBAC, rate limiting
├── architecture.md    # System diagram, request lifecycle, DB schema
├── api-reference.md   # REST endpoints with request/response examples
└── developer-guide.md # Setup, conventions, deployment
```
### Step 3: Sync
```bash
docuflow sync
```
DocuFlow ingests all 5 sources and generates 71 wiki pages — each entity, concept, and relationship gets its own page with cross-references:
```
✓ Ingested overview.md        → 12 pages (JWT Authentication, RBAC, Rate Limiter, …)
✓ Ingested auth-security.md   → 18 pages (Access Token, Refresh Token, bcrypt, …)
✓ Ingested architecture.md    → 16 pages (Connection Pool, Task State Machine, …)
✓ Ingested api-reference.md   → 14 pages (POST /auth/login, GET /tasks/:id, …)
✓ Ingested developer-guide.md → 11 pages (Environment Setup, Deployment, …)

Health: 96/100 · 71 pages · 5 sources
```
Now every AI session that opens this project can immediately answer:
> **You:** "Explain the token refresh flow."
>
> **Agent** (reading the wiki, not re-scanning files): "TaskFlow uses short-lived access tokens (15 min) paired with rotating refresh tokens (7 days). On expiry, the client POSTs to `/auth/refresh` with the refresh token in an `httpOnly` cookie. The server validates against the stored hash, issues a new access token, and rotates the refresh token. In production, a Redis denylist handles immediate revocation…"
The agent didn't read auth.ts. It read the wiki page for Refresh Token — which already has the context, cross-references to Access Token and JWT Authentication, and a link to the source document.
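For illustration, such a wiki page might look roughly like this on disk (a hypothetical layout, not DocuFlow's exact template):

```markdown
# Refresh Token

Long-lived (7 days) credential used to mint new access tokens. Rotated on
every use and validated against a stored hash; a Redis denylist handles
immediate revocation in production.

## Related
- [[Access Token]] — the short-lived (15 min) counterpart
- [[JWT Authentication]] — the overall auth flow

## Sources
- sources/auth-security.md
```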
### Step 4: The Web UI
```bash
docuflow ui
```
Opens http://localhost:48821 — a live React interface with six views, including:
**Ask** — Type any question, get a synthesized answer with source citations.

*[Screenshot: Ask view showing a JWT auth question with a synthesized answer]*

**Wiki** — Browse all 71 pages organized as Entities, Concepts, Syntheses, Timelines.

*[Screenshot: Wiki tree with entity pages listed]*

**Graph** — Your entire knowledge base as an interactive D3 force graph. Each node is a wiki page; edges are cross-references.

*[Screenshot: Graph view with color-coded node clusters, zoomed into the auth cluster showing the JWT Authentication hub]*

**Health** — Real-time quality score (96/100) with actionable issues.

*[Screenshot: Health dashboard showing score gauge and orphan page list]*
The UI auto-discovers all DocuFlow projects in ~/dev, ~/code, and ~/Desktop, and a project picker in the corner lets you switch between all your codebases.
## What AI Agents Can Do With It
Once DocuFlow is registered, your agent gets a new vocabulary. Instead of:
"Let me read your package.json, then src/auth.ts, then… can you paste your middleware?"
The agent uses MCP tools:
```
→ query_wiki("How does authentication work?")
← Synthesized answer: JWT with refresh rotation, bcrypt, RBAC middleware...
  Sources: [entity/jwt_authentication, entity/refresh_token, concept/rbac]

→ generate_dependency_graph({ focus: "auth" })
← auth.ts imports: jsonwebtoken, bcryptjs, pg
  Shared tables with users.ts: "users"
  Most connected: auth.ts (hub for 4 modules)

→ lint_wiki()
← Health: 96/100
  Issues: 3 orphan pages, 1 stale concept
  Recommendation: link "Rate Limiter" from "Security Headers" page
```
Knowledge compounds across sessions. If the agent answers a complex question and saves it back:
```
→ save_answer_as_page({ question: "What happens when a refresh token expires?",
                        answer: "...", category: "synthesis" })
← Saved: wiki/syntheses/what_happens_when_refresh_token_expires.md
```
That page is now part of the wiki. The next agent to ask gets a better, richer answer — without anyone repeating themselves.
## The CLI at a Glance
```bash
docuflow init                # Init + MCP registration
docuflow init --interactive  # Guided domain setup (Code/Research/Business/Personal)
docuflow status              # Wiki stats, health score, version
docuflow suggest             # 5 prioritized starting-point suggestions
docuflow sync                # Re-ingest all sources, rebuild index
docuflow sync --ai           # Sync + AI-powered doc generation
docuflow watch               # Background daemon — ingests new sources in <1s
docuflow review --staged     # Review staged git changes for issues
docuflow ui                  # Start web interface on port 48821
```
## Multi-Language, Multi-Domain
DocuFlow's extraction engine is regex-based and language-agnostic. It works on TypeScript, JavaScript, Python, Go, Ruby, Java, C#, PHP, SQL — extracting classes, functions, imports, REST endpoints, DB tables, and environment variable references from any of them.
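As a rough illustration of the approach (simplified, hypothetical patterns; the real extractor is more thorough), a handful of regexes goes a long way across languages:

```typescript
// Simplified sketch of regex-based, language-agnostic extraction.
// Each pattern targets a construct that looks similar across languages.
const patterns = {
  classes: /\bclass\s+([A-Za-z_]\w*)/g,                        // TS/JS/Python/Java/…
  imports: /^\s*(?:import|from|require)\b.*$/gm,               // rough import lines
  endpoints: /\.(get|post|put|delete)\(\s*['"]([^'"]+)['"]/g,  // Express-style routes
  envVars: /\bprocess\.env\.([A-Z_][A-Z0-9_]*)/g,              // env var references
};

function extract(source: string) {
  const grab = (re: RegExp, group = 1) =>
    [...source.matchAll(re)].map((m) => m[group]);
  return {
    classes: grab(patterns.classes),
    imports: grab(patterns.imports, 0),
    endpoints: [...source.matchAll(patterns.endpoints)].map(
      (m) => `${m[1].toUpperCase()} ${m[2]}`,
    ),
    envVars: grab(patterns.envVars),
  };
}

const sample = `
import jwt from "jsonwebtoken";
class AuthService {}
app.post("/auth/login", login);
const secret = process.env.JWT_SECRET;
`;

// Picks out the class, the import line, the POST /auth/login route,
// and the JWT_SECRET env reference.
console.log(extract(sample));
```

Because the patterns key on surface syntax rather than a parser, the same pass works on any language where the construct looks similar, at the cost of some precision.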
And it's not just for code. The domain system supports Research, Business, and Personal knowledge bases too. The schema adapts, the wiki page templates change, and the `suggest` command gives domain-appropriate recommendations.
## Under the Hood
DocuFlow is a standard MCP server built with @modelcontextprotocol/sdk. It runs as a local subprocess — Claude or Copilot connects to it via stdio, same as any other MCP tool. The web UI is a single Express server on port 48821 that serves both the React frontend and a REST API bridge to the same 15 MCP tools.
Everything lives in .docuflow/ in your project directory:
```
.docuflow/
├── sources/    ← Your curated inputs (immutable)
├── wiki/       ← LLM-generated pages (entities, concepts, syntheses, timelines)
├── specs/      ← Agent-written technical specs
├── schema.md   ← Domain config (customize the wiki structure)
├── index.md    ← Auto-maintained page catalog
└── log.md      ← Operation history
```
The wiki is plain markdown files on disk — git-trackable, diffable, portable. No database, no cloud service, no API keys required.
## Getting Started in 5 Minutes
```bash
# Install globally
npm install -g @doquflow/cli

# Initialize your project
cd your-project
docuflow init

# Write your first source document
cat > .docuflow/sources/overview.md << 'EOF'
# My Project
What it does, the tech stack, key concepts...
EOF

# Ingest and start the UI
docuflow sync && docuflow ui
```
Then open Claude/Copilot and ask anything about your project. The wiki is already loaded.
## What's Next
DocuFlow is at v1.5.1 and actively developed. On the roadmap:
- Auto-sync on git hooks — wiki updates automatically on every commit
- Team mode — shared wiki across a team, conflict-free merge
- More MCP clients — first-class support for Windsurf, Zed, VS Code Chat
## Try It
- npm: `@doquflow/cli` and `@doquflow/server`
- GitHub: doquflows/docuflow
- Email: docuflow@sshabuj.com
If you're tired of re-explaining your codebase to an AI that forgot everything overnight, give DocuFlow a try. Your future self — and your future agent — will thank you.
Built with TypeScript, @modelcontextprotocol/sdk, D3, React 18, Express.