TL;DR — DocuFlow is an open-source MCP server that gives AI agents (Claude, Copilot, Cursor) a persistent, structured wiki about your codebase. Instead of re-explaining your project every session, your agent reads once, remembers forever, and builds on previous knowledge.
```bash
npm install -g @doquflow/cli && docuflow init
```
## The Problem: AI Agents Have Goldfish Memory
Here's a conversation most of us have had:
> **You:** "Add a rate limiter to the auth routes."
>
> **Agent:** "Sure! What authentication library are you using?"
>
> **You:** "...we've been using JWT for the last 6 sessions."
Every new conversation, your AI agent starts from scratch. It re-reads files, re-discovers patterns, re-asks the same questions. This gets worse as your codebase grows. By the time you've given enough context to be useful, your context window is half-burned.
The standard answer is RAG (Retrieval-Augmented Generation). Pull relevant files, chunk them, embed them, retrieve on demand. But RAG has a hidden cost: the LLM does the same extraction work on every single query. There's no accumulation. Knowledge doesn't compound.
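The trade-off is easy to sketch in a few lines of TypeScript (hypothetical names, not DocuFlow's code): with RAG, every query re-pays the extraction cost; with a persisted wiki, extraction runs once at ingest time and every later query reads the accumulated result.

```typescript
// Hypothetical sketch: per-query RAG extraction vs. a persistent wiki.
type Doc = { id: string; text: string };

// Stand-in for the expensive step: deriving structure from raw text.
function extract(doc: Doc): string {
  return `summary of ${doc.id}`;
}

let extractions = 0;

function ragQuery(sources: Doc[], _q: string): string[] {
  // RAG re-derives context from raw sources on every call.
  return sources.map((d) => { extractions++; return extract(d); });
}

const wiki = new Map<string, string>(); // persisted pages

function ingest(doc: Doc): void {
  extractions++;
  wiki.set(doc.id, extract(doc)); // pay the cost once
}

function wikiQuery(_q: string): string[] {
  return [...wiki.values()]; // reads compound prior work, no re-extraction
}

const docs: Doc[] = [{ id: "auth.md", text: "…" }, { id: "api.md", text: "…" }];

// Three RAG queries over two docs repeat the extraction six times.
ragQuery(docs, "q1"); ragQuery(docs, "q2"); ragQuery(docs, "q3");
console.log(extractions); // 6
```

Ingesting the same two docs once costs two extractions, and every query after that costs zero: that is the compounding the wiki pattern is after.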
## The LLM Wiki Pattern
DocuFlow implements a different approach called the LLM Wiki pattern:
```
Raw Sources (immutable)
      ↓ ingest once
Wiki Layer (LLM-maintained markdown pages)
      ↓ query anytime
Synthesized Answers + Citations
```
The key insight: let the LLM do the bookkeeping once, then compound that work.
When you add a new source document, DocuFlow reads it, extracts entities and concepts, integrates them into the existing wiki, updates cross-references, and flags contradictions. The next query is better because the wiki is richer — not because you did more chunking.
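A toy sketch of that bookkeeping (all names hypothetical; the real extraction is LLM-driven, here replaced by a trivial capitalized-phrase match):

```typescript
// Hypothetical sketch of the ingest step: extract entity names, merge each
// into an existing page or create one, and record cross-references between
// entities that co-occur in the same source document.
type WikiPage = { title: string; sources: string[]; links: Set<string> };

const wiki = new Map<string, WikiPage>();

// Stand-in for LLM extraction: runs of capitalized words, deduplicated.
function extractEntities(text: string): string[] {
  return [...new Set(text.match(/\b[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*\b/g) ?? [])];
}

function ingestSource(name: string, text: string): void {
  const entities = extractEntities(text);
  for (const title of entities) {
    const page = wiki.get(title) ?? { title, sources: [], links: new Set<string>() };
    if (!page.sources.includes(name)) page.sources.push(name); // integrate, don't overwrite
    for (const other of entities) if (other !== title) page.links.add(other);
    wiki.set(title, page);
  }
}

ingestSource("auth-security.md", "Access Token pairs with Refresh Token rotation.");
ingestSource("overview.md", "Refresh Token revocation uses Redis.");

// The "Refresh Token" page now cites both sources: knowledge accumulated.
console.log(wiki.get("Refresh Token")?.sources);
```

The point of the sketch is the merge: a second ingest enriches existing pages instead of replacing them, which is why later queries get better answers.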
## Introducing DocuFlow
DocuFlow is an MCP server (Model Context Protocol) that works with Claude, Copilot, Cursor, and any other MCP-compatible agent. It provides 15 tools organized into four groups:
| Category | Tools | What they do |
|---|---|---|
| Code Extraction | `read_module`, `list_modules`, `write_spec`, `read_specs` | Scan files → extract classes, endpoints, DB tables, deps |
| Wiki Pipeline | `ingest_source`, `query_wiki`, `wiki_search`, `save_answer_as_page`, … | Build and query the living wiki |
| Health | `lint_wiki`, `get_schema_guidance`, `preview_generation` | Keep the wiki accurate and complete |
| Dependency Graph | `generate_dependency_graph` | Visual map of imports, shared tables, and coupling |
Plus an 8-command CLI and a React web UI — all from a single npm install.
## A Real Example: The TaskFlow API
Let me walk through a real project. I have a TypeScript REST API called TaskFlow — JWT auth, RBAC, PostgreSQL, Express. The kind of project where every new developer asks the same 10 questions.
### Step 1: Initialize
```bash
cd taskflow-api
docuflow init
```
This creates .docuflow/ in your project, registers the MCP server in Claude/Copilot, and writes a CLAUDE.md that auto-loads when your agent starts.
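Under the hood, that registration is a standard MCP server entry. In a client config like Claude Desktop's `claude_desktop_config.json`, the entry would look roughly like this (the exact command and args here are an assumption for illustration, not DocuFlow's verbatim output):

```json
{
  "mcpServers": {
    "docuflow": {
      "command": "npx",
      "args": ["-y", "@doquflow/server"]
    }
  }
}
```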
### Step 2: Write source documents
DocuFlow's wiki starts from your curated docs — markdown files you drop into .docuflow/sources/. These are the authoritative descriptions you write once:
```
.docuflow/sources/
├── overview.md        # What this project is, tech stack, env vars
├── auth-security.md   # JWT lifecycle, bcrypt, RBAC, rate limiting
├── architecture.md    # System diagram, request lifecycle, DB schema
├── api-reference.md   # REST endpoints with request/response examples
└── developer-guide.md # Setup, conventions, deployment
```
### Step 3: Sync
```bash
docuflow sync
```
DocuFlow ingests all 5 sources and generates 71 wiki pages — each entity, concept, and relationship gets its own page with cross-references:
```
✓ Ingested overview.md        → 12 pages (JWT Authentication, RBAC, Rate Limiter, …)
✓ Ingested auth-security.md   → 18 pages (Access Token, Refresh Token, bcrypt, …)
✓ Ingested architecture.md    → 16 pages (Connection Pool, Task State Machine, …)
✓ Ingested api-reference.md   → 14 pages (POST /auth/login, GET /tasks/:id, …)
✓ Ingested developer-guide.md → 11 pages (Environment Setup, Deployment, …)

Health: 96/100 · 71 pages · 5 sources
```
Now every AI session that opens this project can immediately answer:
> **You:** "Explain the token refresh flow."
>
> **Agent** (reading the wiki, not re-scanning files): "TaskFlow uses short-lived access tokens (15 min) paired with rotating refresh tokens (7 days). On expiry, the client POSTs to `/auth/refresh` with the refresh token in an `httpOnly` cookie. The server validates against the stored hash, issues a new access token, and rotates the refresh token. In production, a Redis denylist handles immediate revocation…"
The agent didn't read auth.ts. It read the wiki page for Refresh Token — which already has the context, cross-references to Access Token and JWT Authentication, and a link to the source document.
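For illustration, such a wiki page might look roughly like this on disk (a hypothetical layout, not DocuFlow's exact template):

```markdown
# Refresh Token

Long-lived (7 days) credential used to mint new access tokens. Rotated on
every use and validated against a stored hash; a Redis denylist handles
immediate revocation in production.

## Related
- [[Access Token]] — the short-lived (15 min) counterpart
- [[JWT Authentication]] — the overall auth flow

## Sources
- sources/auth-security.md
```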
### Step 4: The Web UI
```bash
docuflow ui
```
Opens http://localhost:48821 — a live React interface with six views, including:
**Ask** — Type any question, get a synthesized answer with source citations.

*[Screenshot: Ask view showing a JWT auth question with a synthesized answer]*

**Wiki** — Browse all 71 pages organized as Entities, Concepts, Syntheses, Timelines.

*[Screenshot: Wiki tree with entity pages listed]*

**Graph** — Your entire knowledge base as an interactive D3 force graph. Each node is a wiki page; edges are cross-references.

*[Screenshot: Graph view with color-coded node clusters, zoomed into the auth cluster showing the JWT Authentication hub]*

**Health** — Real-time quality score (96/100) with actionable issues.

*[Screenshot: Health dashboard showing score gauge and orphan page list]*
The UI auto-discovers all DocuFlow projects in ~/dev, ~/code, and ~/Desktop, and a project picker in the corner lets you switch between all your codebases.
## What AI Agents Can Do With It
Once DocuFlow is registered, your agent gets a new vocabulary. Instead of:
"Let me read your package.json, then src/auth.ts, then… can you paste your middleware?"
The agent uses MCP tools:
```
→ query_wiki("How does authentication work?")
← Synthesized answer: JWT with refresh rotation, bcrypt, RBAC middleware...
  Sources: [entity/jwt_authentication, entity/refresh_token, concept/rbac]

→ generate_dependency_graph({ focus: "auth" })
← auth.ts imports: jsonwebtoken, bcryptjs, pg
  Shared tables with users.ts: "users"
  Most connected: auth.ts (hub for 4 modules)

→ lint_wiki()
← Health: 96/100
  Issues: 3 orphan pages, 1 stale concept
  Recommendation: link "Rate Limiter" from "Security Headers" page
```
Knowledge compounds across sessions. If the agent answers a complex question and saves it back:
```
→ save_answer_as_page({ question: "What happens when a refresh token expires?",
                        answer: "...", category: "synthesis" })
← Saved: wiki/syntheses/what_happens_when_refresh_token_expires.md
```
That page is now part of the wiki. The next agent to ask gets a better, richer answer — without anyone repeating themselves.
## The CLI at a Glance
```bash
docuflow init                # Init + MCP registration
docuflow init --interactive  # Guided domain setup (Code/Research/Business/Personal)
docuflow status              # Wiki stats, health score, version
docuflow suggest             # 5 prioritized starting-point suggestions
docuflow sync                # Re-ingest all sources, rebuild index
docuflow sync --ai           # Sync + AI-powered doc generation
docuflow watch               # Background daemon — ingests new sources in <1s
docuflow review --staged     # Review staged git changes for issues
docuflow ui                  # Start web interface on port 48821
```
## Multi-Language, Multi-Domain
DocuFlow's extraction engine is regex-based and language-agnostic. It works on TypeScript, JavaScript, Python, Go, Ruby, Java, C#, PHP, SQL — extracting classes, functions, imports, REST endpoints, DB tables, and environment variable references from any of them.
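As a rough illustration of the approach (simplified, hypothetical patterns; the real extractor is more thorough), a handful of regexes goes a long way across languages:

```typescript
// Simplified sketch of regex-based, language-agnostic extraction.
// Each pattern targets a construct that looks similar across languages.
const patterns = {
  classes: /\bclass\s+([A-Za-z_]\w*)/g,                        // TS/JS/Python/Java/…
  imports: /^\s*(?:import|from|require)\b.*$/gm,               // rough import lines
  endpoints: /\.(get|post|put|delete)\(\s*['"]([^'"]+)['"]/g,  // Express-style routes
  envVars: /\bprocess\.env\.([A-Z_][A-Z0-9_]*)/g,              // env var references
};

function extract(source: string) {
  const grab = (re: RegExp, group = 1) =>
    [...source.matchAll(re)].map((m) => m[group]);
  return {
    classes: grab(patterns.classes),
    imports: grab(patterns.imports, 0),
    endpoints: [...source.matchAll(patterns.endpoints)].map(
      (m) => `${m[1].toUpperCase()} ${m[2]}`,
    ),
    envVars: grab(patterns.envVars),
  };
}

const sample = `
import jwt from "jsonwebtoken";
class AuthService {}
app.post("/auth/login", login);
const secret = process.env.JWT_SECRET;
`;

// Picks out the class, the import line, the POST /auth/login route,
// and the JWT_SECRET env reference.
console.log(extract(sample));
```

Because the patterns key on surface syntax rather than a parser, the same pass works on any language where the construct looks similar, at the cost of some precision.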
And it's not just for code. The domain system supports Research, Business, and Personal knowledge bases too. The schema adapts, the wiki page templates change, and the `suggest` command gives domain-appropriate recommendations.
## Under the Hood
DocuFlow is a standard MCP server built with @modelcontextprotocol/sdk. It runs as a local subprocess — Claude or Copilot connects to it via stdio, same as any other MCP tool. The web UI is a single Express server on port 48821 that serves both the React frontend and a REST API bridge to the same 15 MCP tools.
Everything lives in .docuflow/ in your project directory:
```
.docuflow/
├── sources/    ← Your curated inputs (immutable)
├── wiki/       ← LLM-generated pages (entities, concepts, syntheses, timelines)
├── specs/      ← Agent-written technical specs
├── schema.md   ← Domain config (customize the wiki structure)
├── index.md    ← Auto-maintained page catalog
└── log.md      ← Operation history
```
The wiki is plain markdown files on disk — git-trackable, diffable, portable. No database, no cloud service, no API keys required.
## Getting Started in 5 Minutes
```bash
# Install globally
npm install -g @doquflow/cli

# Initialize your project
cd your-project
docuflow init

# Write your first source document
cat > .docuflow/sources/overview.md << 'EOF'
# My Project
What it does, the tech stack, key concepts...
EOF

# Ingest and start the UI
docuflow sync && docuflow ui
```
Then open Claude/Copilot and ask anything about your project. The wiki is already loaded.
## What's Next
DocuFlow is at v1.5.1 and actively developed. On the roadmap:
- Auto-sync on git hooks — wiki updates automatically on every commit
- Team mode — shared wiki across a team, conflict-free merge
- More MCP clients — first-class support for Windsurf, Zed, VS Code Chat
## Try It
- npm: `@doquflow/cli` and `@doquflow/server`
- GitHub: doquflows/docuflow
- Email: docuflow@sshabuj.com
If you're tired of re-explaining your codebase to an AI that forgot everything overnight, give DocuFlow a try. Your future self — and your future agent — will thank you.
Built with TypeScript, @modelcontextprotocol/sdk, D3, React 18, Express.