DocShark: documentation for AI, served locally
Most AI tools are only as good as the documentation they can reach.
When the docs are spread across websites, rendered client-side, or buried behind a maze of pages, the context you get back is often incomplete or stale.
That is the problem DocShark is built to solve.
DocShark is a fast, local-first Model Context Protocol (MCP) server that scrapes, indexes, and serves documentation from any website. It turns documentation into a local knowledge base that your AI tools can search instantly, without depending on a cloud service or an API key.
Live site:
GitHub: https://github.com/Michael-Obele/docshark
What DocShark does
At a high level, DocShark does four things well:
- Crawls documentation websites.
- Extracts the useful content and converts it to clean Markdown.
- Breaks pages into context-aware chunks.
- Makes the result searchable through MCP and the CLI.
The result is a local documentation layer that works with coding agents, desktop clients, and terminal workflows.
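To make the "context-aware chunks" step more concrete, here is a minimal sketch of one way to do it: split the converted Markdown at headings and attach the trail of parent headings to each chunk. This is an illustration under my own assumptions, not DocShark's actual implementation; the `Chunk` shape and the `chunkMarkdown` function are hypothetical names.

```typescript
// Hypothetical chunk shape; DocShark's real schema may differ.
interface Chunk {
  headingPath: string[]; // trail of parent headings, e.g. ["Guide", "Install"]
  text: string;
}

// Three backticks, built indirectly so this snippet nests cleanly in Markdown.
const FENCE = "`".repeat(3);

// Split Markdown at ATX headings, keeping the heading trail with each chunk
// so a search hit still carries the context it came from.
function chunkMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: { level: number; title: string }[] = [];
  let buffer: string[] = [];
  let inFence = false;

  const flush = (): void => {
    const text = buffer.join("\n").trim();
    if (text.length > 0) {
      chunks.push({ headingPath: stack.map((h) => h.title), text });
    }
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    // Track fenced code so "# comment" lines inside code are not read as headings.
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence;
    }
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush();
      const level = match[1].length;
      // Pop headings at the same or a deeper level before pushing the new one.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) {
        stack.pop();
      }
      stack.push({ level, title: match[2].trim() });
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

Calling `chunkMarkdown` on a page with an "Install" section under a "Guide" title yields a chunk whose `headingPath` is `["Guide", "Install"]`, which is the kind of context that helps a search result make sense on its own.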
Why I built it
There are already tools that can fetch docs, but many of them only work for a narrow source type or rely on heavier infrastructure.
DocShark focuses on a simpler model:
- documentation websites, not just GitHub repos
- local storage, not a remote index
- SQLite FTS5, not a hosted search backend
- Bun-first tooling, not a large runtime stack
- MCP compatibility, so AI clients can use it directly
That combination makes it useful both for individual developers and for people building agent workflows.
How DocShark compares
Context7 is the obvious comparison point because it also solves the "AI needs current documentation" problem. It is strong when you want a hosted documentation service that injects up-to-date library docs and examples into your prompt.
DocShark takes a different path. It is better when you want to index real documentation websites, keep everything local, avoid API keys and rate limits, and use one tool for both MCP and CLI workflows.
Here is the practical tradeoff:
| Tool | Strengths | Limitations | Where DocShark wins |
|---|---|---|---|
| Context7 | Fresh version-specific docs, code examples, MCP integration, polished onboarding | Cloud service, API key/rate-limit considerations, focused on supported libraries rather than arbitrary websites | DocShark is better if you want a local-first index for any documentation site and no external dependency |
| Docfork | Broad library coverage, up-to-date docs, open source, easy access to software library docs | Optimized for library documentation rather than arbitrary rendered documentation sites | DocShark is better for crawling and indexing any docs website, including custom or rendered docs |
| Deepcon | Strong documentation retrieval for AI workflows, cloud-hosted convenience | Service-oriented rather than local-first, with less control over your own source set | DocShark is better if you want to own the index and control exactly what gets crawled and stored |
| GitMCP / GitHub repo tools | Great for repository-centric docs and code browsing | Best when the source of truth lives in GitHub, not when the docs are published on a separate site | DocShark is better for public docs sites, rendered pages, and documentation that is not tied to one repo |
| Per-library MCP servers | Very targeted, often simple to set up for one project | They do not scale well when you need to switch between many libraries | DocShark is better as a single general-purpose server for multiple sources |
If you want the shortest summary: Context7 is a strong hosted documentation service, but DocShark is the better alternative for local-first workflows, broader website coverage, and users who want to keep the whole documentation layer under their control.
Core features
The workflow
Using DocShark usually looks like this:
1. Add a documentation site
Point DocShark at a docs URL to begin crawling:
bunx docshark add https://svelte.dev/docs
2. Search the indexed content
Once the content is indexed, you can search for the exact topic you need.
bunx docshark search "query syntax"
3. Connect your AI tool
Because DocShark speaks MCP, you can connect it to compatible clients and let the assistant query your documentation library directly.
{
  "mcpServers": {
    "docshark": {
      "command": "bunx",
      "args": ["-y", "docshark", "start", "--stdio"]
    }
  }
}
CLI features
DocShark includes a practical set of commands for day-to-day use:
| Command | What it does |
|---|---|
| start | Runs the MCP server in HTTP or STDIO mode |
| add | Adds a new documentation source and starts crawling |
| rename | Renames an existing library without changing content |
| search | Searches the indexed documentation |
| list | Lists indexed libraries and their status |
| refresh | Re-crawls an existing library |
| remove | Deletes a library and its indexed content |
| get | Returns the full markdown content for a page |
| info | Shows details and indexed pages for a library |
| update | Checks for or installs a newer Bun release |
That command surface makes the project useful even if you never connect it to an AI client.
MCP tools
On the protocol side, DocShark exposes a compact but useful toolset:
- manage_library to add, rename, refresh, inspect, or remove a library
- search_docs to search across indexed content
- list_libraries to inspect what is available
- get_doc_page to retrieve a full page in markdown form
Those tools are designed to map naturally to how people actually work with documentation.
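For readers curious what a client actually sends, here is a rough sketch of an MCP tool call for search_docs. The envelope (JSON-RPC 2.0 with the `tools/call` method and a `name` plus `arguments` payload) follows the MCP specification; the argument names `query` and `limit` are assumptions for illustration, so check the tool's published input schema before relying on them. `buildToolCall` is a hypothetical helper.

```typescript
// The four tool names listed above.
type DocSharkTool =
  | "manage_library"
  | "search_docs"
  | "list_libraries"
  | "get_doc_page";

// Shape of an MCP tools/call request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: DocSharkTool; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and arguments in the envelope.
function buildToolCall(
  id: number,
  name: DocSharkTool,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// `query` and `limit` are assumed argument names, not confirmed by the source.
const request = buildToolCall(1, "search_docs", {
  query: "query syntax",
  limit: 5,
});
console.log(JSON.stringify(request, null, 2));
```

In practice your MCP client builds this message for you; the sketch only shows why a small, well-named toolset is easy for an assistant to drive.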
What is inside the stack
DocShark keeps the stack intentionally small:
- Bun for runtime and CLI execution
- SQLite for persistence
- FTS5 for search
- Readability.js for extracting the main content
- Turndown with GFM support for Markdown conversion
- Valibot for validation
- CAC for the CLI parser and command dispatch
- TMCP for the protocol server
- A shared library service that powers both the CLI and MCP server
The current MCP surface is intentionally compact:
- manage_library for add, rename, refresh, inspect, and remove workflows
- search_docs for ranked search
- list_libraries for discovery
- get_doc_page for full-page retrieval
That choice keeps the project local, fast, and easier to reason about than a larger server stack.
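Since search runs on SQLite FTS5, one detail worth illustrating is how free-form user input can be turned into a safe MATCH expression. The sketch below quotes each term as an FTS5 string literal so that characters such as `-`, `:`, or `*` are not parsed as query operators. It is an assumption about how such a layer could work, not DocShark's actual query code; `toFts5Query` and the `chunks_fts` table name are hypothetical.

```typescript
// Turn free-form input into an FTS5 MATCH expression.
// Each whitespace-separated term becomes a double-quoted string literal;
// embedded double quotes are escaped by doubling them, as FTS5 expects.
// Adjacent quoted terms are combined with FTS5's implicit AND.
function toFts5Query(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => '"' + term.replace(/"/g, '""') + '"')
    .join(" ");
}

// Hypothetical table name; ordering by the built-in `rank` column returns
// the best matches first.
const SEARCH_SQL =
  "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY rank LIMIT 10";

// The expression is passed as a bound parameter, never spliced into the SQL.
const matchExpression = toFts5Query("query syntax");
console.log(SEARCH_SQL, [matchExpression]);
```

Quoting every term trades away advanced operators (prefix search, NEAR, column filters) for predictability, which is a reasonable default when the caller is an AI client sending arbitrary text.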
Who it is for
DocShark is a good fit if you:
- use AI coding assistants regularly
- want documentation access inside your editor or terminal
- work with documentation sites that are not simple markdown repos
- prefer local tools over hosted indexing services
- want one general-purpose MCP server instead of many per-library integrations
Try it out
If you want to see the project in action, open the live site and source repo above.
Closing thought
DocShark is a small idea with a practical goal: make documentation available where AI tools already work, without handing your context over to a cloud service.
If you spend time jumping between docs tabs, terminal commands, and assistant prompts, it is the kind of tool that quietly removes friction from the whole workflow.