Michael Amachree
DocShark: a local-first documentation MCP server for AI

DocShark: documentation for AI, served locally

Most AI tools are only as good as the documentation they can reach.
When the docs are spread across websites, rendered client-side, or buried behind a maze of pages, the context you get back is often incomplete or stale.

That is the problem DocShark is built to solve.

DocShark is a fast, local-first Model Context Protocol server that scrapes, indexes, and serves documentation from any website. It turns documentation into a local knowledge base that your AI tools can search instantly, without depending on a cloud service or an API key.

Live site:

GitHub: https://github.com/Michael-Obele/docshark

What DocShark does

At a high level, DocShark does four things well:

  1. Crawls documentation websites.
  2. Extracts the useful content and converts it to clean Markdown.
  3. Breaks pages into context-aware chunks.
  4. Makes the result searchable through MCP and the CLI.

The result is a local documentation layer that works with coding agents, desktop clients, and terminal workflows.

Why I built it

There are already tools that can fetch docs, but many of them only work for a narrow source type or rely on heavier infrastructure.

DocShark focuses on a simpler model:

  • documentation websites, not just GitHub repos
  • local storage, not a remote index
  • SQLite FTS5, not a hosted search backend
  • Bun-first tooling, not a large runtime stack
  • MCP compatibility, so AI clients can use it directly

That combination makes it useful both for individual developers and for people building agent workflows.

How DocShark compares

Context7 is the obvious comparison point because it also solves the "AI needs current documentation" problem. It is strong when you want a hosted documentation service that injects up-to-date library docs and examples into your prompt.

DocShark takes a different path. It is better when you want to index real documentation websites, keep everything local, avoid API keys and rate limits, and use one tool for both MCP and CLI workflows.

Here is the practical tradeoff:

| Tool | Strengths | Limitations | Where DocShark wins |
| --- | --- | --- | --- |
| Context7 | Fresh version-specific docs, code examples, MCP integration, polished onboarding | Cloud service, API key/rate-limit considerations, focused on supported libraries rather than arbitrary websites | DocShark is better if you want a local-first index for any documentation site and no external dependency |
| Docfork | Broad library coverage, up-to-date docs, open source, easy access to software library docs | Optimized for library documentation rather than arbitrary rendered documentation sites | DocShark is better for crawling and indexing any docs website, including custom or rendered docs |
| Deepcon | Strong documentation retrieval for AI workflows, cloud-hosted convenience | More service-oriented than local-first, and it is narrower in how you manage your own source set | DocShark is better if you want to own the index and control exactly what gets crawled and stored |
| GitMCP / GitHub repo tools | Great for repository-centric docs and code browsing | Best when the source of truth lives in GitHub, not when the docs are published on a separate site | DocShark is better for public docs sites, rendered pages, and documentation that is not tied to one repo |
| Per-library MCP servers | Very targeted, often simple to set up for one project | They do not scale well when you need to switch between many libraries | DocShark is better as a single general-purpose server for multiple sources |

If you want the shortest summary: Context7 is a strong hosted documentation service, but DocShark is the better alternative for local-first workflows, broader website coverage, and users who want to keep the whole documentation layer under their control.

Core features

Any documentation site

DocShark is not limited to source repositories. It can crawl public documentation sites and index their rendered content, which makes it useful for modern docs that are built from multiple routes, dynamic pages, or generated content.

Smart extraction

The scraper is designed to pull out the main content and discard the noise. Navigation, sidebars, and other non-essential layout elements are removed so the indexed result is easier for an AI assistant to use.
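
To make the idea concrete, here is a minimal sketch of noise removal in Python, using only the standard library. It is not DocShark's implementation: the stack list in this post names Readability.js for extraction, and this sketch swaps in a much simpler technique that just skips text nested inside a fixed set of layout tags. The tag set and the function name `extract_main_text` are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as layout noise in this sketch (an assumption).
NOISE_TAGS = {"nav", "aside", "header", "footer", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text that is not nested inside any noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # number of noise tags currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.noise_depth == 0:
            self.parts.append(text)


def extract_main_text(html: str) -> str:
    """Return the visible text outside navigation and other layout tags."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = """
<html><body>
  <nav><a href="/">Home</a><a href="/docs">Docs</a></nav>
  <aside>On this page</aside>
  <main><h1>Install</h1><p>Run the installer once.</p></main>
  <footer>Copyright</footer>
</body></html>
"""
main_text = extract_main_text(page)
print(main_text)
```

A readability-style extractor scores candidate content blocks instead of relying on tag names, so it copes with sites that build their layout from generic `div` elements; the tag-based approach here only shows the goal of the step.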

Semantic chunking

Pages are split by heading structure so the search results preserve context. That matters because a search result is only useful if it still knows where it came from in the document.
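
One way to picture heading-based chunking is the Python sketch below. It is an illustration under assumptions, not DocShark's code: the function name `chunk_by_headings`, the `path` and `text` keys, and the `" > "` separator are all hypothetical. Each chunk carries the chain of headings above it, which is what lets a search hit report where it came from.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into chunks, each tagged with its heading path."""
    chunks = []
    path = []  # stack of (level, title) leading to the current position
    body = []  # lines collected for the current chunk
    in_fence = False  # True while inside a fenced code block

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(t for _, t in path), "text": text})
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        # Lines inside code fences are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level, then push the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks


doc = """# Guide
Overview of the guide.

## Install
Run the installer once.

## Usage
Start the server, then search.
"""
chunks = chunk_by_headings(doc)
for chunk in chunks:
    print(chunk["path"], "|", chunk["text"])
```

Storing the heading path next to the text means a result for "Run the installer once." can be shown as belonging to "Guide > Install" rather than as an orphaned sentence.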

SQLite + FTS5 search

DocShark uses SQLite with FTS5 for full-text search, which keeps the entire experience local and fast.

That gives you:

  • instant keyword search
  • offline access once content is indexed
  • no external search provider
  • no dependency on cloud APIs
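
For readers who have not used FTS5, the sketch below shows the general technique in Python with the standard `sqlite3` module. The table name `chunks`, its two columns, and the sample rows are hypothetical; DocShark's real schema may look quite different. It also assumes the SQLite build bundled with your Python includes the FTS5 extension, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration; not DocShark's actual tables.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(heading, body)")
conn.executemany(
    "INSERT INTO chunks (heading, body) VALUES (?, ?)",
    [
        ("Guide > Install", "Install the package with your package manager."),
        ("Guide > Usage", "Run a search query against the local index."),
        ("Guide > Query syntax", "Quote a phrase to match the exact query syntax."),
    ],
)


def search(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (heading, snippet) rows, best match first."""
    return conn.execute(
        """
        SELECT heading, snippet(chunks, 1, '[', ']', '...', 8)
        FROM chunks
        WHERE chunks MATCH ?
        ORDER BY bm25(chunks)  -- lower bm25 score means a better match
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()


# Double quotes inside the query string ask FTS5 for an exact phrase match.
results = search('"query syntax"')
for heading, snippet in results:
    print(heading, "|", snippet)
```

Everything here runs against a single local database file (or memory, as above), which is why this kind of search keeps working offline once the content is indexed.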

JS-rendered site support

Many docs sites are not simple static HTML pages.
DocShark supports rendered documentation sites, so it can work with sites that rely on JavaScript for content delivery.

Polite crawling

The crawler respects site structure, rate-limits its requests, and honors robots rules, so it is safer to run against public documentation sites.
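
The two ingredients of polite crawling can be sketched in Python with the standard library. This is an assumption-laden illustration rather than DocShark's crawler: the user agent string, the sample robots.txt, the `docs.example.com` URLs, and the `RateLimiter` class are all hypothetical.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-docs-crawler"  # hypothetical agent name

# Sample robots.txt; a real crawler would download it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 2
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self.last_request = None

    def wait(self) -> float:
        """Sleep if the last request was too recent; return the time slept."""
        delay = 0.0
        if self.last_request is not None:
            elapsed = time.monotonic() - self.last_request
            delay = max(0.0, self.min_interval - elapsed)
        if delay:
            time.sleep(delay)
        self.last_request = time.monotonic()
        return delay


def allowed_urls(urls):
    """Keep only the URLs that robots.txt permits for our user agent."""
    return [u for u in urls if robots.can_fetch(USER_AGENT, u)]


candidates = [
    "https://docs.example.com/guide/install",
    "https://docs.example.com/admin/settings",
]
to_fetch = allowed_urls(candidates)
# Use the site's Crawl-delay when present, otherwise fall back to one second.
crawl_delay = robots.crawl_delay(USER_AGENT) or 1.0
limiter = RateLimiter(crawl_delay)
for url in to_fetch:
    limiter.wait()  # the first call never sleeps
    print("would fetch", url)
```

Checking robots rules before queueing a URL and spacing requests by the site's own crawl delay are the two behaviors that keep a documentation crawler from looking like abusive traffic.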

MCP server + CLI

DocShark exposes the same knowledge base through both an MCP server and a Bun-first CLI. That gives you two ways to work:

  • agent integrations for AI tools
  • direct terminal commands for indexing, searching, and maintenance

The workflow

Using DocShark usually looks like this:

1. Add a documentation site

Point DocShark at a docs URL to begin crawling:

bunx docshark add https://svelte.dev/docs

2. Search the indexed content

Once the content is indexed, you can search for the exact topic you need.

bunx docshark search "query syntax"

3. Connect your AI tool

Because DocShark speaks MCP, you can connect it to compatible clients and let the assistant query your documentation library directly.

{
  "mcpServers": {
    "docshark": {
      "command": "bunx",
      "args": ["-y", "docshark", "start", "--stdio"]
    }
  }
}
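
If your client already has other MCP servers configured, the entry above needs to be merged in rather than pasted over the file. The Python sketch below shows one way to do that merge on an in-memory dictionary. The helper name `add_docshark` and the `other-server` entry are hypothetical, and where a given client stores its config file is not covered here.

```python
import json

# The server entry from the config shown above.
DOCSHARK_ENTRY = {
    "command": "bunx",
    "args": ["-y", "docshark", "start", "--stdio"],
}


def add_docshark(config: dict) -> dict:
    """Return a copy of config with the docshark server registered."""
    merged = dict(config)
    # Copy the server map so the caller's dictionary is left untouched.
    servers = dict(merged.get("mcpServers", {}))
    servers["docshark"] = DOCSHARK_ENTRY
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {
    "mcpServers": {
        "other-server": {"command": "node", "args": ["server.js"]},
    }
}
updated = add_docshark(existing)
print(json.dumps(updated, indent=2))
```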

CLI features

DocShark includes a practical set of commands for day-to-day use:

| Command | What it does |
| --- | --- |
| start | Runs the MCP server in HTTP or STDIO mode |
| add | Adds a new documentation source and starts crawling |
| rename | Renames an existing library without changing content |
| search | Searches the indexed documentation |
| list | Lists indexed libraries and their status |
| refresh | Re-crawls an existing library |
| remove | Deletes a library and its indexed content |
| get | Returns the full markdown content for a page |
| info | Shows details and indexed pages for a library |
| update | Checks for or installs a newer Bun release |

That command surface makes the project useful even if you never connect it to an AI client.

MCP tools

On the protocol side, DocShark exposes a compact but useful toolset:

  • manage_library to add, rename, refresh, inspect, or remove a library
  • search_docs to search across indexed content
  • list_libraries to inspect what is available
  • get_doc_page to retrieve a full page in markdown form
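
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The Python sketch below builds such a request for `search_docs`. The tool name comes from the list above, but the argument name `query` is an assumption, since the tool's real input schema is not shown in this post; a client would normally discover it through `tools/list`.

```python
import json
from itertools import count

_ids = count(1)  # simple generator of request ids


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "query" is a hypothetical argument name used for illustration.
request = tool_call("search_docs", {"query": "query syntax"})
print(request)
```

In practice an MCP client library handles this framing, along with the initialization handshake that must happen before any tool call, so you rarely write these messages by hand.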

Those tools are designed to map naturally to how people actually work with documentation.

What is inside the stack

DocShark keeps the stack intentionally small:

  • Bun for runtime and CLI execution
  • SQLite for persistence
  • FTS5 for search
  • Readability.js for extracting the main content
  • Turndown with GFM support for Markdown conversion
  • Valibot for validation
  • CAC for the CLI parser and command dispatch
  • TMCP for the protocol server
  • A shared library service that powers both the CLI and MCP server

The current MCP surface is intentionally compact:

  • manage_library for add, rename, refresh, inspect, and remove workflows
  • search_docs for ranked search
  • list_libraries for discovery
  • get_doc_page for full-page retrieval

Keeping the stack and the tool surface this small keeps the project local, fast, and easier to reason about than a larger server stack.

Who it is for

DocShark is a good fit if you:

  • use AI coding assistants regularly
  • want documentation access inside your editor or terminal
  • work with documentation sites that are not simple markdown repos
  • prefer local tools over hosted indexing services
  • want one general-purpose MCP server instead of many per-library integrations

Try it out

If you want to see the project in action, open the live site and source repo above.

Star DocShark on GitHub

Closing thought

DocShark is a small idea with a practical goal: make documentation available where AI tools already work, without handing your context over to a cloud service.

If you spend time jumping between docs tabs, terminal commands, and assistant prompts, it is the kind of tool that quietly removes friction from the whole workflow.
