Michael Amachree

Posted on Mar 19

DocShark: a local-first documentation MCP server for AI

#bunjs #typescript #opensource #ai

DocShark: documentation for AI, served locally

Most AI tools are only as good as the documentation they can reach.
When the docs are spread across websites, rendered client-side, or buried behind a maze of pages, the context you get back is often incomplete or stale.

That is the problem DocShark is built to solve.

DocShark is a fast, local-first Model Context Protocol server that scrapes, indexes, and serves documentation from any website. It turns documentation into a local knowledge base that your AI tools can search instantly, without depending on a cloud service or an API key.

Live site: dsharkd.svelte-apps.me

GitHub: https://github.com/Michael-Obele/docshark

What DocShark does

At a high level, DocShark does four things well:

Crawls documentation websites.
Extracts the useful content and converts it to clean Markdown.
Breaks pages into context-aware chunks.
Makes the result searchable through MCP and the CLI.

The result is a local documentation layer that works with coding agents, desktop clients, and terminal workflows.

Why I built it

There are already tools that can fetch docs, but many of them only work for a narrow source type or rely on heavier infrastructure.

DocShark focuses on a simpler model:

documentation websites, not just GitHub repos
local storage, not a remote index
SQLite FTS5, not a hosted search backend
Bun-first tooling, not a large runtime stack
MCP compatibility, so AI clients can use it directly

That combination makes it useful both for individual developers and for people building agent workflows.

How DocShark compares

Context7 is the obvious comparison point because it also solves the "AI needs current documentation" problem. It is strong when you want a hosted documentation service that injects up-to-date library docs and examples into your prompt.

DocShark takes a different path. It is better when you want to index real documentation websites, keep everything local, avoid API keys and rate limits, and use one tool for both MCP and CLI workflows.

Here is the practical tradeoff:

| Tool | Strengths | Limitations | Where DocShark wins |
| --- | --- | --- | --- |
| Context7 | Fresh version-specific docs, code examples, MCP integration, polished onboarding | Cloud service, API key/rate-limit considerations, focused on supported libraries rather than arbitrary websites | DocShark is better if you want a local-first index for any documentation site and no external dependency |
| Docfork | Broad library coverage, up-to-date docs, open source, easy access to software library docs | Optimized for library documentation rather than arbitrary rendered documentation sites | DocShark is better for crawling and indexing any docs website, including custom or rendered docs |
| Deepcon | Strong documentation retrieval for AI workflows, cloud-hosted convenience | More service-oriented than local-first, and it is narrower in how you manage your own source set | DocShark is better if you want to own the index and control exactly what gets crawled and stored |
| GitMCP / GitHub repo tools | Great for repository-centric docs and code browsing | Best when the source of truth lives in GitHub, not when the docs are published on a separate site | DocShark is better for public docs sites, rendered pages, and documentation that is not tied to one repo |
| Per-library MCP servers | Very targeted, often simple to set up for one project | They do not scale well when you need to switch between many libraries | DocShark is better as a single general-purpose server for multiple sources |

If you want the shortest summary: Context7 is a strong hosted documentation service, but DocShark is the better alternative for local-first workflows, broader website coverage, and users who want to keep the whole documentation layer under their control.

Core features

Any documentation site

DocShark is not limited to source repositories. It can crawl public documentation sites and index their rendered content, which makes it useful for modern docs that are built from multiple routes, dynamic pages, or generated content.

Smart extraction

The scraper is designed to pull out the main content and discard the noise. Navigation, sidebars, and other non-essential layout elements are removed so the indexed result is easier for an AI assistant to use.

Semantic chunking

Pages are split by heading structure so the search results preserve context. That matters because a search result is only useful if it still knows where it came from in the document.
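To make the idea concrete, here is a minimal sketch of heading-based chunking. It is not DocShark's actual implementation: the `Chunk` type and the `chunkByHeadings` function are hypothetical names, and the sketch assumes ATX-style Markdown headings (`#` through `######`).

```typescript
// Hypothetical sketch: names and shapes are illustrative, not DocShark's API.
interface Chunk {
  headingPath: string[]; // ancestor headings, outermost first
  content: string;       // body text that sits under the last heading in the path
}

const FENCE = "`".repeat(3); // Markdown code-fence marker

function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: { depth: number; title: string }[] = [];
  let buffer: string[] = [];
  let inFence = false;

  // Emit the buffered body text as a chunk tagged with the current heading path.
  const flush = () => {
    const content = buffer.join("\n").trim();
    if (content) chunks.push({ headingPath: stack.map((h) => h.title), content });
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    // Track fenced code so a "# comment" inside code is not treated as a heading.
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (!match) {
      buffer.push(line);
      continue;
    }
    flush();
    const depth = match[1].length;
    // Drop headings at the same or a deeper level before pushing the new one.
    while (stack.length > 0 && stack[stack.length - 1].depth >= depth) stack.pop();
    stack.push({ depth, title: match[2].trim() });
  }
  flush();
  return chunks;
}
```

Each chunk carries its heading path, so a search hit for "Run it" can be reported as coming from "Guide > Install" rather than from an anonymous block of text.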

SQLite + FTS5 search

DocShark uses SQLite with FTS5 for full-text search, which keeps the entire experience local and fast.

That gives you:

instant keyword search
offline access once content is indexed
no external search provider
no dependency on cloud APIs
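As a rough illustration of what an FTS5-backed search can look like, the sketch below shows a possible schema, a ranked query, and a helper that turns free-form user input into a safe `MATCH` expression. The table name `chunks_fts`, its columns, and the `toFtsQuery` helper are assumptions made for this example; DocShark's real schema may differ.

```typescript
// Hypothetical schema: table and column names are illustrative only.
const CREATE_INDEX_SQL = `
  CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts
  USING fts5(heading, content, tokenize = 'porter unicode61');
`;

// bm25() returns lower scores for better matches, so ascending order ranks best first.
// snippet() builds a short excerpt from column 1 (content) with matches in brackets.
const SEARCH_SQL = `
  SELECT heading, snippet(chunks_fts, 1, '[', ']', '...', 12) AS excerpt
  FROM chunks_fts
  WHERE chunks_fts MATCH ?
  ORDER BY bm25(chunks_fts)
  LIMIT 10;
`;

// Quote every term so characters such as "-" or ":" in user input are not
// parsed as FTS5 operators. Space-separated phrases are implicitly ANDed.
// Returns "" for blank input; callers should skip the query in that case.
function toFtsQuery(input: string): string {
  return input
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(" ");
}
```

With this helper, the input `query syntax` becomes `"query" "syntax"`, which would be bound to the `?` placeholder in `SEARCH_SQL`.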

JS-rendered site support

Many docs sites are not simple static HTML pages.
DocShark can also index sites that render their content with JavaScript, so client-rendered documentation is not left out of the index.

Polite crawling

The crawler applies rate limiting and robots-aware behavior (honoring a site's robots.txt rules), so it is safer to run against public documentation sites.
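For readers who want to see what polite crawling can involve, here is a simplified sketch. It is not DocShark's crawler: `disallowedPrefixes`, `isAllowed`, and `crawlPolitely` are hypothetical helpers, the robots.txt handling only covers `User-agent: *` groups with plain-prefix `Disallow` rules (no wildcards, no `Allow` precedence), and the one-second default delay is an arbitrary assumption.

```typescript
// Hypothetical helpers for illustration; a real crawler needs a full robots.txt parser.
function disallowedPrefixes(robotsTxt: string): string[] {
  const prefixes: string[] = [];
  let appliesToUs = false;   // true while inside a group that targets "*"
  let inGroupHeader = false; // true while reading consecutive User-agent lines

  for (const raw of robotsTxt.split("\n")) {
    const line = raw.split("#")[0].trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      appliesToUs = (inGroupHeader && appliesToUs) || value === "*";
      inGroupHeader = true;
    } else {
      inGroupHeader = false;
      if (field === "disallow" && appliesToUs && value) prefixes.push(value);
    }
  }
  return prefixes;
}

function isAllowed(path: string, prefixes: string[]): boolean {
  return !prefixes.some((prefix) => path.startsWith(prefix));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Visit URLs one at a time, skipping disallowed paths and pausing between requests.
async function crawlPolitely(
  urls: string[],
  prefixes: string[],
  visit: (url: string) => Promise<void>,
  delayMs = 1000,
): Promise<void> {
  for (const url of urls) {
    if (!isAllowed(new URL(url).pathname, prefixes)) continue;
    await visit(url);
    await sleep(delayMs); // fixed gap between requests to the same host
  }
}
```

Sequential requests with a fixed delay are the simplest form of rate limiting; a production crawler would typically add per-host queues and backoff on errors.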

MCP server + CLI

DocShark exposes the same knowledge base through both an MCP server and a Bun-first CLI. That gives you two ways to work:

agent integrations for AI tools
direct terminal commands for indexing, searching, and maintenance

The workflow

Using DocShark usually looks like this:

1. Add a documentation site

Point DocShark at a docs URL to begin crawling:

bunx docshark add https://svelte.dev/docs

2. Search the indexed content

Once the content is indexed, you can search for the exact topic you need.

bunx docshark search "query syntax"

3. Connect your AI tool

Because DocShark speaks MCP, you can connect it to compatible clients and let the assistant query your documentation library directly.

{
  "mcpServers": {
    "docshark": {
      "command": "bunx",
      "args": ["-y", "docshark", "start", "--stdio"]
    }
  }
}

CLI features

DocShark includes a practical set of commands for day-to-day use:

| Command | What it does |
| --- | --- |
| `start` | Runs the MCP server in HTTP or STDIO mode |
| `add` | Adds a new documentation source and starts crawling |
| `rename` | Renames an existing library without changing content |
| `search` | Searches the indexed documentation |
| `list` | Lists indexed libraries and their status |
| `refresh` | Re-crawls an existing library |
| `remove` | Deletes a library and its indexed content |
| `get` | Returns the full markdown content for a page |
| `info` | Shows details and indexed pages for a library |
| `update` | Checks for or installs a newer Bun release |

That command surface makes the project useful even if you never connect it to an AI client.

MCP tools

On the protocol side, DocShark exposes a compact but useful toolset:

manage_library to add, rename, refresh, inspect, or remove a library
search_docs to search across indexed content
list_libraries to inspect what is available
get_doc_page to retrieve a full page in markdown form

Those tools are designed to map naturally to how people actually work with documentation.
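Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `search_docs`. The envelope follows the MCP specification, but the argument names `query` and `limit` are assumptions for illustration; the tool's real input schema is whatever the server reports via `tools/list`.

```typescript
// JSON-RPC 2.0 envelope used by MCP's `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a request with a fresh id so responses can be matched to calls.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// `query` and `limit` are assumed argument names, not confirmed by DocShark's docs.
const request = buildToolCall("search_docs", { query: "query syntax", limit: 5 });

// Over the stdio transport, each message is sent as one line of JSON.
console.log(JSON.stringify(request));
```

In practice an MCP client library handles this framing for you; the sketch is only meant to show what travels over the wire when an assistant searches your docs.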

What is inside the stack

DocShark keeps the stack intentionally small:

Bun for runtime and CLI execution
SQLite for persistence
FTS5 for search
Readability.js for extracting the main content
Turndown with GFM support for Markdown conversion
Valibot for validation
CAC for the CLI parser and command dispatch
TMCP for the protocol server
A shared library service that powers both the CLI and MCP server

That choice keeps the project local, fast, and easier to reason about than a larger server stack.

Who it is for

DocShark is a good fit if you:

use AI coding assistants regularly
want documentation access inside your editor or terminal
work with documentation sites that are not simple markdown repos
prefer local tools over hosted indexing services
want one general-purpose MCP server instead of many per-library integrations

Try it out

If you want to see the project in action, open the live site and source repo above.

Star DocShark on GitHub

Closing thought

DocShark is a small idea with a practical goal: make documentation available where AI tools already work, without handing your context over to a cloud service.

If you spend time jumping between docs tabs, terminal commands, and assistant prompts, it is the kind of tool that quietly removes friction from the whole workflow.

Top comments (2)

Ceyhun Aksan • Mar 19

@dev_michael, solid choice going with SQLite + FTS5 for local-first search. One limitation I hit building a similar MCP server: keyword search alone misses semantic matches (e.g. "state handling" vs "state management"). Adding vector embeddings with RRF merge improved recall a lot. Have you considered a hybrid approach?

Michael Amachree • Mar 25

I actually have, and I plan on implementing something of that nature, but I am still looking at the best way to do it.