Building a Skill/MCP to Access Any Open-Source Repo's Code and Docs

#ai #opensource #programming #showdev

I wanted to build a Skill/MCP that gives AI access to the source code and documentation of any open-source repository, so that I can:

Ask AI questions about how open-source projects are implemented under the hood, similar to DeepWiki, but using my own AI agent.
Let AI reference the latest library documentation and source code while writing code, so it avoids hallucinated or deprecated APIs.

Let's dive into how it is built!

Data Source

There are two main options for retrieving source code from public GitHub repositories:

Using the GitHub API.
Cloning the repositories locally with git clone.

I chose the latter because:

Searching code via the GitHub Search API is relatively slow and has strict rate limits.
It's difficult to implement directory tree listings efficiently using the GitHub API.
While the initial git clone takes some time, subsequent requests are served from the local copy and respond blazingly fast.

Reducing Data Volume

We don't want to clone entire repositories, as that would be too slow and consume too much storage. Instead, we can combine a shallow clone with git's partial clone feature:

git clone --depth 1 --filter=blob:limit=100k --no-checkout https://github.com/user/repo.git

--depth 1: Shallow clone, fetching only the latest commit.
--filter=blob:limit=100k: Omits file contents (blobs) of 100 KiB or larger, which can reduce the download size by up to 90%.
--no-checkout: Skips populating the working tree. All data stays in the .git folder, saving even more space.
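To make those flags concrete, here is a minimal Python sketch of how a server might build that clone command and keep one clone per repository on disk, so only the first request pays the cloning cost. The function names `partial_clone_argv` and `ensure_clone` and the `repos/<owner>/<repo>` cache layout are hypothetical, not taken from the actual tool.

```python
import subprocess
from pathlib import Path
from typing import List


def partial_clone_argv(repo_url: str, dest: str, blob_limit: str = "100k") -> List[str]:
    """Build the shallow, blob-filtered, no-checkout clone command."""
    return [
        "git", "clone",
        "--depth", "1",                       # only the latest commit
        f"--filter=blob:limit={blob_limit}",  # server omits blobs of blob_limit or larger
        "--no-checkout",                      # keep everything inside .git
        repo_url, dest,
    ]


def ensure_clone(owner: str, repo: str, cache_root: str = "repos") -> Path:
    """Clone github.com/<owner>/<repo> on first use; reuse the local copy afterwards.

    The cache layout (<cache_root>/<owner>/<repo>) is a hypothetical choice.
    """
    dest = Path(cache_root) / owner / repo
    if not (dest / ".git").exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        url = f"https://github.com/{owner}/{repo}.git"
        subprocess.run(partial_clone_argv(url, str(dest)), check=True)
    return dest
```

For example, calling `ensure_clone("user", "repo")` twice clones only on the first call; the second call just returns the existing path.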

Because there is no working tree, all read operations are performed directly via git commands:

List files: git ls-tree -r HEAD --name-only
Read a file: git cat-file -p HEAD:src/main.js
Search code: git grep "function login" HEAD
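Here is a minimal Python sketch of those three read operations. It runs each command with `git -C <repo_dir>` so one server process can work across many clones without changing directory. The helper names are hypothetical, and the extra flags are additions beyond the commands shown above: `-z` on `ls-tree` (NUL-separated, unquoted paths) and `-n -I -e` on `git grep` (line numbers, skip binary files, patterns may start with `-`).

```python
import subprocess
from typing import List, Tuple


def _git(repo_dir: str, *args: str) -> subprocess.CompletedProcess:
    """Run a git command against repo_dir and capture its text output."""
    return subprocess.run(
        ["git", "-C", repo_dir, *args],
        capture_output=True, text=True, errors="replace",
    )


def list_files(repo_dir: str) -> List[str]:
    # -z prints NUL-terminated paths, so unusual file names are not quoted
    proc = _git(repo_dir, "ls-tree", "-r", "-z", "--name-only", "HEAD")
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return [p for p in proc.stdout.split("\0") if p]


def read_file(repo_dir: str, path: str) -> str:
    # "HEAD:<path>" names the file's blob in the HEAD commit
    proc = _git(repo_dir, "cat-file", "-p", f"HEAD:{path}")
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout


def parse_grep_line(line: str) -> Tuple[str, int, str]:
    """Split 'HEAD:<path>:<lineno>:<text>' into (path, lineno, text).

    Assumes the path itself contains no ':' characters.
    """
    _, path, lineno, text = line.split(":", 3)
    return path, int(lineno), text


def search(repo_dir: str, pattern: str) -> List[Tuple[str, int, str]]:
    # -n: line numbers, -I: skip binary files, -e: allow patterns starting with '-'
    proc = _git(repo_dir, "grep", "-n", "-I", "-e", pattern, "HEAD")
    if proc.returncode == 1:  # git grep exits with 1 when nothing matches
        return []
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return [parse_grep_line(line) for line in proc.stdout.splitlines()]
```

Returning structured `(path, line, text)` tuples from `search` lets the agent jump straight to `read_file` on the matching path.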

Associating Documentation

Now we can read the source code. However, we also want the AI to read the documentation. If the docs are in the same repository, it's easy. But often, the documentation lives in an entirely separate repository.

I used a simple approach to identify these doc repos: list all repositories under the same GitHub organization, keep only those whose names contain "doc", and then have an LLM decide, based on each repo's name and description, whether one of them documents the main repository. A cheap and fast model is more than enough for this task. AI makes this incredibly simple!

Once a match is found, the documentation repository is linked to the main repo, so it can be provided automatically whenever the AI accesses the main repo.
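The matching and linking steps can be sketched in Python as follows. This sketch rests on several assumptions. The organization's repos are listed with GitHub's public `GET /orgs/{org}/repos` REST endpoint (unauthenticated, so rate-limited). The LLM is passed in as a plain `ask_llm(prompt) -> str` callable because the post does not name a model or client. The prompt wording, the helper names, and the in-memory `DOC_LINKS` table are all hypothetical.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

Repo = Dict[str, Any]  # {"name": str, "description": str | None}

# Hypothetical in-memory link table: "owner/repo" -> "owner/doc-repo" (or None)
DOC_LINKS: Dict[str, Optional[str]] = {}


def list_org_repos(org: str) -> List[Repo]:
    """List an organization's public repos via GitHub's REST API, 100 per page."""
    repos: List[Repo] = []
    page = 1
    while True:
        url = f"https://api.github.com/orgs/{org}/repos?per_page=100&page={page}"
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:
            return repos
        repos.extend({"name": r["name"], "description": r.get("description")} for r in batch)
        page += 1


def doc_candidates(repos: List[Repo], main_name: str) -> List[Repo]:
    # Cheap pre-filter: only names containing "doc" reach the LLM
    return [r for r in repos if "doc" in r["name"].lower() and r["name"] != main_name]


def build_prompt(main: Repo, candidates: List[Repo]) -> str:
    lines = [
        f"Main repository: {main['name']} - {main.get('description') or ''}",
        "Candidate documentation repositories:",
    ]
    lines += [f"- {c['name']} - {c.get('description') or ''}" for c in candidates]
    lines.append("Reply with only the name of the repository that documents the main one, or NONE.")
    return "\n".join(lines)


def find_doc_repo(main: Repo, repos: List[Repo], ask_llm: Callable[[str], str]) -> Optional[str]:
    candidates = doc_candidates(repos, main["name"])
    if not candidates:
        return None  # no need to call the model at all
    answer = ask_llm(build_prompt(main, candidates)).strip()
    # Only accept answers that name one of the candidates
    return answer if answer in {c["name"] for c in candidates} else None


def link_docs(org: str, main: Repo, repos: List[Repo], ask_llm: Callable[[str], str]) -> List[str]:
    """Record the association once and return every repo to serve for org/main."""
    key = f"{org}/{main['name']}"
    if key not in DOC_LINKS:
        doc = find_doc_repo(main, repos, ask_llm)
        DOC_LINKS[key] = f"{org}/{doc}" if doc else None
    linked = DOC_LINKS[key]
    return [key, linked] if linked else [key]
```

Rejecting any answer that is not one of the candidate names keeps a stray model reply from linking the wrong repository. Caching the result in `DOC_LINKS` means the model is asked at most once per repo.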

At this point, the core tool is basically complete. I won't go into detail here on how to wrap it into a Skill or MCP server.

Try It

I've deployed this tool on my server and provided ready-to-use Skill and MCP endpoints.

For coding agents like OpenCode, Codex, Cursor, and Copilot, just run the following command to add the Skill:

npx skills add https://github.com/NitroRCr/gread --skill gread

For AI chat applications and other MCP clients, you can use the MCP endpoint:

https://api.gread.dev/mcp

JSON configuration reference:

{
  "mcpServers": {
    "gread": {
      "type": "streamableHttp",
      "url": "https://api.gread.dev/mcp"
    }
  }
}

Learn more:
