Linghua Jin

I Built a Tiny MCP That Understands Your Code and Saves 70% Tokens

Every coding agent demo looks magical... until you point it at a real codebase. Then it either:

  • Chokes on context windows
  • Hallucinates around stale code
  • Or becomes so slow you might as well just grep

I hit this wall building AI workflows with large Rust/Python/TS repos, so I built something I actually wanted for my own stack: a super-lightweight, AST-based, embedded MCP that just works on your codebase. It's called cocoindex-code, and it's already saving me ~70% of my tokens and a lot of waiting time.

If you're using Claude, Codex, Cursor, or any MCP-friendly coding agent, this post is for you.

The Core Idea: AST + Incremental Indexing

Most "code RAG" setups feel like infra projects: spin up a vector DB, write ETL, fight schema drift, tune chunking, maintain workers. Then you pray it all stays in sync.

cocoindex-code takes the opposite approach:

  • Embedded MCP: It runs locally as an MCP server, no separate DB to run or maintain.
  • AST-based indexing: It understands code structure via Tree-sitter, so you get meaningful chunks (functions, classes, blocks) instead of random 200-line windows.
  • Incremental updates: Built on top of the Rust-based CocoIndex engine, it only re-indexes changed files.
  • Real multi-language support: Python, JS/TS, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.

The goal: you ask an agent a question, it pulls precisely the code it needs, without blowing up your context window.
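To make "meaningful chunks" concrete, here's a toy sketch of AST-based chunking using Python's stdlib `ast` module. cocoindex-code actually uses Tree-sitter across many languages; this is just an illustration of the idea that chunk boundaries follow code structure (functions, classes) rather than arbitrary line windows.

```python
import ast

def chunk_by_ast(source: str) -> list[dict]:
    """Split Python source into function/class chunks — a toy version of
    what Tree-sitter-based chunking does across many languages."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "start": node.lineno,
                "end": node.end_lineno,
                "code": ast.get_source_segment(source, node),
            })
    return chunks

src = """
def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
"""
for c in chunk_by_ast(src):
    print(c["name"], c["start"], c["end"])
```

Each chunk is a self-contained unit with its own line range, which is exactly what you want to embed and later paste into a prompt.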

What You Get Out of the Box

Here's what you get by just adding the MCP:

  • Semantic code search tool: search(query, limit, offset, refresh_index) as an MCP tool.
  • Instant token savings: Because only relevant code chunks go into prompts, not entire files or folders.
  • Speed: Incremental indexing + Rust engine means updates feel near-instant on typical dev repos.
  • No-key local embeddings by default: Uses sentence-transformers/all-MiniLM-L6-v2 locally via SentenceTransformers.
  • Optional power-ups: Swap in any LiteLLM-supported embedding model (OpenAI, Gemini, Mistral, Voyage for code, Ollama, etc.).

This means you can go from "plain coding agent" to "coding agent that actually knows your codebase" in about a minute.
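Where does the ~70% figure come from? Here's a back-of-envelope illustration with made-up (but realistic-feeling) numbers — not a benchmark, just the shape of the math when you paste chunks instead of whole files:

```python
# Back-of-envelope illustration (assumed numbers, not a benchmark):
# pasting one whole file into the prompt vs. pulling three relevant chunks.
whole_file_tokens = 2000        # one medium source file (assumption)
chunk_tokens = 200              # one function-sized AST chunk (assumption)
relevant_chunks = 3

naive = whole_file_tokens
targeted = relevant_chunks * chunk_tokens
savings = 1 - targeted / naive
print(f"~{savings:.0%} fewer prompt tokens")  # → ~70% fewer prompt tokens
```

The exact ratio depends on your file sizes and how many chunks a query needs, but the gap grows with repo size: whole-file pasting scales with file length, chunk retrieval scales with answer length.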

1-Minute Setup for Claude, Codex, and OpenCode

First, install uv if you don't have it yet:

curl -LsSf https://astral.sh/uv/install.sh | sh

Claude

claude mcp add cocoindex-code \
  -- uvx --prerelease=explicit --with \
  "cocoindex>=1.0.0a16" \
  cocoindex-code@latest

Codex

codex mcp add cocoindex-code \
  -- uvx --prerelease=explicit --with \
  "cocoindex>=1.0.0a16" \
  cocoindex-code@latest

OpenCode

You can do it interactively:

opencode mcp add
# MCP server name: cocoindex-code
# type: local
# command:
# uvx --prerelease=explicit --with cocoindex>=1.0.0a16 cocoindex-code@latest

That's it. Point your agent at your repo, and you now have semantic search over your codebase as an MCP tool.

How the search MCP Tool Works

Once connected, the MCP exposes a search tool:

search(
  query: str,        # natural language or code snippet
  limit: int = 10,   # 1-100
  offset: int = 0,   # pagination
  refresh_index: bool = True  # re-index before querying
)

Each result comes back with:

  • File path
  • Language
  • Code content
  • Start/end line numbers
  • Similarity score
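If it helps to picture the result shape, here's a small dataclass modeling one search hit based on the fields listed above. The field names and the example values are my guesses for illustration; the exact keys in the MCP response may differ.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    """One search hit, modeled on the fields listed above.
    Field names are illustrative — the real MCP payload may use different keys."""
    file_path: str
    language: str
    code: str
    start_line: int
    end_line: int
    score: float

# Hypothetical hit for a query like "JWT refresh logic":
hit = SearchResult("src/auth/jwt.py", "Python",
                   "def refresh_token(...): ...", 42, 67, 0.83)
print(f"{hit.file_path}:{hit.start_line}-{hit.end_line} ({hit.score:.2f})")
```

The line range is what makes results cheap to act on: an agent can open exactly those lines instead of re-reading the file.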

I've found three killer use cases:

  1. "Where is the actual implementation of X?" - when the repo has 5 similarly named functions.
  2. "Show me all the auth-related logic touching JWT refresh."
  3. "Find the code that matches this stack trace snippet."

Because the index is kept up to date incrementally, you can refactor, run tests, and immediately use the agent against the new code layout without re-running some giant offline job.
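The core trick behind incremental indexing is cheap change detection. Here's a simplified Python sketch — CocoIndex's actual engine is Rust and considerably more sophisticated, but the principle is the same: hash content, compare against stored state, re-index only the diffs.

```python
import hashlib

def changed_files(current: dict[str, bytes],
                  index_state: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the stored state,
    updating the state as we go. A simplified sketch of incremental
    re-indexing, not CocoIndex's actual implementation."""
    out = []
    for path, content in current.items():
        digest = hashlib.sha256(content).hexdigest()
        if index_state.get(path) != digest:
            out.append(path)
            index_state[path] = digest
    return out

state: dict[str, str] = {}
repo = {"a.py": b"print(1)", "b.py": b"print(2)"}
print(changed_files(repo, state))   # first run: everything is new
repo["a.py"] = b"print(99)"         # edit one file
print(changed_files(repo, state))   # only a.py needs re-indexing
```

That's why `refresh_index=True` stays fast: on a typical edit, only a handful of chunks get re-embedded.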

Supported Languages and Smart Defaults

cocoindex-code ships with a very practical language matrix:

C, C++, C#, CSS/SCSS, Go, HTML, Java, JavaScript/TypeScript/TSX, JSON/YAML/TOML, Kotlin, Markdown/MDX, Pascal, PHP, Python, R, Ruby, Rust, Scala, Solidity, SQL, Swift, XML

It also auto-excludes noisy directories like __pycache__, node_modules, target, dist, and vendored dependencies.
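Directory exclusion is simple but matters a lot for index quality. A minimal sketch of the idea (the exclusion set here is my approximation, not cocoindex-code's exact list):

```python
import os

# Approximation of the noisy directories mentioned above — not the tool's exact set.
EXCLUDED = {"__pycache__", "node_modules", "target", "dist", ".git", "vendor"}

def source_files(root: str) -> list[str]:
    """Walk a tree while pruning noisy directories before descending into them."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED]  # prune in place
        found.extend(os.path.join(dirpath, f) for f in filenames)
    return found
```

Pruning `dirnames` in place means `os.walk` never even enters `node_modules`, so a huge dependency tree costs nothing.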

Root path is auto-discovered via .cocoindex_code/, .git/, or by falling back to the current working directory. In practice, you usually don't need to set any env vars at all - it just finds your repo root.
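That discovery order can be sketched as an upward directory walk — the following is my reading of the behavior described above, not the tool's actual code:

```python
from pathlib import Path

def find_repo_root(start: Path) -> Path:
    """Walk upward looking for .cocoindex_code/ or .git/, falling back
    to the starting directory. A sketch of the discovery order, not
    cocoindex-code's actual implementation."""
    for candidate in (start, *start.parents):
        if (candidate / ".cocoindex_code").is_dir() or (candidate / ".git").is_dir():
            return candidate
    return start
```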

Embeddings: Start Free, Scale Later

Out of the box, the project uses a local SentenceTransformers model:

  • Default: sbert/sentence-transformers/all-MiniLM-L6-v2
  • No API key, no billing surprises, completely local.

If you want stronger semantic understanding for code-heavy repos, you can point COCOINDEX_CODE_EMBEDDING_MODEL to any LiteLLM-supported embedding model:

  • Ollama (local)
  • OpenAI / Azure OpenAI
  • Gemini
  • Mistral
  • Voyage (code-optimized)
  • Cohere
  • AWS Bedrock
  • Nebius

Basically: start with free local, upgrade only if/when you actually need it.
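Swapping models is a one-variable change. The env var name comes from this post; the "provider/model" value format follows LiteLLM's naming convention, and the specific model string below is an example, not a tested value:

```python
import os

# Env var name is from the article; "ollama/nomic-embed-text" follows
# LiteLLM's provider/model convention and is illustrative only.
os.environ.setdefault("COCOINDEX_CODE_EMBEDDING_MODEL",
                      "ollama/nomic-embed-text")
print(os.environ["COCOINDEX_CODE_EMBEDDING_MODEL"])
```

Set it before launching your agent (or in its MCP config's env section) so the server picks it up at startup.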

What About Huge / Enterprise Codebases?

Under the hood, cocoindex-code uses CocoIndex, a Rust-based indexing engine built for large-scale, incremental data workflows.

For big org setups, you can:

  • Share indexes across teammates instead of re-indexing on every machine.
  • Take advantage of features like branch dedupe to avoid duplicate work.
  • Run it as part of a larger data/indexing platform on top of CocoIndex.

If You Want to Try It, Here's the Ask

If this sounds useful, here's a small but meaningful way you can help:

  1. Star the repo: cocoindex-code and the underlying cocoindex.
  2. Try it on your main project (the messy one, not the toy one).
  3. Drop feedback, issues, or ideas in the GitHub repo.

I'm especially interested in:

  • Repos where existing "code RAG" tools failed you
  • Languages or frameworks you want better support for
  • Workflows where you want your coding agent to feel 10x more context-aware

If you do try it, let me know in the comments what stack you used it on - I'd love to feature a few real-world examples in a follow-up post.
