<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yashpalsinhc</title>
    <description>The latest articles on DEV Community by yashpalsinhc (@yashpalsinhc).</description>
    <link>https://dev.to/yashpalsinhc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3793896%2Fe1329b18-c73e-407d-a389-feb142f84514.png</url>
      <title>DEV Community: yashpalsinhc</title>
      <link>https://dev.to/yashpalsinhc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yashpalsinhc"/>
    <language>en</language>
    <item>
      <title>I Built a 35-Tool MCP Server That Cut My AI Token Usage by 95%</title>
      <dc:creator>yashpalsinhc</dc:creator>
      <pubDate>Thu, 26 Feb 2026 07:36:22 +0000</pubDate>
      <link>https://dev.to/yashpalsinhc/i-built-a-35-tool-mcp-server-that-cut-my-ai-token-usage-by-95-1b4</link>
      <guid>https://dev.to/yashpalsinhc/i-built-a-35-tool-mcp-server-that-cut-my-ai-token-usage-by-95-1b4</guid>
      <description>&lt;p&gt;Every time I asked Claude to help me with a codebase, the same thing happened: it would read file after file, burn through 50K+ tokens just to understand the project structure, and then I'd hit the context limit before getting any real work done.&lt;/p&gt;

&lt;p&gt;I built an MCP server to fix this. It analyzes a codebase once, extracts everything an AI agent needs — function behaviors, call graphs, DB queries, HTTP calls — and serves precise answers in 2-4K tokens instead of 50K+.&lt;/p&gt;

&lt;p&gt;Here's how it works and what I learned building it.&lt;/p&gt;

&lt;h2&gt;The Problem: AI Agents Are Blind&lt;/h2&gt;

&lt;p&gt;When you point an AI agent at a codebase, it has no memory. Every session starts from scratch. It runs &lt;code&gt;grep&lt;/code&gt;, reads files one by one, and builds a mental model — slowly, expensively, and incompletely.&lt;/p&gt;

&lt;p&gt;For a medium-sized Go project (~100 files), a typical exploration burns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50K+ tokens&lt;/strong&gt; just to understand what functions exist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple rounds&lt;/strong&gt; of grep → read → grep → read&lt;/li&gt;
&lt;li&gt;And it still misses cross-file relationships like call graphs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't an AI problem. It's a &lt;strong&gt;context delivery&lt;/strong&gt; problem.&lt;/p&gt;

&lt;h2&gt;The Solution: Analyze Once, Query Forever&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/yashpalsinhc/mcp-repo-context" rel="noopener noreferrer"&gt;MCP Repo Context Server&lt;/a&gt; — a Go server that speaks the &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; and provides 35 specialized tools for codebase understanding.&lt;/p&gt;

&lt;p&gt;The core idea: &lt;strong&gt;parse the codebase into structured data, then let the AI query exactly what it needs.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;Architecture&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  AI Agent    │────▶│  MCP Server      │────▶│  Storage Layer  │
│  (Claude)    │◀────│ (JSON-RPC/stdio) │◀────│  JSON + SQLite  │
└──────────────┘     └──────────────────┘     └─────────────────┘
                            │
                    ┌───────┼───────┐
                    ▼       ▼       ▼
              AST Parser  Vector  Call Graph
              (Go)        Search  Builder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three layers make this work:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. AST Parsing&lt;/strong&gt; — I use Go's &lt;code&gt;go/ast&lt;/code&gt; package to extract every function signature, its behavior (step-by-step), database queries, HTTP calls, error handling patterns, and side effects. This isn't regex matching — it's actual syntax tree traversal, so it captures things like wrapped errors, deferred calls, and goroutine launches.&lt;/p&gt;
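&lt;p&gt;As a rough sketch of what this kind of &lt;code&gt;go/ast&lt;/code&gt; traversal looks like — the function and output shape here are illustrative, not the server's actual code:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// listFunctions parses Go source and reports, per function, whether
// its body defers a call or launches a goroutine — two of the side
// effects the article mentions capturing via syntax-tree traversal.
func listFunctions(src string) map[string][]string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		panic(err)
	}
	out := map[string][]string{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		var effects []string
		ast.Inspect(fn, func(n ast.Node) bool {
			switch n.(type) {
			case *ast.DeferStmt:
				effects = append(effects, "defer")
			case *ast.GoStmt:
				effects = append(effects, "goroutine")
			}
			return true
		})
		out[fn.Name.Name] = effects
	}
	return out
}

func main() {
	src := `package p
func Save() { defer unlock(); go audit() }
func unlock() {}
func audit() {}
`
	fmt.Println(listFunctions(src)["Save"]) // prints [defer goroutine]
}
```

&lt;p&gt;Because the walk sees real nodes like &lt;code&gt;*ast.GoStmt&lt;/code&gt; rather than text, it can't be fooled by comments or string literals the way grep can.&lt;/p&gt;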

&lt;p&gt;&lt;strong&gt;2. Semantic Vector Search&lt;/strong&gt; — Each function and type gets a 384-dimensional TF-IDF embedding stored in SQLite. When the AI asks "find authentication code," it doesn't need an exact keyword match — it finds semantically similar functions. No external API calls needed; embeddings are computed locally.&lt;/p&gt;
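&lt;p&gt;The post doesn't show the embedding code, but the mechanics are easy to illustrate with a toy version: term-frequency vectors compared by cosine similarity. (A real TF-IDF pipeline also applies inverse-document-frequency weighting; this sketch keeps only the similarity math.)&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// vectorize builds a term-frequency vector over a shared vocabulary.
// Toy version: no IDF weighting, no tokenizer beyond whitespace.
func vectorize(text string, vocab []string) []float64 {
	v := make([]float64, len(vocab))
	words := strings.Fields(strings.ToLower(text))
	for i, term := range vocab {
		for _, w := range words {
			if w == term {
				v[i]++
			}
		}
	}
	return v
}

// cosine returns the cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	vocab := []string{"auth", "login", "token", "parse", "file"}
	query := vectorize("auth login token", vocab)
	fnA := vectorize("validate login token auth", vocab)
	fnB := vectorize("parse file", vocab)
	fmt.Printf("auth fn: %.2f, parser fn: %.2f\n", cosine(query, fnA), cosine(query, fnB))
	// prints: auth fn: 1.00, parser fn: 0.00
}
```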

&lt;p&gt;&lt;strong&gt;3. Call Graph Extraction&lt;/strong&gt; — The analyzer builds a complete call graph: who calls what, from which line, what type of call (direct, goroutine, deferred). This powers tools like &lt;code&gt;get_callers&lt;/code&gt; and &lt;code&gt;visualize_call_graph&lt;/code&gt; that generate Mermaid diagrams.&lt;/p&gt;
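&lt;p&gt;Once the graph exists, rendering Mermaid from it is only a few lines. A hypothetical sketch — the edge data and function below are made up for illustration:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// toMermaid turns a caller→callee edge list into a Mermaid flowchart.
// roots controls emission order so output is deterministic.
func toMermaid(edges map[string][]string, roots []string) string {
	var b strings.Builder
	b.WriteString("flowchart TD\n")
	for _, caller := range roots {
		for _, callee := range edges[caller] {
			fmt.Fprintf(&b, "  %s --> %s\n", caller, callee)
		}
	}
	return b.String()
}

func main() {
	// Invented edges for a small auth flow.
	edges := map[string][]string{
		"HandleLogin":   {"ValidateToken", "LoadUser"},
		"ValidateToken": {"ParseJWT"},
	}
	fmt.Print(toMermaid(edges, []string{"HandleLogin", "ValidateToken"}))
}
```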

&lt;h2&gt;What the Tools Actually Do&lt;/h2&gt;

&lt;p&gt;Here's a sampling of the 35 tools, grouped by what problems they solve:&lt;/p&gt;

&lt;h3&gt;"What does this function do?"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;get_function_context&lt;/code&gt; returns: behavior summary, execution steps, DB queries with actual SQL, HTTP calls with endpoints, error handling patterns, who calls it, what it calls. All extracted from AST, no AI needed.&lt;/p&gt;

&lt;h3&gt;"Find all database operations"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;search_by_side_effect&lt;/code&gt; with &lt;code&gt;effect: "db_query"&lt;/code&gt; returns every function that touches the database, with the actual queries. Also works for &lt;code&gt;http_call&lt;/code&gt;, &lt;code&gt;file_io&lt;/code&gt;, and &lt;code&gt;logging&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;"How does auth work in this project?"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;search_by_concept&lt;/code&gt; with &lt;code&gt;concept: "authentication"&lt;/code&gt; finds all auth-related functions across the repo. Powered by the semantic index, not keyword grep.&lt;/p&gt;

&lt;h3&gt;"I just edited a file"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;refresh_file&lt;/code&gt; re-analyzes a single changed file in ~10ms, updating the stored context. No need to re-analyze the entire repo.&lt;/p&gt;

&lt;h3&gt;"Show me the call chain"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;visualize_call_graph&lt;/code&gt; generates a Mermaid flowchart showing callers and callees at configurable depth.&lt;/p&gt;

&lt;h2&gt;The Token Math&lt;/h2&gt;

&lt;p&gt;Here's the real comparison from my daily usage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Explore Agent&lt;/th&gt;
&lt;th&gt;MCP Server&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Understand a function&lt;/td&gt;
&lt;td&gt;~50K tokens&lt;/td&gt;
&lt;td&gt;~4K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Find related code&lt;/td&gt;
&lt;td&gt;~30K tokens&lt;/td&gt;
&lt;td&gt;~2-3K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After editing a file&lt;/td&gt;
&lt;td&gt;Full re-explore&lt;/td&gt;
&lt;td&gt;~1-2K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Natural language Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;Not possible&lt;/td&gt;
&lt;td&gt;~8K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a &lt;strong&gt;10-25x reduction&lt;/strong&gt; in token usage per query. Over a full development session, it's the difference between hitting context limits constantly and having a fast, responsive AI assistant.&lt;/p&gt;

&lt;h2&gt;Design Decisions That Mattered&lt;/h2&gt;

&lt;h3&gt;Minimal Dependencies&lt;/h3&gt;

&lt;p&gt;The entire server has only &lt;strong&gt;2 direct dependencies&lt;/strong&gt;: &lt;code&gt;go-git&lt;/code&gt; for Git operations and &lt;code&gt;go-sqlite3&lt;/code&gt; for vector storage. Every other feature — AST parsing, HTTP handling, JSON serialization — uses the Go standard library. This keeps the binary small, the supply chain minimal, and deployment trivial.&lt;/p&gt;

&lt;h3&gt;Local Embeddings Over API Calls&lt;/h3&gt;

&lt;p&gt;I chose TF-IDF embeddings computed locally instead of calling OpenAI's embedding API. The quality is sufficient for code search (function names and patterns are fairly distinctive), and it means the server works offline with zero latency. No API keys, no rate limits, no cost.&lt;/p&gt;

&lt;h3&gt;Progressive Disclosure&lt;/h3&gt;

&lt;p&gt;Search results return compact references by default. Each reference includes a &lt;code&gt;detail_ref&lt;/code&gt; that the AI can call to expand. This means the AI gets a list of 20 matching functions in ~2K tokens and only fetches full details on the 2-3 it actually needs.&lt;/p&gt;

&lt;h3&gt;Per-Repo Locking&lt;/h3&gt;

&lt;p&gt;Analysis of different repos runs concurrently. Only operations on the same repo serialize. This was a deliberate choice over a global mutex — when you're working across multiple services, you don't want analyzing repo A to block queries on repo B.&lt;/p&gt;
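&lt;p&gt;A minimal sketch of the per-repo locking pattern (not the server's actual code): hand out one mutex per repository path from a map guarded by its own small mutex.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// repoLocks hands out one mutex per repository path, so operations on
// different repos never contend while same-repo operations serialize.
type repoLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (r *repoLocks) forRepo(path string) *sync.Mutex {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.locks == nil {
		r.locks = map[string]*sync.Mutex{}
	}
	if _, ok := r.locks[path]; !ok {
		r.locks[path] = &sync.Mutex{}
	}
	return r.locks[path]
}

func main() {
	rl := &repoLocks{}
	var wg sync.WaitGroup
	counts := map[string]int{}
	var cmu sync.Mutex
	for _, repo := range []string{"repo-a", "repo-b", "repo-a", "repo-b"} {
		wg.Add(1)
		go func(repo string) {
			defer wg.Done()
			m := rl.forRepo(repo)
			m.Lock() // blocks only if the SAME repo is busy
			defer m.Unlock()
			cmu.Lock()
			counts[repo]++
			cmu.Unlock()
		}(repo)
	}
	wg.Wait()
	fmt.Println(counts["repo-a"], counts["repo-b"]) // prints: 2 2
}
```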

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Language support is limited.&lt;/strong&gt; Right now, the deep AST analysis works only for Go. Other languages get a generic analyzer that extracts basic structure but misses behavior details. Adding tree-sitter-based parsing for Python and TypeScript is the next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transport is stdio only.&lt;/strong&gt; The MCP spec supports HTTP/SSE transport, which would let the server run as a long-lived daemon shared across multiple AI sessions. Currently, each Claude Code session spawns its own server process.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;The current server handles single-repo analysis well, but the roadmap is about scaling this to organization-level intelligence. Here's what I'm building:&lt;/p&gt;

&lt;h3&gt;Cross-Service API Flow Tracing&lt;/h3&gt;

&lt;p&gt;This is the killer feature. When you ask "what happens when someone hits /login?", the server should trace the entire flow: request enters service A's LoginHandler, which calls service B's /auth/validate endpoint, which publishes to Kafka topic &lt;code&gt;user.verified&lt;/code&gt;, which is consumed by service C's VerificationHandler.&lt;/p&gt;

&lt;p&gt;This means teaching the analyzer to detect HTTP client calls and extract destination URLs, parse route registrations from frameworks like gorilla/mux, detect async message producers and consumers (Kafka, RabbitMQ, NATS), and then match them across repos to build a complete service-to-service flow graph. The result: two new tools — &lt;code&gt;trace_api_flow&lt;/code&gt; for end-to-end request tracing and &lt;code&gt;get_service_map&lt;/code&gt; for a bird's-eye view of how all services connect.&lt;/p&gt;

&lt;p&gt;Static analysis for distributed tracing is interesting because it works on code that isn't deployed yet — no OpenTelemetry instrumentation needed.&lt;/p&gt;

&lt;h3&gt;Dependency Graph &amp;amp; Import Analysis&lt;/h3&gt;

&lt;p&gt;The server currently ignores &lt;code&gt;go.mod&lt;/code&gt; files entirely. I'm adding proper module dependency parsing — direct and indirect dependencies, replace directives, import classification (stdlib vs internal vs external) — and a &lt;code&gt;get_dependency_graph&lt;/code&gt; tool that shows how repos depend on each other with Mermaid visualization. This is the foundation for cross-repo features.&lt;/p&gt;

&lt;h3&gt;Organization-Level Features&lt;/h3&gt;

&lt;p&gt;Right now, repos are standalone. I'm adding an organization model that groups repos together, with org-wide semantic indexing and a &lt;code&gt;search_org&lt;/code&gt; tool that combines keyword and vector search across an entire org using hybrid ranking (reciprocal rank fusion). The goal: ask "find authentication code" once and get results across all 50+ repos, ranked by relevance.&lt;/p&gt;
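&lt;p&gt;Reciprocal rank fusion itself is simple: each document's fused score is the sum of 1/(k + rank) over the ranked lists it appears in, typically with k = 60. A sketch (the file names are invented):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses multiple ranked lists with reciprocal rank fusion:
// score(d) = sum over lists of 1/(k + rank(d)), ranks starting at 1.
func rrf(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, doc := range list {
			scores[doc] += 1.0 / (k + float64(rank+1))
		}
	}
	docs := make([]string, 0, len(scores))
	for d := range scores {
		docs = append(docs, d)
	}
	sort.Slice(docs, func(i, j int) bool { return scores[docs[i]] > scores[docs[j]] })
	return docs
}

func main() {
	keyword := []string{"auth/jwt.go", "auth/session.go", "user/login.go"}
	vector := []string{"user/login.go", "auth/jwt.go", "billing/pay.go"}
	// Documents that rank well on BOTH lists rise to the top.
	fmt.Println(rrf(60, keyword, vector))
}
```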

&lt;h3&gt;Agent Recipes&lt;/h3&gt;

&lt;p&gt;Instead of agents making 5-10 tool calls to understand a PR, they should make one call to &lt;code&gt;analyze_pr_impact&lt;/code&gt; and get everything: changed function behaviors, callers affected, cross-service impact, dependency-level impact, and a risk assessment. I'm building pre-built recipes for the three most common agent workflows — PR impact analysis, API flow explanation, and architecture review — each designed to return everything an agent needs in a single call within an 8K token budget.&lt;/p&gt;

&lt;h3&gt;Plugin Interface&lt;/h3&gt;

&lt;p&gt;The analyzer is currently Go-only and the embedder is fixed. I'm adding plugin interfaces for both — &lt;code&gt;AnalyzerPlugin&lt;/code&gt; for adding language support (TypeScript, Python) and &lt;code&gt;EmbedderPlugin&lt;/code&gt; for swapping embedding models (evaluating Voyage Code-3, which benchmarks 16% better than OpenAI on code retrieval).&lt;/p&gt;

&lt;h3&gt;Service Layer &amp;amp; REST API&lt;/h3&gt;

&lt;p&gt;The MCP server currently runs as a local process per Claude Code session. I'm wrapping the core tools as a REST API with GitHub/GitLab webhook integration for auto-analysis on push events, multi-tenant storage for org isolation, and async analysis queuing. The goal: deploy once for an entire team, not per-developer.&lt;/p&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;The project is open source: &lt;a href="https://github.com/yashpalsinhc/mcp-repo-context" rel="noopener noreferrer"&gt;github.com/yashpalsinhc/mcp-repo-context&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use it with Claude Code, add to your MCP config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"repo-context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"path/to/mcp-repo-context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"--data-dir"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~/.mcp-data"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then ask Claude to analyze your repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Analyze my local project at /path/to/repo
&amp;gt; What does the CreateUser function do?
&amp;gt; Find all database operations
&amp;gt; Show me the call graph for HandleLogin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;If you're building AI-powered developer tools, the MCP ecosystem is worth exploring. The protocol is simple (JSON-RPC over stdio), Go is a great fit for the server side, and the payoff — turning expensive, slow AI exploration into fast, precise queries — is real.&lt;/p&gt;
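&lt;p&gt;For a sense of how simple the protocol is, a &lt;code&gt;tools/call&lt;/code&gt; request is just a JSON-RPC 2.0 message written to stdin. The envelope below follows the MCP spec; the argument names are guesses for illustration, not this server's documented schema:&lt;/p&gt;

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_function_context",
    "arguments": { "repo": "/path/to/repo", "function": "CreateUser" }
  }
}
```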

&lt;p&gt;I use this server every day. It changed how I work with AI on code.&lt;/p&gt;

</description>
      <category>go</category>
      <category>ai</category>
      <category>mcp</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
