<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yashpalsinhc</title>
    <description>The latest articles on DEV Community by yashpalsinhc (@yashpalsinhc).</description>
    <link>https://dev.to/yashpalsinhc</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3793896%2Fe1329b18-c73e-407d-a389-feb142f84514.png</url>
      <title>DEV Community: yashpalsinhc</title>
      <link>https://dev.to/yashpalsinhc</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yashpalsinhc"/>
    <language>en</language>
    <item>
      <title>I Built a 35-Tool MCP Server That Cut My AI Token Usage by 95%</title>
      <dc:creator>yashpalsinhc</dc:creator>
      <pubDate>Thu, 26 Feb 2026 07:36:22 +0000</pubDate>
      <link>https://dev.to/yashpalsinhc/i-built-a-35-tool-mcp-server-that-cut-my-ai-token-usage-by-95-1b4</link>
      <guid>https://dev.to/yashpalsinhc/i-built-a-35-tool-mcp-server-that-cut-my-ai-token-usage-by-95-1b4</guid>
      <description>&lt;p&gt;Every time I asked Claude to help me with a codebase, the same thing happened: it would read file after file, burn through 50K+ tokens just to understand the project structure, and then I'd hit the context limit before getting any real work done.&lt;/p&gt;

&lt;p&gt;I built an MCP server to fix this. It analyzes a codebase once, extracts everything an AI agent needs — function behaviors, call graphs, DB queries, HTTP calls — and serves precise answers in 2-4K tokens instead of 50K+.&lt;/p&gt;

&lt;p&gt;Here's how it works and what I learned building it.&lt;/p&gt;

&lt;h2&gt;The Problem: AI Agents Are Blind&lt;/h2&gt;

&lt;p&gt;When you point an AI agent at a codebase, it has no memory. Every session starts from scratch. It runs &lt;code&gt;grep&lt;/code&gt;, reads files one by one, and builds a mental model — slowly, expensively, and incompletely.&lt;/p&gt;

&lt;p&gt;For a medium-sized Go project (~100 files), a typical exploration burns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50K+ tokens&lt;/strong&gt; just to understand what functions exist&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple rounds&lt;/strong&gt; of grep → read → grep → read&lt;/li&gt;
&lt;li&gt;And it still misses cross-file relationships like call graphs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't an AI problem. It's a &lt;strong&gt;context delivery&lt;/strong&gt; problem.&lt;/p&gt;

&lt;h2&gt;The Solution: Analyze Once, Query Forever&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/yashpalsinhc/mcp-repo-context" rel="noopener noreferrer"&gt;MCP Repo Context Server&lt;/a&gt; — a Go server that speaks the &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; and provides 35 specialized tools for codebase understanding.&lt;/p&gt;

&lt;p&gt;The core idea: &lt;strong&gt;parse the codebase into structured data, then let the AI query exactly what it needs.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;Architecture&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  AI Agent    │────▶│  MCP Server      │────▶│  Storage Layer  │
│  (Claude)    │◀────│ (JSON-RPC/stdio) │◀────│  JSON + SQLite  │
└──────────────┘     └──────────────────┘     └─────────────────┘
                            │
                    ┌───────┼───────┐
                    ▼       ▼       ▼
              AST Parser  Vector  Call Graph
              (Go)        Search  Builder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Three layers make this work:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. AST Parsing&lt;/strong&gt; — I use Go's &lt;code&gt;go/ast&lt;/code&gt; package to extract every function signature, its behavior (step-by-step), database queries, HTTP calls, error handling patterns, and side effects. This isn't regex matching — it's actual syntax tree traversal, so it captures things like wrapped errors, deferred calls, and goroutine launches.&lt;/p&gt;
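&lt;p&gt;As a rough sketch of what this kind of &lt;code&gt;go/ast&lt;/code&gt; traversal looks like — the function and output shape here are illustrative, not the server's actual code:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// listFunctions parses Go source and reports, per function, whether
// its body defers a call or launches a goroutine — two of the side
// effects the article mentions capturing via syntax-tree traversal.
func listFunctions(src string) map[string][]string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		panic(err)
	}
	out := map[string][]string{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		var effects []string
		ast.Inspect(fn, func(n ast.Node) bool {
			switch n.(type) {
			case *ast.DeferStmt:
				effects = append(effects, "defer")
			case *ast.GoStmt:
				effects = append(effects, "goroutine")
			}
			return true
		})
		out[fn.Name.Name] = effects
	}
	return out
}

func main() {
	src := `package p
func Save() { defer unlock(); go audit() }
func unlock() {}
func audit() {}
`
	fmt.Println(listFunctions(src)["Save"]) // prints [defer goroutine]
}
```

&lt;p&gt;Because the walk sees real nodes like &lt;code&gt;*ast.GoStmt&lt;/code&gt; rather than text, it can't be fooled by comments or string literals the way grep can.&lt;/p&gt;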

&lt;p&gt;&lt;strong&gt;2. Semantic Vector Search&lt;/strong&gt; — Each function and type gets a 384-dimensional TF-IDF embedding stored in SQLite. When the AI asks "find authentication code," it doesn't need an exact keyword match — it finds semantically similar functions. No external API calls needed; embeddings are computed locally.&lt;/p&gt;
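&lt;p&gt;The post doesn't show the embedding code, but the mechanics are easy to illustrate with a toy version: term-frequency vectors compared by cosine similarity. (A real TF-IDF pipeline also applies inverse-document-frequency weighting; this sketch keeps only the similarity math.)&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// vectorize builds a term-frequency vector over a shared vocabulary.
// Toy version: no IDF weighting, no tokenizer beyond whitespace.
func vectorize(text string, vocab []string) []float64 {
	v := make([]float64, len(vocab))
	words := strings.Fields(strings.ToLower(text))
	for i, term := range vocab {
		for _, w := range words {
			if w == term {
				v[i]++
			}
		}
	}
	return v
}

// cosine returns the cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	vocab := []string{"auth", "login", "token", "parse", "file"}
	query := vectorize("auth login token", vocab)
	fnA := vectorize("validate login token auth", vocab)
	fnB := vectorize("parse file", vocab)
	fmt.Printf("auth fn: %.2f, parser fn: %.2f\n", cosine(query, fnA), cosine(query, fnB))
	// prints: auth fn: 1.00, parser fn: 0.00
}
```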

&lt;p&gt;&lt;strong&gt;3. Call Graph Extraction&lt;/strong&gt; — The analyzer builds a complete call graph: who calls what, from which line, what type of call (direct, goroutine, deferred). This powers tools like &lt;code&gt;get_callers&lt;/code&gt; and &lt;code&gt;visualize_call_graph&lt;/code&gt; that generate Mermaid diagrams.&lt;/p&gt;
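&lt;p&gt;Once the graph exists, rendering Mermaid from it is only a few lines. A hypothetical sketch — the edge data and function below are made up for illustration:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// toMermaid turns a caller→callee edge list into a Mermaid flowchart.
// roots controls emission order so output is deterministic.
func toMermaid(edges map[string][]string, roots []string) string {
	var b strings.Builder
	b.WriteString("flowchart TD\n")
	for _, caller := range roots {
		for _, callee := range edges[caller] {
			fmt.Fprintf(&b, "  %s --> %s\n", caller, callee)
		}
	}
	return b.String()
}

func main() {
	// Invented edges for a small auth flow.
	edges := map[string][]string{
		"HandleLogin":   {"ValidateToken", "LoadUser"},
		"ValidateToken": {"ParseJWT"},
	}
	fmt.Print(toMermaid(edges, []string{"HandleLogin", "ValidateToken"}))
}
```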

&lt;h2&gt;What the Tools Actually Do&lt;/h2&gt;

&lt;p&gt;Here's a sampling of the 35 tools, grouped by what problems they solve:&lt;/p&gt;

&lt;h3&gt;"What does this function do?"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;get_function_context&lt;/code&gt; returns: behavior summary, execution steps, DB queries with actual SQL, HTTP calls with endpoints, error handling patterns, who calls it, what it calls. All extracted from AST, no AI needed.&lt;/p&gt;

&lt;h3&gt;"Find all database operations"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;search_by_side_effect&lt;/code&gt; with &lt;code&gt;effect: "db_query"&lt;/code&gt; returns every function that touches the database, with the actual queries. Also works for &lt;code&gt;http_call&lt;/code&gt;, &lt;code&gt;file_io&lt;/code&gt;, and &lt;code&gt;logging&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;"How does auth work in this project?"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;search_by_concept&lt;/code&gt; with &lt;code&gt;concept: "authentication"&lt;/code&gt; finds all auth-related functions across the repo. Powered by the semantic index, not keyword grep.&lt;/p&gt;

&lt;h3&gt;"I just edited a file"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;refresh_file&lt;/code&gt; re-analyzes a single changed file in ~10ms, updating the stored context. No need to re-analyze the entire repo.&lt;/p&gt;

&lt;h3&gt;"Show me the call chain"&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;visualize_call_graph&lt;/code&gt; generates a Mermaid flowchart showing callers and callees at configurable depth.&lt;/p&gt;

&lt;h2&gt;The Token Math&lt;/h2&gt;

&lt;p&gt;Here's the real comparison from my daily usage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Explore Agent&lt;/th&gt;
&lt;th&gt;MCP Server&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Understand a function&lt;/td&gt;
&lt;td&gt;~50K tokens&lt;/td&gt;
&lt;td&gt;~4K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Find related code&lt;/td&gt;
&lt;td&gt;~30K tokens&lt;/td&gt;
&lt;td&gt;~2-3K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After editing a file&lt;/td&gt;
&lt;td&gt;Full re-explore&lt;/td&gt;
&lt;td&gt;~1-2K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Natural language Q&amp;amp;A&lt;/td&gt;
&lt;td&gt;Not possible&lt;/td&gt;
&lt;td&gt;~8K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's a &lt;strong&gt;10-25x reduction&lt;/strong&gt; in token usage per query. Over a full development session, it's the difference between hitting context limits constantly and having a fast, responsive AI assistant.&lt;/p&gt;

&lt;h2&gt;Design Decisions That Mattered&lt;/h2&gt;

&lt;h3&gt;Minimal Dependencies&lt;/h3&gt;

&lt;p&gt;The entire server has only &lt;strong&gt;2 direct dependencies&lt;/strong&gt;: &lt;code&gt;go-git&lt;/code&gt; for Git operations and &lt;code&gt;go-sqlite3&lt;/code&gt; for vector storage. Every other feature — AST parsing, HTTP handling, JSON serialization — uses the Go standard library. This keeps the binary small, the supply chain minimal, and deployment trivial.&lt;/p&gt;

&lt;h3&gt;Local Embeddings Over API Calls&lt;/h3&gt;

&lt;p&gt;I chose TF-IDF embeddings computed locally instead of calling OpenAI's embedding API. The quality is sufficient for code search (function names and patterns are fairly distinctive), and it means the server works offline with zero latency. No API keys, no rate limits, no cost.&lt;/p&gt;

&lt;h3&gt;Progressive Disclosure&lt;/h3&gt;

&lt;p&gt;Search results return compact references by default. Each reference includes a &lt;code&gt;detail_ref&lt;/code&gt; that the AI can call to expand. This means the AI gets a list of 20 matching functions in ~2K tokens and only fetches full details on the 2-3 it actually needs.&lt;/p&gt;

&lt;h3&gt;Per-Repo Locking&lt;/h3&gt;

&lt;p&gt;Analysis of different repos runs concurrently. Only operations on the same repo serialize. This was a deliberate choice over a global mutex — when you're working across multiple services, you don't want analyzing repo A to block queries on repo B.&lt;/p&gt;
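&lt;p&gt;A minimal sketch of the per-repo locking pattern (not the server's actual code): hand out one mutex per repository path from a map guarded by its own small mutex.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// repoLocks hands out one mutex per repository path, so operations on
// different repos never contend while same-repo operations serialize.
type repoLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (r *repoLocks) forRepo(path string) *sync.Mutex {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.locks == nil {
		r.locks = map[string]*sync.Mutex{}
	}
	if _, ok := r.locks[path]; !ok {
		r.locks[path] = &sync.Mutex{}
	}
	return r.locks[path]
}

func main() {
	rl := &repoLocks{}
	var wg sync.WaitGroup
	counts := map[string]int{}
	var cmu sync.Mutex
	for _, repo := range []string{"repo-a", "repo-b", "repo-a", "repo-b"} {
		wg.Add(1)
		go func(repo string) {
			defer wg.Done()
			m := rl.forRepo(repo)
			m.Lock() // blocks only if the SAME repo is busy
			defer m.Unlock()
			cmu.Lock()
			counts[repo]++
			cmu.Unlock()
		}(repo)
	}
	wg.Wait()
	fmt.Println(counts["repo-a"], counts["repo-b"]) // prints: 2 2
}
```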

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Language support is limited.&lt;/strong&gt; Right now, the deep AST analysis works only for Go. Other languages get a generic analyzer that extracts basic structure but misses behavior details. Adding tree-sitter-based parsing for Python and TypeScript is the next step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transport is stdio only.&lt;/strong&gt; The MCP spec supports HTTP/SSE transport, which would let the server run as a long-lived daemon shared across multiple AI sessions. Currently, each Claude Code session spawns its own server process.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;The current server handles single-repo analysis well, but the roadmap is about scaling this to organization-level intelligence. Here's what I'm building:&lt;/p&gt;

&lt;h3&gt;Cross-Service API Flow Tracing&lt;/h3&gt;

&lt;p&gt;This is the killer feature. When you ask "what happens when someone hits /login?", the server should trace the entire flow: request enters service A's LoginHandler, which calls service B's /auth/validate endpoint, which publishes to Kafka topic &lt;code&gt;user.verified&lt;/code&gt;, which is consumed by service C's VerificationHandler.&lt;/p&gt;

&lt;p&gt;This means teaching the analyzer to detect HTTP client calls and extract destination URLs, parse route registrations from frameworks like gorilla/mux, detect async message producers and consumers (Kafka, RabbitMQ, NATS), and then match them across repos to build a complete service-to-service flow graph. The result: two new tools — &lt;code&gt;trace_api_flow&lt;/code&gt; for end-to-end request tracing and &lt;code&gt;get_service_map&lt;/code&gt; for a bird's-eye view of how all services connect.&lt;/p&gt;

&lt;p&gt;Static analysis for distributed tracing is interesting because it works on code that isn't deployed yet — no OpenTelemetry instrumentation needed.&lt;/p&gt;

&lt;h3&gt;Dependency Graph &amp;amp; Import Analysis&lt;/h3&gt;

&lt;p&gt;The server currently ignores &lt;code&gt;go.mod&lt;/code&gt; files entirely. I'm adding proper module dependency parsing — direct and indirect dependencies, replace directives, import classification (stdlib vs internal vs external) — and a &lt;code&gt;get_dependency_graph&lt;/code&gt; tool that shows how repos depend on each other with Mermaid visualization. This is the foundation for cross-repo features.&lt;/p&gt;

&lt;h3&gt;Organization-Level Features&lt;/h3&gt;

&lt;p&gt;Right now, repos are standalone. I'm adding an organization model that groups repos together, with org-wide semantic indexing and a &lt;code&gt;search_org&lt;/code&gt; tool that combines keyword and vector search across an entire org using hybrid ranking (reciprocal rank fusion). The goal: ask "find authentication code" once and get results across all 50+ repos, ranked by relevance.&lt;/p&gt;
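&lt;p&gt;Reciprocal rank fusion itself is simple: each document's fused score is the sum of 1/(k + rank) over the ranked lists it appears in, typically with k = 60. A sketch (the file names are invented):&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sort"
)

// rrf fuses multiple ranked lists with reciprocal rank fusion:
// score(d) = sum over lists of 1/(k + rank(d)), ranks starting at 1.
func rrf(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for rank, doc := range list {
			scores[doc] += 1.0 / (k + float64(rank+1))
		}
	}
	docs := make([]string, 0, len(scores))
	for d := range scores {
		docs = append(docs, d)
	}
	sort.Slice(docs, func(i, j int) bool { return scores[docs[i]] > scores[docs[j]] })
	return docs
}

func main() {
	keyword := []string{"auth/jwt.go", "auth/session.go", "user/login.go"}
	vector := []string{"user/login.go", "auth/jwt.go", "billing/pay.go"}
	// Documents that rank well on BOTH lists rise to the top.
	fmt.Println(rrf(60, keyword, vector))
}
```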

&lt;h3&gt;Agent Recipes&lt;/h3&gt;

&lt;p&gt;Instead of agents making 5-10 tool calls to understand a PR, they should make one call to &lt;code&gt;analyze_pr_impact&lt;/code&gt; and get everything: changed function behaviors, callers affected, cross-service impact, dependency-level impact, and a risk assessment. I'm building pre-built recipes for the three most common agent workflows — PR impact analysis, API flow explanation, and architecture review — each designed to return everything an agent needs in a single call within an 8K token budget.&lt;/p&gt;

&lt;h3&gt;Plugin Interface&lt;/h3&gt;

&lt;p&gt;The analyzer is currently Go-only and the embedder is fixed. I'm adding plugin interfaces for both — &lt;code&gt;AnalyzerPlugin&lt;/code&gt; for adding language support (TypeScript, Python) and &lt;code&gt;EmbedderPlugin&lt;/code&gt; for swapping embedding models (evaluating Voyage Code-3, which benchmarks 16% better than OpenAI on code retrieval).&lt;/p&gt;

&lt;h3&gt;Service Layer &amp;amp; REST API&lt;/h3&gt;

&lt;p&gt;The MCP server currently runs as a local process per Claude Code session. I'm wrapping the core tools as a REST API with GitHub/GitLab webhook integration for auto-analysis on push events, multi-tenant storage for org isolation, and async analysis queuing. The goal: deploy once for an entire team, not per-developer.&lt;/p&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;The project is open source: &lt;a href="https://github.com/yashpalsinhc/mcp-repo-context" rel="noopener noreferrer"&gt;github.com/yashpalsinhc/mcp-repo-context&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To use it with Claude Code, add to your MCP config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"repo-context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"path/to/mcp-repo-context"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"--data-dir"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"~/.mcp-data"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then ask Claude to analyze your repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; Analyze my local project at /path/to/repo
&amp;gt; What does the CreateUser function do?
&amp;gt; Find all database operations
&amp;gt; Show me the call graph for HandleLogin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;If you're building AI-powered developer tools, the MCP ecosystem is worth exploring. The protocol is simple (JSON-RPC over stdio), Go is a great fit for the server side, and the payoff — turning expensive, slow AI exploration into fast, precise queries — is real.&lt;/p&gt;
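&lt;p&gt;For a sense of how simple the protocol is, a &lt;code&gt;tools/call&lt;/code&gt; request is just a JSON-RPC 2.0 message written to stdin. The envelope below follows the MCP spec; the argument names are guesses for illustration, not this server's documented schema:&lt;/p&gt;

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_function_context",
    "arguments": { "repo": "/path/to/repo", "function": "CreateUser" }
  }
}
```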

&lt;p&gt;I use this server every day. It changed how I work with AI on code.&lt;/p&gt;

</description>
      <category>go</category>
      <category>ai</category>
      <category>mcp</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
