Niraj Kumar

Building a Local-First RAG Engine for AI Coding Assistants

AI coding assistants have a context problem.

They can generate code, explain algorithms, refactor functions. But ask Claude or Cursor "where is authentication handled in this codebase?" and you'll get a guess at best.

The assistant doesn't actually know your code. It sees one file at a time. No persistent memory. No understanding of how components connect.

This is the RAG problem — Retrieval-Augmented Generation. The AI needs relevant context to give useful answers. Someone has to find that context first.

The Current Options

Cloud indexing services upload your codebase to external servers. They build searchable indexes, handle embeddings, serve results via API. Fast and convenient — until you remember that's proprietary code sitting on infrastructure you don't control.

IDE-specific solutions like GitHub Copilot work well but lock you into their ecosystem. Switch tools, lose your indexed context.

Self-hosted RAG pipelines usually mean Python, a dozen dependencies, vector database setup, and configuration that breaks between machines. Great for experimentation, painful for daily use.

None of these felt right for how I actually work.

What I'm Building

AmanMCP is a local-first search engine for codebases.

  • Runs entirely on your machine
  • Single binary, zero dependencies
  • Works with any MCP-compatible assistant (Claude Code, Cursor, others)
  • Your code never leaves your laptop

MCP is the Model Context Protocol — an open standard for connecting AI assistants to external tools and data sources. AmanMCP implements this protocol, so any compatible client can use it without custom integration.

How It Actually Works

Hybrid Search with Query Classification

Most code search tools use either keyword matching (grep-style) or vector similarity (semantic search). Both have tradeoffs.

Keyword search excels at exact matches. Looking for ERR_CONNECTION_REFUSED? Keyword search finds it instantly. But ask "how does error handling work?" and keyword search struggles.

Vector search understands meaning. It knows "authentication" and "login verification" are related concepts. But it can miss exact technical terms, especially uncommon ones.

AmanMCP uses both — and automatically adjusts the balance based on your query.

Query Classifier Weights

The classifier examines query structure — presence of error codes, camelCase identifiers, natural language patterns — and sets weights accordingly. No manual tuning required.
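
To make that concrete, here's a minimal sketch of the idea in Go. The patterns and weights are illustrative, not AmanMCP's actual values:

import "regexp"

// Illustrative heuristics only; the real classifier may differ.
var (
	errCodeRe   = regexp.MustCompile(`\b[A-Z][A-Z0-9_]{3,}\b`) // e.g. ERR_CONNECTION_REFUSED
	camelCaseRe = regexp.MustCompile(`[a-z]+[A-Z]\w*`)         // e.g. parseConfig
)

// Weights balances the two search paths; they sum to 1.
type Weights struct {
	BM25, Vector float64
}

func ClassifyQuery(q string) Weights {
	switch {
	case errCodeRe.MatchString(q):
		return Weights{BM25: 0.8, Vector: 0.2} // exact technical token: favor keywords
	case camelCaseRe.MatchString(q):
		return Weights{BM25: 0.7, Vector: 0.3} // identifier-like: lean keyword
	default:
		return Weights{BM25: 0.3, Vector: 0.7} // natural language: lean semantic
	}
}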

Results from both searches merge using Reciprocal Rank Fusion (RRF), a technique that combines ranked lists without needing comparable scores.
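
RRF itself is only a few lines. A minimal Go version, assuming each input list is ordered best-first:

import "sort"

// ReciprocalRankFusion merges best-first ranked lists of chunk IDs.
// The constant k (commonly 60) damps how much the very top ranks dominate.
func ReciprocalRankFusion(k float64, lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for rank, id := range list {
			scores[id] += 1.0 / (k + float64(rank+1)) // ranks are 1-based
		}
	}
	merged := make([]string, 0, len(scores))
	for id := range scores {
		merged = append(merged, id)
	}
	sort.Slice(merged, func(i, j int) bool { return scores[merged[i]] > scores[merged[j]] })
	return merged
}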

AST-Aware Chunking

RAG systems split documents into chunks before indexing. Most use fixed token counts — every 500 tokens, create a new chunk.

This breaks code in awkward places. A function split mid-way loses meaning. A class definition separated from its methods becomes harder to understand.

AmanMCP uses tree-sitter to parse actual code structure. Chunks align with logical boundaries:

  • Functions stay whole
  • Classes keep their methods
  • Related code stays together

When a function exceeds the chunk limit, it splits at nested boundaries — inner functions, large blocks — rather than arbitrary positions.
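
The splitting logic is a simple recursion. Here's a sketch against a hypothetical Node interface standing in for the real tree-sitter bindings; gap text between sibling nodes is glossed over:

// Node is a stand-in for a tree-sitter AST node; the real bindings
// expose equivalent byte offsets and child accessors.
type Node interface {
	StartByte() int
	EndByte() int
	Children() []Node
}

// Chunk keeps a node whole when it fits the limit; otherwise it recurses
// into children (inner functions, blocks) instead of cutting at an
// arbitrary byte offset. Oversized leaves are emitted whole in this sketch.
func Chunk(n Node, src []byte, limit int, out *[][]byte) {
	if n.EndByte()-n.StartByte() <= limit || len(n.Children()) == 0 {
		*out = append(*out, src[n.StartByte():n.EndByte()])
		return
	}
	for _, child := range n.Children() {
		Chunk(child, src, limit, out)
	}
}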

AST-Aware vs Traditional Chunking

Markdown files use header-based chunking. Each section becomes a chunk, with header hierarchy preserved as context.
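
A bare-bones version of that splitter, with hierarchy tracking and code-fence handling omitted for brevity:

import "strings"

// SplitMarkdown chunks a document at headers so each section stays
// together with its body.
func SplitMarkdown(doc string) []string {
	var chunks []string
	var current []string
	for _, line := range strings.Split(doc, "\n") {
		if strings.HasPrefix(line, "#") && len(current) > 0 {
			chunks = append(chunks, strings.Join(current, "\n"))
			current = nil // start a fresh section at each header
		}
		current = append(current, line)
	}
	if len(current) > 0 {
		chunks = append(chunks, strings.Join(current, "\n"))
	}
	return chunks
}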

Local Embeddings

Vector search requires embeddings — numerical representations of text that capture semantic meaning.

Most RAG systems call cloud APIs (OpenAI, Cohere) for embeddings. Every query and every indexed chunk makes a network request. Costs add up. Rate limits apply. Your code travels over the wire.

AmanMCP generates embeddings locally using Ollama with the nomic-embed-text model. Runs on your hardware. No API costs. No external calls.
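
Ollama exposes this over a local HTTP API. A minimal sketch of the call, not AmanMCP's actual client code:

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// Embed asks a local Ollama instance for an embedding vector.
func Embed(text string) ([]float32, error) {
	body, _ := json.Marshal(map[string]string{
		"model":  "nomic-embed-text",
		"prompt": text,
	})
	resp, err := http.Post("http://localhost:11434/api/embeddings",
		"application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err // Ollama down: caller can fall back to static vectors
	}
	defer resp.Body.Close()
	var out struct {
		Embedding []float32 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}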

For machines without a GPU, or when Ollama isn't running, a static-embeddings fallback provides basic semantic search using CPU-only word vectors. Quality drops, but search still works.

Architecture

AmanMCP Architecture Diagram

Storage layer uses:

  • USearch for vector similarity (HNSW algorithm)
  • Custom BM25 inverted index for keyword search
  • SQLite for metadata and file tracking

All indexes live in .amanmcp/ within your project directory. Portable and inspectable.
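
The BM25 side is compact enough to sketch. Here's the standard Okapi scoring function with textbook parameters; AmanMCP's tuning may differ:

import "math"

// ScoreBM25 scores one document for one query term using the standard
// Okapi BM25 formula with the usual defaults k1=1.2, b=0.75.
//   tf: term frequency in the doc    df: docs containing the term
//   n:  total docs in the index      dl, avgdl: doc length vs. average
func ScoreBM25(tf, df, n, dl, avgdl float64) float64 {
	const k1, b = 1.2, 0.75
	idf := math.Log((n-df+0.5)/(df+0.5) + 1)
	return idf * tf * (k1 + 1) / (tf + k1*(1-b+b*dl/avgdl))
}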

Performance Targets

The goal is sub-100ms query latency on a 50,000-file codebase running on typical developer hardware (16-32GB RAM).

This requires:

  • Efficient vector indexing (USearch with HNSW)
  • In-memory BM25 with smart caching
  • LRU cache for repeated queries
  • Parallel search execution (BM25 and vector run concurrently; sketched below)
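
A sketch of that parallel execution, where bm25Search and vectorSearch are stand-ins for the real index lookups:

import "sync"

// Search runs both retrieval paths concurrently and fuses the results.
func Search(query string) []string {
	var kw, sem []string
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); kw = bm25Search(query) }()
	go func() { defer wg.Done(); sem = vectorSearch(query) }()
	wg.Wait()
	return ReciprocalRankFusion(60, kw, sem) // from the hybrid search section
}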

Memory management adapts to available RAM, as sketched below:

  • 16GB system → conservative settings, I8 quantization
  • 24GB system → balanced defaults, F16 quantization
  • 32GB+ → full precision, larger caches
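
In code, that tiering can be a simple switch. The names here are illustrative, not AmanMCP's actual configuration:

// Pick a vector quantization tier from available RAM, mirroring the
// tiers listed above.
func quantizationFor(ramGB int) string {
	switch {
	case ramGB >= 32:
		return "f32" // full precision, larger caches
	case ramGB >= 24:
		return "f16" // balanced defaults
	default:
		return "i8" // conservative settings
	}
}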

Why Go?

AmanMCP is written in Go. Single binary compilation. No runtime dependencies. Cross-platform without configuration.

# That's it. No pip, no venv, no node_modules.
./amanmcp serve

Tree-sitter bindings require CGO, but the distributed binary includes everything. Users don't need a compiler toolchain.

Go's concurrency model fits the architecture — parallel search paths, background indexing, file watching — without callback complexity.

What's Next

Current status: finalizing the technical specification, which I plan to complete over the weekend.

The spec covers:

  • Complete data models (chunks, symbols, projects)
  • Search algorithms with code examples
  • Configuration schema with sensible defaults
  • MCP tool definitions
  • Error handling and graceful degradation

AmanMCP Conceptual Architecture

Next milestone: working v1 with core search functionality.

Get Involved

AmanMCP will be open source. If you're interested in:

  • Local-first developer tools
  • RAG systems and hybrid search
  • Go-based infrastructure

Watch the repo (link coming soon) or connect with me here.

The AI assistant ecosystem is growing fast. The tooling that feeds context to these assistants matters. I'd rather that tooling respect privacy by default.


#AI #OpenSource #DeveloperTools #BuildInPublic #Golang #RAG #LocalFirst #LLM #DevTools #AIAssistant

Building in public. Questions and feedback welcome.
