The Thousand-Line Problem
You ask your AI assistant: "How does our billing system handle subscription renewals?"
It reads billing.go—all 900 lines. Then subscription.go—700 more. Then payment_processor.go. It finds the answer in line 342.
Two thousand lines processed. Fifteen lines needed.
Modern AI is remarkably capable. It navigates code, discovers files, uses tools. The problem isn't intelligence—it's efficiency. And efficiency, at scale, becomes the difference between flow and friction.
The Three Frictions
When I started building AmanMCP, I identified three distinct sources of friction in AI-assisted development:
Friction One: The Token Economy
Every line the AI reads costs tokens. Tokens cost money, or time, or both. When your assistant reads an entire file to find a single function, you're paying for exploration overhead. Multiply this across a day of development. Across a team. The waste compounds.
Friction Two: The Precision Gap
Developers think in concepts. "Authentication flow." "Error handling pattern." "The thing that processes webhooks." But traditional search is literal. grep finds exact strings, not intentions. You end up translating your thoughts into searchable keywords—one more context switch, one more interruption to flow.
Friction Three: The Privacy Bargain
Most RAG solutions—the technology that gives AI "memory"—run in the cloud. Your code goes up. Embeddings come down. For personal projects, perhaps acceptable. For proprietary code, for client work, for anything you wouldn't post publicly—it's a bargain you shouldn't have to make.
AmanMCP is my answer to all three.
What If Memory Was Local?
Here is the core idea, stripped to its essence:
Your AI assistant should have long-term memory of your codebase, and that memory should live entirely on your machine.
Not memory that phones home. Not memory stored on someone else's server. Memory that runs on your hardware, indexes your disk, and never touches a network.
When you search for "authentication middleware," you should get:
- The exact function definition (keyword precision)
- The conceptually related OAuth handler (semantic understanding)
- The relevant documentation section (cross-format retrieval)
All within milliseconds. All without a single byte leaving your laptop.
This is what I'm building.
The Hybrid Mind
Pure keyword search is a librarian who only reads titles. Ask for "authentication" and you'll find files with that exact word—but miss the ValidateCredentials function that's conceptually the same thing.
Pure semantic search is a librarian who understands meaning but loses precision. It grasps concepts but might rank a comment about authentication above the actual implementation.
The human mind uses both. You recall by keyword when you know the exact term. You recall by meaning when you're thinking conceptually. The best retrieval should work the same way.
AmanMCP runs two searches in parallel:
- BM25 for lexical precision—finding exact identifiers, error codes, function names
- Vector search for semantic understanding—finding conceptually related code
Then it fuses the results using Reciprocal Rank Fusion, a technique proven at scale by OpenSearch, Elasticsearch, and MongoDB. The result: code that matches what you typed and code that matches what you meant.
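To make the fusion step concrete, here is a minimal Reciprocal Rank Fusion sketch in Go. The scoring formula and the constant k=60 are the standard RRF formulation; the function name and document IDs are illustrative, not AmanMCP's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges several ranked result lists into one, scoring each
// document by sum(1 / (k + rank)) across the lists it appears in.
// k dampens the influence of top ranks; 60 is the value used in the
// original RRF paper and most search-engine implementations.
func rrfFuse(rankings [][]string, k float64) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for rank, docID := range ranking {
			scores[docID] += 1.0 / (k + float64(rank+1))
		}
	}

	// Collect and sort documents by fused score, highest first.
	fused := make([]string, 0, len(scores))
	for docID := range scores {
		fused = append(fused, docID)
	}
	sort.Slice(fused, func(i, j int) bool {
		return scores[fused[i]] > scores[fused[j]]
	})
	return fused
}

func main() {
	bm25 := []string{"auth.go", "middleware.go", "oauth.go"} // lexical ranking
	vector := []string{"oauth.go", "auth.go", "session.go"}  // semantic ranking
	fmt.Println(rrfFuse([][]string{bm25, vector}, 60))
	// [auth.go oauth.go middleware.go session.go]
}
```

A document that ranks well in both lists rises to the top; a document that appears in only one list still survives, just lower down.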
The technical details matter. The experience is simpler: you search, you find.
Code Is Not Text
There is a subtle mistake embedded in most search systems: treating code as if it were prose.
A generic text chunker sees 500 characters and splits. It doesn't know that those 500 characters are the first half of a function, meaningless without the second half. It doesn't understand that ProcessPayment is a complete thought, that UserService is a coherent unit.
I use tree-sitter—the same parser that powers syntax highlighting in Neovim, Helix, and Zed—to understand the structure of your code. When I create chunks, I respect semantic boundaries. Functions stay whole. Classes stay together. The imports travel with the code that needs them.
This is AST-aware chunking. It means your search results are complete thoughts, not arbitrary fragments.
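As a rough sketch of what AST-aware chunking looks like in Go, here is one way to split a file at top-level declarations instead of at a character count. It assumes the smacker/go-tree-sitter bindings; the post doesn't say which bindings AmanMCP uses, and the helper below is illustrative, not its actual chunker.

```go
package main

import (
	"context"
	"fmt"

	sitter "github.com/smacker/go-tree-sitter"
	"github.com/smacker/go-tree-sitter/golang"
)

// chunkGoSource splits Go source into one chunk per top-level
// declaration, so functions and type definitions stay whole.
func chunkGoSource(src []byte) ([]string, error) {
	parser := sitter.NewParser()
	parser.SetLanguage(golang.GetLanguage())

	tree, err := parser.ParseCtx(context.Background(), nil, src)
	if err != nil {
		return nil, err
	}

	var chunks []string
	root := tree.RootNode()
	for i := 0; i < int(root.ChildCount()); i++ {
		node := root.Child(i)
		switch node.Type() {
		case "function_declaration", "method_declaration", "type_declaration":
			// Slice the original source by the node's byte range,
			// so each chunk is a complete unit, never a fragment.
			chunks = append(chunks, string(src[node.StartByte():node.EndByte()]))
		}
	}
	return chunks, nil
}

func main() {
	src := []byte("package demo\n\nfunc ProcessPayment(amount int) error {\n\treturn nil\n}\n")
	chunks, err := chunkGoSource(src)
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		fmt.Println("--- chunk ---")
		fmt.Println(c)
	}
}
```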
The Technology Stack
For those who appreciate the architecture:
Go as the implementation language. Single binary distribution. Excellent concurrency. Native alignment with the MCP ecosystem—the official SDK is maintained by Google and Anthropic.
Ollama as the embedding backend. Runs locally on Apple Silicon with Metal GPU acceleration. Supports Qwen3-Embedding, currently the top-ranked model on the MTEB leaderboard. Your embeddings never leave your machine.
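For a sense of what "local embeddings" means in practice, fetching a vector from Ollama is one HTTP call to localhost. The sketch below assumes the /api/embeddings endpoint and a qwen3-embedding model tag; how AmanMCP actually wires this up may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// embed asks a locally running Ollama server for an embedding.
// Nothing leaves the machine: the request goes to localhost:11434.
func embed(model, text string) ([]float64, error) {
	body, _ := json.Marshal(map[string]string{
		"model":  model,
		"prompt": text,
	})
	resp, err := http.Post("http://localhost:11434/api/embeddings", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out struct {
		Embedding []float64 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

func main() {
	vec, err := embed("qwen3-embedding", "authentication middleware")
	if err != nil {
		panic(err)
	}
	fmt.Printf("embedding dimensions: %d\n", len(vec))
}
```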
coder/hnsw for vector storage. Pure Go, no CGO complexity, logarithmic scaling. Handles 300,000+ documents with sub-10ms query times. Built by the Coder team, battle-tested in production.
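The coder/hnsw surface area is small; here is an indicative example adapted from its documentation. The chunk IDs and three-dimensional vectors are toy values, and exact signatures may vary between versions.

```go
package main

import (
	"fmt"

	"github.com/coder/hnsw"
)

func main() {
	// Build an in-memory HNSW graph keyed by chunk ID.
	g := hnsw.NewGraph[string]()
	g.Add(
		hnsw.MakeNode("auth.go#Login", []float32{0.1, 0.9, 0.2}),
		hnsw.MakeNode("billing.go#Renew", []float32{0.8, 0.1, 0.3}),
	)

	// Approximate nearest-neighbor search for a query embedding.
	neighbors := g.Search([]float32{0.15, 0.85, 0.25}, 1)
	fmt.Println(neighbors[0].Key) // auth.go#Login
}
```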
Bleve for BM25 indexing. Native Go, active maintenance, the standard for embedded full-text search in the ecosystem.
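On the lexical side, a minimal Bleve setup might look like the following; the in-memory index and the chunk struct are illustrative simplifications, not AmanMCP's schema.

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/v2"
)

type chunk struct {
	Path string
	Body string
}

func main() {
	// In-memory index with the default mapping; a real index
	// would be persisted on disk alongside the project.
	index, err := bleve.NewMemOnly(bleve.NewIndexMapping())
	if err != nil {
		panic(err)
	}

	_ = index.Index("auth.go#ValidateCredentials", chunk{
		Path: "auth.go",
		Body: "func ValidateCredentials(token string) error { ... }",
	})

	// Keyword query for an exact identifier.
	req := bleve.NewSearchRequest(bleve.NewMatchQuery("ValidateCredentials"))
	res, err := index.Search(req)
	if err != nil {
		panic(err)
	}
	for _, hit := range res.Hits {
		fmt.Println(hit.ID, hit.Score)
	}
}
```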
tree-sitter for code parsing. 100+ language grammars, error-tolerant, used by GitHub for their search infrastructure.
Every component was chosen after evaluating alternatives. Every tradeoff was deliberate. I didn't want clever—I wanted correct.
The "It Just Works" Philosophy
I took inspiration from tools that defined their categories.
git installs once and works everywhere. Homebrew handles dependencies invisibly. The best Apple products configure themselves.
When you run amanmcp in your project directory:
- It detects your project type from go.mod, package.json, or pyproject.toml (sketched below)
- It discovers source directories intelligently
- It respects your .gitignore patterns
- It builds an index ready for queries
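As a rough illustration of that first detection step (not AmanMCP's actual implementation), checking for marker files is enough to classify most projects:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// detectProjectType reports the ecosystem of a project directory by
// looking for well-known marker files.
func detectProjectType(dir string) string {
	markers := []struct{ file, kind string }{
		{"go.mod", "go"},
		{"package.json", "node"},
		{"pyproject.toml", "python"},
	}
	for _, m := range markers {
		if _, err := os.Stat(filepath.Join(dir, m.file)); err == nil {
			return m.kind
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(detectProjectType("."))
}
```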
No configuration file required. No environment variables needed. No documentation necessary for the default case.
The 90% of developers who just want it to work—it should just work.
The 10% who want customization—the options are there, when you need them.
Where I Am
Let me be precise about the state of things.
AmanMCP is in active development. The core is functional—you can index a project, perform hybrid search, integrate with MCP-compatible assistants. The technology choices are validated against 2025-2026 industry research. The architecture is sound.
But I am not finished. There are edge cases to smooth, performance to optimize, patterns to polish. The "It Just Works" promise demands a level of refinement that takes time to achieve.
What I have is a foundation—solid, tested, usable today.
What I'm building toward is a vision: AI-assisted development where context is complete, retrieval is instant, and privacy is unconditional.
The Invitation
I am building AmanMCP because I want to use it.
I want to code in flow state, where the context I need appears without friction. I want privacy that isn't a premium feature or an afterthought. I want tools that compound my thinking, not interrupt it.
This is open source. The code is there to read, to run, to question, to improve. I'm not building a product—I'm building infrastructure for a different way of working.
If you've felt the friction I'm describing—if you've watched tokens accumulate as your AI explores your codebase—try indexing a project. Ask your assistant a question. See if the results match your expectations.
And if they don't, help me make them better.
The code is yours. The memory is local. The future is being built.
AmanMCP
Local-first RAG for AI-assisted development
Zero configuration. Privacy-first. It just works.
Star the repository. Join the conversation. Let's build the memory layer together.

