Esan Mohammad
I was tired of re-explaining my project to Claude every session

Anvil

I'd start a new Claude Code session and spend the first ten minutes pasting files.

"Here's the API gateway. Here's the user service. The gateway talks to users over HTTP. Users publish to this Kafka topic. The payments service consumes it. The shared types live in this package. Here's the schema."

Next day. New session. Same ten minutes.

Some days I'd realize halfway through that I'd already burned half my context window on orientation and hadn't gotten to the actual problem yet. That was the moment I knew this was broken.

Our work project spans five repos. TypeScript, Go, Python, Kafka, shared Postgres. Real production stuff. And every AI tool I tried treated my project like a blank slate every time I opened it. The better the model, the more it noticed the gaps, the more it guessed. And when it guessed wrong, it guessed wrong confidently.

I got tired of it. I spent three months of weekends building the thing I wished existed. It ended up being two things.

The first thing: a pipeline with a built-in knowledge graph

The core idea is stupid simple. Parse the project once. Build a compact architectural summary. Inject that summary into every agent call.

I call it Anvil Pipeline. Here's what it actually does:

It walks every repo in your project. Uses tree-sitter to extract functions, classes, interfaces, types, imports. Builds a graph where nodes are symbols and edges are the relationships between them. Then it looks across repos and auto-detects the connections between them — Kafka topic producers and consumers, HTTP routes and their callers, shared TypeScript interfaces, protobuf definitions, Docker Compose service links, environment variables that reference other services. Fourteen detection strategies in total.
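To make the edge-detection idea concrete, here is a minimal sketch of one strategy in Python. This is not Anvil's actual code; the symbol records and field names are hypothetical stand-ins for what a tree-sitter pass might emit. The idea: if one repo's symbol produces to a Kafka topic and another repo's symbol consumes from the same topic, draw a cross-repo edge.

```python
from collections import defaultdict

# Hypothetical symbol records, as a parsing pass might emit them.
# Each entry: (repo, symbol, kind, detail), where detail carries the
# Kafka topic a function produces to or consumes from.
symbols = [
    ("user-service", "publishUserCreated", "kafka_producer", "user.created"),
    ("payments-service", "onUserCreated", "kafka_consumer", "user.created"),
    ("api-gateway", "createUser", "http_route", "POST /users"),
]

def detect_kafka_edges(symbols):
    """One detection strategy: link producers and consumers of the
    same Kafka topic, even when they live in different repos."""
    producers = defaultdict(list)
    consumers = defaultdict(list)
    for repo, name, kind, detail in symbols:
        if kind == "kafka_producer":
            producers[detail].append((repo, name))
        elif kind == "kafka_consumer":
            consumers[detail].append((repo, name))
    edges = []
    for topic, srcs in producers.items():
        for src in srcs:
            for dst in consumers.get(topic, []):
                edges.append({"from": src, "to": dst, "via": f"kafka:{topic}"})
    return edges

print(detect_kafka_edges(symbols))
```

Each of the fourteen strategies would be a variation on this shape: extract a shared key (topic name, HTTP route, env var, type name), then join symbols across repos on that key.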

The output is a GRAPH_REPORT.md file per repo. It's designed to be low-token — a compact architectural overview, not a dump of code. That file gets injected into every agent prompt.

The first time I ran it and started a Claude session, the agent just... knew. Knew which services I had. Knew Kafka sat between them. Knew which types were shared. I didn't paste anything. By a conservative estimate, I saved 20,000 tokens in that first session alone, probably more.

That alone would have been enough. But while I was there I built the pipeline part.

Anvil Pipeline takes a feature description and runs it through eight stages: clarify the intent, produce a high-level plan, break it into per-repo requirements, write technical specs, generate task lists, build the code, validate with build and test commands, ship as pull requests. Each stage writes artifacts to disk. Each stage is resumable.
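The artifact-per-stage design can be sketched in a few lines. This is an illustrative simplification, not Anvil's implementation: each stage writes a JSON artifact to disk, and a stage whose artifact already exists is skipped, which is what makes a rerun resume instead of restart.

```python
import json
from pathlib import Path

STAGES = ["clarify", "plan", "requirements", "specs",
          "tasks", "build", "validate", "ship"]

def run_pipeline(workdir, handlers):
    """Run each stage once. A stage whose artifact already exists on
    disk is skipped, so a crashed or interrupted run resumes exactly
    where it stopped."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    for stage in STAGES:
        artifact = workdir / f"{stage}.json"
        if artifact.exists():           # checkpoint hit: already done
            continue
        result = handlers[stage]()      # do the (expensive) work
        artifact.write_text(json.dumps(result))  # checkpoint it
```

Kill the process after stage four and run it again: stages one through four are skipped, stage five runs next.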

The resumability part matters. If you've run long agent sessions you know: Claude's auth expires. Your budget hits its limit. Your laptop goes to sleep. The dashboard crashes. Any of these kills a naive agent loop and you lose everything.

Anvil checkpoints at every stage. When auth expires, the pipeline pauses, sends a browser notification, auto-opens the re-login page, and resumes from the same spot once you're authenticated. I built this part after losing a 40-minute run to a five-second auth check. The kind of thing where you walk away to grab water and come back to nothing.

The second thing: a plug-and-play MCP server

The other half of my frustration was smaller but more constant. AI tools making up function names. Imports that don't exist. Helpers I never wrote.

The fix here is also stupid simple: give the model actual tools to look up the code, instead of asking it to recall from training.

Code Search MCP is a standalone MCP server. Any MCP client picks it up — Claude Code, Claude Desktop, Cursor, whatever you're using next month. One line to install in Claude Code:
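Claude Code registers MCP servers with its `claude mcp add` command. Assuming the server is published as an npm package (the package name below is a placeholder; check the repo's README for the real one), the one-liner would look roughly like:

```shell
# Register the server with Claude Code (package name is a placeholder)
claude mcp add code-search -- npx -y code-search-mcp
```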

That's it. Claude now has eleven new tools, including the ones I actually use:

- search_code — hybrid search: vector plus BM25 plus graph expansion plus cross-encoder reranking
- find_callers — everywhere a function is called across all your repos
- find_dependencies — what a function depends on
- impact_analysis — what breaks if you change this file
impact_analysis has turned out to be the one I use most. I did not expect that. Before, I'd ask Claude "what breaks if I remove this" and get a plausible-sounding guess. Now I get a real answer, because the tool walks the graph.
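"Walks the graph" here is just reverse dependency traversal. A minimal sketch, with hypothetical file names and edges; the real tool operates on the symbol graph, but the mechanics are the same: invert the depends-on edges, then breadth-first search from the changed node.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: (dependent, dependency) pairs,
# e.g. checkout.ts imports priceUtils.ts.
deps = [
    ("checkout.ts", "priceUtils.ts"),
    ("cart.ts", "priceUtils.ts"),
    ("invoice.ts", "checkout.ts"),
]

def impact_analysis(target, deps):
    """Everything that transitively depends on `target` is
    potentially broken by a change to it."""
    dependents = defaultdict(set)
    for src, dst in deps:
        dependents[dst].add(src)        # invert the edges
    impacted, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for d in dependents[node]:
            if d not in impacted:
                impacted.add(d)
                queue.append(d)
    return impacted

print(impact_analysis("priceUtils.ts", deps))
```

Note that invoice.ts shows up even though it never touches priceUtils.ts directly; the transitive closure is the part a model guessing from memory tends to miss.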

The part I spent the most time on is incremental indexing. Codebases change constantly. Re-embedding the whole thing on every commit is expensive and slow. So I built four layers of skip logic:

- Git SHA at the repo level. If the repo's HEAD hasn't moved, skip entirely.
- Git diff at the file level. Only files that changed.
- SHA-256 at the content level. Files that changed but ended up with the same content get skipped too.
- Embedding diff at the chunk level. Only new chunks are embedded. Existing embeddings in LanceDB are preserved.
A typical "I changed 2 files" reindex embeds about 5 new chunks instead of redoing the whole repo.
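The content and chunk layers (three and four) can be sketched together. This is a simplification with a naive blank-line chunker; the real chunking is syntax-aware, but the skip logic is the same: hash the file, bail if unchanged, otherwise embed only chunks whose hash the index hasn't seen.

```python
import hashlib

def content_hash(text):
    return hashlib.sha256(text.encode()).hexdigest()

def plan_reindex(files, index):
    """Decide what actually needs re-embedding.
    `files` maps path -> current text; `index` maps path ->
    {"hash": file_hash, "chunks": {chunk_hash: embedding}}."""
    to_embed = []
    for path, text in files.items():
        entry = index.get(path)
        if entry and entry["hash"] == content_hash(text):
            continue                    # layer 3: same bytes, skip file
        known = entry["chunks"].keys() if entry else set()
        for chunk in text.split("\n\n"):  # naive chunker for the sketch
            if content_hash(chunk) not in known:
                to_embed.append((path, chunk))  # layer 4: new chunks only
    return to_embed
```

Edit one chunk in a two-chunk file and only that chunk comes back for embedding; the untouched chunk's existing embedding stays put.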

Embeddings are provider-agnostic. Ollama is the default, which means it runs free and local out of the box. If you want better quality you can plug in Voyage, Mistral/Codestral, OpenAI, Gemini, or any OpenAI-compatible endpoint. I did not want to force anyone into a specific cloud.
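Provider-agnostic usually just means the indexer depends on one small interface. A sketch of that seam (the class and function names here are illustrative, not Anvil's API): anything that can turn a list of strings into vectors plugs in, whether it talks to local Ollama or a hosted OpenAI-compatible endpoint.

```python
from typing import Protocol

class Embedder(Protocol):
    """The only interface the indexing code depends on."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...

class FakeEmbedder:
    """Stand-in provider for the sketch. A real implementation would
    POST to local Ollama or to any OpenAI-compatible /embeddings
    endpoint, which is where the provider choice lives."""
    def embed(self, texts):
        return [[float(len(t))] for t in texts]

def build_index(embedder: Embedder, chunks: list[str]) -> dict:
    """Because this only sees Embedder, swapping Ollama for Voyage,
    OpenAI, or Gemini is a constructor change, not an indexer change."""
    return dict(zip(chunks, embedder.embed(chunks)))

print(build_index(FakeEmbedder(), ["ab", "c"]))
```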

Why both live in the same repo

I almost split them. They target different users. Pipeline is for teams doing feature work across repos. Code Search MCP is for any developer using any AI assistant. Different install stories, different mental models.

But they share the hard parts. The tree-sitter parsing, the cross-repo edge detection, the embedding pipeline. Splitting them meant maintaining two copies of all that.

So they ship together, under the Anvil umbrella. Use either. Use both. Use neither — the code is MIT, so rip out the parts you want.

What I care about, technically

Everything runs on your machine. Dashboard, pipeline, knowledge graph, indexing — all local.

No telemetry. No analytics. No crash reporters. No phone-home. I checked twice.

No account system. Nothing to sign up for.

Your code only goes to the LLM provider you explicitly select. Anvil never proxies or stores it.

MIT licensed. Every line auditable.

I wanted AI tooling that didn't compromise on any of this. There is a lot of AI dev tooling out there now, but most of it sends your code through someone's SaaS. I didn't want that for my own work, and I figured other people might not want it either.

What's still weak

The 8-stage pipeline is opinionated. It works for how I work. If your workflow doesn't fit "describe, plan, code, ship" it'll feel stiff.

The cross-repo detection strategies cover my stack. GraphQL federation, event sourcing, and some message queue patterns aren't handled yet. I'm collecting edge cases.

The dashboard isn't pretty. I spent the time on correctness.

If you try it

The repo is at https://github.com/esanmohammad/Anvil

Code Search MCP is one line to install. Pipeline takes a config file — anvil init walks you through it.

I'd love to hear what breaks. Especially the cross-repo detection, and which of the 11 MCP tools you actually use in practice. I suspect two or three should be cut and I don't know which ones yet.

This is my first time shipping a side project in public. Feedback welcome, roasts too.

If you found any of this useful, the repo is on GitHub and I'd love a star — it helps other people find it.
