Swapnanil Saha

Posted on May 26 • Originally published at swapnanilsaha.com

Vectr — Code Intelligence AI Tool

You log off for the day after two hours of research. You know the entry point is EvaluateSegments in targeting/segment/evaluator.go. You know the nil visitor_id case is unhandled. You know bidder/auction.go calls this function and can't have its interface changed.

Next morning, Claude Code knows none of that. It starts fresh. It greps, reads files, consumes 8,000 tokens rediscovering what you already found. Every session is day one.

This is the actual friction in AI-assisted development — not the quality of code generation, but the complete absence of working memory across session boundaries.

The problem with how AI assistants use context

On a codebase with 40,000 files, the AI runs rg -l "authenticate", gets 200 results, reads 8 complete files — 12,000 tokens gone for one query. And the next session, it starts over from zero: no memory of what it found, no record of what's still missing.

A 200,000-token context window sounds vast, but a 40,000-file codebase is far larger. Assistants compensate by running grep-style searches, finding matching files, then reading entire files to locate the relevant function. Within a session, experienced users can manage this. The real problem is across sessions: every conversation starts empty, and research done Monday is redone Thursday.

Humans solve this differently. A developer who worked on a feature last week doesn't remember every line — but they remember that targeting code lives in targeting/, that segment evaluation has an edge case around nil visitor IDs, and that the auction pipeline calls EvaluateSegments. They remember at different levels of fidelity, and they can re-read the details in seconds when needed. They can afford to forget, because retrieval is fast.

What Vectr does

Vectr is a local codebase indexer that gives an AI assistant the same layered recall capability. It provides three layers of knowledge about the code, plus a memory system for working state.

Layer 1: Codebase map. At startup, Vectr makes one LLM call over the directory structure and README to build a ~300-token plain-English passport (this step runs only when an LLM API key is configured). It captures module purposes, tech stack, entry points, and domain vocabulary. Every session, the AI gets this for free via vectr_map, with no file reading required.

vectr_map() →
"Go DSP ad server. Main modules: targeting/ (audience matching),
bidder/ (bid logic), tracker/ (event recording).
Entry: bidder/pipeline.go:RunBidPipeline
Domain terms: segment, visitor_id, bid_request, floor_price"
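
A minimal sketch of how the input to that single LLM call could be assembled, assuming the prompt is just a shallow directory listing plus the start of the README. The function name build_passport_prompt, the depth limit, and the README cutoff are hypothetical and not taken from Vectr's source; the LLM call itself is left out.

```python
from pathlib import Path


def build_passport_prompt(root: str, max_depth: int = 2, readme_chars: int = 2000) -> str:
    """Assemble a compact prompt from the directory tree and README (hypothetical helper)."""
    base = Path(root)
    lines = []
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        # Skip hidden entries and anything deeper than max_depth.
        if any(part.startswith(".") for part in rel.parts) or len(rel.parts) > max_depth:
            continue
        if path.is_dir():
            lines.append(f"{rel.as_posix()}/")
    readme = base / "README.md"
    readme_text = readme.read_text(encoding="utf-8")[:readme_chars] if readme.exists() else ""
    return (
        "Summarize this repository in about 300 tokens: module purposes, "
        "tech stack, entry points, domain vocabulary.\n\n"
        "Directories:\n" + "\n".join(lines) + "\n\nREADME:\n" + readme_text
    )
```

Only directory names and a README excerpt go into the prompt, which is what keeps the call cheap enough to run once and cache.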

Layer 2: Symbol graph. Vectr uses tree-sitter to extract every function, class, and method into a persistent SQLite-backed graph with call relationships. vectr_locate finds where a symbol is defined — file, line number, kind — without returning any code content. vectr_trace follows the call graph in either direction.

vectr_locate("EvaluateSegments") →
[function] EvaluateSegments  targeting/segment/evaluator.go:45

vectr_trace("EvaluateSegments", direction="callers") →
Called by (2):
  RunBidPipeline  in bidder/pipeline.go:88
  RequestBid      in bidder/auction.go:134
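
To make the shape of that graph concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names, and the locate and trace_callers helpers, are assumptions for illustration, not Vectr's actual schema; the rows mirror the example output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbols (name TEXT, kind TEXT, file TEXT, line INTEGER);
CREATE TABLE calls (caller TEXT, callee TEXT, file TEXT, line INTEGER);
""")
conn.executemany("INSERT INTO symbols VALUES (?, ?, ?, ?)", [
    ("EvaluateSegments", "function", "targeting/segment/evaluator.go", 45),
])
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", [
    ("RunBidPipeline", "EvaluateSegments", "bidder/pipeline.go", 88),
    ("RequestBid", "EvaluateSegments", "bidder/auction.go", 134),
])


def locate(name):
    """Return (kind, file, line) for each definition; no source text is read."""
    return conn.execute(
        "SELECT kind, file, line FROM symbols WHERE name = ?", (name,)
    ).fetchall()


def trace_callers(name):
    """Return (caller, file, line) for every recorded call site of `name`."""
    return conn.execute(
        "SELECT caller, file, line FROM calls WHERE callee = ? ORDER BY line", (name,)
    ).fetchall()
```

Both lookups return only names and positions, which is why answering "where is this defined" or "who calls this" costs a handful of tokens instead of a file read.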

Layer 3: Content search. AST-aware chunks, split at function and class boundaries and never mid-logic, are embedded with Snowflake/snowflake-arctic-embed-m-v1.5 (runs locally, no API key, one-time ~440MB download). Search is an adaptive hybrid of vector similarity and BM25 keyword matching, with weights tuned to a fingerprint of the codebase: small repos lean on BM25, large ones on semantic similarity, and statically typed monorepos try graph traversal first. Set VECTR_EMBED_MODEL=<hf-model-id> to use any sentence-transformers-compatible model instead.
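
As an illustration of splitting at function and class boundaries, the sketch below uses Python's standard ast module as a stand-in for tree-sitter, so it only handles Python source. The chunk_by_definition name and the returned dictionary shape are hypothetical.

```python
import ast


def chunk_by_definition(source: str):
    """Split source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno and end_lineno are 1-based and inclusive (Python 3.8+).
            text = "\n".join(lines[node.lineno - 1:node.end_lineno])
            chunks.append({
                "symbol": node.name,
                "start": node.lineno,
                "end": node.end_lineno,
                "text": text,
            })
    return chunks
```

Because each chunk carries its symbol name and line range, a search hit can be reported as a file, a span, and a symbol, as in the example output that follows.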

vectr_search("nil visitor_id handling segment evaluation") →
[1] targeting/segment/evaluator.go  lines 45-89  score 0.934
    symbol: EvaluateSegments
    ...
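
The weighting idea can be sketched as a simple linear blend of two normalized scores. Everything specific here is an assumption: the file-count thresholds, the weights, and the function names are invented for illustration, and the inputs are taken to be precomputed vector and BM25 scores already scaled to the range 0 to 1.

```python
def pick_weights(file_count: int):
    """Return (vector_weight, bm25_weight); thresholds are illustrative only."""
    if file_count < 500:
        return 0.3, 0.7   # small repo: lean on keyword matching
    if file_count < 10_000:
        return 0.5, 0.5
    return 0.7, 0.3       # large repo: lean on semantic similarity


def hybrid_rank(vector_scores: dict, bm25_scores: dict, file_count: int):
    """Blend per-chunk scores and return chunk ids, best first."""
    w_vec, w_bm25 = pick_weights(file_count)
    ids = set(vector_scores) | set(bm25_scores)
    blended = {
        cid: w_vec * vector_scores.get(cid, 0.0) + w_bm25 * bm25_scores.get(cid, 0.0)
        for cid in ids
    }
    # Sort by descending score, breaking ties by id for stable output.
    return sorted(blended, key=lambda cid: (-blended[cid], cid))
```

With the same two candidate chunks, a small repository ranks the keyword match first and a large one ranks the semantic match first, which is the behavior the fingerprint-based weighting is meant to produce.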

The part that's actually new: working memory

The layer that makes Vectr different from every other code search tool is the bidirectional protocol between the AI and the memory store.

vectr_remember lets the AI offload a working note:

vectr_remember(
  "Implementing segment targeting. Entry: EvaluateSegments() in evaluator.go:45.
   Need to add nil guard for visitor_id before line 61.
   bidder/auction.go calls this — cannot change its interface.
   Missing: integration test for multi-segment visitor with expired segments.",
  tags=["segment-targeting", "wip"],
  priority="high"
)
→ "Stored note #4. You can safely drop related code chunks from context."

The AI can now discard the code chunks from its context window. Vectr has them and will return them in under 50ms.

vectr_evict_hint makes this explicit. When the AI has accumulated a session's worth of retrieved content, Vectr proactively tells it what to drop:

vectr_evict_hint() →
"Vectr has 6 chunks (~3,840 tokens) indexed and instantly retrievable.
You can safely drop these from your context window:
  targeting/segment/evaluator.go  [lines 40-110 (EvaluateSegments)]
  bidder/auction.go  [lines 88-134 (RequestBid)]
Recall latency: <50ms. Nothing will be lost."
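
One plausible way to produce such a hint, sketched under assumptions: the server keeps a list of the chunks it has returned this session and estimates tokens at roughly four characters each. The Chunk type, the four-characters-per-token heuristic, and the message wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    file: str
    start: int
    end: int
    symbol: str
    text: str


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token, never less than one."""
    return max(1, len(text) // 4)


def evict_hint(served: list) -> str:
    """Summarize chunks that are indexed and therefore safe to drop."""
    total = sum(estimate_tokens(c.text) for c in served)
    lines = [
        f"Vectr has {len(served)} chunks (~{total:,} tokens) indexed and retrievable.",
        "You can safely drop these from your context window:",
    ]
    for c in served:
        lines.append(f"  {c.file}  [lines {c.start}-{c.end} ({c.symbol})]")
    return "\n".join(lines)
```

The hint is advisory: the assistant still decides what to keep, but it now knows which spans can be fetched again on demand.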

Next morning:

vectr_recall("segment targeting") →
[HIGH] [segment-targeting, wip] (14h ago)
  Implementing segment targeting. Entry: EvaluateSegments() in evaluator.go:45.
  Need to add nil guard for visitor_id before line 61.
  bidder/auction.go calls this — cannot change its interface.
  Missing: integration test for multi-segment visitor with expired segments.

Three MCP calls (vectr_map, vectr_recall, vectr_locate), roughly five seconds, and the AI is fully context-loaded without re-reading any code.
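
The remember and recall pair can be pictured as a small persistent note table. The sketch below uses sqlite3 with plain substring matching; the schema, the function names, and the matching rule are assumptions, and the real tool may well rank notes differently.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")  # a file path would make notes survive restarts
db.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT, tags TEXT, "
    "priority TEXT, created REAL)"
)


def remember(body, tags=(), priority="normal"):
    """Store a working note and return its id."""
    cur = db.execute(
        "INSERT INTO notes (body, tags, priority, created) VALUES (?, ?, ?, ?)",
        (body, ",".join(tags), priority, time.time()),
    )
    db.commit()
    return cur.lastrowid


def recall(query=""):
    """Return notes whose body or tags contain every query word, high priority first."""
    rows = db.execute("SELECT id, body, tags, priority FROM notes").fetchall()
    words = query.lower().split()
    hits = [r for r in rows if all(w in (r[1] + " " + r[2]).lower() for w in words)]
    # High-priority notes sort first; within a priority, newest note first.
    return sorted(hits, key=lambda r: (r[3] != "high", -r[0]))
```

The note body is stored and returned verbatim, so what the assistant wrote at the end of one session is exactly what it reads at the start of the next.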

How to run it

Two install options depending on your environment.

Option A — pip (recommended for individual developers):

pip install git+https://github.com/swapnanil/vectr

cd /path/to/your/project
vectr start

Option B — Docker (for servers and CI pipelines):

git clone https://github.com/swapnanil/vectr
cd vectr
docker-compose up api

On first run, Vectr downloads the embedding model (~440MB), indexes the workspace, builds the symbol graph, and writes MCP configuration files for Cursor and Claude Code. No configuration files to write, no environment variables required for local-only use.

Other CLI commands:

# Stop and restart on a different workspace
vectr restart --path /path/to/other/project

# Write CLAUDE.md + .mcp.json without starting the server
vectr init

# Stop the server
vectr stop

# Search from the terminal
vectr search "JWT token validation"

If you set ANTHROPIC_API_KEY (or OPENAI_API_KEY + LLM_MODEL), Vectr also builds the codebase passport on startup — one LLM call, ~$0.005, cached permanently.

Once running, Claude Code and Cursor automatically use the ten MCP tools (vectr_map, vectr_locate, vectr_trace, vectr_search, vectr_remember, vectr_recall, vectr_evict_hint, vectr_snapshot, vectr_snapshot_list, vectr_status) without any manual configuration. The MCP server runs at localhost:8765/mcp — any compatible client connects with two lines of JSON config.
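
For clients that are not configured automatically, the entry might look like the output of the sketch below. The key layout (an mcpServers map with a type and url per server) is an assumption based on the shape commonly used by MCP clients, and the server name "vectr" is arbitrary; check your client's documentation for the exact schema it expects.

```python
import json


def mcp_config(url: str = "http://localhost:8765/mcp") -> str:
    """Render a client config entry; the key layout is an assumption."""
    config = {"mcpServers": {"vectr": {"type": "http", "url": url}}}
    return json.dumps(config, indent=2)


print(mcp_config())
```

For Cursor and Claude Code this step should not be needed, since vectr start writes their configuration files on first run.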

Benchmark results: Camel Run 2

To measure the cross-session memory benefit, the benchmark uses a two-phase design: Phase 1 explores the codebase and stores notes with vectr_remember; Phase 2 opens a cold session, calls vectr_recall(), and implements. Vanilla Phase 2 re-reads from scratch.

The Camel codebase is 5,856 files of enterprise Java — the kind of thing where the model has no meaningful training coverage.

Task	Vanilla Phase 2	Vectr Phase 2	Cost Δ	Tool calls Δ	Output
`custom_component`	$0.56 · 134s · 51 tools	$0.36 · 195s · 11 tools	−35%	−78%	0 bytes (failure) vs 9,398 bytes (5 files)
`route_policy`	$1.15 · 430s · 59 tools	$0.35 · 177s · 16 tools	−70%	−73%	both 280-line impl
`type_converter`	$0.48 · 187s · 25 tools	$0.20 · 86s · 11 tools	−57%	−56%	both working
Totals (Camel)	$2.19 · 751s · 135 tools	$0.92 · 458s · 38 tools	−58%	−72%	−40% input tokens

The custom_component result shows the failure mode most clearly: vanilla ran out of context budget navigating the unfamiliar Java package hierarchy and produced nothing. Vectr's Phase 2 started with structured notes from Phase 1 — ~200 tokens replacing hundreds of re-discovery tool calls — and delivered a complete 5-file implementation.

route_policy shows the efficiency case where both sides succeeded: 3× cheaper, 2.4× faster.
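
The percentages and ratios quoted here follow directly from the table. A short check, using only the figures reported above:

```python
# (cost in USD, wall time in seconds, tool calls) per task, copied from the table.
vanilla = {"custom_component": (0.56, 134, 51),
           "route_policy": (1.15, 430, 59),
           "type_converter": (0.48, 187, 25)}
vectr = {"custom_component": (0.36, 195, 11),
         "route_policy": (0.35, 177, 16),
         "type_converter": (0.20, 86, 11)}


def pct_change(before: float, after: float) -> int:
    """Percentage change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


rp_v, rp_x = vanilla["route_policy"], vectr["route_policy"]
cost_ratio = rp_v[0] / rp_x[0]   # how many times cheaper
time_ratio = rp_v[1] / rp_x[1]   # how many times faster

total_tools_vanilla = sum(v[2] for v in vanilla.values())
total_tools_vectr = sum(v[2] for v in vectr.values())
```

Recomputing from the rounded dollar figures can land one point off the table (custom_component comes out at −36% rather than −35%), presumably because the table was computed from unrounded costs.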

Vectr helps in proportion to how much re-discovery work Phase 2 would otherwise do. Single-session tasks on well-known codebases see minimal benefit. Large unfamiliar codebases and cross-session continuation tasks see the most.

Django results were mixed: complex ORM internals showed −24% tokens, −60% cost; well-known APIs where the model already has training coverage showed no benefit. The mechanism is the same in both cases — Vectr just doesn't help where re-discovery cost is already low.

A session with the full stack

Morning — session start (3 calls, ~5 seconds):

vectr_map()                                          → structural overview (247 tokens)
vectr_recall()                                       → yesterday's notes, verbatim
vectr_locate("EvaluateSegments")                     → file:line, no code read

During the session:

vectr_search("visitor_id nil handling")              → 3 chunks, 580 tokens
vectr_trace("EvaluateSegments", direction="callers") → 2 callers identified

End of session:

vectr_remember("Segment targeting done...")          → note stored
vectr_evict_hint()                                   → lists ~3,840 tokens of chunks safe to drop
vectr_snapshot("segment-targeting-day1")             → full session saved

Full context in three calls, five seconds. No file reading on reconnect.

What's next

Vectr is open source at github.com/swapnanil/vectr. The current build supports Python, JavaScript, TypeScript, Go, Rust, and Java for AST chunking and symbol extraction. Planned: adaptive retrieval strategy selection based on codebase fingerprint (Java monorepos benefit from graph traversal; dynamic Python codebases respond better to semantic search), and LLM-written symbol descriptions, generated lazily on first access.

If you work on a large codebase and your AI assistant spends the first five minutes of every session re-reading the same files, try Vectr. The full tool page is at swapnanilsaha.com/tools/vectr/.