I Built a Local Codebase Search Tool for AI Queries — Without Uploading a Single Line of Code

#ai #privacy #python #devtools

Every AI coding tool I’ve used has the same problem: to understand your codebase, it needs to see your codebase. Which means sending it somewhere.

For most projects that’s fine. For anything with proprietary logic, unreleased features, or internal API details — it’s a problem you think about every time you paste something.

I wanted a different option. So I built ctx-kit: a local BM25 search engine for your codebase. It finds the files most relevant to your question, formats them as a context block, and lets you paste that into any AI tool. Nothing is sent automatically.

How it works

# Index your project (runs locally, saves to .ctx-index.json)
ctx index .

# Ask a question
ctx ask "how does authentication work?"

Output:

# Context for: how does authentication work?

## src/auth/jwt.py (relevance: 4.21)

python
def create_token(user_id: str) -> str:
    """Create a signed JWT for the given user."""
    ...

def verify_token(token: str) -> dict:
    """Verify and decode a JWT. Raises if invalid."""
    ...


## src/middleware/auth.py (relevance: 3.87)
...

You get back the most relevant files, formatted as a context block. Paste it wherever you want.
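To make the output format concrete, here is a minimal sketch of how ranked results could be rendered into a block like the one above. It is an illustration, not ctx-kit's actual code: `format_context`, the `FENCE_LANG` table, and the `(path, score, source)` tuple shape are all assumptions.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a code-fence language label.
FENCE_LANG = {".py": "python", ".js": "javascript", ".ts": "typescript", ".go": "go"}


def format_context(question: str, results: list[tuple[str, float, str]]) -> str:
    """Render (path, score, source) triples as a pasteable Markdown context block."""
    fence = "`" * 3  # built at runtime so this example nests cleanly inside a fence
    parts = [f"# Context for: {question}", ""]
    for path, score, source in results:
        lang = FENCE_LANG.get(Path(path).suffix, "")  # unknown suffix -> unlabeled fence
        parts += [
            f"## {path} (relevance: {score:.2f})",
            "",
            f"{fence}{lang}",
            source.rstrip(),
            fence,
            "",
        ]
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [("src/auth/jwt.py", 4.21, "def create_token(user_id: str) -> str:\n    ...\n")]
    print(format_context("how does authentication work?", demo))
```

Putting the file path and score in a heading above each snippet tells the AI tool (and you) where every piece came from, which makes it easy to delete a file from the block before pasting.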

The --copy flag

ctx ask "where is rate limiting implemented?" --copy
# Copied context (1,247 words, 4 files) to clipboard.

One command, and the context is in your clipboard, ready to paste.
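The summary line is cheap to compute. A small sketch, assuming words are counted by splitting on whitespace (`copy_summary` is a hypothetical helper, and the clipboard write itself is left out because it is platform-specific and ctx-kit's approach isn't described here):

```python
def copy_summary(context: str, file_count: int) -> str:
    """Build a status line like the one shown above (assumed format)."""
    word_count = len(context.split())  # whitespace-separated tokens
    files = "file" if file_count == 1 else "files"
    # The ',' format spec inserts thousands separators, e.g. 1247 -> "1,247".
    return f"Copied context ({word_count:,} words, {file_count} {files}) to clipboard."


if __name__ == "__main__":
    print(copy_summary("word " * 1247, 4))
```

Seeing the word count before you paste is a quick sanity check on how much code you are about to hand to an AI tool.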

Why BM25 and not embeddings?

I looked at local embedding models first. They work, but they add dependencies: Ollama or a model download, a vector store, more moving parts.

BM25 needs none of that. It’s pure Python, and the only dependency is typer for the CLI. For code search on your own repo, where the language and naming conventions are consistent, BM25 is surprisingly accurate.

I indexed my own projects and compared results. For concrete questions like “where is X implemented,” BM25 put the right file in the top 3 results about 85% of the time. That’s enough to be useful.
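For a sense of how little code BM25 takes, here is a minimal pure-Python sketch of the standard scoring formula applied to source files. It is not ctx-kit's implementation: the `tokenize` and `bm25_rank` names, the identifier-splitting rule, the `k1=1.5` and `b=0.75` defaults, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; splits snake_case and camelCase identifiers."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)  # verifyToken -> verify Token
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", spaced)]


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
    """Return (path, score) pairs for matching docs, best first."""
    tokenized = {path: tokenize(text) for path, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / max(n_docs, 1)

    # Document frequency: in how many files does each term appear?
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = {}
    for path, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF: rare terms weigh more, and the value is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for file length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[path] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    files = {
        "src/auth/jwt.py": "def create_token(user_id): ...\ndef verify_token(token): ...",
        "src/db/models.py": "class User: pass",
        "src/middleware/rate_limit.py": "def rate_limit(request): ...",
    }
    for path, score in bm25_rank("where is rate limiting implemented?", files):
        print(f"{path}  {score:.2f}")
```

Splitting identifiers matters for code: a question about "rate limiting" shares the token `rate` with `rate_limit`, even though the full identifier never appears in the question. Note the limitation too: `limiting` does not match `limit` without stemming, which is one place embeddings would do better.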

The privacy angle

The interesting thing about building local-first is that privacy stops being a trade-off. You’re not giving up quality to stay private. You’re just choosing what you paste into the AI tool, with better information about which files actually matter.

Self-hosted AI is having a moment — airi hit 3,000 GitHub stars in 24 hours this week with a fully local AI companion. The shift toward controlling your own AI context is real.

ctx-kit is a small piece of that: own your context, decide what gets sent.

Install

pip install ctx-kit

Repo: LakshmiSravyaVedantham/ctx-kit

If you’ve ever hesitated before pasting code into an AI because you weren’t sure what else might be in the file — this is the tool I built for that feeling.