The problem with AI-assisted debugging isn't Claude. It's the 10 minutes before you even open Claude — manually hunting through 300 files to figure out which 5 are actually relevant.
My debugging loop used to look like this:
1. Error fires in production
2. Open the repo: 300 files staring back at me
3. Spend 10 minutes manually figuring out which 5 files are actually relevant
4. Copy those files into Claude
5. Claude gives a great answer
6. Repeat tomorrow with a different error
Step 3 was the bottleneck. Not the AI — the archaeology before the AI.
So I built dug — a CLI that does that context-gathering step automatically, and gets better at it every time you fix a bug.
What dug actually does
```bash
dug init            # indexes your codebase once
dug "your error"    # generates a Claude Code prompt with ranked file context
```
Running that second command on the dug codebase itself produces something like this:
```bash
dug "language detection returning wrong languages includes python in typescript project"
```
```text
## Bug Report
**Error:** language detection returning wrong languages includes python in typescript project
**Files to investigate (ranked by relevance):**
- src/dug/__main__.py (modified in relevant recent commit, semantic match 3.36/5)
- src/dug/graph.py (semantic match 2.00/5)
- src/dug/chunker.py (semantic match 1.51/5)
**Recent commits touching these files:**
0560e92: "fix: language detection now respects ignore_paths" (0d ago)
**Suggested starting point:**
Begin at src/dug/__main__.py.
```
__main__.py is exactly where the bug lived — in a function called _detect_languages(). The tool found it from a plain English description with no file names, no line numbers, no stack trace.
Here's how.
The three-layer scoring system
dug doesn't just grep. It combines three independent signals into a single relevance score for every file in your repo.
Layer 1 — Structural scoring
During dug init, dug builds a directed graph of your codebase:
- FILE nodes → SYMBOL nodes (functions, classes)
- FILE nodes → FILE nodes (imports)
- COMMIT nodes → FILE nodes (recently changed)
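To make the three edge types concrete, here is a minimal sketch of how such a graph could be built for Python files. It is an illustration under stated assumptions, not dug's implementation: it parses with the standard-library ast module (dug supports four languages, so its real parser is likely different), it naively assumes module names mirror file paths from the repo root, and CodeGraph, build_graph(), and the edge labels are hypothetical names.

```python
import ast
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CodeGraph:
    # Hypothetical adjacency map: source node -> set of (edge_kind, target_node).
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, src: str, kind: str, dst: str) -> None:
        self.edges[src].add((kind, dst))

    def targets(self, src: str, kind: str) -> set:
        return {dst for k, dst in self.edges.get(src, set()) if k == kind}


def build_graph(sources: dict[str, str], recent_commits: dict[str, list[str]]) -> CodeGraph:
    """sources: file path -> source text; recent_commits: commit sha -> changed paths."""
    graph = CodeGraph()
    # Naive assumption: module "pkg.util" lives at "pkg/util.py" relative to the repo root.
    module_to_path = {p[:-3].replace("/", "."): p for p in sources if p.endswith(".py")}
    for path, text in sources.items():
        for node in ast.walk(ast.parse(text, filename=path)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # FILE -> SYMBOL edge for every function or class definition
                graph.add_edge(f"file:{path}", "defines", f"symbol:{path}::{node.name}")
            elif isinstance(node, ast.Import):
                for alias in node.names:
                    if alias.name in module_to_path:
                        # FILE -> FILE edge for "import b"
                        graph.add_edge(f"file:{path}", "imports", f"file:{module_to_path[alias.name]}")
            elif isinstance(node, ast.ImportFrom) and node.module in module_to_path:
                # FILE -> FILE edge for "from b import x"
                graph.add_edge(f"file:{path}", "imports", f"file:{module_to_path[node.module]}")
    for sha, changed in recent_commits.items():
        for path in changed:
            # COMMIT -> FILE edge for each file a recent commit touched
            graph.add_edge(f"commit:{sha}", "changed", f"file:{path}")
    return graph
```

Keeping edge kinds on the edges themselves lets one traversal answer all three questions the scorer needs: what a file defines, what it imports, and which commits touched it.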
At query time, dug extracts signals from your error text — file names mentioned, symbol names, error type — and walks this graph:
| Signal | Points |
|---|---|
| File directly mentioned in error | +10 |
| File imports the mentioned file | +8 |
| File modified in a commit matching the error | +8 |
| File modified in any recent commit | +2 |
This alone gets you far. A NullPointerException in UserService will score UserService.java highly just from the graph — no ML required.
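The table maps onto a small additive scoring pass. The sketch below is hypothetical: structural_scores() and its input shapes are invented for illustration, a file counts as "directly mentioned" when its name or stem (UserService for UserService.java) appears as a token in the error, "a commit matching the error" is approximated as a commit message sharing a word of four or more letters with the error text, and the +8 and +2 commit signals are assumed not to stack.

```python
import re
from pathlib import PurePosixPath


def words(text: str) -> set[str]:
    # Lowercased words of 4+ letters; used to decide whether a commit message "matches" the error.
    return {w.lower() for w in re.findall(r"[A-Za-z_]{4,}", text)}


def structural_scores(
    error: str,
    files: list[str],
    imports: dict[str, set[str]],
    commits: list[tuple[str, set[str]]],
) -> dict[str, int]:
    """imports: file -> files it imports; commits: (message, changed files) for recent commits."""
    tokens = {t.strip(".") for t in re.findall(r"[\w.]+", error)}
    # Mentioned if the file name (UserService.java) or stem (UserService) appears as a token.
    mentioned = {f for f in files if {PurePosixPath(f).name, PurePosixPath(f).stem} & tokens}
    error_words = words(error)
    scores = {}
    for f in files:
        score = 10 if f in mentioned else 0  # file directly mentioned in error
        if imports.get(f, set()) & mentioned:
            score += 8  # file imports a mentioned file
        touching = [msg for msg, changed in commits if f in changed]
        if any(words(msg) & error_words for msg in touching):
            score += 8  # modified in a commit whose message overlaps the error
        elif touching:
            score += 2  # modified in some other recent commit
        scores[f] = score
    return scores
```

For example, with the error "NullPointerException in UserService", UserService.java collects the mention bonus, and a controller that imports it gets +8 without ever being named.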
Layer 2 — Semantic scoring
Structural scoring fails when the error text doesn't match file names. "checkout is failing with a null value" won't grep-match anything useful.
During dug init, every function body gets embedded using fastembed (ONNX-based, no PyTorch, runs fully local) and stored in LanceDB. At query time, the error text gets embedded with the same model and a cosine similarity search finds the most semantically related functions.
The model is sentence-transformers/all-MiniLM-L6-v2: 384 dimensions and fast on CPU. Semantic hits contribute up to +5 points, scaled by similarity score.
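The semantic layer boils down to a nearest-neighbour search plus a mapping from similarity to points. In this sketch the embedding vectors are passed in directly (in dug they come from fastembed and are stored in LanceDB), and two details are assumptions rather than documented behaviour: negative similarities are clamped to zero before scaling into the 0 to 5 range, and a file's semantic score is the best score among its functions. semantic_scores() is a hypothetical name.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_scores(query_vec, function_vecs, top_k=10, max_points=5.0) -> dict[str, float]:
    """function_vecs: (file_path, function_name, vector) tuples from the index."""
    # Rank every function body by cosine similarity to the embedded error text.
    ranked = sorted(
        ((cosine(query_vec, vec), path) for path, _name, vec in function_vecs),
        reverse=True,
    )[:top_k]
    scores: dict[str, float] = {}
    for sim, path in ranked:
        points = max_points * max(sim, 0.0)  # assumed linear mapping, clamped at zero
        scores[path] = max(scores.get(path, 0.0), points)  # keep the best function per file
    return scores
```

A real index would hand the top-k search to the vector store instead of scoring every function in Python; the point-mapping step stays the same.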
Layer 3 — History boost
This is the part that makes dug different from every other code search tool.
After you fix a bug, you run:
```bash
dug solved
```
It shows what it suggested and asks which files actually had the fix:
```text
Last query: "language detection returning wrong languages"
Suggested files were:
- src/dug/__main__.py
- src/dug/graph.py
Which files actually contained the bug? (comma-separated paths)
> src/dug/__main__.py
```
This gets saved to .dug/history.json:
```json
{
  "bug_input": "language detection returning wrong languages",
  "error_type": "None",
  "resolved_files": ["src/dug/__main__.py"],
  "solve_count": 1,
  "last_solved": "2024-06-16T10:23:00Z"
}
```
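Recording a solved bug is a read-modify-write on that JSON file. A minimal sketch, assuming history.json holds a list of entries shaped like the one above and that solving the same query with the same files again bumps solve_count instead of adding a duplicate; record_solved() is a hypothetical name, not part of dug's API.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_solved(history_path: Path, bug_input: str, error_type: str, resolved_files: list[str]) -> dict:
    entries = json.loads(history_path.read_text()) if history_path.exists() else []
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    for entry in entries:
        # Assumption: same query + same fix counts as a repeat solve of one entry.
        if entry["bug_input"] == bug_input and set(entry["resolved_files"]) == set(resolved_files):
            entry["solve_count"] += 1
            entry["last_solved"] = now
            break
    else:
        # No matching entry: append a fresh record with the same fields as history.json.
        entry = {
            "bug_input": bug_input,
            "error_type": error_type,
            "resolved_files": resolved_files,
            "solve_count": 1,
            "last_solved": now,
        }
        entries.append(entry)
    history_path.parent.mkdir(parents=True, exist_ok=True)
    history_path.write_text(json.dumps(entries, indent=2))
    return entry
```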
Next time a similar error comes in, find_similar_past_bugs() scores it against every entry in history:
```python
score = (
    text_similarity * 0.6
    + signal_overlap * 0.25    # shared files/symbols between queries
    + error_type_match * 0.2   # exact error class gives a bonus
)
```
Text similarity blends character-level SequenceMatcher with word-level Jaccard over tokens split on CamelCase and snake_case boundaries, so NullPointerException and null pointer in config score as similar even though they share no whole words as written.
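The scoring formula and the text-similarity blend can be sketched together. This is a hedged reconstruction, not dug's code: the weights 0.6, 0.25, and 0.2 come from the formula above, but the 50/50 blend of SequenceMatcher and Jaccard, Jaccard overlap as the signal_overlap measure, a binary error_type_match, and the final clamp to 1.0 are assumptions. The clamp matters because the three weights sum to 1.05, so without it the +6 history boost could exceed its stated maximum. The function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def split_identifiers(text: str) -> list[str]:
    # Break CamelCase ("NullPointerException") and snake_case ("null_pointer") into lowercase words.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_similarity(a: str, b: str) -> float:
    char_level = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    word_level = jaccard(set(split_identifiers(a)), set(split_identifiers(b)))
    return 0.5 * char_level + 0.5 * word_level  # assumed 50/50 blend


def past_bug_score(
    query: str, past: str,
    query_signals: set, past_signals: set,
    query_error: Optional[str], past_error: Optional[str],
) -> float:
    score = (
        text_similarity(query, past) * 0.6
        + jaccard(query_signals, past_signals) * 0.25                      # shared files/symbols
        + (0.2 if query_error and query_error == past_error else 0.0)     # exact error class
    )
    return min(score, 1.0)  # assumed clamp so downstream boosts stay within range
```

Splitting identifiers first is what lets NullPointerException contribute the words null and pointer to the Jaccard overlap with null pointer in config.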
Files from matching past bugs get up to +6 points, scaled by similarity:
```python
boost = 6.0 * similarity_score
# 0.9 similar → +5.4 points
# 0.5 similar → +3.0 points
```
On top of that, there's an error pattern boost — if UserService.java appeared in 8 of 10 past NullPointerException fixes, it gets an extra +0–3 points from that pattern alone, independent of text similarity.
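Both boosts are simple arithmetic once similarity scores exist. The sketch below is hypothetical: it keeps only the best-matching past bug per file, and it models the pattern boost as 3 points times the fraction of past fixes with the same error type that touched the file (so 8 of 10 gives +2.4). That is one plausible reading of "+0–3 points", not dug's documented formula; history_boost() and error_pattern_boost() are invented names.

```python
from collections import Counter


def history_boost(similar_bugs: list[tuple[float, list[str]]]) -> dict[str, float]:
    """similar_bugs: (similarity in [0, 1], resolved_files) for each matching past bug."""
    boosts: dict[str, float] = {}
    for similarity, files in similar_bugs:
        for path in files:
            # boost = 6.0 * similarity, keeping the strongest match per file
            boosts[path] = max(boosts.get(path, 0.0), 6.0 * similarity)
    return boosts


def error_pattern_boost(error_type: str, history: list[dict], max_points: float = 3.0) -> dict[str, float]:
    """history: entries shaped like .dug/history.json records."""
    fixes = [e for e in history if e["error_type"] == error_type]
    if not fixes:
        return {}
    # Count how many past fixes of this error type touched each file.
    counts = Counter(path for e in fixes for path in set(e["resolved_files"]))
    return {path: max_points * n / len(fixes) for path, n in counts.items()}
```

Because the pattern boost ignores the wording of the query entirely, it still fires when a recurring error class shows up with a message that looks nothing like previous ones.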
Why "learning" is the right word
Fresh install, dug is good. After 20 bugs marked solved, dug is better for your specific codebase. After 100 bugs, it knows:
- ImportError almost always means __main__.py in this project
- TypeError undefined almost always means api/client.ts
- The auth bug that keeps coming back always starts in src/auth/
It's not ML training in the PyTorch sense. It's weighted frequency built from your team's actual debugging history. Stateless tools like grep can't do this — they have no memory. dug does.
Zero LLM calls
The entire pipeline (graph traversal, vector search, history lookup, scoring, prompt assembly) runs locally. No API key. No network requests. No waiting on a remote model.
The LLM call happens after, when you paste the output into Claude Code. dug's job is purely context assembly.
This means it works offline, costs nothing to run, and produces deterministic output you can inspect and debug.
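Because the last step is plain string formatting over already-ranked data, identical inputs always yield an identical prompt. A minimal sketch of that step, assuming a hypothetical render_prompt() that receives (path, reasons) pairs already sorted by score; the layout mirrors the sample output earlier in the post, but the exact text dug emits may differ.

```python
def render_prompt(error: str, ranked: list[tuple[str, list[str]]], commits: list[str]) -> str:
    """ranked: (path, reasons) pairs, best first; commits: preformatted commit lines."""
    lines = ["## Bug Report", "", f"**Error:** {error}", "", "**Files to investigate (ranked by relevance):**"]
    for path, reasons in ranked:
        suffix = f" ({', '.join(reasons)})" if reasons else ""
        lines.append(f"- {path}{suffix}")
    if commits:
        lines += ["", "**Recent commits touching these files:**", *commits]
    if ranked:
        # The top-ranked file becomes the suggested starting point.
        lines += ["", "**Suggested starting point:**", f"Begin at {ranked[0][0]}."]
    return "\n".join(lines)
```

No randomness or model call is involved, so any surprising ranking can be traced back to the individual scores that produced it.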
Install
```bash
# macOS
brew tap ratishjain12/dug
brew trust ratishjain12/dug
brew install dug-cli

# Python users
pipx install dug-cli

# Linux / macOS one-liner
curl -fsSL https://raw.githubusercontent.com/ratishjain12/dug/main/install.sh | sh
```
Supports Python, TypeScript, JavaScript, Java.
GitHub: github.com/ratishjain12/dug
What's next
Sentry / error tracker integration — dug sentry <issue-url> fetches the stack trace directly. Eliminates copy-paste entirely.
MCP server — expose dug as an MCP tool so Claude Code can call it mid-session directly.
Call graph edges — use jedi to add SYMBOL→SYMBOL edges for Python. Callers of the broken function get scored too.
VSCode extension — highlight error text, right-click, "Generate dug prompt."
If you try it, run dug solved after your first fix — that's what starts the learning loop. The first few times it's just bookkeeping. By week two it's noticeably better at predicting where your bugs live.
Questions about the architecture or scoring? Happy to go deeper in the comments.