Every time I asked an AI to fix a bug, it rewrote my entire file.
I'd ask: "fix the empty list check on line 47."
The AI would return: 300 lines of "improved" code.
Half my carefully tuned logic — gone.
This happened to me one too many times. So I built something different.
## The Problem with AI Code Tools
Most AI coding assistants work like this:
- You send the whole file
- AI rewrites the whole file
- You diff 300 lines to find the 2 that changed
- Something else broke that wasn't broken before
Copilot, Cursor, and friends are great — but they all have this problem when you're working on existing code you care about.
## The Solution: Surgical SEARCH/REPLACE Patches
I built AI Code Sherlock around one core idea:
**The AI should only touch the exact block that needs changing.**
Every response from the AI looks like this:
```
[SEARCH_BLOCK]
result = data[0]["value"]
[REPLACE_BLOCK]
if not data:
    return None
result = data[0]["value"]
[END_PATCH]
```
The engine finds this exact string in your file, validates it appears exactly once, creates a backup, and replaces only that block. Nothing else is touched.
If the search block isn't found — or matches more than once — the patch is rejected. No silent corruption.
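The core of that validation fits in a few lines. Here is a minimal sketch (illustrative only, not the actual engine; the real one also handles backups and syntax checks):

```python
def apply_patch(source: str, search: str, replace: str) -> str:
    """Replace `search` with `replace` only if it occurs exactly once."""
    count = source.count(search)
    if count == 0:
        raise ValueError("patch rejected: search block not found")
    if count > 1:
        raise ValueError(f"patch rejected: search block matches {count} times")
    # Exactly one match: replace only that block, leave everything else alone
    return source.replace(search, replace, 1)
```

The uniqueness check is what makes the patch surgical: an ambiguous match is a rejection, never a guess.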
## The Auto-Improve Pipeline
This is where it gets interesting.
You configure:
- Goal: "achieve f1 > 0.85 on validation set"
- Script: train_model.py
- Strategy: Safe Ratchet
- Max iterations: 20
Then press Run and walk away.
The pipeline:
- Runs your script
- Reads stdout/stderr
- Extracts metrics
- Builds a prompt with context + history
- Gets a patch from the AI
- Validates syntax
- Applies the patch
- Re-runs the script
- Checks if metrics improved (Safe Ratchet mode)
- Repeats until goal is reached or iterations exhausted
Real terminal output looks like this:
```
[PIPELINE] Iteration 3/10 strategy=SAFE_RATCHET goal="f1 > 0.85"
[RUN] python train_model.py → exit 0 (14.3s)
precision=0.71 recall=0.68 f1=0.69
[AI] Analysing metrics... building patch...
[APPLY] ✓ Patch applied · syntax OK · backup created
[RUN] python train_model.py → exit 0 (13.8s)
f1=0.73 ↑+0.04
[RATCHET] metrics improved — continuing to iteration 4...
```
## 8 AI Strategies
Different problems need different approaches:
| Strategy | When to use |
|---|---|
| 🛡️ Conservative | Only fix explicit errors |
| ⚖️ Balanced | Fix + moderate improvements (default) |
| 🔥 Aggressive | Maximum changes, refactor logic |
| 🔒 Safe Ratchet | Apply only if metrics improve |
| 🧭 Explorer | Different approach every iteration |
| 🔬 Hypothesis | Form hypothesis → test → validate |
| 🎭 Ensemble | Generate 3 variants, pick best |
| 📈 Exploit | Double down on what worked |
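Safe Ratchet, for example, reduces to a single accept-or-rollback rule (a sketch; the real strategy logic considers more than one metric):

```python
def ratchet_accept(before: float, after: float, min_delta: float = 0.0) -> bool:
    """Keep a patch only if the tracked metric strictly improved."""
    return after > before + min_delta
```

If this returns `False`, the patch is discarded and the backup is restored, so a bad iteration can never make the script worse than the best version seen so far.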
## Works 100% Offline with Ollama
No API key required. Just run:
```bash
ollama serve
ollama pull deepseek-coder-v2
```
And point AI Code Sherlock at localhost. Your code never leaves your machine.
It also supports OpenAI, Gemini, Groq, Mistral — any OpenAI-compatible endpoint.
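Ollama exposes an OpenAI-compatible API at `http://localhost:11434/v1`, so a minimal client needs no SDK at all. A generic sketch (not Sherlock's actual client; the model name is whatever you pulled):

```python
import json
import urllib.request

def build_payload(prompt: str, model: str = "deepseek-coder-v2") -> dict:
    """Build a standard OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, base_url: str = "http://localhost:11434/v1") -> str:
    """POST to an OpenAI-compatible endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the payload shape is the same everywhere, swapping `base_url` (and an API key header) is all it takes to move between local and hosted providers.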
## Consensus Engine
Query multiple models at once and let them vote:
- Vote mode: patches agreed on by ≥N models win
- Best-of-N: pick response with most valid patches
- Merge: combine unique patches from all models
- Judge: one model evaluates all responses and picks the best
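Vote mode is the simplest of the four to picture (a sketch, assuming patches are compared as exact strings; the real engine may normalize them first):

```python
from collections import Counter

def vote(patches_per_model: list[list[str]], min_agree: int) -> list[str]:
    """Keep patches proposed by at least `min_agree` models (Vote mode)."""
    counts = Counter()
    for patches in patches_per_model:
        for patch in set(patches):  # each model votes at most once per patch
            counts[patch] += 1
    return [patch for patch, c in counts.items() if c >= min_agree]
```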
## The Error Map
Every error gets stored with its confirmed solution. When the same error appears again, the AI sees: "this exact problem was fixed before — here's what worked."
Avoid-patterns prevent the AI from repeating approaches that already failed.
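In essence this is a key-value store keyed by a normalized error signature. A minimal sketch (class and method names are illustrative, not the actual implementation):

```python
import hashlib

class ErrorMap:
    """Store confirmed fixes and failed approaches per error signature."""

    def __init__(self):
        self.fixes = {}   # signature -> solution that worked
        self.avoid = {}   # signature -> approaches that already failed

    @staticmethod
    def signature(error_text: str) -> str:
        # Collapse whitespace so the same error always hashes identically
        normalized = " ".join(error_text.split())
        return hashlib.sha256(normalized.encode()).hexdigest()[:16]

    def record_fix(self, error_text: str, solution: str):
        self.fixes[self.signature(error_text)] = solution

    def record_failure(self, error_text: str, approach: str):
        self.avoid.setdefault(self.signature(error_text), []).append(approach)

    def lookup(self, error_text: str):
        sig = self.signature(error_text)
        return self.fixes.get(sig), self.avoid.get(sig, [])
```

On each iteration, the known fix and the avoid-patterns for the current error can be injected straight into the prompt.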
## Key Technical Details
- Built with Python 3.11 + PyQt6
- Async subprocess runner with real-time stdout/stderr streaming
- AST-based context compression (120k tokens → 4k without losing signal)
- Unicode sanitizer strips zero-width spaces, BOM, smart quotes from AI responses
- Atomic settings save (corruption-proof)
- Version control: every patch backed up, full diff viewer, one-click restore
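The Unicode sanitizer matters more than it sounds: LLMs routinely emit smart quotes and zero-width characters that break exact-match patching. A minimal sketch of the idea (the character table here is a small subset, chosen for illustration):

```python
# Invisible characters that silently break exact string matching
ZERO_WIDTH = {"\u200b": "", "\u200c": "", "\u200d": "", "\ufeff": ""}
# Typographic quotes that are not valid Python string delimiters
SMART_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}

def sanitize(text: str) -> str:
    """Strip zero-width characters / BOM and normalize smart quotes."""
    for bad, good in {**ZERO_WIDTH, **SMART_QUOTES}.items():
        text = text.replace(bad, good)
    return text
```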
## Try It
GitHub: https://github.com/signupss/ai-code-sherlock
Website: https://codesherlock.dev
```bash
git clone https://github.com/signupss/ai-code-sherlock.git
cd ai-code-sherlock
pip install -r requirements.txt
python main.py
```
Free and open source under MIT License.
Would love to hear what you think — especially if you've tried to build something similar or have ideas for the pipeline. What would make this useful for your workflow?