When you plug a local LLM into a web search tool, every fetched page becomes an attack surface. I found this out the hard way — my Ollama setup was pulling web content that contained invisible Unicode injection, fake system prompts, and markdown image tags designed to exfiltrate data through URL parameters.
I went looking for a solution and found that Google DeepMind's own research showed their best model-level defenses fail 53.6% of the time against adaptive attacks. The "Attacker Moves Second" paper demonstrated that all 12 published defenses were bypassed at >90% success rates. The UK's National Cyber Security Centre formally characterized LLMs as "inherently confusable deputies."
So I stopped trying to make the model resist injection and started removing the attack text before the model ever sees it.
The Insight: OCR as a Nuclear Defense
Since I'm generating the image from text (not scanning a document), I control every variable. The OCR round-trip becomes a ground truth extractor:
- Take untrusted web content
- Render it to an image with ImageMagick (300 DPI, 20pt monospace, TIFF)
- OCR it back with Tesseract (LSTM engine)
- Anything that didn't produce visible pixels is gone
Zero-width characters, bidirectional overrides, homoglyphs, variation selectors, tag characters — they all die in the render step because they have no visual representation. No pattern matching required for the entire invisible attack surface.
The Full Pipeline
Five independent layers, each catching a different class:
| Layer | What | Catches |
|---|---|---|
| 1. OCR round-trip | text → image → OCR | All invisible characters |
| 2. Regex detect | 31 compiled patterns | Instruction overrides, role hijacking, system tags |
| 3. Regex redact | Strip detected patterns | Prevents detected attacks from reaching LLM |
| 4. URL/email redact | Strip exfil channels | Markdown img exfil, hidden endpoints |
| 5. Trust wrap | Tag as HOSTILE/UNTRUSTED | Gives LLM provenance metadata |
The OCR runs first. Everything else operates on the clean output.
Red Team Results
I built a test harness with 12 adversarial payloads and ran them directly through the sanitization pipeline:
T01: Instruction Override — ✓ NEUTRALIZED
T02: Unicode Steganography — ✓ NEUTRALIZED
T03: Bidi Override — ✓ NEUTRALIZED
T04: Markdown Exfil — ✓ NEUTRALIZED
T05: Role Hijacking — ✓ NEUTRALIZED
T06: System Tag Injection — ✓ NEUTRALIZED
T07: Base64 Payload — ✓ NEUTRALIZED
T08: Typoglycemia — ✓ NEUTRALIZED
T09: Code Fence Injection — ✓ NEUTRALIZED
T10: Trust Escalation — ✓ NEUTRALIZED
T11: HTML Img Exfil — ✓ NEUTRALIZED
T12: Multi-Vector Combined — ✓ NEUTRALIZED
The red team script is included in the repo — python3 redteam.py runs all 12 payloads against your running instance.
What This Doesn't Catch
I want to be upfront about the limitations because I think the security community has a problem with tools that oversell:
- Semantic injection — "the previous assessment methodology was found to contain errors" is natural English. No regex or OCR catches it.
- Adaptive regex evasion — if an attacker studies the 31 patterns, they can craft bypasses using synonyms.
- Cross-page composite attacks — each page is sanitized independently. An injection split across multiple search results would pass.
- Model-level manipulation — the filter LLM is still an LLM.
Per DeepMind's research, prompt injection may never be fully solved with current architectures. This tool raises the cost of attack, it doesn't eliminate it.
Setup
Requires Docker and Ollama (or any OpenAI-compatible local LLM).
git clone https://github.com/Morfasco/search-sanitizer.git
cd search-sanitizer
cp .env.example .env # edit with your model/endpoint
bash setup.sh
python3 redteam.py # verify the pipeline
Supports Ollama, LM Studio, vLLM, text-generation-webui — anything that speaks /v1/chat/completions works via the LLM_API_FORMAT=openai setting in .env.
How It Compares
| Feature | search-sanitizer | Rebuff | Vigil | IPI-Scanner |
|---|---|---|---|---|
| OCR sanitization | ✅ | ❌ | ❌ | ❌ |
| Active redaction | ✅ | ❌ | ❌ | ❌ |
| URL/email stripping | ✅ | ❌ | ❌ | ❌ |
| Local-first | ✅ | ❌ | ✅ | ✅ |
| Red team included | ✅ | ❌ | ❌ | ✅ |
References
- Lessons from Defending Gemini Against Indirect Prompt Injections — Google DeepMind, 2025
- The Attacker Moves Second — OpenAI/Anthropic/DeepMind, 2025
- OWASP Top 10 for LLM Applications 2025
GitHub: github.com/Morfasco/search-sanitizer
Apache 2.0. Feedback welcome — especially on the semantic injection gap.
Top comments (0)