DEV Community

Morfasco
Morfasco

Posted on

How I Built an OCR-Based Defense Against Prompt Injection for Local LLM Search

When you plug a local LLM into a web search tool, every fetched page becomes an attack surface. I found this out the hard way — my Ollama setup was pulling web content that contained invisible Unicode injection, fake system prompts, and markdown image tags designed to exfiltrate data through URL parameters.

I went looking for a solution and found that Google DeepMind's own research showed their best model-level defenses fail 53.6% of the time against adaptive attacks. The "Attacker Moves Second" paper demonstrated that all 12 published defenses were bypassed at >90% success rates. The UK's National Cyber Security Centre formally characterized LLMs as "inherently confusable deputies."

So I stopped trying to make the model resist injection and started removing the attack text before the model ever sees it.

The Insight: OCR as a Nuclear Defense

Since I'm generating the image from text (not scanning a document), I control every variable. The OCR round-trip becomes a ground truth extractor:

  1. Take untrusted web content
  2. Render it to an image with ImageMagick (300 DPI, 20pt monospace, TIFF)
  3. OCR it back with Tesseract (LSTM engine)
  4. Anything that didn't produce visible pixels is gone

Zero-width characters, bidirectional overrides, homoglyphs, variation selectors, tag characters — they all die in the render step because they have no visual representation. No pattern matching required for the entire invisible attack surface.

The Full Pipeline

Five independent layers, each catching a different class:

Layer What Catches
1. OCR round-trip text → image → OCR All invisible characters
2. Regex detect 31 compiled patterns Instruction overrides, role hijacking, system tags
3. Regex redact Strip detected patterns Prevents detected attacks from reaching LLM
4. URL/email redact Strip exfil channels Markdown img exfil, hidden endpoints
5. Trust wrap Tag as HOSTILE/UNTRUSTED Gives LLM provenance metadata

The OCR runs first. Everything else operates on the clean output.

Red Team Results

I built a test harness with 12 adversarial payloads and ran them directly through the sanitization pipeline:

T01: Instruction Override — ✓ NEUTRALIZED
T02: Unicode Steganography — ✓ NEUTRALIZED
T03: Bidi Override — ✓ NEUTRALIZED
T04: Markdown Exfil — ✓ NEUTRALIZED
T05: Role Hijacking — ✓ NEUTRALIZED
T06: System Tag Injection — ✓ NEUTRALIZED
T07: Base64 Payload — ✓ NEUTRALIZED
T08: Typoglycemia — ✓ NEUTRALIZED
T09: Code Fence Injection — ✓ NEUTRALIZED
T10: Trust Escalation — ✓ NEUTRALIZED
T11: HTML Img Exfil — ✓ NEUTRALIZED
T12: Multi-Vector Combined — ✓ NEUTRALIZED

The red team script is included in the repo — python3 redteam.py runs all 12 payloads against your running instance.

What This Doesn't Catch

I want to be upfront about the limitations because I think the security community has a problem with tools that oversell:

  • Semantic injection — "the previous assessment methodology was found to contain errors" is natural English. No regex or OCR catches it.
  • Adaptive regex evasion — if an attacker studies the 31 patterns, they can craft bypasses using synonyms.
  • Cross-page composite attacks — each page is sanitized independently. An injection split across multiple search results would pass.
  • Model-level manipulation — the filter LLM is still an LLM.

Per DeepMind's research, prompt injection may never be fully solved with current architectures. This tool raises the cost of attack, it doesn't eliminate it.

Setup

Requires Docker and Ollama (or any OpenAI-compatible local LLM).

git clone https://github.com/Morfasco/search-sanitizer.git
cd search-sanitizer
cp .env.example .env  # edit with your model/endpoint
bash setup.sh
python3 redteam.py    # verify the pipeline
Enter fullscreen mode Exit fullscreen mode

Supports Ollama, LM Studio, vLLM, text-generation-webui — anything that speaks /v1/chat/completions works via the LLM_API_FORMAT=openai setting in .env.

How It Compares

Feature search-sanitizer Rebuff Vigil IPI-Scanner
OCR sanitization
Active redaction
URL/email stripping
Local-first
Red team included

References

GitHub: github.com/Morfasco/search-sanitizer

Apache 2.0. Feedback welcome — especially on the semantic injection gap.

Top comments (0)