RAG Firewall: The missing retrieval-time security layer for LLMs (v0.4.1)

#rag #security #machinelearning #llm

RAG Firewall is a lightweight, client-side layer that scans retrieved chunks before they reach your LLM. It blocks high-risk inputs (prompt injection, secrets, PII, suspicious URLs/encoding) and can re-rank by trust (recency, provenance, relevance). No SaaS, no data leaves your environment.

What’s new in v0.4.1

Config validation (optional): firewall.yaml validated via JSON Schema (uses jsonschema if installed)
URL hardening: flags IP literals and punycode domains; still applies allow/deny logic
Secrets coverage extended: HuggingFace tokens, Databricks tokens, Slack webhooks, Azure-like patterns, generic secret tokens
Tests: added coverage for validation, URL hardening, and secrets patterns

Why retrieval-time (vs output guardrails)

Output guardrails act after generation, when risky context may have already influenced the model.
Retrieval-time enforcement stops prompt injection, secret/PII leaks, and untrusted URLs before they enter the prompt window.
Everything stays local: scanning, policy decisions, and audit trail happen in-process.

How it works (at a glance)

Your retriever returns candidate chunks.
Scanners detect risks (injection, secrets, PII, URLs, encoded blobs, staleness).
Policies decide: allow, deny, or re-rank; reasons are attached to metadata.
Denied chunks never reach the LLM; allowed chunks can be re-ordered by trust.

Quickstart (LangChain)

from rag_firewall import Firewall, wrap_retriever

# Load config (client-side; no network calls)
fw = Firewall.from_yaml("firewall.yaml")

# Wrap your existing retriever
safe = wrap_retriever(base_retriever, firewall=fw)

# Use as usual
docs = safe.get_relevant_documents("What is our mission?")
for d in docs:
    print(d.metadata.get("_ragfw"))  # { decision, score, reasons, findings }

Config example (`firewall.yaml`)

scanners:
  - type: regex_injection
  - type: pii
  - type: secrets
  - type: encoded
  - type: url
    allowlist: ["docs.myco.com"]
    denylist: ["evil.example.com"]
  - type: conflict
    stale_days: 120

policies:
  - name: block_high_sensitivity
    match: { metadata.sensitivity: "high" }
    action: deny

  - name: prefer_recent_versions
    action: rerank
    weight: { recency: 0.6, relevance: 0.4 }

Graph retrieval (beta)

Works with graph-based pipelines via a wrapper that sanitizes nodes/edges before prompt assembly.
Example: NetworkX adapter with per-label text fields.

Repo and examples

GitHub repo: https://github.com/taladari/rag-firewall
Examples (LangChain/LlamaIndex/Graph): https://github.com/taladari/rag-firewall/tree/main/examples
CLI: ragfw index / ragfw query

Security and privacy

Runs entirely client-side; no data leaves your environment.
Denies high-severity secrets/prompt-injection by default (policy-tunable).
JSONL audit trail for decisions (local file).

What’s next

Policy operators (gt/gte/lt/lte/regex/in) and simulate mode.
Threat packs and light compliance mapping.
More framework adapters and examples.

Feedback welcome

Red-team cases you want covered?
Patterns we should add to secrets/URL scanners?
Issues/PRs welcome on GitHub.

DEV Community