jg-noncelogic

Posted on • Originally published at github.com

Show HN: Isartor – Pure-Rust prompt firewall, deflects 60-95% of LLM traffic

Isartor is a pure‑Rust prompt firewall that claims to deflect 60–95% of LLM calls using semantic caching plus an embedded SLM. Self‑hosted and air‑gapped. Read the repo: https://github.com/isartor-ai/Isartor (docs: https://isartor-ai.github.io/Isartor/)

How it works: sits between agents and the cloud LLM, computes embeddings, checks a semantic cache, and either returns a cached answer or runs a tiny local SLM (Candle/HuggingFace). Pure Rust runtime — no external infra required.
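The hot path above can be sketched in a few lines of Rust. This is a minimal illustration, not Isartor's actual API: it assumes the embedder (e.g. Candle) hands back a `Vec<f32>`, and the `SemanticCache` type and its threshold are hypothetical names for the pattern described.

```rust
/// Cosine similarity between two embedding vectors.
pub fn cosine_sim(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical semantic cache: (embedding, cached answer) pairs plus
/// a similarity threshold above which a stored answer is reused.
pub struct SemanticCache {
    entries: Vec<(Vec<f32>, String)>,
    threshold: f32,
}

impl SemanticCache {
    pub fn new(threshold: f32) -> Self {
        Self { entries: Vec::new(), threshold }
    }

    /// Return the best-matching cached answer, if any entry clears
    /// the threshold; on `None`, the caller falls back to the SLM.
    pub fn lookup(&self, query: &[f32]) -> Option<&str> {
        self.entries
            .iter()
            .map(|(emb, ans)| (cosine_sim(query, emb), ans.as_str()))
            .filter(|(sim, _)| *sim >= self.threshold)
            .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
            .map(|(_, ans)| ans)
    }

    pub fn insert(&mut self, emb: Vec<f32>, answer: String) {
        self.entries.push((emb, answer));
    }
}

fn main() {
    let mut cache = SemanticCache::new(0.92);
    cache.insert(vec![1.0, 0.0, 0.0], "cached: build is green".into());
    // Near-duplicate prompt embedding: cache hit, no cloud call.
    println!("hit  = {:?}", cache.lookup(&[0.99, 0.05, 0.0]));
    // Dissimilar prompt: miss, so the local SLM (or cloud) would run.
    println!("miss = {:?}", cache.lookup(&[0.0, 1.0, 0.0]));
}
```

A linear scan is fine for a sketch; at scale you would swap in an ANN index, but the hit/miss decision stays the same.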

Where it wins: repetitive agentic traffic — status checks, deterministic tool calls, repeated retrieval prompts. The authors report 60–95% deflection. Tradeoffs: local compute for the SLM, storage for embeddings, and the risk of false positives, so you need similarity thresholds, cache eviction, and metrics to run it safely.
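Those tradeoffs imply some operational bookkeeping. A minimal sketch (again, not Isartor's actual implementation): a size-capped embedding store with FIFO eviction, plus hit/miss counters so the deflection rate is observable rather than assumed.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded cache: caps embedding storage and tracks
/// how much traffic is actually being deflected from the cloud.
pub struct BoundedCache {
    entries: VecDeque<(Vec<f32>, String)>, // (embedding, cached answer)
    capacity: usize,
    pub hits: u64,
    pub misses: u64,
}

impl BoundedCache {
    pub fn new(capacity: usize) -> Self {
        Self { entries: VecDeque::new(), capacity, hits: 0, misses: 0 }
    }

    /// Insert, evicting the oldest entry once the store is full.
    pub fn insert(&mut self, emb: Vec<f32>, answer: String) {
        if self.entries.len() >= self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back((emb, answer));
    }

    pub fn record(&mut self, was_hit: bool) {
        if was_hit { self.hits += 1 } else { self.misses += 1 }
    }

    /// Fraction of lookups deflected away from the cloud LLM.
    pub fn deflection_rate(&self) -> f64 {
        let total = self.hits + self.misses;
        if total == 0 { 0.0 } else { self.hits as f64 / total as f64 }
    }

    pub fn len(&self) -> usize { self.entries.len() }
}

fn main() {
    let mut cache = BoundedCache::new(2);
    cache.insert(vec![1.0, 0.0], "a".into());
    cache.insert(vec![0.0, 1.0], "b".into());
    cache.insert(vec![0.5, 0.5], "c".into()); // evicts the oldest entry
    cache.record(true);
    cache.record(true);
    cache.record(false);
    println!("size = {}, deflection = {:.2}", cache.len(), cache.deflection_rate());
}
```

FIFO is the simplest policy; LRU or TTL-based eviction would be the obvious upgrades once you see the real traffic shape.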

Practical test I’d run: replay 30 days of agent logs, simulate cache hits, pick an embedding-similarity threshold that keeps false matches under 1%, and measure the savings against current cloud spend. Would you deploy this at the gateway or per‑agent runner?
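That replay can be sketched as an offline harness. Everything here is hypothetical: each replayed prompt carries its best similarity to the existing cache plus a ground-truth label saying whether the cached answer would actually have been correct. Sweep candidate thresholds from low to high and take the lowest one whose false-match rate stays under the budget.

```rust
/// One replayed prompt from the 30-day log (hypothetical shape).
struct Replay {
    best_sim: f32,        // similarity to the closest cached prompt
    answer_matches: bool, // ground truth: cached answer was correct
}

/// Lowest candidate threshold whose false-match rate (among served
/// cache hits) stays below `max_fp`, with the deflection it achieves.
fn pick_threshold(log: &[Replay], max_fp: f64) -> Option<(f32, f64)> {
    let candidates = [0.80, 0.85, 0.90, 0.95, 0.99];
    for &t in &candidates {
        let hits: Vec<_> = log.iter().filter(|r| r.best_sim >= t).collect();
        if hits.is_empty() {
            continue;
        }
        let false_matches = hits.iter().filter(|r| !r.answer_matches).count();
        let fp_rate = false_matches as f64 / hits.len() as f64;
        if fp_rate < max_fp {
            return Some((t, hits.len() as f64 / log.len() as f64));
        }
    }
    None
}

fn main() {
    // Synthetic replay: two near-duplicates (cached answers correct)
    // and two lookalikes whose cached answers would have been wrong.
    let log = vec![
        Replay { best_sim: 0.99, answer_matches: true },
        Replay { best_sim: 0.97, answer_matches: true },
        Replay { best_sim: 0.86, answer_matches: false },
        Replay { best_sim: 0.40, answer_matches: false },
    ];
    if let Some((t, deflection)) = pick_threshold(&log, 0.01) {
        // Rough savings at a made-up $0.002 per deflected cloud call.
        let saved = deflection * log.len() as f64 * 0.002;
        println!("threshold={t}, deflection={deflection:.2}, saved=${saved:.4}");
    }
}
```

The interesting output is the deflection rate at the chosen threshold: if it lands well below the reported 60–95% on your own logs, the savings pitch doesn’t apply to your traffic.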
