The problem
Most teams shipping LLM features test for code bugs but not for
prompt-injection attacks in their inputs. They rely on the model's
built-in safety. That's not a plan.
What I built
nukon-pi-detect is a tiny Python library + CLI that scans strings and
files for known prompt-injection patterns before they reach your model.
pip install nukon-pi-detect
nukon-pi-detect scan --string "ignore previous instructions"
What it catches
48 curated patterns across 5 categories:
- Classic injection ("ignore previous instructions" and variants)
- Jailbreaks (DAN, STAN, AIM, grandma exploit, dual-response trick)
- Delimiter escapes (ChatML tokens, fake tags, [INST] hijacks)
- Unicode smuggling (invisible tag characters in U+E0000-U+E007F, bidi overrides, homoglyphs)
- Indirect injection (payloads targeting downstream LLM summarizers)
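To make the Unicode smuggling category concrete, here is a minimal sketch of a codepoint check. It is not nukon-pi-detect's code or API: the names `find_smuggled_codepoints` and `decode_tag_payload` are made up for illustration, and the set of bidi controls is my own pick of the usual embedding, override, and isolate characters.

```python
# Illustrative sketch only; not the library's implementation or API.

TAG_BLOCK = range(0xE0000, 0xE0080)  # Unicode "Tags" block, invisible in most renderers
BIDI_CONTROLS = {
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,  # LRE, RLE, PDF, LRO, RLO
    0x2066, 0x2067, 0x2068, 0x2069,          # LRI, RLI, FSI, PDI
}


def find_smuggled_codepoints(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every tag or bidi control character in text."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp in TAG_BLOCK or cp in BIDI_CONTROLS:
            hits.append((i, f"U+{cp:04X}"))
    return hits


def decode_tag_payload(text: str) -> str:
    """Recover the ASCII text hidden in tag characters (each is ASCII + 0xE0000)."""
    return "".join(
        chr(ord(ch) - 0xE0000)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E  # tag counterparts of printable ASCII
    )
```

Tag characters mirror ASCII at a fixed offset, which is why a payload can ride along invisibly in otherwise normal-looking text and still be read by a model. Homoglyph detection needs a confusables table and is left out of this sketch.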
What makes it different
Fully deterministic - regex + Unicode codepoint checks. No ML, no
network calls, no API keys. Under 1ms per scan. Zero runtime dependencies.
Exit code 2 on MALICIOUS so it fails CI builds by default.
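As a rough picture of what "deterministic, regex-based, fails CI" means in practice, here is a minimal sketch. The pattern, the `scan` and `main` functions, and the clean exit code of 0 are all assumptions for illustration. Only the exit code 2 for a malicious result comes from the description above, and none of the library's 48 patterns are reproduced here.

```python
# Illustrative sketch only; not the library's implementation or API.
import re
import sys

# Hypothetical pattern covering "ignore previous instructions" and a few variants.
IGNORE_PREVIOUS = re.compile(
    r"\b(?:ignore|disregard|forget)\s+(?:all\s+)?(?:the\s+)?"
    r"(?:previous|prior|above)\s+(?:instructions|prompts?)\b",
    re.IGNORECASE,
)

EXIT_CLEAN = 0      # assumed value for a clean scan
EXIT_MALICIOUS = 2  # the exit code the post describes for MALICIOUS


def scan(text: str) -> int:
    """Return an exit code: same input always gives the same result."""
    return EXIT_MALICIOUS if IGNORE_PREVIOUS.search(text) else EXIT_CLEAN


def main() -> int:
    """Scan stdin; a script would end with `raise SystemExit(main())`."""
    return scan(sys.stdin.read())
```

Because the result is a plain process exit code, any CI system that treats non-zero as failure will stop the build without extra glue.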
What it doesn't do
It won't catch novel attacks, and it's not a runtime policy engine. It
aims at the common 80%: the known attacks that recur across red-team datasets.
Links
Apache 2.0. Pattern submissions welcome.