Links
- GitHub: https://github.com/Resk-Security/resk-logits
- PyPI: https://pypi.org/project/resklogits
- Website: https://resk.fr
The Problem
Most LLM safety filters check generated text after the model produces it. By the time you scan the output, the dangerous token has already been sampled. This reactive approach misses the jailbreaks, prompt injections, and tool-call hijacks that slip through instruction-based filters.
The Solution: Logits-Level Filtering
resk-logits operates where it matters most: on the logits tensor itself, before any token is selected. Using a GPU-accelerated Aho-Corasick automaton, it scans every candidate token against 10,000+ disallowed patterns in under 1 ms on an RTX 4090.
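To make the matching step concrete, here is a minimal CPU sketch of the classic Aho-Corasick algorithm in pure Python. It is not the CUDA implementation that resk-logits ships. The function names `build_automaton` and `find_matches` are hypothetical and exist only for this illustration. It shows how one pass over the text can check many patterns at once.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton: trie edges, failure links, outputs."""
    goto = [{}]  # goto[state][char] -> next state
    fail = [0]   # fail[state] -> state for the longest proper suffix
    out = [[]]   # out[state] -> patterns that end at this state

    # 1. Insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)

    # 2. Breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_matches(text, automaton):
    """Return (start_index, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


banned = ["ignore previous instructions", "forget your training", "do anything now"]
automaton = build_automaton(banned)
text = "please ignore previous instructions and do anything now"
print(find_matches(text, automaton))
```

The scan cost grows with the length of the text plus the number of matches, not with the number of patterns, which is why this family of automata suits large blocklists.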
When a match is found, the token's logit is set to -inf, so the sampler can never select it. The blocked text is never generated, and there is nothing left to scan afterwards.
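The masking step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the library's code: `mask_banned` and `softmax` are hypothetical helpers, the three-token vocabulary is invented, and a simple substring check stands in for the automaton.

```python
import math


def mask_banned(generated, vocab, logits, banned):
    """Return a copy of `logits` with -inf wherever appending the candidate
    token to `generated` would complete a banned phrase."""
    masked = list(logits)
    for token_id, piece in enumerate(vocab):
        candidate = (generated + piece).lower()
        if any(phrase in candidate for phrase in banned):
            masked[token_id] = -math.inf
    return masked


def softmax(logits):
    """Numerically stable softmax; assumes at least one finite logit."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary and logits, invented for this example.
vocab = [" instructions", " poems", " rules"]
logits = [3.0, 1.0, 0.5]
generated = "Sure, I will ignore previous"
banned = ["ignore previous instructions"]

masked = mask_banned(generated, vocab, logits, banned)
probs = softmax(masked)
print(masked)
print(probs)
```

Even though " instructions" had the highest raw logit, its probability after masking is zero, so no sampling strategy built on these probabilities can pick it.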
Quick Start
pip install resklogits
Here is how to integrate it with any HuggingFace model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from resklogits import ReskLogitsProcessor, create_regex_patterns

model_name = "mistralai/Mistral-7B-v0.1"
# Load the model in half precision on the GPU so it sits on the same
# device as the inputs and the logits processor.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Prompt-injection and jailbreak phrases to block
patterns = create_regex_patterns([
    r"ignore previous instructions",
    r"forget your training",
    r"DAN|do anything now",
])

logits_processor = ReskLogitsProcessor(
    tokenizer=tokenizer,
    patterns=patterns,
    penalty_mode="hard",  # set matching logits to -inf
    device="cuda",
)

inputs = tokenizer("Write a poem about AI", return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    logits_processor=[logits_processor],
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Key Features
- GPU-Accelerated: CUDA-optimized multi-pattern matching with Aho-Corasick automaton
- Sub-millisecond Latency: 10,000+ patterns scanned in under 1ms
- Shadow-Ban Mode: Inject penalty bias instead of hard block for nuanced moderation
- HuggingFace Compatible: Plugs into model.generate() through the standard logits_processor argument
- Python 3.13+: Supports latest Python and PyTorch 2.0+
- Apache 2.0: Fully open source
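The difference between a hard block and the Shadow-Ban Mode listed above can be illustrated with a small sketch. The helper `apply_penalty`, its `mode` and `bias` parameters, and the bias value of 5.0 are assumptions made for this example. They are not the resk-logits API, which the post only shows with penalty_mode="hard".

```python
import math


def apply_penalty(logits, flagged_ids, mode="soft", bias=5.0):
    """Hard mode sets flagged logits to -inf; soft mode subtracts a bias."""
    adjusted = list(logits)
    for token_id in flagged_ids:
        if mode == "hard":
            adjusted[token_id] = -math.inf
        else:
            adjusted[token_id] -= bias
    return adjusted


def softmax(logits):
    """Numerically stable softmax; assumes at least one finite logit."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # toy values; token 0 is the flagged one
flagged = {0}

base_probs = softmax(logits)
soft_probs = softmax(apply_penalty(logits, flagged, mode="soft"))
hard_probs = softmax(apply_penalty(logits, flagged, mode="hard"))

print(base_probs[0], soft_probs[0], hard_probs[0])
```

With a soft penalty the flagged token becomes unlikely but still possible, which can suit borderline patterns where a hard block would hurt legitimate outputs. The hard mode removes it entirely.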
Why This Matters
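Instruction-based safety filters are brittle because jailbreaks keep finding new phrasings. Logits-level filtering gives a hard guarantee at the token level: once a token that would complete a disallowed pattern has its logit set to -inf, its probability after softmax is exactly zero and it cannot be sampled. This is prevention, not detection.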
Check out the project on GitHub and let me know what you think. Contributions welcome!
Built by RESK Security — AI Security Tools for Enterprise