Bitmask-Based LLM Security Firewall with reskSecure: Block Jailbreaks at the Token Level

#security #llm #python #cybersecurity

Links:

PyPI: https://pypi.org/project/resksecure
GitHub: https://github.com/Resk-Security/reskSecure
Web: https://resk.fr

Most prompt injection and jailbreak guards work by scanning the output text after the model has already generated it. By then it is too late: the damage is done, the tool call has been made, and the sensitive data has already been exfiltrated.

reskSecure takes a different approach: block at the logits level, before the model ever samples the first forbidden token.

How it works

reskSecure uses a bitmask-based policy engine. Policies are written as YAML rules, and each rule defines:

One or more patterns, matched with the Aho-Corasick multi-pattern search algorithm
A capability bitmask that determines which tools or outputs are allowed
A severity mode: hard (the token's logit is set to -inf) or bias (a configurable penalty is added to the logit)

When a matching pattern is detected in the current token window, reskSecure either blocks the offending token entirely (hard) or lowers its logit by the configured penalty (bias). Under hard rules, disallowed tool call tokens can never be sampled; under bias rules they become less likely but remain possible.
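One way to implement that per-step check is sketched below in pure Python, so it runs without a model or PyTorch. It assumes a toy vocabulary, a decoded-text window, and case-sensitive literal matching, and it treats a candidate token as matching when appending it to the window would create a new pattern occurrence. `Rule` and `adjust_logits` are hypothetical names, not reskSecure's API, and for brevity the sketch uses `str.find` rather than an Aho-Corasick automaton.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical in-memory form of one YAML rule (not reskSecure's API)."""
    name: str
    patterns: list[str]
    severity: str            # "hard" -> logit set to -inf, "bias" -> add bias_value
    bias_value: float = 0.0


def adjust_logits(window_text, scores, vocab, rules):
    """Return new logits with forbidden next tokens blocked or penalized.

    window_text: decoded text of the recent tokens (the "token window")
    scores:      one logit per token id
    vocab:       token id -> token string (a toy vocabulary)
    """
    adjusted = list(scores)
    for token_id, token_text in enumerate(vocab):
        candidate = window_text + token_text
        for rule in rules:
            # Only count occurrences that overlap the new token, i.e. that this
            # token would complete; matches already in the window are ignored.
            hit = any(
                candidate.find(p, max(0, len(window_text) - len(p) + 1)) != -1
                for p in rule.patterns
            )
            if not hit:
                continue
            if rule.severity == "hard":
                adjusted[token_id] = -math.inf         # can never be sampled
            else:
                adjusted[token_id] += rule.bias_value  # less likely, still possible
    return adjusted


vocab = ["S", "SN", "ok", "eval(", " "]
rules = [
    Rule("block-ssn", ["SSN"], "hard"),
    Rule("bias-unsafe-code", ["eval("], "bias", -5.0),
]
print(adjust_logits("The S", [1.0, 2.0, 0.5, 0.3, 0.1], vocab, rules))
```

Here token 1 (`SN`) would complete `SSN`, so its logit becomes -inf; token 3 (`eval(`) drops by 5.0; the other tokens are unchanged. A HuggingFace logits processor would apply the same kind of adjustment to the `scores` tensor inside `__call__(input_ids, scores)`.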

Quick example

```yaml
# policies/block-pii.yaml
version: "1.0"
rules:
  - name: block-ssn
    patterns: ["SSN", "social security", "###-##-####"]
    severity: hard
    response: "This information cannot be shared for security reasons."

  - name: bias-unsafe-code
    patterns: ["eval(", "exec(", "__import__"]
    severity: bias
    bias_value: -5.0
    response: "This operation is restricted."
```
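The post names Aho-Corasick as the matching engine. The sketch below is a textbook version of that algorithm, not reskSecure's internals: it compiles all patterns into one automaton and reports every match in a single left-to-right pass, so adding patterns does not add passes over the text. The class name `AhoCorasick` is illustrative.

```python
from collections import deque


class AhoCorasick:
    """Textbook Aho-Corasick automaton for literal multi-pattern search."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state (trie edges)
        self.fail = [0]    # failure link: longest proper suffix that is also a trie prefix
        self.out = [[]]    # patterns that end at this state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so every failure target is finished before it is used
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit matches from the suffix state (e.g. "she" also reports "he")
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (start_index, pattern) for every match in one left-to-right pass."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i - len(pattern) + 1, pattern


patterns = ["SSN", "social security", "###-##-####", "eval(", "exec(", "__import__"]
matcher = AhoCorasick(patterns)
print(list(matcher.search("Print the SSN, then eval(x). Format: 123-45-6789")))
```

The automaton reports `SSN` at index 10 and `eval(` at index 20. The digits `123-45-6789` are not reported: a literal matcher like this one matches `###-##-####` only character for character, not as a digit template.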

Then attach the firewall to generation as a logits processor:

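```python
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList
from resksecure import SecurityFirewall

# Any HuggingFace causal LM works; "gpt2" is just a placeholder
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Load every YAML policy in ./policies/ and pick up edits without a restart
firewall = SecurityFirewall(policy_dir="./policies/", auto_reload=True)

inputs = tokenizer("What is the customer's SSN?", return_tensors="pt")

# Use as a logits processor: the firewall adjusts the scores before each sampling step
output = model.generate(
    **inputs,
    logits_processor=LogitsProcessorList([firewall]),
    max_new_tokens=50,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```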

Key features

Token-level blocking: under hard rules, the model never generates forbidden tokens
Hot-reload YAML policies: change rules without restart
Bitmask capability system: fine-grained tool access control per request
Two severity modes: hard block or configurable bias penalty
Aho-Corasick engine: thousands of patterns matched in a single pass over the text
Works with any HuggingFace transformers model through the logits processor API
Requires Python 3.13+ and PyTorch 2.0+
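Hot reload can be implemented in several ways, and the mechanism reskSecure uses is not described here. As an assumption, the sketch below polls the policy directory, fingerprints each `.yaml` file by modification time and size, and reloads only when a file was added, removed, or changed. `PolicyWatcher` and its `loader` callback are hypothetical names.

```python
import tempfile
from pathlib import Path


class PolicyWatcher:
    """Hypothetical polling reloader for a directory of YAML policy files."""

    def __init__(self, policy_dir, loader):
        self.policy_dir = Path(policy_dir)
        self.loader = loader      # callable: list[Path] -> parsed policies
        self._snapshot = None     # None forces a load on the first check()
        self.policies = None
        self.check()

    def _scan(self):
        snapshot = {}
        for path in sorted(self.policy_dir.glob("*.yaml")):
            st = path.stat()
            # Fingerprint each file by modification time and size
            snapshot[path] = (st.st_mtime_ns, st.st_size)
        return snapshot

    def check(self):
        """Reload if a policy file was added, removed, or changed; return True if so."""
        snapshot = self._scan()
        if snapshot == self._snapshot:
            return False
        self._snapshot = snapshot
        self.policies = self.loader(list(snapshot))
        return True


# Usage: the loader here just records file names instead of parsing YAML
with tempfile.TemporaryDirectory() as tmp:
    watcher = PolicyWatcher(tmp, loader=lambda paths: [p.name for p in paths])
    print(watcher.policies)
    (Path(tmp) / "block-pii.yaml").write_text('version: "1.0"\n')
    print(watcher.check(), watcher.policies)
```

Polling keeps the sketch portable and dependency-free. A production version might rely on OS file-change notifications instead, and would likely swap the compiled rule set atomically so an in-flight generation never sees half-loaded policies.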

Install

```bash
pip install resksecure
```

Security architecture insight

Most LLM security products scan the output text after generation. reskSecure instead operates at the logits tensor level: it modifies the scores that define the next-token probability distribution before each token is sampled. This means:

No post-generation scanning pass to wait for once the response is complete
Under hard rules, the model can never accidentally generate the first token of a forbidden sequence
Tool calls are blocked atomically at generation time
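A small worked example shows how the two severity modes change the sampling distribution. Under softmax, a logit of -inf gives its token probability exactly 0, while a bias b multiplies that token's unnormalized weight by e^b (about 0.0067 for b = -5.0, the value used in the example policy). The logits below are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax; exp(-inf) evaluates to exactly 0.0."""
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 3.0]                 # made-up scores; token 2 is forbidden
baseline = softmax(logits)
hard = softmax([2.0, 1.0, -math.inf])    # severity: hard
bias = softmax([2.0, 1.0, 3.0 - 5.0])    # severity: bias, bias_value = -5.0

for name, probs in [("baseline", baseline), ("hard", hard), ("bias", bias)]:
    print(f"{name:8s} p(forbidden) = {probs[2]:.4f}")
```

Without the firewall the forbidden token is the most likely choice (about two thirds of the mass). The hard rule drives it to exactly 0, while the bias rule leaves a small but nonzero probability (roughly 1.3%), which is why only hard rules guarantee the token is never sampled.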

The bitmask approach extends naturally to multi-tenant deployments: each request gets its own capability mask, and the same firewall process enforces different policies per user context.
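To make the per-request capability mask concrete, here is a sketch using Python's `enum.IntFlag`. The capability names, the tool table, and the `allowed_tools` helper are all hypothetical; the actual bit layout reskSecure uses is not given here. Each tool requires a set of bits, and a request may call it only if its mask contains all of them, which costs one AND and one comparison.

```python
from enum import IntFlag


class Cap(IntFlag):
    """Hypothetical capability bits; reskSecure's real layout may differ."""
    NONE = 0
    READ_DOCS = 1 << 0
    WEB_SEARCH = 1 << 1
    SEND_EMAIL = 1 << 2
    RUN_CODE = 1 << 3


# Capabilities each (hypothetical) tool requires
TOOL_REQUIREMENTS = {
    "search_docs": Cap.READ_DOCS,
    "browse": Cap.WEB_SEARCH,
    "email_user": Cap.SEND_EMAIL | Cap.READ_DOCS,
    "python_exec": Cap.RUN_CODE,
}


def allowed_tools(request_mask: Cap) -> set[str]:
    """Tools whose required bits are all present in the request's mask."""
    return {
        tool
        for tool, required in TOOL_REQUIREMENTS.items()
        if (request_mask & required) == required
    }


# Two tenants served by the same process, each with its own mask
free_tier = Cap.READ_DOCS
admin = Cap.READ_DOCS | Cap.WEB_SEARCH | Cap.SEND_EMAIL
print(sorted(allowed_tools(free_tier)))
print(sorted(allowed_tools(admin)))
```

In this sketch's terms, the tool-call tokens for any tool outside `allowed_tools(mask)` are the ones a firewall would force to -inf for that request, while the policy files themselves stay shared across tenants.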