DEV Community

RESK

Bitmask-Based LLM Security Firewall with reskSecure — Block Jailbreaks at Token Level


Most prompt injection and jailbreak guards scan the output text after the model has already generated it. By then it can be too late: the tool call has been made, or the sensitive data has been exfiltrated.

reskSecure takes a different approach: block at the logits level, before the model ever samples the first forbidden token.

How it works

reskSecure uses a bitmask-based policy engine. Each policy rule is written in YAML and defines:

  • A pattern to match using Aho-Corasick for multi-pattern search
  • A capability bitmask that determines which tools or outputs are allowed
  • A severity mode: hard (-inf logit) or bias (configurable penalty)
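
To make the multi-pattern search concrete, here is a minimal, stdlib-only sketch of an Aho-Corasick matcher. It is not reskSecure's implementation: the names `build_automaton` and `find_all` are hypothetical, and the sketch matches over decoded text characters, which is an assumption about how the token window is represented.

```python
from collections import deque

def build_automaton(patterns):
    """Build trie transitions, failure links, and per-state outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # patterns that end at each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit the longest proper suffix that is also a trie prefix.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, automaton):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

automaton = build_automaton(["eval(", "exec(", "__import__"])
hits = find_all("x = eval(__import__('os'))", automaton)
```

The text is scanned once regardless of how many patterns are loaded, which is why this family of algorithms suits large rule sets.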

When a pattern matches in the current token window, reskSecure applies the rule's severity: hard sets the token's logit to -inf so it cannot be sampled, and bias adds a configurable penalty that lowers its probability. Under a hard rule, the model cannot emit disallowed tool call tokens.
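
The difference between the two severity modes can be illustrated with plain Python lists standing in for a logits tensor. This is a sketch under assumptions, not reskSecure's code: `apply_rule` and `softmax` are hypothetical helpers, and a real logits processor would operate on a PyTorch tensor instead.

```python
import math

def apply_rule(logits, blocked_token_ids, severity, bias_value=0.0):
    """Return a copy of logits with the rule applied to the given token ids."""
    adjusted = list(logits)
    for tid in blocked_token_ids:
        if severity == "hard":
            adjusted[tid] = -math.inf    # token can no longer be sampled
        elif severity == "bias":
            adjusted[tid] += bias_value  # negative bias makes the token less likely
        else:
            raise ValueError(f"unknown severity: {severity!r}")
    return adjusted

def softmax(logits):
    """Convert logits to probabilities; a -inf logit maps to probability 0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy vocabulary of three tokens
baseline = softmax(logits)
hard = softmax(apply_rule(logits, [0], "hard"))
soft = softmax(apply_rule(logits, [0], "bias", bias_value=-5.0))
```

With the hard rule, token 0 ends up with probability exactly zero and the remaining mass is renormalized over the other tokens. With the bias rule, token 0 stays possible but becomes much less likely than in `baseline`.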

Quick example

# policies/block-pii.yaml
version: "1.0"
rules:
  - name: block-ssn
    patterns: ["SSN", "social security", "###-##-####"]
    severity: hard
    response: "This information cannot be shared for security reasons."

  - name: bias-unsafe-code
    patterns: ["eval(", "exec(", "__import__"]
    severity: bias
    bias_value: -5.0
    response: "This operation is restricted."
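
As a rough illustration of how a rule like the ones above might be checked after loading, the sketch below validates a single rule mapping, as a YAML loader would return it. The `Rule` dataclass and `parse_rule` function are hypothetical, and the validation choices (for example, requiring `bias_value` when `severity` is `bias`) are assumptions; reskSecure's own loader may behave differently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    name: str
    patterns: tuple
    severity: str
    response: str
    bias_value: Optional[float] = None

def parse_rule(raw):
    """Validate one rule mapping and return it as an immutable Rule."""
    severity = raw.get("severity")
    if severity not in ("hard", "bias"):
        raise ValueError(f"severity must be 'hard' or 'bias', got {severity!r}")
    if not raw.get("patterns"):
        raise ValueError("a rule needs at least one pattern")
    if severity == "bias" and raw.get("bias_value") is None:
        raise ValueError("bias rules need a bias_value")
    return Rule(
        name=raw["name"],
        patterns=tuple(raw["patterns"]),
        severity=severity,
        response=raw["response"],
        bias_value=raw.get("bias_value"),
    )

rule = parse_rule({
    "name": "bias-unsafe-code",
    "patterns": ["eval(", "exec(", "__import__"],
    "severity": "bias",
    "bias_value": -5.0,
    "response": "This operation is restricted.",
})
```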

Then use it as middleware:

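from transformers import AutoModelForCausalLM, AutoTokenizer
from resksecure import SecurityFirewall

# Any causal LM should work here; "gpt2" is only a placeholder.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

firewall = SecurityFirewall(
    policy_dir="./policies/",  # directory holding the YAML policy files
    auto_reload=True,          # pick up policy changes without a restart
)

# Use as a logits processor
inputs = tokenizer("Summarize this support ticket.", return_tensors="pt")
output = model.generate(
    **inputs,
    logits_processor=[firewall],
)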

Key features

  • Token-level blocking: model never generates forbidden tokens
  • Hot-reload YAML policies: change rules without restart
  • Bitmask capability system: fine-grained tool access control per request
  • Two severity modes: hard block or configurable bias penalty
  • Aho-Corasick engine: thousands of patterns searched simultaneously
  • Works with any Hugging Face model via the logits processor API
  • Python 3.13+, PyTorch 2.0+
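
One common way to implement hot-reload for a policy directory is to poll file modification times and reload when they change. The sketch below shows that idea only; `PolicyDirWatcher` is a hypothetical name, and the post does not say which mechanism reskSecure actually uses.

```python
import os
import tempfile

class PolicyDirWatcher:
    """Detect added, removed, or modified YAML files by comparing mtimes."""

    def __init__(self, policy_dir):
        self.policy_dir = policy_dir
        self._snapshot = self._scan()

    def _scan(self):
        snapshot = {}
        for name in sorted(os.listdir(self.policy_dir)):
            if name.endswith((".yaml", ".yml")):
                path = os.path.join(self.policy_dir, name)
                snapshot[path] = os.stat(path).st_mtime_ns
        return snapshot

    def changed(self):
        """Return True once per change, then False until the next change."""
        current = self._scan()
        if current != self._snapshot:
            self._snapshot = current
            return True
        return False

# Demo in a temporary directory: adding a policy file triggers one reload.
with tempfile.TemporaryDirectory() as policy_dir:
    watcher = PolicyDirWatcher(policy_dir)
    before = watcher.changed()
    with open(os.path.join(policy_dir, "block-pii.yaml"), "w") as fh:
        fh.write('version: "1.0"\n')
    after = watcher.changed()
    settled = watcher.changed()
```

A caller would check `changed()` between requests and rebuild its pattern matcher only when it returns True.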

Install

pip install resksecure

Security architecture insight

Most LLM security products scan the output text after generation. reskSecure operates at the logits tensor level — it modifies the output probability distribution before token sampling. This means:

  • No latency penalty from post-generation scanning
  • The model can never accidentally generate the first token of a forbidden sequence
  • Tool calls are blocked atomically at generation time

The bitmask approach extends naturally to multi-tenant deployments: each request gets its own capability mask, and the same firewall process enforces different policies per user context.
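
A per-request capability mask can be sketched with Python's `IntFlag`. The capability names and the `is_allowed` helper below are hypothetical and chosen only for illustration; the post does not list reskSecure's actual capability bits.

```python
from enum import IntFlag

class Capability(IntFlag):
    READ_FILES = 1
    WEB_SEARCH = 2
    RUN_CODE = 4
    SEND_EMAIL = 8

def is_allowed(request_mask, required):
    """True when every required capability bit is set in the request mask."""
    return (request_mask & required) == required

# Two tenants served by the same process, each with its own mask.
free_tier = Capability.WEB_SEARCH
admin = Capability.READ_FILES | Capability.WEB_SEARCH | Capability.RUN_CODE

can_admin_run_code = is_allowed(admin, Capability.RUN_CODE)
can_free_run_code = is_allowed(free_tier, Capability.RUN_CODE)
```

Because a mask is a single integer, checking it is one bitwise AND per request, and combining capabilities is a bitwise OR.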

Conclusion

If you are deploying LLMs into production, a post-generation filter is not enough. Block at the token level with a hot-reloadable bitmask firewall.

Check the docs on resk.fr and star the repo on GitHub. Feedback and PRs welcome.


GitHub: https://github.com/Resk-Security/reskSecure
Web: https://resk.fr
