Modern machine learning workflows rely heavily on pretrained models downloaded from GitHub, Hugging Face, and countless other model hubs. This convenience comes with a growing risk: model tampering, data poisoning, and hidden backdoors embedded in .pth checkpoints.
To address this problem, we built Mithridatium, a lightweight open-source framework designed to verify the integrity of pretrained neural networks before they enter production or research pipelines.
Why Mithridatium?
Today’s ML ecosystem assumes that pretrained models are safe. In reality, the model file itself can be a silent attack vector:
• poisoned training data
• hidden triggers that activate under specific inputs
• manipulated weights
• malformed checkpoints that cause unexpected runtime behavior
Mithridatium provides a command-line workflow to evaluate these risks through model-centric defenses inspired by academic research but simplified for real-world use.
Offline Usage
Once installed, Mithridatium can run entirely offline.
You only need:
1. Your .pth model file
2. A local dataset directory (optional for STRIP; required for MMBD in some configurations)
This makes the tool suitable for restricted environments, air-gapped machines, or secure internal ML pipelines.
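Before pointing any defense at a checkpoint, it can also be worth confirming offline that the file deserializes at all. The snippet below is a minimal illustrative sanity check in plain PyTorch, independent of Mithridatium’s own loading code; the model.pth path is a placeholder, and weights_only requires a reasonably recent PyTorch release.

import torch

# Load the checkpoint on the CPU only; weights_only=True refuses to unpickle
# arbitrary Python objects, which narrows the attack surface of a hostile file.
state = torch.load("model.pth", map_location="cpu", weights_only=True)
print(f"loaded {len(state)} entries, e.g. {next(iter(state))}")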
Installation
Install from PyPI:
pip install mithridatium
Upgrade to the latest release:
pip install --upgrade mithridatium
Implemented Defenses
- MMBD (Maximum Mean Backdoor Detection)
MMBD evaluates synthetic class-optimized images to detect anomalous activation patterns commonly associated with backdoored models.
The implementation exposes:
• per-class eigenvalue scores
• normalized anomaly distributions
• classical hypothesis testing (p-value)
• a deterministic verdict
Example invocation in our tool:
mithridatium detect --model model.pth --defense mmbd --arch resnet18 --data cifar10
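To make the idea concrete, here is a heavily simplified sketch of an MMBD-style statistic; it is not Mithridatium’s implementation. For each class it gradient-ascends a synthetic input to maximize the margin of that class’s logit over all others, then flags classes whose maximized margin is an outlier under a simple leave-one-out z-test. The untrained resnet18, the 10-class setup, and the step counts are placeholder assumptions.

import torch
from torchvision.models import resnet18
from scipy import stats

def max_margin_for_class(model, target, steps=200, lr=0.1, shape=(1, 3, 32, 32)):
    # Gradient-ascend a synthetic input so the target logit dominates the rest.
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = model(x)
        mask = torch.zeros_like(logits)
        mask[0, target] = float("-inf")  # exclude the target from the max
        margin = logits[0, target] - (logits + mask).max()
        opt.zero_grad()
        (-margin).backward()  # ascend the margin
        opt.step()
    return margin.item()

model = resnet18(num_classes=10).eval()  # placeholder; load real weights here
margins = torch.tensor([max_margin_for_class(model, c) for c in range(10)])

# A backdoor target class tends to reach an abnormally large maximized margin;
# compare each class against the statistics of the remaining classes.
for c in range(10):
    rest = torch.cat([margins[:c], margins[c + 1:]])
    z = (margins[c] - rest.mean()) / (rest.std() + 1e-8)
    p = 1.0 - stats.norm.cdf(z.item())
    print(f"class {c}: margin={margins[c].item():.3f}  p-value={p:.4f}")

The real implementation’s per-class eigenvalue scores and calibrated hypothesis test are richer than this, but the overall shape is the same: per-class scores, a p-value, and a deterministic verdict.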
- STRIP (Strong Intentional Perturbation)
STRIP is a black-box defense: it does not rely on internal architectural details. Instead, it measures the prediction entropy when the model is shown heavily perturbed variants of the same input. On a backdoored model, a trigger-carrying input typically keeps abnormally low entropy under perturbation, because the backdoor forces the output toward the attacker’s target class.
Our implementation includes:
• entropy computation on perturbed samples
• sampling and perturbation utilities
• summary metrics (mean, min, max entropy)
• integration into a unified reporting schema
Example invocation in our tool:
mithridatium detect --defense strip --model model.pth --data cifar10 --arch resnet18
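For intuition, here is a compact sketch of the STRIP scoring step, again not Mithridatium’s code: the suspect input is blended with randomly chosen clean images, and the entropy of the softmax output is summarized over the blends. The blend weight alpha and the number of perturbations are assumptions.

import torch
import torch.nn.functional as F

def strip_entropy(model, x, clean_batch, n_perturb=32, alpha=0.5):
    # Superimpose the suspect image x (C, H, W) onto randomly drawn clean images
    # and compute the prediction entropy of each blend.
    idx = torch.randint(0, clean_batch.size(0), (n_perturb,))
    blends = alpha * x.unsqueeze(0) + (1 - alpha) * clean_batch[idx]
    with torch.no_grad():
        probs = F.softmax(model(blends), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    # Abnormally low mean entropy suggests a trigger that survives blending.
    return entropy.mean().item(), entropy.min().item(), entropy.max().item()

The mean, min, and max returned here mirror the summary metrics listed above.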
Recent Advancements
The most recent development cycle added the following enhancements:
- STRIP Core Utility
A modular implementation inside defenses/strip.py that handles entropy scoring, perturbation generation, and device-safe execution (CPU/MPS/CUDA); a minimal device-selection sketch appears at the end of this section.
- CLI Integration
STRIP can now be invoked just like MMBD, with unified reporting and JSON output.
- Output Schema Normalization
We’re standardizing all defenses on a single report format so that results from MMBD, STRIP, and future defenses can be consumed by the same downstream tooling.
- End-to-End CLI Tests
Full test coverage ensures STRIP runs cleanly when the CLI is invoked as a subprocess, without crashes.
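An end-to-end CLI test of the kind described above can be written with pytest and subprocess. The sketch below is illustrative rather than the project’s actual test: it assumes the mithridatium console script is on PATH and that model.pth and the dataset are available locally.

import subprocess

def test_strip_cli_runs():
    # Invoke the CLI exactly as a user would and fail on a non-zero exit code.
    result = subprocess.run(
        ["mithridatium", "detect", "--defense", "strip",
         "--model", "model.pth", "--data", "cifar10", "--arch", "resnet18"],
        capture_output=True, text=True,
    )
    assert result.returncode == 0, result.stderr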
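As for the device-safe execution mentioned under STRIP Core Utility, the sketch below shows one common fallback chain (CUDA, then Apple MPS, then CPU). It is illustrative and not necessarily the helper used in defenses/strip.py; it also assumes a PyTorch build recent enough to expose torch.backends.mps.

import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple's MPS backend, then plain CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")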
What’s Next
With the major defenses complete, the remaining work is focused on:
• improving documentation
• adding developer notes
• refining report summaries
• strengthening validation and error messaging
We’re not adding new defenses until next year; instead, we’re polishing the tool so it is maintainable and accessible to new contributors.
Try it Yourself
The project is open-source and available here:
Contributions, issues, and feedback are welcome.
If you’re working with pretrained models, whether for research, deployment, or security, you should not assume integrity. Mithridatium helps you verify it. Detailed explanations, defense theory, and usage examples are in the repository’s README.