Modern machine learning workflows rely heavily on pretrained models downloaded from GitHub, Hugging Face, and countless other model hubs. This convenience comes with a growing risk: model tampering, data poisoning, and hidden backdoors embedded in .pth checkpoints.
To address this problem, we built Mithridatium, a lightweight open-source framework designed to verify the integrity of pretrained neural networks before they enter production or research pipelines.
Why Mithridatium?
Today’s ML ecosystem assumes that pretrained models are safe. In reality, the model file itself can be a silent attack vector:
• poisoned training data
• hidden triggers that activate under specific inputs
• manipulated weights
• malformed checkpoints that cause unexpected runtime behavior
Mithridatium provides a command-line workflow to evaluate these risks through model-centric defenses inspired by academic research but simplified for real-world use.
Offline Usage
Once installed, Mithridatium can run entirely offline.
You only need:
1. Your .pth model file
2. A local dataset directory (optional for STRIP; required for MMBD in some configurations)
This makes the tool suitable for restricted environments, air-gapped machines, or secure internal ML pipelines.
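Before pointing any defense at a checkpoint, it can also be worth confirming offline that the file deserializes at all. The snippet below is a minimal illustrative sanity check in plain PyTorch, independent of Mithridatium’s own loading code; the model.pth path is a placeholder, and weights_only requires a reasonably recent PyTorch release.

import torch

# Load the checkpoint on the CPU only; weights_only=True refuses to unpickle
# arbitrary Python objects, which narrows the attack surface of a hostile file.
state = torch.load("model.pth", map_location="cpu", weights_only=True)
print(f"loaded {len(state)} entries, e.g. {next(iter(state))}")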
Installation
Install from PyPI:
pip install mithridatium
Upgrade to the latest release:
pip install --upgrade mithridatium
Implemented Defenses
- MMBD (Maximum Mean Backdoor Detection)
MMBD evaluates synthetic class-optimized images to detect anomalous activation patterns commonly associated with backdoored models.
The implementation exposes:
• per-class eigenvalue scores
• normalized anomaly distributions
• classical hypothesis testing (p-value)
• a deterministic verdict
Example invocation in our tool:
mithridatium detect --model model.pth --defense mmbd --arch resnet18 --data cifar10
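To make the idea concrete, here is a heavily simplified sketch of an MMBD-style statistic; it is not Mithridatium’s implementation. For each class it gradient-ascends a synthetic input to maximize the margin of that class’s logit over all others, then flags classes whose maximized margin is an outlier under a simple leave-one-out z-test. The untrained resnet18, the 10-class setup, and the step counts are placeholder assumptions.

import torch
from torchvision.models import resnet18
from scipy import stats

def max_margin_for_class(model, target, steps=200, lr=0.1, shape=(1, 3, 32, 32)):
    # Gradient-ascend a synthetic input so the target logit dominates the rest.
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = model(x)
        mask = torch.zeros_like(logits)
        mask[0, target] = float("-inf")  # exclude the target from the max
        margin = logits[0, target] - (logits + mask).max()
        opt.zero_grad()
        (-margin).backward()  # ascend the margin
        opt.step()
    return margin.item()

model = resnet18(num_classes=10).eval()  # placeholder; load real weights here
margins = torch.tensor([max_margin_for_class(model, c) for c in range(10)])

# A backdoor target class tends to reach an abnormally large maximized margin;
# compare each class against the statistics of the remaining classes.
for c in range(10):
    rest = torch.cat([margins[:c], margins[c + 1:]])
    z = (margins[c] - rest.mean()) / (rest.std() + 1e-8)
    p = 1.0 - stats.norm.cdf(z.item())
    print(f"class {c}: margin={margins[c].item():.3f}  p-value={p:.4f}")

The real implementation’s per-class eigenvalue scores and calibrated hypothesis test are richer than this, but the overall shape is the same: per-class scores, a p-value, and a deterministic verdict.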
- STRIP (Strong Intentional Perturbation)
STRIP is a black-box defense: it does not rely on internal architectural details. Instead, it measures the prediction entropy when the model is shown heavily perturbed variants of the same input. On a backdoored model, a trigger-carrying input typically keeps abnormally low entropy under perturbation, because the backdoor forces the output toward the attacker’s target class.
Our implementation includes:
• entropy computation on perturbed samples
• sampling and perturbation utilities
• summary metrics (mean, min, max entropy)
• integration into a unified reporting schema
Example invocation in our tool:
mithridatium detect --defense strip --model model.pth --data cifar10 --arch resnet18
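For intuition, here is a compact sketch of the STRIP scoring step, again not Mithridatium’s code: the suspect input is blended with randomly chosen clean images, and the entropy of the softmax output is summarized over the blends. The blend weight alpha and the number of perturbations are assumptions.

import torch
import torch.nn.functional as F

def strip_entropy(model, x, clean_batch, n_perturb=32, alpha=0.5):
    # Superimpose the suspect image x (C, H, W) onto randomly drawn clean images
    # and compute the prediction entropy of each blend.
    idx = torch.randint(0, clean_batch.size(0), (n_perturb,))
    blends = alpha * x.unsqueeze(0) + (1 - alpha) * clean_batch[idx]
    with torch.no_grad():
        probs = F.softmax(model(blends), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    # Abnormally low mean entropy suggests a trigger that survives blending.
    return entropy.mean().item(), entropy.min().item(), entropy.max().item()

The mean, min, and max returned here mirror the summary metrics listed above.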
Recent Advancements
The most recent development cycle added the following enhancements:
- STRIP Core Utility
A modular implementation inside defenses/strip.py that handles entropy scoring, perturbation generation, and device-safe execution (CPU/MPS/CUDA); a minimal device-selection sketch appears at the end of this section.
- CLI Integration
STRIP can now be invoked just like MMBD, with unified reporting and JSON output.
- Output Schema Normalization
We’re standardizing all defenses on a single report format so that results from MMBD, STRIP, and future defenses can be consumed by the same downstream tooling.
- End-to-End CLI Tests
Full test coverage ensures STRIP runs cleanly when the CLI is invoked as a subprocess, without crashes.
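An end-to-end CLI test of the kind described above can be written with pytest and subprocess. The sketch below is illustrative rather than the project’s actual test: it assumes the mithridatium console script is on PATH and that model.pth and the dataset are available locally.

import subprocess

def test_strip_cli_runs():
    # Invoke the CLI exactly as a user would and fail on a non-zero exit code.
    result = subprocess.run(
        ["mithridatium", "detect", "--defense", "strip",
         "--model", "model.pth", "--data", "cifar10", "--arch", "resnet18"],
        capture_output=True, text=True,
    )
    assert result.returncode == 0, result.stderr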
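As for the device-safe execution mentioned under STRIP Core Utility, the sketch below shows one common fallback chain (CUDA, then Apple MPS, then CPU). It is illustrative and not necessarily the helper used in defenses/strip.py; it also assumes a PyTorch build recent enough to expose torch.backends.mps.

import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple's MPS backend, then plain CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")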
What’s Next
With the major defenses complete, the remaining work is focused on:
• improving documentation
• adding developer notes
• refining report summaries
• strengthening validation and error messaging
We’re not adding new defenses until next year; instead, we’re polishing the tool so it is maintainable and accessible to new contributors.
Try it Yourself
The project is open-source and available here:
Contributions, issues, and feedback are welcome.
If you’re working with pretrained models, whether for research, deployment, or security, you should not assume integrity. Mithridatium helps you verify it. Detailed explanations, defense theory, and usage examples are in the repository’s README.