A empirical study analyzing ~45,000 model repos across five major model-sharing platforms finds that custom model-loading code (the same code users run when they call from_pretrained with trust flags) is common, often executes arbitrary Python during initialization, and frequently contains security “smells” that enable remote code execution (RCE).
Why this matters:
Model hubs made ML development frictionless, but that convenience creates a supply-chain attack surface: importing a model can implicitly import and execute remote Python modules (tokenizer, modeling_*.py, hubconf, etc.), which run with the privileges of your process. That elevates risks from data leakage to full host compromise, especially in environments with mounted credentials or broad network egress.
Key takeaways:
• Prevalence: A measurable fraction of shared models rely on custom code that will be executed during loading, this is not rare in modern hubs.
• Concrete findings: Static analysis (Bandit, CodeQL, Semgrep) and signature scans (YARA) reveal widespread security smells and potential vulnerabilities in model repos.
• Platform gaps: Warning UIs, trust flags, and automated scanning are unevenly implemented, defaults and UX often favor convenience over safety.
• Developer confusion: Many maintainers and users misunderstand the implications of “trusting” remote code, creating a human-factor risk in addition to technical exposure.
Practical steps teams could take:
• Assume code executes. Treat any model repo that supplies Python modules as potentially executable until proven otherwise.
• Audit before load. Download (snapshot_download) and inspect tokenizer*.py, modeling_*.py, requirements.txt and any top-level scripts before importing.
• Pin revisions. Use commit hashes/tags; avoid main/latest.
• Isolate runtime. Load untrusted models only inside ephemeral, network-blocked containers/VMs with no mounted credentials, dropped privileges, and hardened kernel policies (seccomp/AppArmor).
• Static scan. Run lightweight checks (grep for subprocess, network/socket I/O, open("~/.aws"), use Semgrep/Bandit/CodeQL rules) as a gate.
•Prefer data-only tokenizers. When possible use tokenizer JSON/safetensors and trust_remote_code=False.
• Platform controls. Platform operators should strengthen warnings, enforce richer metadata, and provide opt-in gating for executable content.
Bottom line:
Model-sharing ecosystems are powerful but they are also code distribution platforms. Loading a model can be equivalent to running a third-party script on your host, treat it with that level of scrutiny.
Top comments (0)