The Novo Nordisk breach hit patient data, and the attackers claim proprietary AI models too. The attack surface is expanding.

#security #ai #devops #discuss

What happened

Novo Nordisk confirmed a cyberattack this week. The Danish pharmaceutical company behind Ozempic and Wegovy disclosed that attackers accessed internal IT systems and exfiltrated pseudonymized patient data from clinical trials.

The confirmed breach includes patient IDs, sex, year of birth, biomarkers, health data, immunogenicity data, and lifestyle factors like BMI and smoking status. No direct identifiers like names were exposed. Healthcare professional data was also hit, including names, registration numbers, emails, phone numbers, and office locations.

What the attackers are claiming

That's the official disclosure. The interesting part is what goes beyond it.

A group calling itself Dragonfly says it got significantly deeper than patient records. According to screenshots the group shared, the stolen data allegedly includes a 16.7 GB trained checkpoint for an internal multimodal AI model called NovoPert, which covers text, image, and transcriptomics data; a 407 MB proprietary biological and chemical training dataset; full source code for the training pipeline; 113 training runs with complete logs; internal infrastructure maps covering HPC clusters, Slurm configs, and SSH configurations; more than 53 GB of container images; and developer identities, internal hostnames, and a private GitHub repo URL.

Novo Nordisk hasn't confirmed any of the AI-related claims. No ransomware has been identified.

AI assets are now high-value targets

If the Dragonfly claims are accurate, the AI assets are arguably more valuable than the patient data. A proprietary model trained on biological and chemical data in the pharmaceutical space represents months or years of R&D investment, which makes it valuable to competitors or on the black market. This changes the threat model for any organization developing internal AI systems.

AI development infrastructure is an unaudited attack surface

HPC clusters, Slurm job schedulers, training pipelines, container registries, and model artifact stores are systems that most security teams weren't auditing five years ago, because most organizations didn't run them. Now they hold some of the most valuable intellectual property the company owns, and they're often configured by ML engineers and data scientists rather than security-focused infrastructure teams.
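
One small, concrete check a team could start with is looking for model artifacts and key material that any local user on a shared cluster can read. The sketch below is only an illustration: the function name `find_world_readable` and the suffix list are hypothetical, and it assumes a POSIX filesystem where the "other" read bit is meaningful. It is not a substitute for a real audit of ACLs, object-store policies, or scheduler configuration.

```python
import os
import stat

# Hypothetical suffixes for model artifacts and key material; adjust to your stack.
SENSITIVE_SUFFIXES = (".ckpt", ".pt", ".safetensors", ".pem", ".key")


def find_world_readable(root, suffixes=SENSITIVE_SUFFIXES):
    """Return sorted paths under `root` that match `suffixes` and are readable by 'other'."""
    exposed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            # lstat so that symlinks are inspected themselves, not followed.
            mode = os.lstat(path).st_mode
            if stat.S_ISREG(mode) and mode & stat.S_IROTH:
                exposed.append(path)
    return sorted(exposed)
```

Running something like `find_world_readable("/path/to/checkpoints")` on a shared training node gives a quick list of files worth tightening. The path here is a placeholder.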

Leaked container images are architecture blueprints

The container image leak (53 GB+) is potentially devastating beyond the model itself. Container images routinely contain embedded credentials, environment variables, internal network configurations, and dependency chains that reveal the entire stack. A single leaked production container image can give an attacker a blueprint of your internal architecture.
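
To make the embedded-credentials point concrete, here is a minimal sketch that flags environment variables with secret-like names in an image's configuration. It assumes the input is the JSON array printed by `docker image inspect`, where each entry has a `Config.Env` list of `KEY=VALUE` strings. The function name `find_suspicious_env` and the name patterns are hypothetical, and matching on names alone will miss secrets baked into files or image layers.

```python
import json
import re

# Hypothetical name patterns; tune these to your own naming conventions.
SECRET_NAME = re.compile(
    r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|PRIVATE_KEY|CREDENTIAL)",
    re.IGNORECASE,
)


def find_suspicious_env(inspect_json):
    """Return (image_id, variable_name) pairs for non-empty env vars with secret-like names."""
    findings = []
    for image in json.loads(inspect_json):
        image_id = image.get("Id", "<unknown>")
        config = image.get("Config") or {}
        for entry in config.get("Env") or []:
            name, _, value = entry.partition("=")
            # Only report variables that actually carry a value.
            if value and SECRET_NAME.search(name):
                findings.append((image_id, name))
    return findings
```

Feeding it the output of `docker image inspect <image>` before an image is pushed to a registry is one cheap way to catch the most obvious leaks. Only variable names are returned, so the findings themselves don't spread the secret values further.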

Developer identities enable follow-on attacks

The "developer identities and private GitHub repo URL" part is the quiet detail that enables what comes next. If attackers have developer identities and know where the code lives, targeted phishing and supply chain attacks against those specific developers become much easier to plan. Phishing a developer whose name, email, and repo access you already know is a very different proposition from attacking blind.

Why this matters beyond Novo Nordisk

The patient data breach is serious on its own. But if the AI asset claims are real, this is one of the first major breaches where proprietary AI intellectual property was the primary target, not a side effect. That's a shift that every team building internal AI systems should be paying attention to.

Source: cybersecuritynews.com

How is your team thinking about securing AI training infrastructure? Is it part of your threat model yet, or is it still treated as an internal research environment?