Open Source Malicious Packages: The Problem


This is the first episode in a series of articles about the most prevalent kind of software supply chain attack: those that (ab)use public registries of software components, which exist so that open-source projects can upload artifacts to share with other users.

When bad actors publish malicious software there, using the registry as a vehicle for malware distribution, a supply chain attack occurs once victim organizations install or run the infected software component.

To simplify the discussion, we will talk about software packages: components in packaged form produced by third parties. This includes not only components used by package managers like npm or Poetry, but also operating system components (libraries and executable binaries), container images, virtual machine images, and extensions for development, build, and deployment tools.

We have seen malicious packages everywhere. Cybercriminals are not picky: they welcome the many alternatives that modern software infrastructure offers and use whichever registry and tool best fits their intent. So please remember that “software packages” is shorthand for container images, binary packages, open-source repositories, and extensions or plugins of all sorts (IDEs, CI/CD systems, build tools). All are routinely under attack.

The Series: What to Expect

The series will have 5 episodes:

What Is the Problem With Open Source Packages?

The topic of this post: why are criminals of all kinds publishing malicious packages, and why should you be concerned?
Anatomy of Malicious Packages: What Are the Trends?

In this episode, we focus on the threat we monitor day after day with our MEW system. Against a loud background noise from the many malicious packages using typosquatting or dependency confusion, a smaller percentage of attacks are much more insidious and pose a greater risk. How has bad actors’ behavior around open source changed recently? What are the numbers? What tactics, techniques, and procedures are used, and what harmful actions have been seen?
Protecting Against Open Source Malicious Packages: What Does (Not) Work

Most security-aware professionals have ideas about how to handle this threat. In this episode, we will review why some common assumptions are wrong and which efforts and resources are really needed.
Open Source Malicious Packages: The Xygeni Approach

Here we present the strategy we follow at Xygeni for our Malware Early Warning (MEW) system: how the system works, how evidence is gathered and triaged, and how classification is improved.
Exploiting Open Source: What To Expect From The Bad Guys

The series closes with the newest tactics adversaries are adopting to make attacks stealthier and more targeted, and with how AI is being leveraged for advanced malware delivery.

Let’s open the stage with the first episode: What’s going on with malicious open source packages?

What Is the Problem With Open Source Packages?

In recent years, wrongdoers of all kinds have used open source software registries to deliver malicious behavior. These activities are as old as open source, but their frequency has exploded in the last three years.

Publishing malicious components into public registries (dependency-based attacks) is asymmetric guerrilla warfare that threat actors use to distribute malware, leveraging the trust that organizations put in open source components coming from unknown developers (remember the dependency xkcd comic?). Because you trust packages and do not manually review their contents and dependencies, these attacks are extraordinarily effective. The asymmetry comes from the fact that they can be largely automated, and the bad guys do not need to interact with the victim directly. They simply upload the package into the public registry and let it go.

Malicious packages surged sixfold in 2022 and grew another 2.5x in 2023. Last year, a whopping 245,000 malicious packages were seen, more than double the total of all previous years combined. This is exponential growth! Package removals for confirmed malware went from the hundreds in 2021 to the thousands in 2022; 2023 brought far more background “noise,” and this year continues at a similar pace. Hidden in that noise, produced by unsophisticated cybercriminals following the “path of least resistance,” a minority of high-profile attacks made headlines even in the general media.

Why is this a problem of such magnitude?

There is an excess of trust all along the chain. Open source software is distributed with its source code and released under a given license. Yes, anyone can inspect the source code, but who does, at scale? Who, after checking that the source contains no malware, builds the software from it? Who, before passing the packaged component (the package) downstream to the package manager or build tool, makes sure that it is malware-free and matches the supposed source code?
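Part of that last question is usually delegated to tooling. npm lockfiles, for example, pin each package with an `integrity` value in Subresource Integrity format (`sha512-` followed by the base64-encoded digest). The minimal sketch below recomputes such a value from a tarball's bytes; the helper names are hypothetical. Note what it does not prove: a match only shows the artifact is byte-for-byte what was pinned, not that it is malware-free or built from the published source.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an SRI-style integrity string: "sha512-" + base64(SHA-512 digest)."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_pinned(tarball: bytes, pinned_integrity: str) -> bool:
    """True if the downloaded bytes hash to the integrity value recorded earlier."""
    return sri_sha512(tarball) == pinned_integrity


if __name__ == "__main__":
    pinned = sri_sha512(b"cute-lib 1.0.0 tarball bytes")  # recorded at lock time
    print(matches_pinned(b"cute-lib 1.0.0 tarball bytes", pinned))  # True
    print(matches_pinned(b"tampered bytes", pinned))                # False
```

A malicious package published under its own name passes this check perfectly, which is why integrity pinning complements, rather than replaces, malware detection.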

Why does the infrastructure allow such easy attacks?

Package registries are open, often requiring minimal verification of the publisher’s identity: “Anyone is welcome to publish their software here!” The bar for attackers is low: they use disposable email addresses and disposable GitHub accounts to create hundreds of malicious packages in short, phishing-like campaigns. Only targeted attacks require more sophistication: we’ve seen even credible GitHub source repositories with many stars and commits from multiple fake contributors and ...

Package managers were designed for ease of use, not for security. They can run pre- and post-install scripts (sometimes needed, for example, to compile native code for a library). They also install packages from multiple sources, and sometimes the default is to use public registries. Nor did they check for mismatches between the metadata in the publish request and the metadata in the package itself.
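Both weaknesses can be checked mechanically before anything runs. The sketch below assumes you already have two plain dicts: the metadata the registry serves for a version and the `package.json` extracted from its tarball (fetching them is out of scope, and `audit_manifest` is a hypothetical name). It flags npm's install-time lifecycle hooks (`preinstall`, `install`, `postinstall`) and any compared field where the two manifests disagree.

```python
# npm lifecycle hooks that run automatically during `npm install`.
LIFECYCLE_HOOKS = ("preinstall", "install", "postinstall")
# Fields compared between registry metadata and the tarball's package.json.
COMPARED_FIELDS = ("name", "version", "scripts", "dependencies")


def audit_manifest(registry_meta: dict, tarball_pkg_json: dict) -> list[str]:
    """Flag install-time scripts and registry/tarball manifest mismatches."""
    findings = []
    scripts = tarball_pkg_json.get("scripts") or {}
    for hook in LIFECYCLE_HOOKS:
        if hook in scripts:
            findings.append(f"install-time script '{hook}': {scripts[hook]}")
    for field in COMPARED_FIELDS:
        if registry_meta.get(field) != tarball_pkg_json.get(field):
            findings.append(f"metadata mismatch in '{field}'")
    return findings


if __name__ == "__main__":
    registry = {"name": "cute-lib", "version": "1.0.0", "dependencies": {}}
    tarball = {
        "name": "cute-lib",
        "version": "1.0.0",
        "dependencies": {},
        "scripts": {"postinstall": "node setup.js"},
    }
    for finding in audit_manifest(registry, tarball):
        print(finding)
```

The example prints two findings: the `postinstall` hook, and a `scripts` mismatch because the registry metadata never declared that script.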

Dependencies are nested and form a graph. In certain ecosystems like Node (JavaScript), small-grained dependencies accumulate in the hundreds or thousands. Keeping strict control over the direct dependencies declared by your software projects is one thing; transitive dependencies are much harder to control. Open source has followed a “the friends of my friends are my friends” logic. Brotherhood is the norm in the wild Far East! Threat actors know this and hide malicious behavior deep in obscure dependencies that are...
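The gap between what a project declares and what it actually installs is a graph reachability question. This minimal sketch walks a dependency graph breadth-first from the direct dependencies; the package names and graph are hypothetical, and real resolvers also deal with versions and duplicates that are ignored here.

```python
from collections import deque


def transitive_closure(graph: dict[str, list[str]], direct: list[str]) -> set[str]:
    """Breadth-first walk: every package reachable from the direct dependencies."""
    seen: set[str] = set()
    queue = deque(direct)
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # already visited (shared or cyclic dependency)
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


if __name__ == "__main__":
    # Hypothetical graph: package -> its declared dependencies.
    graph = {
        "http-lib": ["router", "parser"],
        "parser": ["encoding", "query"],
        "query": ["deep-helper"],
        "logger": ["color-util"],
    }
    direct = ["http-lib", "logger"]
    installed = transitive_closure(graph, direct)
    print(len(installed), sorted(installed - set(direct)))
```

Two declared dependencies pull in eight packages; the six the project never named, such as `deep-helper`, are exactly where a malicious payload can hide.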

This is how open source software has worked since its inception, and it will not change much. At best, some package registries require two-factor authentication, often only for the most popular packages. Some registries provide scopes (namespaces owned by a vetted organization), but others unfortunately do not support them (like PyPI) or make them optional (like npm).

Notably, even a simple screening scheme (proving control of the DNS domain or the GitHub repository/organization that matches the group ID), combined with mandatory PGP signatures for all artifacts except checksums, removes most of the “noise” from typosquatting-like malicious packages and limits much of dependency confusion. Sophisticated attacks remain possible but are much harder, with only a few known for Maven Central, such as com.github.codingandcoding:maven-compiler-plugin. And not all Maven registries follow...
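To make the screening idea concrete, the sketch below derives what a publisher would have to prove control of for a given group ID. The mapping rules are a simplified assumption for illustration, not Maven Central's actual policy (for instance, it ignores multi-label suffixes such as `co.uk`), and `namespace_claim` is a hypothetical name.

```python
def namespace_claim(group_id: str) -> tuple[str, str]:
    """Map a Maven group ID to what a publisher would have to prove control of.

    Simplified, hypothetical rules: io.github.<user> / com.github.<user> map to a
    GitHub account; otherwise the reversed first two labels map to a DNS domain.
    """
    labels = group_id.split(".")
    if len(labels) >= 3 and labels[0] in ("io", "com") and labels[1] == "github":
        return ("github-account", labels[2])
    if len(labels) < 2:
        raise ValueError(f"group ID cannot be mapped to a domain: {group_id!r}")
    return ("dns-domain", f"{labels[1]}.{labels[0]}")


if __name__ == "__main__":
    print(namespace_claim("org.apache.maven.plugins"))    # ('dns-domain', 'apache.org')
    print(namespace_claim("com.github.codingandcoding"))  # ('github-account', 'codingandcoding')
```

The second example shows why sophisticated attacks survive such screening: whoever really owns the `codingandcoding` GitHub account passes the check, and the deception lies in the artifact ID, `maven-compiler-plugin`, which imitates the genuine Apache Maven plugin published under `org.apache.maven.plugins`.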

Security controls on package managers may slow down dependency attacks, but they do not prevent them. The problem with multi-factor authentication is automation: accounts generate derived credentials such as access tokens or API keys for API calls made from scripts, and no interactive user is there to provide a second factor.

MFA is good for protecting user accounts from password leaks, but generated access tokens and API keys must be protected while active, or adversaries will impersonate their owners. A large fraction of package-based supply chain campaigns start with a leaked key or token. Just remember incidents like Ledger, 3CX, and many more, where non-interactive credentials were exfiltrated in a preliminary intrusion and then used to launch the supply chain attack.
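Such tokens often sit in plain text in configuration files, exactly where an install-time script can read them. As a minimal hygiene check (not a secret scanner), the sketch below lists the registries in an `.npmrc` whose `_authToken` is written literally instead of being referenced as `${ENV_VAR}`, a substitution npm supports in `.npmrc` files. The function name and the sample registries are hypothetical.

```python
import re

# Matches lines such as "//registry.npmjs.org/:_authToken=VALUE".
AUTH_LINE = re.compile(r"^\s*(?P<registry>//\S+?):_authToken\s*=\s*(?P<value>\S+)\s*$")


def plaintext_token_registries(npmrc_text: str) -> list[str]:
    """List registries whose _authToken is written literally in the file.

    Values of the form ${ENV_VAR} are skipped: npm expands them from the
    environment, so the secret itself is not stored on disk.
    """
    exposed = []
    for line in npmrc_text.splitlines():
        match = AUTH_LINE.match(line)
        if match and not match.group("value").startswith("${"):
            exposed.append(match.group("registry"))
    return exposed


if __name__ == "__main__":
    sample = (
        "registry=https://registry.npmjs.org/\n"
        "//registry.npmjs.org/:_authToken=${NPM_TOKEN}\n"
        "//npm.internal.example:4873/:_authToken=abc123\n"
    )
    print(plaintext_token_registries(sample))  # ['//npm.internal.example:4873/']
```

An environment variable keeps the secret off disk, but any script running in that same environment can still read it, so short-lived, narrowly scoped tokens matter just as much.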

The response was not robust enough

So far, the response to this threat has not been robust enough. In the third episode, we will focus on what worked and what failed miserably. The industry needs to work collectively on standards, processes, education, and tooling to mitigate risks to global supply chains. No single organization can solve this problem on its own.

To close this section, let’s clear up a crucial misunderstanding: we are talking about malicious packages, not vulnerable ones. Vulnerabilities come from design or coding errors, introduced accidentally and without bad intent. Vulnerabilities may be exploited, but many never are.

Malicious packages are always intentional, and they are 100% exploitable once executed. The risks are not comparable! Hence it is paradoxical to see how much effort goes into detecting and mitigating vulnerabilities, and how little into equivalent measures for malicious components.

“We take security seriously”

Let’s imagine the customary Acme Corporation. Acme, a major provider for WileCoyote.com, gets most of its software from third parties, more than 80% of it from open source projects. They produce software for internal use, but also for their partners, providers, and customers/end users.

Acme has software written in Go, JavaScript, Java, C#, and Python, and runs most of it in the cloud on Kubernetes clusters. Acme builds its custom images from base images taken from Docker Hub and other registries, and it publishes a few libraries, packages, and container images in public registries as well.

Acme takes security seriously. They are well aware of open source security problems and the risks they carry. All developers, system managers, and DevOps engineers use those cute little hardware security keys for second-factor authentication. All commits to code repos are signed, branch protection is enabled with mandatory code reviews, the CI/CD pipeline is locked down, secrets are stored in a vault, and an internal registry partially mirrors external registries, storing only allowlisted components.

Most organizations probably fit this profile. Dear reader, if you are still reading, yours certainly does, doesn’t it?

Then one ill-fated day, an important frontend developer at Acme ran npm install acme-cute-lib, forgetting that @acme/cute-lib was the right scoped dependency. The exact mistake is not important: many things may go wrong even when one assumes perfect control of the software lifecycle. Our developer did not know that an APT group targeting Acme had published a malicious component under that name, crafted so that the malicious behavior activated only when the software was installed on Acme computers. The package went undetected for weeks after its publication.

Its installation script searched for credentials (there were many juicy access tokens on our developer’s laptop), gaining access to internal software repositories and to the aforementioned internal registry, which of course was only reachable via VPN. The malicious code used the existing VPN connection to publish a second-stage malicious component into the internal registry, compromising a common utils library shared by most of the software Acme delivered.

Weeks later, other organizations using Acme’s published tools started seeing strange traffic on their networks: it used Acme’s protocol but was directed to hosts whose names resembled Acme’s domain. The traffic was encrypted, but system monitoring tools detected access to unexpected files and the execution of processes that looked like system commands but ended up running downloaded executables.

The rest is history. Acme first denied that the behavior was attributable to them, insisting that all security measures were in place. Only after the cybersecurity media started asking why the detected behavior originated from Acme’s components, and security analysts published how riddled those components were with stealthy malware, did Acme acknowledge the incident and call in an incident response firm. The bad press undid hard-earned confidence in an instant. “Acme was one npm install away from disaster” was a common headline. Lawsuits and canceled contracts followed.
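Acme’s story starts with a single mistaken name: `acme-cute-lib` instead of `@acme/cute-lib`. A check that runs before installation can catch that class of slip. The sketch below, with hypothetical helper names and a hypothetical allowlist of internal packages, normalizes names by dropping scope and separator characters, then flags any requested package within a small edit distance of an internal one. Edit distance is one common heuristic for typosquatting, not a complete defense, and the threshold is an assumption to tune.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and drop scope/separator characters: "@acme/cute-lib" -> "acmecutelib"."""
    return re.sub(r"[@/._-]", "", name.lower())


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def lookalikes(candidate: str, internal: list[str], max_distance: int = 1) -> list[str]:
    """Internal package names that `candidate` imitates without being one of them."""
    if candidate in internal:
        return []
    target = normalize(candidate)
    return [name for name in internal
            if levenshtein(target, normalize(name)) <= max_distance]


if __name__ == "__main__":
    internal = ["@acme/cute-lib", "@acme/utils"]  # hypothetical allowlist
    print(lookalikes("acme-cute-lib", internal))   # ['@acme/cute-lib']
    print(lookalikes("@acme/cute-lib", internal))  # []
```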

Why poisoned packages are so popular

This hypothetical incident shows that even with a reasonable approach to open source security, organizations need specific measures to avoid falling prey to malware in open source components. Schematically, the threat actor can:

- Create a new package (typosquatting or dependency confusion; the most common by volume).
- Try to infect an existing one:
  - Injecting code via pull requests.
  - Using social engineering to become a maintainer (e.g. right9ctrl or “Jia Tan”).
  - Gaining credentials and impersonating maintainers.
- Inject malware during the build (via malicious scripts or MITM on downloads).
- Directly upload a poisoned artifact using stolen registry credentials.
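Dependency confusion works because some resolvers merge several sources and simply prefer the highest version; pip, for instance, treats indexes added with `--extra-index-url` as equal peers and installs the best match among them. The toy resolver below (registry labels, package name, and versions are all hypothetical) contrasts that behavior with one that pins names known to be internal to the internal registry, roughly the idea behind mapping an npm scope to a private registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    registry: str                  # where this version was found
    name: str
    version: tuple[int, int, int]  # (major, minor, patch)


def naive_resolve(candidates: list[Candidate]) -> Candidate:
    """Highest version wins, whichever registry served it."""
    return max(candidates, key=lambda c: c.version)


def pinned_resolve(candidates: list[Candidate], internal_names: set[str],
                   internal_registry: str = "internal") -> Candidate:
    """Names known to be internal may only resolve from the internal registry.

    Assumes a non-empty list of candidates that all share one package name.
    """
    name = candidates[0].name
    if name in internal_names:
        candidates = [c for c in candidates if c.registry == internal_registry]
        if not candidates:
            raise LookupError(f"{name} not found in {internal_registry}")
    return max(candidates, key=lambda c: c.version)


if __name__ == "__main__":
    found = [
        Candidate("internal", "acme-utils", (1, 4, 2)),
        Candidate("public", "acme-utils", (99, 0, 0)),  # attacker's upload
    ]
    print(naive_resolve(found).registry)                   # public
    print(pinned_resolve(found, {"acme-utils"}).registry)  # internal
```

The attacker only has to guess an internal name and publish it publicly with an absurdly high version number; the naive resolver does the rest.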

Poisoning registries with malware is the basis for dependency attacks. Nothing new under the sun: prevalence has exploded, but the techniques that worked five years ago still work now.

Source: Backstabber’s Knife Collection