If you are a Python developer then you are aware of pip. However, were you aware of the potential malware threat associated with Python's recommended package-management system? This article will discuss the security threats associated with pip and what you can do to protect yourself against them.
What is pip?
Package Installer for Python (pip) is the de facto and recommended package-management system for Python. It is itself written in Python and is used to install and manage software packages. It connects to an online repository of public packages called the Python Package Index (PyPI). For example, let's say you want to install the requests module. You would use the following syntax:
pip install requests
This command will download the requests package from PyPI and install it into your local Python environment, allowing you to utilize its functionality.
In general practice, Python developers upload secure and ethical code to the PyPI repository. However, you may be surprised to learn that there are no third-party checks on the code that is uploaded to PyPI. The only restriction is that once a package name exists, only its maintainer(s) can upload packages with that name, meaning you can't submit a package under an already established name.
Unfortunately, this restriction protects only exact names, so it can be sidestepped. In 2016, research showed that PyPI could be exploited through typosquatting. The researcher uploaded some harmless "simulation malware" to PyPI under names that were misspelled versions of popular package names, in order to collect data on how often these misspelled packages were installed. Had a script kiddie or black-hat hacker done the same thing, they could have used a far more malicious script.
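To make the idea of a misspelled package name concrete, here is a minimal sketch that generates the single-keystroke mistakes a typosquatter might register. The function name `typo_variants` is hypothetical, and the three edit types (dropped, substituted, and swapped characters) are an assumption about common typing errors, not the method used in the 2016 research.

```python
import string


def typo_variants(name: str) -> set:
    """Return misspellings of `name` that are one typing mistake away.

    Illustrative only: covers a dropped character, a substituted
    character, and two adjacent characters typed in the wrong order.
    """
    variants = set()
    for i in range(len(name)):
        # Dropped character, e.g. "requests" -> "request"
        variants.add(name[:i] + name[i + 1:])
        # Substituted character, e.g. "requests" -> "requesys"
        for ch in string.ascii_lowercase:
            variants.add(name[:i] + ch + name[i + 1:])
    for i in range(len(name) - 1):
        # Adjacent characters swapped, e.g. "requests" -> "reqeusts"
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # The correctly spelled name is not a typo of itself.
    variants.discard(name)
    return variants


if __name__ == "__main__":
    candidates = typo_variants("requests")
    print(len(candidates), "possible one-mistake misspellings")
```

Even this small set of rules produces a few hundred candidate names for a single eight-letter package, which is why every popular library is a potential target.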
Malware on PyPI
On July 28, 2022, researchers at Sonatype discovered malicious code on PyPI. The packages, including "requesys" and "requesr," are all typosquats of "requests," a legitimate and widely used HTTP library for Python. Sonatype immediately reported this incident to PyPI's administrators, and two of the packages have since been removed.
According to the researchers at Sonatype, one of the packages (requesys) was downloaded about 258 times, presumably by developers who made typographical errors when attempting to download the real "requests" package. One version of the requesys package contained the encryption and decryption code in plaintext Python, but a subsequent version contained a Base64-obfuscated executable that made analysis a little harder, according to Sonatype.
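As a rough illustration of how you might triage an unfamiliar package's source before running it, the sketch below walks a file's syntax tree and reports calls that often appear alongside obfuscated payloads. The function name `suspicious_calls` and the list of call names are assumptions made for this example. It is a generic heuristic, not the analysis Sonatype performed, and legitimate code uses these calls too.

```python
import ast

# Call names that often accompany obfuscated payloads (illustrative list).
SUSPICIOUS_CALLS = {"exec", "eval", "b64decode"}


def suspicious_calls(source: str) -> list:
    """Return the sorted names of suspicious calls found in Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            name = func.id  # plain call such as exec(...)
        else:
            # attribute call such as base64.b64decode(...); None otherwise
            name = getattr(func, "attr", None)
        if name in SUSPICIOUS_CALLS:
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = "import base64\nexec(base64.b64decode('cHJpbnQoMSk='))\n"
    print(suspicious_calls(sample))
```

A hit is only a prompt to read the code more closely. Parsing the source this way never executes it, which matters when the file may be hostile.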
Nothing harmful found
According to Sonatype, developers who ended up with their system encrypted received a pop-up message instructing them to contact the author of the package for the decryption key. Victims were able to obtain the decryption key without having to pay for it, which, according to Sonatype, "makes this case more of a gray area rather than outright malicious activity."
Information on the hacker's Discord channel shows that at least 15 victims had installed and run the package.
A Growing Threat
This event is one of an increasing number of recent occurrences where hackers have hidden harmful code in commonly used software repositories in an effort to lure developers into downloading and installing it in their environments. For instance, Sonatype discovered in May that 300 developers had mistakenly downloaded "Pymafka," a malicious package for disseminating Cobalt Strike, from the PyPI registry, thinking it was a popular and trustworthy Kafka client.
In July, researchers at Kaspersky discovered four information-stealing packages in the Node Package Manager (npm) repository.
Hopefully, after reading this article, you now realize why it is important to pay close attention to what you download from public code repositories such as PyPI. Security researchers state that organizations must pay closer attention to their software supply chains, especially when it comes to using open source software from public repositories such as PyPI. Remember, as a Python developer, it is always your responsibility to ensure your packages are secure. Be very careful when typing out the names of popular libraries, as typosquatting is one of the most common methods for this exploitation.
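One way to catch a slip of the keyboard before it reaches pip is to compare the name you typed against the names you actually intend to use. The sketch below does this with the standard library's `difflib`. The `KNOWN_PACKAGES` list, the function name `likely_typosquat`, and the 0.8 similarity cutoff are all assumptions chosen for illustration, so adapt them to your own project.

```python
import difflib

# Packages this project is expected to use (illustrative list).
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "flask", "django"]


def likely_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None.

    An exact match is treated as safe. A near match suggests the name
    may be a misspelling and deserves a second look before installing.
    """
    name = name.lower()
    if name in known:
        return None
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ["requests", "requesys", "nunpy"]:
        match = likely_typosquat(candidate)
        if match:
            print(f"'{candidate}' looks like a misspelling of '{match}'")
        else:
            print(f"'{candidate}' raised no warning")
```

A check like this only knows about the names you list, so it complements careful typing rather than replacing it.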
Additionally, it behooves you to take preemptive measures to protect your files in your day-to-day operations. Use trusted antivirus and anti-malware software, use strong and secure passwords, always use secure internet connections, and always, always, always back up your data!
If you found this article helpful or have any questions please leave a comment.
Top comments (8)
Speaking of Sonatype, PyPI should take a lesson from how Sonatype manages the Maven Central Repository for Java artifacts. Every version of every package uploaded undergoes a vulnerability scan of its dependencies. It certainly doesn't guarantee that no malware will slip through. But it is a lot better than doing nothing.
Maven's coordinates system, where artifacts are also associated with what Maven calls a groupId, helps ensure you are downloading packages published by the right organization, individual, etc. The groupId is a reversed domain name whose ownership has been verified, so only the owner of example.com can publish to Maven Central with a groupId of com.example. To carry out a successful typosquat on Maven Central, the attacker would need to typosquat the domain name of the rightful publisher of the package, and then successfully get their typosquatted domain verified by Sonatype. The process for the latter involves a human giving you a ticket number to add to a DNS entry. That human in the loop provides at least some chance that someone will notice the typo-variation of a legitimate groupId. Again, it is no guarantee against such typosquatting, but it makes it more difficult.
Hi Vicent, thank you for your comments! Based on what you described, I agree, PyPI should model how Sonatype manages the Maven Central Repository for Java. Something needs to be done! That is why I am trying to raise awareness about this vulnerability. Thanks again for your feedback!
Aside from the more severe case of malware that you describe, there is also potential for accidentally importing a similarly named but different legitimate package on PyPI. Since PyPI requires package names to be globally unique across all of PyPI, you end up with slight variations of a name for packages that do similar things.
It's not the same attack, but it reminds me that, a couple of months ago, the CTX package was hijacked on PyPI.
Package hijacking is a serious threat even if platforms give advisories like this one.
The same platforms run regular checks and analysis to spot suspicious accounts and vulnerabilities, but hackers do not really care, as one update is usually enough to reach thousands of users, perhaps millions.
By the time the account gets suspended, the damage is done.
Thanks for your comments, I appreciate you sharing other examples of similar vulnerabilities. I've been programming in Python for a few years now, and I'm just recently hearing about this type of vulnerability. This type of vulnerability concerns me because of how easy it is to exploit. In my opinion, security threats like this should be addressed and patched asap, especially in a popular programming language such as Python.
You can only mitigate such threats. This is a major problem with the current software supply chain, which is becoming ever more complex.
There are services that can send notifications about vulnerabilities in the components you use. For example, alerts.vulmon.com/ (I do not know this service, I just wanted to include an example).