Valerie Kuzmina for JetBrains Qodana

Posted on Mar 10, 2023 • Originally published at blog.jetbrains.com

Secure Your PHP Code With Taint Analysis by Qodana


It only takes one user to exploit a vulnerability in your project and breach your system. To defend programs against malicious inputs from external users (known as “taints”), development teams add taint checking to their static analysis routines.

In this year’s first release, the Qodana team has delivered taint analysis for PHP in the EAP. The feature is available only in Qodana for PHP 2023.1 (jetbrains/qodana-php:2023.1-eap). Qodana for PHP was the first linter we released, so we decided to let PHP developers be the first to test our new security functionality, too. We plan on adding more languages in the future, after we’ve collected enough feedback.

Read on to learn more about what taint analysis is and how it works in Qodana.

GET STARTED WITH QODANA

What is taint analysis?

A taint is any value that poses a security risk because an external user can modify it. If your code contains a taint and unverified external data can spread through your program, attackers can use it to trigger SQL injection, arithmetic overflow, cross-site scripting, path traversal, and more. They usually exploit these vulnerabilities to damage the system, steal credentials and other data, or change the system’s behavior.

Example of a taint: arbitrary data from a GET parameter is displayed on the screen. Malicious users can exploit this vulnerability to, for example, tamper with your program’s layout.
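As a minimal sketch of this kind of taint (the function names and the `name` parameter are hypothetical, and a literal array stands in for `$_GET` so the snippet runs from the command line), compare echoing the raw value with escaping it first:

```php
<?php
// Hypothetical helpers for illustration; not taken from the original screenshot.

// Tainted: the raw request value is placed into the page unchanged.
function renderGreetingUnsafe(array $query): string
{
    $name = $query['name'] ?? 'guest';      // source: user-controlled GET parameter
    return '<p>Hello, ' . $name . '!</p>';  // reaches the output unescaped
}

// Safer: HTML-escape the value before it is written to the page.
function renderGreetingSafe(array $query): string
{
    $name = $query['name'] ?? 'guest';
    return '<p>Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '!</p>';
}

// In a real script this array would be $_GET.
$query = ['name' => '<script>alert(1)</script>'];

echo renderGreetingUnsafe($query), PHP_EOL; // user-supplied markup survives
echo renderGreetingSafe($query), PHP_EOL;   // markup is rendered as inert text
```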

As an extra layer of defense against malicious inputs, development teams execute taint analysis when they run a security audit on the program’s attack surface.

Taint analysis is the process of assessing the flow of untrusted user input throughout the body of a function or method. Its core goal is to determine if unanticipated input can affect program execution in malicious ways.

Taint sources are locations where a program gets access to potentially tainted data. Taint sinks are the sensitive points in a program, such as database queries or page output, where tainted input can do harm. Tainted data can be propagated from sources to sinks via function calls or assignments.

If you run taint analysis manually, you should spot all of the places where you accept data from external users and follow each piece of data through the system – the tainted data can be used in dozens of nodes. Then, to prevent taint propagation, you should take one of the two approaches described below:

Sanitize the data, i.e. transform it into a safe state, for example by removing HTML tags from it.

Validate the data, i.e. check that it conforms to a required pattern, for example that an $email variable holds a well-formed email address.
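The two approaches can be sketched as follows. This is one possible illustration using PHP's built-in `strip_tags()` and `filter_var()`; the wrapper function names are hypothetical, and the right sanitizer always depends on the sink the data is heading to.

```php
<?php
// Sanitizing: transform the input into a safe state by removing HTML tags.
function sanitizeComment(string $raw): string
{
    return strip_tags($raw);
}

// Validating: accept the input only if it matches the expected pattern.
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Tags are removed; the text between them stays.
echo sanitizeComment('Nice post! <script>alert(1)</script>'), PHP_EOL;

$email = 'user@example.com';
echo isValidEmail($email) ? 'valid' : 'rejected', PHP_EOL;
```

Stripping tags is enough for this illustration, but output that lands inside HTML attributes or SQL strings needs context-specific escaping instead.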

In other words, the taint analysis inspection traces user-tainted data from its source to your sinks, and raises the alarm when you work with that data without sanitizing or validating it.
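To make the source, propagation, sanitizer, and sink roles concrete, here is a toy runtime model of taint tracking. It is only an illustration: Qodana analyzes code statically and does not work this way internally, and every name in the snippet is hypothetical.

```php
<?php
// Toy model: each value carries a flag saying whether it is tainted.
final class Value
{
    public string $text;
    public bool $tainted;

    public function __construct(string $text, bool $tainted)
    {
        $this->text = $text;
        $this->tainted = $tainted;
    }

    // Propagation: the result is tainted if either operand is tainted.
    public function concat(Value $other): Value
    {
        return new Value($this->text . $other->text, $this->tainted || $other->tainted);
    }
}

// Source: anything coming from outside starts out tainted.
function fromUser(string $text): Value
{
    return new Value($text, true);
}

// Trusted text written by the developer.
function literal(string $text): Value
{
    return new Value($text, false);
}

// Sanitizer: escaping for HTML clears the taint flag.
function escapeHtml(Value $v): Value
{
    return new Value(htmlspecialchars($v->text, ENT_QUOTES, 'UTF-8'), false);
}

// Sink: refuses tainted data, mimicking the alarm a taint checker raises.
function emit(Value $v): string
{
    if ($v->tainted) {
        throw new RuntimeException('tainted data reached a sink');
    }
    return $v->text;
}

$input = fromUser('<i>hi</i>');
echo emit(literal('Hello, ')->concat(escapeHtml($input))), PHP_EOL; // passes the check

try {
    emit(literal('Hello, ')->concat($input)); // unsanitized data hits the sink
} catch (RuntimeException $e) {
    echo 'Alarm: ', $e->getMessage(), PHP_EOL;
}
```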

How taint analysis works in Qodana

Taint analysis is performed by Qodana for PHP starting from version 2023.1 EAP. This functionality includes an inspection that scans the code and highlights the taint and potential vulnerability, the ability to open the problem in PhpStorm to address it on the spot, and a dataflow graph visualizing the taint flow.

Example #1. SQL injection

Let’s take a look at an example of SQL injection and how Qodana detects it:

Here, Qodana shows us the following taints in the system_admin() function:

Markers 1-2: Data from user form input is retrieved from the $_POST global array with no sanitization or validation and is assigned to the variable $edit. This is a taint.

Marker 3: The tainted variable $edit is passed to the system_save_settings function as an argument without any proper sanitization.

Marker 4: Data from the $edit variable is now located in the $edit parameter.

Marker 5: The $edit variable is passed to foreach with the $filename key and the $status value. Both variables contain tainted data from the $edit variable.

Marker 6: The $filename key contains the tainted data from the $edit variable.

Marker 7: The $filename key is concatenated into an SQL string, which makes the string tainted.

Marker 8: The tainted SQL string propagates tainted data into an argument passed to db_query.

Let’s now look at the db_query function:

Marker 9: The tainted string will be located in the $query parameter.

Marker 10: This parameter is passed as an argument to the _db_query function.

Let’s move on to the _db_query function:

Marker 11: Tainted data is located in the first parameter, $query, of the _db_query function.

Marker 12: Data of the parameter is passed to the mysql_query function, which is a sink.

The data flow above shows how data moves from $_POST["edit"] to mysql_query($query) without any sanitization or validation. This allows an attacker to manipulate the SQL query, which was concatenated with a key of $_POST["edit"], and trigger SQL injection.
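The flow traced by markers 1-12 can be approximated with the simplified reconstruction below. It is a sketch under assumptions, not the code from the original screenshot: the SQL text, the table name, and the return values are invented for illustration, and a hypothetical runSql() stand-in replaces mysql_query(), which was removed in PHP 7.

```php
<?php
// Stand-in for mysql_query() (marker 12, the sink). A real sink would
// execute the query; this one returns it so the result can be inspected.
function runSql(string $query): string
{
    return $query;
}

function _db_query(string $query): string      // marker 11
{
    return runSql($query);                     // marker 12
}

function db_query(string $query): string       // marker 9
{
    return _db_query($query);                  // marker 10
}

function system_save_settings(array $edit): array   // marker 4
{
    $queries = [];
    foreach ($edit as $filename => $status) {        // marker 5 ($status is unused here)
        // Markers 6-8: the user-controlled key is concatenated into the SQL string.
        $queries[] = db_query(
            "UPDATE system SET status = 1 WHERE filename = '" . $filename . "'"
        );
    }
    return $queries;
}

function system_admin(array $post): array
{
    $edit = $post['edit'];                // markers 1-2: $_POST["edit"] in the original
    return system_save_settings($edit);   // marker 3
}

// An attacker controls the keys of the submitted form array.
$queries = system_admin(['edit' => ["x' OR '1'='1" => '1']]);
echo $queries[0], PHP_EOL;
```

The attacker-supplied key closes the quoted string and appends its own condition. Parameterized queries, such as PDO prepared statements, avoid this by keeping user data out of the SQL text.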

Qodana will spot these risks in your codebase along with all nodes where tainted data is used, so you can sanitize all tainted data in a timely manner.

Example #2. XSS problem

In the Qodana UI, you can see a graph that visualizes the entire taint flow. Here’s how Qodana visualizes an XSS vulnerability that has two sources, which are merged at marker 5.

Source 1

Markers 1-2: Data from the searchUpdate.pos file will be read and tainted data will be assigned to the $start variable.

Source 2

Markers 3-4: Data from files whose path is located in $posFile will be read and tainted data will be assigned to the $start variable.

Marker 5: A merged tainted state from all conditional branches in the $start variable will be passed as an argument to the doUpdateSearchIndex method.

Let’s look inside the doUpdateSearchIndex() method:

Markers 6-8: The $start parameter will contain tainted data on this dataflow slice and then it will be passed within a concatenated string as an argument to the output method.

Let’s look inside the output method:

Marker 9: Tainted data contained inside the transmitted string will be located in the $out parameter.

Marker 10: Data from the $out parameter will be passed to the print function without any sanitization. This function is a sink, so the flow results in an exploitable XSS vulnerability.

To exploit the vulnerability, an attacker can, for example, upload a shell script in place of the files expected at markers 1 and 2. Because the print function receives unsanitized data, the attacker can then put arbitrary content onto the web page.
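The flow behind markers 1-10 can be sketched roughly as follows. The function names follow the walkthrough above, but the branching condition, the message text, the file contents, and the optional escaping flag are assumptions added for illustration; the functions return the string instead of printing it so the result can be inspected.

```php
<?php
function output(string $out): string                  // marker 9
{
    return $out; // marker 10: the original prints $out here; print is the sink
}

function doUpdateSearchIndex(string $start, bool $escape = false): string // marker 6
{
    if ($escape) {
        // One possible fix: escape the tainted value before it joins the message.
        $start = htmlspecialchars($start, ENT_QUOTES, 'UTF-8');
    }
    return output('<p>Resuming at position ' . $start . '</p>'); // markers 7-8
}

function readStart(?string $posFile): string
{
    if ($posFile === null) {
        $start = (string) file_get_contents('searchUpdate.pos'); // markers 1-2 (source 1)
    } else {
        $start = (string) file_get_contents($posFile);           // markers 3-4 (source 2)
    }
    return $start; // both branches merge here; marker 5 passes this value on
}

// Simulate an attacker-supplied position file.
$posFile = tempnam(sys_get_temp_dir(), 'pos');
file_put_contents($posFile, '<script>alert(1)</script>');

$start = readStart($posFile);
echo doUpdateSearchIndex($start), PHP_EOL;        // script tag reaches the page
echo doUpdateSearchIndex($start, true), PHP_EOL;  // script tag is neutralized
unlink($posFile);
```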

Qodana will alert you to this vulnerability and give it a high priority so that you can resolve it as soon as possible and prevent an attack.

Conclusion

Taint analysis helps eliminate exploitable attack surfaces, so it’s an effective way to reduce risk to your software. To learn about taint analysis and Qodana in detail, explore the Qodana documentation.

Happy developing and keep your code healthy!
