The Peril of Unquoted Arguments
We often need to run commands in shells[1] using Python. `subprocess` is a cross-platform[2] Python module that helps you do this.
Shell commands are separated by command delimiters (such as `;`, `&&`, `||` and `\n` in POSIX[3]). Argument delimiters, on the other hand, define how a single command's arguments are split. On POSIX-compliant systems, unquoted spaces and tabs separate the words of a command, and the `IFS`[4] variable defines the characters used to split the results of unquoted expansions such as `$path`. By default, IFS is set to split on spaces, tabs and newlines.
When a path or argument to a command contains a space, the shell does not see a single continuous entity. It sees two distinct things. A single path has become two arguments.
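To see this splitting from Python, here is a small sketch using `shlex.split`, which tokenizes a string roughly the way a POSIX shell splits words. The command strings are illustrative only, and nothing is executed.

```python
import shlex

# shlex.split approximates POSIX shell word splitting; no command is run here.
unquoted = shlex.split("mkdir my new path")
quoted = shlex.split('mkdir "my new path"')

print(unquoted)  # ['mkdir', 'my', 'new', 'path']
print(quoted)    # ['mkdir', 'my new path']
```

Without quotes the path falls apart into three arguments. With quotes it stays whole.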
Let us begin with a simple example.
Part 1: The Space as a Separator
Consider the creation of a directory with a space in its name.
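A minimal sketch of such a call, assuming the directory name `my new path`. The temporary working directory is an addition here so that the example cleans up after itself.

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as workdir:
    os.mkdir(os.path.join(workdir, "my new path"))
    created = sorted(os.listdir(workdir))

print(created)  # ['my new path']
```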
When we use the `os` module, it treats the path as a single string because the shell is never asked to interpret it. The directory is created as intended, with a space in its name.
Now let us use the `subprocess` module with the `shell=True` flag. This flag instructs Python to pass our command to the shell for execution as a single string.
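A minimal sketch of that call, assuming a POSIX shell and the same `my new path` name. The temporary directory and the `cwd` argument are additions here to keep the experiment contained.

```python
import os
import subprocess
import tempfile

path = "my new path"

with tempfile.TemporaryDirectory() as workdir:
    # The whole string is handed to the shell, which splits it on spaces.
    subprocess.run(f"mkdir {path}", shell=True, cwd=workdir, check=True)
    created = sorted(os.listdir(workdir))

print(created)  # ['my', 'new', 'path']
```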
When the command string is built from the path `my new path`, the shell sees the command as `mkdir my new path`. The shell interprets this as a command to create a directory named `my`, a second directory named `new` and a third named `path`. The single path becomes three paths.
Part 2: The Command Injection
The danger is not only in misinterpretation but also in malicious injection.
Consider what happens if the path comes from an external source, such as a user-supplied argument.
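Here is a sketch of the problem, assuming a POSIX shell. The variable name `user_input` and its value are hypothetical, and a harmless `echo INJECTED` stands in for a destructive payload so that the example is safe to run.

```python
import os
import subprocess
import tempfile

# Hypothetical untrusted input; `echo INJECTED` stands in for a destructive command.
user_input = "my; echo INJECTED"

with tempfile.TemporaryDirectory() as workdir:
    result = subprocess.run(
        f"mkdir {user_input}",  # the shell receives: mkdir my; echo INJECTED
        shell=True,
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )
    created = sorted(os.listdir(workdir))

print(created)                # ['my']
print(result.stdout.strip())  # INJECTED
```

The directory `my` is created, and then the second command runs with the privileges of the Python process.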
With a destructive payload such as `my; rm -rf /`, the shell sees `mkdir my; rm -rf /`. The semicolon is a command separator in the shell. The shell will first execute `mkdir my` and then it will execute `rm -rf /`, which attempts to recursively delete everything under the root directory.
The unquoted path has allowed the user to inject a new command. This is a profound and dangerous failure of boundary. A simple space or semicolon can shatter the integrity of the system.
Part 3: The Principle of Quoting
To prevent this we must use quoting. Quoting places a protective barrier around the string, telling the shell to treat it as a single unit regardless of its contents.
The shell now sees `mkdir "my; rm -rf /"`. The entire string is treated as a single argument for `mkdir`. No extra directories are created and no injected commands are executed. The semicolon is rendered harmless, a mere character within the string.
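One way to apply this quoting from Python is `shlex.quote`, sketched below under the assumption of a POSIX shell. It wraps the value in single quotes rather than the double quotes shown above; hand-written double quotes would still let the shell expand `$(...)` and backticks. The `user_input` value is hypothetical, with a harmless `echo` as the payload.

```python
import os
import shlex
import subprocess
import tempfile

user_input = "my; echo INJECTED"  # hypothetical untrusted input

with tempfile.TemporaryDirectory() as workdir:
    # shlex.quote wraps the value in single quotes for a POSIX shell.
    command = f"mkdir {shlex.quote(user_input)}"
    result = subprocess.run(
        command, shell=True, cwd=workdir, capture_output=True, text=True, check=True
    )
    created = sorted(os.listdir(workdir))

print(command)        # mkdir 'my; echo INJECTED'
print(created)        # ['my; echo INJECTED']
print(result.stdout)  # empty: the echo never ran
```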
Part 4: The Path of Wisdom
The wise path is the one that avoids the shell entirely when it is not needed. Linters with security rules enabled (ruff, for example, through its flake8-bandit rule set) can flag uses of `shell=True` for you.
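A minimal sketch of the shell-free form, assuming a system with a `mkdir` executable on the PATH. The `user_input` value is hypothetical, and the temporary directory is an addition to keep the example contained.

```python
import os
import subprocess
import tempfile

user_input = "my; echo INJECTED"  # hypothetical untrusted input

with tempfile.TemporaryDirectory() as workdir:
    # No shell is involved: mkdir receives user_input as one argument.
    result = subprocess.run(
        ["mkdir", user_input], cwd=workdir, capture_output=True, text=True, check=True
    )
    created = sorted(os.listdir(workdir))

print(created)        # ['my; echo INJECTED']
print(result.stdout)  # empty: nothing was injected
```

One caveat: a value beginning with `-` would still be read by `mkdir` as an option, so commands that accept `--` to end option parsing can be given it before untrusted arguments.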
When we pass a list of arguments to `subprocess.run`, Python does not pass a single string to the shell. Instead it executes `mkdir` directly as a separate process and passes `user_input` as its first argument. The shell is never involved, so the risk of shell injection is eliminated.
This is the preferred way. It is clean and safe.
Part 5: A Real-World Command Injection (CVE-2024-9287)
Command injection is not just a theoretical problem. In 2024, a high-severity vulnerability was found to affect all Python versions <= 3.13:
A vulnerability has been found in the CPython `venv` module and CLI where path names provided when creating a virtual environment were not quoted properly, allowing the creator to inject commands into virtual environment "activation" scripts (ie "source venv/bin/activate"). This means that attacker-controlled virtual environments are able to run commands when the virtual environment is activated. Virtual environments which are not created by an attacker or which aren't activated before being used (ie "./venv/bin/python") are not affected.
Previously, when a virtual environment was created, the activation scripts (`activate`, `activate.bat`, etc.) would use the environment name provided by the user to construct the `venv` path without quoting it properly.
For example, a name like `my test venv` would be written into the script as `/home/user/my test venv`. An attacker could craft a virtual environment name like `my-venv-with-space-and-command; malicious_command`, which would be interpreted by the shell as a path followed by an additional command to execute.
The fix[5] was to ensure that all paths written into the `venv` activation scripts are properly quoted using the `shlex` module. It uses `shlex.quote` so that any special characters or spaces in the path are enclosed in single quotes, with embedded single quotes escaped. This prevents the shell from misinterpreting the path as separate arguments or commands.
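The idea behind the fix can be sketched as follows. This is an illustration only, not the actual CPython patch: the one-line template, the `render` helper and the attacker-chosen path are all hypothetical.

```python
import shlex

# Hypothetical one-line template; not the real contents of an activate script.
TEMPLATE = "VIRTUAL_ENV=__VENV_DIR__"


def render(template: str, venv_dir: str, quote: bool) -> str:
    """Substitute the venv path into the template, optionally shell-quoting it."""
    value = shlex.quote(venv_dir) if quote else venv_dir
    return template.replace("__VENV_DIR__", value)


venv_dir = "/home/user/my-venv; echo INJECTED"  # hypothetical attacker-chosen path

unsafe_line = render(TEMPLATE, venv_dir, quote=False)
safe_line = render(TEMPLATE, venv_dir, quote=True)

print(unsafe_line)  # VIRTUAL_ENV=/home/user/my-venv; echo INJECTED
print(safe_line)    # VIRTUAL_ENV='/home/user/my-venv; echo INJECTED'
```

When the unquoted line is sourced, the shell runs the text after the semicolon as a second command. The quoted line is a single assignment.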
Forging the Blade
The Master Blacksmith’s lesson was simple. Just as a piece of metal may appear to be a single piece of steel but split into shards when struck by a hammer, unquoted command strings are split into different arguments or commands by the shell.
When constructing commands in Python:
- Avoid `shell=True` and use `subprocess.run([…])` instead.
- If you must use `subprocess` with `shell=True`, quote every untrusted value placed in the command string.
- When constructing shell commands outside of `subprocess`, use `shlex` to avoid command injection from untrusted input.
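For the last point, one option is `shlex.join` (Python 3.8+), which quotes each argument and joins them into a single POSIX shell command string. This is a sketch with hypothetical arguments; the string is only built, not executed.

```python
import shlex

# Hypothetical argument list; the second item contains a space and a semicolon.
args = ["mkdir", "my new path; echo INJECTED"]

command = shlex.join(args)
print(command)  # mkdir 'my new path; echo INJECTED'

# Splitting the string again recovers the original arguments unchanged.
round_trip = shlex.split(command)
```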
The blacksmith does not strike a piece of steel without thought. So too should the developer not pass a path to the shell without care.
Thanks for reading Python Koans! If you enjoyed this post, share it with your friends :)
[1] A shell is a program that provides an interface between the user and the operating system (OS). It’s called a “shell” because it surrounds the kernel (the “core”) and lets you interact with it. The shell takes commands (from your keyboard, a script, or another program), interprets them, and asks the OS to run the corresponding programs or built-in functions.
[2] The `subprocess` module is available on all platforms except mobile (i.e. Android, iOS) and WebAssembly (i.e. WASI).
[3] POSIX is an IEEE standard (IEEE 1003) defining a common API and shell behavior for Unix-like systems. It aims to ensure that programs and scripts written on one POSIX-compliant system will work on another.
[4] IFS can be made to split on other characters by changing its value before running a command. For example, `IFS=, read a b c <<< "one,two,three"; echo "$a | $b | $c"` will split the comma-delimited string into the three variables a, b and c.
[5] Because the issue affected all versions of Python, a patch was created for all supported 3.x versions. The commit diff for Python 3.12 can be found here.