Israël Hallé for Flare

Posted on Jul 21, 2022

Typing your way into safety

#python #mypy #security

I've been working with Python type annotations for the last few years as part of our main product at Flare Systems. I've found them to be a wonderful tool to support refactoring and make code more readable. Lately, I explored how we can make APIs safer through the use of types. Specifically, I will look at how we can use Python type annotations to make os.system foolproof.

As a starting point, the current type of system is:

def system(command: StrOrBytesPath) -> int: ...

This typing is correct. It has the benefit of catching any call that passes something other than a string or path, which would be bound to fail. But it doesn't check for misuse such as passing unsanitized user input. For example, someone might want to use ImageMagick to resize an image:

def resize(size: str):
  system(f"convert INPUT -resize {size} OUTPUT")

def api(request):
  resize(request.args["size"])

Unfortunately, this simple implementation introduces a critical flaw in our application. A user could send a malicious size such as $(echo hacked). The size would then be inserted into the command, and the shell would execute convert INPUT -resize $(echo hacked) OUTPUT. This exact vulnerability pattern is still very common to this day.

The fix is as simple as the mistake. shlex.quote can be used to ensure a string is used as a single string token in a command. Yet, there's no explicit check in system to ensure that the command has escaped user inputs.
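To make the difference concrete, here is a small sketch that only builds the command strings without running anything. The malicious_size value is the payload from the example above; the variable names are just for illustration.

```python
import shlex

# The payload from the example above, as it would arrive from the request.
malicious_size = "$(echo hacked)"

# Naive interpolation: the shell would see a command substitution.
unsafe_command = f"convert INPUT -resize {malicious_size} OUTPUT"

# shlex.quote wraps the value so the shell treats it as a single plain token.
safe_command = f"convert INPUT -resize {shlex.quote(malicious_size)} OUTPUT"

print(unsafe_command)  # convert INPUT -resize $(echo hacked) OUTPUT
print(safe_command)    # convert INPUT -resize '$(echo hacked)' OUTPUT
```

Values made only of harmless characters, such as 100x100, come back from shlex.quote unchanged.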

Fortunately, we can think of ways to improve this. First of all, we can split all types into two categories: safe and unsafe. As we have seen, using user input as a system argument is unsafe. Passing a literal string, however, should be somewhat safer. At least, literals lead to predictable behavior. If you execute the literal rm --no-preserve-root -rf /, you can predict that it will wipe your disk.

Users of type annotations might already be familiar with the Literal type. As a quick reminder, Literal allows developers to type a variable with a literal value. This is useful for functions that accept only a finite number of known literal values. For example, system could have used it this way:

def system(command: Literal["ls"] | Literal["id"]): ...

system("ls")  # ok
system("id")  # ok

system("rm -rf")  # error!
system(request.args["size"])  # error!

Note that in practice I usually favor Enum. This usually leads to safer code, since it also checks that the values are correct at runtime.
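As a rough sketch of that Enum alternative (the Command, parse_command and run names are made up for illustration and are not part of any library):

```python
import os
from enum import Enum


class Command(Enum):
    """Hypothetical allow-list of commands the application may run."""
    LS = "ls"
    ID = "id"


def parse_command(raw: str) -> Command:
    # Enum lookup by value raises ValueError for anything outside the allow-list,
    # so the check also happens at runtime, not only in the type checker.
    return Command(raw)


def run(command: Command) -> int:
    # Only members of Command can reach os.system.
    return os.system(command.value)
```

Here parse_command("ls") returns Command.LS, while parse_command("rm -rf") raises ValueError before anything reaches the shell.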

One nice thing about this concept is that the type checker will not allow passing a str where a Literal is expected. The big limitation is that Literal only works with concrete literals. There's no way to declare that a variable accepts any literal of a given type:

def system(command: Literal) -> int: ...  # error: Literal[...] must have at least one parameter

In our case this is quite restrictive, since we want system to run any safe command. Up to Python 3.10, there was no built-in way to write a function that accepts only literal arguments. Python 3.11 changes this by adding the LiteralString type, which lets a parameter accept only literal strings:

def system(command: LiteralString) -> int: ...

system("convert INPUT OUTPUT")  # ok
system(f"convert INPUT -resize {size} OUTPUT")  # error!

At the time of publishing, Mypy defines LiteralString as an alias of str. Thus, the latest version of Mypy with Python 3.11 won't catch any of the errors in the snippets above and below.

This still limits us to literal values. Going back to our use case, we want to be able to pass in the size of the image. It is actually possible to make size safe by sanitizing the value for shell use. This is the job of the usual quote or escape functions, which take user input and return strings that are safe to use. For shells, Python provides the shlex.quote function. The inputs and outputs of these functions have different safety properties. It would be interesting to reflect this difference in the types:

ShellQuotedString = NewType("ShellQuotedString", str)

def quote(value: str) -> ShellQuotedString: ...

Here we introduce a new type that carries the safety property. Python provides NewType to easily create a new type from an existing one. The new type can be used wherever the base type is expected, but not the other way around:

safe: ShellQuotedString = ShellQuotedString("This string is safe")
unsafe: str = safe  # ok
safe = unsafe  # error: Incompatible types in assignment

Now we have both kinds of safe data: literals and quoted data. For ease of use, we can alias both to a union:

ShellString = LiteralString | ShellQuotedString

This union covers all the safe types of command that system can execute. It ensures that a developer has to think about quoting user input before passing it to system.

def system(command: ShellString) -> int: ...

system("convert INPUT OUTPUT")  # ok
system(f"convert INPUT -resize {quote(size)} OUTPUT")  # error!

It's still not accepting our quoted size argument. An interesting property of NewType is that any operation performed on it converts the result back to the base type. For example, concatenating a str to a ShellQuotedString returns a str. This puts the burden on the API designer to define the set of safe operations. If we want to provide operations that work on our safe strings, we have to implement them.
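A minimal sketch of that behavior follows. The sample value is arbitrary, and the comments describe what a type checker is expected to infer rather than anything enforced at runtime.

```python
from typing import NewType

ShellQuotedString = NewType("ShellQuotedString", str)

quoted = ShellQuotedString("'100x100'")

# str.__add__ is declared to return str, so a type checker infers plain str here
# and would reject passing `combined` where a ShellQuotedString is required.
combined = "convert INPUT -resize " + quoted

# NewType also adds no wrapper class at runtime: both values are ordinary str objects.
print(type(quoted) is str, type(combined) is str)
```

This is why the distinction lives entirely in the annotations: nothing at runtime remembers that a value was quoted.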

In our case, we know that concatenating shell-safe strings will create a new shell-safe string. So we can expose this operation as the safe way to mix user input and literal values.

def shell_format(
  format_string: str,
  *args: ShellString,
  **kwargs: ShellString,
) -> ShellQuotedString:
  return ShellQuotedString(format_string.format(*args, **kwargs))

output_path = "/tmp/out"
shell_format(
  "convert INPUT -resize {} {}",
  quote(size),
  output_path
)  # ok

shell_format(
  "convert INPUT -resize {} {}",
  size,
  output_path
)  # error!

Note that this implementation might still leave room for security vulnerabilities. Calling the function like shell_format("convert -resize '{}'", quote(size)) would leave size effectively unquoted, because the quotes in the format string cancel out the ones added by quote. It would be possible to add more checks to ensure that {} placeholders aren't surrounded by quotes. This is a great example of why regular string operations might lead to unsafe behavior if applied to our new types.
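One possible shape for such a check, as a deliberately conservative sketch: instead of detecting quotes around each placeholder, it rejects any quoting character in the literal parts of the format string. The check_format_string name and the set of rejected characters are assumptions made for illustration.

```python
from string import Formatter

# Characters that change how the shell tokenizes the text around a placeholder.
# This set is an assumption for the sketch, not an exhaustive list.
_FORBIDDEN = set("'\"`\\")


def check_format_string(format_string: str) -> None:
    """Raise ValueError if the literal text contains shell quoting characters."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    for literal_text, _field, _spec, _conversion in Formatter().parse(format_string):
        bad = _FORBIDDEN.intersection(literal_text)
        if bad:
            raise ValueError(
                f"quoting characters {sorted(bad)} are not allowed in {format_string!r}"
            )


check_format_string("convert INPUT -resize {} {}")  # passes silently
```

shell_format could call this before formatting. The trade-off is that legitimate literal quoting in the format string is rejected too, so such arguments would have to go through quote as well.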

Now that we have all the operations and safety properties we need, we can glue everything together:
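A minimal sketch of how the pieces might fit together, reusing the names from the snippets above. Two things are assumptions of this sketch rather than part of the snippets: format_string is typed as LiteralString, which is stricter than the str used earlier, and LiteralString is imported only for the type checker so that the module also loads on Python versions before 3.11.

```python
from __future__ import annotations

import os
import shlex
from typing import TYPE_CHECKING, NewType, Union

if TYPE_CHECKING:
    from typing import LiteralString  # available from Python 3.11

ShellQuotedString = NewType("ShellQuotedString", str)
# Quoted forward reference so this alias can be built without importing LiteralString.
ShellString = Union["LiteralString", ShellQuotedString]


def quote(value: str) -> ShellQuotedString:
    """Typed wrapper around shlex.quote."""
    return ShellQuotedString(shlex.quote(value))


def shell_format(
    format_string: LiteralString,  # assumption: stricter than the str used above
    *args: ShellString,
    **kwargs: ShellString,
) -> ShellQuotedString:
    """Combine a literal template with values that are already shell-safe."""
    return ShellQuotedString(format_string.format(*args, **kwargs))


def system(command: ShellString) -> int:
    """os.system restricted to literal or explicitly quoted commands."""
    return os.system(command)


def resize(size: str) -> int:
    # `size` is untrusted, so it has to go through quote() to type-check.
    command = shell_format("convert INPUT -resize {} OUTPUT", quote(size))
    return system(command)
```

With a type checker that fully supports LiteralString, calling system(f"convert INPUT -resize {size} OUTPUT") or shell_format("... {}", size) with a plain str should be reported as an error, while resize above passes.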

Our system API is now checked for safety by the type checker. It should catch any misuse with unsanitized values. Note that we are using a ShellQuotedString here instead of a generic SafeString type that could be reused for many other cases (SQL quoting, html.escape, etc.). Type safety is relative to how the value is used. The return value of html.escape is safe to render without introducing XSS. Yet, the same value could introduce SQL injection if used as-is in a query.

Adding safety to types can go beyond escaping or quoting patterns. Types can expose most implicit preconditions to static analysis. For example, a static file API that opens a file from user input could define a SafePath type. A function could then convert a str to a SafePath after checking that the path is under a specific directory.
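Here is a rough sketch of what that could look like. SafePath, to_safe_path and read_static are hypothetical names, and the containment check is one simple approach among several.

```python
from pathlib import Path
from typing import NewType

SafePath = NewType("SafePath", Path)


def to_safe_path(base_dir: Path, user_path: str) -> SafePath:
    """Return the resolved path only if it stays inside base_dir."""
    base = base_dir.resolve()
    # Resolving collapses ".." segments and symlinks before the comparison.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"{user_path!r} escapes {base}")
    return SafePath(candidate)


def read_static(path: SafePath) -> bytes:
    # The annotation forces callers to go through to_safe_path first.
    return path.read_bytes()
```

A call such as to_safe_path(Path("/srv/static"), "../../etc/passwd") raises ValueError, and a type checker would reject read_static(Path(user_input)) because a plain Path is not a SafePath.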

We have seen that we can easily use types to embed semantics in our code. Python type annotations can do much more than just prevent TypeError. They can make preconditions explicit and prevent critical security vulnerabilities.

Top comments (3)

tlavoie • Jul 22 '22

Cool article, thanks. Haven't tried typing in Python. Could this be changed so that besides escaping, it also enforces that the string must be parsed as an integer? Maybe even within a specific size range?

Israël Hallé Flare • Jul 22 '22

I think, to keep the same spirit, I would abstract it in another layer: I would define a resize function that takes an int. As for a specific range, NewType could be used the same way we did for strings:

Size = NewType("Size", int)

def parse_size(unsafe_size: str) -> Size:
  # int() raises ValueError if the input is not a valid integer
  return valid_size(int(unsafe_size))


def valid_size(unsafe_size: int) -> Size:
  if unsafe_size > 2000:
    raise ValueError("Too big!")
  return Size(unsafe_size)

# int as str are always safe for shell!
def shell_int(x: int) -> ShellQuotedString:
  return ShellQuotedString(str(x))

def resize(size: Size) -> None:
  command = shell_format("convert INPUT -resize {} OUTPUT", shell_int(size))
  system(command)

resize(parse_size("123")) # ok
resize(valid_size(123)) # ok
resize(123) # error!

tlavoie • Jul 22 '22

OK, that makes sense thanks. As long as you're strict about only accepting the appropriate typed versions at the command input, enforcement works. I think what is missing is the equivalent of Haskell's enforcing of types everywhere; this is a useful tool, but you have to pay attention to make sure it's used consistently.

Having it enforced everywhere would be more of a challenge I suppose, because there isn't really the equivalent of that compile-time checking in a more dynamic environment.