Ben Ford for puppet

Posted on Oct 3, 2022

Writing Secure Puppet Code: part one

#puppet #security #code #hardening

Code like this gives me the heebie jeebies.

command => "/usr/bin/add-apt-repository ${options} ${name} || (rm ${::apt::sources_list_d}/${sources_list_d_filename} && false)",

This comes directly from an older version of our puppetlabs-apt module and it scares me. See, I can interpret it and figure out what it (usually) does. But I don't know for sure what it was intended to do. And I'm left trusting that the author was familiar with shell command pipelining and subshells and operator precedence and short circuiting to know exactly what would happen in all eight+ cases in the truth table of exit codes and variable values.

As it turns out, if I crawl back through the git history, I can see why it was added and I can see that the original PR had a bug due to improper short circuiting that was subsequently fixed. But you should not have to audit the source code of modules you want to use in order to discover those sorts of things. And there's still a few potential misuse bugs. For example, if you experiment with the module and pass in an option string containing a semicolon, then it will inexplicably just delete the PPA file without really showing you why.

If you have a nefarious user, this command is also vulnerable to shell injection. Given an options string or a name of ';reboot;', Puppet would reboot the machine every time it ran. Given that many nodes run Puppet at boot time, this could be challenging to troubleshoot and repair. Obviously, more malicious command strings could also be injected.

This code violates several guidelines. Puppet code should be

Easy to read and reason about.
Easy to use properly.
Hard to use improperly, either inadvertently or intentionally.

This is true for all code, but it's especially important for configuration management code. The authors and users are typically people whose speciality is in managing complex infrastructure rather than in software engineering, and by design this kind of code is meant to abstract away platform complexities instead of just shifting the complexity to other places.

Why should I care?

Let's step back and look at the big picture for a moment. You might be thinking of a couple objections:

Just don't write garbage Puppet code and it won't do bad things
The people with access to write our Puppet code are trusted users already. If they can write Puppet code, they can do anything on the machine without resorting to subterfuge.

Neither of these objections are incorrect, but they both miss a very large point. The world has changed in the last eight years. We used to be both the producers and the consumers of our code. We knew where all the dragons lie and how to avoid them. We cannot take that for granted anymore. We have federated access to our configuration management systems, with many teams, with control repository overlays, with external contractors, etc. They don't all know how to avoid problems and might simply experiment to see what happens. And other teams have different incentives. They have no real reason to audit your code to ensure that parameters are passed safely, so when they're tasked with something like a web panel for self-service web hosting, they may not necessarily sanitize inputs before writing parameters into the Hiera data files that drive your code.

Even intentional misuse isn't always completely nefarious. Raise your hand if you've never considered sneaking in a backdoor shell or a network proxy to bypass "onerous" access restrictions.

We live in a more complex world now and must evolve our practices to include the guardrails that ensure safe operation, even in edge cases.

What does secure Puppet code mean?

There are three major definitions of secure Puppet code I see today:

Puppet DSL code that's less likely to be exploited, abused, or accidentally misused.
Puppet modules that leave managed systems in a more secure state.
Code that does not leak sensitive information.

We'll talk about the first definition in this post and ways to work towards it. Keep an eye out for follow up posts exploring the other ones.

Puppet itself does a lot to protect you from misuse and exploitation out of the box. For example, if an attacker were to attempt a classic symlink attack by creating a symlink to /etc/passwd with the path of a file Puppet was managing, Puppet would not overwrite the /etc/passwd file. Instead, the symlink would be removed and replaced with the desired file and contents. And most resource type parameters are validated before they're acted on.

But some parameters, most notably those for exec commands, cannot be automatically validated because some valid uses are indistinguishable from invalid uses. In these cases, it's up to the module author to handle inputs safely. For the most part, this means that you should avoid using or interpolating untrusted input directly.

Escape input variables

The simplest case is to sanitize all inputs to a class with the shell_escape() function. This function will escape any metacharacters with backslashes that instruct the shell not to interpret them. It might look like this:

$input = "Hello world!; rm -rf –no-preserve-root /"

exec { 'sanitized input':
    command   => "/bin/echo ${shell_escape($input)}",
    logoutput => true,
}

This would echo the full string out with the semicolon escaped, rather than executing the command.

$ puppet apply shell_escape.pp
Notice: Compiled catalog for arachne.local in environment production in 0.13 seconds
Notice: /Stage[main]/Main/Exec[sanitized input]/returns: Hello world!; rm -rf –no-preserve-root /
Notice: /Stage[main]/Main/Exec[sanitized input]/returns: executed successfully
Notice: Applied catalog in 0.04 seconds

Restrict to known inputs

Another technique is to only allow certain known good values. One good way to do this is with class parameter data type signatures. This lets Puppet do the validation for you.

class demo (
  # allows any string to be passed, even malicious ones
  String $message,
  # only accepts specific strings
  Enum['primary', 'secondary'] $mode,
  # validates against a regular expression
  Pattern[/\A(?:UTC|GMT)[+-]\d{1,2}:\d\d\z/] $timezone,
) {
  # ...
}

When invalid input is provided, then Puppet will scream loudly at you and refuse to compile the code.

$ puppet apply class_signatures.pp
Error: Evaluation Error: Error while evaluating a Resource Statement, Class[Demo]:
  parameter 'mode' expects a match for Enum['primary', 'secondary'], got 'rm -rf /'
  parameter 'timezone' expects a match for Pattern[/\A(?:UTC|GMT)[+-]\d{1,2}:\d\d\z/], got 'rm -rf /' (file: /Users/ben.ford/tmp/security/class_signatures.pp, line: 11, column: 1) on node arachne.local

If your validation needs are more complex, then you might have to write the logic yourself.

if $mode == 'secondary' and $hatype in ['hotswap', 'failover'] {
  $_hatype = $hatype
} else {
  $_hatype = undef
}

This code leaves you in a distinct known state, and the final variable can only be one of two values, or unset.

Parameterized Commands

One of the most robust ways to protect yourself from shell injection attacks is for the shell to not interpret metacharacters at all. Let's go back to some of the earlier examples to understand how this works.

Starting from the beginning, what happens if you type cat /etc/hosts into a Unix system such as macOS or Linux? The contents of the file are displayed, sure. But let's go into more detail.

First the shell, such as bash or zsh, will tokenize and parse the command string. It will identify two tokens, cat and /etc/hosts, which are then identified as a program and an argument. The program cat is invoked, and /etc/hosts is passed as an argument. The program knows how to use the argument as a filename, it then reads the file and prints the contents to the screen.

But what if you said the name of the file was /etc/hosts;yes? Try it and see what happens. You'll need to press ctrl-c to stop the unending stream of Y characters!

What happened is that the shell tokenized the string cat /etc/hosts;yes into three tokens, not two. The semicolon is a command separator, so the string became cat and /etc/hosts and yes. A command, an argument to that command, and then another command.

This is obviously not what you wanted, but consider the case when the input to that command came from an external source, and you ran it like cat $filename. If you don't have control over the contents of that variable, you can still instruct the shell to treat it as a single token by putting quotes around it, like cat "$filename". Now even if $filename was set to /etc/hosts;yes the shell will treat it as a single token:

$ cat "$filename"
cat: /etc/hosts;yes: No such file or directory

When you're running a command in a Puppet exec type, you can do the same thing with a parameterized command. Instead of interpolating a string and writing your exec like so:

# don't write interpolated exec commands like this
exec { 'display file':
  command   => "/bin/cat ${filename}",
  logoutput => true,
}

You can instead parameterize that command and tell the shell exactly what tokens to use:

# instead, parameterize them like this
exec { 'display file':
  command   => ['/bin/cat', $filename],
  logoutput => true,
}

When writing code with parameterized commands, the first array item is the command and all the rest are pre-tokenized arguments to that command. Note that the onlyif and unless parameters are arrays of check commands, so to parameterize them you'll need to pass arrays of parameterized commands, an array of arrays.

When using parameterized commands or shell escaped strings, you cannot do complex command chaining or output redirection or the like. It's sometimes possible to write safe commands of this kind by careful quoting of the appropriate arguments, but in general, that leads to even less readable command strings. Instead, when that functionality is required you should write a complete script and invoke it. That gives you a far more readable script that can be more easily audited and that can do better input validation and error checking on its own. For example, one possible solution to the complex apt repository command above could be to write a small wrapper script that reverts the files to their previous state if the command fails.

file { '/usr/bin/puppet-add-apt-repository':
    owner   => 'root',
    mode    => '0755',
    content => 'puppet::///modules/profile/apt/puppet-add-apt-repository',
    before  => Exec['add apt repo'],
}

exec { 'add apt repo':
    command => '/usr/bin/puppet-add-apt-repository',
    # [...]
}

To recap

Don't ever trust input. Even if it's not actively malicious, it's far too easy to accidentally misuse code that doesn't properly sanitize input. Restrict the input that can be passed to known good values and parameterize commands when you can. At the very least, defang tainted commands with the shell_escape() function.

Watch out for the next post in this sequence in which we talk about best practices for writing management code that leaves your systems in a hardened state.

Ben is the Community and DevRel lead at Puppet, where he's been scared by production Puppet code for the last decade.

Learn more

See the exec documentation for the command, unless, and onlyif parameters.
See docs on Puppet data types.
Watch my Puppetize talk.
Try out some automated Puppet lint security checks and some infrastructure management best practices checks.
Read about the Zero Trust model.

DEV Community