Martin Alfke for betadots

Puppet is YAML

or: The Power of Hiera5
or: Puppet with 7 lines of code only

We sometimes see people struggling with Puppet.
Mostly this is due to the time needed to learn the details of Puppet, such as the Puppet language syntax.

There are two general ways to manage your infrastructure using Puppet:

  1. own code
  2. YAML data

Writing your own code is helpful in edge cases or for internal applications where no Puppet extension exists.

When starting with Puppet we always recommend making use of Hiera.
Hiera is Puppet's internal data backend and can provide data to Puppet in either YAML or JSON syntax.

This post explains how you can easily start managing your settings using Puppet YAML data only.

Node classification

Hiera can easily be used for node classification.

There are two different data types possible: Array and Hash[String].

We first explain the Array syntax:

In manifests/site.pp we call the lookup function and query a specific key. We also provide the required data type and a default value, and include every class that is returned:

    # manifests/site.pp
    lookup(
      {
        'name'          => 'classes',
        'value_type'    => Array,
        'default_value' => [],
      }
    ).include

Prior to adding data, we must first check the infrastructure to identify differences. All systems may need default classes, while some systems need different classes depending on their use case.

All systems need:

  • Security policies
  • LDAP/AD integration
  • Monitoring client

Database servers need:

  • Database setup
  • Backup client
  • Metrics exporter

Webservers need:

  • Webserver setup
  • Web application

Database servers for application 'A' need:

  • Specific database schema
  • Python extensions

Webservers for application 'A' need:

  • Mail sending capabilities
  • Extended security settings

Within Hiera we use the term hierarchy to identify different system settings.
These differences must be made available as Puppet facts or as trusted information, which is part of the Puppet client certificate.

All systems will receive common data.
Specific systems will receive data based on application, service and stage (prod, test, dev).
We usually recommend the following hiera configuration settings:

# hiera.yaml
version: 5

defaults:
  datadir: data

hierarchy:
  - name: "All hierarchies"
    paths:
      # node specific data
      - "nodes/%{trusted.certname}.yaml"
      # application/service-stage data
      - "%{trusted.extensions.pp_application}/%{trusted.extensions.pp_service}-%{trusted.extensions.pp_env}.yaml"
      # application/service data
      - "%{trusted.extensions.pp_application}/%{trusted.extensions.pp_service}.yaml"
      # application data
      - "%{trusted.extensions.pp_application}.yaml"
      # network zone data
      - "zone/%{trusted.extensions.pp_zone}.yaml"
      # os specific data
      - "os/%{facts.os.family}-%{facts.os.release.major}.yaml"
      # default data
      - "common.yaml"

It is important to understand the infrastructure first, before starting with Hiera!
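
To make the hierarchy more tangible, here is a minimal Ruby sketch of how the path templates could resolve for a single node. It is not Hiera's implementation; the certname and the extension values are hypothetical, and the OS level is left out for brevity.

```ruby
# Hypothetical trusted data for one node (values are assumptions, not from the article).
SCOPE = {
  'trusted' => {
    'certname'   => 'db01.example.com',
    'extensions' => {
      'pp_application' => 'app_a',
      'pp_service'     => 'database',
      'pp_env'         => 'prod',
      'pp_zone'        => 'dmz'
    }
  }
}.freeze

# Path templates from the hierarchy, highest priority first.
PATHS = [
  'nodes/%{trusted.certname}.yaml',
  '%{trusted.extensions.pp_application}/%{trusted.extensions.pp_service}-%{trusted.extensions.pp_env}.yaml',
  '%{trusted.extensions.pp_application}/%{trusted.extensions.pp_service}.yaml',
  '%{trusted.extensions.pp_application}.yaml',
  'zone/%{trusted.extensions.pp_zone}.yaml',
  'common.yaml'
].freeze

# Replace every %{a.b.c} token by digging through the nested scope hash.
def interpolate(template, scope)
  template.gsub(/%\{([^}]+)\}/) do
    scope.dig(*Regexp.last_match(1).split('.')).to_s
  end
end

RESOLVED = PATHS.map { |path| interpolate(path, SCOPE) }
puts RESOLVED
```

Hiera reads these files from top to bottom, so node data is consulted before application data, and common.yaml comes last.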

Now we can start adding class data. Class data can be added to the relevant hierarchy YAML file. In addition, we want to ensure that we collect class data from all hierarchy levels. For arrays, the merge behavior unique must be set.

This is what the lookup_options key is used for.
We first specify the lookup_options key (so that it is directly visible) and then the classes array data:

# data/common.yaml
lookup_options:
  classes:
    merge: 'unique'

classes:
  - 'class_a'
  - 'class_b'

The Array usage has a limitation:

One can only add classes in higher hierarchies.
It is not possible to remove a class!
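
The limitation follows from how a unique merge works. The following Ruby sketch imitates the idea (flatten all levels, drop duplicates); it is not Hiera's code, and the class names per level are hypothetical.

```ruby
# Class arrays per hierarchy level, highest priority first (hypothetical names).
LEVELS = [
  ['class_timeserver'],               # node level
  ['class_database', 'class_backup'], # application/service level
  ['class_a', 'class_b', 'class_backup'] # common level
].freeze

# A unique merge flattens all levels into one array and removes duplicates.
def unique_merge(levels)
  levels.flatten.uniq
end

MERGED = unique_merge(LEVELS)
puts MERGED.inspect
```

Whatever a higher level contains, the result can only grow: there is no value a node could set that would take class_a out of the merged array.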

This is where the Hash data type can be used.

Within hashes we set a unique identifier as the key and the required class as the value.
This allows us to overwrite classes that have already been declared.
It is then also possible to let people know about a disabled class by using the echo resource type (provided by an additional module, not by Puppet core).

The Puppet code must be adapted:

lookup( 'classes_hash', { 'value_type' => Hash, 'default_value' => {} } ).each |$name, $c| {
  unless $c.empty {
    contain $c
  } else {
    echo { "Class for ${name} on ${facts['networking']['fqdn']} is disabled": }
  }
}

The data changes from an Array to a Hash, merged with the deep behavior:

# data/common.yaml
lookup_options:
  classes_hash:
    merge:
      behavior: 'deep'

classes_hash:
  'description of class A': 'class_a'
  'description of class B': 'class_b'

If a node is very specific and should not receive a default class, the key can be overwritten with an empty string:

# data/nodes/different_server.domain.tld.yaml
classes_hash:
  'description of class A': ''
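
The effect of the override can be sketched in Ruby. This only imitates the merge result for flat hashes (higher priority wins per key) and then splits the entries the same way the Puppet loop does; it is an illustration, not Hiera's merge code.

```ruby
# Data from common.yaml (lowest priority) and from the node file (highest priority).
COMMON = {
  'description of class A' => 'class_a',
  'description of class B' => 'class_b'
}.freeze
NODE = { 'description of class A' => '' }.freeze

# For flat hashes the merged result keeps all keys; the higher level wins per key.
MERGED_CLASSES = COMMON.merge(NODE)

# Non-empty values are classes to declare, empty values mark disabled entries.
enabled, disabled = MERGED_CLASSES.partition { |_name, klass| !klass.empty? }
ENABLED_CLASSES  = enabled.map(&:last)
DISABLED_ENTRIES = disabled.map(&:first)

puts "declare:  #{ENABLED_CLASSES.inspect}"
puts "disabled: #{DISABLED_ENTRIES.inspect}"
```

Because the key must match exactly, a typo in the node file would add a new, empty entry instead of disabling the class from common.yaml.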

Using upstream Puppet modules (Libraries)

For many applications one can find ready-to-use Puppet modules on Puppet Forge.
Unfortunately, in most cases the documentation lacks examples for Hiera YAML data.
Fortunately, it is best practice to provide a file (usually REFERENCE.md) which describes the classes and their parameters.

A simple use case for nginx:

# data/application/webserver.yaml
classes_hash:
  'webserver for application': 'nginx'

nginx::port: '8080'

Within many Puppet modules, one will find classes and sometimes additional resource types.
Resource types know exactly how a specific configuration can be achieved (e.g. how to create an nginx server vhost).
But resource types cannot be added to a node the way classes are: the classes lookup only declares classes.

Simple installation, configuration and services

The stdlib module provides a class which enables you to make use of the Hiera YAML Data backend to add any Resource Type.
The class is called: stdlib::manage.

Add to common.yaml:

classes_hash:
  'puppet_is_yaml': 'stdlib::manage'

The only things one needs to learn are:

  • what resource types are available and
  • what parameters are provided by the resource type.

Within a Puppet base installation we already have a couple of resource types available:

  • user
  • group
  • package
  • file
  • service
  • ...

Most resource types are added by modules.
For example, PostgreSQL database management is done by a resource type in the PostgreSQL module.

You can list all available Ruby-based resource types by running sudo puppet describe -l
Defined types are not visible in that list.

Within your data, you must provide a hash to stdlib::manage::create_resources.
The Hash consists of three levels. The first level describes the resource type, the second level describes the instance and within the third level we provide the parameters.

General syntax:

stdlib::manage::create_resources:
  'Resource Type1':
    'Unique Name':
      'attribute': 'value'
  'Resource Type2':
    'Unique Name':
      'attribute': 'value'
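
The three levels can be walked with two nested loops. The Ruby sketch below only prints what each entry would correspond to as a Puppet resource declaration; it is not how stdlib::manage is implemented, and the sample data (including the service title ntpd) is an assumption.

```ruby
require 'yaml'

# Sample data in the three-level layout: type -> title -> parameters.
DATA = YAML.safe_load(<<~YAML)
  package:
    ntp:
      ensure: present
  service:
    ntpd:
      ensure: running
      enable: true
YAML

# Level 1 is the resource type, level 2 the title, level 3 the parameters.
DECLARATIONS = DATA.flat_map do |type, instances|
  instances.map do |title, params|
    attrs = params.map { |key, value| "#{key} => #{value.inspect}" }.join(', ')
    "#{type} { '#{title}': #{attrs} }"
  end
end

puts DECLARATIONS
```

Each printed line is one resource instance, which is why the title on the second level has to be unique per resource type.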

The following is a simple example of how to manage ntp:

# data/os/RedHat-7.yaml
stdlib::manage::create_resources: # Puppet Library Data lookup
  'package':                      # Resource Type
    'ntp':                        # Type title or unique name
      ensure: 'present'           # Parameter of resource type
  'file':
    '/etc/ntp.conf':
      ensure: 'file'
      source: 'puppet:///modules/profile/time/ntp.conf'
      owner: 'root'
      group: 'root'
      mode: '0644'
      require: 'Package[ntp]'
  'service':
    'ntpd':                       # assumed title: the ntp service name on RedHat 7
      ensure: 'running'
      enable: true
      subscribe: 'File[/etc/ntp.conf]'

Defaults and overwriting, adding or removing parameters

YAML anchors and aliases allow you to set defaults, e.g. file resource defaults.

Please note that anchors and aliases must exist in the same file.
Each YAML file can have its own set of anchors and aliases.
One cannot refer to anchors set in other YAML files.

First we define the anchor.

file_defaults: &file_defaults
  owner: 'root'
  group: 'root'
  mode: '0644'

Within the same YAML file, we can reference the anchor using an alias:

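stdlib::manage::create_resources:
  'file':
    '/etc/ntp.conf':
      <<: *file_defaults
      ensure: 'file'
      source: 'puppet:///modules/profile/time/ntp.conf'
      require: 'Package[ntp]'
    '/path/to/second/file':       # placeholder: the original title is not recoverable
      <<: *file_defaults
      ensure: 'file'
      content: 'admin'
      mode: '0400'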
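
How the merge key behaves can be checked with Ruby's YAML parser. This is a small sketch, assuming a Ruby version whose YAML.safe_load accepts the aliases keyword; the second file path is hypothetical.

```ruby
require 'yaml'

DOC = <<~YAML
  file_defaults: &file_defaults
    owner: 'root'
    group: 'root'
    mode: '0644'

  stdlib::manage::create_resources:
    'file':
      '/etc/ntp.conf':
        <<: *file_defaults
        ensure: 'file'
      '/tmp/example_secret':
        <<: *file_defaults
        ensure: 'file'
        mode: '0400'
YAML

# aliases: true is needed because safe_load rejects aliases by default.
FILES = YAML.safe_load(DOC, aliases: true)['stdlib::manage::create_resources']['file']

FILES.each { |path, params| puts "#{path}: #{params.inspect}" }
```

Keys written next to the merge key win over the merged defaults, so the second file keeps owner and group from the anchor but gets its own mode.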

The full power of Hiera comes into play if you want to set a resource globally, but want to adapt its settings for certain systems.

In this case one can configure Hiera to look into all hierarchy levels and overwrite data in a higher hierarchy level:

Setting the lookup_options:

# data/common.yaml
lookup_options:
  classes_hash:
    merge:
      behavior: 'deep'
  stdlib::manage::create_resources:
    merge:
      behavior: 'deep'

Overwriting node specific data:

# data/nodes/timeserver.yaml
stdlib::manage::create_resources:
  'file':
    '/etc/ntp.conf':
      source: 'puppet:///modules/profile/time/ntp-timeserver.conf'


YAML has several peculiarities which one should be aware of.
The general rule is that strings should always be quoted.
If you don't quote strings, you should be aware of the consequences.

The following examples explain the problems:

Sexagesimal Numbers

Sexagesimal numbers are base-60 numbers (each digit group ranges from 0 to 59). They were introduced with YAML 1.1 and removed again in YAML 1.2.
Depending on the YAML specification version used by the parser, one might get different results:

port_map:
  - 22:22
  - 443:443

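Using YAML 1.1 the following result is returned (22:22 is read as 22 * 60 + 22 = 1342, while 443:443 stays a string because 443 is not a valid base-60 digit group):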

{ "port_map": [1342, "443:443"]}
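
The conversion itself is plain arithmetic. The following Ruby sketch applies the integer pattern from the YAML 1.1 type definition by hand; it does not call a YAML parser, since real parsers differ in how (and whether) they implement this rule. The function name is made up for this example.

```ruby
# Base-60 integer pattern as described for YAML 1.1:
# a leading group, then one or more ":NN" groups in the range 0..59.
SEXAGESIMAL_INT = /\A[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+\z/

# Returns the integer value, or nil if the text is not a base-60 integer
# (in which case a YAML 1.1 parser would keep it as a string).
def yaml11_sexagesimal(text)
  return nil unless text.match?(SEXAGESIMAL_INT)

  sign   = text.start_with?('-') ? -1 : 1
  groups = text.sub(/\A[-+]/, '').delete('_').split(':').map(&:to_i)
  sign * groups.reduce(0) { |acc, group| acc * 60 + group }
end

puts yaml11_sexagesimal('22:22').inspect
puts yaml11_sexagesimal('443:443').inspect
```

Quoting the values ('22:22') avoids the question entirely, because quoted scalars are always strings.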

Anchors, aliases and tags

There are some special characters which change YAML behavior.
We already talked about anchors and aliases. An anchor starts with &, an alias starts with *.

If you add an unquoted string starting with *, YAML will search for the corresponding anchor. If YAML cannot find the anchor, it will return an error.


# blog_posts/yaml_demo.yaml
web_files:
  - /robots.txt
  - *.html

Now we load the YAML file:

# irb
require 'yaml'
YAML.load_file('blog_posts/yaml_demo.yaml')
Traceback (most recent call last):
       11: from C:/Program Files/Puppet Labs/Bolt/bin/irb.bat:31:in `<main>'
       10: from C:/Program Files/Puppet Labs/Bolt/bin/irb.bat:31:in `load'
        9: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/gems/2.7.0/gems/irb-1.2.6/exe/irb:11:in `<top (required)>'
        8: from (irb):2
        7: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/2.7.0/psych.rb:577:in `load_file'        
        6: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/2.7.0/psych.rb:577:in `open'
        5: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/2.7.0/psych.rb:578:in `block in load_file'
        4: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/2.7.0/psych.rb:277:in `load'
        3: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/2.7.0/psych.rb:390:in `parse'
        2: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/2.7.0/psych.rb:456:in `parse_stream'     
        1: from C:/Program Files/Puppet Labs/Bolt/lib/ruby/2.7.0/psych.rb:456:in `parse'
Psych::SyntaxError ((blog_posts/yaml_demo.yaml): did not find expected alphabetic or numeric character while scanning an alias at line 4 column 5)

Tags in YAML are used to parse complex data types. The main problem is that they can allow the injection of arbitrary code when untrusted data is loaded.
Another problem: an unquoted string starting with ! is read as a tag, and if YAML cannot find that tag, the value is replaced by nil.


# blog_posts/yaml_demo.yaml
web_files:
  - /robots.txt
  - !local.html

Now we parse the YAML file:

# irb
require 'yaml'
YAML.load_file('blog_posts/yaml_demo.yaml')
=> {"web_files"=>["/robots.txt", nil]}
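
Quoting the values avoids both problems. Here is a short Ruby sketch that compares the unquoted and the quoted variants with YAML.safe_load; the helper name parse_error? is made up for this example.

```ruby
require 'yaml'

# Returns true if the document cannot be loaded at all.
def parse_error?(yaml)
  YAML.safe_load(yaml)
  false
rescue Psych::Exception
  true
end

ALIAS_DOC  = "web_files:\n  - /robots.txt\n  - *.html\n"
TAG_DOC    = "web_files:\n  - /robots.txt\n  - !local.html\n"
QUOTED_DOC = "web_files:\n  - '/robots.txt'\n  - '*.html'\n  - '!local.html'\n"

ALIAS_FAILS = parse_error?(ALIAS_DOC)          # unquoted * is treated as an alias
TAG_RESULT  = YAML.safe_load(TAG_DOC)          # unquoted ! is treated as a tag
QUOTED      = YAML.safe_load(QUOTED_DOC)       # quoted values stay plain strings

puts ALIAS_FAILS
puts TAG_RESULT.inspect
puts QUOTED.inspect
```

The tag case is the more dangerous one in practice, because the file loads without any error and the value silently disappears.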

The Norway Problem

Some unquoted strings in YAML are parsed as Boolean values.
The following unquoted strings are converted to False:

  • off
  • no

Several capitalization variants are possible.

The following unquoted strings will be parsed as True:

  • on
  • yes

This problem has been solved in YAML 1.2. But many parsers still use YAML 1.1.


bool_strings:
  - no
  - off
  - yes
  - on

Read YAML file:

require 'yaml'
YAML.load_file('blog_posts/yaml_demo.yaml')
=> {"bool_strings"=>[false, false, true, true]}

One should be aware that this also affects unquoted Hash keys!
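
Both effects, and the fix by quoting, can be reproduced with a few lines of Ruby. The key and value names in this sketch are made up for illustration.

```ruby
require 'yaml'

# Unquoted values are converted to Booleans by Ruby's YAML parser.
UNQUOTED = YAML.safe_load("bool_strings:\n  - no\n  - off\n  - yes\n  - on\n")

# Quoted values stay strings.
QUOTED_BOOLS = YAML.safe_load("bool_strings:\n  - 'no'\n  - 'off'\n")

# The same conversion applies to hash keys: the unquoted key becomes true.
KEYS = YAML.safe_load("on: 'unquoted key'\n'on': 'quoted key'\n").keys

puts UNQUOTED.inspect
puts QUOTED_BOOLS.inspect
puts KEYS.inspect
```

For Hiera data this matters most for keys and values such as country codes or on/off switches that a module expects as strings.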


The concept of Hiera and YAML data allows you to manage your infrastructure with 7 lines of Puppet code.

All configurations are just YAML data.
Whether you use upstream modules or just define types using stdlib::manage, it's all data.

While this concept might cover most parts of your infrastructure, you will sometimes want even more flexibility.
This is where the Puppet DSL comes into play:

  • Code logic (if, unless, case)
  • Data type validation (Integer, Boolean, String)
  • More complex setup
  • Your own set of types and providers

betadots GmbH wishes everybody success and fun using Puppet and YAML for Configuration Management.
