DEV Community

Martin Alfke for betadots


Modern Puppet node classification

Within Puppet we use modules to describe the specific technical components we want to configure on a system.
This can be achieved either with upstream library modules (some refer to these as component modules), which can be found on the Puppet Forge, or with self-written Puppet code, which we usually refer to as technical implementation profiles.

Since Puppet follows a client-server model, the server must know each node and which classes that node needs. This process is called node classification.

Puppet offers several ways to classify nodes.
This article describes the classical node classification concepts and their limitations.
We then demonstrate a more sophisticated, Hiera data-driven node classification process with examples.

  1. Classical classification concepts
    1. Node resource
    2. Role classification
    3. External Node Classifier
  2. Hiera based classification
    1. Hiera Array
    2. Hiera Hash
    3. Hiera Multiple Hashes
  3. Conclusion

Classical classification concepts

The Puppet server uses the manifest setting to locate the main manifest, usually a directory that contains the site.pp file.
For the production environment the default value is: /etc/puppetlabs/code/environments/production/manifests

Within the site.pp file (and any other *.pp file in this directory or any subdirectory) we can add Puppet code for node classification.

Node resource

The simplest approach is the node resource type. A node resource uses the Puppet agent's certname (the CN of its certificate) as identifier. It is also possible to use a regular expression instead of a fixed name.

# manifests/site.pp
...
node 'app1web03-dev.domain.tld' {
  include profile::base
  include profile::accounts::dev
  include profile::webserver::nginx
  include profile::application::app01
}
...

Please note that once site.pp contains node definitions, every agent must match one of them; otherwise catalog compilation fails for that agent.
Therefore we add an empty default fallback node to site.pp:

node default {}
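The matching rules can be summarised in a small Python model. This is not Puppet's implementation. An exact certname wins, then a regular expression, then the default node. If no definition matches and there is no default node, compilation fails. The certnames, the regex and profile::database are hypothetical. When several regexes match, this sketch simply takes the first one, so avoid overlapping regexes in real manifests.

```python
import re

# Hypothetical node definitions standing in for manifests/site.pp.
# Exact certnames map to the classes their node block includes.
EXACT = {
    "app1web03-dev.domain.tld": [
        "profile::base",
        "profile::accounts::dev",
        "profile::webserver::nginx",
        "profile::application::app01",
    ],
}
# Regex node definitions, e.g. node /^db\d+-prod\.domain\.tld$/ { ... }
REGEX = [
    (re.compile(r"^db\d+-prod\.domain\.tld$"), ["profile::base", "profile::database"]),
]
# node default {} contributes no classes.
DEFAULT = []


def classify(certname, exact=EXACT, regex=REGEX, default=DEFAULT):
    """Return the classes of the node definition that matches certname.

    Pass default=None to model a site.pp without a default node.
    """
    if certname in exact:  # 1. an exact certname match wins
        return exact[certname]
    for pattern, classes in regex:  # 2. otherwise the first matching regex
        if pattern.search(certname):
            return classes
    if default is not None:  # 3. otherwise the default node
        return default
    # No match and no default node: catalog compilation would fail.
    raise LookupError(f"no node definition matches {certname}")
```

For example, classify("db01-prod.domain.tld") returns the regex node's classes, and any unknown certname falls through to the empty default node.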

The node resource is a simple solution for small environments with only a few nodes.
For larger infrastructures this approach becomes very time-consuming to maintain, even when we group servers in individual files and directories.

Role classification

In the site.pp file we can also add arbitrary Puppet code, for example code that queries specific Puppet agent facts or other data.
If we add a fact called role to our Puppet agents, we can use this fact to include role classes:

# manifests/site.pp
include "role::${facts['role']}"

The role pattern makes sense if you have larger groups of servers that must be configured identically.
On the other hand, every node with an individual configuration must receive its own role.
If an infrastructure consists of many different roles, this concept becomes very time-consuming to maintain.

External Node Classifier

Puppet is able to make use of other tools for node classification. These tools are called External Node Classifiers (ENC). Puppet Enterprise and Foreman make use of this feature.

Attention: the ENC is an add-on to the node resource classification, not a replacement!
If a node is classified both by the ENC and within the manifests directory, Puppet uses both classifications.

To configure Puppet to use an ENC script, add the following two settings to puppet.conf:

[server]
  # older Puppet releases use the [master] section instead
  node_terminus = exec
  external_nodes = /usr/local/bin/enc

Puppet Server runs the command specified in external_nodes and passes the client's certname to it as an argument.
The command runs as the Puppet Server user and can be written in any language. It may query remote services for data (a web API, a database, file contents) and must print YAML output for the given certname:

---
environment: production
classes:
  profile::base:
    time_servers: ['time.domain.tld']
  profile::accounts::dev: {}
  profile::webserver::nginx: {}
  profile::application::app01: {}

In the classes section, Puppet Server expects an array or a hash of classes to include for that node. With a hash, each class name maps to a sub-hash of class parameters (or to an empty hash).
The optional environment key forces the agent to use the specified Puppet environment.

The following example ENC is a shell script that returns a node-specific YAML file if one exists and a default classification otherwise:

#!/bin/bash
# /usr/local/bin/enc
# The shebang must be the first line. $1 is the certname passed by Puppet Server.
if [ -e "/etc/enc/nodes/$1.yaml" ]; then
  cat "/etc/enc/nodes/$1.yaml"
else
  cat /etc/enc/default.yaml
fi
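Because an ENC may be written in any language, here is a hedged Python sketch of one. It builds its answer from a hypothetical in-memory inventory instead of files and prints the YAML structure shown above: an optional environment, then classes as a hash with parameter sub-hashes. It uses no third-party libraries. The inventory contents and the fallback classification are assumptions for illustration.

```python
import sys

# Hypothetical inventory; a real ENC might query a web API, a database or
# files instead. All names and values here are illustrative.
INVENTORY = {
    "app1web03-dev.domain.tld": {
        "environment": "production",
        "classes": {
            "profile::base": {"time_servers": ["time.domain.tld"]},
            "profile::accounts::dev": {},
            "profile::webserver::nginx": {},
            "profile::application::app01": {},
        },
    },
}
# Assumed fallback for unknown certnames (like /etc/enc/default.yaml).
DEFAULT = {"classes": {"profile::base": {}}}


def quote(value):
    """Render a scalar as a single-quoted YAML string ('' escapes ').

    Every scalar becomes a string in this sketch.
    """
    return "'" + str(value).replace("'", "''") + "'"


def render(node):
    """Render a classification as the YAML document an ENC prints."""
    lines = ["---"]
    if "environment" in node:  # optional: forces the agent's environment
        lines.append(f"environment: {node['environment']}")
    lines.append("classes:")
    for cls, params in node["classes"].items():
        if not params:  # class without parameters
            lines.append(f"  {cls}: {{}}")
            continue
        lines.append(f"  {cls}:")  # class with a parameter sub-hash
        for name, value in params.items():
            if isinstance(value, list):
                rendered = "[" + ", ".join(quote(v) for v in value) + "]"
            else:
                rendered = quote(value)
            lines.append(f"    {name}: {rendered}")
    return "\n".join(lines) + "\n"


def classify(certname):
    return render(INVENTORY.get(certname, DEFAULT))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Puppet Server passes the certname as the only argument.
    sys.stdout.write(classify(sys.argv[1]))
```

For app1web03-dev.domain.tld the output is the same YAML document shown earlier. Every other certname receives the fallback classification.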

The Puppet ENC classification is useful if you want to add new nodes or change the classification of existing nodes with a separate tool.
This is an elegant way to separate Puppet from the classification process.
While this allows one to develop a solution of unlimited complexity, we always recommend keeping the classification process as simple as possible and only as complex as required.
An ENC should answer very quickly, because Puppet Server calls it on every agent run. In larger environments one must also consider load and performance. When the ENC queries remote systems, those systems must be highly available and fast as well.

Hiera based classification

Hiera is Puppet's built-in data backend. We usually store parameters there that differ within the infrastructure, e.g. servers in datacenter A use a different DNS server setting than servers in datacenter B.

In the Hiera configuration file (hiera.yaml) we can specify several hierarchy levels. We usually recommend the following order, from most specific to most general:

  • Node specific data
  • Application and stage data
  • Location or network zone data
  • OS specific data (only if needed)
  • Common or global data

More information about Hiera can be found on the Puppet Hiera website.
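The hierarchy maps each node to an ordered list of data files. The Python sketch below is a model, not Hiera itself. It resolves hiera.yaml-style path templates with %{...} interpolation for one node. The paths and the custom facts application, stage and zone are hypothetical. trusted.certname and facts.os.name follow Hiera's interpolation syntax.

```python
import re

# Hypothetical hierarchy following the recommended layers, highest priority
# first. application, stage and zone are assumed custom facts.
HIERARCHY = [
    "nodes/%{trusted.certname}.yaml",
    "application/%{facts.application}-%{facts.stage}.yaml",
    "zone/%{facts.zone}.yaml",
    "os/%{facts.os.name}.yaml",
    "common.yaml",
]


def interpolate(template, scope):
    """Replace each %{a.b.c} by walking the dotted keys through scope."""

    def resolve(match):
        value = scope
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return ""  # in this model, unknown variables become empty
            value = value[part]
        return str(value)

    return re.sub(r"%\{([^}]+)\}", resolve, template)


def data_files(scope, hierarchy=HIERARCHY):
    """List the data files consulted for a node, in lookup order."""
    return [interpolate(level, scope) for level in hierarchy]
```

Given the facts of app1web03-dev.domain.tld, the node file comes first and common.yaml comes last. This order is what the merge strategies below operate on.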

One can place any kind of key-value pairs into the Hiera YAML files.
This allows us to also use Hiera for node classification by querying a specific key with the Puppet lookup function.
In the lookup function call we can specify several options:

  • key name
  • expected data type
  • merge behavior
  • default value (returned when no hierarchy level provides the key)

Hiera Array

To benefit most from Hiera, one should add classifications to specific hierarchy levels (application1-dev, net-dmz, os-version, ...).
By default, Hiera returns the first value found when iterating over the hierarchy. This can be changed by specifying the merge behavior:

lookup( {
  'name'          => 'classes',
  'value_type'    => Array,
  'default_value' => [],
  'merge'         => {
    'strategy' => 'unique',
  },
} ).each | $c | {
  # Note: we can not use the variable `$class` as this is a reserved word!
  include $c 
}

Within the data hierarchies we can then add the 'classes' key where needed:

# common.yaml
---
classes:
  - profile::base

# os/CentOS.yaml
---
classes:
  - profile::base::centos

# stage/dev.yaml
---
classes:
  - profile::accounts::dev

# application/app01.yaml
---
classes:
  - profile::webserver::nginx
  - profile::application::app01
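To see what the unique merge returns for one node, here is a Python model, not Hiera's code. It flattens all arrays found for the key and drops duplicates. It keeps higher-priority entries first. The node (CentOS, stage dev, application app01) and the level order are assumptions based on the YAML above.

```python
# Hypothetical data files for one node, ordered from highest to lowest priority.
LEVELS = [
    ("application/app01.yaml", {"classes": ["profile::webserver::nginx", "profile::application::app01"]}),
    ("stage/dev.yaml", {"classes": ["profile::accounts::dev"]}),
    ("os/CentOS.yaml", {"classes": ["profile::base::centos"]}),
    ("common.yaml", {"classes": ["profile::base"]}),
]


def lookup_unique(levels, key, default=None):
    """Model a 'unique' merge: flatten every value found for key and drop
    duplicates, keeping higher-priority entries first."""
    result, found = [], False
    for _name, data in levels:
        if key not in data:
            continue
        found = True
        value = data[key]
        for item in value if isinstance(value, list) else [value]:
            if item not in result:
                result.append(item)
    return result if found else default
```

Without the merge option, only the first level's array (the two application classes) would be returned. With it, all five classes are included.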

Another nice feature built into Hiera is the possibility to set the lookup behavior within the Hiera data itself.

One can define the merge behavior in common.yaml using the lookup_options key:

# common.yaml
---
lookup_options:
  'classes':
    merge:
      strategy: 'unique'

We can now remove the merge key from the lookup function:

# manifests/site.pp
lookup( {
  'name'          => 'classes',
  'value_type'    => Array,
  'default_value' => []
}).each | $c | {
  include $c
}

If a single system needs a different setup, you can set lookup_options on a higher-priority hierarchy level.
For example, you may want to add only some specific users to a system, but none of the users from common or from any other level below the node data level.

In this case you can add the lookup_options entry to the node level:

# node/hr_server.domain.tld.yaml
---
lookup_options:
  'classes':
    merge:
      strategy: 'first'

classes:
  - profile::base::hr
  - profile::accounts::hr
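The effect of the node-level override can be sketched in Python. The model assumes, as we understand Hiera's behavior, that lookup_options from all levels are combined, with the higher-priority level winning. It models only the first and unique strategies and uses the hypothetical hr_server data from above.

```python
# Hypothetical data for hr_server.domain.tld, highest priority first.
NODE = {
    "lookup_options": {"classes": {"merge": {"strategy": "first"}}},
    "classes": ["profile::base::hr", "profile::accounts::hr"],
}
COMMON = {
    "lookup_options": {"classes": {"merge": {"strategy": "unique"}}},
    "classes": ["profile::base"],
}


def deep(high, low):
    """Recursively merge dicts; the higher-priority value wins on conflicts."""
    out = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep(value, out[key])
        else:
            out[key] = value
    return out


def lookup(levels, key, default=None):
    """Model a lookup whose merge strategy comes from lookup_options in data."""
    options = {}
    for data in reversed(levels):  # lowest priority first, higher ones on top
        options = deep(data.get("lookup_options", {}), options)
    strategy = options.get(key, {}).get("merge", {}).get("strategy", "first")
    found = [data[key] for data in levels if key in data]
    if not found:
        return default
    if strategy == "first":
        return found[0]
    if strategy == "unique":
        result = []
        for value in found:
            for item in value:
                if item not in result:
                    result.append(item)
        return result
    raise ValueError(f"strategy {strategy!r} is not modelled here")
```

On the hr server, profile::base from common is no longer merged in. A node without a node-level override still receives the unique merge of all levels.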

Please note that with the first strategy only the node level's list is used. Any classes from other hierarchy levels that this system still needs must be repeated here.

Hiera Hash

An even more sophisticated option is to use hashes instead of arrays, which allows overrides of and exceptions to the list of classes.

We must adapt two settings:

common.yaml: switch the merge strategy from unique to deep

---
lookup_options:
  'classes':
    merge:
      strategy: 'deep'

site.pp: switch value_type from Array to Hash and iterate over the key/value pairs ($key, $c):

# manifests/site.pp
lookup( { 'name'          => 'classes',
          'value_type'    => Hash,
          'default_value' => {}
}).each | $key, $c | {
  # skip entries whose class name was set to an empty string
  if $c != '' {
    include $c
  }
}

Now we can define hashes in Hiera:

# common.yaml
---
classes:
  base_class: 'profile::base'

# os yaml
---
classes:
  security_class: 'profile::base::centos'

# stage yaml
---
classes:
  accounts_class: 'profile::accounts::dev'

# app yaml
---
classes:
  webserver_class: 'profile::webserver::nginx'
  application_class: 'profile::application::app01'

In the classes hash, the keys are arbitrary identifiers and the values are class names.
The keys only matter within Hiera, where they allow entries to be merged and overridden; the Puppet code ignores them.

Because each class sits under a key, a more specific level can override an entry from a more general level for a node:

# node yaml
---
classes:
  webserver_class: 'profile::webserver::tomcat'

On this node we override the webserver class: instead of nginx, the node uses a tomcat class.

We can even decide NOT to manage a class key at all on a node (or on a group of nodes, depending on where in Hiera we make the configuration):

---
classes:
  application_class: ''

This overrides the class defined in more general Hiera levels with an empty string. The Puppet code skips entries with empty class names. Optionally, one can use the notice function to log skipped entries.
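The override mechanism can be modelled in Python. This is a simplified deep merge, not Hiera's implementation. Nested hashes are merged recursively, any other conflicting value is taken from the higher-priority level, and arrays are not modelled. The empty-string filter mirrors site.pp. The levels reuse the hypothetical YAML above. As an assumption, one node carries both node-level overrides.

```python
def deep_merge(high, low):
    """Merge two hashes; on conflicting non-hash values the higher-priority wins."""
    out = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(value, out[key])
        else:
            out[key] = value
    return out


def classes_for(levels):
    """Deep-merge the 'classes' hashes (levels given highest priority first)
    and drop entries whose class name is an empty string."""
    merged = {}
    for data in reversed(levels):  # apply the lowest priority first
        merged = deep_merge(data.get("classes", {}), merged)
    return [name for name in merged.values() if name != ""]


# Hypothetical levels from the examples above, highest priority first.
NODE = {"classes": {"webserver_class": "profile::webserver::tomcat", "application_class": ""}}
APP = {"classes": {"webserver_class": "profile::webserver::nginx",
                   "application_class": "profile::application::app01"}}
STAGE = {"classes": {"accounts_class": "profile::accounts::dev"}}
OS = {"classes": {"security_class": "profile::base::centos"}}
COMMON = {"classes": {"base_class": "profile::base"}}
```

With the node level present, tomcat replaces nginx and the application class disappears. Without it, both app classes are included.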

Hiera Multiple Hashes

Another option is to use separate lookups for different purposes, such as common, OS, or application classes.

The following example uses different Hiera hash keys for classification, selected by the 'kernel' fact:

# common.yaml
---
lookup_options:
  # lookup_options keys starting with ^ are treated as regular expressions
  '^.*_classes$':
    merge:
      strategy: 'deep'

# manifests/site.pp
$kernel_down = $facts['kernel'].downcase    
lookup( { 'name'          => "${kernel_down}_classes",
          'value_type'    => Hash,
          'default_value' => {}
}).each | $key, $c | {
  if $c != '' {
    include $c
  }
}

Now we can add the OS-specific classes hashes:

linux_classes:
  hostname: 'profile::linux::hostname'
  repo: 'profile::linux::repo'
  sudo: 'profile::linux::sudo'
  ssh: 'profile::linux::ssh'
  mail: 'postfix'
  webshop: 'profile::application::webshop::nginx'

windows_classes:
  hostname: 'profile::windows::hostname'
  hosts: 'profile::windows::hosts'
  features: 'profile::windows::features'
  time: 'profile::windows::time'
  users: 'profile::windows::ad_auth'
  webserver: 'iis'
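Here is how a node ends up with one of these keys, sketched in Python as a model rather than Puppet or Hiera code. The key name is built from the kernel fact as in site.pp. A single regex entry in lookup_options then covers every *_classes key. Two rules are assumptions of this sketch: keys starting with ^ are treated as regular expressions, following our understanding of Hiera's rules, and exact entries are checked before patterns.

```python
import re

# lookup_options as they might appear in common.yaml (hypothetical).
LOOKUP_OPTIONS = {"^.*_classes$": {"merge": {"strategy": "deep"}}}


def classes_key(facts):
    """Build the lookup key like site.pp: "${kernel_down}_classes"."""
    return f"{facts['kernel'].lower()}_classes"


def options_for(key, lookup_options=LOOKUP_OPTIONS):
    """Return the lookup options for key (exact names first, then patterns)."""
    if key in lookup_options:
        return lookup_options[key]
    for pattern, options in lookup_options.items():
        if pattern.startswith("^") and re.search(pattern, key):
            return options
    return {}
```

A Linux agent therefore looks up linux_classes and a Windows agent looks up windows_classes. Both keys, as well as pre_classes, receive the deep merge strategy from the one pattern.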

We can even extend this hash-driven lookup with ordering. Pre (and post) classes are tagged, and the tag is used to build dependencies:

# manifests/site.pp
# declare the pre classes and tag them using the tag metaparameter
lookup( { 'name'          => 'pre_classes',
          'value_type'    => Hash,
          'default_value' => {}
}).each | $key, $c | {
  if $c != '' {
    class { $c:
      tag => 'pre',
    }
  }
}

# include the kernel specific classes and order them after all pre classes
$kernel_down = $facts['kernel'].downcase
lookup( { 'name'          => "${kernel_down}_classes",
          'value_type'    => Hash,
          'default_value' => {}
}).each | $key, $c | {
  if $c != '' {
    include $c
    Class <| tag == 'pre' |> -> Class[$c]
  }
}
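The ordering produced by the chaining arrow can be modelled as a dependency graph. Each pre class gets an edge to each kernel-specific class, and a topological sort yields one valid order. This Python sketch (graphlib, Python 3.9+) models only the ordering, not tags or catalogs. The class names are hypothetical: profile::pre::repos is invented, and the others come from the linux_classes example.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical lookup results for one Linux node.
PRE_CLASSES = {"repos": "profile::pre::repos"}
LINUX_CLASSES = {
    "ssh": "profile::linux::ssh",
    "sudo": "profile::linux::sudo",
    "mail": "postfix",
    "webshop": "",  # emptied on a more specific level, so skipped
}


def apply_order(pre, main):
    """Return one valid order in which the classes could be applied."""
    pre_names = [c for c in pre.values() if c != ""]
    graph = TopologicalSorter()
    for cls in pre_names:
        graph.add(cls)
    for cls in main.values():
        if cls != "":
            # Class <| tag == 'pre' |> -> Class[cls]: every pre class first
            graph.add(cls, *pre_names)
    return list(graph.static_order())
```

The main classes have no order among themselves; only "pre before main" is enforced. If the same class appeared in both lookups, the model would raise graphlib.CycleError, a hint that the real setup would likely hit a dependency cycle too.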

This pattern even allows one to add application specific lookups.

Conclusion

Depending on your infrastructure and your requirements, check which of the options described here are useful for you.

A more complex infrastructure needs a more sophisticated node classification process.

Because node resources and roles are too limited for most setups, we usually recommend the more flexible Hiera-based node classification.

Depending on the variety of servers and operating systems, you might choose the array-based classification for less complex infrastructures and the hash-based classification for infrastructures with a large variety of operating systems, stages, applications, ...

Multiple hashes are useful when a complex infrastructure is managed by several teams with split responsibilities.
Here one can separate the base classification from the applications by adding lookups for 'app_classes' or even 'pre_classes' and 'post_classes'.

Happy puppetizing,
Martin Alfke

Top comments (1)

Alessandro Franceschi

Nice post Martin. Worth mentioning that the Hiera Multiple hashes approach is used and available out of the box, in example42's psick module.

It's enough to include (classify) the psick class and then manage classification via Hiera using the following keys (for each one, a hash of classes can be set):

psick::firstrun::linux_classes
psick::pre::linux_classes
psick::base::linux_classes
psick::profiles::linux_classes

Same entrypoints are available for windows_classes and darwin_classes.