Understanding Puppet Infrastructure: Minimum Requirements Without Foreman
Puppet serves as a powerful tool for configuration management, allowing administrators to define and enforce the desired state of systems across an infrastructure. In this continuation of the blog series on Puppet with Foreman as the External Node Classifier (ENC), the focus shifts to the general structure of a Puppet infrastructure. Specifically, the minimum requirements are explored without incorporating Foreman, emphasizing core components and setups for effective deployment. This series focuses on open-source Puppet, not on the commercial Puppet Core or Puppet Enterprise offerings.
The Absolute Minimum: One Primary Server
At its core, a Puppet infrastructure requires just one primary server to function. This server acts as the central authority, compiling catalogs—detailed instructions for node configurations—and distributing them to agents. For open-source Puppet, system requirements remain straightforward, but scaling considerations apply based on the number of managed nodes.
According to official documentation for Puppet Enterprise (which shares similarities with open-source Puppet in hardware needs), the primary server in a standard architecture supporting up to 2,500 nodes has the following minimum hardware specifications:
- Trial use: 2 cores, 8 GB RAM, 20 GB for /opt/, 24 GB for /var/.
- 11–100 nodes: 6 cores, 10 GB RAM, 50 GB for /opt/, 24 GB for /var/.
- 101–500 nodes: 8 cores, 12 GB RAM, 50 GB for /opt/, 24 GB for /var/.
- 501–1,000 nodes: 10 cores, 16 GB RAM, 50 GB for /opt/, 24 GB for /var/.
- 1,000–2,500 nodes: 12 cores, 24 GB RAM, 50 GB for /opt/, 24 GB for /var/.
Supported operating systems for the primary server include Red Hat Enterprise Linux (RHEL) 7–10 (x86_64, with ARM64 and ppc64le additionally supported on RHEL 9) and Ubuntu 18.04–24.04 (amd64, aarch64); FIPS-compliant setups are available on RHEL. Note that Puppet Server itself runs only on Linux; Windows is supported for agents, not for the server. For smaller open-source setups, a dual-core processor with at least 4 GB RAM suffices for basic operations, though production environments benefit from quad-core processors and 8 GB or more of RAM.
These specifications ensure the server can handle catalog compilation, which involves processing manifests and data files. Disk space allocations account for storing modules, logs, and reports, with /opt/ often housing Puppet's installation and /var/ managing runtime data.
High Availability: Adding Compile Masters
For enhanced reliability and scalability, high availability (HA) configurations introduce additional compile masters. These secondary servers offload catalog compilation from the primary server, distributing the workload in larger environments. The primary server remains the certificate authority (CA) and central point for data, while compile masters handle requests from agents.
In an HA setup, compile masters mirror the primary server's configuration and require similar hardware specs based on node volume. Synchronization occurs through shared storage or replication tools, ensuring consistency. This approach minimizes downtime, as agents can failover to available masters if the primary encounters issues. We'll look into the setup of such a configuration at a later stage.
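As a minimal sketch of the failover behavior described above, an agent can be pointed at several compilers through the `server_list` setting in its puppet.conf (the hostnames below are placeholders):

```ini
# /etc/puppetlabs/puppet/puppet.conf on a managed node
[main]
# Entries are tried in order; if the first server is unreachable,
# the agent falls back to the next one in the list.
server_list = compile01.example.com:8140,compile02.example.com:8140
```

With `server_list` set, it takes precedence over the plain `server` setting, giving agents a built-in failover path without any external load balancer.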
Various Puppet Modes
Puppet offers flexibility through different operational modes, each suited to specific use cases. These modes determine how configurations are compiled and applied:
- Puppet server with Puppet agents: A traditional pull-based model where agents periodically check in with a Puppet server to retrieve their desired configuration state (catalog), which is then enforced locally.
- Serverless Puppet: A mode where the catalog is compiled directly on the node itself, often combined with a remote version control repository for masterless, pull-based deployments.
- Puppet Bolt: An orchestration tool that uses a push model to apply configurations, compiling the catalog on a controller and deploying it to managed nodes.
- Local puppet apply: The simplest mode, where the catalog is compiled and applied locally without a server, commonly used for provisioning virtual machines or images.
Selecting a mode depends on infrastructure size, security needs, and administrative preferences. The server-agent model prevails in enterprise settings for centralized control.
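The simplest of these modes is easy to try out. Assuming a one-file manifest like the sketch below (path and content are purely illustrative), the catalog is compiled and applied locally in a single step:

```puppet
# site.pp -- run locally with: puppet apply site.pp
# Ensures a file exists with the given content; no server involved.
file { '/tmp/motd':
  ensure  => file,
  content => "Managed by Puppet\n",
}
```

The same manifest would work unchanged in the server-agent model, which is part of Puppet's appeal: the code describing the desired state is independent of how the catalog is compiled and delivered.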
The Role and Advantages of PuppetDB
PuppetDB plays a crucial role as a storage backend for Puppet data, enhancing query capabilities and enabling advanced features. It collects and stores facts (system details from agents), catalogs, and reports, making this information accessible for querying and analysis.
One key advantage involves exported resources, where nodes can "export" configuration elements for collection by others. For instance, a server might export monitoring checks that populate a Nagios server automatically. This facilitates dynamic integrations, such as auto-registering hosts in monitoring tools without manual intervention.
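The Nagios scenario above can be sketched in Puppet code. Exported resources are declared with a double `@@` prefix and gathered elsewhere with the `<<| |>>` collector; this requires PuppetDB, and in recent Puppet versions the Nagios types come from the puppetlabs nagios_core module rather than core Puppet:

```puppet
# On each monitored node: export an SSH check for this host.
@@nagios_service { "check_ssh_${facts['networking']['fqdn']}":
  check_command       => 'check_ssh',
  host_name           => $facts['networking']['fqdn'],
  service_description => 'SSH',
  use                 => 'generic-service',
}

# On the Nagios server: collect every service check exported by other nodes.
Nagios_service <<| |>>
```

Each time a new node runs its agent, its check lands in PuppetDB and is picked up on the Nagios server's next run, so no manual registration is needed.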
PuppetDB requires an external PostgreSQL database for production use; the embedded HSQLDB that early releases shipped was suitable only for testing and has long since been removed, so current versions support PostgreSQL exclusively. PostgreSQL ensures performance and scalability, with dedicated database servers recommended in large deployments to handle query loads efficiently. Installation involves configuring PuppetDB to connect to PostgreSQL, typically with TLS/SSL for security.
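The PostgreSQL connection is configured in PuppetDB's `database.ini`; a minimal sketch could look like this (host, database name, and credentials are placeholders):

```ini
# /etc/puppetlabs/puppetdb/conf.d/database.ini
[database]
# JDBC-style connection string: //<host>:<port>/<database>
subname  = //postgres.example.com:5432/puppetdb
username = puppetdb
password = changeme
```

In production the password should of course not live in plain text like this, and the connection should be secured with TLS as mentioned above.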
Role and Requirements for Puppet Agents
Puppet agents run on managed nodes, enforcing configurations by applying catalogs received from the server. They operate in the background, typically checking in every 30 minutes to ensure compliance.
Requirements for agents are minimal compared to the primary server:
- Supported OS: Similar to the server, including various Linux distributions, Windows, macOS, and more.
- Hardware: Low overhead—sufficient RAM (at least 512 MB) and CPU for occasional runs; no dedicated resources needed beyond the node's baseline.
- Software: the all-in-one puppet-agent package bundles its own Ruby runtime (the exact version depends on the Puppet release) along with all dependencies, so no separate Ruby installation is needed.
Agents communicate via HTTPS, requiring certificates signed by the primary server's CA. This setup ensures secure, authenticated interactions.
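A minimal agent configuration ties these pieces together; the hostname below is a placeholder, and `runinterval` simply makes the default 30-minute check-in explicit:

```ini
# /etc/puppetlabs/puppet/puppet.conf on a managed node
[main]
# Primary server whose CA signs this agent's certificate
server = puppet.example.com

[agent]
# How often the agent checks in; 30m is the default
runinterval = 30m
```

On first contact the agent submits a certificate signing request; once the primary server signs it (manually or via autosigning), all further communication is authenticated HTTPS.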
Diagram of a Full Infrastructure
This is what a full setup could look like:
Environments in Puppet
Environments provide isolation for code and configurations, allowing different versions of modules or manifests to coexist. By default, Puppet includes a "production" environment, but custom ones like "development" or "staging" can be created for testing changes without affecting live systems.
Each environment has its own directory structure for manifests, modules, and data files. Nodes are assigned to environments via configuration files or ENCs (though Foreman is excluded here). This separation supports safe rollouts, version control integration (e.g., with Git), and compliance with development workflows.
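As an illustration of that directory structure, the default code directory with the standard "production" environment plus a hypothetical "staging" one might look like this:

```
/etc/puppetlabs/code/environments/
├── production/
│   ├── environment.conf
│   ├── manifests/
│   │   └── site.pp
│   └── modules/
└── staging/
    ├── environment.conf
    ├── manifests/
    └── modules/
```

Without an ENC, a node can pin itself to an environment by setting `environment = staging` in the `[agent]` section of its puppet.conf; with Git, each environment commonly maps to a branch of the control repository.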
In summary, a basic Puppet infrastructure starts with a single primary server and scales through HA and tools like PuppetDB. Understanding these components lays the foundation for robust automation, paving the way for more advanced topics in future posts.
Did you find this post helpful? You can support me.

