Introduction
We will create a new Ansible role and a playbook to automate the installation of the command line tools I always install on Ubuntu servers. Having the installer and the Ansible role is not enough. It is always good practice to document the role, what it is for and how people can use it, so we will discuss that too.
The new features we will learn about today are the following:
- Using multiple tasks files and including a tasks file in the main.yml.
- Using Ansible facts, disabling them and gathering a subset of the available facts.
- Creating a symbolic link
- Updating the apt repository cache
- Using the folder "vars" in addition to "defaults"
- Using regular expressions in Ansible
We will start from the source code of the 7th episode:
https://github.com/rimelek/homelab/tree/tutorial.episode.7
Table of contents
- Before you begin
- Ansible playbook and optional APT cache update
- Ansible role
  - Ansible role overview
  - Creating a symbolic link
  - Using multiple tasks files
- Install the latest yq from GitHub
  - Using the GitHub API to get the latest release
  - Get the version number of the latest release
  - Get the architecture and operating system of the server
  - Saving helper variables in addition to defaults
  - Installing the desired version of yq
  - Skip downloading when the existing version is the desired one
  - Full yq tasks file
- Documenting Ansible roles
- Conclusion
Before you begin
Requirements
- The project requires Nix which we discussed in Install Ansible 8 on Ubuntu 20.04 LTS using Nix
- You will also need an Ubuntu remote server. I recommend an Ubuntu 22.04 virtual machine.
Download the already written code of the previous episode
If you started the tutorial with this episode, clone the project from GitHub:
git clone https://github.com/rimelek/homelab.git
cd homelab
If you cloned the project just now, or you want to make sure you are using exactly the same code I did, switch to the previous episode in a new branch:
git checkout -b tutorial.episode.7b tutorial.episode.7
Have the inventory file
Copy the inventory template
cp inventory-example.yml inventory.yml
- Change ansible_host to the IP address of your Ubuntu server that you use for this tutorial,
- and change ansible_user to the username on the remote server that Ansible can use to log in.
- If you still don't have an SSH private key, read the Generate an SSH key part of Ansible playbook and SSH keys.
- If you want to run the playbook called playbook-lxd-install.yml, you will need to configure a physical or virtual disk which I wrote about in The simplest way to install LXD using Ansible. If you don't have a usable physical disk, look for truncate -s 50G <PATH>/lxd-default.img to create a virtual disk.
- You will need an encrypted secret file which I wrote about in the Encrypt a file section of "Use SOPS in Ansible to read your secrets".
Activate the Python virtual environment
How you activate the virtual environment depends on how you created it. The episode The first Ansible playbook describes how to create and activate the virtual environment using the "venv" Python module, and in the episode The first Ansible role we created helper scripts as well, so if you haven't created it yet, you can create the environment by running
./create-nix-env.sh venv
Optionally start an ssh agent:
ssh-agent $SHELL
and activate the environment with
source homelab-env.sh
Ansible playbook and optional APT cache update
Before we can talk about the role, we have to start with a playbook. Previously, we only had playbooks for specific tasks like installing and removing LXD. The goal now is to have a playbook that installs the common dependencies you can play with on the remote servers even without Ansible, so when you are trying something new, you don't have to start with YAML files before you even know what you want to achieve in the end. Let's call this playbook file "playbook-system-base.yml", and for now, add only the role that we will create soon.
- hosts: all
  roles:
    - role: cli_tools
We still assume that all our machines that we configure in the inventory file are targets. It will change, but not in this post.
This Ansible role will install lots of APT packages. We could have other roles that want to install APT packages too, so we also want to make sure the APT cache is up to date. It would be a waste of time to update the cache in every role, so we will update it in a pre task:
- hosts: all
  pre_tasks:
    - name: APT update
      become: true
      changed_when: false
      when: config_apt_update | default(false, true) | bool
      ansible.builtin.apt:
        update_cache: true
  roles:
    - role: cli_tools
In this case we use the built-in "apt" module to update the cache without installing anything. Updating the cache makes the task report a change every time, so we add changed_when: false to the task to disable that. We also want a way to skip the updater pre task. When you have to run a playbook 10 times in two minutes while you are developing it, updating the cache every time is simply not necessary. We add a condition which uses the new config_apt_update variable. If it is not defined in the inventory file, or it is defined but empty, we use "false" as the default value (the second argument of the default filter takes care of the empty case), but you can always override it from the command line.
./run.sh playbook-system-base.yml \
-e config_apt_update=true
I will define it in my inventory, so this is what the global vars section looks like now:
all:
  vars:
    ansible_user: ansible-homelab
    sops: "{{ lookup('community.sops.sops', 'secrets.yml') | ansible.builtin.from_yaml }}"
    config_apt_update: true
    config_lxd_zfs_pool_disks:
      - /dev/disk/by-id/scsi-1ATA_Samsung_SSD_850_EVO_500GB_S2RBNX0J103301N-part6
Ansible facts
There is one more line we need to add to the playbook.
When you run a playbook, as the very first step, Ansible detects devices and collects information, for example about networks and the version of the Linux distribution. The collected information is available through variables, and these are the facts. Sometimes you don't need these facts and you want to speed up the execution of the playbook, especially when you have to run it on 100 servers, or on just a couple of servers but very often during development. If that is the case, you can set gather_facts: false in the playbook like this:
- hosts: all
  gather_facts: false
  pre_tasks:
    - name: APT update
      become: true
      changed_when: false
      when: config_apt_update | default(false, true) | bool
      ansible.builtin.apt:
        update_cache: true
  roles:
    - role: cli_tools
If you use roles you didn't write, and you don't want to find out what facts they need, just leave the facts gathering enabled.
Now you may think you understand the difference between the variables we used before and the facts, but in fact, you can also define facts using the built-in set_fact module. So one more important difference is the scope. You can define a variable in a task, but that variable will not be available in the next task. Facts are available everywhere, and you can also cache them, so when you define a fact, run the playbook, remove the definition and rerun the playbook, you can still read the fact from the cache. Of course, it depends on the cache plugin you use, and the default is memory, so by default, the facts are not available when you run a playbook the second time. If you want to see how persistent fact caching works, the following example can show it.
Run the following commands in terminal:
export ANSIBLE_CACHE_PLUGIN=jsonfile
export ANSIBLE_CACHE_PLUGIN_CONNECTION="$PWD/var/cache"
Use the following playbook:
- hosts: localhost
  gather_facts: false
  tasks:
    - ansible.builtin.set_fact:
        cacheable: true
        mytest: hello
    - ansible.builtin.debug:
        var: ansible_facts.mytest
After running the playbook, you will find a file named "localhost" in the folder you specified in the plugin connection.
{
"mytest": "hello"
}
Then run the following playbook:
- hosts: localhost
  gather_facts: false
  tasks:
    - ansible.builtin.debug:
        var: ansible_facts.mytest
And Ansible will still remember the value of "mytest":
ok: [localhost] => {
"ansible_facts.mytest": "hello"
}
Ansible role
Ansible role overview
The new Ansible role will be called "cli_tools". The structure of the role will be the following:
- defaults/
  - main.yml: The place for default parameter values.
- vars/
  - main.yml: A file to store helper variables which are not intended to be changed by the user. You can use this file if the alternative is storing the variables in the tasks file, which requires creating a block only for those variables.
- tasks/
  - main.yml: The default tasks file that we always used.
  - yq.yml: An additional tasks file which we can refer to and load in the main.yml.
- README.md: This is basically the documentation of the role containing everything that helps the user to understand how the role can be used, what it expects to be already installed and so on. We will discuss it in more detail later.
Creating a symbolic link
Most of the packages I install on a Debian-based Linux can be installed from an APT repository, but using the built-in apt module that we already used before is not really interesting, so let's just jump to the interesting part. Sometimes, I just want to have an alias for a command, and that's when I create a symbolic link, like the one here that points to the pygmentize command. The built-in file module can create a symbolic link if the state field is "link".
- name: Create "highlight" as a symbolic link to "pygmentize" | Install formatting tools for scripting and user-friendly outputs
  become: true
  ansible.builtin.file:
    state: link
    src: /usr/bin/pygmentize
    dest: "{{ cli_tools_highlight_dest }}"
The destination could have been static, but I wanted to make it changeable, so I will have a default value for that in defaults/main.yml.
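For comparison, this is roughly what the task does, sketched as plain shell commands. It is only an illustration under assumptions: it works in a temporary directory, and the fake "pygmentize" script and the "highlight" path are stand-ins for /usr/bin/pygmentize and the value of cli_tools_highlight_dest.

```shell
set -eu

# Work in a throwaway directory so nothing on the real system changes.
workdir="$(mktemp -d)"

# Hypothetical stand-in for /usr/bin/pygmentize.
target="$workdir/pygmentize"
printf '#!/bin/sh\necho highlighted\n' > "$target"
chmod +x "$target"

# Hypothetical stand-in for the value of cli_tools_highlight_dest.
dest="$workdir/highlight"

# -s: symbolic link, -f: replace an existing link, -n: do not follow an existing link at dest
ln -sfn "$target" "$dest"
# Running the same command again leaves the same result.
ln -sfn "$target" "$dest"

link_target="$(readlink "$dest")"
echo "$dest -> $link_target"
```

The module has the advantage that it reports whether anything changed, which the ln command does not tell you.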
Using multiple tasks files
Installing yq will be complicated, but I don't want to complicate my main tasks file.
The built-in include_tasks module expects the name of another tasks file, loads it and executes the tasks in it.
- name: Include tasks from another file
  ansible.builtin.include_tasks: file.yml
It can be useful in different situations, but in this case, I just didn't want to keep the most complicated installation process in the main file. The main.yml can also be shorter this way. For more details about how this module can be used, don't forget to check the documentation I linked above.
The following code is the part of main.yml in the cli_tools role which shows all three modules I used in the main.yml, and also includes a block. Most of the tasks will be familiar since we used the APT module before and I also shared the symlink part, but the last task is an include.
roles/cli_tools/tasks/main.yml
- name: Install formatting tools for scripting and user friendly outputs
  block:
    - name: APT packages | Install formatting tools for scripting and user-friendly outputs
      become: true
      ansible.builtin.apt:
        name:
          - jq # to handle json files
          - python3-pygments # to highlight codes with "pygmentize"
    - name: Create "highlight" as a symbolic link to "pygmentize" | Install formatting tools for scripting and user-friendly outputs
      become: true
      ansible.builtin.file:
        state: link
        src: /usr/bin/pygmentize
        dest: "{{ cli_tools_highlight_dest }}"

- ansible.builtin.include_tasks: yq.yml
I didn't use the "name" parameter in the last task, because the tasks in the included file will have names, so it wouldn't really help to understand the role better and wouldn't add more value to the logs either. It was not my idea. I named every single task until I read about this point of view and I agreed. Unfortunately, I don't have a link to the source.
Install the latest yq from GitHub
Using the GitHub API to get the latest release
We can finally discuss the most interesting part. I want to install "yq" from GitHub, which will require two more default variables in defaults/main.yml:
cli_tools_yq_version:
cli_tools_yq_dest: /usr/local/bin/yq
The version number is empty, which will mean that I want to install the latest version. I tried to find a link directly to the latest release, but it turned out, there was no such link. However, the GitHub API can tell us which one is the latest. If you just want to get the URL to download the latest version, you can try the following in the terminal:
curl -sL https://api.github.com/repos/mikefarah/yq/releases/latest
It will return a JSON document which is too long to show in full, but let's see the relevant part:
{
  "html_url": "https://github.com/mikefarah/yq/releases/tag/v4.40.5",
  "assets": [
    {
      "browser_download_url": "https://github.com/mikefarah/yq/releases/download/v4.40.5/yq_linux_amd64"
    },
    {
      "browser_download_url": "https://github.com/mikefarah/yq/releases/download/v4.40.5/yq_darwin_arm64"
    }
  ]
}
This will be really important, because it has all the information we need, and it has it more than once.
Let's see how you can call the API endpoint from Ansible:
- name: Get latest version info as json
  when: cli_tools_yq_version | default('', true) == ''
  ansible.builtin.uri:
    url: https://api.github.com/repos/mikefarah/yq/releases/latest
  register: _yq_latest
The built-in uri module allows us to call the endpoint and save the json response into a variable. Of course we want to do that only if the requested version number is empty, that's why we compare the version number to an empty string.
Get the version number of the latest release
In the previous section, you could see that we could get the download URL from the JSON response, and that the URL contains the version number, the architecture and also the operating system. The response also shows that these are the only differences between the download URLs. The download URL is the only thing we need, but sometimes we want to specify the version number instead of getting the latest version. So instead of filtering the response for a URL whose format we already know exactly, we can just build the URL from scratch. The first important part of that URL is the version number, which can also be found in the html_url field, so we don't even need to list the release files.
Assuming you already have jq on the server, you can run the following:
curl 'https://api.github.com/repos/mikefarah/yq/releases/latest' -s \
| jq -r '.html_url' \
| xargs -- basename \
| sed 's/^v//'
Output:
4.40.5
We need the version number in Ansible as well. We registered the JSON response in _yq_latest. It will have a property called "json", which is not a string. It is in fact a decoded version of the JSON string, since the "uri" module recognized the JSON content type in the HTTP response header. The above bash command can be replaced with the following Jinja template in Ansible:
_yq_latest_version_number: "{{
  _yq_latest.json.html_url
  | basename
  | regex_replace('^v(.*)', '\\1')
  }}"
We also used a very simple regular expression telling Ansible to remove the leading "v" from the version number. Removing the "v" is not really important. It was just my preference to work with only the numbers.
We now have the latest version number, and we know that we want to use that as the default value and also be able to override it. This is how you do it:
_yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
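The same two steps can be tried in the terminal without calling the API. The following is a minimal sketch, not the role's code: the function names are made up for this illustration, and 4.35.1 is just an arbitrary example of an explicitly requested version. The shell expansion "${1:-$2}" behaves like the default filter with "true" as the second argument, so it falls back when the first value is unset or empty.

```shell
# Strip the leading "v" from the last path segment of a release URL.
version_from_release_url() {
  basename "$1" | sed 's/^v//'
}

# $1: requested version (may be empty), $2: latest version.
# Prints the requested version, or the latest one when the request is empty.
desired_version() {
  printf '%s\n' "${1:-$2}"
}

latest="$(version_from_release_url 'https://github.com/mikefarah/yq/releases/tag/v4.40.5')"

desired_version "" "$latest"        # empty request: falls back to the latest version
desired_version "4.35.1" "$latest"  # explicit request: the requested version wins
```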
Get the architecture and operating system of the server
The next important thing after the version number is the release name. The release name always starts with "yq_" followed by the operating system and the architecture. We will need the uname command to get the name of the operating system (Darwin on macOS and Linux on Linux) and the arch command to get the CPU architecture. Unfortunately, amd64 can also be called x86_64 and arm64 can also be called aarch64, so we will use sed to fix that.
uname
arch
Output:
Linux
x86_64
While we could use the uname and the arch commands to get the operating system and the CPU architecture in the terminal, we can use facts in Ansible. Since we disabled fact gathering, we have to use the built-in setup module to get the architecture and the operating system.
- name: Collect architecture facts
  ansible.builtin.setup:
    gather_subset: architecture
After that you can get the operating system and the architecture from the ansible_facts variable.
- vars:
    info:
      os: "{{ ansible_facts.system }}"
      arch: "{{ ansible_facts.architecture }}"
  debug:
    var: info
Although I prefer using ansible_facts, so I can search for where I'm using facts, you could also use the variables prefixed with ansible_.
- vars:
    info:
      os: "{{ ansible_system }}"
      arch: "{{ ansible_architecture }}"
  debug:
    var: info
Saving helper variables in addition to defaults
Variables in Ansible can be defined in many places. In a role, we can have defaults, but we can also have variables which are not meant to be changed by the user (although we could change them too) and only help us organize our templates, so we don't have to define all the variables in the tasks files.
The architecture and the operating system are the two most important pieces of information for building the final URL. We have to convert them to a format that can be used in the download URL.
uname | tr '[:upper:]' '[:lower:]'
arch \
| sed 's/x86_64/amd64/' \
| sed 's/aarch64/arm64/'
We converted the name of the operating system to lowercase, and replaced the architecture names with their alternatives. Yes, in this project we support only these two.
Output:
linux
amd64
In Ansible, we will save the templates in vars/main.yml, so there is another folder called "vars/" at the same level as "defaults/".
cli_tools_yq_archs:
  x86_64: amd64
  amd64: amd64
  aarch64: arm64
  arm64: arm64
cli_tools_yq_os: "{{ ansible_facts.system | lower }}"
cli_tools_yq_arch: "{{ cli_tools_yq_archs[ansible_facts.architecture] }}"
cli_tools_yq_release_name: "{{ 'yq_' + cli_tools_yq_os + '_' + cli_tools_yq_arch }}"
This is how we will always get arm64 or amd64. Since we get the OS name from a fact, it would also work on macOS. It doesn't mean the whole role would work, since we also use the APT package manager, but you could try to move the yq installation into a separate role. Whether you want to use multiple tasks files or a new role is up to you.
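The lookup table in vars/main.yml can be mirrored in the terminal with a case statement. This is only a sketch with made-up function names, and it makes one behavior explicit that the sed pipeline hides: an architecture outside the four known names is rejected instead of being passed through unchanged.

```shell
# Map a CPU architecture name to the name used in the yq release files.
yq_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# $1: kernel name as printed by uname, $2: architecture as printed by arch.
yq_release_name() {
  os="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  arch="$(yq_arch "$2")" || return 1
  echo "yq_${os}_${arch}"
}

yq_release_name Linux x86_64
yq_release_name Darwin arm64
```

On a real machine you could call it as yq_release_name "$(uname)" "$(arch)".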
The last thing we did was defining the full release name.
Installing the desired version of yq
We finally have all the information to build the download URL:
_url_base: https://github.com/mikefarah/yq/releases/download/
_url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
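As a quick sanity check, the same concatenation in the shell should reproduce one of the browser_download_url values from the API response shown earlier. The function name below is hypothetical and exists only for this sketch.

```shell
# $1: version number without the leading "v"
# $2: release name, for example yq_linux_amd64
yq_download_url() {
  echo "https://github.com/mikefarah/yq/releases/download/v$1/$2"
}

url="$(yq_download_url 4.40.5 yq_linux_amd64)"
echo "$url"
```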
We can use the following task, but in this case, we choose the built-in get_url module instead of uri.
- name: Install yq
  become: true
  vars:
    _yq_latest_version_number: "{{
      _yq_latest.json.html_url
      | basename
      | regex_replace('^v(.*)', '\\1')
      }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true
There is one parameter I have to explain.
force: true
Without this parameter we couldn't update an already installed yq. It tells Ansible to overwrite the already downloaded file.
Skip downloading when the existing version is the desired one
Previously, we always overwrote the installed version, which required downloading the file every time. To avoid that, we need the version of the already installed yq, and we can query it only if yq is already installed.
To find out if the file is already downloaded, we can use the built-in stat module.
- name: Check if {{ cli_tools_yq_dest }} exists
  ansible.builtin.stat:
    path: "{{ cli_tools_yq_dest }}"
  register: _yq_existing_dest_check
Now the boolean _yq_existing_dest_check.stat.exists variable tells you whether it exists or not. In the terminal, you would get the version number like this:
yq --version
Output:
yq (https://github.com/mikefarah/yq/) version v4.40.5
It's not just a version number, so we will use a regular expression again, but first we get the version info in Ansible:
- name: Get the version information of the existing yq command
  changed_when: false
  when: _yq_existing_dest_check.stat.exists
  ansible.builtin.command: "{{ cli_tools_yq_dest }} --version"
  register: _yq_existing_version_info
I used the cli_tools_yq_dest parameter so the task will work even if the folder of the binary is missing from the PATH environment variable.
We need to apply the following filter on the version info:
regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1')
As a template variable:
_yq_existing_version_number: "{{ _yq_existing_version_info | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
We will also need to add the following condition to the task:
when:
  - not _yq_existing_dest_check.stat.exists or _yq_existing_version_number != _yq_desired_version_number
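The regular expression and the condition can be tried in the terminal too. The sketch below uses the sample output shown above instead of running a real yq binary, and the function names and the "true"/"false" flag are assumptions made only for this illustration.

```shell
# Extract X.Y.Z from the output of "yq --version".
yq_version_from_output() {
  printf '%s\n' "$1" | sed -E 's/.*version v([0-9]+\.[0-9]+\.[0-9]+).*/\1/'
}

# Succeeds (exit code 0) when yq has to be downloaded.
# $1: "true" if the binary already exists, $2: existing version, $3: desired version
needs_install() {
  [ "$1" != "true" ] || [ "$2" != "$3" ]
}

existing="$(yq_version_from_output 'yq (https://github.com/mikefarah/yq/) version v4.40.5')"
echo "$existing"

if needs_install true "$existing" 4.40.5; then
  echo "download needed"
else
  echo "already up to date"
fi
```

As in the Ansible condition, the version comparison matters only when the binary exists, so a missing binary always leads to a download.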
The final task is below:
- name: Install yq
  become: true
  when:
    - not _yq_existing_dest_check.stat.exists or _yq_existing_version_number != _yq_desired_version_number
  vars:
    _yq_existing_version_number: "{{ _yq_existing_version_info | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
    _yq_latest_version_number: "{{
      _yq_latest.json.html_url
      | basename
      | regex_replace('^v(.*)', '\\1')
      }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true
Full yq tasks file
Now let's see what yq.yml looks like:
roles/cli_tools/tasks/yq.yml
- name: Collect architecture facts
  ansible.builtin.setup:
    gather_subset: architecture

- name: Get latest version info as json
  when: cli_tools_yq_version | default('', true) == ''
  ansible.builtin.uri:
    url: https://api.github.com/repos/mikefarah/yq/releases/latest
  register: _yq_latest

- name: Check if {{ cli_tools_yq_dest }} exists
  ansible.builtin.stat:
    path: "{{ cli_tools_yq_dest }}"
  register: _yq_existing_dest_check

- name: Get the version information of the existing yq command
  changed_when: false
  when: _yq_existing_dest_check.stat.exists
  ansible.builtin.command: "{{ cli_tools_yq_dest }} --version"
  register: _yq_existing_version_info

- name: Install yq
  become: true
  when:
    - not _yq_existing_dest_check.stat.exists or _yq_existing_version_number != _yq_desired_version_number
  vars:
    _yq_existing_version_number: "{{ _yq_existing_version_info | regex_replace('.*version v(\\d+\\.\\d+\\.\\d+).*', '\\1') }}"
    _yq_latest_version_number: "{{
      _yq_latest.json.html_url
      | basename
      | regex_replace('^v(.*)', '\\1')
      }}"
    _yq_desired_version_number: "{{ cli_tools_yq_version | default(_yq_latest_version_number, true) }}"
    _url_base: https://github.com/mikefarah/yq/releases/download/
    _url: "{{ _url_base }}v{{ _yq_desired_version_number }}/{{ cli_tools_yq_release_name }}"
  ansible.builtin.get_url:
    url: "{{ _url }}"
    dest: "{{ cli_tools_yq_dest }}"
    owner: root
    group: root
    mode: 0775
    force: true
Run the final playbook
./run.sh playbook-system-base.yml \
-e config_apt_update=true
Documenting Ansible roles
When you write an Ansible role, you can forget about the parameters and how you can use them. You can forget about some requirements which are needed before you use the role. It is a good practice to have a README file in the root folder of the role. If you want to share the role, it is even more important.
The README file could have any structure, but the recommended one is the following markdown structure:
role_name
=========
Description
Requirements
------------
List of requirements like the supported operating systems
Role variables
--------------
```yaml
role_variable: value
```
Description of the above variable
Dependencies
------------
List of dependencies like other roles
Example playbook
----------------
```yaml
- hosts: all
roles:
- role: role_name
role_variable: value
```
License
-------
The name of the license
Author information
------------------
Your name or the name of your team and optional email address.
The description part is usually short, but I thought it would be a good idea to describe all the tools that the role would install, so mine is really long. I don't want to share the whole documentation, but you can find it on GitHub.
Conclusion
This is how a very simple task becomes a very complicated one. I wanted to show you what command line tools I usually install on my Linux servers, which became a separate article. In Ansible, it required talking about Ansible facts and organizing our variables better. In my original role, I never overwrote the existing yq binary, and when I needed a new version, I could just remove the binary on the server and rerun the playbook. If you have many servers, it is better to automatically check whether you have the desired version or not. It also demonstrated what it means to detect the existing state if there is no module to do that for you.
Now that we have a role to install the most important command line tools, we can reuse it later. For example, when we use Ansible to run new virtual machines in which we also want to have these tools and more. Coming soon in a following tutorial.
The final source code of this episode can be found on GitHub:
https://github.com/rimelek/homelab/tree/tutorial.episode.8
README
This project was created to help you build your own home lab where you can test your applications and configurations without breaking your workstation, so you can learn on cheap devices without paying for more expensive cloud services.
The project contains code written for the tutorial, but you can also use parts of it if you refer to this repository.
Tutorial on YouTube in English: https://www.youtube.com/watch?v=K9grKS335Mo&list=PLzMwEMzC_9o7VN1qlfh-avKsgmiU8Jofv
Tutorial on YouTube in Hungarian: https://www.youtube.com/watch?v=dmg7lYsj374&list=PLUHwLCacitP4DU2v_DEHQI0U2tQg0a421
Note: The inventory.yml file is not shared since that depends on the actual environment so it will be different for everyone. If you want to learn more about the inventory file watch the videos on YouTube or read the written version on https://dev.to. Links in the video descriptions on YouTube.
You can also find an example inventory file in the project root. You can copy that and change the content, so you will use your IP…