DEV Community

Thomas H Jones II
Thomas H Jones II

Posted on • Originally published at thjones2.blogspot.com on

1

Increasing Verbosity of Ansible Jobs

Sometimes, Ansible doesn't have really native methods for installing and/or configuring some types of content. As a result, you may find yourself resorting to using Ansible's shell: or command: modules. – basically a shell-out or "escape" method to execute such tasks tasks. While these methods are ok for smaller tasks, for larger tasks they can expose you to a number of problems:

  • The shell-out can take quite a long time to return
  • The shell-out can leave you guessing what it's actually doing – leaving you wondering:
    • "is it actually still working"
    • "is it hung"
    • "is this going to leave me waiting forever or will it ultimately time out"
    • etc.
  • The shell-out can be uninformative.
  • The shell-out can return an incorrect status:
    • It may report a change where none actually occurred
    • It may report a success where there was partial- or even total-failure
    • It may report a failure where there was partial-success

While there's often not a lot that can be done about execution time – things take as long as they take – the other problems are addressable.

Trying to fix the "what's it actually doing" problem from within the shell-out, itself, often isn't meaningful: Ansible gathers up all the shell-output and only returns it once the shell exits. That said, improving your shell-out's output isn't a wholly-wasted effort: making your shell-out more verbose can help you if it does error-out; it can also provide greater assurance if you want to pore through the change: or ok: (success) output. This can help you with some of the "can be uninformative" problem, even if only after-the-fact.

Similarly, trying to fix the "return an incorrect status" problem strictly from within the shell-out doesn't necessarily provide a full solution. It can improve the overall reliability of the shell-out. However, it doesn't necessarily fix the status that Ansible uses when it tries to decide, "should I abort this run or should I continue" or any other contingent- or branching-logic you might want to implement.

Recently, I ran into these kinds of problems at one of my customer sites. They're a shop that has a significant percentage of their user-base that are data-science oriented. As such, they use Ansible to install the R language binaries along with a few hundred modules that their various users like to use. While they're an Ansible shop, they'd implemented the installation as a call to an external shell script. Ansible first pushes the script out and then executes it on the targets.

The script, itself, wasn't especially robustly-written: next to no error-handling or -reporting. It's basically a launch-and-pray kind of tool. On the plus side, they had thought ahead well enough to provide a mechanism for determining where in the shell-managed installation-process things had died. Basically, as the script runs, it creates a simple file containing a list of yet-to-be-installed modules from which you could infer that one or more of them had failed. That said, to see that file, you have to manually SSH to the managed-system and go view it.

Unfortunately, because the script is doing so many module-installs, it takes a really long time to execute (a few hours!). Because Ansible only reports script output when an invoked-script exits, Ansible pauses for a loooong time with no nerve-soothing output. And, as previously mentioned, if it does fail, you're stuck having to visit managed-systems to try to figure out why.

I'm not real tolerant of waiting for things to run with no output to tell me, "at least I know that it's doing something." So, I set out to refactor. While I'd hoped there was a native Ansible module for the task, my Google-fu wasn't able to turn anything up. So, I too resorted to the shell-escape method.

That said, upon looking at the external shell script they wrote, I realized it was an extremely simple script. As such, I opted to replace it with an equivalent shell: | block in the associated Ansible plays.

---
- name: Iteratively install R-modules
args:
executable: /bin/bash
changed_when: "'Added' in modInstall_result.stdout"
environment:
http_proxy: 'http://{{proxy_user}}:{{proxy_password}}@{{proxy_host}}:80/'
https_proxy: $http_proxy
failed_when: "modInstall_result.rc != 0 or 'had non-zero exit status' in modInstall_result.stderr"
register: modInstall_result
shell: |
if [[{{ item }} =~ ":"]]
then
PACKAGE="$( echo {{ item }} | cut -d : -f 1 )"
VERSION="$( echo {{ item }} | cut -d : -f 2 )"
VERSTRING="version = '${VERSION}',"
else
PACKAGE="{{ item }}"
VERSTRING=""
fi

Rscript --slave --no-save --no-restore-history -e "
if (! ('${PACKAGE}' %in% installed.packages()[,'Package'])) {
require(devtools);
install_version(package = '${PACKAGE}', ${VERSTRING} upgrade = 'never', repos=c('http://cran.us.r-project.org'));
print('Added');
} else {
print('Already installed');
}"
with_items: "{{ Rmodules.split('\n') }}"
...

The value of the refactored approach is that, instead of waiting hours for output, there's output associated with each installation-attempt. Further, the output is all captured on the host running the Ansible play: no having to visit managed systems to find something resembling a logfile.

Explaining the Play...

When I construct plays, I like to order the YAML alphabetically (with the exception of the "name" parameter – that always goes first). Which is to say, anything at a given directive-level will be ordered from A-Z. Some people prefer to put things in something more-resembling a functional order. I choose alphabetical because it makes it easier for me to cross-reference with the documentation.

  • "name" is fairly self-explanatory. It just provides a human-friendly indication of what Ansible is doing. In this case, iteratively installing R-modules (duh!).
  • "args": This can have a number of sub-parameters (I have yet to dig through the documentation or source to find all of them). I've only ever had use for the "executable" sub-parameter.
  • "changed_when": This parameter allows you to tell Ansible how to know that the shell-escape changed something. In this instance, I'm having it evaluate data contained in a variable named "modInstall_result" (set later via the "register" action).
  • "executable": this allows you to explicitly tell Ansible which interpreter to use. I'm pedantic, so, I like to tell it, "use /bin/bash".
  • "environment": this allows you to set execution-specific environment-variables for the shell to use. In this case, I'm setting the "http_proxy" and "https_proxy" environmentals. This is necessary because the build environment is isolated and I'm trying to let R's in-built URL-fetcher pull content from public, internet-hosted repositories (see this vendor documentation for explanation). The customer doesn't have a full CRAN mirror, so, leveraging this installation method minimizes having to account for dependencies.
  • "failed_when": This parameter allows you to tell Ansible how to know that the shell-escape failed. In this instance, I'm having it evaluate data contained in a variable named "modInstall_result" (set later via the "register" action).
  • "register": Used in this manner, it collects all of the inputs to and outputs produced by the shell and its exit code and store it in a JSON-formatted variable. In this case, the variable is named "modInstall_result". Data can be extracted via regular JSON-extraction methods
  • "shell": This is the actual code that Ansible will execute (via the previously-requested /bin/bash interpreter). I'll explain the block's contents, shortly
  • "with_items": This is one of the ways that Ansible allows you to run a given play iteratively. External to this play, I had defined the "Rmodules" variable to read in a text file – via Ansible's lookup() function – that contained one R module-name per line (and, optionally, an associated version-number). The "with_items" parameter-value is in the form of a list. The lookup() function originally created the "Rmodules" value as a single string with embedded newlines. Using the .split() function converts that string into a list. As Ansible iterates this list, each list-element is popped off and assigned to the temporary-variable "item".

Explaining the script-stanza:

The "shell:" module can be invoked as either a single line or as a block of text. Using the "|" as the value for the module tells the module that the following, indented block of text is to be treated as a single input-block. For readability of anything but very short script-content, I prefer the readability of the block of text.

The BASH content, itself, is basically two parts.

  • The first part takes the value of the iterated "item" temporary-value and parses it. If the string contains a colon, the string is split into defined PACKAGE and VERSION BASH-variables (with the VERSION BASH-variable being further expanded to an version-string statement suitable for use with the Rscript command). If the string does not contain a colon, the PACKAGE BASH-variable is set to the R module-name and the version-string BASH-variable is set to a null/empty value.
  • The second part is the Rscript logic. Using R's installed.packages() function, the existing R installation is checked for the presence of the R module-name contained in the PACKAGE BASH-variable. If not present, R's install_version() function is used, along with the PACKAGE BASH-variable and the version-string variable to install (an optionally versioned) R-module. The if check helps with idempotency – preventing attempts to reinstall the module and the not-inconsiderable time that reinstallation can take.

Note that, in order for the Rscript logic to work, the devtools (link goes to a specific, older version; other versions should work) R module must have been previously installed. In my case, this installation has been taken care of in a prior Ansible play (not presented here: the primary focus of this article was to illustrate how to use iteration to make for more-verbose configuration-management)

Side-note: yes, Galaxy likely has community-modules that could be leveraged to completely avoid shell-outs, but most of my customers don't have access to the Galaxy repositories. Similarly, some of my customers' Ansible deployments are such that creating plays that rely on non-core content could create problems when moving automation from dev up through test and prod.

Sentry image

See why 4M developers consider Sentry, “not bad.”

Fixing code doesn’t have to be the worst part of your day. Learn how Sentry can help.

Learn more

Top comments (0)

Image of Docusign

🛠️ Bring your solution into Docusign. Reach over 1.6M customers.

Docusign is now extensible. Overcome challenges with disconnected products and inaccessible data by bringing your solutions into Docusign and publishing to 1.6M customers in the App Center.

Learn more