DEV Community

loading...

Taming the CUDA (Pt. II)

ferricoxide profile image Thomas H Jones II Originally published at thjones2.blogspot.com on ・3 min read

So, today, finally had a chance to implement in Ansible what I'd learned in Taming the CUDA.

Given that it takes a significant time to run the uninstall/new-install/reboot operation, I didn't want to just blindly execute the logic. So, I wanted to implement logic that checked to see what version, if any, of the CUDA drivers were already installed on the Ansible target. First step to this was as follows:

 - name: Gather the rpm package facts
   package_facts:
     manager: auto

This tells Ansible to check the managed-host and gather relevant package-information for the base cuda RPM and stuff the return of the action into a registered variable cuda_pkginfo. This variable is a JSON structure that's then referencable by subsequent Ansible actions. Since I'm only interested in the installed version, I'm able to grab that information by grabbing the cuda_pkginfo.results[0].version value from the JSON structure and using it in a when conditional.

Because I had multiple actions that I wanted to make conditional on a common condition, I didn't want to have a bunch of configuration-blocks with the same conditional statement. Did some quick Googling and found that, yes, Ansible does support executing multiple steps within a shared-condition block. You just have to use (wait for it...) the block statement in concert with the shared condition-statement. When you use that statement, you then nest actions that you might otherwise have put in their own, individual action-blocks. In my case, the block ended up looking like:

 - name: Update CUDA drivers as necessary
   block:
   - name: Copy CUDA RPM-repository definition
     copy:
     src: files/cuda-rhel7-11-0-local.repo-DSW
     dest: /etc/yum.repos.d/cuda-rhel7-11-0-local.repo
     group: 'root'
     mode: '000644'
     owner: 'root'
     selevel: 's0'
     serole: 'object_r'
     setype: 'etc_t'
     seuser: 'system_u'
   - name: Uninstall previous CUDA packages
     shell: |
         UNDOID=$( yum history info cuda | sed -n '/Transaction ID/p' | \
                   cut -d: -f 2 | sed 's/^[]\*//g' | sed -n 1p )
         yum -y history undo "${UNDOID}"
   - name: Install new CUDA packages (main)
     yum:
       name:
       - cuda
       - nvidia-driver-latest-dkms
       state: latest
   - name: Install new CUDA packages (drivers)
     yum:
       name: cuda-drivers
       state: latest
  when:
    ansible_facts.packages['cuda'][0].version.split('.')[0]|int < 11

I'd considered doing the shell-out a bit more tersely – something like:

yum -y history undo $( yum history info cuda | \
sed -n '/Transaction ID/p' | cut -d: -f 2 | sed -n 1p )

But figured what I ended up using was marginally more readable for the very junior staff that will have to own this code after I'm gone. Any way you slice it, though, I'm not super chuffed that I had to resort to a shell-out for the targeted/limited removal of packages. So, if you know a more Ansible-y way of doing this, please let me know.

I'd have also finished-out with one yum install-statement rather than the two, but the nVidia documentation for EL7 explicitly states to install the two groups separately. 🤷

Oh... And because I didn't want my when statement to be tied to the full X.Y.Z versioning of the drivers, I added the split() method so I could match against just the major number. Might have to revisit this if they ever reach a point where they care about the major and minor or the major, minor and release number. But, for now, the above suffices and is easy enough to extend via a compound when statement. Similarly, because Ansible defaults to string-output, I needed forcibly cast the string-output to an integer so that numeric comparison would work properly.

Final note: I ended up line-breaking where I did because yamllint had popped "too wide" alerts when I ran my playbook through it.

Discussion

pic
Editor guide