Mirek Sedzinski

Posted on Nov 20, 2020

A few practical Terraform tips

#ansible #terraform

Terraform is a popular open-source infrastructure as code (IaC) tool.
This article presents a few tips on using it, based on hands-on experience. Readers are assumed to have at least some working knowledge of Terraform.

Teamwork - the State

Let's say we have created a bunch of Terraform scripts. Most likely we keep them in a repository of our choice, which makes it easy to share them between team members.
The question arises: what about the State?

By default, the State is stored in a file (terraform.tfstate) in the current working directory where Terraform was run. Should it be pushed to the repository together with the Terraform scripts?
It's not the best idea. The State file is machine-generated, and there is a significant probability of frequent merge conflicts between different revisions. Those conflicts would have to be resolved by hand, which won't be easy.

There are two options to handle this:

Local state - the State is kept as a file in a shared location. Sharing can be achieved with network-attached storage, or with one dedicated "builder" machine used by the whole team.
Remote state - the State is kept in remote storage. This is a feature of Backends, and there are several of them to choose from. It's worth checking whether a given Backend supports locking (Oracle Object Storage, for example, currently doesn't). Locking prevents two or more users from accidentally running Terraform at the same time, and thus ensures that each Terraform run begins with the most recently updated State.
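To illustrate what a locking mechanism buys us, here is a minimal Python sketch of an exclusive lock built on a lock file. It is purely illustrative (the `StateLock` class and the lock-file path are hypothetical, and this is not how any particular Backend implements locking). The lock file is created atomically, so a second user who tries to start a run gets a refusal instead of silently working against a State that is being changed.

```python
import os


class StateLock:
    """Hypothetical exclusive lock guarding a shared State (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.fd = None

    def acquire(self):
        """Try to take the lock; return False if someone else already holds it."""
        try:
            # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except FileExistsError:
            return False

    def release(self):
        """Give the lock back by closing and deleting the lock file."""
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None
```

Two users pointing at the same lock path cannot both acquire it. User B succeeds only after user A releases it. If a run crashes while holding a lock, the lock stays behind. Terraform provides `terraform force-unlock` for clearing such stale locks by hand.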

Teamwork - running the scripts

Whatever Backend we use, and regardless of whether it supports locking, if two users run copies of the same Terraform scripts that are out of sync, we are in trouble.

Let's imagine a situation where two developers pull the same scripts from the repository. Developer A modifies the scripts by adding an additional Compute instance. She runs the scripts and the instance is provisioned. The shared State is updated.
A few minutes later, Developer B runs his version of the scripts (which he didn't modify). Terraform compares the content of the shared State with the content of the scripts and finds that:

The instance was provisioned on the infrastructure [information from the State]
There is no such instance in the current scripts

Based on the above, Terraform concludes that the Compute instance has to be decommissioned. Obviously, this is not what we wanted.
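The reasoning above can be reproduced with a greatly simplified model of a plan. The following is a Python sketch, assuming resources are identified only by their addresses. Real Terraform also refreshes the State, compares attributes and plans in-place updates or replacements, and all of that is ignored here. The `plan` function and the resource addresses are illustrative.

```python
def plan(state, config):
    """Simplified plan: compare addresses in the State with those in the scripts.

    state:  set of resource addresses recorded in the shared State
    config: set of resource addresses declared in the scripts being run
    Returns (to_create, to_destroy) as sorted lists.
    """
    to_create = sorted(config - state)   # declared, but not provisioned yet
    to_destroy = sorted(state - config)  # provisioned, but no longer declared
    return to_create, to_destroy


# Developer A added the instance and applied, so the shared State contains it.
shared_state = {"oci_core_vcn.main", "oci_core_instance.extra"}

# Developer B runs his old, unmodified copy of the scripts.
scripts_b = {"oci_core_vcn.main"}

create, destroy = plan(shared_state, scripts_b)
print("create:", create)    # create: []
print("destroy:", destroy)  # destroy: ['oci_core_instance.extra']
```

From Terraform's point of view, Developer B's scripts are the desired state, so the instance missing from them is something to remove.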

To prevent such situations, one must make sure that Terraform is always run with up-to-date scripts. This can be done by defining a manual process or with a tool.
A CI/CD pipeline or job can be created for that purpose, or a dedicated service can be used, such as Resource Manager, which is part of the OCI (Oracle Cloud Infrastructure) offering.
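As one possible building block for such a job, here is a minimal Python sketch of a pre-run guard. It refuses to proceed unless the local checkout matches the shared branch exactly and has no uncommitted edits. The remote name `origin`, the branch name `main` and the function names are assumptions for illustration, not something the article prescribes.

```python
import subprocess


def git(*args, cwd="."):
    """Run a git command and return its trimmed stdout (raises on failure)."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def safe_to_apply(local_sha, upstream_sha, porcelain_status):
    """Decide whether the working copy is exactly what the team has agreed on."""
    # Behind, ahead or diverged: this copy of the scripts differs from the repo.
    if local_sha != upstream_sha:
        return False
    # Uncommitted local edits would also make this run differ from the repo.
    return porcelain_status == ""


def check_scripts(cwd=".", remote="origin", branch="main"):
    """Hypothetical guard: fetch, then compare HEAD with <remote>/<branch>."""
    git("fetch", remote, branch, cwd=cwd)
    local_sha = git("rev-parse", "HEAD", cwd=cwd)
    upstream_sha = git("rev-parse", f"{remote}/{branch}", cwd=cwd)
    status = git("status", "--porcelain", cwd=cwd)
    return safe_to_apply(local_sha, upstream_sha, status)
```

A pipeline step would call `check_scripts()` and stop with an error when it returns False, and only then run `terraform plan` or `terraform apply`. Keeping the decision in a pure function (`safe_to_apply`) makes it easy to test without a real repository.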

Organising the scripts

Two things should be taken into consideration here: avoiding redundancy and planning for efficient use.

To reduce redundancy, one should consider:

Moving common elements to modules to promote reusability
Using variables to parametrise the scripts

Side note
All sensitive data should be removed from the scripts and loaded from external variables. In this post I describe one possible way to do that safely: Link
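As a small illustration of loading values from outside the scripts, here is a minimal Python sketch. It is not necessarily the approach from the linked post. Terraform reads input variable values from environment variables named `TF_VAR_<variable name>`, so a wrapper can pass secrets that way instead of writing them into `.tf` files. The variable name `db_password` and the wrapper functions are hypothetical. The secrets are assumed to come from a vault or a CI secret store.

```python
import os
import subprocess


def terraform_env(secrets, base_env=None):
    """Return an environment exposing each secret as TF_VAR_<name>.

    secrets:  mapping of Terraform input variable names to values
    base_env: environment to start from (defaults to the current process's)
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in secrets.items():
        env[f"TF_VAR_{name}"] = value
    return env


def run_terraform(args, secrets, cwd="."):
    """Hypothetical wrapper: run terraform with secrets passed via the environment."""
    subprocess.run(
        ["terraform", *args], cwd=cwd, env=terraform_env(secrets), check=True
    )
```

For `TF_VAR_db_password` to be picked up, the scripts must declare a matching input variable named `db_password`.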

When it comes to efficiency, we should first reflect on how we are going to provision and decommission our infrastructure. Things that we want to provision and decommission together should obviously be kept together in the scripts.
However, we should also keep in mind that Terraform usually follows an "all or nothing" approach: either we provision everything or nothing. The same holds true for decommissioning. Of course, there are ways to narrow down the scope (the "-target" option can be used to focus Terraform's attention on only a subset of resources), but this should be treated as an exception rather than the rule.
So it's better to have a few independent sets of scripts which we can run separately and orchestrate as needed, even if they are tightly coupled and pertain to the same piece of software and infrastructure.

For example, let's say we want to provision a Kubernetes cluster. Instead of putting everything into one big set of scripts, we can divide it into the following components:

Identity provider
Load Balancer
Image registry
Control + data plane
Extensions like storage, cert manager, etc.

Each component on the list above is complex in its own right. It's good to be able to handle them separately or together, depending on the need.
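One way to orchestrate such independent sets of scripts is to run each component in dependency order. Here is a minimal Python sketch, assuming each component lives in its own already-initialised directory with its own State. The directory names and the dependencies between them are hypothetical. Decommissioning runs in reverse order so that nothing is removed while something else still depends on it. The sketch uses `graphlib`, which requires Python 3.9 or later.

```python
import subprocess
from graphlib import TopologicalSorter

# Hypothetical layout: component directory -> components it depends on.
COMPONENTS = {
    "identity-provider": set(),
    "load-balancer": set(),
    "image-registry": set(),
    "cluster": {"identity-provider", "load-balancer", "image-registry"},
    "extensions": {"cluster"},
}


def ordered(components, action):
    """Provisioning order (dependencies first), reversed for destroy."""
    order = list(TopologicalSorter(components).static_order())
    if action == "destroy":
        order.reverse()
    return order


def run(action, components, selected=None):
    """Run `terraform <action>` per component, optionally only for a subset."""
    for name in ordered(components, action):
        if selected is None or name in selected:
            subprocess.run(
                ["terraform", action, "-auto-approve"], cwd=name, check=True
            )
```

For example, `run("apply", COMPONENTS)` would provision everything, and `run("destroy", COMPONENTS, selected={"extensions"})` would remove only the extensions. This gives component-level control without reaching for `-target` inside a single large set of scripts.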

Terraform vs Ansible

A frequent question is: which tool should be used, Ansible or Terraform?

To answer it, let's first distinguish between managing:

Infrastructure - VMs, storage, networking, etc.
Configuration - software installed on top of the infrastructure

We can certainly use either tool to cover both areas, and this makes sense especially for simple use cases. For example, we can use Terraform alone and rely on cloud-init or provisioners for configuration management.

However, in more complex situations, in my opinion it's better to use each tool for what it was originally designed for: Terraform for infrastructure and Ansible for configuration management. It just makes things easier and more natural.

Idempotency

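And one last important piece of advice: regardless of the tool used, scripts should be idempotent, meaning that running them again against an already provisioned or configured system changes nothing. This takes a bit more effort to implement (especially with Ansible), but it pays off greatly later on.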
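To make the difference concrete, here is a minimal Python sketch that contrasts a non-idempotent step with an idempotent one. The idempotent version is similar in spirit to what Ansible's `lineinfile` module does. The function names and the example file are hypothetical, and the boolean return value only loosely mirrors Ansible's "changed" status.

```python
from pathlib import Path


def append_line(path, line):
    """NOT idempotent: every run adds another copy of the line."""
    with open(path, "a") as f:
        f.write(line + "\n")


def ensure_line(path, line):
    """Idempotent: add the line only if it is missing.

    Returns True if the file was changed and False if it was already correct,
    so a second run reports no change.
    """
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    if line in lines:
        return False
    lines.append(line)
    p.write_text("\n".join(lines) + "\n")
    return True
```

Running `ensure_line` twice with the same arguments leaves the file exactly as it was after the first run, while `append_line` keeps growing it. This is the property that makes re-running scripts safe.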

DEV Community