Marcel.L

Terraform - Mastering Idempotency Violations - Handling Resource Conflicts and Failures in Azure

Idempotency: The Backbone of Terraform

Welcome to a new Terraform series, Terraform ERRORS! In this series, we'll explore common errors and issues that you may encounter when working with Terraform and how to resolve them.

Idempotency is one of Terraform's most powerful features: you can apply the same infrastructure code multiple times and always end up with the same result. This consistency is essential for managing cloud resources in Microsoft Azure and other providers, and it applies equally to permissions and RBAC. But what happens when idempotency breaks, or cannot be maintained for reasons outside of our control? How do we handle these violations so that our infrastructure and configurations remain consistent and reliable?

This series will focus mainly on idempotency violations and how to handle them when working with Terraform and Microsoft Azure. These errors normally surface as a StatusCode=409 (Conflict) response and can be difficult to troubleshoot because they do not show up in terraform plan; they only fail during terraform apply.

Let's dive in!


Common Idempotency Violations using Terraform

When idempotency breaks, it can lead to issues such as Duplicate Key/Entry errors, Resource Conflict errors, or Already Exists errors. Understanding what idempotency means in practical scenarios and knowing how to resolve these failures is crucial for maintaining reliable Infrastructure as Code. The main problem with some of these errors is that terraform plan will not show any problems, but the deployment will fail when applied.

Here's a common example of an idempotency violation using Terraform when configuring RBAC/IAM on Microsoft Azure.

Role Assignment (RBAC) Already Exists

Scenario: You try to assign a role definition (RBAC/IAM permissions) on Azure, but get an error with StatusCode=409 stating that the assignment already exists. The most likely reason is that the permission was set outside of your Terraform configuration, or during a different deployment.

Example:

# Create a Resource Group
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group_name
  location = var.location
  tags     = var.tags
}

# Create a user assigned managed identity
resource "azurerm_user_assigned_identity" "uai" {
  name                = "${var.resource_group_name}-uai"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
}

# Create a role assignment (twice to cause the violation)
resource "azurerm_role_assignment" "rbac" {
  count                = 2
  principal_id         = azurerm_user_assigned_identity.uai.principal_id
  role_definition_name = "Contributor"
  scope                = azurerm_resource_group.rg.id
}

In the above example we simulate the violation by using count = 2 to create the same role assignment twice, with the same principal_id (a user assigned identity), role_definition_name (Contributor access) and scope (Resource Group). The plan will not show any errors. During the deployment, however, only one of the two requests succeeds in granting Contributor on the resource group to the user assigned identity; the other fails with a StatusCode=409 error.

Terraform Plan:

[Screenshots of the terraform plan output]

Error Message:

╷
│ Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="RoleAssignmentExists" Message="The role assignment already exists."
│
│   with azurerm_role_assignment.rbac[0],
│   on foundation_resources.tf line 22, in resource "azurerm_role_assignment" "rbac":
│   22: resource "azurerm_role_assignment" "rbac" {
│
╵

As the error message shows, the role assignment already exists:

Status=409 Code="RoleAssignmentExists" Message="The role assignment already exists."

Cause: In a real world scenario, this violation can happen when the role assignment was created outside of Terraform, for example by an Operations or Security team, or by Azure Policy enforcing certain security or operational conditions. The permission may also have been set by a previous Terraform configuration with a different state file. When our current Terraform configuration then tries to create the same role assignment again, it fails because the permission already exists.

Let's take a look at some strategies to handle these types of violations when they arise, and ensure that our Terraform configurations remain consistent and reliable.


Solution 1: Add Conditions using a variable flag/switch

One solution is to add a condition that controls the azurerm_role_assignment resource. For example, a flag/switch variable called create_role_assignment can ensure the role assignment is only created when the switch is set to true. This way we can avoid the violation by creating the role assignment only when needed.

variable "create_role_assignment" {
  description = "Flag to create the role assignment"
  type        = bool
  default     = false
}

resource "azurerm_role_assignment" "rbac" {
  count                = var.create_role_assignment ? 1 : 0
  principal_id         = azurerm_user_assigned_identity.uai.principal_id
  role_definition_name = "Contributor"
  scope                = azurerm_resource_group.rg.id
}

The above code will only create the role assignment if the create_role_assignment variable is explicitly set to true. With the default of false, count evaluates to 0 and Terraform does not attempt to create the assignment at all.

This method is useful when you want to create the role assignment conditionally but is somewhat limited as it will not work if you have multiple identities or multiple role definitions that need to be maintained.
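One possible way to soften that limitation is to replace the single boolean with one switch per role. The sketch below is only an illustration, not part of the original configuration: the variable name role_assignments and its default values are hypothetical, and it reuses the uai identity and rg resource group from the earlier example.

```hcl
# Hypothetical variable: one on/off switch per role definition name
variable "role_assignments" {
  description = "Map of role definition name to a flag saying whether Terraform should create it"
  type        = map(bool)
  default = {
    Contributor = false # assumed to be assigned already outside of this configuration
    Reader      = true  # assumed to be missing, so Terraform creates it
  }
}

resource "azurerm_role_assignment" "rbac" {
  # Keep only the roles whose flag is set to true
  for_each = { for role, enabled in var.role_assignments : role => role if enabled }

  principal_id         = azurerm_user_assigned_identity.uai.principal_id
  role_definition_name = each.key
  scope                = azurerm_resource_group.rg.id
}
```

The switches still have to be maintained by hand, so this only helps when you already know which assignments exist outside of Terraform.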


Solution 2: Use the Terraform import block

Another method is to import the permission into the current configuration. If the permission was created outside of Terraform, you first need to decide whether you want Terraform to take over its management.

To accomplish this you can use the terraform import command, or an import block, to bring the existing role assignment into the state file of your current Terraform configuration. This way you can avoid the violation by importing existing role assignments and managing them from your Terraform configuration onwards.

Let's take a look at how to do this using the import block. In the example below we will import existing role assignments that were created outside of Terraform into the current configuration.

First we need to look up the IDs of the existing role assignments, using the Azure CLI or the Azure Portal, in order to import them.

# Log in to Azure
az login --tenant ${TENANT_ID}

# Get the object ID of the user assigned identity (principal_id) and keep it for the commands below
PRINCIPAL_ID=$(az identity show --name ${IDENTITY_NAME} --resource-group ${RESOURCE_GROUP_NAME} --query principalId -o tsv)

# Get the ID/s of the existing role assignment/s (Resource level)
az role assignment list --assignee ${PRINCIPAL_ID} --scope ${RESOURCE_ID} --query "[].id" -o tsv

# Get the ID/s of the existing role assignment/s (Resource Group level) ~ This is the one we want to import in this example ~
az role assignment list --assignee ${PRINCIPAL_ID} --scope /subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RESOURCE_GROUP_NAME} --query "[].id" -o tsv

# Get the ID/s of the existing role assignment/s (Subscription level)
az role assignment list --assignee ${PRINCIPAL_ID} --scope /subscriptions/${SUBSCRIPTION_ID} --query "[].id" -o tsv

Since we want to import the existing role assignments at the Resource Group level, the Azure CLI command output will be structured as follows:

/subscriptions/<SUB_ID>/resourcegroups/<RESOURCE_GROUP>/providers/Microsoft.Authorization/roleAssignments/<ROLE_ASSIGNMENT_NAME>

In this example we have two existing role assignments, Contributor and Reader, assigned at the Resource Group level that we want to import.

/subscriptions/829efd7e-aa80-4c0d-9c1c-7aa2557f8e07/resourceGroups/Demo-Inf-Dev-Rg/providers/Microsoft.Authorization/roleAssignments/1a533459-6925-4770-9c4e-0d341ae69691
/subscriptions/829efd7e-aa80-4c0d-9c1c-7aa2557f8e07/resourceGroups/Demo-Inf-Dev-Rg/providers/Microsoft.Authorization/roleAssignments/38e0ac0b-8342-40d9-ba29-7bfc16de6352

In our Terraform configuration we will create a locals block to store the existing role assignments and their IDs.

Next we'll add an import block that uses for_each over the locals map to import the existing role assignments into the azurerm_role_assignment resource. Note that for_each in import blocks requires Terraform v1.7 or later.

Finally we'll create the azurerm_role_assignment resource to manage the role assignments from the Terraform configuration.

# 1. Create a locals map of the RBAC permissions on the Resource Group level
locals {
  role_assignments = {
    Reader      = "/subscriptions/829efd7e-aa80-4c0d-9c1c-7aa2557f8e07/resourceGroups/Demo-Inf-Dev-Rg/providers/Microsoft.Authorization/roleAssignments/d5ee3efa-0ebe-44b7-a6ff-cdf1abc64418",
    Contributor = "/subscriptions/829efd7e-aa80-4c0d-9c1c-7aa2557f8e07/resourceGroups/Demo-Inf-Dev-Rg/providers/Microsoft.Authorization/roleAssignments/511b6d94-4d69-41bd-898d-1d6ce49a9834"
  }
}

# 2. Import the existing role assignments into Terraform's state file
import {
  for_each = local.role_assignments
  to       = azurerm_role_assignment.rbac[each.key] # 'to' is the address of the resource instance that the existing role assignment is imported into (defined in the next step)
  id       = each.value
}

# 3. Create the azurerm_role_assignment resource importing the existing role assignments
resource "azurerm_role_assignment" "rbac" {
  for_each             = local.role_assignments
  principal_id         = azurerm_user_assigned_identity.uai.principal_id
  role_definition_name = each.key
  scope                = azurerm_resource_group.rg.id
}

As you can see in the example above, the Terraform Plan will use the import block to import the existing role assignments from the locals into the resource.

Now we can avoid the violation, and the existing role assignments can be managed from the Terraform configuration.

NOTE: Once the role assignments are imported into Terraform's state file, you can remove or comment out the import block. It is only needed for the initial import; after that, the role assignments are managed by the Terraform configuration.

This method is useful when you want to import existing role assignments into Terraform and manage them from there in your code, however it may not always be practical or possible due to other teams managing the permissions or the complexity of the permissions.
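If you are on an older Terraform release, a rough alternative is to write one import block per role assignment. Import blocks were introduced in Terraform v1.5, while for_each inside them arrived in v1.7. The sketch below assumes the same azurerm_role_assignment.rbac resource (with its for_each over local.role_assignments) and reuses the role assignment IDs from the locals map above.

```hcl
# One import block per existing role assignment (works without for_each in import blocks)
import {
  to = azurerm_role_assignment.rbac["Reader"]
  id = "/subscriptions/829efd7e-aa80-4c0d-9c1c-7aa2557f8e07/resourceGroups/Demo-Inf-Dev-Rg/providers/Microsoft.Authorization/roleAssignments/d5ee3efa-0ebe-44b7-a6ff-cdf1abc64418"
}

import {
  to = azurerm_role_assignment.rbac["Contributor"]
  id = "/subscriptions/829efd7e-aa80-4c0d-9c1c-7aa2557f8e07/resourceGroups/Demo-Inf-Dev-Rg/providers/Microsoft.Authorization/roleAssignments/511b6d94-4d69-41bd-898d-1d6ce49a9834"
}
```

The keys in the to addresses ("Reader" and "Contributor") must match the keys of the for_each map on the resource, otherwise the plan will propose creating new assignments instead of importing the existing ones.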


Solution 3: Use null_resource with a local-exec provisioner using az CLI to create the role assignment

In some cases you may not want to manage the existing role assignments in Terraform because they are maintained by someone else or by Azure Policy, but you may still need to create role assignments conditionally for your code to function. For example, you write a module that builds an AKS (Azure Kubernetes Service) cluster and attaches a User Assigned Managed Identity to the cluster, and you have to give the identity access to an ACR (Azure Container Registry), but the ACR may already have the identity permissioned by an Azure Policy or a different module deployment.

In such rare cases, you might want to consider another way to handle the violation: skip existing permissions and only set them, outside of the Terraform resource model, if necessary.

Luckily, you can still accomplish this from Terraform by using a resource called null_resource in combination with a local-exec provisioner to create role assignments. Let's look at how we can use this method to create the role assignment with the Azure CLI only when needed.

In this example we will use a User Assigned Managed Identity to create the role assignment for Contributor and Reader on the Resource Group that already has Contributor permissions set, but not Reader.

# Create a null_resource with a local-exec provisioner to create the role assignments for 'Contributor' and 'Reader'
# Using classic 'az login' to authenticate and create the role assignment
resource "null_resource" "rbac" {
  for_each = toset(["Contributor", "Reader"])

  # timestamp() changes on every run, so the provisioner is executed on every apply
  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    # '<<-' strips the leading indentation from the heredoc lines
    command = <<-EOT
      az login --service-principal --username $ARM_CLIENT_ID --password $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID --output none
      az account set --subscription $ARM_SUBSCRIPTION_ID --output none
      az role assignment create --assignee ${azurerm_user_assigned_identity.uai.principal_id} --role ${each.key} --scope ${azurerm_resource_group.rg.id}
    EOT
  }
}

In the example above, by using the az CLI inside of Terraform this way, the CLI skips any RBAC/IAM permission that already exists (Contributor) and only creates the one that is missing (Reader). This way we can avoid the violation by skipping existing role assignments and only creating the missing ones we may need for functionality.

The downside to this method is that it depends on the az CLI, which may not be available in all environments or may require additional setup on the build agent. In addition, the changes are made outside of Terraform's resource model and are not persisted in the state file. This can lead to drift and state confusion if not managed properly.

IMPORTANT!: When using the az CLI like this, your build agent needs a way to authenticate to Azure with the az CLI and must have the necessary permissions to create the role assignment. This can be done by setting environment variables such as ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID and ARM_SUBSCRIPTION_ID on the build agent for a service principal with the necessary permissions. As you can see from the command above, we use a service principal to authenticate to Azure and then create the role assignment.

# Authenticate to Azure using a service principal
az login --service-principal --username $ARM_CLIENT_ID --password $ARM_CLIENT_SECRET --tenant $ARM_TENANT_ID --output none
az account set --subscription $ARM_SUBSCRIPTION_ID --output none

# Create the role assignment using the az CLI
az role assignment create --assignee ${azurerm_user_assigned_identity.uai.principal_id} --role ${each.key} --scope ${azurerm_resource_group.rg.id}

If you are using GitHub Actions, you can store these values in GitHub Secrets and expose them to your workflows as environment variables on the runner/build agent.

Federated identities using OIDC or other methods can also be used to authenticate to Azure using the az CLI in the local-exec provisioner. For more details on how to authenticate to Azure using the az CLI, see Authenticate Azure CLI.
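As a variation on the null_resource example in this section, the sketch below shows one way to avoid re-running the CLI on every apply. It is an alternative under stated assumptions, not the method used above: it assumes Terraform v1.4 or later (for the built-in terraform_data resource), assumes the az CLI is already authenticated on the build agent, and passes the identity with --assignee-object-id and --assignee-principal-type so the CLI does not need to look the principal up in the directory.

```hcl
resource "terraform_data" "rbac" {
  for_each = toset(["Contributor", "Reader"])

  # Re-run the provisioner only when the principal, role or scope changes
  triggers_replace = [
    azurerm_user_assigned_identity.uai.principal_id,
    each.key,
    azurerm_resource_group.rg.id,
  ]

  provisioner "local-exec" {
    # Assumes 'az login' has already been performed earlier in the pipeline
    command = <<-EOT
      az role assignment create \
        --assignee-object-id ${azurerm_user_assigned_identity.uai.principal_id} \
        --assignee-principal-type ServicePrincipal \
        --role "${each.key}" \
        --scope "${azurerm_resource_group.rg.id}"
    EOT
  }
}
```

The trade-off is that a role assignment deleted outside of Terraform is not recreated until one of the trigger values changes, whereas the timestamp() trigger re-checks on every apply.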



Best Practices to Avoid Problems with Idempotency

  1. Import Existing Resources: Add unmanaged resources to Terraform's state before applying changes. In some cases this may not be possible or practical, because of the complexity or number of resources, or because other teams manage them, for example if RBAC is managed by operations or security teams.
  2. Add Conditions: Use data sources or variables to create resources conditionally.
  3. Ignore Unimportant Changes: Use lifecycle rules to avoid unnecessary updates and changes.
  4. Limit Provisioners: Only use local-exec provisioners for tasks Terraform can't handle natively, or as a last resort for special cases.
  5. Plan Before Apply: Always run terraform plan before applying your configuration. This step previews the changes Terraform will make, so you can check that they match your expectations and catch misconfigurations or unintended resource changes before they happen. But remember that the plan will not show errors for certain violations, such as a StatusCode=409 conflict, so in these cases you will need to check the apply output for the error message.
  6. Sync with Cloud State: Run terraform apply -refresh-only (the replacement for the deprecated terraform refresh command) to update Terraform's state before applying changes.
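As an illustration of point 3, a lifecycle rule could look like the sketch below. It reuses the resource group from the earlier example and assumes, purely for illustration, that tags are also modified outside of Terraform (for example by Azure Policy).

```hcl
resource "azurerm_resource_group" "rg" {
  name     = var.resource_group_name
  location = var.location
  tags     = var.tags

  lifecycle {
    # Assumption: tags are edited outside of Terraform, so changes to them are not reverted
    ignore_changes = [tags]
  }
}
```

With ignore_changes set, Terraform still applies the tags when the resource group is first created, but later differences in tags no longer show up as changes in the plan.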

Conclusion

Idempotency makes Terraform a reliable tool for managing cloud infrastructure. By understanding common problems and using the strategies in this blog, you can avoid errors and keep your infrastructure predictable. Whether you're working on Azure RBAC or other setups, these tips will help you write better Terraform configurations. With careful planning and good practices, you can ensure that Terraform runs smoothly and efficiently every time.

Have you faced idempotency problems in Terraform? Share your solutions in the comments!

If you enjoyed this post and want to learn more about Terraform and Azure, check out my other Terraform series, Terraform Pro Tips.


Like, share, follow me on: 🐙 GitHub | 🐧 X/Twitter | 👾 LinkedIn
