
Abdul
Terraform Gotchas: Data Sources and depends_on

Terraform is a very popular tool for implementing your cloud Infrastructure as Code (IaC), rivalled only by the likes of Ansible, OpenTofu, and Pulumi. If you've used it before, you may be familiar with data sources. In short, data sources let you retrieve data from your provider and perform a few specialised functions, as seen below:

data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::123456789012:root"]
    }
  }
}

In the first example, we use it to retrieve the account ID from the default AWS provider. In the second, we create a reusable policy document that can be attached to any policy. You can even use data sources to archive files or make local files available in your configuration. It's quite the Swiss Army knife.
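As an illustration of those last two uses, here is a minimal sketch (the resource names and paths are hypothetical, and it assumes the hashicorp/archive and hashicorp/local providers are installed):

```hcl
# Package a directory into a zip archive, e.g. for a Lambda deployment.
data "archive_file" "lambda_zip" {
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/lambda.zip"
}

# Read a local JSON file so its contents can be referenced elsewhere
# in the configuration via data.local_file.app_config.content.
data "local_file" "app_config" {
  filename = "${path.module}/config.json"
}
```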

However, when data sources are used in modules that depend on each other, you may find that changes in one module trigger changes in unrelated resources in the other. This was quite the head-scratcher when I first encountered it, but it's yet another reason to create precise dependencies instead of relying on the depends_on meta-argument between modules.

Let's Set the Stage

To demonstrate this, let's look at a project with one root module and two child modules (A and B) following the directory structure below:

project-root/
├── main.tf
├── moduleA/
│   ├── main.tf
└── moduleB/
    ├── main.tf

In moduleA/main.tf, create a CloudWatch log group and an SSM parameter to store an arbitrary value:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

resource "aws_cloudwatch_log_group" "app_log_group" {
  name = "/app/log_group"
}

resource "aws_ssm_parameter" "app_id" {
  name  = "/app/app_id"
  type  = "String"
  value = "1234567890"
}

In moduleB/main.tf, create a task definition that writes logs to the log group in module A:

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

resource "aws_ecs_task_definition" "app_task" {
  family = "app_task"
  container_definitions = jsonencode([
    {
      name   = "app"
      image  = "node:latest"
      cpu    = 10
      memory = 512
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          awslogs-group  = "/app/log_group"
          awslogs-region = "eu-west-2"
        }
      }
    },
  ])
}

And then in main.tf, declare both modules:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5"
    }
  }
}

provider "aws" {
  region = "eu-west-2"
}

module "moduleA" {
  source = "./moduleA"
}

module "moduleB" {
  source     = "./moduleB"
  depends_on = [module.moduleA]
}

Note the presence of the depends_on meta-argument in the moduleB declaration. It ensures that all moduleA resources are created before any moduleB resources. It's a quick and easy way to make sure the CloudWatch log group exists before the task definition is created.

Initialise your project:

terraform init

Check the plan:

terraform plan -out=tf.plan

You should see a plan with this at the bottom: Plan: 3 to add, 0 to change, 0 to destroy.
Apply it:

terraform apply tf.plan

If you change the SSM parameter value in module A and rerun the plan, only the SSM parameter appears in the changes. Apply the plan once you have reviewed it:

  # module.moduleA.aws_ssm_parameter.app_id will be updated in-place
  ~ resource "aws_ssm_parameter" "app_id" {
        id              = "/app/app_id"
      + insecure_value  = (known after apply)
        name            = "/app/app_id"
        tags            = {}
      ~ value           = (sensitive value)
      ~ version         = 1 -> (known after apply)
        # (9 unchanged attributes hidden)
    }

This configuration clearly gets the job done. You can keep it as is, and it will always produce a predictable plan.

Where It Goes Wrong

However, as your project grows, you may need to introduce a data source, and some unexpected behaviours can emerge. To demonstrate this, add the following to moduleB/main.tf:

data "aws_caller_identity" "current" {}

resource "aws_iam_role" "app_role" {
  name = "app_role_${data.aws_caller_identity.current.id}"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ecs-tasks.amazonaws.com"
      }
    }]
  })
}

Now let's change the SSM parameter value again and check the plan. You will notice that this time, apart from the SSM parameter change, the IAM role is also being replaced, even though nothing about it or the data source has changed. But why?

Well, if you take a closer look at the Terraform plan output, you'll see a line that says # (depends on a resource or a module with changes pending). Because of the depends_on meta-argument we set earlier, Terraform cannot rule out that the pending changes in module A might affect the data source's result, so it defers reading the data source until apply time. The IAM role's name interpolates that now-unknown value, and since changing an IAM role's name forces replacement, Terraform takes the conservative approach and plans to replace the role.

One potential solution I have come across involves naming the specific resource in moduleA that moduleB depends on:

# in main.tf
depends_on = [module.moduleA.aws_cloudwatch_log_group]

I have not found this to work in practice. The reason is that a module only exposes its outputs to the calling module, so an address like module.moduleA.aws_cloudwatch_log_group is simply not valid from the root module.

The Fix

A future-proof solution is to avoid the depends_on meta-argument except as a last resort. Terraform is very good at figuring out the order of resource creation as long as you reference the specific resource attributes you need. So in our project, instead of creating a sweeping dependency between the modules, we can be more precise: export the log group name from module A, pass it as a variable to module B, and assign it to the task definition. When Terraform reads the configuration, it creates a dependency between only those two resources and stops triggering unexpected changes on every plan.

To implement this, export the log group name in module A:

output "log_group_name" {
  value = aws_cloudwatch_log_group.app_log_group.name
}

Create a variable for the log group name in module B:

variable "log_group_name" {
  type = string
}

Replace the log group name in the task definition:

resource "aws_ecs_task_definition" "app_task" {
  family = "app_task"
  container_definitions = jsonencode([
    {
      name   = "app"
      image  = "node:latest"
      cpu    = 10
      memory = 512
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          awslogs-group  = var.log_group_name # text replaced with variable here
          awslogs-region = "eu-west-2"
        }
      }
    },
  ])
}

Finally, remove the depends_on meta-argument and set the variable in the module declaration:

module "moduleB" {
  source         = "./moduleB"
  log_group_name = module.moduleA.log_group_name # reference the module A output here
  # the depends_on meta-argument has been removed
}

When you run the plan again, you should see only the SSM parameter change and nothing else.

Conclusion

The depends_on meta-argument is a great tool to have. There are circumstances where a resource cannot be referenced directly, so explicitly setting the dependency is appropriate. It can also be a quick fix when a more fleshed-out solution would take too much time due to legacy systems or other constraints. Ultimately, your infrastructure is there to serve a purpose, business or otherwise, so you have to weigh your options and decide the best course of action. However, I highly recommend using it only as a last resort to avoid surprises like the one described in this article.
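To make the "cannot be referenced" case concrete, here is a sketch (resource names and the AMI ID are hypothetical) of a pattern similar to the one in the Terraform documentation: software on an EC2 instance needs permissions granted by an IAM role policy, but the instance only references the instance profile, so there is no attribute of the policy to point at.

```hcl
resource "aws_iam_role" "app_role" {
  name = "app_role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "app_policy" {
  name = "app_policy"
  role = aws_iam_role.app_role.name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action   = "s3:GetObject"
      Effect   = "Allow"
      Resource = "*"
    }]
  })
}

resource "aws_iam_instance_profile" "app_profile" {
  name = "app_profile"
  role = aws_iam_role.app_role.name
}

resource "aws_instance" "app_server" {
  ami                  = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type        = "t3.micro"
  iam_instance_profile = aws_iam_instance_profile.app_profile.name

  # The instance references the profile, not the policy, so Terraform
  # sees no dependency on the policy. If software on the instance needs
  # those permissions at boot, an explicit dependency is the only way
  # to express the ordering.
  depends_on = [aws_iam_role_policy.app_policy]
}
```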

I hope you found this useful. Please feel free to share any thoughts or ask questions in the comments.
