Introduction
Today I'll talk about a more interesting and tricky problem: the Resource Ownership Conflict. It occurs when Terraform tries to manage a resource that is already managed by another Terraform configuration, or that exists outside Terraform entirely.
In simple scenarios, Terraform might show an EntityAlreadyExists error. This is easy to resolve by renaming the resource or using a data source.
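For the simple case, the data-source fix looks like this. A minimal sketch with hypothetical names: assume a role called `app-task-role` already exists in the account, so instead of declaring a `resource` block (which would trigger EntityAlreadyExists on create), we look it up read-only:

```hcl
# The role already exists, so reference it with a data source
# instead of trying to create it again with a resource block.
data "aws_iam_role" "app_task_role" {
  name = "app-task-role" # hypothetical pre-existing role
}

# Then use its attributes wherever needed, e.g.:
# task_role_arn = data.aws_iam_role.app_task_role.arn
```

A data source only reads state from the provider; it never claims ownership, so the original owner's configuration is untouched.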
What about the silent ones? The ones that do not show any error message?
Background of my issue
Situation
I have two repositories.
Repo 1 creates an ECS cluster, an Aurora PostgreSQL cluster, and a security group that permits the ECS cluster to access the database. So we have some ingress rules:

```hcl
ingress {
  description     = "Allow Mobile API ECS to connect to Aurora"
  from_port       = 5432
  to_port         = 5432
  protocol        = "tcp"
  security_groups = [module.mobile_api_sg.id]
}

ingress {
  description     = "Allow bastion to connect to Aurora"
  from_port       = 5432
  to_port         = 5432
  protocol        = "tcp"
  security_groups = [module.bastion_sg.id]
}
```
Repo 2 is for ETL: here a Glue job needs to access the RDS database that is managed by Repo 1. The simplest thing we can do is look up the database security group with a data source (by name) and add another rule to it, like this:
```hcl
# Look up the RDS security group managed by Repo 1
data "aws_security_group" "rds_sg" {
  filter {
    name   = "tag:Name"
    values = [local.db_sg_name]
  }
}

# Add an ingress rule to the RDS security group
resource "aws_security_group_rule" "postgres_ingress" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  description              = "Allow Glue to access RDS"
  source_security_group_id = module.glue_sg.id
  security_group_id        = data.aws_security_group.rds_sg.id
}
```
What went wrong?
Now, unluckily for me, I did not see any error message. And to be honest, there should not be one.
The reason is the state files. Repo 1's state owns the RDS security group, and because the rules were declared inline inside that resource, Repo 1 treats any rules added outside its configuration (e.g., from Repo 2) as drift and removes them on the next apply. Drift: what your configuration declares vs. what actually exists.
So it's a conflict created by proud me!
Solution to Silent Resource Ownership Conflict
Terraform keeps track of resources, so the solution is simple: create the security group resource with no inline rules, and add the rules in separate resource blocks.
So, for Repo 1 we create a security group with no inline ingress rules and add the rules using `aws_security_group_rule` blocks, like this:
```hcl
resource "aws_security_group" "db_sg" {
  name        = local.db_sg_name
  description = "Aurora DB Security Group"
  vpc_id      = aws_vpc.vpc.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = local.db_sg_name
  }
}
```
```hcl
# Add the rules outside, using sg rule blocks
resource "aws_security_group_rule" "admin_api_db_access" {
  type                     = "ingress"
  description              = "Allow backend ECS to connect to Aurora"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.db_sg.id
  source_security_group_id = module.admin_api_sg.id
}

# other rules just like that...
```
Repo2 remains as is.
How does this solve the conflict? When we define rules inside the `aws_security_group` block, Terraform takes that block as the source of truth: any rule not defined inside it is drift, and will be destroyed. If we define the rules outside the resource using `aws_security_group_rule`, Terraform only cares about each `aws_security_group_rule` block. As long as that block is there, the beast stays silent and won't bother you.
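As a side note, newer versions of the AWS provider (5.0 and later) offer `aws_vpc_security_group_ingress_rule` and `aws_vpc_security_group_egress_rule`, where each rule is its own addressable resource. A sketch of the Repo 2 rule in that style, assuming provider >= 5.0 (the resource label `glue_to_rds` is just illustrative):

```hcl
# Each rule is an independent resource with its own ID, so
# separate configurations can own their rules without fighting
# over one shared rule list.
resource "aws_vpc_security_group_ingress_rule" "glue_to_rds" {
  security_group_id            = data.aws_security_group.rds_sg.id
  description                  = "Allow Glue to access RDS"
  from_port                    = 5432
  to_port                      = 5432
  ip_protocol                  = "tcp"
  referenced_security_group_id = module.glue_sg.id
}
```

Note the attribute names differ from `aws_security_group_rule`: `ip_protocol` instead of `protocol`, and `referenced_security_group_id` instead of `source_security_group_id`.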
Running a plan confirmed my suspicion: it said three resources to add, but said nothing about the rules I had removed from the `aws_security_group` block. Terraform only cares about what it sees.
It's like convincing a child that no one is playing with its toy.
Have fun practicing DevOps!!
See you in another one.