How to Manage Terraform State in AWS

#cloud #terraform #aws #devops

You have probably noticed that every time you run terraform apply or terraform plan, Terraform somehow knows which resources already exist, which do not, and how they differ from your configuration. The answer is the Terraform state, a JSON file that records the managed resources and their last known attributes.

Code on GitHub

https://github.com/jorgetovar/terraform-aws-remote-state

You may have also noticed that, in order to work as a team, we usually need to share this state. At the beginning of an IaC project, it is common to commit the Terraform state to the GitHub repository. However, this is a problem because the state file stores resource attributes in plain text and may contain sensitive information, such as secrets, that we do not want to expose.

In addition, resolving conflicts between infrastructure changes can be a headache. In the past, I made this mistake partly because we did not use a pipeline to deploy resources; instead, we did it from our local environment.

Problems with the local terraform.tfstate 😵

The file that is generated locally by default is terraform.tfstate, and the problem in projects with multiple team members is that somehow we have to share this file. Another problem is race conditions; in fact, two engineers can deploy simultaneously, causing inconsistencies and corruption of the state file.

Finally, it is important to isolate our state file in relation to the deployed environments. It was a challenging project at the time, but as soon as we moved the state to AWS, generated locks with DynamoDB, and executed from a pipeline with strictly necessary permissions, everything returned to normal, and it was even enjoyable to make changes to the infrastructure.

Role and privileges required to create the Remote state 🤖

To create our remote state, we need to create a role with the necessary privileges, a table in DynamoDB, and finally an S3 bucket.

Let's break down the code step by step:

First, a data source of type aws_caller_identity (named current) is defined. It returns the identity of whoever is running Terraform, and it is used later to decide who may assume the role.

data "aws_caller_identity" "current" {}

Next, a locals block defines a local value named principal_arns. It takes the value of var.principal_arns if one was provided; otherwise, it falls back to a list containing the ARN (Amazon Resource Name) returned by the aws_caller_identity data source. This value is used later to specify who can assume the role.

locals {
  principal_arns = var.principal_arns != null ? var.principal_arns : [data.aws_caller_identity.current.arn]
}
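The conditional above only works if var.principal_arns is allowed to be null. The module's variable declarations are not shown here, so the following is a minimal sketch of what such a declaration could look like; the description text and the default are assumptions.

```hcl
# Sketch only: the real declaration lives in the module's variables file.
variable "principal_arns" {
  description = "ARNs of the principals allowed to assume the role (assumed wording)"
  type        = list(string)
  default     = null # null makes the local fall back to the caller's ARN
}
```

With a null default, callers who do not set the variable end up being the only principal that can assume the role.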

Next, a resource of type aws_iam_role is defined, which is the IAM role we are creating. Its name is based on local.namespace, and its assume-role (trust) policy specifies which entities (in this case, the ARNs stored in local.principal_arns) can assume the role.

resource "aws_iam_role" "iam_role" {
  name = "${local.namespace}-tf-assume-role"

  assume_role_policy = <<-EOF
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Principal": {
              "AWS": ${jsonencode(local.principal_arns)}
          },
          "Effect": "Allow"
        }
      ]
    }
  EOF

  tags = {
    ResourceGroup = local.namespace
  }
}

Then, a data source of type aws_iam_policy_document is defined, which specifies the permissions granted to whoever assumes the role. It contains three statements: one to list the S3 bucket, one to read, write, and delete objects in the bucket, and one to read, write, and delete items in the DynamoDB lock table.

data "aws_iam_policy_document" "policy_doc" {
  statement {
    actions = [
      "s3:ListBucket"
    ]

    resources = [
      aws_s3_bucket.state_bucket.arn
    ]
  }

  statement {
    actions = [
      "s3:GetObject",
      "s3:PutObject",
      "s3:DeleteObject"
    ]

    resources = [
      "${aws_s3_bucket.state_bucket.arn}/*",
    ]
  }

  statement {
    actions = [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem"
    ]
    resources = [aws_dynamodb_table.state_lock_table.arn]
  }
}

Next, a resource of type aws_iam_policy is created, which is the IAM policy that contains the permissions defined in the previous data block.

resource "aws_iam_policy" "iam_policy" {
  name   = "${local.namespace}-tf-policy"
  path   = "/"
  policy = data.aws_iam_policy_document.policy_doc.json
}

Finally, the created policy is attached to the IAM role using the aws_iam_role_policy_attachment resource.

resource "aws_iam_role_policy_attachment" "policy_attach" {
  role       = aws_iam_role.iam_role.name
  policy_arn = aws_iam_policy.iam_policy.arn
}

Resources of the Remote state 💾

aws_s3_bucket: Creates an S3 bucket in AWS. It is given a name based on local.namespace, and it is specified whether the bucket should allow forceful deletion or not, using the value of var.force_destroy_state. It is also assigned a tag ResourceGroup based on local.namespace.
aws_s3_bucket_server_side_encryption_configuration: Configures server-side encryption for the previously created S3 bucket. A rule is defined to apply default server-side encryption to all objects in the bucket using a KMS (Key Management Service) key specified in aws_kms_key.kms_key.arn.
aws_s3_bucket_versioning: Enables versioning for the previously created S3 bucket. The versioning_configuration with status = "Enabled" indicates that object versioning is enabled on the bucket.
aws_s3_bucket_public_access_block: Configures the public access block for the S3 bucket. All four settings are enabled, so public ACLs and public bucket policies are blocked or ignored, preventing unintended public access to objects in the bucket.
aws_dynamodb_table: Creates a DynamoDB table in AWS. The table is named based on local.namespace and has a hash key called "LockID" of type "S" (string). The billing mode is set to "PAY_PER_REQUEST," meaning you only pay for the operations performed. It is also assigned a tag ResourceGroup based on local.namespace.

resource "aws_s3_bucket" "state_bucket" {
  bucket        = "${local.namespace}-state-bucket"
  force_destroy = var.force_destroy_state

  tags = {
    ResourceGroup = local.namespace
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "example" {
  bucket = aws_s3_bucket.state_bucket.id

  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.kms_key.arn
      sse_algorithm     = "aws:kms"
    }
  }

}

resource "aws_s3_bucket_versioning" "versioning_example" {
  bucket = aws_s3_bucket.state_bucket.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_public_access_block" "s3_bucket" {
  bucket                  = aws_s3_bucket.state_bucket.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "state_lock_table" {
  name         = "${local.namespace}-state-lock"
  hash_key     = "LockID"
  billing_mode = "PAY_PER_REQUEST"
  attribute {
    name = "LockID"
    type = "S"
  }
  tags = {
    ResourceGroup = local.namespace
  }
}
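The encryption configuration above references aws_kms_key.kms_key, which is not shown in this walkthrough. A minimal sketch of such a key might look like the following; the description and the rotation setting are assumptions, not necessarily what the module uses.

```hcl
# Sketch only: the arguments below are illustrative assumptions.
resource "aws_kms_key" "kms_key" {
  description         = "Key used to encrypt the Terraform state bucket"
  enable_key_rotation = true

  tags = {
    ResourceGroup = local.namespace
  }
}
```

Keep in mind that with a customer-managed key, whoever reads or writes the state typically also needs KMS permissions on that key (for example kms:Decrypt and kms:GenerateDataKey), granted either through the key policy or through IAM.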

Output 📚

The module exposes an output with two arguments:
description: Documents what the output contains: the name of the created S3 bucket, the AWS region of the bucket, the ARN of the IAM role created for the backend, and the name of the DynamoDB table created for locking.
value: The actual value of the output. Here, it is a map with the following fields:
- bucket: Contains the name of the created S3 bucket. This is obtained through the reference aws_s3_bucket.state_bucket.bucket, where aws_s3_bucket.state_bucket is the resource that creates the bucket, and .bucket refers to the "bucket" attribute of that resource.
- region: Contains the AWS region name where the S3 bucket was created. This is obtained through the reference data.aws_region.current.name, where data.aws_region.current is a data block that retrieves information about the current region, and .name refers to the "name" attribute of that data block.
- role_arn: Contains the ARN (Amazon Resource Name) of the IAM role created for the backend. This is obtained through the reference aws_iam_role.iam_role.arn, where aws_iam_role.iam_role is the resource that creates the IAM role, and .arn refers to the "arn" attribute of that resource.
- dynamodb_table: Contains the name of the DynamoDB table created for locking. This is obtained through the reference aws_dynamodb_table.state_lock_table.name, where aws_dynamodb_table.state_lock_table is the resource that creates the DynamoDB table, and .name refers to the "name" attribute of that resource.
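Putting the fields above together, the output block could look roughly like this. It is a sketch reconstructed from the description: the output name config and the description text are assumptions, while the references are the ones listed above.

```hcl
# Data source that exposes the region the AWS provider is configured for.
data "aws_region" "current" {}

# The output name "config" and the description wording are assumptions.
output "config" {
  description = "Backend configuration: bucket, region, role ARN, and lock table"

  value = {
    bucket         = aws_s3_bucket.state_bucket.bucket
    region         = data.aws_region.current.name
    role_arn       = aws_iam_role.iam_role.arn
    dynamodb_table = aws_dynamodb_table.state_lock_table.name
  }
}
```

Grouping the four values in one map means a caller can print everything needed to configure a backend with a single output.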

Isolating the state by environment ♟️

To isolate the Terraform state and avoid conflicts when working with different environments (e.g., development, staging, production), there are two main approaches: using workspaces and designing the project layout.

Workspaces:

Terraform provides "workspaces" to handle multiple isolated states for the same configuration. Each workspace has its own independent state, so the same code can be deployed several times without the deployments interfering with each other.

To use workspaces:

Create a new workspace: You can create a new workspace with the terraform workspace new <workspace_name> command.
Switch workspaces: You can switch between workspaces with the terraform workspace select <workspace_name> command.
List workspaces: You can see a list of available workspaces with terraform workspace list.

It is important to note that workspaces share the same configuration code, so you must be careful when sharing common resources between them to avoid conflicts.
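Because every workspace runs the same code, one way to keep resources from colliding is to derive names and sizes from the selected workspace. The sketch below uses the built-in terraform.workspace value; the local names, the workspace names, and the instance types are hypothetical placeholders.

```hcl
locals {
  # terraform.workspace holds the name of the currently selected workspace.
  environment = terraform.workspace

  # Hypothetical per-workspace settings; keys and values are placeholders.
  instance_types = {
    staging    = "t3.small"
    production = "t3.large"
  }

  # Fall back to a small instance for any workspace not listed above.
  instance_type = lookup(local.instance_types, terraform.workspace, "t3.micro")

  # Prefixing names with the workspace avoids clashes between environments.
  name_prefix = "${local.environment}-app"
}
```

Resources can then use local.name_prefix and local.instance_type instead of hard-coded values.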

Layout of the project:

The design of the project layout is a practice that involves organizing the Terraform code into different directories to isolate environments and components. Each directory contains its own configuration file and state.

For example:

project
|-- dev
|   |-- main.tf
|   |-- variables.tf
|   |-- ...
|-- staging
|   |-- main.tf
|   |-- variables.tf
|   |-- ...
|-- production
|   |-- main.tf
|   |-- variables.tf
|   |-- ...
|-- modules
|   |-- module-aws-community-builder
|   |   |-- main.tf
|   |   |-- variables.tf
|   |   |-- ...
|   |-- module-jt-state
|   |   |-- main.tf
|   |   |-- variables.tf
|   |   |-- ...

Each directory (dev, staging, production) represents a different environment and contains its own configuration files and variables, and can have its own Terraform state.

The design of the project layout provides a clearer isolation between environments and allows for greater flexibility in managing the state and configurations.
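As an illustration of this layout, an environment directory usually contains little more than a provider and calls to the shared modules. The sketch below shows what dev/main.tf might contain; the module label and the environment input are hypothetical, since the modules' real variables are not shown in this article.

```hcl
# dev/main.tf (sketch)
provider "aws" {
  region = "us-west-2"
}

module "community_builder" {
  # Relative path from the dev directory to the shared module.
  source = "../modules/module-aws-community-builder"

  # Hypothetical input; replace with the variables the module actually declares.
  environment = "dev"
}
```

The staging and production directories would follow the same pattern with their own values, and each one is initialized and applied separately.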

Conclusion 📖

When working with infrastructure as code (IaC), the importance of isolating, locking, and managing the state lies in the severe consequences that errors can have in this context. Unlike developing applications, errors in infrastructure code can affect all applications, databases, etc. Thus, it is crucial to include additional safety mechanisms when working with IaC.

provider "aws" {
  region = "us-west-2"
}

module "remote_state" {
  source  = "jorgetovar/remote-state/aws"
  version = "1.0.2"
}

output "state_config" {
  value = module.remote_state.config
}
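Once the module has been applied, the values it reports can be copied into the backend block of the project that should store its state remotely. The following is a minimal sketch: every value is a placeholder, and the key is a hypothetical path for the state object. Backend blocks cannot reference variables or module outputs, which is why the values are written literally.

```hcl
terraform {
  backend "s3" {
    # All values below are placeholders; use the ones reported by state_config.
    bucket         = "example-namespace-state-bucket"
    key            = "example-project/terraform.tfstate" # hypothetical object path
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "example-namespace-state-lock"

    # Role that Terraform assumes to reach the bucket and the lock table.
    role_arn = "arn:aws:iam::123456789012:role/example-namespace-tf-assume-role"
  }
}
```

After adding the block, run terraform init so Terraform can configure the backend. Argument names have changed across Terraform releases (for example, newer versions deprecate the top-level role_arn in favor of an assume_role setting), so check the S3 backend documentation for the version you use.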