There is nothing better than starting a new week with a full Terraform infrastructure refactoring using a clean (and shared) remote state.
What do I have?
- an old, working Terraform state
- a new, better-organized code folder
- an empty S3 bucket to store the new state
Let's start with the clean, shared remote state:

```hcl
variable "the_profile" {}
variable "region" {}

provider "aws" {
  region  = var.region
  profile = var.the_profile
}

# Bucket that holds the remote state.
# Note: the inline acl/versioning/encryption arguments are deprecated since
# AWS provider v4 (newer code uses the separate aws_s3_bucket_* resources).
resource "aws_s3_bucket" "state_bucket" {
  bucket = "my-uniquely-named-state-bucket"
  acl    = "private"

  versioning {
    enabled = true # this is a must
  }

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}

# Lock table for the S3 backend: the partition key must be a string named LockID.
resource "aws_dynamodb_table" "terraform_state_lock" {
  name           = "my-lock-table"
  read_capacity  = 1
  write_capacity = 1
  hash_key       = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```
Once this initial structure exists, we can properly configure our backend:

```hcl
## Backend to keep status
provider "aws" {
  region  = var.region
  profile = var.the_profile
}

terraform {
  backend "s3" {
    bucket = "my-uniquely-named-state-bucket"
    key    = "mystate.tfstate"
    # Backend blocks cannot reference variables, so the region must be a literal.
    region         = "INSERT_YOUR_REGION_HERE"
    encrypt        = true
    dynamodb_table = "my-lock-table"
  }
}
```
Things are starting to happen.
After some copy+paste-oriented programming, I managed to move all my files from the old structure into the new one, and I was ready to run my first plan.
And Terraform said:

```
Plan: WAY TOO MANY to add, 0 to change, 0 to destroy.
```

(OK, it didn't say that, but that's what I read.)
Adding the VPC, the subnets, the route tables, the security groups, the instances and all the other "direct resources" was fairly easy: nothing that a few queries here and there and some `terraform import TERRAFORM_RESOURCE_NAME AWS_ID` couldn't handle. The easy part was done.
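The "few queries" mostly meant mapping each resource address to its AWS ID, and the old state already holds both. Below is a minimal sketch of how that lookup could be scripted, not the exact script I used. It assumes the Terraform 0.12+ state layout (a top-level `resources` list whose `instances` carry an `attributes` map) and assumes each resource's `id` attribute is also its import ID. That holds for things like VPCs, subnets and instances, but not for every resource type (security group rules, as we'll see next, are the exception). The function names are hypothetical.

```python
import json
import shlex


def load_state(path):
    """Read a local copy of a .tfstate file."""
    with open(path) as f:
        return json.load(f)


def resource_address(resource, instance):
    """Build an address such as module.net.aws_subnet.private[0]."""
    address = f"{resource['type']}.{resource['name']}"
    if "module" in resource:
        address = f"{resource['module']}.{address}"
    if "index_key" in instance:
        key = instance["index_key"]
        # count uses integer keys, for_each uses string keys
        address += f"[{key}]" if isinstance(key, int) else f'["{key}"]'
    return address


def import_commands(state, skip_types=()):
    """Yield one `terraform import ADDRESS ID` line per managed resource instance."""
    for resource in state.get("resources", []):
        # Data sources are never imported; some types need special IDs.
        if resource.get("mode") != "managed" or resource["type"] in skip_types:
            continue
        for instance in resource.get("instances", []):
            address = resource_address(resource, instance)
            resource_id = instance["attributes"]["id"]
            yield f"terraform import {shlex.quote(address)} {shlex.quote(resource_id)}"
```

Usage would be something like `for cmd in import_commands(load_state("old.tfstate"), skip_types={"aws_security_group_rule"}): print(cmd)`, where `old.tfstate` is a hypothetical local copy of the old state. Printing the commands instead of running them leaves room to review each one first.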
Enter the Security Group Rules
This is where things started to get interesting.
Quoting directly from Terraform's `aws_security_group_rule` documentation page:

> Security Group Rules can be imported using the security_group_id, type, protocol, from_port, to_port, and source(s)/destination(s) (e.g. cidr_block) separated by underscores (_). All parts are required.
Luckily, I still had access to my previous, working Terraform state, and with a little bit of Python I could find and extract the required ID for each missing security group rule.
So, once more, the good old:

```shell
terraform import aws_security_group_rule.sg_allow_stuff sg-001122334455_ingress_tcp_8080_8080_sg-001122334455
```
Thing is, the plan kept showing things like:

```
-/+ resource "aws_security_group_rule" "sg_allow_stuff" {
      - cidr_blocks              = [] -> null
      ~ id                       = "sgrule-12345678" -> (known after apply)
      - ipv6_cidr_blocks         = [] -> null
      - prefix_list_ids          = [] -> null
      ~ self                     = false -> true # forces replacement
      ~ source_security_group_id = "sg-001122334455" -> (known after apply)
        # (5 unchanged attributes hidden)
    }
```
Then I noticed this line:

```
~ self = false -> true # forces replacement
```

and the fact that the source and destination security group IDs were the same.
So `self` was the issue. No problem: just append `_self` to the end of the AWS_ID part of the import command:

```shell
terraform import aws_security_group_rule.sg_allow_stuff sg-001122334455_ingress_tcp_8080_8080_sg-001122334455_self
```
And it imported!
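Here is a minimal sketch of the kind of Python helper mentioned earlier, rebuilt for this post rather than copied from my actual script. It assumes the same 0.12+ state layout, joins the parts in the documented `security_group_id_type_protocol_from_port_to_port_source(s)` shape, and appends `self` when the rule's `self` attribute is true, which reproduces the ID that finally worked above. The order in which several sources are joined (CIDRs, IPv6 CIDRs, prefix lists, source group) is an assumption, so double-check it against the provider docs for rules with multiple sources. Function names are hypothetical.

```python
def rule_import_id(attrs):
    """Compose an aws_security_group_rule import ID from its state attributes."""
    sources = []
    # Assumed source order: IPv4 CIDRs, IPv6 CIDRs, prefix lists, source SG.
    for field in ("cidr_blocks", "ipv6_cidr_blocks", "prefix_list_ids"):
        sources.extend(attrs.get(field) or [])
    if attrs.get("source_security_group_id"):
        sources.append(attrs["source_security_group_id"])
    if attrs.get("self"):
        # Without this suffix the imported rule gets self = false and is replaced.
        sources.append("self")
    parts = [
        attrs["security_group_id"],
        attrs["type"],
        attrs["protocol"],
        str(attrs["from_port"]),
        str(attrs["to_port"]),
        *sources,
    ]
    return "_".join(parts)


def rule_import_ids(state):
    """Map each aws_security_group_rule address in a state dict to its import ID.

    count/for_each indexes are ignored here for brevity.
    """
    ids = {}
    for resource in state.get("resources", []):
        if resource.get("mode") != "managed" or resource["type"] != "aws_security_group_rule":
            continue
        address = f"aws_security_group_rule.{resource['name']}"
        if "module" in resource:
            address = f"{resource['module']}.{address}"
        for instance in resource.get("instances", []):
            ids[address] = rule_import_id(instance["attributes"])
    return ids
```

Feeding the old state (parsed with `json.load`) into `rule_import_ids` gives address/ID pairs ready to be turned into `terraform import` commands.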
And the stubborn Key
Two minutes (to midnight) later, all the security group rules were in the new state.
Only this `aws_key_pair` was still missing.
OK! Let's go!

```shell
terraform import aws_key_pair.auth thekeyname
```

OK! It imported just like all the other resources. Nothing new so far...
But on the next apply, THAT key was still:

```
  # aws_key_pair.auth will be created
  + resource "aws_key_pair" "auth" {
      + arn         = (known after apply)
      + fingerprint = (known after apply)
      + id          = (known after apply)
      + key_name    = "thekeyname"
      + key_pair_id = (known after apply)
      + public_key  = "#### EMPTY"
    }
```

Oh, that "#### EMPTY". I forgot to put the real public key in the resource on my side.
A few keystrokes later (now with the correct key), the same thing:

```
  # aws_key_pair.auth will be created
  + resource "aws_key_pair" "auth" {
      + arn         = (known after apply)
      + fingerprint = (known after apply)
      + id          = (known after apply)
      + key_name    = "thekeyname"
      + key_pair_id = (known after apply)
      + public_key  = "ssh-rsa yaddayaddayadda..."
    }
```
Hello Google, my old friend :)
Turns out I wasn't the first to hit this wall.
OK, so let's try one of the solutions suggested there:
- Download the new state from S3
- Edit the state and search for the entry corresponding to the key
- Enter the public key manually in the `public_key` field (not forgetting the quotes)
- Save the state
- Upload it and ...
!!!RUN TERRAFORM!!! ( !!! RUN FOR YOUR LIVES !!! )
Sadly, Terraform says NO! :(
```
Error: Error loading state: state data in S3 does not have the expected content.

This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: 5d3a61408eb4ba89aeaf09818561d0a7
```
Like anyone in trouble, I only focused on the first part of the message. "Things are looking bad," I thought. On top of that, I was so confident that I hadn't even backed up the state before altering it.
BUT... bucket versioning to the rescue!

```shell
aws s3api get-object --bucket my-uniquely-named-state-bucket --key mystate.tfstate --version-id abcdversionreferencehere mystate.tfstate
```

Of course, this didn't solve the problem (as I found out after a second upload), so I decided to calm down and read the whole message.
DynamoDB Digest? Interesting. So there was no big problem after all. Turns out the MD5 digest of the tfstate is stored in DynamoDB, and there were a couple of missing steps in my procedure:
- Download the new state from S3
- Edit the state and search for the entry corresponding to the key
- Enter the public key manually in the `public_key` field (not forgetting the quotes)
- Save the state
- Get the MD5 sum of the state
- Update the MD5 digest in the DynamoDB lock table (`my-lock-table`, the `terraform_state_lock` resource)
- Upload the modified state
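The corrected procedure can be scripted as well. Below is a minimal sketch, assuming the state was already downloaded to a local file and will be uploaded afterwards with something like `aws s3 cp`. It fills in `public_key` on the `aws_key_pair` instance, writes the file, and hashes the exact bytes written, since that is what the stored digest must match. It then builds (but does not run) an `aws dynamodb update-item` command. The item layout it assumes, `LockID` = `<bucket>/<key>-md5` with the hash in a `Digest` attribute, is how the S3 backend stores the digest as far as I know, so compare it with the item already in your table first. Function names are hypothetical.

```python
import hashlib
import json
import shlex


def set_key_pair_public_key(state, resource_name, public_key):
    """Fill in public_key on the aws_key_pair instance that the import left empty."""
    for resource in state.get("resources", []):
        if resource.get("type") == "aws_key_pair" and resource.get("name") == resource_name:
            for instance in resource.get("instances", []):
                instance["attributes"]["public_key"] = public_key
            return state
    raise KeyError(f"aws_key_pair.{resource_name} not found in state")


def write_state(state, path):
    """Write the edited state and return the MD5 hex digest of the bytes on disk."""
    data = json.dumps(state, indent=2).encode("utf-8")
    with open(path, "wb") as f:
        f.write(data)
    return hashlib.md5(data).hexdigest()


def digest_update_command(table, bucket, key, digest):
    """Build the AWS CLI call that stores the new digest in the lock table."""
    item_key = {"LockID": {"S": f"{bucket}/{key}-md5"}}  # assumed S3-backend item layout
    values = {":d": {"S": digest}}
    return " ".join([
        "aws dynamodb update-item",
        "--table-name", shlex.quote(table),
        "--key", shlex.quote(json.dumps(item_key)),
        "--update-expression", shlex.quote("SET Digest = :d"),
        "--expression-attribute-values", shlex.quote(json.dumps(values)),
    ])
```

After uploading the file, run the printed command. The digest in Terraform's error message is the MD5 of whatever object is currently in S3, which makes it a handy cross-check against the value computed here.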
And everything worked like it was supposed to!
What did I learn?
- READ everything, not just the beginning.
- Always BACK UP (or enable versioning when possible).
- Google is your friend (nothing really new here).