DEV Community

Migrating to Terraform: a retrospective

Summary:

At 15Five we migrated to Terraform in 8 months with a 3-person team. It went smoothly and gave us more confidence in our AWS infrastructure. When it came time to set up a new VPC for a customer, the hours we had invested paid off in a fast, simple setup. Doing the same thing manually would have been a massive pain.

The Stats:

  • Lines of code: ~16,000
  • Number of commits: 903
  • PRs: 514
  • The team: Paul (principal engineer), James (devops manager), Caleb (SRE)

The Timeline:

2019.09 - I played around with Terraform for the first time in a company hackathon.
2019.09 - I used Terraform to do my first AWS change.
2019.11 - James joined the team and started work on Terraform. The repo was split into a main Terragrunt repo and a Terraform repo for our various modules. At this point we only had two modules. CI was set up.
2020.01 - We migrated our first environment.
2020.04.28 - preview environment migration attempt 1
2020.04.29 - preview migration attempt 2 ✔️
2020.05 - staging migration
2020.06 - solitary customer VPC migration
2020.06 - production migration

Takeaways:

Caleb:

Pain points:

  • Having two different repos is a bit awkward. I feel that having a monorepo or more CI would have been better, as the lack of CI feedback when changing code in the modules repo slowed things down. We'll probably be moving to a monorepo in the future.

What went well:

  • Hackathons are good for inspiring change.
  • Checklists are great. It simply feels good to check things off, for one. It also makes you think of all the items to do ahead of time and makes sure that you don't forget anything. Hospitals have used checklists to massively lower infection rates, although the results depend on how effectively they are used.
  • Terraform is great for developing a deep understanding of all the services involved in your infrastructure. By describing your infrastructure in code you develop a mental model of all the components involved, and have a written record of the tiny fiddly settings that might normally be hidden away behind some unintuitive UI icons.
  • Migrating production went pretty smoothly, and that's worth its weight in gold.

James:

Pain Points:

  • The monolithic structure of our Terraform modules results in very large artifacts - it would have been preferable to break it down into multiple git repositories, one repo for each module.

What went well:

  • We have been able to upgrade Terraform without incident.
  • We were able to develop 30-40% faster because we had a small, agile team without overly restrictive bureaucracy.

Paul:

Pain points:

  • We should have pinned the Terraform version in version control from the start, to prevent accidental upgrades and to keep everyone on the same version (we ended up doing this later on with tfenv and a .terraform-version file).
  • None of the solutions for handling multiple environments seemed ideal, but James ended up solving this with Terragrunt once he came on board.
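
As a rough sketch of the version pinning mentioned above: tfenv reads a .terraform-version file that contains nothing but a version string, and Terraform itself can refuse to run on a mismatched CLI through required_version. The file name versions.tf and the version number below are placeholders for illustration, not the values we actually used.

```hcl
# versions.tf (hypothetical file name)
#
# Companion file for tfenv, committed next to this one:
#   .terraform-version  ->  a single line such as "0.12.24" (placeholder)

terraform {
  # Exact pin: Terraform exits with an error if the installed CLI
  # is any version other than this one. Placeholder version.
  required_version = "= 0.12.24"
}
```

With both in place, tfenv selects the pinned version locally, and anyone running a different binary gets an error from Terraform instead of a silent state upgrade.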

What went well?

  • It was a strong relief to have Terraform handle the setup of the environment. You just had to click a button, walk away, and the infrastructure would be ready by the time you came back.
  • Using Terragrunt was a very clean way of configuring infrastructure across multiple environments.
  • It checked off a lot of boxes for compliance and disaster recovery.
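
To give a feel for the Terragrunt approach, here is a minimal sketch of what a per-environment config can look like. The directory layout, repository URL, module name, and input values are all made up for illustration; our real setup differs in the details.

```hcl
# staging/vpc/terragrunt.hcl (hypothetical path)

# Inherit shared settings, such as remote state configuration,
# from a terragrunt.hcl found in a parent directory.
include {
  path = find_in_parent_folders()
}

# Point at a tagged version of a module in the separate modules repo.
# The URL and tag are placeholders.
terraform {
  source = "git::git@github.com:example-org/terraform-modules.git//vpc?ref=v1.0.0"
}

# Only the values that differ between environments live here.
inputs = {
  environment = "staging"
  cidr_block  = "10.20.0.0/16"
}
```

Each environment gets its own small file like this one, so adding an environment is mostly a matter of copying the directory and changing the inputs.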

What could we have done better?

  • In retrospect I could have handed off more product work to other people to have more time to write Terraform and ensure best practices.
  • We could have had nightly test runs - spinning up and tearing down infrastructure in another region to be confident it would always work. Metrics such as timing and money spent in setup would have been useful. The more metrics the better.

Any other thoughts?

  • I wish I had known that Terraform wrappers like Pulumi were an option. However, it probably would have been overkill at the time.
  • I have found that it saves more time to configure a cloud service with Terraform first and then check the UI. Doing it in the UI first and then translating that to Terraform is slower.
