DEV Community

Provisioning an Azure Kubernetes cluster with Terraform

In this post you will learn how to set up an Azure Kubernetes cluster using Terraform.

NOTE: This article assumes some basic knowledge of cloud concepts and the Microsoft Azure platform.

What is Terraform?

Terraform is an Infrastructure as code tool that allows developers and operations teams to automate how they provision their infrastructure.

Why write more code for my infrastructure?

If you are new to infrastructure as code, it could seem like an extra step when you could just click a few buttons on your cloud provider's dashboard and be on your way. But IaC (infrastructure as code) offers quite a few advantages.

  1. Because your infrastructure is now represented as code, it is testable.
  2. Your environments are now reproducible.
  3. You can track changes to your infrastructure over time with a version control system like Git.
  4. Deployments are faster, because you spend less time clicking through the cloud provider's dashboard.

Before diving into Terraform, you need a brief understanding of its configuration language.

HashiCorp Configuration Language (HCL)

Yes, Terraform uses its own configuration language. This may seem daunting at first, but it's quite easy. Here's a quick peek at what it looks like.

resource "azurerm_resource_group" "resource-group" {
  name     = "staging-resource-group"
  location = "West Europe"
}

Your infrastructure in Terraform is represented as "resources". Everything from networking to databases to virtual machines is a resource.

This is exactly what the resource block represents. Here we are creating an azurerm_resource_group which, as the name implies, is a resource group. Resource groups are how you organize related resources together; a typical use case would be putting all the servers for a single project under the same resource group.

Next we give the resource block a name ("resource-group" here). Think of this as a variable name we can use to refer to the resource elsewhere in the Terraform configuration. Within the resource block we give our resource group a name; this is the name the resource group will have in Azure. Finally, we give the location where we want the resource group to be deployed.
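To make the "variable name" idea concrete, here is a minimal sketch of a second resource that refers back to the resource group above. The virtual network, its name and its address range are hypothetical and not part of the sample project; only the reference syntax (resource type, then block name, then attribute) is the point.

```hcl
# Hypothetical example: a virtual network placed in the resource group
# defined above. The network name and address range are illustrative.
resource "azurerm_virtual_network" "staging-network" {
  name          = "staging-network"
  address_space = ["10.0.0.0/16"]

  # <resource type>.<block name>.<attribute> reads a value from another resource.
  location            = azurerm_resource_group.resource-group.location
  resource_group_name = azurerm_resource_group.resource-group.name
}
```

Because the network reads its location and group name from the resource group, Terraform also knows it has to create the resource group first.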

If you are coming from something like Ansible you might notice how different Terraform's approach to configuration is. This is because Terraform uses what's known as a declarative style of configuration. Simply put, in a declarative style you declare the state you want your infrastructure to be in, not the steps needed to reach that state. You can learn more about declarative and imperative configuration here

Now that you have an idea of what Terraform configuration looks like, let's dive in.

Project setup


Once you have all that set up (the commands below use the Azure CLI, Git and Terraform), log in to your Azure account from the command line using the following command:

$ az login

Next clone the sample project.

$ git clone

Before we begin we need to run terraform init. This downloads the provider plugins the configuration needs, in this case the Azure provider.

$ terraform init

Taking a quick look at our folder structure you should have something like this.

├── modules
│   └── cluster
│       ├──
│       └──

2 directories, 6 files

Starting from the top, let's look at the root configuration file.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "2.39.0"
    }
  }
}

provider "azurerm" {
  features {}
}

module "cluster" {
  source             = "./modules/cluster"
  ssh_key            = var.ssh_key
  location           = var.location
  kubernetes_version = var.kubernetes_version
}

First we declare which provider we are using, which is how Terraform knows what cloud platform we intend to use. This could be Google Cloud, AWS, or any other provider Terraform supports. You can learn more about Terraform providers here. It's also worth noting that the provider block for each provider is usually given in its documentation, so you don't need to write it from scratch each time.
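The sample project pins the provider to one exact version. As an alternative sketch (not what the sample project does), a version constraint can allow newer releases within the same major version:

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # "~> 2.39" accepts 2.39 and any later 2.x release, but never 3.0.
      version = "~> 2.39"
    }
  }
}
```

An exact pin gives the most reproducible runs, while a constraint like this picks up fixes automatically; which one fits is a judgment call for your project.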

Next we define a module block and pass it the folder where our module is located, along with a few variables.

Modules in Terraform are a way to separate your configuration so each module can handle a specific task. Sure, we could just dump all of our configuration in one place, but that makes things clunky and less portable.

Now let's take a look at the cluster folder in the modules directory.

0 directories, 2 files

Let's take a look at the first one, which defines the cluster itself.


resource "azurerm_resource_group" "aks-resource" {
  name     = "kubernetes-resource-group"
  location = var.location
}

resource "azurerm_kubernetes_cluster" "aks-cluster" {
  name                = "terraform-cluster"
  location            = azurerm_resource_group.aks-resource.location
  resource_group_name = azurerm_resource_group.aks-resource.name
  dns_prefix          = "terraform-cluster"
  kubernetes_version  = var.kubernetes_version

  default_node_pool {
    name       = "default"
    node_count = 2
    vm_size    = "Standard_A2_v2"
    type       = "VirtualMachineScaleSets"
  }

  identity {
    type = "SystemAssigned"
  }

  linux_profile {
    admin_username = var.admin_user
    ssh_key {
      key_data = var.ssh_key
    }
  }

  network_profile {
    network_plugin    = "kubenet"
    load_balancer_sku = "Standard"
  }
}

In the first part of the configuration we define a resource group for our cluster, cleverly name it "kubernetes-resource-group", and give it a location that comes from a variable (the variables are defined in a separate file, covered further down).

The next part is the actual spec of our Kubernetes cluster. First we tell Terraform we want an Azure Kubernetes cluster using resource "azurerm_kubernetes_cluster", then we give our cluster a name, a location and a resource group. We can reuse the location of the resource group we defined earlier by referencing it with its type and name plus the attribute we want. In this case it's the location, so we use azurerm_resource_group.aks-resource.location.

There are two more blocks that we need to pay attention to: the first is the default_node_pool block and the second is linux_profile.

The default_node_pool block lets us define how many nodes we want to run and what type of virtual machine each node runs on. It's important to pick the right size for your nodes, as this affects both cost and performance. You can take a look at what VM sizes Azure offers and their use cases over here. node_count tells Terraform how many nodes we want our cluster to have. Next we define the VM size with vm_size. Here I'm using an A-series VM (Standard_A2_v2) with two CPU cores and 4 GB of RAM. Lastly we give it a type of "VirtualMachineScaleSets", which backs the node pool with a virtual machine scale set, a group of identical VMs that can be scaled automatically.
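Since the node pool is backed by a scale set, it can also grow and shrink on its own. The following is a rough sketch of an autoscaling variant of the block, assuming the 2.x azurerm provider pinned earlier; the minimum and maximum counts are illustrative, and argument names can differ in other major versions of the provider, so check the documentation for the version you use.

```hcl
# Sketch only: replaces the default_node_pool block inside the
# azurerm_kubernetes_cluster resource. The bounds below are illustrative.
default_node_pool {
  name                = "default"
  vm_size             = "Standard_A2_v2"
  type                = "VirtualMachineScaleSets"
  enable_auto_scaling = true
  min_count           = 2 # never scale below two nodes
  max_count           = 4 # never scale above four nodes
}
```

With autoscaling on, the fixed node_count is left out and the cluster stays between the two bounds, which also caps how much the node pool can cost.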

The last block we need to look at is linux_profile. This creates a user we can use to SSH into one of our nodes in case something goes wrong. Here we simply pass the block our variables.

I intentionally didn't go over all the blocks, because most of the time you don't need to change them, and if you do, the documentation is quite easy to go through.
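One thing the sample does not show: because the cluster lives inside a module, the root configuration can only read values the module exports as outputs. A minimal sketch, assuming you add an output to the module yourself (the output name cluster_name is hypothetical):

```hcl
# Inside modules/cluster: export the cluster name from the module.
output "cluster_name" {
  description = "Name of the AKS cluster created by this module"
  value       = azurerm_kubernetes_cluster.aks-cluster.name
}
```

The root configuration could then read it as module.cluster.cluster_name, for example to pass it on to another module.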

Finally, let's take a look at the variables file. As you might have guessed, this is where we define all the variables we referenced earlier.

variable "location" {
  type        = string
  description = "resource location"
  default     = "East US"
}

variable "kubernetes_version" {
  type        = string
  description = "k8's version"
  default     = "1.19.6"
}

variable "admin_user" {
  type        = string
  description = "username for linux_profile"
  default     = "enderdragon"
}

variable "ssh_key" {
  description = "ssh_key for admin_user"
}

To define a variable we use the variable keyword and give it a name. Within the curly braces we define its type (in this case a string), an optional description, and an optional default value. A variable without a default, like ssh_key here, must be given a value when Terraform runs.
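Variables can also check the values they receive. Here is a small sketch, assuming Terraform 0.13 or later; the node_count variable is hypothetical and not part of the sample project, which hard-codes the node count.

```hcl
# Hypothetical variable: lets the node count be set from outside the module.
variable "node_count" {
  type        = number
  description = "number of nodes in the default node pool"
  default     = 2

  # Reject obviously wrong values before anything is created.
  validation {
    condition     = var.node_count >= 1
    error_message = "The node_count value must be at least 1."
  }
}
```

To use it, node_count = 2 in the default_node_pool block would become node_count = var.node_count.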

Now we are almost ready to create our cluster, but first we need to generate an SSH key; if you remember, we created a variable for it earlier. If you already have an SSH key pair you can skip this step.

$ ssh-keygen -t rsa -b 4096

You can leave everything as default by pressing Enter. Next we export the public key into an environment variable.

$ export TF_VAR_ssh_key=$( cat ~/.ssh/


Notice the TF_VAR_ prefix before the actual variable name. This is how Terraform recognizes the environment variable and uses it as the value of an input variable. Note that the part after the prefix must match the name of a variable declared in the configuration, ssh_key in this case.
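Environment variables are one way to set input variables. For values that are not secret, another option is a terraform.tfvars file next to the root configuration, which Terraform loads automatically. A minimal sketch, with values copied from the defaults shown earlier purely for illustration:

```hcl
# terraform.tfvars (illustrative values for variables the root configuration uses)
location           = "East US"
kubernetes_version = "1.19.6"
```

Each name on the left must match a variable declared in the root configuration.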

Before we actually create our infrastructure, it's always a good idea to see exactly what Terraform would create. Luckily Terraform has a command for that:

$ terraform plan

The output should look something like this

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.cluster.azurerm_kubernetes_cluster.aks-cluster will be created
  + resource "azurerm_kubernetes_cluster" "aks-cluster" {
      + dns_prefix              = "terraform-cluster"
      + fqdn                    = (known after apply)
      + id                      = (known after apply)
      + kube_admin_config       = (known after apply)
      + kube_admin_config_raw   = (sensitive value)
      + kube_config             = (known after apply)
      + kube_config_raw         = (sensitive value)
      + kubelet_identity        = (known after apply)
      + kubernetes_version      = "1.19.1"
      + location                = "eastus"
      + name                    = "terraform-cluster"
      + node_resource_group     = (known after apply)
      + private_cluster_enabled = (known after apply)
      + private_fqdn            = (known after apply)
      + private_link_enabled    = (known after apply)
      + resource_group_name     = "kubernetes-resource-group"
      + sku_tier                = "Free"

      + addon_profile {
          + aci_connector_linux {
              + enabled     = (known after apply)
              + subnet_name = (known after apply)
            }

          + azure_policy {
              + enabled = (known after apply)
            }

          + http_application_routing {
              + enabled                            = (known after apply)
              + http_application_routing_zone_name = (known after apply)
            }

          + kube_dashboard {
              + enabled = (known after apply)
            }

          + oms_agent {
              + enabled                    = (known after apply)
              + log_analytics_workspace_id = (known after apply)
              + oms_agent_identity         = (known after apply)
            }
        }

      + auto_scaler_profile {
          + balance_similar_node_groups      = (known after apply)
          + max_graceful_termination_sec     = (known after apply)
          + scale_down_delay_after_add       = (known after apply)
          + scale_down_delay_after_delete    = (known after apply)
          + scale_down_delay_after_failure   = (known after apply)
          + scale_down_unneeded              = (known after apply)
          + scale_down_unready               = (known after apply)
          + scale_down_utilization_threshold = (known after apply)
          + scan_interval                    = (known after apply)
        }

      + default_node_pool {
          + max_pods             = (known after apply)
          + name                 = "default"
          + node_count           = 2
          + orchestrator_version = (known after apply)
          + os_disk_size_gb      = (known after apply)
          + os_disk_type         = "Managed"
          + type                 = "VirtualMachineScaleSets"
          + vm_size              = "Standard_A2_v2"
        }

      + identity {
          + principal_id = (known after apply)
          + tenant_id    = (known after apply)
          + type         = "SystemAssigned"
        }

      + linux_profile {
          + admin_username = "enderdragon"

          + ssh_key {
              + key_data = "jsdksdnjcdkcdomocadcadpadmoOSNSINCDOICECDCWCdacwdcwcwccdscdfvevtbrbrtbevF
CDSCSASACDCDACDCDCdsdsacdq$q@#qfesad== you@probablyyourdesktop"
            }
        }

      + network_profile {
          + dns_service_ip     = (known after apply)
          + docker_bridge_cidr = (known after apply)
          + load_balancer_sku  = "Standard"
          + network_plugin     = "kubenet"
          + network_policy     = (known after apply)
          + outbound_type      = "loadBalancer"
          + pod_cidr           = (known after apply)
          + service_cidr       = (known after apply)

          + load_balancer_profile {
              + effective_outbound_ips    = (known after apply)
              + idle_timeout_in_minutes   = (known after apply)
              + managed_outbound_ip_count = (known after apply)
              + outbound_ip_address_ids   = (known after apply)
              + outbound_ip_prefix_ids    = (known after apply)
              + outbound_ports_allocated  = (known after apply)
            }
        }

      + role_based_access_control {
          + enabled = (known after apply)

          + azure_active_directory {
              + admin_group_object_ids = (known after apply)
              + client_app_id          = (known after apply)
              + managed                = (known after apply)
              + server_app_id          = (known after apply)
              + server_app_secret      = (sensitive value)
              + tenant_id              = (known after apply)
            }
        }

      + windows_profile {
          + admin_password = (sensitive value)
          + admin_username = (known after apply)
        }
    }

  # module.cluster.azurerm_resource_group.aks-resource will be created
  + resource "azurerm_resource_group" "aks-resource" {
      + id       = (known after apply)
      + location = "eastus"
      + name     = "kubernetes-resource-group"
    }

Plan: 2 to add, 0 to change, 0 to destroy.


Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

If everything looks good we can apply our configuration using:

$ terraform apply

Terraform will prompt you one last time to make sure you want to proceed. Enter yes and watch the magic happen. Once the resources have been provisioned, head over to your Azure dashboard and take a look. You should see something like this:

(Screenshot: cluster specs)

As you can see, Terraform configured everything we needed to spin up a cluster, and we didn't have to specify everything ourselves. Click on terraform-cluster and let's make sure everything looks good.


And there you have it: we deployed a Kubernetes cluster with our desired specifications, and Terraform did all the heavy lifting.

Once you are done, it's as easy as running terraform destroy to tear down all the resources you have just provisioned.

Quick recap

You learnt:

  • Why infrastructure as code is important
  • The basics of HCL (HashiCorp Configuration Language)
  • How to provision a Kubernetes cluster with Terraform

If you are wondering where to go from here, here are some things you can try.

  • Here we authenticated through the Azure CLI, but that's not ideal. Instead you might want to use a service principal with more narrowly scoped permissions. Check that out over here
  • You should never store your state file in version control, as it might contain sensitive information. Instead you can put it in Azure Blob Storage.
  • There are better ways to pass variables to Terraform which I did not cover here, but this post on the Terraform website should walk you through it nicely.
  • Finally, this article couldn't possibly cover all there is to Terraform, so I highly suggest taking a look at the Terraform documentation; it has some boilerplate configuration to get you started provisioning resources.
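As a starting point for the state file suggestion above, here is a rough sketch of a remote backend that keeps state in Azure Blob Storage. Every name in it is hypothetical, and it assumes the resource group, storage account and container already exist.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-resource-group" # hypothetical, must already exist
    storage_account_name = "tfstatestorage12345"    # hypothetical, must be globally unique
    container_name       = "tfstate"                # hypothetical blob container
    key                  = "aks.terraform.tfstate"  # name of the state blob
  }
}
```

After adding a backend block you need to run terraform init again so Terraform can set up the new state location.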

All the code samples used in this tutorial can be found here

Top comments (1)

Raphaël Pinson

Nice explanation!

If you want a full-fledged AKS cluster, see also which uses Terraform + ArgoCD to automatically deploy all you need!