Today marks Day 13 of the 30 Days of AWS Terraform Challenge initiative by Piyush Sachdev. Today we take a deep dive into Terraform data sources: what exactly a data source is, and how it helps us write Terraform code for AWS resources such as EC2 instances, VPCs, and subnets.
Data Source:
Think of a data source like a phone directory behind an API: usernames are the keys and phone numbers are the values. Whenever you need a value, you retrieve it through the API by its key instead of hardcoding it.
In short, Terraform data sources for AWS allow you to retrieve information about existing AWS resources or external data, which can then be referenced within your Terraform configurations.
For example, while creating an EC2 instance we need an AMI ID. Without automation, we would have to visit the AMI release page, find the latest AMI ID, and paste it into our configuration, which is not a sustainable approach for our projects. We need a way to fetch that AMI ID automatically, without manual intervention or hardcoding. This is exactly what a Terraform data source does. Almost every AWS resource type has a corresponding data source through which you can look up its ID and other details; for instance, you can ask for the latest Amazon Linux 2 AMI and it will return that AMI ID.
To summarize, data sources allow Terraform to read information about existing infrastructure. They:
- Don't create, update, or delete resources
- Allow you to reference resources managed elsewhere
- Enable sharing infrastructure between teams
- Are defined with data blocks instead of resource blocks
How to Use AWS Data Sources:
You define a data source using the data block in your Terraform configuration. The syntax is as follows:
data "provider_type" "name" {
# Configuration settings for filtering or identifying the data source
}
provider_type: Specifies the type of AWS data source (e.g., aws_ami, aws_vpc, aws_s3_bucket).
name: A local name used to reference this data source elsewhere in your configuration.
Configuration Settings: These vary depending on the data source and are used to filter or identify the specific resource you want to retrieve information about. This often includes id, name, or filter blocks with name and values arguments.
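For instance, here is a minimal sketch of the two lookup styles; the VPC ID and bucket name below are placeholders, not real resources:
# Lookup by ID (placeholder VPC ID)
data "aws_vpc" "by_id" {
  id = "vpc-0123456789abcdef0"
}

# Lookup by name (placeholder bucket name)
data "aws_s3_bucket" "by_name" {
  bucket = "my-existing-bucket"
}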
Example: Data Block vs. Resource Block:
# Resource Block - Terraform MANAGES this
resource "aws_vpc" "my_vpc" {
  cidr_block = "10.0.0.0/16"
}

# Data Block - Terraform READS this
data "aws_vpc" "existing_vpc" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}
The above code block highlights the key differences between a resource block and a data source block.
In the resource block, Terraform fully manages the VPC we define with our CIDR block. In the data source block, we are merely looking up an existing VPC in our AWS account, giving it the local name "existing_vpc" so we can reference it while creating other resources.
In the filter section, we tell Terraform to retrieve data only from the VPC whose tags match the filters provided.
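Once a data source is defined, its attributes can be referenced anywhere in the configuration as data.<type>.<name>.<attribute>. As a minimal sketch, here are two outputs exposing attributes of the looked-up VPC (the output names are arbitrary):
# aws_vpc exports id, cidr_block, and other attributes
output "shared_vpc_id" {
  value = data.aws_vpc.existing_vpc.id
}

output "shared_vpc_cidr" {
  value = data.aws_vpc.existing_vpc.cidr_block
}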
Task:
In this task, we will first create a VPC using Terraform. Then we will use data sources to create an EC2 instance: the AMI data source to fetch the AMI ID, and the VPC and subnet data sources to place the instance into the VPC we created initially.
Creation of VPC:
# This simulates infrastructure created by another team
provider "aws" {
  region = "us-east-1"
}

resource "aws_vpc" "shared" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "shared-network-vpc"
  }
}

resource "aws_subnet" "shared" {
  vpc_id     = aws_vpc.shared.id
  cidr_block = "10.0.1.0/24"

  tags = {
    Name = "shared-primary-subnet" # ← This tag is important!
  }
}
Initially, we didn't have any VPC with the CIDR block 10.0.0.0/16; now we will create one with the above CIDR and the tag Name = "shared-network-vpc".
terraform plan
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_subnet.shared will be created
  + resource "aws_subnet" "shared" {
      + arn = (known after apply)
      + assign_ipv6_address_on_creation = false
      + availability_zone = (known after apply)
      + availability_zone_id = (known after apply)
      + cidr_block = "10.0.1.0/24"
      + enable_dns64 = false
      + enable_resource_name_dns_a_record_on_launch = false
      + enable_resource_name_dns_aaaa_record_on_launch = false
      + id = (known after apply)
      + ipv6_cidr_block_association_id = (known after apply)
      + ipv6_native = false
      + map_public_ip_on_launch = false
      + owner_id = (known after apply)
      + private_dns_hostname_type_on_launch = (known after apply)
      + region = "us-east-1"
      + tags = {
          + "Name" = "shared-primary-subnet"
        }
      + tags_all = {
          + "Name" = "shared-primary-subnet"
        }
      + vpc_id = (known after apply)
    }

  # aws_vpc.shared will be created
  + resource "aws_vpc" "shared" {
      + arn = (known after apply)
      + cidr_block = "10.0.0.0/16"
      + default_network_acl_id = (known after apply)
      + default_route_table_id = (known after apply)
      + default_security_group_id = (known after apply)
      + dhcp_options_id = (known after apply)
      + enable_dns_hostnames = (known after apply)
      + enable_dns_support = true
      + enable_network_address_usage_metrics = (known after apply)
      + id = (known after apply)
      + instance_tenancy = "default"
      + ipv6_association_id = (known after apply)
      + ipv6_cidr_block = (known after apply)
      + ipv6_cidr_block_network_border_group = (known after apply)
      + main_route_table_id = (known after apply)
      + owner_id = (known after apply)
      + region = "us-east-1"
      + tags = {
          + "Name" = "shared-network-vpc"
        }
      + tags_all = {
          + "Name" = "shared-network-vpc"
        }
    }
After applying this plan, a new VPC tagged "shared-network-vpc" (along with its subnet) is created.
Now we will create an EC2 instance using the AMI data source together with the VPC and subnet data sources.
Below are the data source blocks for the VPC and subnet resources:
# Data source to get the existing VPC
data "aws_vpc" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-network-vpc"]
  }
}

# Data source to get the existing subnet
data "aws_subnet" "shared" {
  filter {
    name   = "tag:Name"
    values = ["shared-primary-subnet"]
  }

  vpc_id = data.aws_vpc.shared.id # ← Using the aws_vpc data source
}
- data "aws_vpc" - reads VPC information from AWS
- filter - searches for the VPC with the matching tag
- shared - the local name used to reference this data source
- Returns: VPC ID, CIDR block, and other attributes
- data "aws_subnet" - searches for the subnet with the matching tag
- vpc_id - narrows the search to our VPC (a usage sketch follows after this list)
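To illustrate how these looked-up IDs can be consumed, here is a hypothetical security group (not part of this task; the name is illustrative) placed into the shared VPC:
# Hypothetical example: any resource that needs a VPC ID can reference the data source
resource "aws_security_group" "app" {
  name   = "app-sg" # illustrative name, not in the original task
  vpc_id = data.aws_vpc.shared.id
}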
Next, let's walk through the data source for the AMI, which fetches the AMI ID for the Amazon Linux 2 OS.
# Data source for the latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
- most_recent = true - gets the latest matching AMI
- owners = ["amazon"] - restricts results to official Amazon AMIs
- Multiple filters combine for precise matching: the name filter matches Amazon Linux 2 AMI names, and the virtualization-type filter keeps only HVM images
- Wildcards (*) allow flexible pattern matching (see the sketch after this list)
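If you want to sanity-check what the lookup resolves to, a couple of outputs (the output names here are arbitrary) will print the chosen AMI after terraform apply:
# aws_ami exports id and name, among other attributes
output "resolved_ami_id" {
  value = data.aws_ami.amazon_linux_2.id
}

output "resolved_ami_name" {
  value = data.aws_ami.amazon_linux_2.name
}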
Now, in main.tf, we will create an EC2 instance utilizing all of the above data sources: AMI, VPC, and subnet.
resource "aws_instance" "main" {
ami = data.aws_ami.amazon_linux_2.id # AMI - Data source
instance_type = "t2.micro"
subnet_id = data.aws_subnet.shared.id # Subnet - Data source
private_ip = "10.0.1.50"
tags = {
Name = "day13-instance"
}
}
- data.aws_ami.amazon_linux_2.id - references the Amazon Linux 2 AMI data source
- data.aws_subnet.shared.id - references the subnet data source
- The instance will be created inside the existing infrastructure
- The private IP must be within the subnet's CIDR range (10.0.1.50 sits inside 10.0.1.0/24); see the outputs sketch after this list
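To confirm the instance actually lands in the shared network, a couple of illustrative outputs (names are arbitrary) can echo back the values that came from the data sources:
output "instance_private_ip" {
  value = aws_instance.main.private_ip # expected: 10.0.1.50
}

output "instance_subnet_id" {
  value = aws_instance.main.subnet_id # expected: data.aws_subnet.shared.id
}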
terraform plan
data.aws_vpc.shared: Reading...
data.aws_ami.amazon_linux_2: Reading...
aws_vpc.shared: Refreshing state... [id=vpc-09527ed20e76d002e]
data.aws_ami.amazon_linux_2: Read complete after 3s [id=ami-0156001f0548e90b1]
data.aws_vpc.shared: Read complete after 3s [id=vpc-09527ed20e76d002e]
data.aws_subnet.shared: Reading...
data.aws_subnet.shared: Read complete after 1s [id=subnet-0e8357d0d5a07c57b]
aws_subnet.shared: Refreshing state... [id=subnet-0e8357d0d5a07c57b]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.main will be created
  + resource "aws_instance" "main" {
      + ami = "ami-0156001f0548e90b1"
      + arn = (known after apply)
      + associate_public_ip_address = (known after apply)
      + availability_zone = (known after apply)
      + disable_api_stop = (known after apply)
      + disable_api_termination = (known after apply)
      + ebs_optimized = (known after apply)
      + enable_primary_ipv6 = (known after apply)
      + force_destroy = false
      + get_password_data = false
      + host_id = (known after apply)
      + host_resource_group_arn = (known after apply)
      + iam_instance_profile = (known after apply)
      + id = (known after apply)
      + instance_initiated_shutdown_behavior = (known after apply)
      + instance_lifecycle = (known after apply)
      + instance_state = (known after apply)
      + instance_type = "t2.micro"
      + ipv6_address_count = (known after apply)
      + ipv6_addresses = (known after apply)
      + key_name = (known after apply)
      + monitoring = (known after apply)
      + outpost_arn = (known after apply)
      + password_data = (known after apply)
      + placement_group = (known after apply)
      + placement_group_id = (known after apply)
      + placement_partition_number = (known after apply)
      + primary_network_interface_id = (known after apply)
      + private_dns = (known after apply)
      + private_ip = "10.0.1.50"
      + public_dns = (known after apply)
      + public_ip = (known after apply)
      + region = "us-east-1"
      + secondary_private_ips = (known after apply)
      + security_groups = (known after apply)
      + source_dest_check = true
      + spot_instance_request_id = (known after apply)
      + subnet_id = "subnet-0e8357d0d5a07c57b"
      + tags = {
          + "Name" = "day13-instance"
        }
      + tags_all = {
          + "Name" = "day13-instance"
        }
      + tenancy = (known after apply)
      + user_data_base64 = (known after apply)
      + user_data_replace_on_change = false
      + vpc_security_group_ids = (known after apply)
      + capacity_reservation_specification (known after apply)
      + cpu_options (known after apply)
      + ebs_block_device (known after apply)
      + enclave_options (known after apply)
      + ephemeral_block_device (known after apply)
      + instance_market_options (known after apply)
      + maintenance_options (known after apply)
      + metadata_options (known after apply)
      + network_interface (known after apply)
      + primary_network_interface (known after apply)
      + private_dns_name_options (known after apply)
      + root_block_device (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
In the above terraform plan execution, you can see that Terraform first reads the AMI, VPC, and subnet data sources, and only then plans the creation of the EC2 instance.
After running terraform apply, we can see that the EC2 instance is created.
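One quick way to verify is terraform state list, which records both managed resources and data source reads. The exact output depends on your configuration, but it should look roughly like this:
terraform state list
data.aws_ami.amazon_linux_2
data.aws_subnet.shared
data.aws_vpc.shared
aws_instance.main
aws_subnet.shared
aws_vpc.shared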
Since everything here was created with the help of the AMI, VPC, and subnet data sources, we can finally clean up the resources by running terraform destroy.
terraform destroy
Plan: 0 to add, 0 to change, 3 to destroy.
aws_subnet.shared: Destroying... [id=subnet-0e8357d0d5a07c57b]
aws_instance.main: Destroying... [id=i-057687d8f123d3e00]
aws_subnet.shared: Still destroying... [id=subnet-0e8357d0d5a07c57b, 10s elapsed]
aws_instance.main: Still destroying... [id=i-057687d8f123d3e00, 10s elapsed]
aws_subnet.shared: Still destroying... [id=subnet-0e8357d0d5a07c57b, 20s elapsed]
aws_instance.main: Still destroying... [id=i-057687d8f123d3e00, 20s elapsed]
aws_instance.main: Still destroying... [id=i-057687d8f123d3e00, 30s elapsed]
aws_subnet.shared: Still destroying... [id=subnet-0e8357d0d5a07c57b, 30s elapsed]
aws_instance.main: Still destroying... [id=i-057687d8f123d3e00, 40s elapsed]
aws_subnet.shared: Still destroying... [id=subnet-0e8357d0d5a07c57b, 40s elapsed]
aws_subnet.shared: Still destroying... [id=subnet-0e8357d0d5a07c57b, 50s elapsed]
aws_instance.main: Still destroying... [id=i-057687d8f123d3e00, 50s elapsed]
aws_instance.main: Still destroying... [id=i-057687d8f123d3e00, 1m0s elapsed]
aws_subnet.shared: Still destroying... [id=subnet-0e8357d0d5a07c57b, 1m0s elapsed]
aws_instance.main: Destruction complete after 1m10s
aws_subnet.shared: Still destroying... [id=subnet-0e8357d0d5a07c57b, 1m10s elapsed]
aws_subnet.shared: Destruction complete after 1m15s
aws_vpc.shared: Destroying... [id=vpc-09527ed20e76d002e]
aws_vpc.shared: Destruction complete after 1s
Conclusion:
This marks the conclusion of Day 13 of the 30 Days of Terraform Challenge by Piyush Sachdev, in which we took a deep dive into Terraform data sources. We now understand what exactly a data source is and how it helps us create resources efficiently.
Below is the YouTube video for reference: