Tran Huynh An Duy (Andy)

Day 13: Decoupling Your Infrastructure - Mastering Terraform Data Sources

We’ve spent the last few days mastering expressions and functions to make our code reusable. Today, we tackle a critical concept for enterprise environments: Data Sources.

Data Sources allow your Terraform configuration to read information about resources outside of your current configuration—meaning resources that already exist in your AWS environment but were not created by this specific Terraform code. This ability to reference pre-existing components is crucial for decoupling and sharing infrastructure across multiple teams.

Why Data Sources? The Need for Decoupling

When provisioning new infrastructure, you often rely on shared or existing components. For example, if you need to provision an EC2 instance, you need several pieces of external information:

1. AMI ID: The instance needs an Amazon Machine Image (AMI), but the AMI is usually not owned by your account; it is published by Amazon or another vendor. AMI IDs are region-specific and change with every new release, so you don't want to hardcode one; you want to look up the latest one dynamically.

2. Shared VPC/Subnets: In an enterprise setting, network components such as Virtual Private Clouds (VPCs) and subnets are often pre-provisioned and shared among development, QA, and DevOps teams. When creating new resources, you must reference these existing network components rather than creating new ones.

Data Sources solve this by fetching these details dynamically, eliminating the need for manual intervention or hardcoding IDs.

How Data Sources Work: The Syntax

To declare a Data Source, you use the data keyword followed by the data source type (e.g., aws_vpc) and a local name you define:

data "aws_vpc" "vpc_name" {
  // configuration (filters) to find the specific VPC
}

The data source then exports attributes (such as id and cidr_block) that your resources can reference as data.<TYPE>.<NAME>.<ATTRIBUTE>, for example data.aws_vpc.vpc_name.id.
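
As a minimal sketch of that reference syntax, the output blocks below read two attributes from the VPC data source declared above. The output names vpc_id and vpc_cidr are illustrative and not part of the original configuration; id and cidr_block are attributes exported by the aws_vpc data source.

```hcl
// Data source attributes are addressed as data.<TYPE>.<NAME>.<ATTRIBUTE>.
output "vpc_id" {
  value = data.aws_vpc.vpc_name.id
}

output "vpc_cidr" {
  value = data.aws_vpc.vpc_name.cidr_block
}
```

Running terraform output after an apply prints these values, which is a quick way to check that the lookup matched the VPC you expected.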

Case Study: Referencing Existing Resources

Here is how we use Data Sources to pull information about an existing VPC, a subnet, and the latest Linux AMI.

1. Finding the Shared VPC and Subnet
Instead of hardcoding the VPC ID, we use a filter to look up the shared VPC by its Name tag (here, a VPC tagged default), then find the subnet inside it the same way:

Code Example (VPC Data Source):
data "aws_vpc" "vpc_name" {
  filter {
    name   = "tag:Name"
    values = ["default"] // Assumes a VPC tagged Name=default exists; AWS does not add this tag to the default VPC itself
  }
}

// Data Source for Subnet within the shared VPC
data "aws_subnet" "shared" {
  vpc_id = data.aws_vpc.vpc_name.id // Restrict the search to the VPC found by the VPC data source

  filter {
    name   = "tag:Name"
    values = ["subnet A"] // Finds the subnet tagged 'subnet A'
  }
}

In this configuration, we successfully filter existing resources in the AWS environment based on tags. We haven't created the VPC or subnet, yet we are correctly referencing the existing ones.
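
If the goal is the region's actual default VPC rather than a VPC that happens to carry the tag Name=default, the aws_vpc data source also accepts a default argument. The sketch below is an alternative lookup under that assumption; the local name account_default and the output name are hypothetical.

```hcl
// Alternative lookup: select the region's default VPC directly,
// without relying on a Name tag being present.
data "aws_vpc" "account_default" {
  default = true
}

// Hypothetical output to confirm which VPC was matched.
output "default_vpc_id" {
  value = data.aws_vpc.account_default.id
}
```

Tag-based filters remain the better fit for shared, non-default VPCs, since those are identified by whatever naming convention the owning team applies.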

2. Finding the Latest AMI ID

We use the aws_ami data source to fetch the most recent Amazon Linux 2 image:
Code Example (AMI Data Source):

data "aws_ami" "linux2" {
  most_recent = true // Ensures we get the latest release
  owners      = ["amazon"] // Owned by Amazon, not us

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"] // Wildcard on the version; x86_64 only, so arm64 images are never picked for t2.micro
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
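
Because most_recent = true resolves to a different image over time, it can help to surface what the lookup actually returned. A small sketch, assuming the linux2 data source above; the output names are illustrative and not part of the original configuration.

```hcl
// Show which image the aws_ami lookup resolved to.
output "linux2_ami_id" {
  value = data.aws_ami.linux2.id
}

output "linux2_ami_name" {
  value = data.aws_ami.linux2.name
}

output "linux2_ami_created" {
  value = data.aws_ami.linux2.creation_date
}
```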

3. Provisioning the EC2 Instance

Finally, we use the outputs from these data sources in our resource definition:
Code Example (EC2 Instance using Data Source Outputs):

resource "aws_instance" "example_instance" {
  instance_type = "t2.micro" 

  // Use the AMI ID retrieved by the data source
  ami           = data.aws_ami.linux2.id 

  // Use the Subnet ID retrieved by the data source
  subnet_id     = data.aws_subnet.shared.id 
}


This demonstrates how Data Sources provide the necessary external IDs (AMI ID, Subnet ID) without hardcoding, allowing the instance to be provisioned correctly using existing, shared infrastructure.
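
One side effect of most_recent = true is worth planning for: when a newer AMI is published, data.aws_ami.linux2.id changes, and Terraform will propose replacing the instance on the next plan. If that is not wanted, one option (not part of the original walkthrough) is a lifecycle block that ignores later changes to ami. The resource name pinned_instance is hypothetical.

```hcl
// Variant of the instance above that keeps the AMI it was created with.
resource "aws_instance" "pinned_instance" {
  instance_type = "t2.micro"
  ami           = data.aws_ami.linux2.id
  subnet_id     = data.aws_subnet.shared.id

  lifecycle {
    // Newer AMIs returned by the data source no longer trigger a replacement.
    ignore_changes = [ami]
  }
}
```

The trade-off is that the instance stays on its original image until you replace it deliberately, so this suits long-lived instances more than disposable ones.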


Data Sources are fundamental to creating flexible and maintainable configurations, especially in environments where infrastructure management is shared across multiple teams. @piyushsachdeva
