Terraform Data Source (AWS)


When working with Terraform, one of the most powerful concepts you’ll use is data sources. These allow you to fetch and reference existing resources rather than creating new ones. In real-world cloud environments—especially in large organizations—you often work with infrastructure that already exists, such as shared VPCs, pre-defined subnets, approved AMIs, or centrally managed security groups.

Instead of hardcoding these values or manually copying IDs, Terraform’s data sources let you retrieve them dynamically each time Terraform runs. This keeps your code clean and far less error-prone.

This blog explains what Terraform data sources are, why they matter, and how to use them with practical AWS examples, including VPC, Subnet, and AMI lookups.

What Are Terraform Data Sources?

A data source in Terraform is a read-only lookup to an existing resource. Instead of creating something new, Terraform queries the cloud provider (AWS in this case) and returns information that can be used inside your configuration.
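To make the read-only nature concrete, here is a minimal sketch that assumes the AWS provider is already configured. `aws_caller_identity` and `aws_availability_zones` are standard AWS provider data sources that need little or no input. The labels `current` and `available` and the output names are arbitrary choices for illustration.

```hcl
# Read-only lookup: details of the AWS credentials Terraform is running with.
data "aws_caller_identity" "current" {}

# Read-only lookup: availability zones currently available in the provider's region.
data "aws_availability_zones" "available" {
  state = "available"
}

# Data source attributes are referenced as data.<TYPE>.<NAME>.<ATTRIBUTE>.
output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "az_names" {
  value = data.aws_availability_zones.available.names
}
```

Running `terraform apply` on this creates nothing in AWS. It only reads the values and prints the outputs.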

You use data sources when:

A resource is already created (shared VPCs, existing AMIs).
Another team manages the resource (network or security team).
Your Terraform module should not own or recreate the resource.
You need the latest or filtered version of something (latest AMI).
You want to avoid hardcoding identifiers such as IDs or ARNs.

This leads to cleaner, more dynamic infrastructure code.

Example 1: Fetching VPC ID Using a Data Source

In many organizations, networking is centralized. The VPC already exists, and your Terraform code will only deploy application resources inside it.

With the following data source, we fetch a VPC by matching its Name tag:

data "aws_vpc" "vpc_name" {
  filter {
    name   = "tag:Name"
    values = ["default-vpc"]
  }
}

Here’s what this does:

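Searches for a VPC whose Name tag equals default-vpc
Fails with an error if no VPC, or more than one VPC, matches
Exposes the VPC’s ID and its other attributes (such as cidr_block)
Lets you reference the ID elsewhere as data.aws_vpc.vpc_name.id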

This avoids the need to manually capture or maintain the VPC ID.
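The `aws_vpc` data source also accepts a `tags` argument, which some teams find easier to read than a `filter` block. The sketch below is an equivalent lookup. The label `shared` and the output names are illustrative. It also shows that a data source returns the object’s attributes, not just its ID.

```hcl
# Equivalent lookup using the tags argument instead of a filter block.
data "aws_vpc" "shared" {
  tags = {
    Name = "default-vpc"
  }
}

output "vpc_id" {
  value = data.aws_vpc.shared.id
}

# Other attributes of the matched VPC are available as well.
output "vpc_cidr" {
  value = data.aws_vpc.shared.cidr_block
}
```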

Example 2: Fetching Subnet ID from a Specific VPC

Once the VPC is retrieved, you often need a subnet inside it.
This subnet might also be managed by another team, or it may vary by environment.

data "aws_subnet" "shared_subnet" {
  filter {
    name   = "tag:Name"
    values = ["subnet-a"]
  }
  vpc_id = data.aws_vpc.vpc_name.id
}

Important details:

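It fetches the subnet whose Name tag equals subnet-a
It only considers subnets in the VPC fetched earlier
It must match exactly one subnet; if none or several match, Terraform reports an error
It exposes that subnet’s ID as data.aws_subnet.shared_subnet.id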

This helps Terraform deploy EC2 or Lambda resources into the correct shared subnet without hardcoding subnet IDs.
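The singular `aws_subnet` data source only succeeds when exactly one subnet matches. When you need every matching subnet in the VPC (for example, to spread instances across availability zones), recent versions of the AWS provider offer the plural `aws_subnets` data source, which returns a list of IDs. A sketch, assuming a hypothetical `Tier = private` tagging convention:

```hcl
# Returns the IDs of all subnets in the fetched VPC that carry a Tier = private tag.
# The Tier tag is a hypothetical convention; adjust it to your own tagging scheme.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.vpc_name.id]
  }

  tags = {
    Tier = "private"
  }
}

output "private_subnet_ids" {
  value = data.aws_subnets.private.ids
}
```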

Example 3: Fetching the Latest Amazon Linux 2 AMI

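AMI IDs are region-specific, and Amazon publishes new Amazon Linux 2 images regularly. A hardcoded ID therefore fails when the code runs in another region and quietly goes stale in the original one.
With a data source, Terraform selects the most recent AMI that matches your owner and filter criteria: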

data "aws_ami" "linux2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}

This configuration ensures:

You get the most recent Amazon Linux 2 AMI whose name matches the pattern
Only images owned by Amazon are considered (amazon is an owner alias)
The AMI matches the required architecture and virtualization type

This is a good example of a real problem that data sources solve: keeping images up to date without editing code.
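Because the selected image changes as Amazon publishes new releases, it helps to surface what was actually chosen. A small sketch using outputs (the output names are illustrative); `name` and `creation_date` are attributes exported by the `aws_ami` data source:

```hcl
# Show which image the lookup resolved to on this run.
output "ami_id" {
  value = data.aws_ami.linux2.id
}

output "ami_name" {
  value = data.aws_ami.linux2.name
}

output "ami_creation_date" {
  value = data.aws_ami.linux2.creation_date
}
```

Checking these values after `terraform apply` is a quick way to confirm the filters matched the image you expected.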

Using the Data Sources to Launch an EC2 Instance

With the VPC, subnet, and AMI fetched, we can provision an EC2 instance using those dynamic values:

resource "aws_instance" "ec2-one" {
  ami           = data.aws_ami.linux2.id
  instance_type = var.instance_type
  subnet_id     = data.aws_subnet.shared_subnet.id
  tags          = var.tags
}

This resource:

Uses the AMI returned by the data source
Places the EC2 instance in the shared subnet
Takes its instance type and tags from input variables
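The resource references `var.instance_type` and `var.tags`, which must be declared somewhere in the configuration. A minimal sketch of the supporting declarations follows. The default values, the `aws_region` variable, and the provider version constraint are assumptions chosen for illustration, not part of the original setup.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0" # assumption: pin to the provider version your team uses
    }
  }
}

# Hypothetical region variable; all data source lookups run against this region.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

provider "aws" {
  region = var.aws_region
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type for ec2-one"
  default     = "t3.micro" # illustrative default
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to the instance"
  default = {
    Name = "ec2-one"
  }
}
```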

The result is a reusable configuration with no hardcoded IDs: it can run in any account and region where a VPC, subnet, and AMI matching these lookups exist.
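One trade-off of `most_recent = true`: when Amazon publishes a newer image, `data.aws_ami.linux2.id` changes, and because changing `ami` forces a new instance, the next `terraform apply` will plan to replace `ec2-one`. If you would rather keep running instances and use new images only for newly created ones, Terraform’s `lifecycle` block can ignore that attribute. This sketch is a variant of the same resource and would replace the original block rather than sit alongside it:

```hcl
resource "aws_instance" "ec2-one" {
  ami           = data.aws_ami.linux2.id
  instance_type = var.instance_type
  subnet_id     = data.aws_subnet.shared_subnet.id
  tags          = var.tags

  lifecycle {
    # Keep the existing instance when a newer AMI is published;
    # only newly created instances will pick up the updated image.
    ignore_changes = [ami]
  }
}
```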

Why Data Sources Matter

1. Avoids Hardcoding

No need to store IDs, ARNs, or AMI IDs manually.

2. Enables Multi-Team, Multi-Account Use

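Reading a resource through a data source requires only read permissions (for example, EC2 Describe* actions), so teams can reference central resources without being able to modify them.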

3. Improves Reusability

The same module can run in dev, test, and prod, provided each environment names or tags its shared resources consistently.

4. Supports Dynamic and Automated Infrastructure

Looking up the latest AMI picks up recent patches automatically; pin a specific image instead when you need strictly reproducible builds.

5. Reduces Human Error

Manual copy-paste of IDs is error-prone; data sources eliminate this.
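The reusability point (3) depends on each environment following a predictable naming scheme. A minimal sketch, assuming a hypothetical `environment` variable and Name tags such as `dev-vpc` and `dev-subnet-a` (these conventions are illustrative, not taken from the examples above):

```hcl
# Hypothetical environment selector, e.g. "dev", "test", or "prod".
variable "environment" {
  type = string
}

# Look up the VPC whose Name tag follows the <env>-vpc convention.
data "aws_vpc" "env" {
  filter {
    name   = "tag:Name"
    values = ["${var.environment}-vpc"]
  }
}

# Look up the subnet named <env>-subnet-a inside that VPC.
data "aws_subnet" "env" {
  vpc_id = data.aws_vpc.env.id

  filter {
    name   = "tag:Name"
    values = ["${var.environment}-subnet-a"]
  }
}
```

Running `terraform plan -var="environment=prod"` (or using a per-environment `.tfvars` file) would then resolve the production network without any code changes.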

Conclusion

Terraform data sources are essential for building dynamic, secure, and production-ready infrastructure. They allow your code to interact with existing resources in AWS—like VPCs, subnets, AMIs, and more—without recreating them. The examples above represent real-world scenarios where infrastructure teams rely heavily on these patterns, especially in shared network environments.

By using data sources effectively, your Terraform setup becomes more scalable, maintainable, and aligned with DevOps best practices.