When working with Terraform, one of the most powerful concepts you’ll use is data sources. These allow you to fetch and reference existing resources rather than creating new ones. In real-world cloud environments—especially in large organizations—you often work with infrastructure that already exists, such as shared VPCs, pre-defined subnets, approved AMIs, or centrally managed security groups.
Instead of hardcoding these values or manually copying IDs, you can use Terraform data sources to retrieve them dynamically, which is cleaner and far less error-prone.
This post explains what Terraform data sources are, why they matter, and how to use them with practical AWS examples, including VPC, subnet, and AMI lookups.
What Are Terraform Data Sources?
A data source in Terraform is a read-only lookup of an existing resource. Instead of creating something new, Terraform queries the cloud provider (AWS in this case) and returns information that you can use inside your configuration.
You use data sources when:
- A resource is already created (shared VPCs, existing AMIs).
- Another team manages the resource (network or security team).
- Your Terraform module should not own or recreate the resource.
- You need the latest or filtered version of something (latest AMI).
- You want to avoid hardcoding identifiers such as IDs or ARNs.
This leads to cleaner, more dynamic infrastructure code.
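As a minimal sketch of the general shape, a data source is declared with a `data` block and read back as `data.<TYPE>.<NAME>.<ATTRIBUTE>`. The example below uses the AWS provider's `aws_caller_identity` data source, which takes no arguments; the local name `current` and the output name are arbitrary choices for illustration.

```hcl
# Read-only lookup: asks AWS which account the current credentials belong to.
# Nothing is created or modified by this block.
data "aws_caller_identity" "current" {}

# Reference pattern: data.<TYPE>.<NAME>.<ATTRIBUTE>
output "account_id" {
  description = "Account ID resolved at plan time (output name is illustrative)"
  value       = data.aws_caller_identity.current.account_id
}
```

The same declare-then-reference pattern applies to every lookup in the examples that follow.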
Example 1: Fetching VPC ID Using a Data Source
In many organizations, networking is centralized. The VPC already exists, and your Terraform code will only deploy application resources inside it.
With the following data source, we fetch a VPC by matching its Name tag:
data "aws_vpc" "vpc_name" {
  filter {
    name   = "tag:Name"
    values = ["default-vpc"]
  }
}
Here’s what this does:
- Searches for a VPC where the tag Name = default-vpc
- Returns the VPC's ID
- Allows you to use the ID later using data.aws_vpc.vpc_name.id
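This avoids the need to manually capture or maintain the VPC ID. Note that the lookup must match exactly one VPC: Terraform reports an error if the filter matches none or more than one.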
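To illustrate how the looked-up VPC might be consumed, here is a minimal sketch of a security group created inside it. It assumes the `aws_vpc` data source above is in the same configuration; the resource name `app`, the group name `app-sg`, and the HTTPS rule are hypothetical and not part of the original setup.

```hcl
# Hypothetical security group placed in the VPC found by the data source.
resource "aws_security_group" "app" {
  name        = "app-sg" # illustrative name
  description = "Allow HTTPS from inside the shared VPC"
  vpc_id      = data.aws_vpc.vpc_name.id

  ingress {
    description = "HTTPS from the VPC CIDR range"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    # The data source also exposes the VPC's primary CIDR block.
    cidr_blocks = [data.aws_vpc.vpc_name.cidr_block]
  }
}
```

Besides `id`, the data source exposes other attributes of the VPC, such as `cidr_block`, so related values do not need to be copied by hand either.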
Example 2: Fetching Subnet ID from a Specific VPC
Once the VPC is retrieved, you often need a subnet inside it.
This subnet might also be managed by another team, or it may vary by environment.
data "aws_subnet" "shared_subnet" {
  filter {
    name   = "tag:Name"
    values = ["subnet-a"]
  }
  vpc_id = data.aws_vpc.vpc_name.id
}
Important details:
- It fetches a subnet with Name = subnet-a
- It ensures the subnet belongs to the VPC we fetched earlier
- It must resolve to exactly one subnet; if several subnets match, Terraform reports an error instead of picking one
This helps Terraform deploy EC2 or Lambda resources into the correct shared subnet without hardcoding anything.
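When more than one subnet is needed (for example, one per availability zone), the singular `aws_subnet` lookup does not fit. A possible alternative, sketched below, is the plural `aws_subnets` data source, which returns a list of IDs. The `Tier = "private"` tag is an assumed tagging convention, not something defined earlier in this post.

```hcl
# Returns the IDs of all matching subnets in the VPC looked up earlier.
data "aws_subnets" "in_vpc" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.vpc_name.id]
  }

  # Hypothetical tag used to narrow the result; adjust to your own convention.
  tags = {
    Tier = "private"
  }
}

output "subnet_ids" {
  description = "IDs of every subnet that matched the filter and tags"
  value       = data.aws_subnets.in_vpc.ids
}
```

The `ids` list can then be passed to resources that accept several subnets, such as load balancers or Auto Scaling groups.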
Example 3: Fetching the Latest Amazon Linux 2 AMI
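AMI IDs are region-specific, and new IDs appear every time an updated image is published. A hardcoded ID can therefore fail in another region or point to an outdated or deregistered image.
With a data source, Terraform selects the most recent AMI that matches your filters each time it plans: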
data "aws_ami" "linux2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}
This configuration ensures:
- You get the latest Amazon Linux 2 AMI
- Only official images from the Amazon account are selected
- The AMI matches the required architecture and virtualization type
This is a perfect example of where data sources solve a real problem—keeping images up to date.
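Because the selected image can change between runs, it may help to surface what the lookup actually resolved to. The following sketch adds an output for that purpose; it assumes the `aws_ami` data source above, and the output name `selected_ami` is an arbitrary choice.

```hcl
# Shows which image the lookup resolved to in `terraform plan` / `terraform output`.
output "selected_ami" {
  description = "AMI chosen by the most_recent lookup (output name is illustrative)"
  value = {
    id            = data.aws_ami.linux2.id
    name          = data.aws_ami.linux2.name
    creation_date = data.aws_ami.linux2.creation_date
  }
}
```

Checking this output before applying makes it easier to notice when a newly published image has been picked up.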
Using the Data Sources to Launch an EC2 Instance
Once the VPC, Subnet, and AMI are fetched, we can provision an EC2 instance using those dynamic values:
resource "aws_instance" "ec2-one" {
  ami           = data.aws_ami.linux2.id
  instance_type = var.instance_type
  subnet_id     = data.aws_subnet.shared_subnet.id
  tags          = var.tags
}
This resource:
- Uses the AMI from the data source
- Places the EC2 inside the shared subnet
- Applies the user-provided instance type and tags
The result is a reusable configuration that contains no environment-specific IDs and keeps working as new AMIs are published.
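The instance above reads `var.instance_type` and `var.tags`, which are not declared anywhere in this post. A minimal sketch of matching declarations could look like the following; the descriptions and default values are assumptions chosen for illustration.

```hcl
variable "instance_type" {
  description = "EC2 instance type for the example instance"
  type        = string
  default     = "t3.micro" # assumed default; pick what suits your workload
}

variable "tags" {
  description = "Tags applied to the example instance"
  type        = map(string)
  default     = {} # no tags unless the caller supplies some
}
```

One design point worth keeping in mind: changing the `ami` argument of an `aws_instance` forces replacement of the instance, so when the `most_recent` lookup resolves to a newer image, the next plan will propose recreating it. If that is not wanted, one option is to add a `lifecycle` block with `ignore_changes = [ami]` to the instance.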
Why Data Sources Matter
1. Avoids Hardcoding
No need to store IDs, ARNs, or AMI IDs manually.
2. Enables Multi-Team, Multi-Account Use
Teams can reference centrally managed resources with read-only access, without needing permission to modify them.
3. Improves Reusability
Modules stay generic: the same code can run in dev, test, and prod as long as each environment follows the same tagging or naming convention.
4. Supports Dynamic and Automated Infrastructure
Looking up the latest AMI at plan time helps keep new instances on current, patched images and consistent across deployments.
5. Reduces Human Error
Manual copy-paste of IDs is error-prone; a data source removes that step.
Conclusion
Terraform data sources are essential for building dynamic, secure, and production-ready infrastructure. They allow your code to interact with existing resources in AWS—like VPCs, subnets, AMIs, and more—without recreating them. The examples above represent real-world scenarios where infrastructure teams rely heavily on these patterns, especially in shared network environments.
By using data sources effectively, your Terraform setup becomes more scalable, maintainable, and aligned with DevOps best practices.