A data source is used to access information (data) from outside of your current Terraform configuration. This could be a list of available EC2 AMI[1] IDs in AWS, or metadata for an existing resource group in Azure. There are a few more specific use cases for data sources in Terraform, but since my lessons focus on what I consider most important for the Terraform Associate Certification, I will not go into them[2]. In fact, this will be a short lesson because in my experience you seldom use data sources.
In this lesson I continue to go through part 8 of the Certified Terraform Associate exam curriculum. This part of the curriculum is outlined below:
| Part | Content | 
|---|---|
| 8 | Read, generate, and modify configuration | 
| (a) | Demonstrate use of variables and outputs | 
| (b) | Describe secure secret injection best practice | 
| (c) | Understand the use of collection and structural types | 
| (d) | Create and differentiate resource and data configuration | 
| (e) | Use resource addressing and resource parameters to connect resources together | 
| (f) | Use HCL and Terraform functions to write configuration | 
| (g) | Describe built-in dependency management (order of execution based) | 
To be specific: I will cover parts of 8 (d) and (e) in this lesson. I will also briefly touch on the subjects in 8 (f).
Define a data source in HCL
The general format of the data block in HCL looks like this:
data "data_source_type" "local_name" {
    argument1 = <expression>
    argument2 = <expression>
    argument3 = <expression>
    ...
}
This has the same format as the resource block, with two labels.
- The first label defines the data source type, and it depends on what data sources are exposed from the providers you use.
- The second label is the local name of this data source, and you will use it to refer to this data source from other parts of your Terraform configuration.
The available arguments depend on the data source type you are using; see the documentation for your provider.
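To make the general format concrete, here is a minimal sketch of a filled-in data block, using the EC2 AMI lookup mentioned at the start of the lesson. The local names (ubuntu, example), the owner ID, the image name pattern and the instance type are my own illustrative assumptions, not part of the exam curriculum; check the AWS provider documentation before relying on them.

```hcl
# Look up the most recent AMI that matches a name pattern.
# The owner ID and name pattern below are assumed values for illustration.
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # assumed to be Canonical's AWS account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Hypothetical resource that consumes the value returned by the data source.
resource "aws_instance" "example" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}
```

Note the two labels on the data block (aws_ami and ubuntu), and that the reference to it from the resource block starts with data.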
Using data sources
I think of data sources as existing in two different varieties:
- Data sources for existing resources
- Data sources for lists of alternatives
I will show examples of both and explain what I mean.
Use a data source to access existing resources
When we start using Terraform, chances are we already have a lot of cloud resources and other types of resources deployed all over the place. These could be imported into our Terraform state[3], but that is an advanced operation and sometimes not practical. We could hard-code values for our existing resources, but what if something about them changes? Then we need to update our Terraform configurations to account for those changes. A data source can help us here.
Let us look at an example of a data source from the azurerm provider:
data "azurerm_resource_group" "my_existing_rg" {
  name = "rg-existing-group"
}
This is a simple data source of the type azurerm_resource_group. It has a single required argument, name. Using this data source we can get a handle for the resource group and reference it much as if it had been created using a resource block. The one difference is that references to a data source are prefixed with data. In a different part of my Terraform configuration I could have an expression like this:
data.azurerm_resource_group.my_existing_rg.location
This would access the location property of the existing resource group.
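As a sketch of how such a reference might be used in practice, the configuration below places a new virtual network in the existing resource group. The virtual network, its name and its address space are hypothetical, and I assume the azurerm provider is already configured elsewhere in the configuration.

```hcl
# Read the existing resource group (it is not managed by this configuration).
data "azurerm_resource_group" "my_existing_rg" {
  name = "rg-existing-group"
}

# Hypothetical new resource that is placed in the existing resource group.
resource "azurerm_virtual_network" "example" {
  name                = "vnet-example"
  address_space       = ["10.0.0.0/16"]
  location            = data.azurerm_resource_group.my_existing_rg.location
  resource_group_name = data.azurerm_resource_group.my_existing_rg.name
}
```

If the resource group is ever moved to a new location, nothing in this configuration needs to be edited, which is the advantage over hard-coded values.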
Use a data source to read lists of alternatives
I thought about what I should call this variety of data sources but could not come up with a better name than ... to read lists of alternatives. So let's just go with it!
AWS has a number of regions. When you configure the AWS provider for Terraform you select one of these regions. Each region has a number of availability zones, which are data centers where you could create your resources. How many availability zones are there? You might have learned the number by heart, but is there a better way? This is where data sources can help you!
Let's look at an example:
data "aws_availability_zones" "available" {
  state = "available"
}
This data source is of the aws_availability_zones type. In the arguments I specify that I want the availability zones that have the state of available. What I get back includes a list with the names of the available availability zones, exposed through the names attribute! I can access them like this:
data.aws_availability_zones.available.names[0]
data.aws_availability_zones.available.names[1]
data.aws_availability_zones.available.names[2]
...
To find out how many availability zones there are I could use the length function:
length( data.aws_availability_zones.available.names )
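The list and the length function can be combined to create one resource per availability zone. The following is only a sketch under my own assumptions: the VPC, the subnets, their local names and the CIDR ranges are hypothetical, and it uses the count meta-argument and the built-in cidrsubnet function, neither of which is covered in this lesson.

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

# Hypothetical VPC to hold the subnets.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# One subnet per available availability zone.
resource "aws_subnet" "per_az" {
  count = length(data.aws_availability_zones.available.names)

  vpc_id            = aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]

  # Carve a /24 out of the VPC range for each index:
  # 10.0.0.0/24, 10.0.1.0/24, and so on.
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}

# Expose the number of availability zones as an output.
output "availability_zone_count" {
  value = length(data.aws_availability_zones.available.names)
}
```

Because the number of subnets follows the data source, the same configuration should work in regions with different numbers of availability zones.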
Summary
Data sources are useful sometimes, and in this lesson I showed two of the most common use-cases for them. As I mentioned before I have found that data sources are not used a lot, especially compared to resources. I don't think I had a single question concerning specifics of data sources on the certification exam, but it is of course a good idea to be ready for them anyway!
In summary, we looked at:
- How to define a data source in HCL using the data block.
- How we can use data sources to access information from existing resources.
- How we can use data sources to obtain lists of alternatives, e.g. lists of availability zones in an AWS region, or lists of available subscriptions in Azure.
[1] EC2 stands for Elastic Compute Cloud, the virtual machine service of the AWS cloud. An AMI is an Amazon Machine Image, an image that you create virtual machines from.
[2] Read the official documentation for data sources at https://developer.hashicorp.com/terraform/language/data-sources.
[3] More about Terraform state in a future lesson.
 

 
    