Panchanan Panigrahi
How To Use Terraform Data Source

Terraform, developed by HashiCorp, is a powerful Infrastructure as Code (IaC) tool that allows users to define, provision, and manage cloud infrastructure across multiple providers such as AWS, Azure, and Google Cloud. By using declarative configuration files, Terraform enables the automation of infrastructure, making it easily version-controlled and shared across teams. This ensures consistency, scalability, and reliability across different environments.

A key concept in Terraform is data sources, which play a vital role in enhancing the accuracy and flexibility of your infrastructure management. In this post, we'll explore what data sources are, how they work, and why they are important for building adaptive, maintainable, and efficient infrastructure.

What is a Data Source in Terraform?

A data source in Terraform is a way to query and retrieve information about existing resources that have already been created, either by Terraform itself or by other means, such as cloud providers. Instead of creating new resources, data sources allow you to reference and incorporate external data into your Terraform configuration.

Data sources provide a powerful mechanism to:

  • Integrate existing infrastructure: Leverage resources managed outside Terraform (e.g., manually provisioned databases or networks) without recreating them.
  • Enhance modularity: Data sources enable Terraform modules to dynamically adapt to different environments by fetching relevant information, such as an existing Virtual Private Cloud (VPC) ID or security group rules.
  • Increase efficiency: Rather than hardcoding values like resource IDs or manually looking up resource properties, data sources allow Terraform to dynamically retrieve and use the most up-to-date data.

How Data Sources Work

In Terraform, a data source is defined using the data block. When executed, Terraform queries the specified provider for the requested data, retrieves it, and makes it available within the configuration.

Syntax of data block:

data "<PROVIDER>_<RESOURCE_TYPE>" "<NAME>" {
  # Configuration arguments
}

Syntax for referencing an attribute of this data source:

data.<provider>_<resource_type>.<name>.<attribute>
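As a concrete illustration, here is a minimal sketch of this reference pattern using the aws_region data source (this assumes the AWS provider is already configured; the block label "current" is our own choice of name):

data "aws_region" "current" {}

output "current_region" {
  # Reference pattern: data.<provider>_<resource_type>.<name>.<attribute>
  value = data.aws_region.current.name
}

Running terraform apply with this snippet would print the region the provider is configured for, without creating any infrastructure.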

This may look overwhelming at first, so let's understand data sources through a hands-on example project.

What we will cover in this project:

In this project, we’ll explore how to leverage Terraform Data Sources to retrieve and use existing AWS resources, specifically the default VPC and its associated subnets. We will create an Ubuntu EC2 instance, install Nginx, and associate the instance with the default VPC and a specific subnet using Terraform’s data sources. By doing this, you’ll gain a clear understanding of how data sources work and how they can make your Terraform configurations more dynamic and flexible.

AWS provides a default VPC and subnets in each region, which are ready to use. These resources are often sufficient for basic projects. If you don’t already have a default VPC in your AWS environment, follow these steps to create one:

  1. Navigate to the VPC section of the AWS Console.
  2. On the left-hand menu, click Your VPCs.
  3. Click on the Actions dropdown menu.
  4. Select Create Default VPC.

This will automatically create the default VPC and associated subnets, which you can then use in your Terraform configuration.
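Alternatively, if you prefer the command line, the AWS CLI can do the same thing in one step (this assumes the AWS CLI is installed and configured with credentials for the target region):

aws ec2 create-default-vpc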

If you get stuck at any point, you can refer to the code examples and configurations in my GitHub repo for this blog: Terraform_Data_source.

Now create a directory and, inside it, a file called main.tf. Add the following code snippets to this main.tf file.

Create Terraform configuration files:

In the first step, we have to tell Terraform that we will be deploying infrastructure on AWS. We do this by configuring the AWS provider plugin.

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.56"
    }
  }
}

This configuration tells Terraform to use the AWS provider and requires Terraform version 1.0 or higher. The ~> 5.56 constraint pins the provider to the 5.x series (any release at or above 5.56 but below 6.0), which maintains stability and prevents unexpected major-version updates.

Configure Terraform AWS provider block:

The next step is to configure the AWS provider block, which accepts various configuration parameters. We will start by specifying the region to deploy the infrastructure in: us-east-1.

provider "aws" {
  region = "us-east-1"
}

Create Data Block for Default VPC:

data "aws_vpc" "default" {
  default = true
}

By setting default = true, Terraform automatically fetches the default VPC in the current region, making it easier to launch resources without manually specifying VPC details. This ensures flexibility and reduces the chance of errors when managing infrastructure across different environments.

Create Data Block for Default Subnet:

The following Terraform code retrieves details of an existing subnet in AWS:

data "aws_subnet" "default" {
  vpc_id = data.aws_vpc.default.id

  filter {
    name   = "availability-zone"
    values = ["us-east-1a"]
  }
}

Explanation:

  1. Data Source: The code uses the aws_subnet data source to fetch details about an existing subnet in AWS.
  2. VPC Association: The vpc_id attribute ties the lookup to a specific VPC using data.aws_vpc.default.id. This ensures that the retrieved subnet is part of the correct VPC, maintaining the necessary network isolation and connectivity for your resources.
  3. Availability Zone Filter: The filter block restricts the lookup to the subnet in the us-east-1a availability zone, controlling where the resource will be placed.
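Note that the singular aws_subnet data source must match exactly one subnet, or Terraform returns an error. If you instead want all subnets of the default VPC, the plural aws_subnets data source returns a list of IDs; a minimal sketch:

data "aws_subnets" "default" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.default.id]
  }
}

# data.aws_subnets.default.ids is a list of subnet IDs that you
# can index into or iterate over with for_each.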

Create a security group resource inside Default VPC:

resource "aws_security_group" "allow_ssh_http_https" {
  vpc_id = data.aws_vpc.default.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "allow-ssh-http"
  }
}

Explanation:

  1. Security Group Creation: The code defines an AWS security group resource named allow_ssh_http_https, linked to the default VPC via vpc_id = data.aws_vpc.default.id, ensuring it operates within the intended network.

  2. Ingress Rules: It includes two ingress rules: one allowing SSH on port 22 and another allowing HTTP on port 80, both open to all IP addresses (0.0.0.0/0). This is convenient for a demo, but in production you would typically restrict SSH access to known IP ranges.

  3. Egress Rule: The egress rule allows all outbound traffic, with the protocol set to -1 and 0.0.0.0/0 as the CIDR block, permitting resources to communicate freely with external networks.

Create and store SSH key pair using terraform:

To enable secure access to our AWS resources, we'll generate an SSH key pair. This key pair will be used for accessing instances securely:

resource "tls_private_key" "ssh_key" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

resource "local_file" "private_key" {
  content  = tls_private_key.ssh_key.private_key_pem
  filename = "./.ssh/terraform_rsa"
}

resource "local_file" "public_key" {
  content  = tls_private_key.ssh_key.public_key_openssh
  filename = "./.ssh/terraform_rsa.pub"
}

This configuration generates a 4096-bit RSA key pair. The private and public keys are then saved to files in a local .ssh directory, ready for use when connecting to our AWS instances. Note that tls_private_key and local_file come from the hashicorp/tls and hashicorp/local providers, which terraform init downloads automatically based on the resource types used.
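One caveat: local_file creates files with permissive modes by default, and SSH clients refuse private keys that other users can read. The local_file resource supports a file_permission argument, so a safer sketch of the private-key resource would be:

resource "local_file" "private_key" {
  content         = tls_private_key.ssh_key.private_key_pem
  filename        = "./.ssh/terraform_rsa"
  file_permission = "0600" # restrict the private key to the owner
}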

Creating AWS key pair using our SSH public key:

Next, we'll create an AWS key pair using the public SSH key we generated:

resource "aws_key_pair" "deployer" {
  key_name   = "ubuntu_ssh_key"
  public_key = tls_private_key.ssh_key.public_key_openssh
}

This resource uploads the public key to AWS, allowing you to securely access your EC2 instances using the corresponding private key.
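Once the instance is running, you should be able to connect with the generated private key. The default login user on official Ubuntu AMIs is ubuntu; replace <public-ip> with the instance's public IP address:

ssh -i .ssh/terraform_rsa ubuntu@<public-ip>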

Create an EC2 Instance (Ubuntu) and Install Nginx inside the Default VPC:

To provision an EC2 instance with the required configuration, we can define the following resource in Terraform:

resource "aws_instance" "ubuntu_instance" {
  ami                         = "ami-0a0e5d9c7acc336f1"
  instance_type               = "t2.micro"
  subnet_id                   = data.aws_subnet.default.id
  vpc_security_group_ids      = [aws_security_group.allow_ssh_http_https.id]
  key_name                    = aws_key_pair.deployer.key_name
  associate_public_ip_address = true

  user_data = <<-EOF
              #!/bin/bash
              sudo apt update -y
              sudo apt install -y nginx
              echo "<h1>Hello From Ubuntu EC2 Instance!!!</h1>" | sudo tee /var/www/html/index.html
              sudo systemctl restart nginx
              EOF

  tags = {
    Name = "ubuntu-instance"
  }
}

Explanation:

  1. Instance Configuration:

    • AMI and Type: The instance uses a hardcoded Ubuntu Amazon Machine Image (AMI) ID (ami-0a0e5d9c7acc336f1) and the t2.micro instance type, making it suitable for low-cost, lightweight workloads. Note that AMI IDs are region-specific, so this ID is only valid in us-east-1.
  2. Networking Setup:

    • Subnet and Security Group: The instance is launched in a subnet identified by data.aws_subnet.default.id, and it’s associated with the previously defined security group (aws_security_group.allow_ssh_http_https.id), ensuring proper network access.
  3. Key Pair and Public IP:

    • SSH Access: It uses a specified key pair (aws_key_pair.deployer.key_name) for SSH access and is configured to have a public IP address, allowing external connections.
  4. User Data Script:

    • Initial Setup: The user_data block contains a Bash script that runs on instance startup. It updates the package manager, installs NGINX, creates a simple HTML file in the web server directory, and restarts NGINX to display a greeting message.
  5. Resource Tagging:

    • Instance Naming: The instance is tagged with the name ubuntu-instance, which helps in identifying the resource in the AWS management console.
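A common refinement, very much in the spirit of this post, is to replace the hardcoded AMI ID with an aws_ami data source so Terraform looks up the latest matching Ubuntu image at plan time. A sketch (the owner ID 099720109477 is Canonical's official AWS account; the name pattern targets Ubuntu 22.04):

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

You could then set ami = data.aws_ami.ubuntu.id in the aws_instance resource instead of the hardcoded ID, and the configuration would work in any region.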

Create Output Variables:

The following Terraform code defines outputs to display important information after provisioning resources:

# Output the Public IPs
output "ubuntu_instance_public_ip" {
  value = aws_instance.ubuntu_instance.public_ip
}

# Output VPC CIDR Block
output "vpc_cidr_block" {
  value       = data.aws_vpc.default.cidr_block
  description = "The CIDR block of the default VPC"
}

# Output Subnet CIDR Block
output "subnet_cidr_block" {
  value       = data.aws_subnet.default.cidr_block
  description = "The CIDR block of the default subnet"
}

Key Points:

  1. Public IP of the EC2 Instance:

    • The first output block, ubuntu_instance_public_ip, captures the public IP address of the newly created EC2 instance. This information is crucial for accessing the instance over the internet.
  2. VPC CIDR Block:

    • The second output, vpc_cidr_block, retrieves and displays the CIDR block of the default VPC. This block helps in understanding the IP address range used by the VPC, facilitating network management.
  3. Subnet CIDR Block:

    • The third output, subnet_cidr_block, provides the CIDR block of the default subnet. Knowing this information is important for managing subnets and ensuring that resources are correctly addressed within the network.

Initialize your Terraform Configuration:

Once you have your main.tf, the next step is to initialize your Terraform environment. Run the following command in your project directory:

terraform init


This command initializes the working directory containing your Terraform configuration files. It downloads the necessary provider plugins, sets up the backend, and prepares the environment for future Terraform operations.

Plan the Infrastructure Changes:

To preview the actions Terraform will take without actually applying any changes, run:

terraform plan


This command generates an execution plan, which shows you what resources will be created, modified, or destroyed. It helps you ensure that everything is configured correctly before applying any changes.

Apply the Configuration:

Once you have verified the planned changes, deploy them to AWS with the terraform apply command.

terraform apply

Confirm the changes by typing “yes”.


Awesome! You just created your Ubuntu EC2 instance via Terraform.
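If you need the outputs again later (for example, the public IP for SSH or your browser), you can re-print them at any time from the same directory:

terraform output ubuntu_instance_public_ip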

Check through AWS UI:

Navigate to the AWS Management Console to verify your instance and other resources. You can now view the public IP address and other details directly in the console.


Access Nginx on Port 80 in Your Browser:

Open your web browser and enter http://<public-ip> in the address bar, replacing <public-ip> with the EC2 instance's public IP address from the ubuntu_instance_public_ip output. Since the security group opens port 80 and Nginx listens on port 80 by default, no port number is needed in the URL. You should see your "Hello From Ubuntu EC2 Instance!!!" message.


Destroy the Infrastructure:

If you want to tear down the infrastructure you created, use the terraform destroy command:

terraform destroy

Confirm the changes by typing “yes”.


This deletes all the resources defined in your configuration, ensuring a clean removal of everything Terraform created.


Conclusion

In this project, we used Terraform to provision an Ubuntu EC2 instance in AWS’s default VPC, demonstrating the value of data sources. By retrieving existing resources, we created a flexible infrastructure setup. We configured the instance, installed Nginx, and learned to output important resource information. This exercise underscores Terraform's efficiency in automating cloud infrastructure management.

Stay tuned for our upcoming blogs where we’ll dive deeper into advanced Terraform concepts, including modules, remote backends, state management, and more. We’ll explore how these features can further enhance your infrastructure management and automation practices.
