Hi, I’m the founder of Anyshift. We’ve built a way to track Terraform downstream dependencies across monorepos and multi-repo setups without adding another framework. Instead, we rely on the rich data already present in Terraform State Files and model it as a graph. Here’s how we did it.
1. The Challenges We Wanted to Solve
Managing Terraform dependencies often involves three key challenges:
- Hardcoded values: These make configurations brittle and difficult to scale.
- Remote state dependencies: Keeping track of changes across multiple states can be error-prone.
- Intricate modules: Handling public and private modules across repositories adds complexity.
We realized that the data needed to address these issues was already available in Terraform State Files, enabling us to manage dependencies without introducing additional frameworks.
2. Key Principles Behind Our Approach
Our approach was guided by three core principles:
2.1 Infrastructure is a Graph
Infrastructure is inherently interconnected. Using Neo4j, a graph database, we modeled the relationships between resources, modules, and states.
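To make the modeling concrete, here is a minimal sketch of how one entry from a state file's `resources` list could be written into Neo4j. The labels (`Resource`, `Module`, `State`), the relationship types (`IN_MODULE`, `IN_STATE`) and the helper `merge_params` are hypothetical choices for illustration, not Anyshift's actual schema. The function only builds the parameterized Cypher; with the official `neo4j` Python driver it could be executed as `session.run(MERGE_RESOURCE, **params)`.

```python
# Hypothetical schema: (:Resource)-[:IN_MODULE]->(:Module)-[:IN_STATE]->(:State)
MERGE_RESOURCE = """
MERGE (s:State {key: $state_key})
MERGE (m:Module {address: $module, state_key: $state_key})
MERGE (m)-[:IN_STATE]->(s)
MERGE (r:Resource {address: $address, state_key: $state_key})
SET r.type = $type
MERGE (r)-[:IN_MODULE]->(m)
"""


def merge_params(state_key: str, resource: dict) -> dict:
    """Build query parameters for one entry of a state file's `resources` list."""
    module = resource.get("module", "root")  # root-module resources have no "module" key
    local = f"{resource['type']}.{resource['name']}"
    if resource.get("mode") == "data":
        local = f"data.{local}"
    address = local if module == "root" else f"{module}.{local}"
    return {
        "state_key": state_key,
        "module": module,
        "address": address,
        "type": resource["type"],
    }


params = merge_params(
    "network/prod.tfstate",  # hypothetical state location
    {"mode": "managed", "type": "aws_vpc", "name": "this", "module": "module.vpc"},
)
print(params["address"])  # module.vpc.aws_vpc.this
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so the same state file can be re-ingested on every run without duplicating nodes.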
2.2 All Data Is in the Cloud and Code
By parsing Terraform code and state files, we could surface the complete chain of dependencies without adding unnecessary overhead.
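On the code side, dependencies show up as references between blocks (for example `aws_vpc.main.id` inside a subnet). A real implementation should use a proper HCL parser; the sketch below deliberately uses a crude regular expression, only to show which edges can be read straight from the code. It looks at `resource` blocks only, can produce false positives on dotted string literals, and the `code_edges` name is hypothetical.

```python
import re

# Start of a `resource "TYPE" "NAME"` block (must begin at column 0 in this sketch).
HEADER = re.compile(r'^resource\s+"([\w-]+)"\s+"([\w-]+)"', re.MULTILINE)
# A reference like aws_vpc.main.id, data.aws_ami.ubuntu.id or module.network.subnet_id;
# the captured group is the referenced object without the trailing attribute.
REF = re.compile(r'\b((?:data\.)?[a-z][a-z0-9_]*\.[A-Za-z_][\w-]*)\.[A-Za-z_]\w*')
# Prefixes that are not resources, data sources or modules.
IGNORED = {"var", "local", "each", "count", "path", "self", "terraform"}

SAMPLE_HCL = '''
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, 1)
  tags       = var.tags
}
'''


def code_edges(hcl: str) -> set[tuple[str, str]]:
    """Return (referencing resource, referenced object) pairs found in the code."""
    edges = set()
    headers = list(HEADER.finditer(hcl))
    for i, header in enumerate(headers):
        source = f"{header.group(1)}.{header.group(2)}"
        # The block body runs until the next resource header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(hcl)
        for ref in REF.findall(hcl[header.end():end]):
            if ref.split(".")[0] not in IGNORED:
                edges.add((source, ref))
    return edges


print(code_edges(SAMPLE_HCL))  # {('aws_subnet.app', 'aws_vpc.main')}
```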
2.3 Build a Digital Twin of Your Infrastructure
We created a "digital twin" that combines information from Terraform code, state files, and the cloud. This unified view allowed us to catch and prevent issues early.
3. Our Framework-Free Solution
Our solution works through two key steps:
- Create a digital twin of the infrastructure: A graph captures the relationships between IaC code and deployed cloud resources.
- Query the graph for every PR: Using Cypher, Neo4j’s query language, we retrieve downstream dependencies and surface the impact of changes directly in the PR.
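A minimal sketch of such a per-PR query, assuming dependency edges are stored as a hypothetical `DEPENDS_ON` relationship pointing from a dependent resource to the resource it uses. The variable-length pattern `*1..` follows those edges transitively. The `downstream` helper accepts any object with a `run(query, **params)` method that returns records supporting `record["key"]`; a session from the official driver (`GraphDatabase.driver(uri, auth=...).session()`) behaves this way, which lets the function be exercised with the stand-in `FakeSession` below.

```python
# Hypothetical schema: (dependent:Resource)-[:DEPENDS_ON]->(dependency:Resource)
DOWNSTREAM_QUERY = """
MATCH (changed:Resource {address: $address})<-[:DEPENDS_ON*1..]-(dep:Resource)
RETURN DISTINCT dep.address AS address, dep.type AS type
ORDER BY address
"""


def downstream(session, address: str) -> list[tuple[str, str]]:
    """Return (address, type) for every resource that transitively depends on `address`."""
    result = session.run(DOWNSTREAM_QUERY, address=address)
    return [(record["address"], record["type"]) for record in result]


class FakeSession:
    """Test stand-in for a Neo4j session: records calls and returns canned rows."""

    def __init__(self, rows: list[dict]):
        self.rows = rows
        self.calls = []

    def run(self, query: str, **params):
        self.calls.append((query, params))
        return iter(self.rows)


fake = FakeSession([
    {"address": "aws_instance.ecs_a", "type": "aws_instance"},
    {"address": "aws_security_group.ecs", "type": "aws_security_group"},
])
print(downstream(fake, "aws_vpc.main"))
```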
3.1 Leveraging Terraform State Files
Terraform state files contain more than just the representation of deployed infrastructure. They include rich metadata such as:
- Resource types
- Unique identifiers
- Relationships between modules and their resources
By parsing these state files, we extracted key insights across repositories and environments. State files act as a bridge between code-defined intentions and cloud-deployed realities.
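As an illustration of that metadata, the sketch below walks state files in Terraform's JSON layout (format version 4, `resources[].instances[].attributes`) and builds an index from each cloud ID to the state, address and type that own it. An index like this is what turns an ID referenced from another repository into a traceable dependency. The state keys and sample data are made up, and `index_states` is a hypothetical helper.

```python
import json

# Two hypothetical, trimmed state files (format version 4) from different repositories.
NETWORK_STATE = json.loads("""
{"version": 4, "resources": [
  {"mode": "managed", "type": "aws_vpc", "name": "this", "module": "module.vpc",
   "instances": [{"attributes": {"id": "vpc-0a1b2c", "cidr_block": "10.0.0.0/16"}}]}
]}
""")
APP_STATE = json.loads("""
{"version": 4, "resources": [
  {"mode": "managed", "type": "aws_security_group", "name": "app",
   "instances": [{"attributes": {"id": "sg-123", "vpc_id": "vpc-0a1b2c"}}]},
  {"mode": "data", "type": "aws_vpc", "name": "shared",
   "instances": [{"attributes": {"id": "vpc-0a1b2c"}}]}
]}
""")


def address(res: dict) -> str:
    """Terraform-style address; count/for_each index keys are ignored in this sketch."""
    local = f"{res['type']}.{res['name']}"
    return f"{res['module']}.{local}" if "module" in res else local


def index_states(states: dict[str, dict]) -> dict[str, tuple[str, str, str]]:
    """Map each cloud ID to (state key, resource address, resource type) of its owner."""
    index = {}
    for key, state in states.items():
        for res in state.get("resources", []):
            if res.get("mode") != "managed":
                continue  # data sources read infrastructure; they don't own it
            for inst in res.get("instances", []):
                cloud_id = inst.get("attributes", {}).get("id")
                if cloud_id:
                    index[cloud_id] = (key, address(res), res["type"])
    return index


index = index_states({"network": NETWORK_STATE, "app": APP_STATE})

# The app's security group points at a VPC ID owned by another state; the index resolves it.
sg_vpc = APP_STATE["resources"][0]["instances"][0]["attributes"]["vpc_id"]
print(index[sg_vpc])  # ('network', 'module.vpc.aws_vpc.this', 'aws_vpc')
```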
3.2 Building the Cloud-to-Code Graph with Neo4j
We structured our graph as:
- Nodes: Representing infrastructure resources like EC2 instances, VPCs, or Security Groups.
- Relationships: Capturing interactions like "CONNECTED_TO" or "IN_REGION."
For example, an EC2 instance (node) may be connected to a Security Group (another node). This graph provided a precise and actionable representation of infrastructure.
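Relationships like the EC2-to-Security-Group example can be derived from attribute values stored in state. A minimal sketch, assuming resources have already been flattened into dicts with `address`, `type` and `attributes` (a hypothetical shape): an `aws_instance` whose `vpc_security_group_ids` contains a security group's `id` gets a `CONNECTED_TO` edge to it. The `IN_VPC` rule and the `infer_edges` helper are illustrative; full coverage would need one rule per attribute that holds a reference.

```python
# Resources flattened from state (hypothetical shape and values).
RESOURCES = [
    {"address": "aws_instance.web", "type": "aws_instance",
     "attributes": {"id": "i-0abc", "vpc_security_group_ids": ["sg-123"]}},
    {"address": "aws_security_group.web", "type": "aws_security_group",
     "attributes": {"id": "sg-123", "vpc_id": "vpc-0a1b2c"}},
    {"address": "aws_vpc.main", "type": "aws_vpc",
     "attributes": {"id": "vpc-0a1b2c"}},
]

# (resource type, attribute holding referenced IDs, relationship): illustrative rules only.
RULES = [
    ("aws_instance", "vpc_security_group_ids", "CONNECTED_TO"),
    ("aws_security_group", "vpc_id", "IN_VPC"),
]


def infer_edges(resources: list[dict]) -> list[tuple[str, str, str]]:
    """Return (source address, relationship, target address) triples."""
    by_id = {r["attributes"]["id"]: r["address"] for r in resources if "id" in r["attributes"]}
    edges = []
    for r in resources:
        for src_type, attr, rel in RULES:
            if r["type"] != src_type or attr not in r["attributes"]:
                continue
            value = r["attributes"][attr]
            target_ids = value if isinstance(value, list) else [value]
            for target_id in target_ids:
                if target_id in by_id:  # IDs owned by no known resource are skipped
                    edges.append((r["address"], rel, by_id[target_id]))
    return edges


for edge in infer_edges(RESOURCES):
    print(edge)
```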
3.3 Reconciling Data Across Sources
To ensure accuracy, we reconciled:
- Terraform code: Capturing intended resource configurations.
- Terraform state files: Reflecting deployed resources.
- Cloud environments: Verifying the current state of resources.
We labeled nodes to differentiate between these sources (e.g., TF_CODE, TF_STATE) to provide a clear view of intent vs. reality.
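One way to express that reconciliation, as a sketch: attach one label per source in which a resource appears, then flag the label combinations that signal a gap between intent and reality. `TF_CODE` and `TF_STATE` are the labels mentioned above; the `CLOUD` label, the gap descriptions and both helpers are hypothetical. For simplicity every source is keyed by Terraform address here; in practice cloud resources would first have to be matched to addresses through their IDs.

```python
# Hypothetical descriptions for label combinations that indicate a mismatch.
GAPS = {
    ("TF_CODE",): "declared in code but not applied yet",
    ("TF_STATE",): "tracked in state but missing from code and cloud",
    ("TF_CODE", "TF_STATE"): "in state but not found in the cloud",
    ("TF_STATE", "CLOUD"): "deployed, but removed from code",
    ("CLOUD",): "unmanaged: exists only in the cloud",
}


def reconcile(code: set[str], state: set[str], cloud: set[str]) -> dict[str, list[str]]:
    """Attach a source label for every source in which each resource key appears."""
    sources = (("TF_CODE", code), ("TF_STATE", state), ("CLOUD", cloud))
    return {key: [label for label, keys in sources if key in keys]
            for key in sorted(code | state | cloud)}


def find_gaps(labels: dict[str, list[str]]) -> dict[str, str]:
    """Keep only resources whose label combination signals a gap."""
    return {key: GAPS[tuple(found)] for key, found in labels.items() if tuple(found) in GAPS}


code = {"aws_vpc.main", "aws_subnet.new"}
state = {"aws_vpc.main", "aws_subnet.old"}
cloud = {"aws_vpc.main", "aws_subnet.old", "aws_s3_bucket.manual"}

labels = reconcile(code, state, cloud)
for key, problem in find_gaps(labels).items():
    print(f"{key}: {problem}")
```

A resource carrying all three labels is in sync and produces no finding.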
4. Querying the Graph to Manage Dependencies
We used the graph to proactively manage downstream dependencies during pull requests.
4.1 Step 1: Make a Change
For example, a developer opens a PR that expands the CIDR range of a VPC.
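To know which resources a PR touches, one option (an assumption here, not necessarily how Anyshift does it) is to run `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. That output lists `resource_changes`, each with an `address` and a `change` holding `actions`, `before` and `after`. The sketch keeps every change that is not a no-op or a read and reports which top-level attributes differ; the plan excerpt and the `changed_resources` helper are hypothetical.

```python
import json

# Trimmed, hypothetical output of `terraform show -json tfplan`.
# Changing cidr_block on aws_vpc forces a replacement, hence ["delete", "create"].
PLAN = json.loads("""
{"resource_changes": [
  {"address": "aws_vpc.main", "type": "aws_vpc",
   "change": {"actions": ["delete", "create"],
              "before": {"cidr_block": "10.0.0.0/16", "enable_dns_support": true},
              "after":  {"cidr_block": "10.0.0.0/15", "enable_dns_support": true}}},
  {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
   "change": {"actions": ["no-op"], "before": {"bucket": "logs"}, "after": {"bucket": "logs"}}}
]}
""")


def changed_resources(plan: dict) -> dict[str, list[str]]:
    """Map each resource the plan modifies to the top-level attributes whose values differ."""
    changes = {}
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if change["actions"] in (["no-op"], ["read"]):
            continue
        # before is null for creates and after is null for deletes.
        # Values only known after apply live in after_unknown and are ignored here.
        before = change.get("before") or {}
        after = change.get("after") or {}
        changes[rc["address"]] = sorted(
            k for k in before.keys() | after.keys() if before.get(k) != after.get(k)
        )
    return changes


print(changed_resources(PLAN))  # {'aws_vpc.main': ['cidr_block']}
```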
4.2 Step 2: Query the Graph
Use Cypher to identify downstream resources affected by the change.
Example: Expanding the CIDR range may impact 2 ECS instances and 1 security group.
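The same downstream lookup can be sketched without Neo4j to show what the traversal does. Terraform records, for each resource instance in the state file, a `dependencies` list of the addresses it depends on; inverting those lists and walking breadth-first from the changed resource yields everything downstream. The result is summarized by resource type, the kind of answer quoted above. The sample state and the `downstream_of` helper are hypothetical, and only root-module addresses are handled.

```python
from collections import Counter, deque

# Trimmed, hypothetical state: each instance lists the addresses it depends on.
STATE = {"resources": [
    {"mode": "managed", "type": "aws_vpc", "name": "main",
     "instances": [{"dependencies": []}]},
    {"mode": "managed", "type": "aws_security_group", "name": "ecs",
     "instances": [{"dependencies": ["aws_vpc.main"]}]},
    {"mode": "managed", "type": "aws_instance", "name": "ecs_a",
     "instances": [{"dependencies": ["aws_security_group.ecs"]}]},
    {"mode": "managed", "type": "aws_instance", "name": "ecs_b",
     "instances": [{"dependencies": ["aws_security_group.ecs", "aws_vpc.main"]}]},
    {"mode": "managed", "type": "aws_s3_bucket", "name": "logs",
     "instances": [{"dependencies": []}]},
]}


def reverse_edges(state: dict) -> tuple[dict[str, set[str]], dict[str, str]]:
    """Return dependency -> dependents edges, plus address -> type (root module only)."""
    dependents, types = {}, {}
    for res in state["resources"]:
        addr = f"{res['type']}.{res['name']}"
        types[addr] = res["type"]
        for inst in res.get("instances", []):
            for dep in inst.get("dependencies", []):
                dependents.setdefault(dep, set()).add(addr)
    return dependents, types


def downstream_of(state: dict, changed: str) -> Counter:
    """Breadth-first walk over dependents; count affected resources by type."""
    dependents, types = reverse_edges(state)
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in dependents.get(queue.popleft(), ()):
            if nxt not in seen and nxt != changed:
                seen.add(nxt)
                queue.append(nxt)
    return Counter(types.get(a, "unknown") for a in seen)


print(downstream_of(STATE, "aws_vpc.main"))  # 2 aws_instance, 1 aws_security_group
```

The `seen` set guarantees each resource is counted once, even when it is reachable through several paths (like `ecs_b` above).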
4.3 Step 3: Surface Results in the PR
Display the affected resources directly in the pull request, enabling proactive resolution of conflicts before merging.
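How the results are displayed depends on the CI integration. As a hypothetical example, the affected resources could be rendered as a Markdown comment; posting it would then go through the Git host's API (on GitHub, pull request comments are created through the issue comments REST endpoint), which is not shown here. `render_pr_comment` and its input shape (affected addresses grouped by resource type) are assumptions.

```python
def render_pr_comment(changed: str, affected: dict[str, list[str]]) -> str:
    """Render a Markdown summary of downstream resources, grouped and sorted by type."""
    total = sum(len(addrs) for addrs in affected.values())
    if total == 0:
        return f"No downstream resources depend on `{changed}`."
    lines = [f"### Downstream impact of `{changed}`", "",
             f"{total} resource(s) may be affected:", ""]
    for rtype in sorted(affected):
        for addr in sorted(affected[rtype]):
            lines.append(f"- `{addr}` ({rtype})")
    return "\n".join(lines)


print(render_pr_comment("aws_vpc.main", {
    "aws_instance": ["aws_instance.ecs_b", "aws_instance.ecs_a"],
    "aws_security_group": ["aws_security_group.ecs"],
}))
```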
5. Current Limitations
While the solution has been effective, there are a few limitations:
- Cypher Query Flexibility: The power of the solution depends on how well we define the queries. We’re continually refining them to handle more use cases.
- Terraform-Specific: Currently, the solution works only with Terraform. However, the concept could potentially be extended to other IaC frameworks like Pulumi.
Conclusion
By leveraging Terraform State Files and Neo4j, we’ve created a robust, framework-free solution for managing Terraform downstream dependencies. This approach bridges the gap between code-defined intentions and cloud-deployed realities, enabling teams to catch and resolve issues earlier in the development process.