We thought the "Multi-Repo Verse" would save us from spaghetti code. Instead, we traded a tangled web of single files for a labyrinth of repositories, modules, and remote state. The boundaries were supposed to give us clarity, but suddenly, understanding a single depends_on rule became an archaeological dig through half a dozen browser tabs. The code was perfectly structured, but the reality was invisible.
A quick recap
In Part 1 of the "Making Invisible Infrastructure Visible" series, I talked about "The Fog of Code": that overwhelming anxiety of flying blind when making changes to modern, distributed infrastructure. Tab Fatigue is real. The fear of the Butterfly Effect is real.
To stop flying blind, I needed a map. Not just a text search across repositories, but a true semantic map of the territory. I needed to turn static syntax into a dynamic, queryable reality.
Here is exactly how I built the foundation of Infra-Graph to do just that.
The harsh reality of parsing HCL
When I first sat down to solve this, the plan felt straightforward: parse the .tf files, find the references, and draw the lines.
Then I actually looked at the Abstract Syntax Tree (AST).
You open a file and see a seemingly simple string: depends_on = [aws_security_group.web.id]. But to a machine, that’s just text. To build a visualization tool, you have to map every variable, module output, and resource attribute into a rigid, structured schema.
I started using python-hcl2 to turn .tf files into Python dictionaries. For simple files, it worked beautifully. But real-world infrastructure isn't simple: with nested module calls, interpolations, and dynamic blocks, naive regex extraction fails spectacularly, and even the parsed AST is a deeply nested, unforgiving nightmare of dictionaries and lists.
To give you an idea of what I was looking at, a simple 6-line resource block like this:
resource "aws_lb" "main" {
  name               = "${var.project_name}-alb-${var.environment}"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = [aws_subnet.public.id]
}
...becomes this deeply nested list-of-dictionaries format in python-hcl2:
{
  "resource": [
    {
      "aws_lb": {
        "main": {
          "name": "${var.project_name}-alb-${var.environment}",
          "internal": false,
          "load_balancer_type": "application",
          "security_groups": [
            "${aws_security_group.alb.id}"
          ],
          "subnets": [
            "${aws_subnet.public.id}"
          ]
        }
      }
    }
  ]
}
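To extract dependencies, the parser has to walk that nested structure and pull out every ${...} interpolation. A minimal sketch of the idea (the function name and regex here are illustrative, not the actual Infra-Graph code):

```python
import re

# Matches references such as aws_security_group.alb.id inside the
# ${...} interpolation strings that python-hcl2 preserves.
INTERPOLATION = re.compile(r"\$\{([^}]+)\}")

def find_references(value, refs=None):
    """Recursively collect interpolated references from the nested
    list-of-dictionaries structure that python-hcl2 emits."""
    if refs is None:
        refs = []
    if isinstance(value, str):
        refs.extend(INTERPOLATION.findall(value))
    elif isinstance(value, dict):
        for child in value.values():
            find_references(child, refs)
    elif isinstance(value, list):
        for item in value:
            find_references(item, refs)
    return refs

parsed = {
    "resource": [{
        "aws_lb": {"main": {
            "name": "${var.project_name}-alb-${var.environment}",
            "security_groups": ["${aws_security_group.alb.id}"],
        }}
    }]
}
print(find_references(parsed))
# → ['var.project_name', 'var.environment', 'aws_security_group.alb.id']
```

Each reference string then still has to be classified (variable, resource attribute, module output) before it can become an edge in the graph.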
The realization hit hard: translating disjointed, human-written infrastructure code into a rigid database schema wasn't going to be a quick hack. It was going to be technical trench warfare.
Designing the Schema
To map the territory, I needed a database that understood relationships natively. Relational databases, with their rigid tables and expensive JOINs, were the wrong tool for a system defined by its interconnectedness. I chose Neo4j because it is built from the ground up for graph traversal, allowing me to naturally model and query recursive dependencies like depends_on chains without resorting to recursive SQL CTEs.
I designed the graph schema around three core Neo4j entities:
- Nodes: These represent the blocks of HCL.
  - A Resource (e.g., aws_instance), uniquely identified by its file_path and logical_name.
  - A Module, uniquely identified by its module_id.
  - Variable and Provider nodes.
- Properties: The metadata attached to nodes, housing the actual configuration (names, files, line numbers, and AWS-specific attributes).
- Relationships: This is the wiring:
  - [:DEPENDS_ON] connects explicit resource references.
  - [:USES_VAR] connects resources and modules to the variables they consume.
  - [:CONTAINS] links modules to their internal components.
// The explicit wiring of the Multi-Repo Verse
(:Module {id: "vpc"})-[:CONTAINS]->(:Resource {type: "aws_subnet", name: "main"})
(:Resource {type: "aws_subnet"})-[:USES_VAR]->(:Variable {name: "region"})
(:Resource {type: "aws_instance"})-[:DEPENDS_ON]->(:Resource {type: "aws_security_group"})
(:Resource {type: "aws_instance"})-[:PROVIDED_BY]->(:Provider {name: "aws"})
It looks clean on a whiteboard. In practice, it required intensive normalization logic (an internal mapping script) to flatten the chaotic HCL dictionaries into clean, standardized dataclasses before they ever touched the database.
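The shape of that normalization step, roughly (the field names and helper below are illustrative, not the actual Infra-Graph schema):

```python
from dataclasses import dataclass, field

@dataclass
class ResourceNode:
    """Flattened, schema-ready view of one HCL resource block."""
    resource_type: str   # e.g. "aws_lb"
    logical_name: str    # e.g. "main"
    file_path: str
    attributes: dict = field(default_factory=dict)

    @property
    def node_id(self) -> str:
        # Identity in the graph: file path + type + logical name.
        return f"{self.file_path}:{self.resource_type}.{self.logical_name}"

def normalize_resources(parsed: dict, file_path: str) -> list[ResourceNode]:
    """Flatten python-hcl2's resource -> type -> name nesting into
    a flat list of dataclasses ready for ingestion."""
    nodes = []
    for block in parsed.get("resource", []):
        for rtype, instances in block.items():
            for name, attrs in instances.items():
                nodes.append(ResourceNode(rtype, name, file_path, attrs))
    return nodes
```

Having one canonical dataclass per node type means the ingestion layer never has to reason about HCL's nesting again.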
The Mechanical Grind
Populating the graph brought its own set of friction. It wasn't enough to just CREATE nodes. If you run a scan twice, you don't want a duplicated infrastructure graph.
Writing the Cypher ingestion queries was an exercise in idempotency. I lived and breathed the MERGE clause, ensuring the graph updated existing nodes rather than cloning them. I batched transactions to handle hundreds of resources without locking up the database.
MATCH (source:Resource {id: $sid})
MATCH (target:Resource {type: $ttype, name: $tname})
MERGE (source)-[:DEPENDS_ON]->(target)
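On the Python side, batching came down to UNWIND: one transaction merges a whole list of parameter rows instead of one edge at a time. A simplified sketch (the batch size and chunked helper are illustrative; the session comes from the official Neo4j Python driver and is injected so the batching logic stays testable):

```python
BATCH_SIZE = 200

# UNWIND expands the $rows parameter so one transaction MERGEs
# a whole batch of edges idempotently.
LINK_QUERY = """
UNWIND $rows AS row
MATCH (source:Resource {id: row.sid})
MATCH (target:Resource {type: row.ttype, name: row.tname})
MERGE (source)-[:DEPENDS_ON]->(target)
"""

def chunked(rows, size=BATCH_SIZE):
    """Split rows into fixed-size batches."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def ingest_dependencies(session, rows):
    """Write DEPENDS_ON edges batch by batch, one transaction per batch,
    so hundreds of resources never hold long-lived locks."""
    for batch in chunked(rows):
        session.execute_write(lambda tx, b=batch: tx.run(LINK_QUERY, rows=b))
```

Because every write goes through MERGE, re-running a scan converges on the same graph instead of duplicating it.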
It was a mechanical, unglamorous grind. Traversing the ASTs, writing the parsing logic, and meticulously wiring the explicit pointers across files felt farther from "saving DevOps" than I had hoped. I was just moving data from text files into a graph.
Connecting the Dots
And then came the moment that made it all worth it.
After days of wrestling with dictionaries and Cypher queries, I wrapped the Neo4j database with a FastAPI backend and built an interactive Angular frontend using D3.js.
I pointed the parser at a sprawling, complex Terraform directory, the kind that normally requires six open editor windows to mentally model, and initiated the first full infrastructure scan.
When the web interface finally rendered, the result was immediate. The unreadable chaos of HCL text files organized itself into a clear visual model. The nodes snapped into a physics-based, force-directed graph. Solid lines perfectly connected the explicit depends_on rules, variable references, and module outputs.
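Most of the backend's rendering work is just reshaping graph records into the nodes/links payload that a D3 force simulation consumes. A simplified version of that transform (field names are illustrative):

```python
def to_d3_payload(edges):
    """Convert (source_id, relationship, target_id) triples into the
    {nodes: [...], links: [...]} structure D3's force layout expects."""
    node_ids = []
    for source, _, target in edges:
        for node in (source, target):
            if node not in node_ids:
                node_ids.append(node)  # de-duplicate, keep first-seen order
    return {
        "nodes": [{"id": n} for n in node_ids],
        "links": [
            {"source": s, "target": t, "type": rel}
            for s, rel, t in edges
        ],
    }
```

The edge triples come straight out of a Cypher MATCH over the relationship types above, so the API layer stays a thin translation step.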
Selecting a node instantly populated the sidebar with its exact configuration, pulling its properties directly from the local Neo4j instance.
The "Tab Fatigue" vanished. The sprawling "Multi-Repo Verse" was no longer an invisible labyrinth; it was a single, zoomable UI. I could trace an interconnected path from an API Gateway to a subnet, without opening a single text file.
I had mapped the territory.
But as I stared at the perfect structural map, I realized something was missing. The map only showed what was written. It didn't show what was meant. It didn't connect a security group's open port 80 to the load balancer that implicitly relied on it. It was strictly explicit.
Next up in Part 3: how I integrated local AI to discover the hidden truth.
