Agustín Abero
The Oracle

I thought I was done. After days of wrestling with Neo4j and flattening deeply nested Terraform ASTs, I finally had a beautiful, physics-based graph of my entire infrastructure. It was a masterpiece of explicit dependencies. But when I looked at the graph and asked myself a simple question: Is my database actually exposed to the internet? My masterpiece couldn't show me the answer. It just stared silently back at me, rendering explicit syntax while missing the semantic truth.

I had built a perfect map of my code, but code is not reality.


A quick recap

In Part 1, I talked about the overwhelming anxiety of flying blind when making changes to modern, distributed infrastructure. Tab Fatigue is real. The fear of the Butterfly Effect is real. To stop flying blind, I needed a map. In Part 2, I showed you how I built that map: a Terraform parser that flattens deeply nested ASTs into a clean, normalized JSON format; a Neo4j import that turns that data into a physics-based graph where every resource is a node and every dependency is an edge; and a D3.js frontend that renders the graph in real time, letting me visually trace connections across repositories and services.

It was beautiful. It was explicit. And it was incomplete.

The Limits of the "Perfect" Graph

My Neo4j graph meticulously plotted every depends_on block and every explicitly passed variable. If an ECS task definition referenced a specific Subnet ID, the D3.js frontend drew a solid, satisfying line between them. But modern infrastructure is defined just as much by what isn't explicitly linked.

When I configure a Security Group to allow inbound traffic on port 80, I am implicitly connecting it to a Load Balancer or a web server. To a human engineer, that dependency is obvious:

resource "aws_security_group" "alb" {
  name        = "${var.project_name}-alb-sg"
  description = "Controls access to the ALB"
  vpc_id      = aws_vpc.main.id

  ingress {
    protocol    = "tcp"
    from_port   = 80
    to_port     = 80
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_lb" "main" {
  name               = "${var.project_name}-alb-${var.environment}"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = [aws_subnet.public.id]
}

resource "aws_lb_target_group" "app" {
  name        = "${var.project_name}-tg"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.main.id
  target_type = "ip"

  health_check {
    path = "/health"
  }
}

resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.main.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.app.arn
  }
}

To my parser, resources related only by convention (a shared port number, an open CIDR block) were just isolated nodes floating in the void. An explicit map could tell me the database existed, but it couldn't trace the complex web of routing tables, internet gateways, and security groups to tell me whether that database was bleeding data to the public internet.

The frustration hit hard. I didn't just want to look at my infrastructure; I wanted to understand it. I realized that if I wanted answers to the richest, most critical questions, I couldn't just stare at a static map. Who better to explain the infrastructure to me than the infrastructure itself? What if I could talk to it?

The structural map was built, but it needed an intelligence layer. It needed an Oracle.

Building the Oracle

To build this intelligence layer, I knew I needed an LLM capable of deep semantic reasoning. But I also knew it had to be completely local. The entire point of Infra-Graph is sovereign visibility. I wasn't about to start piping my infrastructure blueprints to a third-party API, at least not yet. So, I integrated Ollama running Llama 3 locally.

First, I tasked the LLM with uncovering the hidden semantic reality. I built an async pipeline that fed isolated resources to the model. Suddenly, the graph woke up. Dotted lines appeared in the UI.
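The inference pass looks roughly like the sketch below. It assumes the `ollama` Python client talking to a local Llama 3 model; the function names and the prompt format are illustrative, not the project's actual API.

```python
import json

def build_inference_prompt(resource_a: dict, resource_b: dict) -> str:
    """Ask the model whether two otherwise-isolated resources are implicitly related."""
    return (
        "You are an infrastructure analyst. Given two Terraform resources, "
        'answer with JSON {"related": bool, "reason": str}.\n\n'
        f"Resource A: {json.dumps(resource_a)}\n"
        f"Resource B: {json.dumps(resource_b)}"
    )

def infer_semantic_link(resource_a: dict, resource_b: dict) -> dict:
    """Call the local model; requires a running Ollama daemon. Nothing leaves the machine."""
    import ollama  # local-only inference
    reply = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": build_inference_prompt(resource_a, resource_b)}],
    )
    # The model is instructed to reply with JSON; parse it into a dict.
    return json.loads(reply["message"]["content"])
```

In the real pipeline, pairs of candidate resources are fed through this asynchronously, and any `related: true` verdict becomes a dotted "semantic" edge in the graph.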

Screenshot of explicit (gray) and semantic (green) relationships

The AI recognized that an IAM policy granting s3:GetObject implied a dependency on a storage bucket. It saw the port 80 ingress rule and confidently drew an inferred connection to the Load Balancer. It was finding the silent links that regex could never catch.

But finding the links wasn't enough. I wanted to ask questions.

I built a conversational interface (a GraphRAG chatbot) directly into the UI. Now, when I asked, "Is my database exposed to the internet?", the system didn't just regurgitate text. It traversed the actual Neo4j graph, pulling the database, the attached security groups, the subnets, and the route tables into a tight, 1-hop context window. It fed that grounded, topological reality to the Llama 3 model.
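The retrieval step of that GraphRAG loop can be sketched like this. It assumes the official `neo4j` Python driver and the `Resource` node label from my Part 2 schema; the query shape and function names are illustrative.

```python
# Cypher for the 1-hop neighborhood of a named resource.
ONE_HOP_QUERY = """
MATCH (r:Resource {name: $name})-[e]-(neighbor:Resource)
RETURN r.name AS source, type(e) AS rel, neighbor.name AS target
"""

def format_context(rows: list[dict]) -> str:
    """Flatten graph rows into the grounded context block handed to the LLM."""
    return "\n".join(f"{row['source']} -[{row['rel']}]-> {row['target']}" for row in rows)

def fetch_context(uri: str, auth: tuple, name: str) -> str:
    """Pull the 1-hop neighborhood from Neo4j; requires a live instance."""
    from neo4j import GraphDatabase
    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query(ONE_HOP_QUERY, name=name)
        return format_context([dict(r) for r in records])
```

The formatted context, not the user's raw question alone, is what reaches Llama 3, which is exactly why the answers stay anchored to the real topology.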

Because the LLM was grounded in the graph, hallucinations dropped to near-zero. It returned a precise, accurate answer, identifying the exact security group rule and trailing route that caused the exposure, along with actionable recommendations and security best practices to remediate it.

Screenshot of a chat response from the LLM

I no longer just had a map. I had an interactive test environment. Before I even ran a terraform plan, I could ask my infrastructure: "If I delete this Security Group, what breaks down?" and get a deterministic answer translated into plain English.
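The deterministic half of that answer is just a graph traversal: everything that transitively depends on the deleted node breaks. A minimal sketch, using a toy edge map in place of the Neo4j data (edges point dependent → dependency, matching the direction of Terraform references):

```python
from collections import deque

def blast_radius(edges: dict[str, set[str]], deleted: str) -> set[str]:
    """Return every resource that transitively depends on `deleted` (BFS)."""
    # Invert the edge map: dependency -> set of direct dependents.
    dependents: dict[str, set[str]] = {}
    for src, deps in edges.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(src)

    broken, queue = set(), deque([deleted])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent not in broken:
                broken.add(dependent)
                queue.append(dependent)
    return broken
```

Run against the ALB example from earlier, deleting `aws_security_group.alb` flags the load balancer and, through it, the listener. The LLM's job is only to translate that deterministic set into plain English.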

The "Multi-Repo Verse" was no longer a silent, anxiety-inducing labyrinth. I could finally converse with my infrastructure code, and testing configurations became as simple as asking a question.

It's easy to trust the output of a terraform plan to tell the whole story. But building the map and integrating the Oracle showed me that the details are often hidden between the lines. What did I learn about the semantics of infrastructure code? What did this experiment in local, conversational infrastructure actually teach me about how we design and manage systems?


Next up in Part 4: we'll explore the practical lessons learned from this journey and discuss the real value of visual sovereignty.

Making Invisible Infrastructure Visible series index

Top comments (1)

Martín Rivadavia

How accurate is the AI using GraphDB as source of truth?