Shoaibali Mir

Beyond Prompts: How Hybrid LLM-Graph Planning Builds Truly Autonomous AI Agents

In my earlier post, I showed you why graph traversal speed is the real bottleneck in AI agents.

But here's the next problem:

Pure graph search requires perfect world models.

And perfect world models don't exist in production.

Your infrastructure graph can't encode:

  • "The auth service is flaky on Mondays"
  • "This deployment usually causes downstream issues"
  • "Users complain about this edge case"

This is where LLMs shine—semantic reasoning about messy, real-world context.

But LLMs alone have their own problems:

  • They hallucinate paths that don't exist
  • No optimality guarantees
  • Expensive token costs at scale

The solution? Combine both.

Here's how to build hybrid symbolic-neural planning systems that get the best of both worlds.


The Core Problem: LLMs vs. Graphs

What LLMs Do Well

Semantic understanding:

User: "The payment service feels slow"

LLM: Interprets "feels slow" as latency degradation
     -> Maps to measurable metrics (P95, P99)
     -> Suggests relevant remediation paths

Pattern recognition:

LLM: "This failure pattern looks similar to the outage last month
      when the cache layer was misconfigured. Maybe check Redis?"

Natural language reasoning:

LLM: "If we rollback, we lose feature X which users need for Y.
      Scaling might be safer, but costs more. Here's the tradeoff..."

What LLMs Do Poorly

No structural guarantees:

LLM suggests: "Restart service A, then scale B, then rollback C"

Reality: This path violates dependency constraints
         -> Would cause cascading failures

Hallucinated actions:

LLM: "Run the fix-payment-gateway-v2 script"

Reality: That script doesn't exist
         -> Agent tries to execute non-existent action

Expensive at scale:

Each planning decision: $0.001-0.01 in API costs
At 1000 incidents/day: $1-10/day just for planning
Plus: 500ms-2s latency per LLM call

What Graph Search Does Well

Optimality guarantees:

Dijkstra (and A* with an admissible heuristic) always finds the lowest-cost path
-> You can prove it's optimal
-> Critical for high-stakes decisions

Structural constraints:

Graph encodes valid state transitions
-> Can't suggest invalid action sequences
-> Respects dependencies

Speed:

As I showed in my previous post:
~200ms planning time vs. 8+ seconds
-> Enables continuous replanning

What Graph Search Does Poorly

Requires perfect models:

Graph needs: All states, all actions, all costs
Reality: Systems change, new failure modes emerge
-> Graph becomes stale quickly

No semantic reasoning:

Graph knows: "Action A costs 5, Action B costs 8"
Graph doesn't know: "Action A worked last time, B usually fails"
-> Misses learned patterns

Rigid state representation:

Can't encode: "This deployment is risky during peak hours"
-> Lost context that humans use for decisions

The Research Foundation: Learning to Guide Search

Recent work in hybrid planning shows the breakthrough:

"Learning to Plan with Tree Search via Deep RL"

arXiv:2504.17033v2
Link - https://arxiv.org/pdf/2504.17033v2

Core insight:

Train a neural heuristic to guide traditional search algorithms.

Instead of:

  • LLM generates full plan (no guarantees)
  • Or: Pure graph search (no learning)

You get:

  • Neural network learns which paths are promising
  • Graph search provides optimality guarantees
  • Deep RL optimizes the exploration strategy

How It Works

┌─────────────┐
│   Neural    │──▶ Predicts: "This edge looks good"
│  Heuristic  │    (learned from historical data)
└─────────────┘
       │
       ▼
┌─────────────┐
│   A* Search │──▶ Uses predictions to guide exploration
│  Algorithm  │    (guarantees optimal path)
└─────────────┘
       │
       ▼
┌─────────────┐
│  Optimal    │
│    Path     │
└─────────────┘

Key advantage:

The neural heuristic learns from past incidents:

  • Which actions worked
  • Which failed
  • Pattern recognition across similar failures

But the graph search guarantees every returned plan is a valid path through the state space, and, when the heuristic is admissible, an optimal one.


Production Architecture: Hybrid Planning System

Here's how to build this for real infrastructure.

Layer 1: LLM Semantic Layer

Role: High-level reasoning and context understanding

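The code for this layer appeared as an image in the original post, so here is a minimal stand-in sketch. The prompt shape, the field names (`symptoms`, `likely_causes`, `strategies`), and the `fake_llm` stub are illustrative assumptions; `strategies` is the key the `HybridPlanner` later reads from `context`.

```python
import json

# Hypothetical prompt -- the real wording is up to you
SEMANTIC_PROMPT = """You are an SRE assistant. Given the incident below,
return JSON with keys: symptoms, likely_causes, strategies.
Incident: {incident}"""

def get_semantic_context(incident_text, llm_call):
    """Ask the LLM to turn free-form incident text into structured context."""
    raw = llm_call(SEMANTIC_PROMPT.format(incident=incident_text))
    context = json.loads(raw)
    # Keep only the fields the planner downstream actually uses
    return {
        "symptoms": context.get("symptoms", []),
        "likely_causes": context.get("likely_causes", []),
        "strategies": context.get("strategies", []),
    }

# Stubbed LLM for illustration -- a real call would hit an API here
def fake_llm(prompt):
    return json.dumps({
        "symptoms": ["latency degradation"],
        "likely_causes": ["connection pool exhaustion"],
        "strategies": ["scale_db_pool", "restart_service"],
    })

ctx = get_semantic_context("The payment service feels slow", fake_llm)
```

Note the defensive `.get(...)` calls: one LLM response with a missing key shouldn't crash the planner.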

Layer 2: Neural Guidance Network

Role: Learned heuristic for graph search

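The network and training-data snippets here were images in the original post. As a stand-in, this sketch uses a plain linear scorer instead of a real MLP (the planner below calls a torch model); the feature meanings and weights are invented, but the interface is the point: state plus action in, "how promising" out.

```python
# Learned heuristic stand-in: higher score = more promising action.
def score_action(state_vec, action_vec, weights):
    features = state_vec + action_vec          # concatenate feature vectors
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical features: [error_rate, cpu_load] for the state,
# a one-hot encoding for the candidate action.
state = [0.9, 0.4]
scale_up = [1.0, 0.0]
rollback = [0.0, 1.0]

# In production these weights come from training on historical incidents
weights = [0.5, 0.2, 0.8, 0.3]
```

Here the weights encode that, given past outcomes, scaling up has looked more promising than rolling back for this kind of state.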

Layer 3: Graph Database (World Model)

Role: Store valid states and actions

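The world-model code for this layer was also an image in the original post. As a minimal sketch, the graph can be a dict mapping each state to its valid outgoing `(action_name, target_state, cost)` edges; every name and cost below is invented for illustration, and a production version would live in a real graph database such as Neo4j.

```python
# Hypothetical world model: only transitions encoded here can ever be planned.
WORLD_MODEL = {
    "db_latency_high": [
        ("scale_db_pool", "db_scaling", 5),
        ("restart_db", "db_restarting", 8),
    ],
    "db_scaling": [("verify_metrics", "healthy", 1)],
    "db_restarting": [("verify_metrics", "healthy", 3)],
    "healthy": [],  # goal state, no outgoing actions
}

def get_valid_actions(state):
    """Structural guarantee: unknown states have no actions at all."""
    return WORLD_MODEL.get(state, [])
```

This is exactly what keeps the hybrid planner from executing a hallucinated `fix-payment-gateway-v2`: if an edge isn't in the model, it can't appear in a plan.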

Layer 4: Hybrid Search Engine

Role: Combine neural guidance with graph search

from queue import PriorityQueue
import itertools

import torch


class HybridPlanner:
    def __init__(self, graph_db, neural_heuristic, llm):
        self.graph = graph_db
        self.heuristic = neural_heuristic
        self.llm = llm

    def plan(self, current_state, goal_state, incident_context):
        # Step 1: Get semantic context from LLM
        context = self.llm.get_semantic_context(incident_context)

        # Step 2: Filter action space using LLM suggestions
        candidate_actions = self.graph.get_valid_actions(current_state)
        allowed_actions = {
            a.name for a in candidate_actions
            if a.name in context['strategies']
        }

        # Step 3: Use neural heuristic to guide A* search
        return self.a_star_with_neural_heuristic(
            start=current_state,
            goal=goal_state,
            allowed_actions=allowed_actions,
            context=context
        )

    def a_star_with_neural_heuristic(self, start, goal, allowed_actions, context):
        """
        A* search guided by learned heuristic
        """
        tie_breaker = itertools.count()  # keeps states out of tuple comparisons
        open_set = PriorityQueue()
        open_set.put((0, next(tie_breaker), start))

        came_from = {}
        g_score = {start: 0}

        while not open_set.empty():
            current = open_set.get()[2]

            if current == goal:
                return self.reconstruct_path(came_from, current)

            for action in self.graph.get_actions(current):
                # Respect the LLM-filtered action space
                if action.name not in allowed_actions:
                    continue

                neighbor = action.target_state

                # Base cost from graph
                tentative_g = g_score[current] + action.cost

                if neighbor not in g_score or tentative_g < g_score[neighbor]:
                    came_from[neighbor] = (current, action)
                    g_score[neighbor] = tentative_g

                    # Neural heuristic adjustment: a high learned value marks
                    # the action as promising and lowers the priority score.
                    # A learned heuristic isn't guaranteed admissible, so
                    # optimality holds only relative to its predictions.
                    state_vec = self.vectorize_state(current, context)
                    action_vec = self.vectorize_action(action)
                    with torch.no_grad():
                        learned_value = self.heuristic(state_vec, action_vec)

                    # f_score = g_score minus the learned bonus (acting as -h)
                    f_score = tentative_g - learned_value.item()
                    open_set.put((f_score, next(tie_breaker), neighbor))

        return None  # No path found

    def reconstruct_path(self, came_from, current):
        """Walk back through came_from to recover the action sequence."""
        path = []
        while current in came_from:
            current, action = came_from[current]
            path.append(action)
        return list(reversed(path))

Real Example: Hybrid Planning in Action

Scenario: Database Latency Spike


Step 1: LLM Semantic Analysis


Step 2: Graph Defines Valid Paths


Step 3: Neural Heuristic Predicts Effectiveness


Step 4: Hybrid Search Combines Both

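The per-step snippets for this scenario were images in the original post, so here is a self-contained toy that walks the same four steps: the LLM filters the strategies, the graph validates transitions, learned bonuses nudge the costs, and a cost-ordered search picks the path. Every state name, cost, and bonus value is invented for illustration.

```python
import heapq

# Step 2: the graph defines the only valid paths (state -> (action, next, cost))
GRAPH = {
    "latency_spike": [("scale_pool", "scaled", 5), ("rollback", "rolled_back", 4)],
    "scaled": [("verify", "healthy", 1)],
    "rolled_back": [("verify", "healthy", 6)],
    "healthy": [],
}
# Step 1: the LLM's semantic analysis narrows the action space
LLM_STRATEGIES = {"scale_pool", "rollback", "verify"}
# Step 3: the neural heuristic says scaling worked well on similar incidents
LEARNED_BONUS = {"scale_pool": 2.0}

def hybrid_search(start, goal):
    """Step 4: uniform-cost search over the graph, costs nudged by learning."""
    frontier = [(0, start, [])]
    seen = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in seen:
            continue
        seen.add(state)
        for name, nxt, c in GRAPH[state]:
            if name not in LLM_STRATEGIES:   # LLM filter
                continue
            adjusted = c - LEARNED_BONUS.get(name, 0.0)  # learned nudge
            heapq.heappush(frontier, (cost + adjusted, nxt, path + [name]))
    return None
```

Without the learned bonus, `rollback` (cost 4) would beat `scale_pool` (cost 5) at the first hop; with it, the planner prefers the path that history says actually resolves the incident.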


Training the Neural Heuristic

Data Collection


Training Loop


Continuous Learning

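The training snippets in this section were images in the original post. As a hedged stand-in, here is the simplest possible version of the loop: log each executed action's outcome, then turn per-action success rates into the cost bonuses the search consumes. The record fields and the `weight` scale are assumptions; a real version would train a network on richer state features.

```python
from collections import defaultdict

# Outcome log collected after each executed plan (illustrative records)
outcome_log = [
    {"action": "scale_pool", "resolved": True},
    {"action": "scale_pool", "resolved": True},
    {"action": "rollback", "resolved": False},
    {"action": "rollback", "resolved": True},
]

def learn_bonuses(log, weight=2.0):
    """Turn per-action success rates into cost adjustments for the search."""
    wins, total = defaultdict(int), defaultdict(int)
    for record in log:
        total[record["action"]] += 1
        wins[record["action"]] += int(record["resolved"])
    return {a: weight * wins[a] / total[a] for a in total}

# Continuous learning = re-running this on the growing log after each incident
bonuses = learn_bonuses(outcome_log)
```

A frequency table like this is a reasonable baseline before graduating to a neural heuristic: it already captures "action A worked last time, B usually fails."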


Expected Performance Characteristics

Theoretical Comparison

Based on the architecture design, here's what you can expect from each approach:

| Approach | Planning Time | Success Pattern | Cost Pattern | Token Cost/Incident |
| --- | --- | --- | --- | --- |
| Pure LLM | 2-4s typical | ~70-80% (hallucinates invalid actions) | Higher remediation cost | ~$0.01-0.02 |
| Pure Graph | <500ms | ~60-70% (rigid, no learning) | Higher due to suboptimal paths | $0 |
| Hybrid | ~500ms-1s | ~85-95% (learns + validates) | Lower (optimal + learned) | ~$0.002-0.005 |

Why These Patterns Emerge

Pure LLM:

  • Slower due to API latency (multiple round-trips)
  • Suggests invalid actions ~20-30% of the time (no structural validation)
  • Expensive at scale (every planning step = API call)
  • Strong semantic understanding helps with novel situations

Pure Graph:

  • Very fast (local computation, no API calls)
  • Fails on novel scenarios (no learning from past incidents)
  • Requires perfect world model (doesn't adapt)
  • Zero token costs

Hybrid Approach:

  • Balanced speed (one LLM call for context, rest is local)
  • High success rate (LLM semantics + graph validation + learned heuristics)
  • Low cost (minimal LLM usage, cached learnings)
  • Adapts over time as neural heuristic improves

When to Use Each Approach

Use Pure LLM When:

  • Novel situations with no historical data
  • Complex semantic reasoning required
  • Cost/latency aren't critical
  • Don't use for: High-frequency decisions, cost-sensitive ops

Use Pure Graph When:

  • Well-defined state spaces
  • Millisecond latency requirements
  • Perfect optimality needed
  • Don't use for: Dynamic environments, learning from experience

Use Hybrid When:

--> Production incident response
--> Need both speed and learning
--> Have historical training data available
--> Want to balance cost vs. accuracy
--> Best for: Most real-world autonomous systems


Production Considerations

1. Model Versioning

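The versioning snippet here was an image in the original post. A minimal sketch of the idea, with invented names: keep every trained heuristic in a registry with its validation score, and serve the newest one that cleared the bar so a regressed model can be skipped instantly.

```python
MODEL_REGISTRY = {}

def register_model(version, model, validation_success_rate):
    """Store each trained heuristic alongside its offline validation score."""
    MODEL_REGISTRY[version] = {
        "model": model,
        "val_success": validation_success_rate,
    }

def best_model(min_success=0.85):
    """Newest version that passed validation; None means use the fallback."""
    for version in sorted(MODEL_REGISTRY, reverse=True):
        if MODEL_REGISTRY[version]["val_success"] >= min_success:
            return version
    return None

register_model("v1", "heuristic-v1", 0.90)
register_model("v2", "heuristic-v2", 0.70)   # regressed in validation
```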

2. Fallback Strategy

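The fallback code was an image in the original post. The shape is simple enough to sketch: if the hybrid planner raises or returns nothing (model server down, malformed LLM output), degrade to pure graph search rather than failing the incident. The function names below are illustrative.

```python
def plan_with_fallback(planner_fn, fallback_fn, *args):
    """Try the hybrid planner; on any failure, fall back to pure graph search."""
    try:
        path = planner_fn(*args)
        if path is not None:
            return path, "hybrid"
    except Exception:
        pass  # e.g. heuristic model unreachable, LLM returned garbage
    return fallback_fn(*args), "pure_graph"

# Illustrative usage with stubbed planners
def broken_hybrid(*args):
    raise RuntimeError("model server unreachable")

def graph_only(*args):
    return ["restart_service"]

path, source = plan_with_fallback(broken_hybrid, graph_only, "incident-123")
```

Returning the `source` tag alongside the path matters: it feeds the monitoring below-the-line question of how often the hybrid path is actually being used.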

3. Monitoring

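The monitoring snippet was an image in the original post. At minimum you want counters for how often each planning mode runs and succeeds, so a degrading heuristic shows up in dashboards before it shows up in incidents. The metric names below are assumptions.

```python
from collections import Counter

metrics = Counter()

def record_plan(source, succeeded):
    """Count plans and successes per planning mode (hybrid vs pure_graph)."""
    metrics[f"plans_{source}"] += 1
    if succeeded:
        metrics[f"success_{source}"] += 1

# Illustrative events
record_plan("hybrid", True)
record_plan("hybrid", False)
record_plan("pure_graph", True)
```

In production these counters would be exported to your metrics backend; the ratio `success_hybrid / plans_hybrid` is the heuristic's real-world hit rate.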


The Complete Pipeline

Here's the full system in production:

┌──────────────────────────────────────────────────────┐
│                  Incident Detected                   │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  LLM: Semantic Analysis                              │
│  -> Understand symptoms                              │
│  -> Identify potential root causes                   │
│  -> Filter action space                              │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  Neural Heuristic: Predict Action Values             │
│  -> Input: Current state + Candidate actions         │
│  -> Output: Learned success probability per action   │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  Graph Search: Find Optimal Path                     │
│  -> Use neural predictions to guide A*               │
│  -> Guarantee optimal solution given heuristic       │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  Execute Plan                                        │
│  -> Monitor execution                                │
│  -> Log outcome for training                         │
│  -> Update heuristic model                           │
└──────────────────────────────────────────────────────┘

What's Next

In Part 3, I'll cover:

  1. Distributed graph traversal for planet-scale systems
  2. Choosing the right graph database (Neo4j vs Neptune vs TigerGraph vs custom)
  3. Cost vs. performance tradeoffs at different scales

Want to see something specific? Drop a comment below.


Try It Yourself

Starting point:

  1. Take your existing incident runbooks
  2. Convert them to a graph (Neo4j)
  3. Collect historical incident data
  4. Train a simple heuristic (start with scikit-learn)
  5. Measure improvement over pure LLM or pure graph

The code above gives you a skeleton to build on, not a finished system.


Key Takeaways

Hybrid planning combines the best of both worlds:

--> LLMs provide semantic reasoning

--> Neural networks learn from history

--> Graph search guarantees optimality

Expected benefits for production autonomous systems:

  • Faster than pure LLM approaches (~500ms-1s vs 2-4s typical)
  • Higher success rates through validation and learning (~85-95% vs 70-80%)
  • Lower token costs through minimal LLM usage (~$0.002-0.005 vs $0.01-0.02)

The future of AI agents isn't prompts OR graphs—it's both.


References

  • Part 1: Why Your AI Agent Is Slow
  • Learning to Plan with Tree Search via Deep RL, arXiv:2504.17033v2
  • Thorup, M. (2004). "Integer priority queues with decrease key in constant time"

Hit the ❤️ if this helps you build better autonomous systems.

Questions? Challenges? Share your experience in the comments.

