Shoaibali Mir

Beyond Prompts: How Hybrid LLM-Graph Planning Builds Truly Autonomous AI Agents

In my earlier post, I showed you why graph traversal speed is the real bottleneck in AI agents.

But here's the next problem:

Pure graph search requires perfect world models.

And perfect world models don't exist in production.

Your infrastructure graph can't encode:

  • "The auth service is flaky on Mondays"
  • "This deployment usually causes downstream issues"
  • "Users complain about this edge case"

This is where LLMs shine—semantic reasoning about messy, real-world context.

But LLMs alone have their own problems:

  • They hallucinate paths that don't exist
  • No optimality guarantees
  • Expensive token costs at scale

The solution? Combine both.

Here's how to build hybrid symbolic-neural planning systems that get the best of both worlds.


The Core Problem: LLMs vs. Graphs

What LLMs Do Well

Semantic understanding:

User: "The payment service feels slow"

LLM: Interprets "feels slow" as latency degradation
     -> Maps to measurable metrics (P95, P99)
     -> Suggests relevant remediation paths

Pattern recognition:

LLM: "This failure pattern looks similar to the outage last month
      when the cache layer was misconfigured. Maybe check Redis?"

Natural language reasoning:

LLM: "If we rollback, we lose feature X which users need for Y.
      Scaling might be safer, but costs more. Here's the tradeoff..."

What LLMs Do Poorly

No structural guarantees:

LLM suggests: "Restart service A, then scale B, then rollback C"

Reality: This path violates dependency constraints
         -> Would cause cascading failures

Hallucinated actions:

LLM: "Run the fix-payment-gateway-v2 script"

Reality: That script doesn't exist
         -> Agent tries to execute non-existent action

Expensive at scale:

Each planning decision: $0.001-0.01 in API costs
At 1000 incidents/day: $1-10/day just for planning
Plus: 500ms-2s latency per LLM call

What Graph Search Does Well

Optimality guarantees:

Dijkstra (and A* with an admissible heuristic) always finds the lowest-cost path
-> You can prove it's optimal
-> Critical for high-stakes decisions

Structural constraints:

Graph encodes valid state transitions
-> Can't suggest invalid action sequences
-> Respects dependencies

Speed:

As I showed in my previous post:
~200ms planning time vs. 8+ seconds
-> Enables continuous replanning

What Graph Search Does Poorly

Requires perfect models:

Graph needs: All states, all actions, all costs
Reality: Systems change, new failure modes emerge
-> Graph becomes stale quickly

No semantic reasoning:

Graph knows: "Action A costs 5, Action B costs 8"
Graph doesn't know: "Action A worked last time, B usually fails"
-> Misses learned patterns

Rigid state representation:

Can't encode: "This deployment is risky during peak hours"
-> Lost context that humans use for decisions

The Research Foundation: Learning to Guide Search

Recent work in hybrid planning shows the breakthrough:

"Learning to Plan with Tree Search via Deep RL"

arXiv:2504.17033v2
Link - https://arxiv.org/pdf/2504.17033v2

Core insight:

Train a neural heuristic to guide traditional search algorithms.

Instead of:

  • LLM generates full plan (no guarantees)
  • Or: Pure graph search (no learning)

You get:

  • Neural network learns which paths are promising
  • Graph search provides optimality guarantees
  • Deep RL optimizes the exploration strategy

How It Works

┌─────────────┐
│   Neural    │──▶ Predicts: "This edge looks good"
│  Heuristic  │    (learned from historical data)
└─────────────┘
       │
       ▼
┌─────────────┐
│   A* Search │──▶ Uses predictions to guide exploration
│  Algorithm  │    (guarantees optimal path)
└─────────────┘
       │
       ▼
┌─────────────┐
│  Optimal    │
│    Path     │
└─────────────┘

Key advantage:

The neural heuristic learns from past incidents:

  • Which actions worked
  • Which failed
  • Pattern recognition across similar failures

But the graph search guarantees every returned plan is a valid path through the state space, and, when the heuristic is admissible, an optimal one.


Production Architecture: Hybrid Planning System

Here's how to build this for real infrastructure.

Layer 1: LLM Semantic Layer

Role: High-level reasoning and context understanding

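The code for this layer appeared as an image in the original post, so here is a minimal stand-in sketch. The prompt shape, the field names (`symptoms`, `likely_causes`, `strategies`), and the `fake_llm` stub are illustrative assumptions; `strategies` is the key the `HybridPlanner` later reads from `context`.

```python
import json

# Hypothetical prompt -- the real wording is up to you
SEMANTIC_PROMPT = """You are an SRE assistant. Given the incident below,
return JSON with keys: symptoms, likely_causes, strategies.
Incident: {incident}"""

def get_semantic_context(incident_text, llm_call):
    """Ask the LLM to turn free-form incident text into structured context."""
    raw = llm_call(SEMANTIC_PROMPT.format(incident=incident_text))
    context = json.loads(raw)
    # Keep only the fields the planner downstream actually uses
    return {
        "symptoms": context.get("symptoms", []),
        "likely_causes": context.get("likely_causes", []),
        "strategies": context.get("strategies", []),
    }

# Stubbed LLM for illustration -- a real call would hit an API here
def fake_llm(prompt):
    return json.dumps({
        "symptoms": ["latency degradation"],
        "likely_causes": ["connection pool exhaustion"],
        "strategies": ["scale_db_pool", "restart_service"],
    })

ctx = get_semantic_context("The payment service feels slow", fake_llm)
```

Note the defensive `.get(...)` calls: one LLM response with a missing key shouldn't crash the planner.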

Layer 2: Neural Guidance Network

Role: Learned heuristic for graph search

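The network and training-data snippets here were images in the original post. As a stand-in, this sketch uses a plain linear scorer instead of a real MLP (the planner below calls a torch model); the feature meanings and weights are invented, but the interface is the point: state plus action in, "how promising" out.

```python
# Learned heuristic stand-in: higher score = more promising action.
def score_action(state_vec, action_vec, weights):
    features = state_vec + action_vec          # concatenate feature vectors
    return sum(w * f for w, f in zip(weights, features))

# Hypothetical features: [error_rate, cpu_load] for the state,
# a one-hot encoding for the candidate action.
state = [0.9, 0.4]
scale_up = [1.0, 0.0]
rollback = [0.0, 1.0]

# In production these weights come from training on historical incidents
weights = [0.5, 0.2, 0.8, 0.3]
```

Here the weights encode that, given past outcomes, scaling up has looked more promising than rolling back for this kind of state.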

Layer 3: Graph Database (World Model)

Role: Store valid states and actions

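The world-model code for this layer was also an image in the original post. As a minimal sketch, the graph can be a dict mapping each state to its valid outgoing `(action_name, target_state, cost)` edges; every name and cost below is invented for illustration, and a production version would live in a real graph database such as Neo4j.

```python
# Hypothetical world model: only transitions encoded here can ever be planned.
WORLD_MODEL = {
    "db_latency_high": [
        ("scale_db_pool", "db_scaling", 5),
        ("restart_db", "db_restarting", 8),
    ],
    "db_scaling": [("verify_metrics", "healthy", 1)],
    "db_restarting": [("verify_metrics", "healthy", 3)],
    "healthy": [],  # goal state, no outgoing actions
}

def get_valid_actions(state):
    """Structural guarantee: unknown states have no actions at all."""
    return WORLD_MODEL.get(state, [])
```

This is exactly what keeps the hybrid planner from executing a hallucinated `fix-payment-gateway-v2`: if an edge isn't in the model, it can't appear in a plan.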

Layer 4: Hybrid Search Engine

Role: Combine neural guidance with graph search

from queue import PriorityQueue
import itertools

import torch


class HybridPlanner:
    def __init__(self, graph_db, neural_heuristic, llm):
        self.graph = graph_db
        self.heuristic = neural_heuristic
        self.llm = llm

    def plan(self, current_state, goal_state, incident_context):
        # Step 1: Get semantic context from LLM
        context = self.llm.get_semantic_context(incident_context)

        # Step 2: Filter action space using LLM suggestions
        candidate_actions = self.graph.get_valid_actions(current_state)
        allowed_actions = {
            a.name for a in candidate_actions
            if a.name in context['strategies']
        }

        # Step 3: Use neural heuristic to guide A* search
        return self.a_star_with_neural_heuristic(
            start=current_state,
            goal=goal_state,
            allowed_actions=allowed_actions,
            context=context
        )

    def a_star_with_neural_heuristic(self, start, goal, allowed_actions, context):
        """
        A* search guided by learned heuristic
        """
        tie_breaker = itertools.count()  # keeps states out of tuple comparisons
        open_set = PriorityQueue()
        open_set.put((0, next(tie_breaker), start))

        came_from = {}
        g_score = {start: 0}

        while not open_set.empty():
            current = open_set.get()[2]

            if current == goal:
                return self.reconstruct_path(came_from, current)

            for action in self.graph.get_actions(current):
                # Respect the LLM-filtered action space
                if action.name not in allowed_actions:
                    continue

                neighbor = action.target_state

                # Base cost from graph
                tentative_g = g_score[current] + action.cost

                if neighbor not in g_score or tentative_g < g_score[neighbor]:
                    came_from[neighbor] = (current, action)
                    g_score[neighbor] = tentative_g

                    # Neural heuristic adjustment: a high learned value marks
                    # the action as promising and lowers the priority score.
                    # A learned heuristic isn't guaranteed admissible, so
                    # optimality holds only relative to its predictions.
                    state_vec = self.vectorize_state(current, context)
                    action_vec = self.vectorize_action(action)
                    with torch.no_grad():
                        learned_value = self.heuristic(state_vec, action_vec)

                    # f_score = g_score minus the learned bonus (acting as -h)
                    f_score = tentative_g - learned_value.item()
                    open_set.put((f_score, next(tie_breaker), neighbor))

        return None  # No path found

    def reconstruct_path(self, came_from, current):
        """Walk back through came_from to recover the action sequence."""
        path = []
        while current in came_from:
            current, action = came_from[current]
            path.append(action)
        return list(reversed(path))

Real Example: Hybrid Planning in Action

Scenario: Database Latency Spike


Step 1: LLM Semantic Analysis


Step 2: Graph Defines Valid Paths


Step 3: Neural Heuristic Predicts Effectiveness


Step 4: Hybrid Search Combines Both

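The per-step snippets for this scenario were images in the original post, so here is a self-contained toy that walks the same four steps: the LLM filters the strategies, the graph validates transitions, learned bonuses nudge the costs, and a cost-ordered search picks the path. Every state name, cost, and bonus value is invented for illustration.

```python
import heapq

# Step 2: the graph defines the only valid paths (state -> (action, next, cost))
GRAPH = {
    "latency_spike": [("scale_pool", "scaled", 5), ("rollback", "rolled_back", 4)],
    "scaled": [("verify", "healthy", 1)],
    "rolled_back": [("verify", "healthy", 6)],
    "healthy": [],
}
# Step 1: the LLM's semantic analysis narrows the action space
LLM_STRATEGIES = {"scale_pool", "rollback", "verify"}
# Step 3: the neural heuristic says scaling worked well on similar incidents
LEARNED_BONUS = {"scale_pool": 2.0}

def hybrid_search(start, goal):
    """Step 4: uniform-cost search over the graph, costs nudged by learning."""
    frontier = [(0, start, [])]
    seen = set()
    while frontier:
        cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in seen:
            continue
        seen.add(state)
        for name, nxt, c in GRAPH[state]:
            if name not in LLM_STRATEGIES:   # LLM filter
                continue
            adjusted = c - LEARNED_BONUS.get(name, 0.0)  # learned nudge
            heapq.heappush(frontier, (cost + adjusted, nxt, path + [name]))
    return None
```

Without the learned bonus, `rollback` (cost 4) would beat `scale_pool` (cost 5) at the first hop; with it, the planner prefers the path that history says actually resolves the incident.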


Training the Neural Heuristic

Data Collection


Training Loop


Continuous Learning

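The training snippets in this section were images in the original post. As a hedged stand-in, here is the simplest possible version of the loop: log each executed action's outcome, then turn per-action success rates into the cost bonuses the search consumes. The record fields and the `weight` scale are assumptions; a real version would train a network on richer state features.

```python
from collections import defaultdict

# Outcome log collected after each executed plan (illustrative records)
outcome_log = [
    {"action": "scale_pool", "resolved": True},
    {"action": "scale_pool", "resolved": True},
    {"action": "rollback", "resolved": False},
    {"action": "rollback", "resolved": True},
]

def learn_bonuses(log, weight=2.0):
    """Turn per-action success rates into cost adjustments for the search."""
    wins, total = defaultdict(int), defaultdict(int)
    for record in log:
        total[record["action"]] += 1
        wins[record["action"]] += int(record["resolved"])
    return {a: weight * wins[a] / total[a] for a in total}

# Continuous learning = re-running this on the growing log after each incident
bonuses = learn_bonuses(outcome_log)
```

A frequency table like this is a reasonable baseline before graduating to a neural heuristic: it already captures "action A worked last time, B usually fails."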


Expected Performance Characteristics

Theoretical Comparison

Based on the architecture design, here's what you can expect from each approach:

| Approach | Planning Time | Success Pattern | Cost Pattern | Token Cost/Incident |
| --- | --- | --- | --- | --- |
| Pure LLM | 2-4s typical | ~70-80% (hallucinates invalid actions) | Higher remediation cost | ~$0.01-0.02 |
| Pure Graph | <500ms | ~60-70% (rigid, no learning) | Higher due to suboptimal paths | $0 |
| Hybrid | ~500ms-1s | ~85-95% (learns + validates) | Lower (optimal + learned) | ~$0.002-0.005 |

Why These Patterns Emerge

Pure LLM:

  • Slower due to API latency (multiple round-trips)
  • Suggests invalid actions ~20-30% of the time (no structural validation)
  • Expensive at scale (every planning step = API call)
  • Strong semantic understanding helps with novel situations

Pure Graph:

  • Very fast (local computation, no API calls)
  • Fails on novel scenarios (no learning from past incidents)
  • Requires perfect world model (doesn't adapt)
  • Zero token costs

Hybrid Approach:

  • Balanced speed (one LLM call for context, rest is local)
  • High success rate (LLM semantics + graph validation + learned heuristics)
  • Low cost (minimal LLM usage, cached learnings)
  • Adapts over time as neural heuristic improves

When to Use Each Approach

Use Pure LLM When:

  • Novel situations with no historical data
  • Complex semantic reasoning required
  • Cost/latency aren't critical
  • Don't use for: High-frequency decisions, cost-sensitive ops

Use Pure Graph When:

  • Well-defined state spaces
  • Millisecond latency requirements
  • Perfect optimality needed
  • Don't use for: Dynamic environments, learning from experience

Use Hybrid When:

--> Production incident response
--> Need both speed and learning
--> Have historical training data available
--> Want to balance cost vs. accuracy
--> Best for: Most real-world autonomous systems


Production Considerations

1. Model Versioning

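The versioning snippet here was an image in the original post. A minimal sketch of the idea, with invented names: keep every trained heuristic in a registry with its validation score, and serve the newest one that cleared the bar so a regressed model can be skipped instantly.

```python
MODEL_REGISTRY = {}

def register_model(version, model, validation_success_rate):
    """Store each trained heuristic alongside its offline validation score."""
    MODEL_REGISTRY[version] = {
        "model": model,
        "val_success": validation_success_rate,
    }

def best_model(min_success=0.85):
    """Newest version that passed validation; None means use the fallback."""
    for version in sorted(MODEL_REGISTRY, reverse=True):
        if MODEL_REGISTRY[version]["val_success"] >= min_success:
            return version
    return None

register_model("v1", "heuristic-v1", 0.90)
register_model("v2", "heuristic-v2", 0.70)   # regressed in validation
```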

2. Fallback Strategy

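The fallback code was an image in the original post. The shape is simple enough to sketch: if the hybrid planner raises or returns nothing (model server down, malformed LLM output), degrade to pure graph search rather than failing the incident. The function names below are illustrative.

```python
def plan_with_fallback(planner_fn, fallback_fn, *args):
    """Try the hybrid planner; on any failure, fall back to pure graph search."""
    try:
        path = planner_fn(*args)
        if path is not None:
            return path, "hybrid"
    except Exception:
        pass  # e.g. heuristic model unreachable, LLM returned garbage
    return fallback_fn(*args), "pure_graph"

# Illustrative usage with stubbed planners
def broken_hybrid(*args):
    raise RuntimeError("model server unreachable")

def graph_only(*args):
    return ["restart_service"]

path, source = plan_with_fallback(broken_hybrid, graph_only, "incident-123")
```

Returning the `source` tag alongside the path matters: it feeds the monitoring below-the-line question of how often the hybrid path is actually being used.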

3. Monitoring

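The monitoring snippet was an image in the original post. At minimum you want counters for how often each planning mode runs and succeeds, so a degrading heuristic shows up in dashboards before it shows up in incidents. The metric names below are assumptions.

```python
from collections import Counter

metrics = Counter()

def record_plan(source, succeeded):
    """Count plans and successes per planning mode (hybrid vs pure_graph)."""
    metrics[f"plans_{source}"] += 1
    if succeeded:
        metrics[f"success_{source}"] += 1

# Illustrative events
record_plan("hybrid", True)
record_plan("hybrid", False)
record_plan("pure_graph", True)
```

In production these counters would be exported to your metrics backend; the ratio `success_hybrid / plans_hybrid` is the heuristic's real-world hit rate.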


The Complete Pipeline

Here's the full system in production:

┌──────────────────────────────────────────────────────┐
│                  Incident Detected                   │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  LLM: Semantic Analysis                              │
│  -> Understand symptoms                              │
│  -> Identify potential root causes                   │
│  -> Filter action space                              │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  Neural Heuristic: Predict Action Values             │
│  -> Input: Current state + Candidate actions         │
│  -> Output: Learned success probability per action   │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  Graph Search: Find Optimal Path                     │
│  -> Use neural predictions to guide A*               │
│  -> Guarantee optimal solution given heuristic       │
└───────────────────┬──────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────┐
│  Execute Plan                                        │
│  -> Monitor execution                                │
│  -> Log outcome for training                         │
│  -> Update heuristic model                           │
└──────────────────────────────────────────────────────┘

What's Next

In Part 3, I'll cover:

  1. Distributed graph traversal for planet-scale systems
  2. Choosing the right graph database (Neo4j vs Neptune vs TigerGraph vs custom)
  3. Cost vs. performance tradeoffs at different scales

Want to see something specific? Drop a comment below.


Try It Yourself

Starting point:

  1. Take your existing incident runbooks
  2. Convert them to a graph (Neo4j)
  3. Collect historical incident data
  4. Train a simple heuristic (start with scikit-learn)
  5. Measure improvement over pure LLM or pure graph

The code above gives you a skeleton to build on, not a finished system.


Key Takeaways

Hybrid planning combines the best of both worlds:

--> LLMs provide semantic reasoning

--> Neural networks learn from history

--> Graph search guarantees optimality

Expected benefits for production autonomous systems:

  • Faster than pure LLM approaches (~500ms-1s vs 2-4s typical)
  • Higher success rates through validation and learning (~85-95% vs 70-80%)
  • Lower token costs through minimal LLM usage (~$0.002-0.005 vs $0.01-0.02)

The future of AI agents isn't prompts OR graphs—it's both.


References

  • Part 1: Why Your AI Agent Is Slow
  • Learning to Plan with Tree Search via Deep RL, arXiv:2504.17033v2
  • Thorup, M. (2004). "Integer priority queues with decrease key in constant time"

Hit the ❤️ if this helps you build better autonomous systems.

Questions? Challenges? Share your experience in the comments.

