In my earlier post, I showed you why graph traversal speed is the real bottleneck in AI agents.
But here's the next problem:
Pure graph search requires perfect world models.
And perfect world models don't exist in production.
Your infrastructure graph can't encode:
- "The auth service is flaky on Mondays"
- "This deployment usually causes downstream issues"
- "Users complain about this edge case"
This is where LLMs shine—semantic reasoning about messy, real-world context.
But LLMs alone have their own problems:
- They hallucinate paths that don't exist
- No optimality guarantees
- Expensive token costs at scale
The solution? Combine both.
Here's how to build hybrid symbolic-neural planning systems that get the best of both worlds.
The Core Problem: LLMs vs. Graphs
What LLMs Do Well
Semantic understanding:
User: "The payment service feels slow"
LLM: Interprets "feels slow" as latency degradation
-> Maps to measurable metrics (P95, P99)
-> Suggests relevant remediation paths
Pattern recognition:
LLM: "This failure pattern looks similar to the outage last month
when the cache layer was misconfigured. Maybe check Redis?"
Natural language reasoning:
LLM: "If we rollback, we lose feature X which users need for Y.
Scaling might be safer, but costs more. Here's the tradeoff..."
What LLMs Do Poorly
No structural guarantees:
LLM suggests: "Restart service A, then scale B, then rollback C"
Reality: This path violates dependency constraints
-> Would cause cascading failures
Hallucinated actions:
LLM: "Run the fix-payment-gateway-v2 script"
Reality: That script doesn't exist
-> Agent tries to execute non-existent action
Expensive at scale:
Each planning decision: $0.001-0.01 in API costs
At 1000 incidents/day: $1-10/day just for planning
Plus: 500ms-2s latency per LLM call
What Graph Search Does Well
Optimality guarantees:
Dijkstra always finds the lowest-cost path, and so does A* with an admissible heuristic
-> You can prove the result is optimal
-> Critical for high-stakes decisions
Structural constraints:
Graph encodes valid state transitions
-> Can't suggest invalid action sequences
-> Respects dependencies
Speed:
As I showed in my previous post:
~200ms planning time vs. 8+ seconds
-> Enables continuous replanning
What Graph Search Does Poorly
Requires perfect models:
Graph needs: All states, all actions, all costs
Reality: Systems change, new failure modes emerge
-> Graph becomes stale quickly
No semantic reasoning:
Graph knows: "Action A costs 5, Action B costs 8"
Graph doesn't know: "Action A worked last time, B usually fails"
-> Misses learned patterns
Rigid state representation:
Can't encode: "This deployment is risky during peak hours"
-> Lost context that humans use for decisions
The Research Foundation: Learning to Guide Search
Recent work in hybrid planning shows the breakthrough:
"Learning to Plan with Tree Search via Deep RL"
arXiv:2504.17033v2
Link: https://arxiv.org/pdf/2504.17033v2
Core insight:
Train a neural heuristic to guide traditional search algorithms.
Instead of:
- LLM generates full plan (no guarantees)
- Or: Pure graph search (no learning)
You get:
- Neural network learns which paths are promising
- Graph search provides optimality guarantees
- Deep RL optimizes the exploration strategy
How It Works
┌─────────────┐
│ Neural │──▶ Predicts: "This edge looks good"
│ Heuristic │ (learned from historical data)
└─────────────┘
│
▼
┌─────────────┐
│ A* Search │──▶ Uses predictions to guide exploration
│ Algorithm │ (guarantees optimal path)
└─────────────┘
│
▼
┌─────────────┐
│ Optimal │
│ Path │
└─────────────┘
Key advantage:
The neural heuristic learns from past incidents:
- Which actions worked
- Which failed
- Pattern recognition across similar failures
But the graph search guarantees you find the optimal solution given those predictions.
Production Architecture: Hybrid Planning System
Here's how to build this for real infrastructure.
Layer 1: LLM Semantic Layer
Role: High-level reasoning and context understanding
Output:
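As a hedged sketch, the structured output of this layer might look like the following. The field names (especially `strategies`) are assumptions, chosen to line up with the filtering step in the planner code further down:

```python
# Hypothetical structured output from the LLM semantic layer.
# Field names ("symptoms", "likely_causes", "strategies") are assumptions.
semantic_context = {
    "symptoms": ["p99 latency spike on payment-service"],
    "likely_causes": ["db_connection_pool_exhaustion", "cache_miss_storm"],
    "strategies": ["scale_db_pool", "restart_cache", "rollback_deploy"],
    "risk_notes": "peak traffic window; prefer non-disruptive actions",
}

# The planner only keeps graph actions whose names appear in "strategies":
candidate_actions = ["scale_db_pool", "drain_node", "rollback_deploy"]
filtered = [a for a in candidate_actions if a in semantic_context["strategies"]]
print(filtered)
```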
Layer 2: Neural Guidance Network
Role: Learned heuristic for graph search
Training data:
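A hedged sketch of what the training records might look like. The features and outcomes here are invented; real records would come from your incident history:

```python
# Hypothetical training records for the neural guidance network.
# Each record pairs state/action features with the observed outcome of
# executing that action during a past incident (1 = resolved, 0 = failed).
training_data = [
    # [cpu_util, error_rate, is_peak_hours, action_id] -> outcome
    ([0.92, 0.30, 1.0, 0.0], 1),   # scale_db_pool during CPU spike: worked
    ([0.40, 0.55, 0.0, 1.0], 0),   # restart_cache on error storm: failed
    ([0.85, 0.10, 1.0, 2.0], 1),   # rollback_deploy at peak: worked
]

features = [x for x, _ in training_data]
labels = [y for _, y in training_data]
```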
Layer 3: Graph Database (World Model)
Role: Store valid states and actions
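A minimal in-memory stand-in for the world model, to make the schema concrete. In production this would live in a graph database (the post suggests Neo4j later); the states, actions, and costs below are invented:

```python
# States as nodes, remediation actions as weighted edges (all values invented).
world_model = {
    "db_latency_high": [
        {"action": "scale_db_pool", "cost": 5, "target": "healthy"},
        {"action": "rollback_deploy", "cost": 8, "target": "healthy"},
        {"action": "restart_db", "cost": 12, "target": "db_restarting"},
    ],
    "db_restarting": [
        {"action": "wait_for_ready", "cost": 3, "target": "healthy"},
    ],
    "healthy": [],
}

def get_valid_actions(state):
    """Only edges present in the graph can ever be suggested --
    this is the structural guarantee pure-LLM planning lacks."""
    return world_model.get(state, [])
```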
Layer 4: Hybrid Search Engine
Role: Combine neural guidance with graph search
```python
import itertools
from queue import PriorityQueue

import torch


class HybridPlanner:
    def __init__(self, graph_db, neural_heuristic, llm):
        self.graph = graph_db
        self.heuristic = neural_heuristic
        self.llm = llm

    def plan(self, current_state, goal_state, incident_context):
        # Step 1: Get semantic context from LLM
        context = self.llm.get_semantic_context(incident_context)

        # Step 2: Filter action space using LLM suggestions
        candidate_actions = self.graph.get_valid_actions(current_state)
        filtered_actions = [
            a for a in candidate_actions
            if a.name in context['strategies']
        ]

        # Step 3: Use neural heuristic to guide A* search
        return self.a_star_with_neural_heuristic(
            start=current_state,
            goal=goal_state,
            actions=filtered_actions,
            context=context,
        )

    def a_star_with_neural_heuristic(self, start, goal, actions, context):
        """A* search guided by the learned heuristic."""
        allowed = {a.name for a in actions}  # honor the LLM-filtered action space
        open_set = PriorityQueue()
        tie = itertools.count()  # tie-breaker so states never need comparing
        open_set.put((0, next(tie), start))
        came_from = {}
        g_score = {start: 0}

        while not open_set.empty():
            current = open_set.get()[2]

            if current == goal:
                return self.reconstruct_path(came_from, current)  # helper not shown

            for action in self.graph.get_actions(current):
                if action.name not in allowed:
                    continue
                neighbor = action.target_state

                # Base cost from graph
                base_cost = action.cost

                # Neural heuristic adjustment (vectorize_* helpers not shown)
                state_vec = self.vectorize_state(current, context)
                action_vec = self.vectorize_action(action)
                with torch.no_grad():
                    learned_value = self.heuristic(state_vec, action_vec)

                tentative_g = g_score[current] + base_cost
                if neighbor not in g_score or tentative_g < g_score[neighbor]:
                    came_from[neighbor] = (current, action)
                    g_score[neighbor] = tentative_g
                    # Priority = path cost minus learned promise, so edges the
                    # network rates highly are explored first. This term is not
                    # admissible, so the result is optimal only relative to the
                    # learned values.
                    f_score = tentative_g - learned_value.item()
                    open_set.put((f_score, next(tie), neighbor))

        return None  # No path found
```
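To see the neural-guided search in isolation, here is a self-contained toy version using only the standard library (no torch). The graph, costs, and learned values are all invented; the point is how the learned "promise" term steers exploration while cheaper paths still win:

```python
import heapq

# Toy graph: state -> list of (target_state, base_cost). All values invented.
graph = {
    "incident": [("mitigated", 10), ("restarting", 4)],
    "restarting": [("mitigated", 4)],
    "mitigated": [],
}
# Learned "promise" per edge (higher = historically more successful). Invented.
learned = {
    ("incident", "mitigated"): 6.0,
    ("incident", "restarting"): 0.5,
    ("restarting", "mitigated"): 0.5,
}

def hybrid_search(start, goal):
    open_set = [(0.0, 0, start)]  # (priority, tie-breaker, state)
    g = {start: 0.0}
    came_from = {}
    tie = 0
    while open_set:
        _, _, cur = heapq.heappop(open_set)
        if cur == goal:
            path = [cur]
            while path[-1] in came_from:
                path.append(came_from[path[-1]])
            return g[cur], path[::-1]
        for nxt, cost in graph[cur]:
            ng = g[cur] + cost
            if nxt not in g or ng < g[nxt]:
                g[nxt] = ng
                came_from[nxt] = cur
                tie += 1
                # Priority = path cost minus learned promise.
                heapq.heappush(open_set, (ng - learned[(cur, nxt)], tie, nxt))
    return None

print(hybrid_search("incident", "mitigated"))
# -> (8.0, ['incident', 'restarting', 'mitigated'])
```

The direct edge is rated highly by the learned term, but the two-hop path is cheaper, and the cost re-check keeps it.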
Real Example: Hybrid Planning in Action
Scenario: Database Latency Spike
Incident Data:
Step 1: LLM Semantic Analysis
Step 2: Graph Defines Valid Paths
Step 3: Neural Heuristic Predicts Effectiveness
Step 4: Hybrid Search Combines Both
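The concrete data behind these four steps might look like this (all names and numbers here are invented for illustration):

```python
# Step 1 -- LLM semantic analysis (hypothetical output):
context = {"diagnosis": "connection pool exhaustion",
           "strategies": ["scale_db_pool", "rollback_deploy", "failover_replica"]}

# Step 2 -- graph's valid actions and base costs from state "db_latency_high":
graph_actions = {"scale_db_pool": 5, "rollback_deploy": 8,
                 "restart_db": 12, "failover_replica": 6}

# Step 3 -- neural heuristic's learned promise per action (invented):
learned_value = {"scale_db_pool": 4.0, "rollback_deploy": 1.0,
                 "failover_replica": 2.5}

# Step 4 -- hybrid ranking: graph cost minus learned promise,
# restricted to the LLM-suggested strategies.
scores = {a: graph_actions[a] - learned_value[a] for a in context["strategies"]}
best = min(scores, key=scores.get)
print(best, scores[best])  # scale_db_pool 1.0
```

Note that `restart_db` never gets scored: the LLM filtered it out before search.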
Training the Neural Heuristic
Data Collection
Training Loop
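In production the heuristic would be a torch model like the one used in the search code; as a hedged stand-in, here is a minimal stdlib-only logistic-regression loop showing the shape of training on (features, outcome) records (data invented):

```python
import math

# (state/action features, outcome) pairs -- invented incident records.
data = [([0.9, 0.3, 1.0], 1), ([0.4, 0.6, 0.0], 0),
        ([0.8, 0.1, 1.0], 1), ([0.3, 0.7, 0.0], 0)]

w = [0.0, 0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Plain gradient descent on log loss -- stands in for the torch
# training loop a production heuristic would use.
for _ in range(200):
    for x, y in data:
        err = predict(x) - y
        for i in range(len(w)):
            w[i] -= lr * err * x[i]
        b -= lr * err

print(predict([0.85, 0.2, 1.0]))  # high -> action historically promising here
```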
Continuous Learning
Expected Performance Characteristics
Theoretical Comparison
Based on the architecture design, here's what you can expect from each approach:
| Approach | Planning Time | Success Rate | Remediation Cost | Token Cost/Incident |
|---|---|---|---|---|
| Pure LLM | 2-4s typical | ~70-80% (hallucinates invalid actions) | Higher remediation cost | ~$0.01-0.02 |
| Pure Graph | <500ms | ~60-70% (rigid, no learning) | Higher due to suboptimal paths | $0 |
| Hybrid | ~500ms-1s | ~85-95% (learns + validates) | Lower (optimal + learned) | ~$0.002-0.005 |
Why These Patterns Emerge
Pure LLM:
- Slower due to API latency (multiple round-trips)
- Suggests invalid actions ~20-30% of the time (no structural validation)
- Expensive at scale (every planning step = API call)
- Strong semantic understanding helps with novel situations
Pure Graph:
- Very fast (local computation, no API calls)
- Fails on novel scenarios (no learning from past incidents)
- Requires perfect world model (doesn't adapt)
- Zero token costs
Hybrid Approach:
- Balanced speed (one LLM call for context, rest is local)
- High success rate (LLM semantics + graph validation + learned heuristics)
- Low cost (minimal LLM usage, cached learnings)
- Adapts over time as neural heuristic improves
When to Use Each Approach
Use Pure LLM When:
- Novel situations with no historical data
- Complex semantic reasoning required
- Cost/latency aren't critical
- Don't use for: High-frequency decisions, cost-sensitive ops
Use Pure Graph When:
- Well-defined state spaces
- Millisecond latency requirements
- Perfect optimality needed
- Don't use for: Dynamic environments, learning from experience
Use Hybrid When:
- Production incident response
- Need both speed and learning
- Have historical training data available
- Want to balance cost vs. accuracy
- Best for: Most real-world autonomous systems
Production Considerations
1. Model Versioning
2. Fallback Strategy
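A minimal sketch of the fallback path, assuming planner objects with the interfaces shown earlier (the function and stub names are hypothetical): if the LLM or heuristic layer fails or times out, degrade to plain graph search, which needs no external calls.

```python
def plan_with_fallback(planner, pure_graph_planner, state, goal, incident):
    """Try hybrid planning first; fall back to pure graph search on failure."""
    try:
        path = planner.plan(state, goal, incident)
        if path is not None:
            return path, "hybrid"
    except Exception:
        pass  # LLM outage, model load failure, timeout...
    return pure_graph_planner.plan(state, goal), "pure_graph"


# Hypothetical stubs to show the behavior:
class FailingHybrid:
    def plan(self, *args):
        raise TimeoutError("LLM unavailable")

class GraphOnly:
    def plan(self, state, goal):
        return [state, goal]

path, mode = plan_with_fallback(FailingHybrid(), GraphOnly(),
                                "db_latency_high", "healthy", {})
print(mode)  # pure_graph
```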
3. Monitoring
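One sketch of what to monitor: per-mode success rates, so heuristic drift shows up as a falling hybrid success rate. The metric shape is an assumption; production would emit these counters to your metrics system rather than hold them in memory.

```python
from collections import defaultdict

stats = defaultdict(lambda: {"plans": 0, "successes": 0})

def record_outcome(mode, succeeded):
    """Log each executed plan's outcome, keyed by planning mode."""
    stats[mode]["plans"] += 1
    stats[mode]["successes"] += int(succeeded)

def success_rate(mode):
    s = stats[mode]
    return s["successes"] / s["plans"] if s["plans"] else None

record_outcome("hybrid", True)
record_outcome("hybrid", True)
record_outcome("hybrid", False)
print(success_rate("hybrid"))  # 2/3
```

The same outcome log doubles as training data for the heuristic, closing the loop in the pipeline diagram below.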
The Complete Pipeline
Here's the full system in production:
┌──────────────────────────────────────────────────────┐
│ Incident Detected │
└───────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ LLM: Semantic Analysis │
│ -> Understand symptoms │
│ -> Identify potential root causes │
│ -> Filter action space │
└───────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ Neural Heuristic: Predict Action Values │
│ -> Input: Current state + Candidate actions │
│ -> Output: Learned success probability per action │
└───────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ Graph Search: Find Optimal Path │
│ -> Use neural predictions to guide A* │
│ -> Guarantee optimal solution given heuristic │
└───────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────┐
│ Execute Plan │
│ -> Monitor execution │
│ -> Log outcome for training │
│ -> Update heuristic model │
└──────────────────────────────────────────────────────┘
What's Next
In Part 3, I'll cover:
- Distributed graph traversal for planet-scale systems
- Choosing the right graph database (Neo4j vs Neptune vs TigerGraph vs custom)
- Cost vs. performance tradeoffs at different scales
Want to see something specific? Drop a comment below.
Try It Yourself
Starting point:
- Take your existing incident runbooks
- Convert them to a graph (Neo4j)
- Collect historical incident data
- Train a simple heuristic (start with scikit-learn)
- Measure improvement over pure LLM or pure graph
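For step 4, a minimal scikit-learn heuristic might look like this. The features and labels are invented; real ones would come from the incident history you collect in step 3:

```python
from sklearn.linear_model import LogisticRegression

# Invented (features, outcome) records: e.g. [cpu_util, error_rate, is_peak].
X = [[0.9, 0.3, 1], [0.4, 0.6, 0], [0.8, 0.1, 1],
     [0.3, 0.7, 0], [0.85, 0.2, 1], [0.35, 0.65, 0]]
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Use the predicted success probability as the "learned promise" term in A*.
promise = model.predict_proba([[0.88, 0.15, 1]])[0][1]
print(round(promise, 2))
```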
The code examples above give you a skeleton to adapt, not a finished system.
Key Takeaways
Hybrid planning combines the best of both worlds:
- LLMs provide semantic reasoning
- Neural networks learn from history
- Graph search guarantees optimality
Expected benefits for production autonomous systems:
- Faster than pure LLM approaches (~500ms-1s vs 2-4s typical)
- Higher success rates through validation and learning (~85-95% vs 70-80%)
- Lower token costs through minimal LLM usage (~$0.002-0.005 vs $0.01-0.02)
The future of AI agents isn't prompts OR graphs—it's both.
References
- Part 1: Why Your AI Agent Is Slow
- Learning to Plan with Tree Search via Deep RL, arXiv:2504.17033v2
- Thorup, M. (2004). "Integer priority queues with decrease key in constant time"
Hit the ❤️ if this helps you build better autonomous systems.
Questions? Challenges? Share your experience in the comments.