How I Built a RAG System That Actually Understands Business Metrics (Part 2: Hierarchical Search)
TL;DR: Part 1 could find "total_revenue" but failed on "Naver traffic". This article shows how I fixed that with a tree-based metric system and two-stage GPT filtering. Result: accuracy went from 95% to 98%.
The Real Problem
Remember Part 1? We built this:
User: "What's the conversion rate?"
System: Found "conversion_rate"
But then this happened:
User: "How's my Naver traffic?"
System: Found "traffic_by_channel"
But which channel? Naver? Google? Facebook?
The gap: Our system understood categories but couldn't navigate their subcategories.
💡 The Solution in 3 Moves
Think of it like a chess game. We need three moves to checkmate:
Move 1: SEARCH → Cast a wide net (50 candidates)
Move 2: CLASSIFY → Filter intelligently (3 names → 12 nodes)
Move 3: REFINE → Validate relationships (final 2 metrics)
Let's see each move in action.
Move 1: The Setup (Vector Search)
The Challenge: Metrics have hierarchy
traffic_by_channel/ ← Parent (1-depth)
├── naver ← Child (2-depth)
├── google
├── facebook
└── direct_traffic
The Strategy: Use TWO search indices
| Index | Purpose | Top N |
|---|---|---|
| 1-depth | Find categories | 10 |
| 2-depth | Find specific values | 40 |
Why 40 vs 10? Searching for a 2-depth metric is like looking up "John" in a phone book with a thousand Johns: the entries all look alike, so we need a much wider net of candidates.
Example Result:
Query: "How's my Naver traffic?"
1-depth finds: traffic_by_channel, site_duration, bounce_rate...
2-depth finds: naver, google, facebook, traffic_count...
Total: 50 candidates
Key Insight: Don't search for "traffic_by_channel|naver" directly. Search for each piece separately, then connect them.
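Under the hood, Move 1 is just two searches merged into one candidate list. A minimal sketch, where the vectorStore interface and index names are assumptions for illustration:

// Hypothetical dual-index search: a narrow net for categories, a wide net for values.
List<String> searchCandidates(String query) {
    List<String> categories = vectorStore.search("metric-1depth", query, 10); // parent categories
    List<String> values     = vectorStore.search("metric-2depth", query, 40); // specific children

    List<String> candidates = new ArrayList<>(categories);
    candidates.addAll(values);
    return candidates; // ~50 candidates, connected to each other in Moves 2 and 3
}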
Move 2: The Play (Smart Classification)
This move has 3 sub-plays. Watch closely.
Sub-Play 2.1: Remove the Noise 🧹
Problem: 50 candidates include junk
For "Naver traffic":
naver ← Relevant
traffic_count ← Maybe relevant?
instagram ← Wrong channel
bounce_rate ← Not about traffic sources
Solution: Ask GPT (Structured Outputs)
// Force GPT to return a clean JSON array
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "metric_filter",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "metrics": {
          "type": "array",
          "items": {"type": "string"}
        }
      },
      "required": ["metrics"],
      "additionalProperties": false
    }
  }
}
Output:
{
"metrics": ["naver", "direct_traffic", "traffic_count"]
}
Result: 50 → 3 ✂️
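Here is a minimal sketch of how that first filter call could be wired up with Spring's RestClient. The model name, prompt, class and parameter names are illustrative assumptions; only the endpoint and the response_format shape come from the Chat Completions API:

import java.util.List;
import java.util.Map;

import org.springframework.http.MediaType;
import org.springframework.web.client.RestClient;

class GptFilterSketch {

    // Sends the query plus the 50 candidate names and forces a {"metrics": [...]} reply.
    String filterCandidates(String apiKey, String query, List<String> candidates) {
        Map<String, Object> schema = Map.of(
            "type", "object",
            "properties", Map.of(
                "metrics", Map.of("type", "array", "items", Map.of("type", "string"))),
            "required", List.of("metrics"),
            "additionalProperties", false);

        Map<String, Object> body = Map.of(
            "model", "gpt-4o-mini",                       // assumed model
            "messages", List.of(
                Map.of("role", "system", "content",
                       "Return only the metric names relevant to the user's question."),
                Map.of("role", "user", "content", query + "\nCandidates: " + candidates)),
            "response_format", Map.of(
                "type", "json_schema",
                "json_schema", Map.of("name", "metric_filter", "strict", true, "schema", schema)));

        // The filtered names come back as JSON in choices[0].message.content.
        return RestClient.create()
            .post()
            .uri("https://api.openai.com/v1/chat/completions")
            .header("Authorization", "Bearer " + apiKey)
            .contentType(MediaType.APPLICATION_JSON)
            .body(body)
            .retrieve()
            .body(String.class);
    }
}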
Sub-Play 2.2: From Names to Nodes
Problem: "naver" is just a string. We need context.
What we need to know:
- Is this advertiser's data or industry average?
- What are the parent metrics?
- What other "naver" nodes exist?
Solution: MetricForest lookup
"naver" in nameIndex → [
"traffic_by_channel|naver|advertiser",
"traffic_by_channel|naver|industry",
"channel_revenue|naver|advertiser",
"channel_revenue|naver|industry",
"first_purchase|naver|advertiser",
"repeat_purchase|naver|advertiser"
]
The Magic: O(1) lookup using HashMap
// Without nameIndex: O(N) scan through all nodes
for (MetricNode node : allNodes) {
    if (node.name.equals("naver")) { ... } // Slow!
}
// With nameIndex: O(1) direct access
List<MetricNode> nodes = nameIndex.get("naver"); // Fast!
Result: 3 names → 12 nodes (each name maps to multiple nodes)
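The nameIndex itself can be as simple as a HashMap built once when the forest is loaded. A rough sketch; the node fields are assumptions inferred from the keys shown above:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MetricNode {
    String key;                                   // e.g. "traffic_by_channel|naver|advertiser"
    String name;                                  // e.g. "naver"
    int depth;                                    // 1 = category, 2 = specific value
    String domain;                                // "advertiser" or "industry"
    List<MetricNode> parents = new ArrayList<>();
    List<MetricNode> children = new ArrayList<>();
}

class MetricForest {
    // name → every node carrying that name, built once at load time
    private final Map<String, List<MetricNode>> nameIndex = new HashMap<>();

    void add(MetricNode node) {
        nameIndex.computeIfAbsent(node.name, k -> new ArrayList<>()).add(node);
    }

    // 3 names in, 12 nodes out: each name fans out to all of its contexts
    List<MetricNode> findByNames(List<String> names) {
        List<MetricNode> result = new ArrayList<>();
        for (String name : names) {
            result.addAll(nameIndex.getOrDefault(name, List.of()));
        }
        return result;
    }
}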
Sub-Play 2.3: Pick Your Side
Problem: Same metric, two meanings
traffic_by_channel|naver|advertiser → "Our Naver traffic"
traffic_by_channel|naver|industry → "Industry average"
Question: Which one does the user want?
Solution: Let GPT decide
Query Analysis:
"How's MY Naver traffic?" → Advertiser
"Industry average?" → Industry
"Compare to industry" → Both
Implementation:
String scope = classifyScope(query);
if ("common".equals(scope)) {
    return List.of("advertiser", "industry"); // Return both
} else {
    return List.of(scope); // Return one
}
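classifyScope can be one more small GPT call constrained to three answers. A sketch, where gptFilter.ask is an assumed helper:

// Illustrative: force the model to pick exactly one scope for the query.
String classifyScope(String query) {
    String answer = gptFilter.ask(
        "Does this question ask about the advertiser's own data, the industry average, "
        + "or a comparison of both? Answer with one word: advertiser, industry, or common.\n"
        + "Question: " + query);
    return Set.of("advertiser", "industry", "common").contains(answer) ? answer : "common";
}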
Result: 12 nodes → Filter to advertiser domain only
Move 2 Summary:
- Started with: 50 candidates
- Ended with: 12 nodes, advertiser domain confirmed
- But we're not done yet...
Move 3: Checkmate (Relationship Validation)
Here's where it gets interesting. We have 12 nodes:
✓ traffic_by_channel|naver|advertiser
✓ channel_revenue|naver|advertiser ← Wrong parent!
✓ traffic_by_channel|direct_traffic|advertiser
✓ new_members|traffic_count|advertiser ← Wrong parent!
... 8 more
Problem: Not all are correct. "channel_revenue|naver" isn't about traffic.
Solution: 3-step validation
Step 3.1: Collect the Neighbors
Rule:
- If 1-depth node → Get children
- If 2-depth node → Get parents
Why? To understand relationships.
Our 12 nodes are all 2-depth
→ Collect their parents
→ Get 6 unique parent names:
1. traffic_by_channel ← Traffic related ✓
2. channel_revenue ← Revenue, not traffic ✗
3. first_purchase ← Purchase, not traffic ✗
4. repeat_purchase ← Purchase, not traffic ✗
5. new_members ← Not about channels ✗
6. existing_members ← Not about channels ✗
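A minimal sketch of the collection rule, reusing the node fields from the MetricForest sketch above:

// 1-depth nodes contribute their children; 2-depth nodes contribute their parents.
Set<String> collectNeighbors(List<MetricNode> nodes) {
    Set<String> neighbors = new LinkedHashSet<>(); // dedupe, keep insertion order
    for (MetricNode node : nodes) {
        List<MetricNode> related = (node.depth == 1) ? node.children : node.parents;
        for (MetricNode r : related) {
            neighbors.add(r.name);
        }
    }
    return neighbors; // here: 12 nodes → 6 unique parent names
}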
Step 3.2: Filter Parents with GPT
Ask GPT: "Which parents are relevant to 'Naver traffic'?"
Input: 6 parent names
GPT thinking:
- traffic_by_channel? YES → About traffic AND channels
- channel_revenue? NO → About revenue, not traffic
- first_purchase? NO → About purchases
- new_members? NO → Not related to Naver
Output: ["traffic_by_channel"]
This is CRITICAL: Only 1 parent passed! This means:
- Keep: traffic_by_channel|naver
- Remove: channel_revenue|naver
- Remove: new_members|traffic_count
Step 3.3: Final Validation
Now apply 3 rules:
Rule 1: Leaf nodes (no children) → Accept immediately
if (node.depth == 1 && node.children.isEmpty()) {
return node; // Already final
}
Rule 2: Parent nodes (has children) → Return matching children
if (node.depth == 1 && !node.children.isEmpty()) {
    return node.children.stream()
        .filter(child -> secondResult.contains(child.name))
        .toList();
}
Rule 3: Child nodes → Validate parent exists in secondResult
if (node.depth == 2) {
    return node.parents.stream()
        .anyMatch(parent -> secondResult.contains(parent.name));
}
Applying to our 12 nodes:
Check: traffic_by_channel|naver|advertiser
→ Parent "traffic_by_channel" in secondResult?
→ ACCEPT
Check: channel_revenue|naver|advertiser
→ Parent "channel_revenue" in secondResult?
→ REJECT
Check: new_members|traffic_count|advertiser
→ Parent "new_members" in secondResult?
→ REJECT
Final Result:
[
"traffic_by_channel|naver|advertiser",
"traffic_by_channel|direct_traffic|advertiser"
]
Checkmate!
The Complete Game Replay
Let's watch the entire sequence:
Query: "How's my Naver traffic?"
Move 1: SEARCH
├─ 1-depth index → 10 results
├─ 2-depth index → 40 results
└─ Total: 50 candidates
Move 2: CLASSIFY
├─ 2.1: GPT Filter → 3 names
├─ 2.2: Node Lookup → 12 nodes
└─ 2.3: Domain Filter → ["advertiser"]
Move 3: REFINE
├─ 3.1: Collect neighbors → 6 parent names
├─ 3.2: GPT Filter → 1 parent name
└─ 3.3: Validate → 2 final metrics
RESULT:
✓ traffic_by_channel|naver|advertiser
✓ traffic_by_channel|direct_traffic|advertiser
Time taken: < 1 second
Cost per query: ~$0.002
Before vs After
Part 1 (Basic Search)
"What's the conversion rate?"
→ conversion_rate
"How's my Naver traffic?"
→ Found "traffic_by_channel" but couldn't find "naver"
Accuracy: 95% for flat metrics only
Part 2 (Hierarchical Search)
"What's the conversion rate?"
→ conversion_rate
"How's my Naver traffic?"
→ traffic_by_channel|naver|advertiser
→ traffic_by_channel|direct_traffic|advertiser
"Compare my Google traffic to industry"
→ traffic_by_channel|google|advertiser
→ traffic_by_channel|google|industry
Accuracy: 98% for all queries including hierarchical
What I Learned
1. Use AI Where It Shines
Good: Semantic understanding
// Is "revenue" related to "sales"? → Ask GPT
// Is "naver" relevant to "traffic"? → Ask GPT
Bad: Deterministic logic (don't spend a GPT call on this)
// Is this node depth 1 or 2? → Just check node.depth
// Does parent exist? → Just check hashmap
2. Two-Stage Filtering is Magic
One GPT call on 50 candidates → 70% accuracy
Two GPT calls (50 candidates → 3 names, then 6 parents → 1) → 98% accuracy
The second filter on a small, focused set is what makes it work.
3. Data Structure = Foundation
Without MetricForest:
- Can't collect neighbors
- Can't validate relationships
- Can't distinguish contexts
The tree structure makes everything else possible.
4. Filter Domain LATE, Not Early
Wrong approach:
// Filter domain first
nodes = filterByDomain(nodes, "advertiser");
// Now we can't see that both domains existed!
Right approach:
// Let GPT see both domains
nodes = getAllNodes();
domain = askGPT("Which domain does the user want?");
// Now filter
nodes = filterByDomain(nodes, domain);
This enables "compare" queries that need both domains.
Real-World Code
Here's the orchestration service that ties it all together:
import java.util.List;

import org.springframework.stereotype.Service;

import lombok.RequiredArgsConstructor;

@Service
@RequiredArgsConstructor
public class MetricSearchService {
private final VectorSearchService vectorSearch;
private final StructuredOutputsService gptFilter;
private final MetricForest metricForest;
private final DomainClassifier domainClassifier;
public List<String> search(String query) {
// Move 1: Search
var candidates = vectorSearch.search(query, 50);
// Move 2.1: First filter
var names = gptFilter.filter(query, candidates);
// Move 2.2: Resolve to nodes
var nodes = metricForest.findByNames(names);
// Move 2.3: Decide domain
var domain = domainClassifier.classify(query, nodes);
// Move 3.1-3.2: Collect and filter neighbors
var neighbors = collectNeighbors(nodes);
var validParents = gptFilter.filter(query, neighbors);
// Move 3.3: Final validation
return validate(nodes, domain, validParents);
}
}
Clean, simple, effective.
Try These Queries
The system now handles:
Flat metrics (from Part 1)
"What's the conversion rate?"
"Show order count"
Hierarchical metrics (NEW!)
"My Naver traffic?"
"Google revenue?"
"Facebook first purchase rate?"
Comparisons (NEW!)
"Compare my Naver traffic to industry average"
Category queries (NEW!)
"Show all channel traffic"
What's Next?
Current system handles:
- Flat metrics
- Hierarchical metrics
- Domain disambiguation
- Comparison queries
Coming soon:
- Time ranges ("last month's Naver traffic")
- Aggregations ("total revenue by channel")
- Conversational follow-ups
- Multi-metric queries
Need Help Implementing RAG?
I help companies integrate AI systems with their existing Spring Boot infrastructure.
Specializing in:
- Spring Boot + OpenAI integration
- Custom RAG pipelines
- E-commerce analytics systems
📧 Contact: junyoungmoon9857@gmail.com
Tags: #rag #hierarchical-search #spring-boot #openai #system-design #ai-engineering