
JunYoungMoon
Our RAG system still failed on hierarchical metrics — Part 2

How I Built a RAG System That Actually Understands Business Metrics (Part 2: Hierarchical Search)

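TL;DR: Part 1 could resolve a flat metric such as "total_revenue" but failed on a hierarchical query such as "Naver traffic". This article shows how I fixed that with a tree-based metric system and 2-stage GPT filtering. Result: accuracy went from 95% (flat metrics only) to 98% (all queries, including hierarchical ones).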


The Real Problem

Remember Part 1? We built this:

User: "What's the conversion rate?"
System: Found "conversion_rate"

But then this happened:

User: "How's my Naver traffic?"
System: Found "traffic_by_channel"
        But which channel? Naver? Google? Facebook?

The gap: Our system understood categories but couldn't navigate their subcategories.


💡 The Solution in 3 Moves

Think of it like a chess game. We need three moves to checkmate:

Move 1: SEARCH      → Cast a wide net (50 candidates)
Move 2: CLASSIFY    → Filter intelligently (3 names → 12 nodes)
Move 3: REFINE      → Validate relationships (final 2 metrics)

Let's see each move in action.


Move 1: The Setup (Vector Search)

The Challenge: Metrics have hierarchy

traffic_by_channel/           ← Parent (1-depth)
├── naver                     ← Child (2-depth)
├── google
├── facebook
└── direct_traffic

The Strategy: Use TWO search indices

| Index | Purpose | Top N |
|---|---|---|
| 1-depth | Find categories | 10 |
| 2-depth | Find specific values | 40 |

Why 40 vs 10? 2-depth metrics are like finding "John" in a phone book with 1000 Johns. We need more candidates.

Example Result:

Query: "How's my Naver traffic?"

1-depth finds: traffic_by_channel, site_duration, bounce_rate...
2-depth finds: naver, google, facebook, traffic_count...

Total: 50 candidates

Key Insight: Don't search for "traffic_by_channel|naver" directly. Search for each piece separately, then connect them.
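
As a rough illustration of "search each piece separately, then connect them", here is a minimal sketch of merging the two result lists. It assumes each index can be reduced to a map of metric name to similarity score; the class name `TwoIndexSearchSketch`, the helper names, and the scores are all hypothetical stand-ins for a real vector store.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: each "index" is a map of metric name -> similarity score.
// A real system would obtain these scores from a vector store.
class TwoIndexSearchSketch {

    // Return the n highest-scoring names, best first.
    static List<String> topN(Map<String, Double> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Query both indices separately, then merge while dropping duplicate names.
    static List<String> candidates(Map<String, Double> depth1, Map<String, Double> depth2) {
        Set<String> merged = new LinkedHashSet<>(topN(depth1, 10)); // categories
        merged.addAll(topN(depth2, 40));                            // specific values
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Made-up scores for the query "How's my Naver traffic?"
        Map<String, Double> depth1 = Map.of("traffic_by_channel", 0.82, "bounce_rate", 0.41);
        Map<String, Double> depth2 = Map.of("naver", 0.91, "google", 0.55, "traffic_count", 0.60);
        System.out.println(candidates(depth1, depth2));
    }
}
```

Keeping the two top-N limits separate is what lets the 2-depth index return more candidates than the 1-depth index without one crowding out the other.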


Move 2: The Play (Smart Classification)

This move has 3 sub-plays. Watch closely.

Sub-Play 2.1: Remove the Noise 🧹

Problem: 50 candidates include junk

For "Naver traffic":
naver          ← Relevant
traffic_count  ← Maybe relevant?
instagram      ← Wrong channel
bounce_rate    ← Not about traffic sources

Solution: Ask GPT (Structured Outputs)

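// Force GPT to return a JSON object that wraps a clean array of names.
// The schema sits under "json_schema" with a name; "metric_filter" is a placeholder name.
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "metric_filter",
    "strict": true,
    "schema": {
      "type": "object",
      "properties": {
        "metrics": {
          "type": "array",
          "items": {"type": "string"}
        }
      },
      "required": ["metrics"],
      "additionalProperties": false
    }
  }
}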

Output:

{
  "metrics": ["naver", "direct_traffic", "traffic_count"]
}

Result: 50 → 3 ✂️
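
A schema like the one above fixes the shape of the response (an array of strings), not which strings appear in it. One cautious follow-up, sketched here under the assumption that the candidate list is still in hand, is to drop any returned name that was never offered. The class and method names are hypothetical and not part of the original service.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical guard applied to the parsed "metrics" array from the GPT response.
class GptFilterGuardSketch {

    // Keep only names that were among the candidates, preserving GPT's order.
    static List<String> keepKnown(List<String> candidates, List<String> gptNames) {
        Set<String> allowed = new HashSet<>(candidates);
        return gptNames.stream()
                .filter(allowed::contains)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("naver", "google", "direct_traffic", "traffic_count");
        // "kakao" stands in for a name the model made up.
        List<String> gptNames = List.of("naver", "kakao", "direct_traffic", "traffic_count");
        System.out.println(keepKnown(candidates, gptNames));
    }
}
```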


Sub-Play 2.2: From Names to Nodes

Problem: "naver" is just a string. We need context.

What we need to know:

  • Is this advertiser's data or industry average?
  • What are the parent metrics?
  • What other "naver" nodes exist?

Solution: MetricForest lookup

"naver" in nameIndex → [
  "traffic_by_channel|naver|advertiser",
  "traffic_by_channel|naver|industry",
  "channel_revenue|naver|advertiser",
  "channel_revenue|naver|industry",
  "first_purchase|naver|advertiser",
  "repeat_purchase|naver|advertiser"
]

The Magic: O(1) lookup using HashMap

// Without nameIndex: O(N) scan through all nodes
for (MetricNode node : allNodes) {
  if ("naver".equals(node.name)) { ... }  // Slow! (compare Strings with equals(), not ==)
}

// With nameIndex: O(1) average-case direct access
List<MetricNode> nodes = nameIndex.get("naver");  // Fast!

Result: 3 names → 12 nodes (each name maps to multiple nodes)
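
The real MetricForest is not shown in this article, so the following is only a minimal sketch of how such a name index could be built. The `MetricNode` fields mirror the `parent|name|domain` keys above; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a name -> nodes index; not the actual MetricForest.
class MetricForestSketch {

    static final class MetricNode {
        final String parent;
        final String name;
        final String domain;

        MetricNode(String parent, String name, String domain) {
            this.parent = parent;
            this.name = name;
            this.domain = domain;
        }

        // Rebuild the "parent|name|domain" key format used in this article.
        String key() {
            return parent + "|" + name + "|" + domain;
        }
    }

    private final Map<String, List<MetricNode>> nameIndex = new HashMap<>();

    // Register a node under its name; several nodes may share one name.
    void add(MetricNode node) {
        nameIndex.computeIfAbsent(node.name, k -> new ArrayList<>()).add(node);
    }

    // Average-case O(1) lookup; unknown names yield an empty list instead of null.
    List<MetricNode> findByName(String name) {
        return nameIndex.getOrDefault(name, List.of());
    }

    public static void main(String[] args) {
        MetricForestSketch forest = new MetricForestSketch();
        forest.add(new MetricNode("traffic_by_channel", "naver", "advertiser"));
        forest.add(new MetricNode("channel_revenue", "naver", "advertiser"));
        forest.add(new MetricNode("traffic_by_channel", "google", "advertiser"));
        for (MetricNode node : forest.findByName("naver")) {
            System.out.println(node.key());
        }
    }
}
```

Returning an empty list for unknown names keeps the caller free of null checks when GPT returns a name that has no node.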


Sub-Play 2.3: Pick Your Side

Problem: Same metric, two meanings

traffic_by_channel|naver|advertiser  → "Our Naver traffic"
traffic_by_channel|naver|industry    → "Industry average"

Question: Which one does the user want?

Solution: Let GPT decide

Query Analysis:
"How's MY Naver traffic?"     → Advertiser
"Industry average?"           → Industry  
"Compare to industry"         → Both

Implementation:

String scope = classifyScope(query);  // GPT decides the scope of the query

if ("common".equals(scope)) {
    return List.of("advertiser", "industry");  // "common" means both: return both
} else {
    return List.of(scope);  // Return one
}

Result: 12 nodes → Filter to advertiser domain only
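
Once the scope is known, the filtering itself is plain string work. The sketch below assumes nodes are represented by their `parent|name|domain` keys and that the scope values are "advertiser", "industry", or "common"; `ScopeFilterSketch` and its methods are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: turn a classified scope into domains, then filter node keys.
class ScopeFilterSketch {

    // "common" expands to both domains; any other scope is used as-is.
    static List<String> domainsFor(String scope) {
        return "common".equals(scope)
                ? List.of("advertiser", "industry")
                : List.of(scope);
    }

    // Keep keys whose last "|"-separated segment is one of the wanted domains.
    static List<String> filterByDomain(List<String> nodeKeys, List<String> domains) {
        return nodeKeys.stream()
                .filter(key -> domains.contains(key.substring(key.lastIndexOf('|') + 1)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
                "traffic_by_channel|naver|advertiser",
                "traffic_by_channel|naver|industry");
        System.out.println(filterByDomain(keys, domainsFor("advertiser")));
        System.out.println(filterByDomain(keys, domainsFor("common")));
    }
}
```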

Move 2 Summary:

  • Started with: 50 candidates
  • Ended with: 12 nodes, advertiser domain confirmed
  • But we're not done yet...

Move 3: Checkmate (Relationship Validation)

Here's where it gets interesting. We have 12 nodes:

✓ traffic_by_channel|naver|advertiser
✓ channel_revenue|naver|advertiser        ← Wrong parent!
✓ traffic_by_channel|direct_traffic|advertiser
✓ new_members|traffic_count|advertiser   ← Wrong parent!
... 8 more

Problem: Not all are correct. "channel_revenue|naver" isn't about traffic.

Solution: 3-step validation


Step 3.1: Collect the Neighbors

Rule:

  • If 1-depth node → Get children
  • If 2-depth node → Get parents

Why? To understand relationships.

Our 12 nodes are all 2-depth
→ Collect their parents
→ Get 6 unique parent names:

1. traffic_by_channel    ← Traffic related ✓
2. channel_revenue       ← Revenue, not traffic ✗
3. first_purchase        ← Purchase, not traffic ✗
4. repeat_purchase       ← Purchase, not traffic ✗
5. new_members          ← Not about channels ✗
6. existing_members     ← Not about channels ✗
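
The neighbor rule above can be sketched in a few lines. This assumes a simplified node that stores its depth plus the names of its parents and children; the `Node` shape and `collectNeighbors` signature are illustrative, not the article's actual classes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Step 3.1: gather unique neighbor names for a set of nodes.
class NeighborSketch {

    static final class Node {
        final String name;
        final int depth;              // 1 = category, 2 = value under a category
        final List<String> parents;   // parent names (used for 2-depth nodes)
        final List<String> children;  // child names (used for 1-depth nodes)

        Node(String name, int depth, List<String> parents, List<String> children) {
            this.name = name;
            this.depth = depth;
            this.parents = parents;
            this.children = children;
        }
    }

    // 1-depth nodes contribute their children, 2-depth nodes their parents.
    // A LinkedHashSet removes duplicates while keeping first-seen order.
    static Set<String> collectNeighbors(List<Node> nodes) {
        Set<String> names = new LinkedHashSet<>();
        for (Node node : nodes) {
            names.addAll(node.depth == 1 ? node.children : node.parents);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("naver", 2, List.of("traffic_by_channel"), List.of()),
                new Node("naver", 2, List.of("channel_revenue"), List.of()),
                new Node("direct_traffic", 2, List.of("traffic_by_channel"), List.of()));
        System.out.println(collectNeighbors(nodes));
    }
}
```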

Step 3.2: Filter Parents with GPT

Ask GPT: "Which parents are relevant to 'Naver traffic'?"

Input: 6 parent names
GPT thinking:
  - traffic_by_channel? YES → About traffic AND channels
  - channel_revenue? NO → About revenue, not traffic
  - first_purchase? NO → About purchases
  - repeat_purchase? NO → About purchases
  - new_members? NO → Not related to Naver
  - existing_members? NO → Not related to Naver

Output: ["traffic_by_channel"]

This is CRITICAL: Only 1 parent passed! This means:

  • Keep: traffic_by_channel|naver
  • Remove: channel_revenue|naver
  • Remove: new_members|traffic_count

Step 3.3: Final Validation

Now apply 3 rules:

Rule 1: Leaf nodes (no children) → Accept immediately

if (node.depth == 1 && node.children.isEmpty()) {
    return node;  // Already final
}

Rule 2: Parent nodes (has children) → Return matching children

// secondResult: the names GPT kept in Step 3.2
if (node.depth == 1 && !node.children.isEmpty()) {
    return node.children.stream()
        .filter(child -> secondResult.contains(child.name))
        .toList();
}

Rule 3: Child nodes → Validate parent exists in secondResult

if (node.depth == 2) {
    return node.parents.stream()
        .anyMatch(parent -> secondResult.contains(parent.name));
}

Applying to our 12 nodes:

Check: traffic_by_channel|naver|advertiser
  → Parent "traffic_by_channel" in secondResult? 
  → ACCEPT

Check: channel_revenue|naver|advertiser
  → Parent "channel_revenue" in secondResult? 
  → REJECT

Check: new_members|traffic_count|advertiser
  → Parent "new_members" in secondResult? 
  → REJECT

Final Result:

[
  "traffic_by_channel|naver|advertiser",
  "traffic_by_channel|direct_traffic|advertiser"
]

Checkmate!
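
To make the three rules concrete, here is a minimal sketch that applies them in one pass and returns the accepted node keys. It assumes a simplified `Node` with a single parent name and that `secondResult` is the set of names GPT kept in Step 3.2; the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Step 3.3; not the article's actual validate(...) method.
class ValidationSketch {

    static final class Node {
        final String key;       // e.g. "traffic_by_channel|naver|advertiser"
        final String name;
        final int depth;        // 1 = category, 2 = value under a category
        final String parent;    // parent name, or null for 1-depth nodes
        final List<Node> children = new ArrayList<>();

        Node(String key, String name, int depth, String parent) {
            this.key = key;
            this.name = name;
            this.depth = depth;
            this.parent = parent;
        }
    }

    static List<String> validate(List<Node> nodes, Set<String> secondResult) {
        List<String> accepted = new ArrayList<>();
        for (Node node : nodes) {
            if (node.depth == 1 && node.children.isEmpty()) {
                accepted.add(node.key);                 // Rule 1: flat metric, already final
            } else if (node.depth == 1) {
                for (Node child : node.children) {      // Rule 2: keep matching children
                    if (secondResult.contains(child.name)) {
                        accepted.add(child.key);
                    }
                }
            } else if (node.parent != null && secondResult.contains(node.parent)) {
                accepted.add(node.key);                 // Rule 3: parent survived the filter
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("traffic_by_channel|naver|advertiser", "naver", 2, "traffic_by_channel"),
                new Node("channel_revenue|naver|advertiser", "naver", 2, "channel_revenue"),
                new Node("new_members|traffic_count|advertiser", "traffic_count", 2, "new_members"),
                new Node("traffic_by_channel|direct_traffic|advertiser", "direct_traffic", 2,
                        "traffic_by_channel"));
        System.out.println(validate(nodes, Set.of("traffic_by_channel")));
    }
}
```

The null check on `parent` matters because immutable sets created with `Set.of` throw a `NullPointerException` on `contains(null)`.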


The Complete Game Replay

Let's watch the entire sequence:

Query: "How's my Naver traffic?"

   Move 1: SEARCH
   ├─ 1-depth index → 10 results
   ├─ 2-depth index → 40 results
   └─ Total: 50 candidates

   Move 2: CLASSIFY
   ├─ 2.1: GPT Filter → 3 names
   ├─ 2.2: Node Lookup → 12 nodes
   └─ 2.3: Domain Filter → ["advertiser"]

   Move 3: REFINE
   ├─ 3.1: Collect neighbors → 6 parent names
   ├─ 3.2: GPT Filter → 1 parent name
   └─ 3.3: Validate → 2 final metrics

   RESULT:
   ✓ traffic_by_channel|naver|advertiser
   ✓ traffic_by_channel|direct_traffic|advertiser

Time taken: < 1 second

Cost per query: ~$0.002


Before vs After

Part 1 (Basic Search)

 "What's the conversion rate?"
 → conversion_rate

 "How's my Naver traffic?"
 → Found "traffic_by_channel" but couldn't find "naver"

Accuracy: 95% for flat metrics only


Part 2 (Hierarchical Search)

   "What's the conversion rate?"
   → conversion_rate

   "How's my Naver traffic?"
   → traffic_by_channel|naver|advertiser
   → traffic_by_channel|direct_traffic|advertiser

   "Compare my Google traffic to industry"
   → traffic_by_channel|google|advertiser
   → traffic_by_channel|google|industry

Accuracy: 98% for all queries including hierarchical


What I Learned

1. Use AI Where It Shines

Good: Semantic understanding

// Is "revenue" related to "sales"? → Ask GPT
// Is "naver" relevant to "traffic"? → Ask GPT

Bad: Deterministic logic

// Is this node depth 1 or 2? → Just check node.depth
// Does parent exist? → Just check hashmap

2. Two-Stage Filtering is Magic

One GPT call on 50 candidates → 70% accuracy
Two GPT calls (50 → 6 → final) → 98% accuracy

The second filter on a small, focused set is what makes it work.

3. Data Structure = Foundation

Without MetricForest:

  • Can't collect neighbors
  • Can't validate relationships
  • Can't distinguish contexts

The tree structure makes everything else possible.

4. Filter Domain LATE, Not Early

Wrong approach:

// Filter domain first
nodes = filterByDomain(nodes, "advertiser");
// Now we can't see that both domains existed!

Right approach:

// Let GPT see both domains
nodes = getAllNodes();
domain = askGPT("Which domain does user want?");
// Now filter
nodes = filterByDomain(nodes, domain);

This enables "compare" queries that need both domains.


Real-World Code

Here's the orchestration service that ties it all together:

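import java.util.List;

import lombok.RequiredArgsConstructor;
import org.springframework.stereotype.Service;

// collectNeighbors(...) and validate(...) are helper methods of this class, not shown here.
@Service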
@RequiredArgsConstructor
public class MetricSearchService {

    private final VectorSearchService vectorSearch;
    private final StructuredOutputsService gptFilter;
    private final MetricForest metricForest;
    private final DomainClassifier domainClassifier;

    public List<String> search(String query) {
        // Move 1: Search
        var candidates = vectorSearch.search(query, 50);

        // Move 2.1: First filter
        var names = gptFilter.filter(query, candidates);

        // Move 2.2: Resolve to nodes
        var nodes = metricForest.findByNames(names);

        // Move 2.3: Decide domain
        var domain = domainClassifier.classify(query, nodes);

        // Move 3.1-3.2: Collect and filter neighbors
        var neighbors = collectNeighbors(nodes);
        var validParents = gptFilter.filter(query, neighbors);

        // Move 3.3: Final validation
        return validate(nodes, domain, validParents);
    }
}

Clean, simple, effective.


Try These Queries

The system now handles:

Flat metrics (from Part 1)

"What's the conversion rate?"
"Show order count"

Hierarchical metrics (NEW!)

"My Naver traffic?"
"Google revenue?"
"Facebook first purchase rate?"

Comparisons (NEW!)

"Compare my Naver traffic to industry average"

Category queries (NEW!)

"Show all channel traffic"

What's Next?

Current system handles:

  • Flat metrics
  • Hierarchical metrics
  • Domain disambiguation
  • Comparison queries

Coming soon:

  • Time ranges ("last month's Naver traffic")
  • Aggregations ("total revenue by channel")
  • Conversational follow-ups
  • Multi-metric queries

Need Help Implementing RAG?

I help companies integrate AI systems with their existing Spring Boot infrastructure.

Specializing in:

  • Spring Boot + OpenAI integration
  • Custom RAG pipelines
  • E-commerce analytics systems

📧 Contact: [junyoungmoon9857@gmail.com]


Tags: #rag #hierarchical-search #spring-boot #openai #system-design #ai-engineering
