
How I Built a RAG System That Actually Understands Business Metrics (Part 1: Foundation)

The Problem: When AI Can't Find Your Data

Picture this: Your marketing team asks your AI assistant, "How's this month's revenue?"

The AI responds: "I don't have access to that data."

Frustrating, right? But here's the real problem - you DO have the data. It's sitting in your database, perfectly organized, just waiting to be used. The AI just doesn't know which metrics to look for.

This is the story of how I solved that problem by building a Retrieval-Augmented Generation (RAG) system that bridges the gap between natural language questions and structured business metrics.

Why RAG? Understanding LLM Limitations

Large Language Models like GPT have three fundamental limitations:

  1. Time Constraint: Trained only up to a specific cutoff date
  2. Domain Constraint: No knowledge of your company's internal data
  3. Real-time Constraint: Can't access continuously changing data

RAG solves this by combining the AI's language understanding with your real-time data retrieval system.

The Architecture Overview

Here's what we're building:

User Question 
    ↓
[1] Text Embedding (Vector Search)
    ↓
[2] Structured Outputs (GPT Filtering)
    ↓
[3] Database Query
    ↓
Final Answer

This article covers steps 1-2, which form the foundation of intelligent metric retrieval.

Starting Point: Understanding Our Data Structure

Our Spring Batch process generates metrics in this format:

order_count|advertiser
conversion_rate|advertiser
total_revenue|advertiser
campaign_roas|ad_impressions|advertiser
traffic_by_channel|google|industry
traffic_by_channel|facebook|industry

The problem? This flat structure makes semantic search difficult.

Solution: Restructure into hierarchical maps

Step 1 - Group each parent metric with its child metrics:

{
  "order_count": [],
  "total_revenue": [],
  "campaign_roas": [
    "ad_impressions",
    "ad_clicks",
    "ad_spend"
  ],
  "traffic_by_channel": [
    "google",
    "facebook",
    "direct"
  ]
}

Step 2 - Classify each metric by domain (advertiser, industry, or common):

{
  "order_count": "advertiser",
  "conversion_rate": "common",
  "total_revenue": "common",
  "avg_purchase_count": "industry"
}
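
For reference, here is a minimal sketch of that restructuring step. It assumes the flat keys follow the parent|domain or parent|child|domain pattern shown earlier, and that a metric seen under both domains is classified as "common"; MetricRestructurer is an illustrative helper, not the actual Spring Batch code.

import java.util.*;

public class MetricRestructurer {

    // Builds the two maps above from flat keys such as
    // "order_count|advertiser" or "campaign_roas|ad_impressions|advertiser".
    public static void restructure(List<String> flatKeys,
                                   Map<String, List<String>> childrenByParent,
                                   Map<String, String> domainByMetric) {
        for (String key : flatKeys) {
            String[] parts = key.split("\\|");
            String parent = parts[0];
            String domain = parts[parts.length - 1];             // last token: advertiser / industry
            String child = parts.length == 3 ? parts[1] : null;  // optional middle token

            // Step 1 map: parent metric -> list of child metrics
            List<String> children = childrenByParent.computeIfAbsent(parent, k -> new ArrayList<>());
            if (child != null && !children.contains(child)) {
                children.add(child);
            }

            // Step 2 map: a metric seen under both domains is treated as "common"
            String previous = domainByMetric.put(parent, domain);
            if (previous != null && !previous.equals(domain)) {
                domainByMetric.put(parent, "common");
            }
        }
    }
}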

Step 1: Text Embedding for Semantic Search

The Core Concept: Convert text into numerical vectors that capture semantic meaning.

How It Works

When a user asks "How's the revenue?", we need to recognize that it relates to the "total_revenue" metric - even though the words don't match exactly.

Process:

  1. Convert question → 1536-dimensional vector
  2. Convert all metrics → 1536-dimensional vectors
  3. Calculate cosine similarity
  4. Extract top candidates

OpenAI Embedding API

@Service
public class EmbeddingService {

    @Value("${openai.api.key}")
    private String apiKey;

    private final RestTemplate restTemplate = new RestTemplate();

    public double[] getEmbedding(String text) {
        Map<String, Object> requestBody = Map.of(
            "model", "text-embedding-3-small",
            "input", text
        );

        HttpHeaders headers = new HttpHeaders();
        headers.setBearerAuth(apiKey);
        headers.setContentType(MediaType.APPLICATION_JSON);

        HttpEntity<Map<String, Object>> request = new HttpEntity<>(requestBody, headers);
        ResponseEntity<Map> response = restTemplate.postForEntity(
            "https://api.openai.com/v1/embeddings",
            request,
            Map.class
        );

        // Extract embedding array from response
        Map<String, Object> data = (Map<String, Object>) 
            ((List<Object>) response.getBody().get("data")).get(0);
        List<Double> embedding = (List<Double>) data.get("embedding");

        return embedding.stream()
            .mapToDouble(Double::doubleValue)
            .toArray();
    }
}

This service generates embeddings using OpenAI's API.

Cosine Similarity Calculation

public class VectorMath {

    public static double cosineSimilarity(double[] vec1, double[] vec2) {
        double dotProduct = 0.0;
        double norm1 = 0.0;
        double norm2 = 0.0;

        for (int i = 0; i < vec1.length; i++) {
            dotProduct += vec1[i] * vec2[i];
            norm1 += vec1[i] * vec1[i];
            norm2 += vec2[i] * vec2[i];
        }

        return dotProduct / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }
}

This calculates semantic similarity between two vectors.
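
To tie the two classes together, steps 3 and 4 of the process (similarity calculation and extracting the top candidates) might look like the following sketch. MetricRanker and topN are illustrative names; the sketch assumes the EmbeddingService and VectorMath classes above.

import java.util.*;

public class MetricRanker {

    private final EmbeddingService embeddingService;

    public MetricRanker(EmbeddingService embeddingService) {
        this.embeddingService = embeddingService;
    }

    // Steps 3-4: score every metric against the question and keep the best matches.
    public List<String> topCandidates(String question, List<String> metricNames, int topN) {
        double[] questionVector = embeddingService.getEmbedding(question);

        Map<String, Double> scores = new HashMap<>();
        for (String metric : metricNames) {
            double[] metricVector = embeddingService.getEmbedding(metric);
            scores.put(metric, VectorMath.cosineSimilarity(questionVector, metricVector));
        }

        return scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(topN)
            .map(Map.Entry::getKey)
            .toList();
    }
}

In practice the metric vectors would be embedded once and cached (or stored in a vector database) rather than re-requested per question; calling the embedding API for every metric on every request is only for illustration.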

Example Output

Question: "How's this month's revenue?"

Top Matches:
- "total_revenue": 0.89 similarity ✓
- "order_count": 0.72 similarity
- "conversion_rate": 0.45 similarity

Real Example:

Question: "Show me sales performance"

Vector Comparison:
[0.4, 0.1, 0.6, ...] (question)
    vs
[0.3, 0.2, 0.7, ...] (total_revenue) → 89% match
[0.1, 0.5, -0.2, ...] (order_count) → 72% match
[0.2, 0.6, -0.4, ...] (conversion_rate) → 45% match

Result: Extract top 10 candidates

Problem: 10 candidates are still too many for an accurate answer. We need more precise filtering.

Step 2: Structured Outputs for Precise Classification

Why Structured Outputs?

Without Structured Outputs:

AI Response: "You should check total_revenue and maybe order_count as well..."

↓ Unpredictable format, hard to parse

With Structured Outputs:

{
  "metrics": ["total_revenue", "order_count"]
}

↓ Clean array, ready to use

The JSON Schema

private Map<String, Object> buildSchema() {
    return Map.of(
        "type", "object",
        "properties", Map.of(
            "metrics", Map.of(
                "type", "array",
                "items", Map.of("type", "string"),
                "description", "List of relevant metric names"
            )
        ),
        "required", List.of("metrics"),
        "additionalProperties", false
    );
}

This schema forces GPT to return exactly the format we need.

System Prompt Design

private String buildSystemPrompt(List<String> candidateMetrics) {
    return String.format("""
        You are an analytics assistant for e-commerce advertisers.

        ### Critical Rules
        1) If you don't understand the question → return []
        2) If unrelated to advertising data → return []
        3) Extract ONLY metric names, no explanations
        4) Use exact names from the list below

        ### Available Metrics
        %s
        """, String.join(", ", candidateMetrics));
}

Clear instructions ensure consistent results.

Java Implementation

@Service
public class MetricClassificationService {

    private static final Logger log =
        LoggerFactory.getLogger(MetricClassificationService.class);

    @Value("${openai.api.key}")
    private String apiKey;

    private final RestTemplate restTemplate = new RestTemplate();

    public List<String> classifyMetrics(
            String userQuestion, 
            List<String> candidateMetrics) {

        String systemPrompt = buildSystemPrompt(candidateMetrics);

        Map<String, Object> requestBody = Map.of(
            "model", "gpt-4o-mini",
            "messages", List.of(
                Map.of("role", "system", "content", systemPrompt),
                Map.of("role", "user", "content", userQuestion)
            ),
            "response_format", Map.of(
                "type", "json_schema",
                "json_schema", Map.of(
                    "name", "map_metrics",
                    "strict", true,
                    "schema", buildSchema()
                )
            )
        );

        HttpHeaders headers = new HttpHeaders();
        headers.setBearerAuth(apiKey);
        headers.setContentType(MediaType.APPLICATION_JSON);

        HttpEntity<Map<String, Object>> entity = 
            new HttpEntity<>(requestBody, headers);

        ResponseEntity<Map> response = restTemplate.postForEntity(
            "https://api.openai.com/v1/chat/completions",
            entity,
            Map.class
        );

        return parseResponse(response.getBody());
    }

    private List<String> parseResponse(Map<String, Object> response) {
        try {
            Map<String, Object> choice = (Map<String, Object>) 
                ((List<Object>) response.get("choices")).get(0);
            Map<String, Object> message = 
                (Map<String, Object>) choice.get("message");
            String content = (String) message.get("content");

            ObjectMapper mapper = new ObjectMapper();
            Map<String, Object> data = mapper.readValue(content, Map.class);

            return (List<String>) data.get("metrics");
        } catch (Exception e) {
            log.error("Failed to parse structured output", e);
            return Collections.emptyList();
        }
    }
}

This service handles the complete Structured Outputs workflow; buildSystemPrompt() and buildSchema() are the methods shown above.

Real-World Example

Input: "How's this month's revenue?" + 10 candidates from vector search

GPT Processing:
- "total_revenue" : revenue mentioned (selected)
- "order_count" : related to revenue (selected)
- "page_views" : not about revenue (rejected)
- "cart_abandonment" : not about revenue (rejected)

Output: ["total_revenue", "order_count"]

Complete Flow Example

Let's see the entire system in action:

Step 1: Vector Search
User: "How's this month's revenue?"
↓
Top 10 candidates:
- total_revenue (0.89)
- order_count (0.78)
- gross_profit (0.76)
- net_revenue (0.75)
- sales_amount (0.74)
- purchase_count (0.71)
- transaction_value (0.69)
- page_views (0.45)
- bounce_rate (0.32)
- conversion_rate (0.28)

Step 2: GPT Structured Filtering
Input: 10 candidates
↓
GPT Analysis:
"The question asks about 'revenue', so:
- total_revenue ✓ (direct match)
- net_revenue ✓ (revenue metric)
- order_count ✗ (count, not revenue)
- page_views ✗ (engagement, not revenue)"
↓
Output: ["total_revenue", "net_revenue"]

Step 3: Database Query
SELECT total_revenue, net_revenue 
FROM metrics 
WHERE period = 'current_month'
↓
Final Answer: "This month's total revenue is $125,000..."
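
Wiring steps 1 and 2 together (step 3, the database query, comes later in the pipeline) could look roughly like this. MetricRetrievalService is an illustrative name; the sketch assumes the MetricRanker sketch above plus the MetricClassificationService from this article.

import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class MetricRetrievalService {

    private final MetricRanker metricRanker;
    private final MetricClassificationService classificationService;

    public MetricRetrievalService(MetricRanker metricRanker,
                                  MetricClassificationService classificationService) {
        this.metricRanker = metricRanker;
        this.classificationService = classificationService;
    }

    // Step 1: vector search narrows all metrics down to ~10 candidates.
    // Step 2: Structured Outputs picks the exact metrics to query in step 3.
    public List<String> findMetrics(String userQuestion, List<String> allMetricNames) {
        List<String> candidates = metricRanker.topCandidates(userQuestion, allMetricNames, 10);
        return classificationService.classifyMetrics(userQuestion, candidates);
    }
}

The returned list - e.g. ["total_revenue", "net_revenue"] - is what feeds the database query in step 3.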

Results So Far

Before RAG:

User: "How's this month's revenue?"
AI: "I don't have access to that data."

After Part 1:

User: "How's this month's revenue?"
System: Successfully identified ["total_revenue", "net_revenue"]
→ Can now query the correct metrics

Success Rate: 95% accuracy for flat metrics like "total_revenue", "order_count"

Current Limitation

This system works well for simple metrics but fails on hierarchical ones:

User: "How's my Google traffic?"

Current System:
- Finds "traffic_by_channel" ✓
- Can't identify the "google" sub-metric ✗

We need: "traffic_by_channel|google|advertiser"

The Problem: We need to search not just for parent metrics, but also navigate their hierarchical structure to find specific child metrics like "google" under "traffic_by_channel".

What's Next?

In Part 2, I'll show you how I solved this using:

  1. MetricForest Data Structure: A tree-based system for hierarchical metrics
  2. Neighbor Node Exploration: Finding related parent-child metric pairs
  3. Two-Stage GPT Filtering: Precise classification at each hierarchy level

Example of what Part 2 solves:

User: "How's my Google traffic?"
System: "traffic_by_channel|google|advertiser"

User: "Compare my Facebook traffic to industry average"
System: 
- "traffic_by_channel|facebook|advertiser"
- "traffic_by_channel|facebook|industry"

Result: 98% accuracy even for complex hierarchical queries.


Need Help Implementing RAG?

I help companies integrate AI systems with their existing Spring Boot infrastructure.

Specializing in:

  • Spring Boot + OpenAI integration
  • Custom RAG pipelines
  • E-commerce analytics systems

📧 Contact: [junyoungmoon9857@gmail.com]


Tags: #rag #system-design #vectordatabase #backend
