How I Built a RAG System That Actually Understands Business Metrics (Part 1: Foundation)
The Problem: When AI Can't Find Your Data
Picture this: Your marketing team asks your AI assistant, "How's this month's revenue?"
The AI responds: "I don't have access to that data."
Frustrating, right? But here's the real problem: you DO have the data. It's sitting in your database, perfectly organized, just waiting to be used. The AI simply doesn't know which metrics to look for.
This is the story of how I solved that problem by building a Retrieval-Augmented Generation (RAG) system that bridges the gap between natural language questions and structured business metrics.
Why RAG? Understanding LLM Limitations
Large Language Models like GPT have three fundamental limitations:
- Time Constraint: Trained only up to a specific cutoff date
- Domain Constraint: No knowledge of your company's internal data
- Real-time Constraint: Can't access continuously changing data
RAG addresses all three by combining the LLM's language understanding with real-time retrieval from your own data.
The Architecture Overview
Here's what we're building:
User Question
↓
[1] Text Embedding (Vector Search)
↓
[2] Structured Outputs (GPT Filtering)
↓
[3] Database Query
↓
Final Answer
This article covers steps 1-2, which form the foundation of intelligent metric retrieval.
Starting Point: Understanding Our Data Structure
Our Spring Batch process generates metrics in this format:
order_count|advertiser
conversion_rate|advertiser
total_revenue|advertiser
campaign_roas|ad_impressions|advertiser
traffic_by_channel|google|industry
traffic_by_channel|facebook|industry
The problem? This flat structure makes semantic search difficult.
Solution: Restructure into hierarchical maps
Step 1 - Separate metrics by domain and group sub-metrics under each parent (advertiser map shown):
{
"order_count": [],
"total_revenue": [],
"campaign_roas": [
"ad_impressions",
"ad_clicks",
"ad_spend"
],
"traffic_by_channel": [
"google",
"facebook",
"direct"
]
}
Step 2 - Create domain classification:
{
"order_count": "advertiser",
"conversion_rate": "common",
"total_revenue": "common",
"avg_purchase_count": "industry"
}
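The restructuring above can be sketched in plain Java. The class and method names below are my own, and I'm assuming two things the article implies but doesn't state: the flat lines always end with the domain segment, and a metric that appears under more than one domain gets classified as "common".

```java
import java.util.*;

public class MetricRestructurer {

    // Parses flat "metric|domain" or "metric|sub_metric|domain" lines into:
    //   (1) hierarchy: parent metric -> list of sub-metrics
    //   (2) domains:   metric -> "advertiser", "industry", or "common"
    public static void restructure(List<String> flatLines,
                                   Map<String, List<String>> hierarchy,
                                   Map<String, String> domains) {
        for (String line : flatLines) {
            String[] parts = line.split("\\|");
            String metric = parts[0];
            String domain = parts[parts.length - 1]; // assumption: domain is always last
            hierarchy.computeIfAbsent(metric, k -> new ArrayList<>());
            if (parts.length == 3) {
                hierarchy.get(metric).add(parts[1]); // middle segment is a sub-metric
            }
            // Assumption: a metric seen under two different domains becomes "common"
            domains.merge(metric, domain,
                    (prev, next) -> prev.equals(next) ? prev : "common");
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> hierarchy = new LinkedHashMap<>();
        Map<String, String> domains = new LinkedHashMap<>();
        restructure(List.of(
                "order_count|advertiser",
                "campaign_roas|ad_impressions|advertiser",
                "traffic_by_channel|google|industry",
                "traffic_by_channel|facebook|industry"
        ), hierarchy, domains);
        System.out.println(hierarchy);
        System.out.println(domains);
    }
}
```

Running this on the sample lines from earlier yields the two maps shown above: a hierarchy map with sub-metrics nested under their parents, and a flat domain map.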
Step 1: Text Embedding for Semantic Search
The Core Concept: Convert text into numerical vectors that capture semantic meaning.
How It Works
When a user asks "How's the revenue?", we need to find that this relates to the "total_revenue" metric - even though the words don't match exactly.
Process:
- Convert question → 1536-dimensional vector
- Convert all metrics → 1536-dimensional vectors
- Calculate cosine similarity
- Extract top candidates
OpenAI Embedding API
@Service
public class EmbeddingService {

    @Value("${openai.api.key}")
    private String apiKey;

    private final RestTemplate restTemplate;

    public EmbeddingService(RestTemplateBuilder builder) {
        this.restTemplate = builder.build();
    }

    public double[] getEmbedding(String text) {
        Map<String, Object> requestBody = Map.of(
                "model", "text-embedding-3-small",
                "input", text
        );

        HttpHeaders headers = new HttpHeaders();
        headers.setBearerAuth(apiKey);
        headers.setContentType(MediaType.APPLICATION_JSON);

        HttpEntity<Map<String, Object>> request = new HttpEntity<>(requestBody, headers);

        ResponseEntity<Map> response = restTemplate.postForEntity(
                "https://api.openai.com/v1/embeddings",
                request,
                Map.class
        );

        // Extract the embedding array from response.data[0].embedding
        Map<String, Object> data = (Map<String, Object>)
                ((List<Object>) response.getBody().get("data")).get(0);
        List<Double> embedding = (List<Double>) data.get("embedding");

        return embedding.stream()
                .mapToDouble(Double::doubleValue)
                .toArray();
    }
}
This service generates embeddings using OpenAI's API.
Cosine Similarity Calculation
public class VectorMath {

    public static double cosineSimilarity(double[] vec1, double[] vec2) {
        double dotProduct = 0.0;
        double norm1 = 0.0;
        double norm2 = 0.0;

        for (int i = 0; i < vec1.length; i++) {
            dotProduct += vec1[i] * vec2[i];
            norm1 += vec1[i] * vec1[i];
            norm2 += vec2[i] * vec2[i];
        }

        return dotProduct / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }
}
This calculates semantic similarity between two vectors.
Example Output
Question: "How's this month's revenue?"
Top Matches:
- "total_revenue": 0.89 similarity ✓
- "order_count": 0.72 similarity
- "conversion_rate": 0.45 similarity
Real Example:
Question: "Show me sales performance"
Vector Comparison:
[0.4, 0.1, 0.6, ...] (question)
vs
[0.3, 0.2, 0.7, ...] (total_revenue) → 89% match
[0.1, 0.5, -0.2, ...] (order_count) → 72% match
[0.2, 0.6, -0.4, ...] (conversion_rate) → 45% match
Result: Extract top 10 candidates
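Putting the embedding and similarity pieces together, the top-10 extraction might look like the sketch below. `CandidateRanker` is my own name, and the cosine function is inlined so the sketch is self-contained; in the real pipeline the vectors would come from `EmbeddingService`, so the toy low-dimensional vectors in the tests just stand in for the 1536-dimensional ones.

```java
import java.util.*;
import java.util.stream.*;

public class CandidateRanker {

    // Same cosine similarity as VectorMath above, inlined for self-containment.
    static double cosine(double[] a, double[] b) {
        double dot = 0, n1 = 0, n2 = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            n1 += a[i] * a[i];
            n2 += b[i] * b[i];
        }
        return dot / (Math.sqrt(n1) * Math.sqrt(n2));
    }

    // Ranks every metric embedding against the question embedding
    // and keeps the K most similar metric names.
    public static List<String> topK(double[] question,
                                    Map<String, double[]> metricVectors,
                                    int k) {
        return metricVectors.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) ->
                                cosine(question, e.getValue())).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With k = 10 this produces exactly the candidate list shown above, ordered by similarity.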
Problem: 10 candidates is still too many for accurate answers. We need more precise filtering.
Step 2: Structured Outputs for Precise Classification
Why Structured Outputs?
Without Structured Outputs:
AI Response: "You should check total_revenue and maybe order_count as well..."
↓ Unpredictable format, hard to parse
With Structured Outputs:
{
"metrics": ["total_revenue", "order_count"]
}
↓ Clean array, ready to use
The JSON Schema
Map<String, Object> schema = Map.of(
        "type", "object",
        "properties", Map.of(
                "metrics", Map.of(
                        "type", "array",
                        "items", Map.of("type", "string"),
                        "description", "List of relevant metric names"
                )
        ),
        "required", List.of("metrics"),
        "additionalProperties", false
);
This schema forces GPT to return exactly the format we need.
System Prompt Design
String systemPrompt = String.format("""
        You are an analytics assistant for e-commerce advertisers.

        ### Critical Rules
        1) If you don't understand the question → return []
        2) If unrelated to advertising data → return []
        3) Extract ONLY metric names, no explanations
        4) Use exact names from the list below

        ### Available Metrics
        %s
        """, String.join(", ", candidateMetrics));
Clear instructions ensure consistent results.
Java Implementation
@Slf4j
@Service
public class MetricClassificationService {

    @Value("${openai.api.key}")
    private String apiKey;

    private final RestTemplate restTemplate;

    public MetricClassificationService(RestTemplateBuilder builder) {
        this.restTemplate = builder.build();
    }

    public List<String> classifyMetrics(String userQuestion,
                                        List<String> candidateMetrics) {
        String systemPrompt = buildSystemPrompt(candidateMetrics);

        Map<String, Object> requestBody = Map.of(
                "model", "gpt-4o-mini",
                "messages", List.of(
                        Map.of("role", "system", "content", systemPrompt),
                        Map.of("role", "user", "content", userQuestion)
                ),
                "response_format", Map.of(
                        "type", "json_schema",
                        "json_schema", Map.of(
                                "name", "map_metrics",
                                "strict", true,
                                "schema", buildSchema()
                        )
                )
        );

        HttpHeaders headers = new HttpHeaders();
        headers.setBearerAuth(apiKey);
        headers.setContentType(MediaType.APPLICATION_JSON);

        HttpEntity<Map<String, Object>> entity = new HttpEntity<>(requestBody, headers);

        ResponseEntity<Map> response = restTemplate.postForEntity(
                "https://api.openai.com/v1/chat/completions",
                entity,
                Map.class
        );

        return parseResponse(response.getBody());
    }

    private List<String> parseResponse(Map<String, Object> response) {
        try {
            // Navigate choices[0].message.content, then parse the JSON string inside it
            Map<String, Object> choice = (Map<String, Object>)
                    ((List<Object>) response.get("choices")).get(0);
            Map<String, Object> message = (Map<String, Object>) choice.get("message");
            String content = (String) message.get("content");

            ObjectMapper mapper = new ObjectMapper();
            Map<String, Object> data = mapper.readValue(content, Map.class);
            return (List<String>) data.get("metrics");
        } catch (Exception e) {
            log.error("Failed to parse structured output", e);
            return Collections.emptyList();
        }
    }
}
This service handles the complete Structured Outputs workflow.
Real-World Example
Input: "How's this month's revenue?" + the 10 candidates from vector search
GPT Processing:
- "total_revenue" : revenue mentioned (selected)
- "order_count" : related to revenue (selected)
- "page_views" : not about revenue (rejected)
- "cart_abandonment" : not about revenue (rejected)
Output: ["total_revenue", "order_count"]
Complete Flow Example
Let's see the entire system in action:
Step 1: Vector Search
User: "How's this month's revenue?"
↓
Top 10 candidates:
- total_revenue (0.89)
- order_count (0.78)
- gross_profit (0.76)
- net_revenue (0.75)
- sales_amount (0.74)
- purchase_count (0.71)
- transaction_value (0.69)
- page_views (0.45)
- bounce_rate (0.32)
- conversion_rate (0.28)
Step 2: GPT Structured Filtering
Input: 10 candidates
↓
GPT Analysis:
"The question asks about 'revenue', so:
- total_revenue ✓ (direct match)
- net_revenue ✓ (revenue metric)
- order_count ✗ (count, not revenue)
- page_views ✗ (engagement, not revenue)"
↓
Output: ["total_revenue", "net_revenue"]
Step 3: Database Query
SELECT total_revenue, net_revenue
FROM metrics
WHERE period = 'current_month'
↓
Final Answer: "This month's total revenue is $125,000..."
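To tie the flow together, here's a minimal orchestration sketch. The class and parameter names are mine, not from the production code; the functional interfaces stand in for `EmbeddingService`, the vector ranker, and `MetricClassificationService`, so the wiring can be shown (and tested) without live API calls.

```java
import java.util.*;
import java.util.function.*;

// Minimal sketch of the two-step pipeline described above.
public class MetricRetrievalPipeline {

    private final Function<String, double[]> embed;                        // question -> vector
    private final BiFunction<double[], Integer, List<String>> rank;        // vector, k -> top-k metric names
    private final BiFunction<String, List<String>, List<String>> classify; // question, candidates -> filtered metrics

    public MetricRetrievalPipeline(Function<String, double[]> embed,
                                   BiFunction<double[], Integer, List<String>> rank,
                                   BiFunction<String, List<String>, List<String>> classify) {
        this.embed = embed;
        this.rank = rank;
        this.classify = classify;
    }

    // Step 1: vector search narrows all metrics to the top-10 candidates.
    // Step 2: GPT structured filtering picks the exact metrics to query.
    public List<String> resolveMetrics(String question) {
        double[] questionVector = embed.apply(question);
        List<String> candidates = rank.apply(questionVector, 10);
        return classify.apply(question, candidates);
    }
}
```

The returned metric names then feed directly into the Step 3 database query.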
Results So Far
Before RAG:
User: "How's this month's revenue?"
AI: "I don't have access to that data."
After Part 1:
User: "How's this month's revenue?"
System: Successfully identified ["total_revenue", "net_revenue"]
→ Can now query the correct metrics
Success Rate: 95% accuracy for flat metrics like "total_revenue", "order_count"
Current Limitation
This system works well for simple metrics but fails on hierarchical ones:
User: "How's my Google traffic?"
Current System:
- Finds "traffic_by_channel" ✓
- Can't identify the "google" sub-metric ✗
We need: "traffic_by_channel|google|advertiser"
The Problem: We need to search not just for parent metrics, but also navigate their hierarchical structure to find specific child metrics like "google" under "traffic_by_channel".
What's Next?
In Part 2, I'll show you how I solved this using:
- MetricForest Data Structure: A tree-based system for hierarchical metrics
- Neighbor Node Exploration: Finding related parent-child metric pairs
- Two-Stage GPT Filtering: Precise classification at each hierarchy level
Example of what Part 2 solves:
User: "How's my Google traffic?"
System: "traffic_by_channel|google|advertiser"
User: "Compare my Facebook traffic to industry average"
System:
- "traffic_by_channel|facebook|advertiser"
- "traffic_by_channel|facebook|industry"
Result: 98% accuracy even for complex hierarchical queries.
Need Help Implementing RAG?
I help companies integrate AI systems with their existing Spring Boot infrastructure.
Specializing in:
- Spring Boot + OpenAI integration
- Custom RAG pipelines
- E-commerce analytics systems
📧 Contact: [junyoungmoon9857@gmail.com]
Tags: #rag #system-design #vectordatabase #backend