Harish Kotra (he/him)

Posted on Jun 19

RAG vs HippoRAG: Solving a Detective Mystery with Knowledge Graphs

#ai #programming #productivity #dailybuild2026

A hands-on comparison of vector search retrieval and graph-based Personalized PageRank on a multi-hop reasoning task — running fully locally.

Standard RAG (Retrieval-Augmented Generation) is everywhere. Embed your documents, store them in a vector database, search by cosine similarity, feed the top results to an LLM. It works well for simple fact lookups. But what happens when your query requires connecting information across multiple documents?

"Who had both a motive and the opportunity to steal the pearl necklace?"

This question appears simple, but it requires the system to:

Know who was in the mansion during the theft (D4: Victoria Crane's visit)
Know who had financial troubles (D8: Victoria lost her job)
Infer that the figure in the security footage (D7) could be Victoria

Standard RAG retrieves individual chunks independently. It might get D8 (financial trouble) and D7 (security footage) but has no explicit mechanism to connect them. The LLM has to infer the link from the text alone.

HippoRAG, introduced in this paper, takes a different approach: build a knowledge graph from your documents, then use Personalized PageRank to retrieve passages. Entities in the question act as "seed nodes" that spread probability through the graph, naturally aggregating related evidence.

Let's build both systems from scratch and see where each shines.

The Setup

We created "The Purloined Pearl": a detective case of 10 documents (witness statements, alibi records, and evidence logs) about a stolen $2M necklace. Here's a sample of the evidence:

[
  {
    "id": "D4",
    "title": "Victoria Crane's Visit",
    "text": "Mrs. Blackwood's sister, Victoria Crane, visited the mansion at 8 PM for dinner. She left at 9:30 PM, visibly upset after an argument with Mr. Blackwood."
  },
  {
    "id": "D8",
    "title": "Victoria's Financial Trouble",
    "text": "Victoria Crane had recently lost her job and was facing foreclosure on her home. Mr. Blackwood had refused to lend her money."
  }
]

Notice that "Victoria Crane" appears in multiple documents — D4, D6, D8, D9, D10. A vector DB stores each document independently. A knowledge graph can explicitly connect all documents mentioning "Victoria Crane" through a shared entity node.

Architecture

Both systems share a common LLM backend and embedding pipeline. The difference is purely in retrieval strategy.

                    ┌─────────────────────┐
                    │   LM Studio Server   │
                    │  (OpenAI-compatible)  │
                    └─────────┬───────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │llm_complete│  │  embed() │   │  embed() │
        └─────┬─────┘   └────┬─────┘   └────┬─────┘
              │              │              │
     ┌────────┴────────┐    │              │
     │                 │    │              │
     ▼                 ▼    ▼              ▼
┌──────────┐   ┌──────────┐   ┌───────────────────┐
│Standard  │   │ OpenIE   │   │  ChromaDB         │
│RAG QA    │   │ Triple   │   │  (Vector Index)    │
│          │   │Extraction│   └───────────────────┘
└──────────┘   └────┬─────┘
                    │
                    ▼
              ┌──────────┐   ┌───────────────────┐
              │Knowledge │   │  PPR Retrieval     │
              │ Graph    │──►│  (NetworkX)        │
              │(NetworkX)│   └─────────┬─────────┘
              └──────────┘            │
                                      ▼
                               ┌──────────┐
                               │HippoRAG  │
                               │QA        │
                               └──────────┘

Building the Systems

The LLM Client

Both systems talk to LM Studio through an OpenAI-compatible wrapper:

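import requests

BASE_URL = "http://127.0.0.1:1234/v1"  # local LM Studio server

def llm_complete(prompt, model="qwen/qwen3.5-9b", max_tokens=2048):
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": prompt}],
              "temperature": 0, "max_tokens": max_tokens},
        timeout=300,
    )
    resp.raise_for_status()  # surface HTTP errors instead of failing on a missing key
    msg = resp.json()["choices"][0]["message"]
    return (msg.get("content") or "").strip()  # content can be null

def embed(texts, model="nomic-embed-text-v1.5:2"):
    resp = requests.post(
        f"{BASE_URL}/embeddings",
        json={"model": model, "input": texts},  # name the embedding model explicitly
        timeout=300,
    )
    resp.raise_for_status()
    return [d["embedding"] for d in resp.json()["data"]]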

This works with LM Studio, Ollama, or any OpenAI-compatible server once you change the server address and, where needed, the model names.

Standard RAG: ChromaDB + Vector Search

Standard RAG is straightforward — embed everything, store in ChromaDB, search by cosine similarity:

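import chromadb

chromadb_client = chromadb.Client()  # in-memory client

def index_docs(docs):
    # Chroma defaults to L2 distance; request cosine explicitly.
    collection = chromadb_client.create_collection(
        "detective_case", metadata={"hnsw:space": "cosine"})
    ids = [d["id"] for d in docs]
    texts = [d["text"] for d in docs]
    collection.add(ids=ids, documents=texts, embeddings=embed(texts))
    return collection

def retrieve(collection, query, top_k=3):
    q_emb = embed([query])[0]
    results = collection.query(query_embeddings=[q_emb], n_results=top_k)
    # extract_results is a project helper that flattens Chroma's nested result lists
    return extract_results(results)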
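The extract_results helper is not shown in the post. A minimal sketch of what it could look like, assuming a single query and Chroma's default result fields (ids, documents, distances); the output keys "id", "text", and "distance" are my own choice for illustration:

```python
def extract_results(results: dict) -> list[dict]:
    """Flatten a single-query Chroma result into a list of hits (hypothetical helper)."""
    # Chroma returns one inner list per query; we sent one query, so take index 0.
    ids = results["ids"][0]
    docs = results["documents"][0]
    dists = results["distances"][0]
    return [{"id": i, "text": t, "distance": d} for i, t, d in zip(ids, docs, dists)]

# Hand-built result with the same nesting (values are made up)
fake = {"ids": [["D8", "D4"]],
        "documents": [["Victoria Crane had recently lost her job...",
                       "Mrs. Blackwood's sister, Victoria Crane, visited..."]],
        "distances": [[0.21, 0.34]]}
hits = extract_results(fake)
```

Smaller distances mean closer matches, so the hits arrive best-first.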

The pain point: each document is an isolated vector. There's no link between D4 (Victoria's visit) and D8 (her financial trouble) even though they describe the same person. The LLM must infer this connection from the raw text.
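To make the "isolated vector" point concrete, here is a small pure-Python sketch of cosine top-k search. The 3-dimensional vectors are invented for illustration and are not real embeddings; the function names are hypothetical. Each document's score depends only on the query and that document's own vector, so nothing about D4 can raise or lower the score of D8.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, doc_vecs, k=3):
    """Score every document independently, then keep the k best."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors (assumed values, not model output)
doc_vecs = {"D4": [0.9, 0.1, 0.0], "D7": [0.1, 0.9, 0.1], "D8": [0.0, 0.2, 0.9]}
hits = top_k([1.0, 0.0, 0.0], doc_vecs, k=2)
```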

HippoRAG: Knowledge Graph + Personalized PageRank

HippoRAG adds two steps before retrieval:

Step 1: Open Information Extraction

We prompt a small LLM to extract (subject, relation, object) triples from each document:

TRIPLE_PROMPT = """
Return a JSON array of triples, each with "subject", "relation", "object" keys.

Text: {text}
"""

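import json

def extract_triples(text, doc_id):
    raw = llm_complete(TRIPLE_PROMPT.format(text=text),
                       model="google/gemma-4-e4b")
    # extract_json (project helper, not shown) isolates the JSON array in the reply
    triples = json.loads(extract_json(raw))
    # Tag each triple with its source document (dict union needs Python 3.9+)
    return [t | {"doc_id": doc_id} for t in triples]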
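The extract_json helper is not shown in the post. One possible sketch, assuming the model may wrap its answer in a Markdown code fence or surround it with prose; the behavior of returning "[]" on failure is my assumption, not necessarily what the project does:

```python
import json
import re

def extract_json(raw: str) -> str:
    """Return the first JSON array found in an LLM reply (hypothetical helper)."""
    # Drop Markdown fence markers (three backticks, optionally followed by "json").
    cleaned = re.sub(r"`{3}(?:json)?", "", raw)
    start = cleaned.find("[")
    if start == -1:
        return "[]"
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        _, end = json.JSONDecoder().raw_decode(cleaned[start:])
    except json.JSONDecodeError:
        return "[]"
    return cleaned[start:start + end]
```

This only looks at the first "[" in the reply, so a bracket inside the model's preamble would defeat it; a stricter prompt or a retry is a reasonable complement.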

For D4, the LLM might extract:

[
  {"subject": "Victoria Crane", "relation": "visited", "object": "mansion"},
  {"subject": "Victoria Crane", "relation": "left_at", "object": "9:30 PM"},
  {"subject": "Victoria Crane", "relation": "argued_with", "object": "Mr. Blackwood"}
]

Step 2: Build the Knowledge Graph

We use NetworkX to create a graph with two node types:

Entity nodes (entity:Victoria Crane) — people, places, objects
Passage nodes (passage:D4) — the original documents

Edges connect entities to passages they appear in, and entities to other entities via triple relations:

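import networkx as nx

class KnowledgeGraph:
    def __init__(self):
        # Assumed undirected; the original snippet does not show the constructor
        self.graph = nx.Graph()

    def add_triple(self, subj, rel, obj, doc_id):
        passage_id = f"passage:{doc_id}"
        # Link both entities to the passage they were extracted from
        self.graph.add_edge(f"entity:{subj}", passage_id)
        self.graph.add_edge(f"entity:{obj}", passage_id)
        # Link the two entities through the extracted relation
        self.graph.add_edge(f"entity:{subj}", f"entity:{obj}",
                            triple=f"{subj} --[{rel}]-> {obj}")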

This creates a graph like:

entity:Victoria Crane ──mentioned_in── passage:D4
entity:Victoria Crane ──mentioned_in── passage:D8
entity:Victoria Crane ──argued_with── entity:Mr. Blackwood
entity:Victoria Crane ──lost_job── entity:job
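To see why the shared entity node matters, here is a dependency-free sketch that builds the same three kinds of edges with plain dictionaries and then walks passage -> entity -> passage. The function names and the two sample triples are illustrative assumptions, not output from the actual extraction run.

```python
from collections import defaultdict

def build_adjacency(triples):
    """Undirected adjacency sets with entity-passage and entity-entity edges."""
    adj = defaultdict(set)
    for t in triples:
        subj, obj = f"entity:{t['subject']}", f"entity:{t['object']}"
        passage = f"passage:{t['doc_id']}"
        for a, b in ((subj, passage), (obj, passage), (subj, obj)):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def passages_sharing_entity(adj, passage):
    """Passages reachable in two hops: passage -> entity -> other passage."""
    found = set()
    for entity in adj[passage]:
        if entity.startswith("entity:"):
            found |= {n for n in adj[entity]
                      if n.startswith("passage:") and n != passage}
    return found

triples = [
    {"subject": "Victoria Crane", "relation": "argued_with",
     "object": "Mr. Blackwood", "doc_id": "D4"},
    {"subject": "Victoria Crane", "relation": "lost", "object": "job", "doc_id": "D8"},
]
adj = build_adjacency(triples)
linked = passages_sharing_entity(adj, "passage:D4")  # {"passage:D8"}
```

D4 and D8 share no edge of their own; they are linked only because both touch entity:Victoria Crane, which is exactly the path PPR follows.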

Step 3: Personalized PageRank Retrieval

When a query arrives, we extract its entities and use them as seed nodes for Personalized PageRank:

from collections import defaultdict
import networkx as nx

def retrieve_hipporag(graph, question, top_k=3):
    entities = extract_query_entities(question)  # e.g., ["Victoria Crane"]

    # Seed only entities that exist in the graph; NetworkX normalizes the weights
    personalization = {f"entity:{e}": 1.0 for e in entities
                       if f"entity:{e}" in graph.graph}
    if not personalization:
        return []  # no seed matched; an all-zero vector makes nx.pagerank fail

    # Run PPR from the seed nodes
    pr = nx.pagerank(graph.graph, alpha=0.5,
                     personalization=personalization)

    # Aggregate scores to passage nodes
    passage_scores = defaultdict(float)
    for node, score in pr.items():
        if node.startswith("passage:"):
            passage_scores[node] += score
        elif node.startswith("entity:"):
            for neighbor in graph.graph.neighbors(node):
                if neighbor.startswith("passage:"):
                    passage_scores[neighbor] += score * 0.5

    # Return the top_k (passage_id, score) pairs, best first
    ranked = sorted(passage_scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]

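In NetworkX, alpha is the probability that the random walk follows an edge at each step; with probability 1 - alpha it jumps back to a seed node. alpha=0.5 therefore splits each step evenly between spreading through the graph and returning to the seeds. Higher values = more aggressive spread.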
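For reference, the fixed point that Personalized PageRank converges to can be written as below, using the parameterization of nx.pagerank and assuming no dangling nodes (every node has at least one edge). The symbols are my notation, not the post's: M is the row-normalized adjacency matrix, p is the personalization vector normalized to sum to 1, and π is the resulting score vector.

```latex
\pi = \alpha \, M^{\top} \pi + (1 - \alpha)\, p,
\qquad M_{ij} = \frac{w_{ij}}{\sum_{k} w_{ik}}
```

The first term moves score along edges; the second term returns a share of (1 - α) to the seed entities on every step.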

Results: Where Each System Excels

Simple Queries (single-document fact lookups)

Q: "What time did Detective Miller arrive at the mansion?"

Both systems retrieve D1 and answer correctly: "10 PM". Standard RAG does it faster (no graph build needed).

Winner: Standard RAG (speed, simplicity).

Multi-hop Queries (cross-document reasoning)

Q: "Who had both a motive and the opportunity to steal the pearl necklace?"

This requires:

D4: Victoria was at the mansion (opportunity)
D7: A figure was in the hallway at 9:15 PM (opportunity timeline)
D8: Victoria lost her job, Mr. Blackwood refused money (motive)

Standard RAG retrieves documents by embedding similarity. "Motive and opportunity" is a semantic query — it might retrieve D8 (motive), D4 (opportunity), and D7 (footage). But each is retrieved independently; there's no structural reason they should be connected. The LLM must piece it together from the text.

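HippoRAG seeds PPR with the entities extracted from the question. The question itself names no suspect, so PPR has to reach entity:Victoria Crane through graph edges from the seeds it does extract (for example, the necklace). Once that node carries probability, PPR passes it on to all connected passages (D4, D6, D8, D9, D10) and their neighboring entities. The result: all evidence about Victoria surfaces together, regardless of which document it's in.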

Winner: HippoRAG (structural evidence aggregation).

Sense-making Queries

Q: "Given the timing of events, the suspect's circumstances, and the physical evidence, who is the most likely thief?"

Standard RAG retrieves the top-3 documents by vector similarity. If D4 (visit), D8 (finances), and D10 (coat) rank highest, D7 (footage) and D9 (photo ID) might be missed.

HippoRAG's PPR naturally distributes probability across the entire subgraph connected to "Victoria Crane", potentially surfacing all relevant documents.

Winner: HippoRAG (broader evidence coverage).

Key Implementation Details

Handling Reasoning Models

Many recent LLMs, including Qwen and Gemma variants, can emit chain-of-thought reasoning that consumes output tokens before the answer appears. Two strategies for clean extraction:

Separate models: Use a smaller, non-reasoning model (google/gemma-4-e4b) for structured extraction tasks where you need clean JSON output.
Reasoning content handling: Some OpenAI-compatible servers return reasoning in a separate reasoning_content field for supported models, which can leave content empty:

msg = response["choices"][0]["message"]
content = msg.get("content", "")
if not content and "reasoning_content" in msg:
    content = msg["reasoning_content"]

PPR Damping Factor

The damping factor α trades exploitation (staying near the seeds) against exploration (following edges away from them):

High (0.85): Probability spreads widely through the graph, which favors recall
Low (0.15): Probability stays near seed entities, which favors precision
Medium (0.50): Balanced; worked best in our tests
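The effect of α is easy to check on a toy graph. The sketch below is a plain power iteration, not the NetworkX implementation, and assumes an undirected, unweighted graph in which every node has at least one neighbor. It follows the same convention as nx.pagerank: with probability alpha the walk follows an edge, otherwise it returns to a seed.

```python
def personalized_pagerank(adj, seeds, alpha=0.5, iters=100):
    """Power iteration for PPR on a graph given as {node: set(neighbors)}."""
    nodes = list(adj)
    # Teleport distribution: uniform over the seed nodes
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(p)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * p[n] for n in nodes}  # jump back to seeds
        for n in nodes:
            share = alpha * rank[n] / len(adj[n])     # follow an edge
            for m in adj[n]:
                nxt[m] += share
        rank = nxt
    return rank

# Path graph: seed - a - b - c
adj = {"seed": {"a"}, "a": {"seed", "b"}, "b": {"a", "c"}, "c": {"b"}}
for alpha in (0.15, 0.5, 0.85):
    r = personalized_pagerank(adj, {"seed"}, alpha=alpha)
    print(alpha, round(r["seed"], 3), round(r["c"], 3))
```

As alpha grows, the score held by the seed shrinks and the score reaching the far end of the path grows.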

Fallback Entity Extraction

If the LLM fails to extract JSON entities from the query, we fall back to a simpler comma-separated extraction:

if not entities:
    raw_keywords = llm_complete("Extract entities as comma-separated list: " + question)
    entities = [e.strip() for e in raw_keywords.split(",") if e.strip()]
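Put together, the JSON-first path and the fallback might look like the sketch below. It is an assumption-laden outline rather than the project's code: the llm callable is passed in as a parameter so the function can be exercised without a server, and the wording of the first prompt is invented.

```python
import json
from typing import Callable, List

def extract_query_entities(question: str, llm: Callable[[str], str]) -> List[str]:
    """Try a JSON array first, then fall back to a comma-separated list."""
    raw = llm("Return a JSON array of the entities named in this question: " + question)
    entities: List[str] = []
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, list):
            entities = [e.strip() for e in parsed if isinstance(e, str) and e.strip()]
    except json.JSONDecodeError:
        pass  # not valid JSON; fall through to the simpler prompt
    if not entities:
        raw_keywords = llm("Extract entities as comma-separated list: " + question)
        entities = [e.strip() for e in raw_keywords.split(",") if e.strip()]
    return entities

# Stand-in for the model: first reply is not JSON, second is a comma list
replies = iter(["The entities are listed below.", "pearl necklace, mansion"])
found = extract_query_entities("Who stole the pearl necklace from the mansion?",
                               lambda prompt: next(replies))
```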

Running It Yourself

# Setup
python3 -m venv venv && source venv/bin/activate
pip install chromadb networkx requests streamlit

# Start LM Studio with models loaded:
# - qwen/qwen3.5-9b (QA)
# - google/gemma-4-e4b (extraction)
# - nomic-embed-text-v1.5:2 (embeddings)

# CLI comparison
python run_comparison.py

# Streamlit dashboard
streamlit run streamlit_app.py

Limitations & Future Work

LLM-based triple extraction is slow: Every document requires an LLM call. For 10 documents it's manageable; for 10,000 you'd want a dedicated OpenIE system.
Small graph: Our 57-node, 90-edge graph is trivial. At scale, you'd need to partition the graph or use approximate PPR.
One passage per document: Each document is a single passage. In production, documents would be chunked, with each chunk as a separate passage node.
No IDF weighting: The HippoRAG paper weights query entities by node specificity, an IDF-like score, so that entities appearing in many passages, like "necklace", are downweighted. Our seeds all get equal weight.
No deterministic OpenIE: Tools like Stanford OpenIE extract triples without an LLM, which would make the pipeline faster and more predictable.
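The IDF weighting from the list above could be added with a few lines. This is a sketch using one common smoothed IDF variant, not the exact formula from the paper; the function name and the passage counts in the example are assumptions for illustration. The returned dictionary has the shape that nx.pagerank accepts as its personalization argument.

```python
import math

def idf_seed_weights(seeds, entity_to_passages, num_passages):
    """Weight each seed entity by smoothed IDF: rarer entities get larger weights."""
    weights = {}
    for entity in seeds:
        df = len(entity_to_passages.get(entity, ()))  # passages mentioning the entity
        if df:
            weights[f"entity:{entity}"] = math.log(num_passages / df) + 1.0
    return weights

# Illustrative counts: one entity in 5 of 10 passages, another in only 1
entity_to_passages = {
    "Victoria Crane": {"D4", "D6", "D8", "D9", "D10"},
    "Detective Miller": {"D1"},
}
weights = idf_seed_weights(["Victoria Crane", "Detective Miller", "butler"],
                           entity_to_passages, num_passages=10)
```

Entities missing from the graph are skipped, so the caller still needs the empty-seed check before running PPR.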

Standard RAG is fast, simple, and effective for fact lookup. But when your questions require connecting information across documents — the kind of multi-hop reasoning that comes naturally to a human detective — a knowledge graph approach like HippoRAG provides structural advantages that vector similarity alone cannot match.