We Built a GraphRAG System Over 14,000 Research Papers! Here's What We Learned

#graphrag #tigergraph

For the TigerGraph GraphRAG Hackathon, we built GyanCortex — a Q&A system that answers factual and multi-hop questions over 14,247 AI/ML research papers.

The core question: does adding a knowledge graph on top of vector search actually help?

What We Built

Three retrieval pipelines, one benchmark (16 hand-authored questions):

LLM-Only: keyword filter → dump papers into Gemini. Simple baseline.
Hybrid RAG: Qdrant dense + sparse retrieval, cross-encoder reranking, query decomposition for multi-hop.
GraphRAG: everything in the Hybrid RAG pipeline, plus TigerGraph for citation expansion (CITES edges) and topic linking (HAS_TOPIC edges).
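
To make the "dense + sparse" step concrete, here is a minimal sketch of one common way to merge two ranked lists: reciprocal rank fusion (RRF). This is an illustration only. The post does not say which fusion method GyanCortex uses, and the function name, the paper IDs, and the constant k=60 are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of paper IDs into one list, best first.

    Hypothetical helper: each paper earns 1 / (k + rank) from every list
    it appears in, so papers ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Made-up IDs standing in for dense and sparse search results.
dense_hits = ["p3", "p1", "p7"]
sparse_hits = ["p1", "p9", "p3"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

In a pipeline like the one described, the fused list would then go to the cross-encoder reranker.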

Results

Pipeline	Pass Rate	Avg Latency
LLM-Only	31.2%	29s
Hybrid RAG	93.8%	115s
GraphRAG	100%	50s

GraphRAG was both more accurate and 2.3× faster than pure Hybrid RAG (50s vs. 115s average latency).
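
As a quick sanity check, the table's numbers can be reproduced from the 16-question benchmark. The per-pipeline pass counts below (5, 15, and 16) are inferred from the reported percentages, not stated in the results, so treat them as an assumption.

```python
TOTAL_QUESTIONS = 16

# Pass counts inferred from the reported pass rates (assumption).
passed = {"LLM-Only": 5, "Hybrid RAG": 15, "GraphRAG": 16}
# Average latencies in seconds, taken from the results table.
latency_s = {"LLM-Only": 29, "Hybrid RAG": 115, "GraphRAG": 50}

pass_rate = {name: 100 * n / TOTAL_QUESTIONS for name, n in passed.items()}
speedup = latency_s["Hybrid RAG"] / latency_s["GraphRAG"]

for name, rate in pass_rate.items():
    print(f"{name}: {rate:.2f}% passed, {latency_s[name]}s average latency")
print(f"GraphRAG speedup over Hybrid RAG: {speedup:.1f}x")
```

With only 16 questions, each one is worth 6.25 percentage points, so the gap between Hybrid RAG and GraphRAG comes down to a single question.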

Why the Graph Helps

Vector search is good at finding semantically similar papers. It struggles with
papers that are related but phrased differently, and those are exactly the
papers multi-hop questions need.

TigerGraph let us traverse citation networks and topic clusters to surface papers
the vector index ranked poorly. The one question Hybrid RAG failed was a
multi-hop synthesis question — the graph found the right papers, vector search
didn't.
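
The expansion idea can be sketched in a few lines. This is not the GyanCortex implementation: plain dictionaries stand in for TigerGraph, the function name and sample data are hypothetical, and only outgoing CITES edges are followed. Starting from the papers vector search returned, it collects papers one hop away through a citation or a shared topic and ranks them by how many seed papers point at them.

```python
def expand_candidates(seeds, cites, has_topic, max_new=10):
    """Return papers one hop from the seeds, most-linked first.

    cites:     paper ID -> list of paper IDs it cites (CITES edges)
    has_topic: paper ID -> list of topic names (HAS_TOPIC edges)
    """
    # Reverse index so we can go from a topic back to its papers.
    topic_to_papers = {}
    for paper, topics in has_topic.items():
        for topic in topics:
            topic_to_papers.setdefault(topic, set()).add(paper)

    seed_set = set(seeds)
    votes = {}
    for seed in seeds:
        neighbors = set(cites.get(seed, ()))
        for topic in has_topic.get(seed, ()):
            neighbors |= topic_to_papers.get(topic, set())
        # Count each new paper once per seed that reaches it.
        for paper in neighbors - seed_set:
            votes[paper] = votes.get(paper, 0) + 1

    ranked = sorted(votes, key=lambda p: (-votes[p], p))
    return ranked[:max_new]


# Toy graph with made-up paper IDs and topics.
cites = {"A": ["C"], "B": ["C", "D"]}
has_topic = {"A": ["rl"], "B": ["nlp"], "E": ["rl"]}
extra = expand_candidates(["A", "B"], cites, has_topic)
print(extra)
```

Here paper C is cited by both seeds and ranks first, while E is reached only through a shared topic: the kind of paper a similarity search alone might rank poorly.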

The graph traversal adds ~2–5s per query; on this benchmark, the accuracy gain was worth it.