Introduction
As enterprise adoption of LLMs grows, inference costs, hallucinations, and retrieval inefficiencies are becoming major production challenges.
Traditional vector-based Retrieval-Augmented Generation (RAG) improves grounding, but it still struggles with multi-hop reasoning and relationship-aware retrieval.
For the TigerGraph GraphRAG Inference Hackathon, our team built a complete biomedical GraphRAG inference system that compares:
- LLM-only inference
- Basic RAG (Vector + LLM)
- GraphRAG (Knowledge Graph + LLM)
across latency, token usage, cost, grounded accuracy, and reasoning quality.
Our goal was simple:
Can GraphRAG reduce token usage while maintaining grounded and explainable answers?
Main benchmarking dashboard comparing LLM-only, Basic RAG, and GraphRAG pipelines.
GitHub Repository: https://github.com/SIDHANTH-S/graphrag-inference-system
Live Demo: http://52.172.150.0:3000/
Demo Video: https://drive.google.com/file/d/1CKCUYpRbdjh9qdTHKyu5V2V8J5c0lgRr/view?usp=sharing
Why We Built This
LLMs are powerful, but production AI systems face several challenges:
- Hallucinated answers
- Expensive context windows
- Retrieval noise
- Weak explainability
- Difficulty performing multi-hop reasoning
Basic RAG pipelines solve part of the problem by retrieving semantically similar chunks from vector databases.
However, semantic similarity alone is often insufficient for domains like biomedicine, where relationships between drugs, diseases, enzymes, and pathways are highly structured.
This is where GraphRAG becomes powerful.
Instead of retrieving only semantically similar text, GraphRAG retrieves entities and relationships from a structured knowledge graph, enabling explainable and relationship-aware reasoning.
System Architecture
Our platform combines:
- FAISS for semantic vector retrieval
- TigerGraph for structured biomedical relationships
- LLM-based entity extraction and answer synthesis
- A benchmarking dashboard for evaluation and analytics
End-to-End Architecture
The Three Pipelines
1. LLM-Only Pipeline
This serves as the baseline pipeline.
The user query is sent directly to the LLM without any retrieval or grounding.
Advantages:
- Fast
- Simple
Limitations:
- High hallucination risk
- No evidence grounding
- Poor explainability
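The baseline can be sketched in a few lines. `call_llm` below is a hypothetical stand-in for whatever chat-completion client the deployment uses, not the actual API wrapper from our codebase:

```python
# Minimal sketch of the LLM-only baseline: the raw query goes straight
# to the model with no retrieval, grounding, or context injection.

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    return f"<answer to: {prompt}>"

def llm_only_pipeline(query: str) -> str:
    # No embedding, no retrieval -- just the query as-is.
    return call_llm(query)

print(llm_only_pipeline("What is the causal path from alloxan to arteriosclerosis?"))
```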
2. Basic RAG Pipeline
The Basic RAG pipeline retrieves semantically similar chunks using FAISS embeddings.
Pipeline flow:
Query
↓ Embedding generation
↓ Vector retrieval
↓ Context injection
↓ LLM answer generation
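The flow above can be sketched end to end. The real system uses FAISS with dense sentence embeddings; in this illustrative version, a toy bag-of-words embedding and brute-force cosine search stand in for both, and the final LLM call is replaced by the assembled prompt:

```python
import math

# Sketch of the Basic RAG flow: query -> embed -> retrieve -> inject.
# Toy embedding and brute-force search stand in for FAISS + dense vectors.

def embed(text: str) -> dict:
    # Toy embedding: punctuation-stripped word-count vector.
    vec = {}
    for tok in text.lower().split():
        tok = tok.strip(".,?!")
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def basic_rag(query: str, chunks: list) -> str:
    # Context injection: retrieved chunks are prepended to the prompt
    # before LLM answer generation (stubbed out here).
    context = "\n".join(retrieve(query, chunks))
    return f"CONTEXT:\n{context}\nQUESTION: {query}"

chunks = [
    "Alloxan induces diabetes in animal models.",
    "Arteriosclerosis risk increases with diabetes.",
    "Aspirin inhibits cyclooxygenase enzymes.",
]
prompt = basic_rag("How does alloxan relate to diabetes?", chunks)
print(prompt)
```

The top-k cutoff is where retrieval noise enters: anything semantically close gets injected, whether or not it bears on the actual relationship being asked about.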
Advantages:
- Better grounding than pure LLM inference
- Reduced hallucinations
Limitations:
- Retrieval noise
- Weak relationship understanding
- Difficulty with multi-hop reasoning
3. GraphRAG Pipeline
The GraphRAG pipeline combines semantic retrieval with structured graph traversal.
The workflow includes:
- Query entity extraction
- Entity-to-graph resolution
- Multi-hop graph expansion in TigerGraph
- Evidence fusion
- Grounded answer synthesis
This enables the system to retrieve not only semantically similar text, but also biologically meaningful relationships.
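The multi-hop expansion step can be illustrated with a bounded breadth-first traversal. In the real system this traversal runs inside TigerGraph; the in-memory adjacency map below is a toy stand-in, purely to show the shape of the evidence GraphRAG retrieves:

```python
from collections import deque

# Sketch of the GraphRAG retrieval step: after entity extraction resolves
# "alloxan" to a graph vertex, a bounded multi-hop expansion collects
# relationship paths that become grounded evidence for the LLM.

GRAPH = {
    "alloxan": [("causes", "diabetes")],
    "diabetes": [("increases", "arteriosclerosis")],
    "arteriosclerosis": [],
}

def expand(start: str, max_hops: int = 2) -> list:
    """Return all relationship paths of up to max_hops edges from start."""
    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if path:
            paths.append(path)
        if len(path) < max_hops:
            for rel, nxt in GRAPH.get(node, []):
                queue.append((nxt, path + [(node, rel, nxt)]))
    return paths

for path in expand("alloxan"):
    print(" -> ".join(f"{s} -[{r}]-> {t}" for s, r, t in path))
```

Each returned path is a traceable chain of typed edges, which is what makes the final answer explainable rather than merely plausible.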
Biomedical Dataset and Knowledge Graph Construction
We used PubMed-style biomedical literature from the MedRAG dataset hosted on Hugging Face.
Dataset Source : https://huggingface.co/datasets/MedRAG/pubmed
The ingestion pipeline performs:
- Document chunking
- Biomedical entity extraction
- Relation extraction
- Dense embedding generation
- TigerGraph vertex/edge creation
- FAISS index construction
The system extracts biomedical entities such as:
- Drugs
- Diseases
- Genes
- Side effects
- Anatomical entities
and stores their relationships in TigerGraph for graph-based retrieval.
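A compressed sketch of the ingestion steps above: chunk a document, extract typed entities, and emit vertex/edge records in the shape a graph upsert expects. The entity dictionary and co-occurrence relation rule are toy stand-ins for the biomedical NER and relation-extraction models used in the actual pipeline:

```python
# Toy ingestion sketch: chunking -> entity extraction -> edge creation.
# A dictionary lookup stands in for biomedical NER; co-occurrence within
# a chunk stands in for learned relation extraction.

ENTITY_TYPES = {"alloxan": "Drug", "diabetes": "Disease",
                "arteriosclerosis": "Disease"}

def chunk(text: str, size: int = 8) -> list:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def extract(chunk_text: str) -> list:
    found = []
    for w in chunk_text.split():
        key = w.strip(".,").lower()
        if key in ENTITY_TYPES:
            found.append((key, ENTITY_TYPES[key]))
    return found

doc = "Alloxan causes diabetes. Diabetes increases arteriosclerosis risk."
vertices, edges = set(), []
for c in chunk(doc):
    ents = extract(c)
    vertices.update(ents)
    # Toy relation rule: adjacent distinct entities in a chunk get an edge.
    for i in range(len(ents) - 1):
        if ents[i][0] != ents[i + 1][0]:
            edges.append((ents[i][0], "MENTIONED_WITH", ents[i + 1][0]))

print(sorted(vertices))
print(edges)
```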

High-throughput biomedical ingestion pipeline.
Benchmarking and Evaluation
One of the main goals of this project was not just building GraphRAG, but scientifically evaluating it.
Our dashboard compares:
- Token usage
- Latency
- Estimated API cost
- Grounded accuracy
- BERTScore
- LLM-as-a-Judge evaluation
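The token and cost columns of the dashboard reduce to simple per-query accounting. The token counts and the flat $/1K-token price below are illustrative values, not measurements from our hackathon runs:

```python
# Sketch of the dashboard's token/cost comparison for one query.
# All numbers are illustrative; real runs use measured token counts
# and the provider's actual pricing.

PRICE_PER_1K_TOKENS = 0.002  # assumed flat rate for illustration

def summarize(name: str, prompt_tokens: int, completion_tokens: int) -> dict:
    total = prompt_tokens + completion_tokens
    return {"pipeline": name, "tokens": total,
            "cost_usd": round(total / 1000 * PRICE_PER_1K_TOKENS, 6)}

runs = [
    summarize("basic_rag", prompt_tokens=3200, completion_tokens=250),
    summarize("graphrag", prompt_tokens=1400, completion_tokens=250),
]
baseline, graph = runs
reduction = 1 - graph["tokens"] / baseline["tokens"]
print(f"token reduction vs basic RAG: {reduction:.0%}")
```

The point the dashboard makes is that GraphRAG's savings come almost entirely from the prompt side: graph paths are far denser evidence than raw retrieved chunks, so fewer context tokens are injected per query.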
Example Query: Causal Biomedical Reasoning
One benchmark query asked:
"What is the causal path from alloxan to arteriosclerosis?"
Expected reasoning:
Alloxan
↓ causes
Diabetes
↓ increases
Arteriosclerosis
The LLM-only pipeline generated a plausible but unverified answer.
Basic RAG retrieved semantically relevant evidence but struggled with structured causal reasoning.
GraphRAG successfully combined semantic retrieval with graph-grounded biomedical relationships to generate a grounded causal explanation with supporting evidence.
Key Results
Across evaluation queries, our GraphRAG pipeline achieved:
- ~52% average token reduction
- ~58% retrieval token savings
- ~61% estimated API cost reduction
- Strong grounded biomedical reasoning
- Improved explainability through graph traces
One of the most important findings was that GraphRAG reduced unnecessary retrieval context while maintaining answer quality through structured graph relationships.
GraphRAG Benchmark Highlights
✓ ~52% average token reduction
✓ ~61% estimated API cost savings
✓ Grounded biomedical reasoning
✓ Multi-hop graph-based retrieval
✓ Explainable evidence-backed answers
What We Learned
One of our biggest takeaways was that GraphRAG is not simply "better retrieval."
Its real strength comes from:
- structured reasoning
- relationship-aware retrieval
- explainability
- context compression
This becomes especially valuable in biomedical AI systems, where trust, traceability, and multi-hop reasoning are critical.
Conclusion
As enterprise AI systems continue to scale, inference efficiency and explainability will become increasingly important.
This project demonstrated that GraphRAG can reduce retrieval overhead while maintaining grounded and explainable reasoning through structured biomedical knowledge graphs.
The combination of vector retrieval and graph traversal opens exciting possibilities for production-grade GenAI systems that are not only accurate, but also interpretable and cost-efficient.
This project was developed as part of the TigerGraph GraphRAG Inference Hackathon.

