Kavyanjali

Building a Biomedical GraphRAG Inference System: Comparing LLM-Only, Basic RAG, and GraphRAG Pipelines

Introduction

As enterprise adoption of LLMs grows, inference costs, hallucinations, and retrieval inefficiencies are becoming major production challenges.

Traditional vector-based Retrieval-Augmented Generation (RAG) improves grounding, but it still struggles with multi-hop reasoning and relationship-aware retrieval.

For the TigerGraph GraphRAG Inference Hackathon, our team built a complete biomedical GraphRAG inference system that compares:

β€’ LLM-only inference
β€’ Basic RAG (Vector + LLM)
β€’ GraphRAG (Knowledge Graph + LLM)

across latency, token usage, cost, grounded accuracy, and reasoning quality.

Our goal was simple:

Can GraphRAG reduce token usage while maintaining grounded and explainable answers?

Main benchmarking dashboard comparing LLM-only, Basic RAG, and GraphRAG pipelines.

πŸ”— GitHub Repository:
https://github.com/SIDHANTH-S/graphrag-inference-system

🌐 Live Demo:
http://52.172.150.0:3000/

πŸŽ₯ Demo Video:
https://drive.google.com/file/d/1CKCUYpRbdjh9qdTHKyu5V2V8J5c0lgRr/view?usp=sharing

Why We Built This

LLMs are powerful, but production AI systems face several challenges:

β€’ Hallucinated answers
β€’ Expensive context windows
β€’ Retrieval noise
β€’ Weak explainability
β€’ Difficulty performing multi-hop reasoning

Basic RAG pipelines solve part of the problem by retrieving semantically similar chunks from vector databases.

However, semantic similarity alone is often insufficient for domains like biomedicine, where relationships between drugs, diseases, enzymes, and pathways are highly structured.

This is where GraphRAG becomes powerful.

Instead of retrieving only semantically similar text, GraphRAG retrieves entities and relationships from a structured knowledge graph, enabling explainable and relationship-aware reasoning.

System Architecture

Our platform combines:

β€’ FAISS for semantic vector retrieval
β€’ TigerGraph for structured biomedical relationships
β€’ LLM-based entity extraction and answer synthesis
β€’ A benchmarking dashboard for evaluation and analytics

End-to-End Architecture

The Three Pipelines

1. LLM-Only Pipeline

This serves as the baseline pipeline.

The user query is sent directly to the LLM without any retrieval or grounding.

Advantages:
β€’ Fast
β€’ Simple

Limitations:
β€’ High hallucination risk
β€’ No evidence grounding
β€’ Poor explainability

2. Basic RAG Pipeline

The Basic RAG pipeline retrieves semantically similar chunks using FAISS embeddings.

Pipeline flow:

Query
β†’ Embedding generation
β†’ Vector retrieval
β†’ Context injection
β†’ LLM answer generation
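
The retrieval step in this flow can be sketched with a toy in-memory index. Here plain NumPy cosine similarity stands in for FAISS, and a deterministic bag-of-words function stands in for a real embedding model; the corpus and queries are illustrative, not our actual data:

```python
import numpy as np

corpus = [
    "Alloxan induces diabetes in experimental animals.",
    "Diabetes increases the risk of arteriosclerosis.",
    "Aspirin inhibits platelet aggregation.",
]

def tokenize(text):
    return [w.strip(".,?").lower() for w in text.split()]

# Toy "embedding": bag-of-words over the corpus vocabulary (stand-in for a
# dense embedding model such as a sentence transformer).
VOCAB = sorted({w for doc in corpus for w in tokenize(doc)})

def embed(text):
    vec = np.zeros(len(VOCAB))
    for w in tokenize(text):
        if w in VOCAB:
            vec[VOCAB.index(w)] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

index = np.stack([embed(doc) for doc in corpus])  # FAISS index stand-in

def retrieve(query, k=2):
    scores = index @ embed(query)  # cosine similarity (rows are normalized)
    return [corpus[i] for i in np.argsort(-scores)[:k]]

# Context injection: the retrieved chunks are prepended to the LLM prompt.
context = retrieve("How does alloxan relate to diabetes?")
prompt = "Answer using only this context:\n" + "\n".join(context)
```

In the real pipeline the only structural difference is that `embed` is a learned model and `index` is a FAISS index; the retrieve-then-inject shape is the same.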

Advantages:
β€’ Better grounding than pure LLM inference
β€’ Reduced hallucinations

Limitations:
β€’ Retrieval noise
β€’ Weak relationship understanding
β€’ Difficulty with multi-hop reasoning

3. GraphRAG Pipeline

The GraphRAG pipeline combines semantic retrieval with structured graph traversal.

The workflow includes:

β€’ Query entity extraction
β€’ Entity-to-graph resolution
β€’ Multi-hop graph expansion in TigerGraph
β€’ Evidence fusion
β€’ Grounded answer synthesis

This enables the system to retrieve not only semantically similar text, but also biologically meaningful relationships.
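
The multi-hop expansion step can be sketched as a breadth-first walk over triples. The toy graph below stands in for TigerGraph (where this would be a GSQL traversal), and the entity-extraction step, an LLM call in the real pipeline, is mocked as a fixed seed list:

```python
from collections import deque

# Toy knowledge graph standing in for TigerGraph: (subject, relation, object) triples.
TRIPLES = [
    ("alloxan", "causes", "diabetes"),
    ("diabetes", "increases", "arteriosclerosis"),
    ("metformin", "treats", "diabetes"),
]

def expand(seed_entities, hops=2):
    """Breadth-first multi-hop expansion: collect triples reachable from the seeds."""
    frontier = deque((e, 0) for e in seed_entities)
    seen, evidence = set(seed_entities), []
    while frontier:
        entity, depth = frontier.popleft()
        if depth >= hops:
            continue
        for s, rel, o in TRIPLES:
            if s == entity:
                evidence.append((s, rel, o))
                if o not in seen:
                    seen.add(o)
                    frontier.append((o, depth + 1))
    return evidence

# Seeds would come from LLM entity extraction over the user query.
evidence = expand(["alloxan"])
# evidence β†’ [("alloxan", "causes", "diabetes"),
#             ("diabetes", "increases", "arteriosclerosis")]
```

The returned triples are the "evidence fusion" input: they are serialized alongside the FAISS chunks into the synthesis prompt, which is what makes the final answer traceable back to graph edges.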

Biomedical Dataset and Knowledge Graph Construction

We used PubMed-style biomedical literature from the MedRAG dataset hosted on Hugging Face.

Dataset source: https://huggingface.co/datasets/MedRAG/pubmed

The ingestion pipeline performs:

β€’ Document chunking
β€’ Biomedical entity extraction
β€’ Relation extraction
β€’ Dense embedding generation
β€’ TigerGraph vertex/edge creation
β€’ FAISS index construction

The system extracts biomedical entities such as:

β€’ Drugs
β€’ Diseases
β€’ Genes
β€’ Side effects
β€’ Anatomical entities

and stores their relationships in TigerGraph for graph-based retrieval.
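
The document-chunking step at the head of this pipeline can be sketched as an overlapping sliding window over words. This is a minimal illustration, not our exact chunker; the window and overlap sizes are arbitrary placeholders:

```python
def chunk_document(text, max_words=100, overlap=20):
    """Split a document into overlapping word-window chunks for embedding.

    Overlap keeps sentences that straddle a chunk boundary retrievable
    from at least one chunk.
    """
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(250))  # synthetic 250-word document
chunks = chunk_document(doc)
```

Each chunk then flows through entity extraction (producing vertices), relation extraction (producing edges), and embedding (producing FAISS rows), so the chunk is the unit both stores share.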


High-throughput biomedical ingestion pipeline.

Benchmarking and Evaluation

One of the main goals of this project was not just to build GraphRAG, but to evaluate it rigorously.

Our dashboard compares:

β€’ Token usage
β€’ Latency
β€’ Estimated API cost
β€’ Grounded accuracy
β€’ BERTScore
β€’ LLM-as-a-Judge evaluation
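
The cost and token-reduction columns of the dashboard reduce to simple arithmetic. The sketch below shows the shape of that computation; the per-token prices and token counts are hypothetical placeholders, not real provider pricing or our measured numbers:

```python
# Hypothetical per-1K-token prices; real values depend on the model/provider.
PRICE_IN, PRICE_OUT = 0.0005, 0.0015  # USD per 1K input / output tokens

def estimate_cost(input_tokens, output_tokens):
    """Estimated API cost for one query, in USD."""
    return input_tokens / 1000 * PRICE_IN + output_tokens / 1000 * PRICE_OUT

def reduction_pct(baseline, candidate):
    """Percent reduction of candidate relative to baseline."""
    return 100 * (baseline - candidate) / baseline

# Illustrative comparison (not the hackathon measurements): GraphRAG's
# compressed graph evidence shrinks the input context, output size is similar.
rag_cost = estimate_cost(input_tokens=4000, output_tokens=300)
graphrag_cost = estimate_cost(input_tokens=1800, output_tokens=300)
savings = reduction_pct(rag_cost, graphrag_cost)
```

Latency, BERTScore, and judge scores are measured per pipeline per query and averaged the same way.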

Example Query: Causal Biomedical Reasoning

One benchmark query asked:

β€œWhat is the causal path from alloxan to arteriosclerosis?”

Expected reasoning:

Alloxan
β†’ causes
Diabetes
β†’ increases
Arteriosclerosis
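
Recovering a chain like this is a path search over relation-labeled edges. A minimal sketch with a toy triple store (in the real system this traversal runs inside TigerGraph; the extra `studied_in` edge just shows that irrelevant branches are pruned):

```python
def find_causal_path(graph, source, target, path=None):
    """Depth-first search for a relation-labeled path between two entities."""
    path = path or []
    if source == target:
        return path
    for s, rel, o in graph:
        # Follow outgoing edges, skipping entities already on the path (no cycles).
        if s == source and o not in {p[2] for p in path}:
            result = find_causal_path(graph, o, target, path + [(s, rel, o)])
            if result is not None:
                return result
    return None

GRAPH = [
    ("alloxan", "causes", "diabetes"),
    ("diabetes", "increases", "arteriosclerosis"),
    ("alloxan", "studied_in", "rats"),
]
path = find_causal_path(GRAPH, "alloxan", "arteriosclerosis")
# path β†’ [("alloxan", "causes", "diabetes"),
#         ("diabetes", "increases", "arteriosclerosis")]
```

Because the answer is assembled from explicit edges, each step in the explanation can be shown to the user verbatim, which is the explainability advantage the benchmark query highlights.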

The LLM-only pipeline generated a plausible but unverified answer.

Basic RAG retrieved semantically relevant evidence but struggled with structured causal reasoning.

GraphRAG successfully combined semantic retrieval with graph-grounded biomedical relationships to generate a grounded causal explanation with supporting evidence.

Key Results

Across evaluation queries, our GraphRAG pipeline achieved:

β€’ ~52% average token reduction
β€’ ~58% retrieval token savings
β€’ ~61% estimated API cost reduction
β€’ Strong grounded biomedical reasoning
β€’ Improved explainability through graph traces

One of the most important findings was that GraphRAG reduced unnecessary retrieval context while maintaining answer quality through structured graph relationships.

GraphRAG Benchmark Highlights

βœ” ~52% average token reduction
βœ” ~61% estimated API cost savings
βœ” Grounded biomedical reasoning
βœ” Multi-hop graph-based retrieval
βœ” Explainable evidence-backed answers

What We Learned

One of our biggest takeaways was that GraphRAG is not simply β€œbetter retrieval.”

Its real strength comes from:
β€’ structured reasoning
β€’ relationship-aware retrieval
β€’ explainability
β€’ context compression

This becomes especially valuable in biomedical AI systems, where trust, traceability, and multi-hop reasoning are critical.

Conclusion

As enterprise AI systems continue to scale, inference efficiency and explainability will become increasingly important.

This project demonstrated that GraphRAG can reduce retrieval overhead while maintaining grounded and explainable reasoning through structured biomedical knowledge graphs.

The combination of vector retrieval and graph traversal opens exciting possibilities for production-grade GenAI systems that are not only accurate, but also interpretable and cost-efficient.

This project was developed as part of the TigerGraph GraphRAG Inference Hackathon.
