<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sudharsan@7621</title>
    <description>The latest articles on DEV Community by Sudharsan@7621 (@kamisettysba2027source).</description>
    <link>https://dev.to/kamisettysba2027source</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936802%2F3239922a-a027-4fdf-9130-3cdf535bc725.png</url>
      <title>DEV Community: Sudharsan@7621</title>
      <link>https://dev.to/kamisettysba2027source</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kamisettysba2027source"/>
    <language>en</language>
    <item>
      <title>Does Graph Beat Tokens? Engineering a GraphRAG Benchmark on TigerGraph</title>
      <dc:creator>Sudharsan@7621</dc:creator>
      <pubDate>Sun, 17 May 2026 20:13:47 +0000</pubDate>
      <link>https://dev.to/kamisettysba2027source/does-graph-beat-tokens-engineering-a-graphrag-benchmark-on-tigergraph-1omg</link>
      <guid>https://dev.to/kamisettysba2027source/does-graph-beat-tokens-engineering-a-graphrag-benchmark-on-tigergraph-1omg</guid>
      <description>&lt;p&gt;LLM token costs explode at scale. The TigerGraph GraphRAG Inference Hackathon&lt;br&gt;
poses one question: can a knowledge graph make inference cheaper &lt;em&gt;without&lt;/em&gt;&lt;br&gt;
losing answer quality? I built three pipelines on 95 PubMed papers (~1M tokens)&lt;br&gt;
about Type-2-Diabetes drug interactions and let the numbers decide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key design choice:&lt;/strong&gt; all three pipelines use the &lt;em&gt;same&lt;/em&gt; LLM (gpt-4o-mini),&lt;br&gt;
so every difference comes from the retrieval architecture, not the model.&lt;/p&gt;

&lt;h2&gt;The Three Pipelines&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LLM-Only&lt;/strong&gt; — prompt in, answer out, no retrieval. The floor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Basic RAG&lt;/strong&gt; — FAISS vector search, top-5 chunks dumped into the prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GraphRAG&lt;/strong&gt; — TigerGraph knowledge graph (33,969 entities, 1.75M
relationships, 2,459 community summaries). Path B: I customized the repo.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Headline Result&lt;/h2&gt;

&lt;p&gt;On &lt;strong&gt;3-hop reasoning&lt;/strong&gt; — questions requiring connections across documents,&lt;br&gt;
exactly what graphs are built for:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pipeline&lt;/th&gt;
&lt;th&gt;3-hop Accuracy&lt;/th&gt;
&lt;th&gt;Tokens/Query&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM-Only&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;td&gt;526&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic RAG&lt;/td&gt;
&lt;td&gt;60%&lt;/td&gt;
&lt;td&gt;1,424&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GraphRAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;438&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GraphRAG ties LLM-Only for the best 3-hop accuracy at the lowest token cost:&lt;br&gt;
69% fewer tokens per query than Basic RAG (438 vs 1,424) and 17% fewer than&lt;br&gt;
LLM-Only (438 vs 526). Across all 30 questions, GraphRAG cut tokens ~95% vs Basic RAG.&lt;/p&gt;
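
&lt;p&gt;The 3-hop percentages follow directly from the tokens-per-query column in the table. A minimal sketch of that arithmetic, where &lt;code&gt;token_reduction&lt;/code&gt; is an illustrative helper name and not something from the repo:&lt;/p&gt;

```python
# Tokens per query on the 3-hop questions, copied from the table above.
tokens_per_query = {"LLM-Only": 526, "Basic RAG": 1424, "GraphRAG": 438}


def token_reduction(baseline, candidate):
    """Fraction of tokens saved when moving from baseline to candidate."""
    return 1 - candidate / baseline


vs_rag = token_reduction(tokens_per_query["Basic RAG"], tokens_per_query["GraphRAG"])
vs_llm = token_reduction(tokens_per_query["LLM-Only"], tokens_per_query["GraphRAG"])

print(f"vs Basic RAG: {vs_rag:.0%}")  # vs Basic RAG: 69%
print(f"vs LLM-Only: {vs_llm:.0%}")   # vs LLM-Only: 17%
```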

&lt;p&gt;&lt;strong&gt;Honest full picture:&lt;/strong&gt; the architectures are complementary. GraphRAG&lt;br&gt;
dominates multi-hop synthesis (90% vs 60%); Basic RAG leads precise&lt;br&gt;
single-fact lookup (80% vs 50%). I'm not claiming a clean sweep — I'm&lt;br&gt;
showing where graph structure wins, and why.&lt;/p&gt;

&lt;h2&gt;The Engineering (this is the real story)&lt;/h2&gt;

&lt;h3&gt;Lever 1 — Chunking strategy&lt;/h3&gt;

&lt;p&gt;From reading the repo source, I found that only the &lt;code&gt;semantic&lt;/code&gt; and &lt;code&gt;characters&lt;/code&gt; chunkers are wired&lt;br&gt;
into ingestion. I kept &lt;strong&gt;semantic&lt;/strong&gt; as the baseline for a specific reason:&lt;br&gt;
entity-relationship extraction needs a &lt;em&gt;complete fact inside one chunk&lt;/em&gt;.&lt;br&gt;
Semantic splitting keeps "drug A increases drug B's AUC 2-fold" intact so&lt;br&gt;
the extractor captures the relationship.&lt;/p&gt;

&lt;p&gt;I tested fixed-size chunking in an &lt;strong&gt;isolated experiment&lt;/strong&gt;, using a separate&lt;br&gt;
graph so the validated baseline was never at risk. CharacterChunker&lt;br&gt;
(1000 chars / 200 overlap) produced 8,689 chunks vs the baseline's 4,083&lt;br&gt;
(about 2.1× as many), which shows that chunking materially reshapes the graph. But blind&lt;br&gt;
character cuts fragment the precise facts whose lookup I was trying to fix. The run was&lt;br&gt;
interrupted by a resource limit before completion, so I report it as a&lt;br&gt;
partial finding and future work, not a finished result.&lt;/p&gt;
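
&lt;p&gt;To make the 1000 / 200 setting concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the repo's CharacterChunker, whose internals may differ; the function name and the toy text are assumptions for illustration.&lt;/p&gt;

```python
def chunk_by_characters(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size windows; each window re-reads the last
    `overlap` characters of the previous one. Illustrative only."""
    if overlap >= chunk_size or 0 > overlap:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Toy document: filler, then a fact that straddles the 1000-character cut.
fact = "drug A increases drug B's AUC 2-fold"
text = "x" * 990 + fact

chunks = chunk_by_characters(text)
print(len(chunks))        # 2
print(fact in chunks[0])  # False: the cut lands mid-sentence
print(fact in chunks[1])  # True: the overlapping window holds it whole
```

&lt;p&gt;In this toy case the first chunk carries a truncated copy of the sentence, and the second chunk (starting at character 800) carries it whole. Overlap can only guarantee a whole copy for spans no longer than the overlap itself, and the truncated copy is still a chunk in its own right.&lt;/p&gt;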

&lt;h3&gt;Lever 2 — Retrieval (single-variable ablations)&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hop depth:&lt;/strong&gt; num_hops=1 beat num_hops=2 — better BERTScore &lt;em&gt;and&lt;/em&gt; fewer
tokens. Two hops wandered into tangential context that diluted precision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Method:&lt;/strong&gt; tested hybrid / community / similarity. I hypothesized
similarity would win fact-lookup — it didn't; hybrid was best or tied
everywhere. Kept as an honest negative result.&lt;/li&gt;
&lt;/ul&gt;
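
&lt;p&gt;The single-variable discipline can be sketched as a small config generator: start from a baseline and change exactly one setting per run. The baseline values and the &lt;code&gt;method&lt;/code&gt; key below are assumptions for illustration; only &lt;code&gt;num_hops&lt;/code&gt; and the three method names come from the ablations above.&lt;/p&gt;

```python
def single_variable_ablations(baseline, alternatives):
    """Return one config per alternative value, each differing from the
    baseline in exactly one setting."""
    runs = []
    for name, values in alternatives.items():
        for value in values:
            if value == baseline[name]:
                continue  # the baseline itself is not an ablation
            config = dict(baseline)
            config[name] = value
            runs.append(config)
    return runs


# Hypothetical baseline; the actual starting configuration may differ.
baseline = {"num_hops": 2, "method": "hybrid"}
alternatives = {
    "num_hops": [1, 2],
    "method": ["hybrid", "community", "similarity"],
}

runs = single_variable_ablations(baseline, alternatives)
for config in runs:
    print(config)
```

&lt;p&gt;Each generated run can then be scored against the same question set, so any change in BERTScore or token count is attributable to the one setting that moved.&lt;/p&gt;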

&lt;h3&gt;Lever 3 — Prompt design&lt;/h3&gt;

&lt;p&gt;GraphRAG over-abstained ("no information available" when the answer was in&lt;br&gt;
the graph). I traced it to a prompt clause forbidding synthesis, swapped&lt;br&gt;
only that clause, and kept the load-bearing JSON-format line&lt;br&gt;
byte-identical. Measured: BERTScore 0.8648 → 0.8623, so no gain, and it still&lt;br&gt;
abstained. I reverted the change. Conclusion: the abstention is a graph-retrieval&lt;br&gt;
limitation, not a matter of prompt wording.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discipline throughout:&lt;/strong&gt; change one variable, measure, keep only what the&lt;br&gt;
data supports. Every bundled change broke something.&lt;/p&gt;

&lt;h2&gt;Reproducible&lt;/h2&gt;

&lt;p&gt;Public repo, live dashboard, all 30 questions and scores visible. Nothing&lt;br&gt;
hidden.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/kamisettysba2027-source/graphrag-inference-hackathon" rel="noopener noreferrer"&gt;https://github.com/kamisettysba2027-source/graphrag-inference-hackathon&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Live dashboard: &lt;a href="https://graphrag-inference-hackathon-pdtqvncbcctvdlqsacqlkr.streamlit.app/" rel="noopener noreferrer"&gt;https://graphrag-inference-hackathon-pdtqvncbcctvdlqsacqlkr.streamlit.app/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built on the &lt;a href="https://github.com/tigergraph/graphrag" rel="noopener noreferrer"&gt;TigerGraph GraphRAG repo&lt;/a&gt;&lt;br&gt;
for #GraphRAGInferenceHackathon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tigergraph</category>
      <category>graphrag</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
