<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arpan Sharma</title>
    <description>The latest articles on DEV Community by Arpan Sharma (@sarpan).</description>
    <link>https://dev.to/sarpan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940201%2Fbbe98a33-2210-4bf8-b84f-0b8f3ee00876.png</url>
      <title>DEV Community: Arpan Sharma</title>
      <link>https://dev.to/sarpan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sarpan"/>
    <language>en</language>
    <item>
      <title>Building a GraphRAG vs Traditional RAG Benchmarking System on Indian Public Health Literature</title>
      <dc:creator>Arpan Sharma</dc:creator>
      <pubDate>Tue, 19 May 2026 11:54:17 +0000</pubDate>
      <link>https://dev.to/sarpan/building-a-graphrag-vs-traditional-rag-benchmarking-system-on-indian-public-health-literature-1pin</link>
      <guid>https://dev.to/sarpan/building-a-graphrag-vs-traditional-rag-benchmarking-system-on-indian-public-health-literature-1pin</guid>
      <description>&lt;p&gt;I'm building a benchmarking platform to rigorously compare three AI retrieval pipelines on a large corpus of Indian public health research papers from PubMed Central. Here's the architecture, the engineering decisions, and why I think graph-based retrieval is the right approach for this problem — before the benchmark numbers are in.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem in One Sentence
&lt;/h2&gt;

&lt;p&gt;Ask a RAG system: &lt;em&gt;"How does diabetes affect TB treatment outcomes, and what role does HbA1c play?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Vector search returns chunks about diabetes. Chunks about TB. Maybe a chunk about HbA1c. But it has no idea those three things are connected — and that connection is the entire answer.&lt;/p&gt;

&lt;p&gt;That's the problem we set out to benchmark rigorously.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;A benchmarking platform that will run three AI pipelines in parallel on a target corpus of &lt;strong&gt;~9,000+ Indian public health research papers&lt;/strong&gt; from PubMed Central, covering Diabetes, Tuberculosis, Maternal Health, and Malaria. The ingestion pipeline is built; the corpus is being populated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three pipelines, same LLM, same queries:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pipeline&lt;/th&gt;
&lt;th&gt;Retrieval Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM-Only&lt;/td&gt;
&lt;td&gt;No retrieval — raw GPT-4o-mini&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic RAG&lt;/td&gt;
&lt;td&gt;FAISS vector search + cross-encoder reranking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraphRAG&lt;/td&gt;
&lt;td&gt;TigerGraph multi-hop traversal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every query runs through all three simultaneously. We measure tokens, cost, latency, LLM-as-a-Judge quality scores, and BERTScore F1.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Vector Search Breaks on Multi-Hop Questions
&lt;/h2&gt;

&lt;p&gt;Standard RAG treats every chunk as an island. There's no edge between the &lt;em&gt;HbA1c chunk&lt;/em&gt; and the &lt;em&gt;rifampicin chunk&lt;/em&gt;. If your question requires connecting those two concepts, the retriever can't help you — unless both happen to be semantically close to the query, which for indirect relationships, they often aren't.&lt;/p&gt;

&lt;p&gt;Three failure modes we observed repeatedly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Indirect relationships are invisible&lt;/strong&gt;&lt;br&gt;
Query: &lt;em&gt;"How does rifampicin affect glycemic control in diabetic TB patients?"&lt;/em&gt;&lt;br&gt;
The answer requires connecting rifampicin's CYP enzyme induction to hepatic glucose metabolism. Neither paper individually covers both. Vector search misses the connection entirely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Entity role confusion&lt;/strong&gt;&lt;br&gt;
A query about MDR-TB treatment in pediatric patients scores adult MDR-TB chunks highly — same keywords, wrong population. The retriever has no concept of &lt;em&gt;role&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Aggregation is impossible&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;"What are the most common comorbidities in Indian TB literature?"&lt;/em&gt; can't be answered from any single chunk. You need corpus-wide aggregation. Vector search gives you the most similar individual chunks, which is not the same thing.&lt;/p&gt;


&lt;h2&gt;
  
  
  Building the Corpus: Pulling From PMC at Scale
&lt;/h2&gt;

&lt;p&gt;We used PubMed's E-utilities API with domain-specific MeSH queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;Bio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Entrez&lt;/span&gt;

&lt;span class="n"&gt;Entrez&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your@email.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_pmids&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;domain_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="c1"&gt;# usehistory caches results server-side for batch paging
&lt;/span&gt;    &lt;span class="n"&gt;handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Entrez&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;esearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pmc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;term&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;domain_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;usehistory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retmax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;search_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Entrez&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;web_env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WebEnv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;query_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;QueryKey&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;search_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;pmids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_results&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;fetch_handle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Entrez&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;efetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pmc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;rettype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;retmode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;retstart&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;retmax&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;webenv&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;web_env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;query_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_key&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Entrez&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fetch_handle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fetch_handle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;pmids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MedlineCitation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PMID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PubmedArticle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pmids&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The domain query for Tuberculosis looked like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(tuberculosis[MeSH] OR "TB"[tiab] OR "MDR-TB"[tiab])
AND ("India"[Affiliation] OR "Indian"[Affiliation])
AND (epidemiology[MeSH] OR "public health"[tiab] OR "clinical trial"[tiab])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Preprocessing headaches worth knowing about:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~8% of papers had no abstract — fallback to first 500 tokens of full text&lt;/li&gt;
&lt;li&gt;PMC affiliation strings are free-text and inconsistent (&lt;code&gt;AIIMS&lt;/code&gt;, &lt;code&gt;All India Institute of Medical Sciences&lt;/code&gt;, &lt;code&gt;New Delhi 110029&lt;/code&gt; all need to resolve to the same institution)&lt;/li&gt;
&lt;li&gt;Some papers appear under multiple PMIDs due to PMC versioning — requires deduplication&lt;/li&gt;
&lt;li&gt;Retraction Watch cross-reference is non-optional for a medical corpus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Affiliation filtering&lt;/strong&gt; used multi-pass regex covering explicit country mentions, known Indian institution abbreviations (AIIMS, JIPMER, ICMR, PGIMER, NIMHANS, CMC Vellore), and major city names. False positive rate: ~2-3%.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Knowledge Graph Schema
&lt;/h2&gt;

&lt;p&gt;The schema is the most consequential design decision. Too sparse and traversals return nothing. Too noisy and you get spurious paths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 vertex types:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;Disease&lt;/code&gt;, &lt;code&gt;Treatment&lt;/code&gt;, &lt;code&gt;Biomarker&lt;/code&gt;, &lt;code&gt;Population&lt;/code&gt;, &lt;code&gt;GeographicRegion&lt;/code&gt;, &lt;code&gt;Intervention&lt;/code&gt;, &lt;code&gt;Outcome&lt;/code&gt;, &lt;code&gt;Study&lt;/code&gt;, &lt;code&gt;Institution&lt;/code&gt;, &lt;code&gt;Comorbidity&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10 typed edge types:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Edge&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TREATS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Treatment → Disease&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ASSOCIATED_WITH&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disease ↔ Disease or Disease ↔ Biomarker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;MEASURED_BY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disease → Biomarker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RISK_FACTOR_FOR&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Biomarker/Population → Disease&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;COMPLICATES&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disease ↔ Disease (bidirectional)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;REPORTS_OUTCOME&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Study → Outcome&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;STUDIED_IN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Study → Population/GeographicRegion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CO_OCCURS_WITH&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Disease ↔ Disease (same study context)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CONDUCTED_BY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Study → Institution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PART_OF&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GeographicRegion → GeographicRegion (hierarchy)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why typed edges matter:&lt;/strong&gt; Early prototypes used a generic &lt;code&gt;related_to&lt;/code&gt; edge for everything. The graph was technically connected but semantically useless — traversal had no way to distinguish "treats" from "is a risk factor for." Every edge also carries a &lt;strong&gt;confidence score from the extraction model&lt;/strong&gt;. Edges below 0.65 are filtered during retrieval. This single quality gate made the biggest difference to answer quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final graph stats:&lt;/strong&gt; ~17,830 vertices, ~142,000 edges, avg vertex degree 8.0, graph diameter ~6 hops.&lt;/p&gt;


&lt;h2&gt;
  
  
  Multi-Hop Retrieval: A Concrete Walkthrough
&lt;/h2&gt;

&lt;p&gt;Query: &lt;em&gt;"What is the impact of diabetes on TB treatment outcomes in India?"&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1 — Entity extraction from query
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;

&lt;span class="n"&gt;nlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en_core_sci_lg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_query_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start_char&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;end_char&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;

&lt;span class="c1"&gt;# Returns:
# [
#   {"text": "diabetes", "label": "DISEASE"},
#   {"text": "TB", "label": "DISEASE"},
#   {"text": "treatment outcomes", "label": "OUTCOME"},
#   {"text": "India", "label": "GPE"}
# ]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 2 — GSQL 2-hop traversal in TigerGraph
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;QUERY&lt;/span&gt; &lt;span class="n"&gt;multi_hop_retrieval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SET&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;VERTEX&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;seed_vertices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;FLOAT&lt;/span&gt; &lt;span class="n"&gt;confidence_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="n"&gt;max_hops&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt; &lt;span class="n"&gt;BiomedicalGraph&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="n"&gt;OrAccum&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;BOOL&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;visited&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;SetAccum&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;EDGE&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;retrieved_edges&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;SetAccum&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;VERTEX&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;retrieved_vertices&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;seed_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seed_vertices&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;retrieved_vertices&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;seed_set&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;FOREACH&lt;/span&gt; &lt;span class="n"&gt;hop&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_hops&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt;
        &lt;span class="n"&gt;seed_set&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
            &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;seed_set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;(:&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;
            &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;confidence_threshold&lt;/span&gt;
              &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;visited&lt;/span&gt;
            &lt;span class="n"&gt;ACCUM&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;visited&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;retrieved_edges&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;retrieved_vertices&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
            &lt;span class="n"&gt;POST&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ACCUM&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;visited&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;PRINT&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;retrieved_vertices&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@@&lt;/span&gt;&lt;span class="n"&gt;retrieved_edges&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Step 3 — What the traversal actually finds
&lt;/h3&gt;

&lt;p&gt;Starting from the &lt;code&gt;Type 2 Diabetes&lt;/code&gt; and &lt;code&gt;Tuberculosis&lt;/code&gt; vertices, 2 hops out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hop 1 from Diabetes:
  → Tuberculosis [ASSOCIATED_WITH, confidence: 0.91]
  → HbA1c [MEASURED_BY, confidence: 0.97]
  → MDR-TB [RISK_FACTOR_FOR, confidence: 0.74]

Hop 1 from Tuberculosis:
  → Rifampicin [TREATS, confidence: 0.89]
  → DOTS therapy [TREATS, confidence: 0.93]
  → Treatment failure [REPORTS_OUTCOME, confidence: 0.85]

Hop 2 from Rifampicin:
  → HbA1c elevation [ASSOCIATED_WITH, confidence: 0.71]
  → Hepatotoxicity [ASSOCIATED_WITH, confidence: 0.83]

Hop 2 from HbA1c:
  → Treatment failure [RISK_FACTOR_FOR, confidence: 0.78]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Context passed to the LLM: ~380 tokens.&lt;/strong&gt;&lt;br&gt;
Basic RAG for the same query: 5–8 chunks × ~450 tokens = &lt;strong&gt;2,250–3,600 tokens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The graph didn't just return less text — it returned a &lt;em&gt;connected evidence chain&lt;/em&gt; that the LLM could actually reason over.&lt;/p&gt;


&lt;h2&gt;
  
  
  Entity Extraction Pipeline
&lt;/h2&gt;

&lt;p&gt;We used a three-stage pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;quickumls&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scispacy.linking&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;EntityLinker&lt;/span&gt;

&lt;span class="c1"&gt;# Stage 1: NER with scispaCy
&lt;/span&gt;&lt;span class="n"&gt;nlp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spacy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en_core_sci_lg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scispacy_linker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resolve_abbreviations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;linker_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;umls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_and_normalize_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;nlp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;umls_concepts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kb_ents&lt;/span&gt;  &lt;span class="c1"&gt;# [(cui, score), ...]
&lt;/span&gt;        &lt;span class="n"&gt;top_concept&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;umls_concepts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;umls_concepts&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;surface_form&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;label_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;umls_cui&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_concept&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;top_concept&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;top_concept&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;top_concept&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;canonical_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;get_canonical_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_concept&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;top_concept&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;

&lt;span class="c1"&gt;# Stage 2: Relation extraction (fine-tuned BioBERT)
# Stage 3: Confidence scoring + edge filtering at load time
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drug name normalization was the messiest part. &lt;code&gt;metformin&lt;/code&gt;, &lt;code&gt;Metformin HCl&lt;/code&gt;, &lt;code&gt;metformin hydrochloride&lt;/code&gt;, and &lt;code&gt;Glucophage&lt;/code&gt; all need to resolve to the same vertex. UMLS handles most of this, but Indian-specific institutional and geographic terms needed custom patterns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Evaluation Design
&lt;/h2&gt;

&lt;p&gt;The benchmark will run &lt;strong&gt;120 queries across 4 categories:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;30 simple factual (&lt;em&gt;"First-line treatment for MDR-TB in India?"&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;30 single-hop relational (&lt;em&gt;"Which biomarkers monitor diabetes treatment in Indian studies?"&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;35 multi-hop reasoning (&lt;em&gt;"How does diabetes affect TB outcomes and what role does HbA1c play?"&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;25 aggregation/synthesis (&lt;em&gt;"Most common comorbidities in Indian TB literature?"&lt;/em&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;LLM-as-a-Judge&lt;/strong&gt; — GPT-4o will evaluate GPT-4o-mini outputs on 4 rubric dimensions. To mitigate the preference-for-longer-answers bias: randomized presentation order, structured rubric (not open-ended scoring), and cross-validation against human expert ratings on a 30-question subset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BERTScore&lt;/strong&gt; will be computed against expert-written reference answers using &lt;code&gt;microsoft/deberta-xlarge-mnli&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;bert_score&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;bert_score&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;evaluate_with_bertscore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;P&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;F1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bert_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;microsoft/deberta-xlarge-mnli&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;lang&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;precision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;P&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recall&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;R&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;f1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;F1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;item&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  When GraphRAG Wins — and When It Doesn't
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GraphRAG wins clearly on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-hop questions where the answer requires connecting 3+ concepts&lt;/li&gt;
&lt;li&gt;Aggregation queries across the full corpus&lt;/li&gt;
&lt;li&gt;Any query where explainability matters (you get auditable evidence paths)&lt;/li&gt;
&lt;li&gt;Complex biomedical questions with indirect causal chains&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;GraphRAG loses or ties on:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple factual lookups — Basic RAG is equally accurate and faster&lt;/li&gt;
&lt;li&gt;Latency-sensitive applications — graph traversal is slower than ANN lookup; expect higher end-to-end latency&lt;/li&gt;
&lt;li&gt;Low-resource domains — if your entity extraction is noisy, the graph degrades fast&lt;/li&gt;
&lt;li&gt;Setup cost — graph schema design, NER pipeline, TigerGraph infrastructure vs. a FAISS index&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The upfront investment is real. For a focused domain with relationship-dense literature, it pays off. For a general-purpose Q&amp;amp;A system over miscellaneous documents, it probably doesn't.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Summary
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PMC API → Biopython Entrez → Affiliation Filter → scispaCy NER
    → UMLS Normalization → Relation Extraction (BioBERT)
    → TigerGraph GSQL Load Jobs → Knowledge Graph

Query → Entity Extraction → Seed Vertex Lookup
    → GSQL Multi-Hop Traversal → Subgraph Serialization
    → GPT-4o-mini → Answer

Evaluation: LLM-as-a-Judge (GPT-4o) + BERTScore + tiktoken cost accounting
Dashboard: Streamlit + PyVis graph visualization
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;A few things we're actively working on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval&lt;/strong&gt; — using vector search to seed initial entity candidates, then graph traversal to expand. Best of both worlds for queries that sit between the categories.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental graph updates&lt;/strong&gt; — TigerGraph's upsert semantics make this possible; the pipeline isn't fully automated yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query routing&lt;/strong&gt; — classifying incoming queries as "simple factual" vs "multi-hop" to dispatch to the right pipeline automatically. The latency tradeoff makes this worthwhile in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Completing the corpus&lt;/strong&gt; — ingestion pipeline is done; working through the full ~9,000 paper target across all four domains&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
