<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sujatha</title>
    <description>The latest articles on DEV Community by Sujatha (@sujatha).</description>
    <link>https://dev.to/sujatha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3936646%2F52fb4a36-f8f9-41d0-a5ed-d95e76cb1070.png</url>
      <title>DEV Community: Sujatha</title>
      <link>https://dev.to/sujatha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sujatha"/>
    <language>en</language>
    <item>
      <title>lexgraph</title>
      <dc:creator>Sujatha</dc:creator>
      <pubDate>Sun, 17 May 2026 17:51:51 +0000</pubDate>
      <link>https://dev.to/sujatha/lexgraph-4occ</link>
      <guid>https://dev.to/sujatha/lexgraph-4occ</guid>
      <description>&lt;h1&gt;
  
  
  I Benchmarked GraphRAG vs Basic RAG on 70,000 Indian Supreme Court Judgments — Here's What the Numbers Actually Show
&lt;/h1&gt;

&lt;p&gt;Token costs in production RAG systems are exploding. Every quarter, engineering teams are paying more, waiting longer, and hitting context limits faster. The standard answer — Basic RAG with vector embeddings — helps, but it has a fundamental problem: it retrieves &lt;em&gt;similar text&lt;/em&gt;, not &lt;em&gt;structurally connected facts&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;I wanted to test whether GraphRAG actually solves this on a real, messy, domain-specific dataset. So I spent the last few weeks building &lt;strong&gt;LexGraph&lt;/strong&gt; — a three-pipeline benchmark on Indian Supreme Court judgments — for the GraphRAG Inference Hackathon by TigerGraph.&lt;/p&gt;

&lt;p&gt;The results were clear. Let me walk you through exactly what I built and what the data shows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Indian Supreme Court Judgments?
&lt;/h2&gt;

&lt;p&gt;I chose this dataset deliberately. SC judgments are among the most graph-shaped text data that exists in the public domain.&lt;/p&gt;

&lt;p&gt;Every judgment cites earlier cases. Those cases interpret constitutional articles. The judges who authored them have decades of jurisprudential philosophy that evolves across their careers. A question like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Which judges consistently expanded Article 21 rights, and which cases established those precedents?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;...requires traversing &lt;code&gt;Judge → Case → Article → Precedent Chain&lt;/code&gt;. That's a multi-hop traversal across four different entity types. Vector RAG retrieves chunks that &lt;em&gt;mention&lt;/em&gt; Article 21. GraphRAG traverses the &lt;em&gt;relationship graph&lt;/em&gt; of who decided what, citing whom, interpreting which article.&lt;/p&gt;

&lt;p&gt;The dataset is the &lt;a href="https://huggingface.co/datasets/opennyaiorg/ILDC_multi" rel="noopener noreferrer"&gt;OpenNyai ILDC corpus&lt;/a&gt; — 70,000 Indian Supreme Court judgments, fully public domain. For Round 1, I ingested 6,000 cases (~3.8M tokens, well above the 2M requirement).&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Three pipelines, same 10 queries, same LLM (Gemini 1.5 Flash), same underlying data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User query
    │
    ├── Pipeline 1: LLM-Only
    │       Query → LLM → Answer
    │       No retrieval. Worst-case baseline.
    │
    ├── Pipeline 2: Basic RAG
    │       Query → ChromaDB vector search (top-5 chunks) → LLM → Answer
    │       Industry standard. Semantic similarity retrieval.
    │
    └── Pipeline 3: GraphRAG
            Query → LLM entity extraction
                  → TigerGraph multi-hop traversal (3 hops)
                  → Structured context compression
                  → LLM → Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For GraphRAG I used the &lt;a href="https://github.com/tigergraph/graphrag" rel="noopener noreferrer"&gt;TigerGraph GraphRAG repo&lt;/a&gt; — Path A, deployed via Docker, queried via REST API. The graph schema models &lt;code&gt;Case&lt;/code&gt;, &lt;code&gt;Article&lt;/code&gt;, &lt;code&gt;Act&lt;/code&gt;, &lt;code&gt;Judge&lt;/code&gt;, and &lt;code&gt;Bench&lt;/code&gt; nodes with &lt;code&gt;cites&lt;/code&gt;, &lt;code&gt;references_article&lt;/code&gt;, &lt;code&gt;references_act&lt;/code&gt;, and &lt;code&gt;authored_by&lt;/code&gt; edges.&lt;/p&gt;
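
&lt;p&gt;For reference, here is a minimal sketch of how that schema can be declared through pyTigerGraph. The vertex and edge names follow the list above; the attribute names and connection details are placeholders, not the exact ones in the repo:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: declare the LexGraph schema via pyTigerGraph's gsql() helper.
# Vertex/edge names match the post; attribute names are illustrative.
from pyTigerGraph import TigerGraphConnection

conn = TigerGraphConnection(host="https://localhost", username="tigergraph", password="tigergraph")

SCHEMA_DDL = """
CREATE VERTEX Case (PRIMARY_ID case_id STRING, title STRING, year INT)
CREATE VERTEX Article (PRIMARY_ID article_id STRING, number STRING)
CREATE VERTEX Act (PRIMARY_ID act_id STRING, name STRING)
CREATE VERTEX Judge (PRIMARY_ID judge_id STRING, name STRING)
CREATE VERTEX Bench (PRIMARY_ID bench_id STRING, bench_size INT)
CREATE DIRECTED EDGE cites (FROM Case, TO Case) WITH REVERSE_EDGE="cited_by"
CREATE DIRECTED EDGE references_article (FROM Case, TO Article)
CREATE DIRECTED EDGE references_act (FROM Case, TO Act)
CREATE DIRECTED EDGE authored_by (FROM Case, TO Judge)
CREATE GRAPH LexGraph (*)
"""
print(conn.gsql(SCHEMA_DDL))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;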




&lt;h2&gt;
  
  
  The Results (10 Queries, 6,000 Cases, Real Benchmark)
&lt;/h2&gt;

&lt;p&gt;Here's the actual data from &lt;code&gt;eval/results.csv&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pipeline&lt;/th&gt;
&lt;th&gt;Avg Tokens&lt;/th&gt;
&lt;th&gt;Avg Latency&lt;/th&gt;
&lt;th&gt;Avg Cost (USD)&lt;/th&gt;
&lt;th&gt;BERTScore F1 (rescaled)&lt;/th&gt;
&lt;th&gt;BERTScore F1 (raw)&lt;/th&gt;
&lt;th&gt;Judge Pass Rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LLM Only&lt;/td&gt;
&lt;td&gt;334&lt;/td&gt;
&lt;td&gt;2.1s&lt;/td&gt;
&lt;td&gt;$0.000021&lt;/td&gt;
&lt;td&gt;0.180&lt;/td&gt;
&lt;td&gt;0.835&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic RAG&lt;/td&gt;
&lt;td&gt;1,732&lt;/td&gt;
&lt;td&gt;4.3s&lt;/td&gt;
&lt;td&gt;$0.000142&lt;/td&gt;
&lt;td&gt;0.310&lt;/td&gt;
&lt;td&gt;0.871&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GraphRAG&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;704&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.000058&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.620&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.891&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The headline numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;59.4% fewer tokens&lt;/strong&gt; than Basic RAG per query&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1,028 tokens saved&lt;/strong&gt; per query on average&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;59.2% lower cost&lt;/strong&gt; per query ($0.000058 vs $0.000142)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BERTScore F1 2× higher&lt;/strong&gt; than Basic RAG (0.620 vs 0.310)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Equal judge pass rate&lt;/strong&gt; (100% vs 100%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GraphRAG delivers fewer tokens AND better answers. That's the core result.&lt;/p&gt;

&lt;p&gt;The hackathon's bonus thresholds: ≥90% judge pass rate AND ≥0.55 rescaled BERTScore F1. LexGraph hits both.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why the Numbers Look the Way They Do
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why does GraphRAG use fewer tokens than Basic RAG?
&lt;/h3&gt;

&lt;p&gt;Basic RAG sends 5 raw text chunks to the LLM. Each chunk is ~300 words of dense legal prose. Most of it is irrelevant — it's just &lt;em&gt;similar&lt;/em&gt; to the query, not &lt;em&gt;connected&lt;/em&gt; to the answer.&lt;/p&gt;

&lt;p&gt;GraphRAG retrieves a structured relationship summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="n"&gt;Article&lt;/span&gt; &lt;span class="mi"&gt;21&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;referenced&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Maneka&lt;/span&gt; &lt;span class="n"&gt;Gandhi&lt;/span&gt; &lt;span class="n"&gt;v.&lt;/span&gt; &lt;span class="n"&gt;UoI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1978&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
                               &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;authored&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Justice&lt;/span&gt; &lt;span class="n"&gt;P.N.&lt;/span&gt; &lt;span class="n"&gt;Bhagwati&lt;/span&gt;
                                               &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;authored&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Sunil&lt;/span&gt; &lt;span class="n"&gt;Batra&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1978&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
                                               &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;authored&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Francis&lt;/span&gt; &lt;span class="n"&gt;Coralie&lt;/span&gt; &lt;span class="n"&gt;Mullin&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1981&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
             &lt;span class="n"&gt;referenced&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Olga&lt;/span&gt; &lt;span class="n"&gt;Tellis&lt;/span&gt; &lt;span class="n"&gt;v.&lt;/span&gt; &lt;span class="n"&gt;BMC&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1985&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
                               &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;authored&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;Justice&lt;/span&gt; &lt;span class="n"&gt;Y.V.&lt;/span&gt; &lt;span class="n"&gt;Chandrachud&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This compact relational context answers the question precisely. It's 500 tokens instead of 1,600. No padding. No tangential prose.&lt;/p&gt;
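
&lt;p&gt;The compression step itself is simple. Here is a hypothetical helper (the triple format is illustrative, not the repo's exact data structure) that flattens traversal results into that kind of line-per-relationship context:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical sketch: turn multi-hop traversal results into a compact,
# line-per-relationship context string instead of raw text chunks.
def compress_paths(triples):
    """triples: list of (source, relation, target) tuples from the graph traversal."""
    seen, lines = set(), []
    for src, rel, dst in triples:
        line = f"{src} -&amp;gt; {rel} -&amp;gt; {dst}"
        if line not in seen:          # deduplicate while preserving order
            seen.add(line)
            lines.append(line)
    return "\n".join(lines)

context = compress_paths([
    ("Article 21", "referenced by", "Maneka Gandhi v. UoI (1978)"),
    ("Maneka Gandhi v. UoI (1978)", "authored by", "Justice P.N. Bhagwati"),
])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;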

&lt;h3&gt;
  
  
  Why is BERTScore 2× higher for GraphRAG?
&lt;/h3&gt;

&lt;p&gt;The graph context includes the &lt;em&gt;structural relationships&lt;/em&gt; between entities — which judges wrote which cases, which cases cite which articles. This structural information is exactly what the reference answers contain. So the semantic similarity between GraphRAG's answers and the ground truth is much higher.&lt;/p&gt;

&lt;p&gt;Basic RAG answers with chunks of similar text. The chunks might mention the right cases, but they don't capture the relational structure — why those judges mattered, how the cases connect to each other, what the citation chain shows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is LLM-Only judge pass rate 0%?
&lt;/h3&gt;

&lt;p&gt;The judge model (Mistral-7B) is evaluating factual accuracy against verifiable references. LLM-Only answers come from parametric memory — no retrieval, no corpus grounding. The judge correctly identifies these as "unverifiable without corpus access." The answers often contain the right case names (from training data) but can't be verified as corpus-grounded, so they fail.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Implementation Decisions That Actually Mattered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. LLM-based entity extraction (biggest quality improvement)
&lt;/h3&gt;

&lt;p&gt;I replaced regex with an LLM call to extract structured entities from every query before graph traversal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;EXTRACT_SYSTEM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Extract legal entities from this query. Return ONLY valid JSON:
{
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;articles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;21&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;14&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Maneka Gandhi v Union of India&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;acts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Prevention of Money Laundering Act&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;concepts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;right to privacy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;procedural due process&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;judges&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Justice P.N. Bhagwati&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;],
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temporal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 2010, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: null}
}
No other text. JSON only.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This costs ~100 tokens per query but the traversal quality improvement is significant. Regex misses "Art. 21", "Article 21(1)", and every variation. The LLM handles all of them correctly and also extracts judge names and temporal constraints — which enable much more targeted graph queries.&lt;/p&gt;
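
&lt;p&gt;The call itself is a thin wrapper around Gemini. A sketch assuming the google-generativeai SDK, with &lt;code&gt;EXTRACT_SYSTEM&lt;/code&gt; being the prompt above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the extraction step (google-generativeai SDK assumed).
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

def extract_entities(query: str) -&amp;gt; dict:
    resp = model.generate_content(EXTRACT_SYSTEM + "\n\nQuery: " + query)
    raw = resp.text.strip().removeprefix("```json").removesuffix("```")
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to an empty entity set rather than failing the whole pipeline.
        return {"articles": [], "cases": [], "acts": [], "concepts": [], "judges": [], "temporal": {}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;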

&lt;h3&gt;
  
  
  2. 512-word chunk size is the sweet spot
&lt;/h3&gt;

&lt;p&gt;I tested 256, 512, and 1024 word chunks for ChromaDB. The results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;256 words&lt;/strong&gt;: Individual chunks lose legal context. BERTScore: 0.26.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;512 words&lt;/strong&gt;: Captures enough context while keeping retrieval focused. BERTScore: 0.31. ✅&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1024 words&lt;/strong&gt;: Chunks too broad, LLM gets overwhelmed. BERTScore: 0.28.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;512 is the sweet spot for Indian legal text.&lt;/p&gt;
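
&lt;p&gt;For context, a sketch of the 512-word chunking and ChromaDB ingestion behind those numbers. The collection name and overlap value are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: 512-word chunks with a small overlap, stored in ChromaDB for Pipeline 2.
import chromadb

def chunk_words(text: str, size: int = 512, overlap: int = 64):
    words = text.split()
    step = size - overlap
    for i in range(0, len(words), step):
        yield " ".join(words[i:i + size])

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("sc_judgments")

def ingest(case_id: str, judgment_text: str):
    chunks = list(chunk_words(judgment_text))
    collection.add(
        documents=chunks,
        ids=[f"{case_id}-{i}" for i in range(len(chunks))],
    )

# Retrieval: top-5 chunks by semantic similarity, as in Pipeline 2.
results = collection.query(query_texts=["Which cases expanded Article 21?"], n_results=5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;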

&lt;h3&gt;
  
  
  3. The graph visualisation is the storytelling asset
&lt;/h3&gt;

&lt;p&gt;I built an animated D3.js force-directed graph in the dashboard. As GraphRAG traverses the graph, nodes light up in sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Query node (red) → Article nodes (purple) → Case nodes (green) → Judge nodes (blue)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reviewers are skeptical about multi-hop graph traversal until they &lt;em&gt;see&lt;/em&gt; it happening in real time. The animation turns an abstract concept into an immediate visual story.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. TigerGraph connection caching
&lt;/h3&gt;

&lt;p&gt;Early versions created a new TigerGraph connection and token on every query call — adding 1–2 seconds of overhead. The fix: initialise once at pipeline startup and cache the connection object. GraphRAG latency dropped from ~5.8s to ~3.8s — faster than Basic RAG's 4.3s.&lt;/p&gt;
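
&lt;p&gt;A minimal sketch of the fix, with placeholder credentials:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: create the TigerGraph connection and auth token once, reuse on every query.
from functools import lru_cache
from pyTigerGraph import TigerGraphConnection

@lru_cache(maxsize=1)
def get_connection() -&amp;gt; TigerGraphConnection:
    conn = TigerGraphConnection(
        host="https://your-instance.i.tgcloud.io",   # placeholder
        graphname="LexGraph",
        username="tigergraph",
        password="your-password",
    )
    conn.getToken("your-secret")   # token is cached on the connection object
    return conn

# Every query call reuses the same connection instead of paying 1-2 seconds of setup.
conn = get_connection()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;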

&lt;h3&gt;
  
  
  5. Schema re-run safety
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;CREATE VERTEX&lt;/code&gt; without &lt;code&gt;IF NOT EXISTS&lt;/code&gt; guards causes errors on re-runs, which are constant during development. Adding &lt;code&gt;IF NOT EXISTS&lt;/code&gt; to every schema creation statement made the ingest script idempotent and saved a lot of debugging time.&lt;/p&gt;
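
&lt;p&gt;If you'd rather guard from the Python side as well, here is a small sketch of the same idea, checking the catalog before issuing a statement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: belt-and-braces guard in the ingest script, so DDL only runs when the
# vertex type is actually missing (complements the IF NOT EXISTS guards).
def ensure_vertex(conn, vertex_name: str, ddl: str):
    if vertex_name not in conn.getVertexTypes():
        conn.gsql(ddl)

ensure_vertex(conn, "Case", "CREATE VERTEX Case (PRIMARY_ID case_id STRING, title STRING, year INT)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;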




&lt;h2&gt;
  
  
  The Query Design
&lt;/h2&gt;

&lt;p&gt;The 10 benchmark queries were designed to maximise GraphRAG's structural advantage. The hardest ones:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;q07 — Citation chain with temporal filter:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Trace the citation chain from Maneka Gandhi v. Union of India to cases decided after 2010 that relied on it to expand personal liberty rights."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This requires: find Maneka Gandhi → follow &lt;code&gt;cites&lt;/code&gt; edges forward → filter by &lt;code&gt;year &amp;gt; 2010&lt;/code&gt;. Pure graph query. Impossible with vector search alone.&lt;/p&gt;
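
&lt;p&gt;To make that traversal shape concrete, here is the find → follow → filter logic in plain Python over a toy in-memory edge list (an illustration of the query pattern, not the TigerGraph implementation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy illustration of q07's logic: start at a case, walk incoming cites edges,
# keep only citing cases decided after the cutoff year.
cases = {
    "maneka_1978": {"title": "Maneka Gandhi v. Union of India", "year": 1978},
    "puttaswamy_2017": {"title": "K.S. Puttaswamy v. Union of India", "year": 2017},
}
cites = [("puttaswamy_2017", "maneka_1978")]  # (citing_case, cited_case)

def cited_after(target_id: str, min_year: int):
    return [
        cases[src]["title"]
        for src, dst in cites
        if dst == target_id and cases[src]["year"] &amp;gt; min_year
    ]

print(cited_after("maneka_1978", 2010))  # ['K.S. Puttaswamy v. Union of India']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;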

&lt;p&gt;&lt;strong&gt;q08 — Multi-article intersection:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Which Supreme Court judges authored the most judgments interpreting both Article 19 and Article 21 together?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This requires: &lt;code&gt;Judge → authored → Case → references_article → [Article 19 AND Article 21]&lt;/code&gt;. Vector RAG can't count across entities or intersect them by judge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;q10 — Property-filtered citation graph:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Which constitutional bench decisions on reservation policy cite Indra Sawhney, and how did subsequent judges interpret the 50% ceiling rule?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;4 hops with property filters (&lt;code&gt;bench_size &amp;gt;= 5&lt;/code&gt;, &lt;code&gt;topic = reservations&lt;/code&gt;). This is what TigerGraph was built for.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dashboard
&lt;/h2&gt;

&lt;p&gt;Works out of the box with mock data — no API keys needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;streamlit run dashboard/app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5 example queries or free-form input&lt;/li&gt;
&lt;li&gt;All 3 pipelines run side-by-side&lt;/li&gt;
&lt;li&gt;Entity pills colour-coded by type (articles purple, cases green, acts coral, judges blue)&lt;/li&gt;
&lt;li&gt;Animated D3 graph traversal&lt;/li&gt;
&lt;li&gt;Token/latency/cost metrics per pipeline&lt;/li&gt;
&lt;li&gt;Full 10-query benchmark tab with BERTScore + judge pass badges&lt;/li&gt;
&lt;li&gt;Session history and CSV export&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set &lt;code&gt;LIVE_MODE=true&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt; to switch to real APIs.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Design the graph schema from the queries, not from the data.&lt;/strong&gt; I started with a generic schema and had to refactor when the multi-article intersection query needed a dedicated edge. Always start with the hardest query and work backward to the schema.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Path B would produce stronger results.&lt;/strong&gt; I used Path A (TigerGraph GraphRAG repo as-is). Tuning &lt;code&gt;num_hops&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, and &lt;code&gt;community_level&lt;/code&gt; per query type would push BERTScore and judge pass rate higher. That's the Round 2 priority.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache BERTScore model loading.&lt;/strong&gt; DeBERTa-xlarge-mnli takes 30–60 seconds to download on first call. Add a warm-up call at benchmark start.&lt;/p&gt;
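
&lt;p&gt;A sketch of that warm-up using the bert_score package (the rescale flag matches the rescaled F1 reported above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: build the scorer once at benchmark start so the DeBERTa download and
# tokenizer load happen before the first timed query, not during it.
from bert_score import BERTScorer

scorer = BERTScorer(
    model_type="microsoft/deberta-xlarge-mnli",
    lang="en",
    rescale_with_baseline=True,   # matches the rescaled F1 in the results table
)
_ = scorer.score(["warm-up"], ["warm-up"])  # triggers model + baseline load once

# Later, per pipeline answer:
# P, R, F1 = scorer.score(candidate_answers, reference_answers)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;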

&lt;p&gt;&lt;strong&gt;Rate limit handling for concurrent tests.&lt;/strong&gt; Five parallel queries against free-tier Gemini (15 RPM) hit the limit immediately. Add exponential backoff before publishing throughput numbers.&lt;/p&gt;
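
&lt;p&gt;A minimal backoff wrapper (catching broadly here for brevity; in practice you'd catch the SDK's rate-limit exception only):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: retry with exponential backoff + jitter around any pipeline call.
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 2.0):
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:   # ideally: the SDK's rate-limit / 429 error only
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))

# answer = with_backoff(lambda: model.generate_content(prompt))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;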




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Indian legal tech is almost entirely unsolved at the AI layer. Courts publish thousands of judgments per year. Lawyers need precedents. Researchers need citation chains. Students need case summaries.&lt;/p&gt;

&lt;p&gt;The knowledge graph connecting 70 years of Supreme Court history already exists in the data — 70,000 judgments, hundreds of thousands of citations, thousands of constitutional interpretations. It just hasn't been made queryable in a way that preserves the relational structure.&lt;/p&gt;

&lt;p&gt;GraphRAG makes it queryable. And at 59% lower token cost than Basic RAG, with a 2× improvement in semantic accuracy, it does so at a cost that is viable at production scale.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/rynix27/lexgraph
&lt;span class="nb"&gt;cd &lt;/span&gt;lexgraph
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
python generate_data.py
streamlit run dashboard/app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dashboard works immediately with mock data. To run real benchmarks, add your TigerGraph credentials and run &lt;code&gt;make ingest&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/rynix27/lexgraph" rel="noopener noreferrer"&gt;github.com/rynix27/lexgraph&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Numbers (TL;DR)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Token reduction (GraphRAG vs Basic RAG)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;−59.4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens saved per query&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,028&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost saved per query&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.000084&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BERTScore F1 improvement&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;+0.310&lt;/strong&gt; (0.620 vs 0.310)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Judge pass rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dataset&lt;/td&gt;
&lt;td&gt;6,000 SC cases · 3.8M tokens · Round 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bonus thresholds&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Both hit&lt;/strong&gt; ✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built for the &lt;a href="https://github.com/tigergraph/graphrag" rel="noopener noreferrer"&gt;GraphRAG Inference Hackathon by TigerGraph&lt;/a&gt;. Follow for the Round 2 update when we scale to 70,000 cases and 50M tokens.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;#GraphRAGInferenceHackathon #TigerGraph #GraphRAG #LegalTech #Python #LLM #IndianLaw #OpenSource&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>performance</category>
      <category>rag</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
