Tackle High Token Usage with GraphRAG

#devchallenge #llm #performance #rag

Large language models are powerful, but they become expensive and slow when complex questions force them to read too much context. The TigerGraph GraphRAG Inference Hackathon is centered on this exact production issue: token usage keeps increasing, costs go up, latency grows, and context windows get consumed too quickly.
This project was built to address that problem directly. The goal was not just to build a question-answering system, but to show that GraphRAG can reduce token usage while maintaining answer quality by retrieving a smaller and more connected set of evidence than Basic RAG.
Dataset used
The project uses a cybersecurity-focused dataset collected from Wikipedia pages related to computer security and connected cybersecurity topics. Wikipedia's text content is generally reusable under the Creative Commons Attribution-ShareAlike license, provided reuse follows the attribution and share-alike requirements described by Wikipedia.
This made Wikipedia a practical source for the benchmark because it provides a large amount of structured, link-rich text about cybersecurity concepts, threats, tools, incidents, and organizations. Wikipedia articles also naturally contain relationships across entities, which is useful for graph construction and multi-hop retrieval.
The cybersecurity theme worked well for this benchmark because many questions in this domain depend on connected knowledge. A query about ransomware, for example, may need links between attack types, malware families, threat actors, defense techniques, affected industries, and security tools rather than just a few similar paragraphs.
The problem
In a standard LLM setup, answering a cybersecurity question often means sending the model large chunks of background text just to make sure enough context is present. Basic RAG improves this by retrieving semantically similar chunks, but similarity alone does not always capture the real structure of the domain.
That creates a token efficiency problem. When retrieval is broad, the prompt grows because the system keeps adding chunks to avoid missing something important. Every extra chunk increases token usage, cost, and latency for that query.
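To make the growth concrete, here is a minimal sketch of how prompt size scales with the number of retrieved chunks. The helper names, the whitespace-based token estimate, and the 200-word chunk size are all assumptions for illustration; a real pipeline would use the model's own tokenizer.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words (an assumption, not a real tokenizer)."""
    return len(text.split())


def prompt_tokens(question: str, chunks: list, top_k: int) -> int:
    """Approximate size of a prompt built from the question plus the first top_k chunks."""
    return approx_tokens(question) + sum(approx_tokens(c) for c in chunks[:top_k])


question = "How does ransomware typically spread?"
# Ten synthetic chunks of 200 words each, standing in for retrieved passages.
chunks = ["word " * 200] * 10

for k in (2, 5, 10):
    print(k, prompt_tokens(question, chunks, k))
```

With fixed-size chunks the prompt grows linearly in `top_k`, so raising `top_k` "just to be safe" has a direct token cost on every query.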
What we built
The solution follows the hackathon's required benchmark format and compares three pipelines on the same cybersecurity dataset and the same set of queries:
LLM-Only: the model answers directly with no retrieval.
Basic RAG: the system retrieves semantically similar Wikipedia chunks from a vector index and sends them to the LLM.
GraphRAG: the system turns the same cybersecurity knowledge into entities and relationships, retrieves graph-connected evidence, and passes a more focused context to the LLM.
This setup lets the benchmark measure the real tradeoff. Instead of making a vague claim that GraphRAG is better, the dashboard compares token usage, latency, cost, and answer quality side by side.
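A side-by-side comparison like this can be sketched as a small harness that runs one question through each pipeline and records the same metrics. This is not the project's actual code: `stub_llm`, `approx_tokens`, `run_all`, and the three context builders are hypothetical placeholders, and the word-count token estimate is an assumption.

```python
from dataclasses import dataclass
from time import perf_counter
from typing import Callable, Dict, List


@dataclass
class RunResult:
    pipeline: str
    answer: str
    prompt_tokens: int
    latency_s: float


def approx_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


def stub_llm(prompt: str) -> str:
    # Placeholder for a real model call; it only reports the prompt size.
    return f"(answer based on {approx_tokens(prompt)} prompt tokens)"


def run_all(question: str, pipelines: Dict[str, Callable[[str], str]]) -> List[RunResult]:
    """Run the same question through every pipeline and collect comparable metrics."""
    results = []
    for name, build_context in pipelines.items():
        start = perf_counter()
        context = build_context(question)
        prompt = f"{context}\n\nQuestion: {question}".strip()
        answer = stub_llm(prompt)
        results.append(RunResult(name, answer, approx_tokens(prompt), perf_counter() - start))
    return results


# Hypothetical context builders, one per pipeline.
pipelines = {
    "LLM-Only": lambda q: "",
    "Basic RAG": lambda q: " ".join(["chunk text"] * 50),
    "GraphRAG": lambda q: "ransomware -> malware family -> attack technique",
}

for result in run_all("How does ransomware spread?", pipelines):
    print(result.pipeline, result.prompt_tokens, f"{result.latency_s:.6f}s")
```

Keeping the question, the prompt template, and the token counter identical across pipelines means any difference in the numbers comes from the retrieval step alone.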
How high token usage was reduced
The main change was in retrieval. Instead of stuffing the prompt with many chunks that merely looked relevant, the GraphRAG pipeline aimed to retrieve only the most connected facts needed to answer the question. TigerGraph describes GraphRAG as using entities, relationships, and multi-hop reasoning to build focused context for the LLM.
In the cybersecurity dataset, this means retrieval can move through relationships such as threat type to malware family, malware family to attack technique, attack technique to defense method, and defense method to affected system. That path-based retrieval is often more precise than simply collecting the top semantically similar chunks.
Because of that, the final prompt can stay smaller. The model receives a tighter evidence package instead of a broad text dump, which directly targets the hackathon's main goal of token reduction with maintained accuracy.
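The relationship-following retrieval described above can be sketched as a bounded breadth-first traversal. The graph below is a tiny hand-written toy, not the project's dataset or schema, and `multi_hop_facts` is a hypothetical helper; a real deployment would query the graph database instead of a Python dict.

```python
from collections import deque

# Toy graph for illustration only: node -> list of (relation, target).
GRAPH = {
    "ransomware": [("has_family", "WannaCry")],
    "WannaCry": [("uses_technique", "SMB exploit")],
    "SMB exploit": [("mitigated_by", "patching")],
    "patching": [("protects", "Windows hosts")],
    "phishing": [("delivers", "ransomware")],
}


def multi_hop_facts(graph: dict, seed: str, max_hops: int) -> list:
    """Collect (source, relation, target) facts reachable from seed within max_hops edges."""
    facts = []
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, target in graph.get(node, []):
            facts.append((node, relation, target))
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return facts


for source, relation, target in multi_hop_facts(GRAPH, "ransomware", max_hops=3):
    print(f"{source} --{relation}--> {target}")
```

Each fact is a short triple instead of a full passage, and the hop limit caps how much evidence can enter the prompt. Unrelated nodes such as `phishing` are never visited because nothing on the path from the seed points to them.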
Why GraphRAG fits cybersecurity
Cybersecurity is not just a collection of isolated definitions. It is a network of connected concepts such as vulnerabilities, exploits, malware types, threat actors, targets, tools, and mitigations.
That makes it a strong GraphRAG domain. Basic RAG can find passages that mention the same words, but GraphRAG can preserve the relationships between the concepts and use those relationships to retrieve more relevant support with fewer unnecessary chunks.
This is especially useful for multi-step questions where the answer depends on how several concepts connect. In those cases, graph-based retrieval can be more selective than standard vector retrieval and therefore more token-efficient.
Evaluation
Every benchmark question is run through all three pipelines using the same Wikipedia cybersecurity corpus. The comparison dashboard then shows the answer from each pipeline along with tokens used, latency, cost per query, and answer quality.
The hackathon requires answer quality to be checked using LLM-as-a-Judge and BERTScore. It also states that token reduction only matters if GraphRAG maintains or improves accuracy compared with the Basic RAG baseline.
That rule is important because the goal is not to reduce tokens by dropping useful evidence. The goal is to reduce waste while still answering correctly.
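One way to encode that rule is a check that only counts a token saving when quality holds up against the baseline. This is a sketch under assumptions: `PipelineScore`, `token_reduction`, `is_valid_win`, and the `tolerance` parameter are hypothetical, the `quality` field stands for whatever score the judge or BERTScore produces, and the numbers are placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass
class PipelineScore:
    prompt_tokens: int
    quality: float  # assumed to be a quality score scaled to 0..1


def token_reduction(baseline: PipelineScore, candidate: PipelineScore) -> float:
    """Fraction of baseline prompt tokens saved by the candidate."""
    return (baseline.prompt_tokens - candidate.prompt_tokens) / baseline.prompt_tokens


def is_valid_win(baseline: PipelineScore, candidate: PipelineScore, tolerance: float = 0.0) -> bool:
    """A saving only counts if quality stays within tolerance of the baseline."""
    saves_tokens = candidate.prompt_tokens < baseline.prompt_tokens
    keeps_quality = candidate.quality >= baseline.quality - tolerance
    return saves_tokens and keeps_quality


# Placeholder numbers for illustration only, not measured results.
basic_rag = PipelineScore(prompt_tokens=2000, quality=0.80)
graph_rag = PipelineScore(prompt_tokens=1200, quality=0.82)

print(f"{token_reduction(basic_rag, graph_rag):.0%}", is_valid_win(basic_rag, graph_rag))
```

A pipeline that halves the prompt but drops well below the baseline quality fails this check, which mirrors the requirement that savings must not come from discarding useful evidence.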
What success looks like
A successful outcome for this project means GraphRAG answers cybersecurity questions with fewer prompt tokens than Basic RAG while preserving answer quality and keeping latency practical. That is exactly the outcome the benchmark is designed to test.
The broader lesson is that high token usage is often not only a model problem but also a retrieval design problem. When the retrieval layer becomes more structured and relationship-aware, the LLM can do the same job with less context and less waste.
Final thought
This project uses cybersecurity knowledge from Wikipedia to show that GraphRAG can attack one of the biggest pain points in production AI systems: too many tokens for every serious question. By replacing broad chunk retrieval with graph-guided evidence selection, the system aims to make LLM-based question answering cheaper, faster, and more focused without sacrificing quality.
