Introduction
One question is dominating AI architecture discussions right now. We already built RAG. Everyone is talking about GraphRAG. Should we move?
On the surface, it looks like a standard tech upgrade cycle. Underneath, something more fundamental is happening: a debate about how we represent knowledge, how we retrieve it, and how we expect machines to reason over it.
For the last two years, the industry followed a predictable path. We started with raw Large Language Models, quickly realized they could hallucinate with terrifying confidence, and turned to RAG (Retrieval-Augmented Generation) to ground them in real data. It was a genuine breakthrough. Suddenly you could connect a model to your PDFs, internal portals, wikis, and live databases without the nightmare of constant retraining. For most teams, it felt like magic.
Then the ceiling arrived.
Teams started noticing that RAG was useful, but not intelligent. It could find relevant text. It couldn't understand how things actually connected. This gap between finding information and understanding relationships is what drove the industry toward Knowledge Graphs and GraphRAG.
Now, just as that conversation is picking up steam, another shift is already underway: agentic AI. Autonomous agents, dynamic tool use, and multi-step orchestration are changing the very definition of what retrieval even means. It's no longer about fetching facts; it's about giving machines the cognitive infrastructure to solve genuinely complex problems.
Before you commit to your next infrastructure pivot, let's slow down and answer the questions that actually matter.
- What exactly is RAG, and where does it fail?
- Why did GraphRAG emerge, and what is the real cost of building it?
- In a world of agents, do we still need it the same way?
This blog is the roadmap for that journey.
The Problem RAG Solved (and Why It Mattered So Much)
A large language model is trained on enormous amounts of text. That gives it remarkable linguistic ability and broad general knowledge, but it comes with a hard constraint. The model doesn't know your enterprise data, your latest reports, your private documents, or the product changes that landed last Tuesday. And if it doesn't know something? It may still generate a confident, fluent answer anyway. That's hallucination, and it's not a bug you can patch; it's structural.
RAG solves this by moving knowledge outside the model and fetching it dynamically at query time.
The flow is straightforward:
1. Ingest your documents: PDFs, emails, contracts, meeting notes, tickets, whatever lives in your knowledge ecosystem.
2. Chunk the text into smaller, searchable units (chunk size matters enormously: too small and you lose context; too large and retrieval gets noisy).
3. Embed each chunk using an embedding model, converting text into dense numerical vectors that capture semantic meaning.
4. Index those vectors in a vector database: FAISS, Qdrant, Pinecone, Chroma, Weaviate, and Milvus are common choices.
5. At query time, embed the user's question, find the most semantically similar chunks, inject them into the prompt, and let the LLM answer from real evidence.
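The flow above can be sketched end to end. This is a toy illustration, not a production recipe: the `embed` function is a bag-of-words stand-in for a real embedding model, and a plain Python list stands in for a vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": bag-of-words counts. Real systems use a trained
    # embedding model that produces dense vectors (hundreds of dimensions).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(document: str, size: int = 8) -> list[str]:
    # Fixed-size word chunks; production systems chunk by tokens,
    # sentences, or document structure.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    # "Index": chunk every document and store (chunk, vector) pairs.
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    # Embed the query, rank all chunks by similarity, return top-k.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our offices are closed on public holidays and weekends.",
]
index = build_index(docs)
top = retrieve(index, "how many days do I have to return an item", k=1)
```

In a real system you'd swap `embed` for an embedding model and `build_index`/`retrieve` for calls to a vector store; the shape of the pipeline stays the same.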
It changed practical AI development. It gave teams a way to build grounded document assistants, enterprise search tools, Q&A bots, and domain-specific copilots without retraining foundation models. And it introduced an architectural principle that remains one of the most important ideas in modern AI systems.
The model doesn't need to contain all knowledge internally if we can retrieve the right knowledge externally at the right moment.
That idea isn't going away. But it has limits.
Where RAG Starts Struggling
The challenge with RAG isn't that it's bad. The challenge is that it's optimized for similarity, not structure.
That difference turns out to matter a great deal in practice.
Imagine someone asks a question: "Which projects are affected by the recent leadership changes?"
A classic RAG system might retrieve a chunk about a new VP appointment, another about a project roadmap, another about budget realignments, and another about team restructuring. Each chunk could be individually relevant. But the system has no natural way to understand that the VP change affects Project A through a specific reporting line, or that the budget change flows to Project B because of a procurement dependency. RAG retrieved similar text. It didn't model how things connect.
This plays out in three structural pain points that no amount of implementation tuning fully resolves.
Relationships Don't Live in Paragraphs
Real world knowledge is relational. Drugs interact with proteins. Engineers depend on infrastructure. Transactions flow through accounts. Court rulings reference precedents. Products belong to supply chains. None of this structure lives cleanly in a paragraph, and vector similarity can't reconstruct it from loose chunks.
More Context Isn't the Same as Better Context
As context windows have grown from 4K to 128K to 1M tokens, the tempting fix has been to just send more chunks. But flooding the LLM with additional text doesn't compensate for missing structure. Research has consistently shown that LLMs are sensitive to redundant and noisy context: more text can actively degrade answer quality when the signal is buried in noise. A 2023 paper from Stanford memorably called this the "lost in the middle" problem: models perform worse when the relevant information is buried inside long contexts rather than positioned at the edges.
Local Relevance ≠ Global Understanding
RAG surfaces locally relevant text fragments. It doesn't provide a holistic view of a domain, network, or system. This becomes a serious limitation in scientific literature review, financial relationship analysis, legal precedent mapping, biomedical research, and any domain where the value lies not just in what's said, but in how facts connect.
At some point, teams hit a realization: if the problem isn't finding relevant text but navigating connected knowledge, then text chunks might be the wrong unit of retrieval entirely.
What a Knowledge Graph Actually Is
A Knowledge Graph is a way of representing knowledge as explicit entities and relationships, rather than as paragraphs that leave the model to infer structure later.
At the heart of this is a simple but powerful idea called a triplet:
(Subject → Relationship → Object)
For example:
(Ram → leads → Project A)
(Project A → depends_on → Payments Platform v2)
(Payments Platform v2 → owned_by → FinTech Division)
(FinTech Division → reports_to → CTO Office)
Notice what just happened. We didn't store paragraphs. We stored meaning in a form the system can traverse, query, and reason over.
Now we can ask, "What does the CTO Office indirectly own?" and follow the chain. We can ask, "What breaks if the Payments Platform is delayed?" and trace the dependencies.
We've moved from retrieving information to navigating knowledge.
Knowledge graphs are stored as directed graphs: nodes are entities, edges are typed relationships. This structure enables graph traversal algorithms, multi-hop queries, shortest path analysis, and network centrality calculations, none of which are available in a flat vector index.
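Those traversals can be sketched with plain Python dictionaries (a real deployment would use a graph database such as Neo4j). The triples below are the ones from the example above; the reverse index lets us answer "what does the CTO Office indirectly own?" by walking edges backwards.

```python
from collections import defaultdict

# The example triplets, stored as a directed graph.
triples = [
    ("Ram", "leads", "Project A"),
    ("Project A", "depends_on", "Payments Platform v2"),
    ("Payments Platform v2", "owned_by", "FinTech Division"),
    ("FinTech Division", "reports_to", "CTO Office"),
]

graph = defaultdict(list)      # subject -> [(relation, object)]
reverse = defaultdict(list)    # object  -> [(relation, subject)]
for s, r, o in triples:
    graph[s].append((r, o))
    reverse[o].append((r, s))

def reachable_via_reverse(start: str) -> set[str]:
    # Walk edges backwards from `start` to find everything that
    # directly or transitively points at it, i.e. follow the
    # reports_to / owned_by / depends_on chain in reverse.
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, src in reverse[node]:
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

downstream = reachable_via_reverse("CTO Office")
```

The same traversal answers "what breaks if the Payments Platform is delayed?" by walking the reverse index from "Payments Platform v2" instead.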
Where Knowledge Graphs Already Live
Knowledge graphs aren't a new invention. Google has used its Knowledge Graph to power search for years, and Wikidata, the structured data backbone of Wikipedia, contains over 100 million items. The biomedical knowledge graph OpenBioLink contains millions of interactions between genes, proteins, diseases, and drugs. LinkedIn's economic graph models relationships between professionals, companies, skills, and jobs at scale. These aren't prototypes; they're production systems handling billions of queries.
What GraphRAG Is and How It Works
GraphRAG, popularized significantly by a 2024 Microsoft Research paper, is a framework that uses a knowledge graph as the retrieval layer for an LLM, rather than a flat vector index.
The core intuition: instead of retrieving semantically similar text chunks, retrieve connected knowledge from a graph, then provide that richer context to the model.
GraphRAG typically involves three stages.
Stage 1: Graph Based Indexing
You build and index a graph. This might be an existing open knowledge graph (Wikidata, ConceptNet, UMLS for medical domains), a domain-specific proprietary graph, or a graph you construct from your own corpus using extraction pipelines. Proper indexing matters: retrieval can use text descriptions, graph topology, embeddings over graph structure, hybrid schemes, or all of the above.
Stage 2: Graph Guided Retrieval
When a user asks a question, the system identifies relevant entities, then traverses relationships, paths, and subgraphs to assemble a richer answer context. This may involve entity linking, k-hop neighborhood expansion, Personalized PageRank, community detection, or LLM-directed graph traversal. The Microsoft GraphRAG paper specifically introduced a community summarization approach, using graph algorithms to identify clusters of related entities and pre-generating summaries, which dramatically improved performance on global sense-making tasks like "What are the major themes in this document corpus?"
Stage 3: Graph Enhanced Generation
Once relevant graph knowledge is identified, it's translated into a form the LLM can consume: raw triplets, adjacency lists, natural language descriptions of paths, or structured summaries. This translation step is critical and often underestimated: LLMs are sequence models trained on text, not graph traversal engines. The quality of this bridge between graph structure and language generation largely determines whether GraphRAG actually outperforms RAG in practice.
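As a sketch of that bridge, here is a minimal triple-to-text verbalizer. The `TEMPLATES` mapping is a hypothetical hand-written schema; production systems often generate these descriptions with an LLM or maintain richer templates per relationship type.

```python
# Hypothetical relation -> sentence templates for our example schema.
TEMPLATES = {
    "leads": "{s} leads {o}.",
    "depends_on": "{s} depends on {o}.",
    "owned_by": "{s} is owned by {o}.",
    "reports_to": "{s} reports to {o}.",
}

def verbalize(triples: list[tuple[str, str, str]]) -> str:
    # Turn a retrieved subgraph into plain sentences the LLM can consume.
    lines = []
    for s, r, o in triples:
        # Fall back to the raw relation name if no template exists.
        template = TEMPLATES.get(r, "{s} " + r.replace("_", " ") + " {o}.")
        lines.append(template.format(s=s, o=o))
    return "\n".join(lines)

subgraph = [
    ("Project A", "depends_on", "Payments Platform v2"),
    ("Payments Platform v2", "owned_by", "FinTech Division"),
]
context = verbalize(subgraph)
prompt = (
    "Answer using only this context:\n"
    f"{context}\n\n"
    "Question: Who ultimately owns Project A's dependency?"
)
```

The point is that the LLM never sees the graph itself, only this textual rendering, which is why the quality of the rendering matters so much.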
How Knowledge Graphs Get Built: The Extraction Pipeline
Before you can run GraphRAG, you need a graph. Building one from your own data means running an information extraction pipeline over your corpus.
The two core tasks are:
Named Entity Recognition (NER): Identifying the entities in text (people, organizations, products, locations, medical conditions, financial instruments, events, and whatever entity types your domain requires).
Relation Extraction (RE): Identifying the relationships between those entities (works_at, acquired, causes, located_in, depends_on, cited_by).
Historically, this required expensive annotated training data and domain-specific supervised models. Modern LLMs have changed the economics significantly. You can prompt a model to extract entities and relationships from a document in a single pass, using in-context examples to define your schema.
Two Practical Approaches
1. Custom LLM pipelines:
You design prompts that specify exactly what entity types and relationship types to extract, validate the output, handle edge cases, and write the results to your graph database (often Neo4j, which uses the Cypher query language). This gives you fine-grained domain control but requires serious engineering effort: output validation, error handling, entity disambiguation (is "OpenAI" the same as "Open AI"?), conflict resolution, and ongoing maintenance. For enterprise-grade graphs that become core assets, this is usually the right investment.
2. LangChain GraphTransformers / LlamaIndex graph tools:
Frameworks like LangChain's LLMGraphTransformer abstract much of this into a few lines of code. You hand it documents and get back structured graph documents you can load into a graph store. This is excellent for prototyping and early validation: you can have a working graph in hours, not weeks. The tradeoff is less control over extraction quality and ontology design.
A pragmatic approach: use LangChain tools to validate the concept and understand the data, then invest in a custom pipeline when the graph becomes a production dependency.
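To make the validation step concrete, here is a sketch of parsing and schema-checking extraction output. Everything specific is assumed for illustration: `fake_llm_output` is a canned stand-in for a real model response, and the pipe-delimited triple format is just one convention you might instruct the model to follow in the prompt.

```python
def parse_triples(llm_output: str, allowed_relations: set[str]):
    # Parse "subject | relation | object" lines and enforce a schema:
    # drop malformed lines and relations outside the allowed set.
    triples, rejected = [], []
    for line in llm_output.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[1] in allowed_relations and all(parts):
            triples.append(tuple(parts))
        else:
            rejected.append(line)
    return triples, rejected

# A canned response standing in for a real LLM extraction call.
fake_llm_output = """
Ram | works_at | FinTech Division
Project A | depends_on | Payments Platform v2
Project A is quite important
Ram | admires | Project A
"""
schema = {"works_at", "depends_on", "owned_by"}
triples, rejected = parse_triples(fake_llm_output, schema)
```

Even in this toy version, two of the four lines get rejected: one is malformed free text, and one uses a relation outside the schema. Real pipelines layer entity disambiguation and conflict resolution on top of this.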
The Real Costs of GraphRAG (The Part Most Bloggers Skip)
Here's where most GraphRAG enthusiasm runs ahead of reality. The framework is genuinely powerful, but it carries costs that compound at scale. Teams that discover these after committing to the architecture tend to have strong opinions about them.
1. Compute Cost Is a Design Constraint, Not a Detail
Building a graph from a large corpus means running LLM-based extraction over every document, often with multiple passes for NER, RE, and disambiguation. At scale, this gets expensive fast. A corpus of 100,000 documents running extraction at $0.01 per document is $1,000 to build. But knowledge changes. Documents get updated, entities evolve, relationships become stale. This isn't a one-time cost; it's an ongoing infrastructure commitment.
The Microsoft GraphRAG paper noted that graph construction costs can be 10–100x higher than standard RAG indexing, depending on corpus size and extraction complexity. For many use cases, that's a reasonable investment. For others, it's prohibitive.
2. Maintenance Is Continuous and Non-Trivial
In a standard RAG system, updating the index when data changes is relatively mechanical: process the new document, chunk it, embed it, replace the old vectors.
In GraphRAG, a new document isn't just new text. It may:
- Introduce entities not yet in the graph
- Rename or merge existing entities (disambiguation challenge)
- Add relationships that contradict previously stored ones
- Require schema updates to accommodate new relationship types
- Trigger cascading updates across connected subgraphs
Real knowledge graph maintenance involves entity resolution (merging duplicate nodes), relationship validation, conflict handling, ontology management, and quality monitoring. This isn't optional: a stale or inconsistent graph produces worse answers than no graph at all. Organizations running production knowledge graphs typically have dedicated data engineering pipelines, not just an extraction script that runs once.
3. Query Complexity Is Significantly Higher
Vector RAG retrieval is fast and conceptually simple: embed the query, run approximate nearest neighbor search, return the top-k chunks. The main failure mode is retrieving the wrong chunks, which you address by improving chunking, embeddings, and reranking.
GraphRAG retrieval involves identifying entities in the query, traversing the graph, selecting relevant subgraphs, managing traversal depth (too shallow and you miss context, too deep and you hit subgraph explosion), translating graph results into LLM-consumable text, and often generating structured queries in Cypher or SPARQL. Each step introduces new failure modes, and a single error (the entity linker fails to identify a key node, the traversal goes in the wrong direction) can cascade into a wrong answer even if the graph itself is perfectly accurate.
4. LLMs Are Not Graph Native Models
This is a foundational point that's easy to underestimate.
LLMs are trained on sequences of tokens. They're extraordinarily good at language, context, and reasoning over text. They're not naturally good at topological reasoning, deep multi-hop graph traversal, or understanding complex graph structure. As graph complexity increases (more hops, more nodes, more relationship types), LLM performance can degrade unless the graph-to-text translation is carefully designed.
This is why active research exists on Graph Neural Networks (GNNs), Knowledge Graph Embeddings (like TransE, RotatE, ComplEx), and specialized graph reasoning models that can work alongside LLMs: language models alone aren't sufficient for the hardest graph reasoning tasks.
5. Subgraph Explosion Is a Real Production Problem
As your graph grows, so does the number of paths between any two nodes. A query that seems simple ("What does this organization depend on?") can trigger traversal over thousands of candidate subgraphs if the graph is dense. Without careful traversal bounds, relevance scoring, and pruning strategies, retrieval latency can blow past acceptable thresholds. Large-scale industrial knowledge graphs at companies like Google and Amazon contain billions of entities and trillions of relationships, and efficient retrieval over those structures requires specialized infrastructure, not just a graph database with default settings.
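One standard mitigation is to bound traversal explicitly. Here is a sketch of k-hop expansion with both a hop limit and a node budget; the thresholds are illustrative, and real systems add relevance scoring and pruning on top.

```python
from collections import deque

def bounded_neighborhood(graph: dict, start: str, max_hops: int = 2, max_nodes: int = 50) -> set:
    # Breadth-first expansion with two safety valves: a hop limit
    # (traversal depth) and a node budget (guards against dense hubs
    # triggering subgraph explosion).
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop limit reached: don't expand further
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                if len(seen) >= max_nodes:
                    return seen  # node budget exhausted: stop expanding
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

# A dense hub: one organization connected directly to 100 services.
graph = {"Org": [f"service_{i}" for i in range(100)]}
capped = bounded_neighborhood(graph, "Org", max_hops=2, max_nodes=10)
```

Without the `max_nodes` cap, a single dense node would pull its entire neighborhood into the context; with it, retrieval stays bounded at the cost of possibly missing relevant nodes, which is exactly the tradeoff relevance scoring is meant to manage.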
When to Use GraphRAG (and When Not To)
Given the costs and complexity, GraphRAG deserves a clear deployment framework.
Use GraphRAG when:
Relationships are the core question. If users routinely ask about dependencies, hierarchies, networks, chains of causation, or multi-hop connections, and your current RAG system struggles with these, a graph likely adds genuine value.
Your domain has natural graph structure. Biomedical research (gene-protein-disease networks), legal precedent analysis, financial transaction monitoring, supply chain management, security incident investigation: these domains are inherently relational, and graph structure captures meaning that flat text loses.
Multi-hop reasoning is required. "What companies did the CTO previously work at, and what products were they responsible for?" requires following a chain of relationships across entities. RAG retrieves disconnected chunks; a graph traverses the chain.
Global sense-making matters. The Microsoft GraphRAG research showed particular strength in tasks that require understanding themes, patterns, and relationships across an entire corpus: summarization tasks where no single document contains the answer. Standard RAG performs poorly on these.
Stick with RAG when:
Text retrieval is the actual problem. If users are asking questions that can be answered by finding the right paragraph (policy lookup, document Q&A, manual search), RAG is often simpler, cheaper, and more maintainable. Don't add complexity for problems that don't require it.
Your data changes rapidly. Fast-moving data makes graph maintenance expensive. A vector index is much easier to keep current.
Agents can resolve the gap dynamically. More on this shortly.
You're early in your AI journey. Get RAG right first. Chunking, embeddings, metadata filtering, reranking, and permissions are complex enough. Adding graph infrastructure before validating the core product is usually premature.
Then Came Agents, Changing the Game Again
While teams were deep in RAG vs. GraphRAG debates, agentic AI was quietly shifting the entire premise.
An agent isn't a retriever. It's a reasoning and orchestration layer that can choose tools, call APIs, query databases, write and execute code, maintain state across steps, and decide what to do next based on intermediate results.
This changes the architectural question fundamentally.
GraphRAG assumes that you should structure knowledge in advance so you can traverse it later. The entire value proposition is precomputed structure available at retrieval time.
Agents introduce a different possibility: maybe we don't need to precompute every relationship if the system can discover and assemble relevant context dynamically at runtime.
Consider what an agent can do in a single reasoning flow:
- Query a relational database for organizational structure
- Search a vector index for relevant documents
- Call an internal API for live financial data
- Execute code to analyze a dataset
- Synthesize all of it into a coherent answer
In some cases, that dynamic composition can substitute for a prebuilt knowledge graph, especially when the relationships are discoverable from authoritative source systems rather than needing to be extracted and stored separately.
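That flow can be caricatured in a few lines. Everything here is a stand-in: the tools return canned strings instead of touching real systems, and the plan is hard-coded where a real agent would let an LLM choose tools and interpret intermediate results.

```python
# A toy orchestration loop: the "agent" routes each step of a plan to a
# registered tool and accumulates intermediate observations. Frameworks
# like LangGraph or AutoGen replace the hard-coded plan with LLM-driven
# tool selection and state management.

def sql_tool(query: str) -> str:
    return "FinTech Division reports to the CTO Office"   # stand-in for a DB query

def vector_search_tool(query: str) -> str:
    return "Q3 memo: Payments Platform v2 is delayed"     # stand-in for RAG retrieval

def api_tool(query: str) -> str:
    return "Live revenue: $4.2M this quarter"             # stand-in for an API call

TOOLS = {"sql": sql_tool, "vector_search": vector_search_tool, "api": api_tool}

def run_agent(plan: list[tuple[str, str]]) -> str:
    # Execute each (tool_name, query) step and collect the results.
    observations = []
    for tool_name, query in plan:
        result = TOOLS[tool_name](query)
        observations.append(f"[{tool_name}] {result}")
    # A real agent would hand these observations to an LLM to synthesize.
    return "\n".join(observations)

answer = run_agent([
    ("sql", "org structure"),
    ("vector_search", "recent project updates"),
    ("api", "current financials"),
])
```

The key structural point survives the caricature: the relationships were assembled at runtime from source systems, not precomputed into a graph.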
Major Agentic Frameworks in Production
Several frameworks have emerged to support this style of architecture:
- LangGraph (from LangChain) provides a graph-based state machine for building multi-step agent workflows with explicit control flow.
- AutoGen (Microsoft) enables multi-agent conversations where specialized agents collaborate on complex tasks.
- Microsoft Agent Framework combines AutoGen and Semantic Kernel into a new agentic framework for building multi-agent systems and workflows.
- CrewAI focuses on role-based multi-agent systems for structured workflows.
- Amazon Bedrock Agents and Google Vertex AI Agents offer managed agentic infrastructure at cloud scale.
These frameworks don't replace retrieval; they orchestrate it. An agent using LangGraph might invoke a vector search tool for semantic lookup, a graph query tool for relationship traversal, a SQL tool for structured data, and a web search tool for current information, all within a single reasoning chain.
The Real Future: Composition, Not Competition
The industry loves a clean narrative. RAG is dead. GraphRAG wins. Agents replace everything.
None of that is how it actually plays out in production systems.
What we're seeing, in Microsoft's research, in enterprise AI deployments, and in the emerging architecture patterns at companies like Uber, Airbnb, and LinkedIn, is convergence toward hybrid, layered systems where each approach plays to its strengths.
The simplest mental model: RAG retrieves relevant text, GraphRAG retrieves connected structure, and agents orchestrate which of those (plus tools, APIs, and code) a task actually needs.
Or more concisely: RAG finds information. GraphRAG finds connections. Agents decide how to use both.
The future isn't choosing one acronym over another. It's building systems smart enough to know when each approach applies.
A Practical Decision Framework for Teams Building AI Systems Today
Most teams don't fail because they chose the wrong technology. They fail because they never got clear on what they were actually trying to fix. A few honest questions asked early can save months of over-engineering.
Start with the failure, not the solution
Ask yourself: what is actually going wrong right now?
If users are saying:
- The answer is incorrect
- It didn't pick the right document
That's a RAG quality problem, not a graph problem. Fix the fundamentals first:
- Better chunking strategies
- Higher quality embeddings
- Stronger reranking
But if users are saying:
- It doesn't understand how things are connected
- It misses relationships between entities
That's a structural gap. That's where graphs start making sense.
Not every domain is a graph domain
Some domains are naturally relational; relationships aren't optional, they're the system:
- Drug interactions in healthcare
- Organizational hierarchies
- Legal precedents
- Financial dependencies
- Supply chain networks
Many common applications are not like this:
- Document Q&A
- Policy lookup systems
- Internal copilots
- Knowledge assistants
For these, well-built RAG is often more than enough.
Be honest about what maintenance actually costs
A knowledge graph is not a one time build. It's a living system that requires:
- Continuous entity resolution
- Relationship validation
- Ongoing extraction pipelines
- Schema evolution as data changes
If the ownership isn't there to sustain this, the graph will drift from reality, and once users lose trust, no architecture can win it back.
Sometimes the bottleneck isn't retrieval at all
If your system needs to:
- Work across multiple data sources
- Call APIs dynamically
- Adapt based on intermediate results
- Execute multi-step reasoning
Then the RAG vs. graph debate is beside the point. Your bottleneck is orchestration and that's where agentic architectures deliver the most value.
Start simple. Evolve with evidence, not assumptions.
- Start with a clean, well implemented RAG pipeline
- Observe where it fails in real usage
- Then decide: does this failure require relationships (Graph) or coordination (Agents)?
Not trends. Not what worked for another team. Actual evidence from your system.
You don't start with GraphRAG. You earn your way into it.
Further Reading
- Edge, D. et al. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. Microsoft Research.
- Mallen, A. et al. (2023). When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. ACL 2023.
- Liu, N.F. et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Stanford University.
Conclusion
The next time someone declares a technology dead, look closer: chances are it's just being absorbed into something bigger. The most resilient AI systems aren't built on a single winning bet. They're built on clarity: knowing what problem you're solving, what tool solves it best, and how to compose them intelligently when complexity demands it.
RAG finds. Graphs connect. Agents reason. None of them wins alone, but together, in the right architecture, they form something greater than the sum of their parts.
The engineers who will build the most capable systems aren't the ones chasing the newest headline. They're the ones who resist the hype cycle long enough to ask the harder question: not "What's the best technology?" but "What does my problem actually need?"
That discipline, matching tools to problems, not problems to tools, is what separates trend followers from system builders.
In a field that reinvents itself every six months, that kind of thinking isn't just useful.
It's the only thing that ages well. And finally:
The goal was never just to retrieve text. The goal is to help systems understand, connect, and use knowledge in a way that actually supports reasoning. We're getting closer and the path runs through all of these ideas at once.
Thanks
Sreeni Ramadorai










