Do You Actually Need a Knowledge Graph?

Most teams decide they need a knowledge graph for the wrong reason. They read that GraphRAG beats plain retrieval, they see a competitor mention a graph, and they conclude that a graph is the upgrade their AI has been missing. Then they spend three months building one, and their agent answers the same questions it answered before, only slower and at higher cost.

The opposite mistake is just as common and more expensive. A team that genuinely needs a graph builds a pile of embeddings instead, ships it, and spends the next year confused about why their agent keeps giving confident answers that fall apart the moment a question requires connecting two facts.

A knowledge graph is not an upgrade. It is a different tool for a different question. This post is the buyer's guide to telling which question you actually have, written by a team that has built both kinds of system and, more usefully, has talked plenty of clients out of the graph they thought they wanted. This is the sixth post in our series on graph rot, and it is the one to read before you commit a single sprint to building a graph.

Reach for a graph when the answer lives in the connections, not the content.

What does a knowledge graph give you that a vector database does not?

It gives you relationships as first-class facts, instead of relationships you have to hope the model infers from nearby text.

A vector database stores your documents as embeddings and finds the chunks that are semantically closest to a question. That is retrieval, and for a large share of AI systems it is exactly the right tool. Ask it "what does our refund policy say about damaged goods" and it will find the passage, hand it to the model, and the model will answer. The relationship you needed (this question maps to that paragraph) is a similarity relationship, and similarity is what a vector store is built to find.
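For contrast, here is a minimal sketch of what a vector store does at query time: rank stored chunks by cosine similarity to the query embedding and return the closest ones. The three-dimensional vectors and the sample chunks are hypothetical stand-ins for real embeddings; a production system would use an embedding model and an approximate-nearest-neighbor index rather than a linear scan.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Return the texts of the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical chunks with hand-written 3-d "embeddings"; a real pipeline calls an embedding model.
index = [
    ("Refunds for damaged goods are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Shipping is free on orders over 50 euros.", [0.3, 0.8, 0.1]),
]
# Stands in for the embedding of "what does our refund policy say about damaged goods".
query = [0.85, 0.15, 0.05]

print(top_k(query, index))
# → ['Refunds for damaged goods are issued within 14 days.']
```

Nothing in the index records how one chunk relates to another. The only relationship it can express is "closer to the query than the others."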

A knowledge graph stores entities and the explicit edges between them. A company, the people on its board, the funds it belongs to, the documents that mention it, all connected by named relationships you can traverse. The question it answers well is not "what text is similar to this" but "what is connected to this, and through what." Our entity resolution work exists precisely because a graph treats "is the same company as" as a hard fact it can enforce, where a vector store only ever has a fuzzy sense that two chunks sound alike.

The clean way to hold the difference: a vector database is the right tool when the answer is in the content of one place, and a graph is the right tool when the answer is in the connections between many places.

When is a vector database enough?

When your hardest question is a single-hop lookup over the content of your documents, you do not need a graph, and building one is a tax you will pay forever.

Here is the honest disqualifier, and we lead with it because the cheapest knowledge graph is the one you correctly decide not to build. You probably do not need a graph if your real questions look like these. You want semantic search over a document set. You want a support bot that answers from a policy library. You want to summarize or classify text. You want "find me passages similar to this one." Every one of those is single-hop, content-bound retrieval, and a well-built vector pipeline will serve it faster, cheaper, and with far less to maintain than a graph.

You also do not need a graph if your data has almost no meaningful relationships in it, or if those relationships never get asked about. A graph earns its cost by being traversed. If nobody is asking multi-hop questions, the edges sit there as expensive decoration, and you have bought a maintenance burden with no return.

And the hardest disqualifier to hear: you should not build a graph you cannot keep fresh. A graph that ingests new documents picks up new errors on every load, which is the entire subject of our post on keeping a graph fresh. If you do not have the capacity to maintain it, a graph does not stay an asset. It rots into a liability that produces confident, wrong answers. A vector index that goes a little stale degrades quietly. A graph that goes stale lies with structure behind it.

When do you actually need a graph?

When answering your most important question requires connecting facts that live in different places, treating different names as the same thing, or proving how you reached an answer.

There are three signals, and one is usually enough.

The first is multi-hop questions. If the answer to "which of our portfolio companies share a director with a company under investigation" requires hopping from a person to a board seat to a company to a fund, no amount of semantic similarity will assemble that chain reliably. A vector store can find documents that mention directors. It cannot traverse the relationship, because it does not store the relationship. This is the single clearest signal that you have crossed into graph territory.
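To make the difference concrete, here is a minimal in-memory sketch of that director question, assuming the graph is nothing more than a list of (subject, relation, object) triples. The company, person, and fund names and the DIRECTOR_OF and HOLDS relation labels are hypothetical; a production system would run the equivalent traversal in a graph database's query language rather than in Python loops.

```python
from collections import defaultdict

# Hypothetical triples: (subject, relation, object). Names are illustrative only.
TRIPLES = [
    ("Jane Doe", "DIRECTOR_OF", "Acme Corp"),
    ("Jane Doe", "DIRECTOR_OF", "Beta Ltd"),
    ("John Roe", "DIRECTOR_OF", "Gamma Inc"),
    ("Fund I", "HOLDS", "Beta Ltd"),
    ("Fund I", "HOLDS", "Gamma Inc"),
]

def build_index(triples):
    """Index edges in both directions so traversal can hop forwards and backwards."""
    out_edges = defaultdict(set)  # (subject, relation) -> objects
    in_edges = defaultdict(set)   # (object, relation) -> subjects
    for s, r, o in triples:
        out_edges[(s, r)].add(o)
        in_edges[(o, r)].add(s)
    return out_edges, in_edges

def shared_director_exposure(triples, fund, under_investigation):
    """Portfolio companies of `fund` that share a director with a flagged company.

    Returns sorted (portfolio_company, director, flagged_company) tuples, so each
    hit carries the chain of edges that produced it.
    """
    out_edges, in_edges = build_index(triples)
    portfolio = out_edges[(fund, "HOLDS")]
    hits = set()
    for flagged in under_investigation:
        for director in in_edges[(flagged, "DIRECTOR_OF")]:        # hop 1: company -> its directors
            for company in out_edges[(director, "DIRECTOR_OF")]:   # hop 2: director -> other boards
                if company != flagged and company in portfolio:    # hop 3: board seat -> held by fund?
                    hits.add((company, director, flagged))
    return sorted(hits)

print(shared_director_exposure(TRIPLES, "Fund I", {"Acme Corp"}))
# → [('Beta Ltd', 'Jane Doe', 'Acme Corp')]
```

Each hop is a lookup on a stored edge, which is exactly the step a similarity search has no way to perform.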

The second is identity that has to be enforced across messy sources. When the same real-world entity shows up as "Acme Corp," "Acme Corporation," and "ACME Inc." across a thousand documents, and your answers are wrong unless those are treated as one node, you need the thing a graph gives you and a vector store cannot: a place to make "these are the same" a stored, queryable fact. That is entity resolution, and it is structurally a graph problem.
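A minimal sketch of what "a stored, queryable fact" can look like, assuming mentions are merged with a union-find structure keyed on a deliberately crude name key (lowercase, strip punctuation, drop trailing legal suffixes). The suffix list, the Resolver class, and the sample mentions are hypothetical. A key this naive would also merge unrelated companies that merely share a name, so real resolution needs stronger evidence such as registration numbers or addresses; the point here is only that every merge lands in an explicit SAME_AS record you can query and audit.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "incorporated", "ltd", "limited", "llc", "co"}

def name_key(raw: str) -> str:
    """Crude matching key: lowercase, keep alphanumeric tokens, drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

class Resolver:
    """Union-find over mentions; every merge is also recorded as an explicit SAME_AS fact."""

    def __init__(self):
        self.parent = {}
        self.same_as = []  # (mention_a, mention_b, reason): the auditable record of each merge

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving keeps trees shallow
            x = self.parent[x]
        return x

    def merge(self, a, b, reason):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.same_as.append((a, b, reason))

mentions = ["Acme Corp", "Acme Corporation", "ACME Inc.", "Acme Robotics Ltd"]
resolver = Resolver()
first_by_key = {}
for m in mentions:
    key = name_key(m)
    if key in first_by_key:
        resolver.merge(first_by_key[key], m, reason=f"name key '{key}'")
    else:
        first_by_key[key] = m
        resolver.find(m)  # register the mention as its own node

# Group mentions by the canonical node they resolved to.
groups = {}
for m in mentions:
    groups.setdefault(resolver.find(m), []).append(m)
print(groups)
```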

The third is provenance you can defend. In finance and legal work, "the agent said so" is not an acceptable answer. You need to show the path: this conclusion rests on this edge, which came from this clause, in this document. A graph makes that path a first-class object you can return alongside the answer. This is why our legal and finance agent work leans on graphs: the provenance is not a nice-to-have; it is the deliverable. When we built a system of 23 agents for legal review, the value was not only the answers but also that every answer could point at the structure it came from.
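Here is a sketch of returning the path alongside the answer, assuming every edge stores the document and clause it was extracted from. The Edge fields, entity names, document names, and clause references are all hypothetical, and breadth-first search stands in for whatever traversal your graph store actually provides.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    obj: str
    doc: str     # hypothetical id of the source document the edge was extracted from
    clause: str  # hypothetical clause or section within that document

EDGES = [
    Edge("Supplier A", "PARTY_TO", "MSA-2021", doc="msa_2021.pdf", clause="Preamble"),
    Edge("MSA-2021", "GOVERNED_BY", "Delaware law", doc="msa_2021.pdf", clause="Section 14.2"),
    Edge("MSA-2021", "AMENDED_BY", "Amendment 3", doc="amend_3.pdf", clause="Section 1"),
]

def find_path(edges, start, goal):
    """Breadth-first search returning the list of edges from start to goal, or None."""
    adjacency = {}
    for e in edges:
        adjacency.setdefault(e.subject, []).append(e)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for e in adjacency.get(node, []):
            if e.obj not in seen:
                seen.add(e.obj)
                queue.append((e.obj, path + [e]))
    return None

path = find_path(EDGES, "Supplier A", "Delaware law")
for e in path:
    # Each step of the answer cites the document and clause behind it.
    print(f"{e.subject} -[{e.relation}]-> {e.obj}  (source: {e.doc}, {e.clause})")
```

The answer and its evidence come out of the same structure, so the citation cannot drift away from the reasoning the way a post-hoc explanation can.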

So how do you actually decide?

You run your hardest real question through one decision path, and the path tells you which tool you are actually buying.

Take the single most valuable question you want your AI to answer, the one that justifies the project, and walk it down the tree. Does answering it require connecting facts across different documents or sources, resolving the same entity under different names, or returning a traceable path? If the honest answer to all three is no, stop. You want a vector database and a good retrieval pipeline, and you will ship faster for admitting it. If the answer to any of them is yes, you are in graph territory, and the next question is not whether to build a graph but which graph to build.

The mistake to avoid is running an average question down the tree. Average questions are almost always single-hop, so they will tell you to skip the graph even when your highest-value question needs one. Decide on the hardest question that matters, not the typical one.
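The decision path and the hardest-question rule fit in a few lines of code, which is a useful way to force a team to answer the three questions explicitly. This is a sketch, not a scoring model: the Question fields and the value score are hypothetical labels you would assign by hand.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    value: int               # hypothetical business-value score you assign by hand
    multi_hop: bool          # connects facts across documents or sources?
    entity_resolution: bool  # same entity under different names?
    traceable_path: bool     # must the answer come with the path behind it?

def which_tool(q: Question) -> str:
    """Any one of the three signals puts the question in graph territory."""
    if q.multi_hop or q.entity_resolution or q.traceable_path:
        return "graph"
    return "vector"

def decide(questions: list[Question]) -> tuple[str, str]:
    """Decide on the highest-value question, not the typical one."""
    hardest = max(questions, key=lambda q: q.value)
    return hardest.text, which_tool(hardest)

questions = [
    Question("What does the refund policy say about damaged goods?", 2, False, False, False),
    Question("Which portfolio companies share a director with a company under investigation?",
             10, True, True, True),
]
print(decide(questions))
```

Run the refund question through which_tool on its own and it comes back "vector"; only the highest-value question decides the project.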

Should you build one or buy one?

Buy or adopt off-the-shelf when your domain is generic and your relationships are standard. Build custom when your identity problem is hard, your ontology is specific, or your provenance has to hold up under scrutiny.

The buy path is real and you should take it when you can. Mature open tooling exists, and managed graph databases and off-the-shelf GraphRAG frameworks will get a standard graph running quickly. If your entities are common, your relationships are obvious, and nobody is going to audit the result, adopt the existing stack and move on. Building from scratch what you could have configured is its own kind of waste.

The build path earns its cost in three situations, and they are exactly the situations our data engineering work tends to land in. The first is when entity resolution is genuinely hard, where the same company appears eleven different ways and a naive merge produces a graph that quietly lies. The second is when your domain needs a specific ontology: the relationships that matter in fund administration or contract review are not the ones a generic extractor pulls. The third is when provenance and correctness are load-bearing, which is when you need the scoring and acceptance gates we wrote about in our post on scoring a graph before you trust it, not a graph that merely runs.

A blunt rule of thumb: if a wrong answer is an inconvenience, buy. If a wrong answer is a liability, build, because the extraction, the resolution, and the validation are the hard part, and that is the part worth building carefully.

What does a knowledge graph cost you after you ship it?

The cost of a knowledge graph is not building it. It is keeping it true.

This is the line item teams forget, and it is the one that decides whether the graph was worth it. A graph is a living system. Every new document is a chance to introduce a duplicate entity, a mislinked edge, or a stale fact, which is the whole reason graph rot is the name of this series. The total cost of ownership includes incremental ingestion, ongoing entity resolution, continuous validation, and a confidence score on every node and edge so you can find the weak parts before an agent does. A vector index mostly just needs re-embedding when content changes. A graph needs governance.
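A sketch of how a per-edge confidence score turns into a maintenance queue, assuming each edge carries a score between 0 and 1 assigned during extraction and resolution. The ScoredEdge type, the 0.7 threshold, and the weakest-link rule for path confidence are all assumptions; some teams multiply scores along a path instead, which penalizes long chains more heavily.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoredEdge:
    subject: str
    relation: str
    obj: str
    confidence: float  # hypothetical 0.0-1.0 score assigned at extraction/resolution time

def weak_edges(edges, threshold=0.7):
    """Edges below the threshold, weakest first: the review queue for the next maintenance pass."""
    return sorted((e for e in edges if e.confidence < threshold), key=lambda e: e.confidence)

def path_confidence(path):
    """Weakest-link rule: an answer is only as trustworthy as its least certain edge.

    `path` must be non-empty.
    """
    return min(e.confidence for e in path)

edges = [
    ScoredEdge("Jane Doe", "DIRECTOR_OF", "Acme Corp", 0.95),
    ScoredEdge("Jane Doe", "DIRECTOR_OF", "Beta Ltd", 0.55),
    ScoredEdge("Fund I", "HOLDS", "Beta Ltd", 0.9),
]
print([(e.subject, e.obj, e.confidence) for e in weak_edges(edges)])
print(path_confidence(edges))
```

An agent can then refuse, or route to human review, any answer whose path confidence falls below the threshold, instead of presenting a weak chain with the same certainty as a strong one.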

So the buyer's question is not "can we build a graph." Almost anyone can stand up a graph that looks done. The question is "can we keep one true," because a graph you cannot maintain is more dangerous than the vector store you replaced. Count the maintenance before you count the benefit. If the maintenance math does not work, the honest answer is the vector database, and we would rather tell you that now than after the build.

What did building both teach us?

Across 50-plus projects since 2019, the pattern is consistent: the teams that get the most from a knowledge graph are the ones who needed it least desperately, because they decided on purpose rather than by trend. They had a real multi-hop question, a real identity problem, or a real provenance requirement, and they could name it before a line of code was written. The teams that struggle are the ones who built a graph to keep up, then went looking for questions it could justify.

The second lesson is that the line between the two tools is not as sharp as the vendors make it sound. Plenty of the systems we ship are hybrids: a vector layer for content-bound retrieval and a graph layer for the connected questions, each doing the job it is actually good at. The decision is rarely "graph or not." It is "where does the value actually live, in the content or in the connections," and the honest answer is often "both, in different places." Our enterprise retrieval work is usually exactly that hybrid, not a graph for its own sake.

If you can name your hardest question, you can answer the graph question yourself. If you cannot, that is the real problem to solve first, and no graph will solve it for you.

We build and fix knowledge graphs for AI systems, and we will tell you when you do not need one. If you are weighing a graph against a simpler retrieval system and want a straight answer, book a 15-minute call.