🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Knowledge Graphs for AI and Intelligent Systems (DAT209)
In this video, the speaker explains how knowledge graphs address AI challenges with enterprise data, particularly the problem that only 5% of AI projects reach production. The session covers context engineering as an evolution of prompt engineering, emphasizing the need for "minimal viable context" to provide LLMs with the right information at the right time. Knowledge graphs solve the query diversity problem by storing data as nodes and relationships in a property graph model, enabling multi-hop queries and pattern matching. GraphRAG is presented as superior to traditional vector databases, with studies showing three times better accuracy. The architecture allows projecting data from platforms like Aurora, Redshift, and S3 into Neo4j as a context layer. Beyond GraphRAG, graphs enable visualization through Cypher query language and can store user interactions as memory for future context.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
The Enterprise AI Challenge: Why 95% of Projects Fail to Reach Production
Hi everyone, thanks for attending today's session. We'll go over why knowledge graphs are great for AI and agentic AI, and we'll see how they make your AI more trustworthy and explainable.
So here's the agenda. We'll go through some of the AI challenges we have with enterprise data and then get into context engineering. Some of you already know that term, but we'll go over what context engineering is and why knowledge graphs really help build that context and make your data AI ready. I won't have time to go through a demo today because of all the slides, but if you stop by booth 1212, we'll go through the demo and you can also learn about our new agents and MCP servers. Let's go over some of the challenges we have, especially with enterprise data.
As you all know, AI is a great accelerant for new experiences and processes. It really drives innovation, reduces complexity, improves experiences, and lowers costs. It helps grow revenue, manage risk, and achieve market differentiation. There is definitely a lot of adoption with AI. But you'll be surprised to know that only 5% of the projects actually go into production. 95% don't reach production.
This is from a recent MIT report and a Gartner study showing that only 20% of projects actually get to a POC, and of those only 5% go to production. A lot of the customers we surveyed say their data is not AI ready. That's the reason they don't go to production. The data might be in silos, there might be security constraints, and there are a lot of regulation and compliance issues. Those are the main reasons.
Here are some of the AI challenges around enterprise data. We know there are hallucinations, obviously, and that's improving on public internet data, but on enterprise data it still needs a lot of improvement. AI really is like a black box, and enterprises can't just hand all of their data to it, given the regulations and compliance requirements around that data. Another big reason is context: you really don't have structured, connected, context-rich information that you can give to LLMs. Those are the main challenges.
What we've done is group these into three big buckets. There's a data organization problem: your data can live in many data sources. You can have data warehouses, lakehouses, traditional relational databases, document stores, and so on, and AI really cannot access all of that information. That's one challenge. The other is that you might have different schemas, data models, formats, and structures, so AI cannot really understand or interpret that information. That's another challenge.
Those first two are easy to interpret, and a lot of enterprises are actually working on them. The third one is a big challenge: the scale of query diversity. Let me go a little deeper on that. Query diversity is a big problem when you ask a question to an LLM. With the enterprise data you have, there's a lot of information and a lot of sources to gather from. It's not one question mapping to one query. You have a lot of queries to run and combine before you get to a solution.
Let me give you an example. Take a supply chain optimization use case, where the question is that you want to find the best route from point A to point B. That's your problem. That's your question. But now you have to deal with a lot of data sources. You might have to get into ERP systems, your data warehouses, and many other sources. Now you start asking and digging deeper. You probably want to identify possible routes, trace dependency chains, and ask about delays and lead times. So you start asking those questions, and that's basically a lot of queries. Now you're hitting the query diversity problem.
There are two ways to fix that: either give a lot more context to the LLM so that it understands, or consolidate the data somehow.
Graphs help you consolidate the data so you have a simpler query. That's exactly what we're trying to do. Graphs help you do that by boosting accuracy and improving explainability. The reason you're here for this talk is that graphs actually allow you to get more context and make the data AI ready.
Context Engineering and Knowledge Graphs: Making Data AI-Ready
Before we dive deeper, let's understand what context engineering is. Some of you already know that term. It's basically an evolution of prompt engineering. By definition, it means you give enough information to the LLM so that it can get to the next stage or next step. You want the right data at the right time in the right structure so it knows what to do next.
Why does this matter now? Traditionally, and even now, what happens is you expect one-shot LLM prompts. You ask a question and think you'll get a straightforward answer, but that's not the case with enterprise data. You have multiple data sources. The LLM has to reason, especially with agentic AI. It has to plan and execute multi-step actions. It probably calls multiple tools. It has to fetch data from various places. Coming back to queries, it has to run multiple queries to get that information. What we're saying is: give enough context so it understands what to do next. That's the basic idea, and it's also the definition used by LangChain.
What are some of the sources you have? You have all the user interactions that you give to the LLMs now, which can be your prompts and your feedback, and all the interactions you have with LLMs. The system keeps the state and history, so that can also be part of the context. Models have short-term and long-term memory, and that can also be used. You have structured output, which is basically your APIs and data tools that you fetch data from. That is also part of the context. The other big thing is RAG. You can actually retrieve information from your vector database or your graph RAG, where you have a graph database like Neo4j, and that can be part of the context.
This is the same view but from a different angle. You have human interactions, the cognitive layer of the application, the memory layer of the models, and data tools with APIs. It's the same thing, just laid out in a different view. However, just because you have data doesn't mean you can give all of that information to the LLMs. It's going to hallucinate again because it has too much information as context. What we're saying is that there's something called minimal viable context. Going back to the definition of context engineering, it's just the right information for the LLM to do the next step. You have to make sure the context you pass to the LLMs has a high signal-to-noise ratio. That's another concept with context engineering. At its crux, that's what it is: you give the right information. Even with agentic models, agents should understand what to do next, and they need that context information.
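As a rough sketch of what minimal viable context can look like in code, the snippet below keeps only the facts relevant to a question before building the prompt. The facts, the word-overlap scoring, and the prompt layout are all illustrative assumptions, not something from the talk; in a real system the selection step would be a graph or vector retrieval.

```python
# Illustrative sketch: keep only the facts relevant to the question so the prompt
# carries a high signal-to-noise ratio (the "minimal viable context" idea).

def build_minimal_context(question: str, facts: list[str], max_facts: int = 3) -> str:
    # Naive relevance score: number of words shared between question and fact.
    q_words = set(question.lower().split())
    ranked = sorted(facts, key=lambda f: len(q_words & set(f.lower().split())), reverse=True)
    context = "\n".join(f"- {fact}" for fact in ranked[:max_facts])
    return f"Use only the context below to answer.\n\nContext:\n{context}\n\nQuestion: {question}"

facts = [
    "Supplier Acme Metals supplies aluminum sheet with a 12-day lead time.",
    "Product P-100 is made from aluminum sheet.",
    "Warehouse W-3 is located in Ohio.",  # likely noise for this question
]
print(build_minimal_context("What is the lead time for product P-100?", facts))
```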
So why graphs? Before I dig in, what is a graph database itself? Let's understand the concept of knowledge graphs. By definition, it's a design pattern to organize and access connected data. You can have any connected data—everything is connected. Let's say you have a supply chain, financial data, networking, access control—all of that is connected. By definition, you can have a supply chain knowledge graph, a financial knowledge graph, or any knowledge graph. It has specific information around that domain.
The way we store data is something called a property graph model. Instead of tables and columns, we have nodes and relationships. Think of a supply chain—think of a supplier and raw materials. A supplier is a node, and raw materials are a node as well. The supplier supplies raw materials. That's a verb, and that's a relationship. That's how we store data.
Within that relationship, you can store properties. I can say a supplier supplies raw materials, but what's the date it was supplied, what's the lead time, what's the capacity? Those are properties that you can store inherently within the database. That's a big difference. I don't need joins and tables to extract the data. Everything is inherently stored.
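As a small sketch of the property graph model just described, here is how the supplier and raw-material example might be written into Neo4j with the official Python driver. The connection details, labels, and property names are assumptions for illustration.

```python
from neo4j import GraphDatabase

# Placeholder connection details; point this at your own Neo4j instance.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# A Supplier node, a RawMaterial node, and a SUPPLIES relationship that carries
# its own properties (lead time, capacity) rather than living in a join table.
query = """
MERGE (s:Supplier {name: $supplier})
MERGE (m:RawMaterial {name: $material})
MERGE (s)-[r:SUPPLIES]->(m)
SET r.lead_time_days = $lead_time_days,
    r.capacity_tons  = $capacity_tons
"""

with driver.session() as session:
    session.run(query, supplier="Acme Metals", material="Aluminum sheet",
                lead_time_days=12, capacity_tons=500)
driver.close()
```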
Think of this as another example. You have a supplier graph, a product graph, and a consumer graph. Now all of your business can be connected in one view. This solves two problems. First, there's the query diversity problem. It's simpler to query with graph databases because you have the fully connected view. I can do a multi-hop query, I can do reasoning, all of that, because everything is connected and inherently stored.
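To make the multi-hop point concrete, a single Cypher pattern can walk from a supplier through raw materials and products to the consumers that depend on them. The MADE_FROM and ORDERED relationship types are assumptions for this sketch; your model may differ.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# One multi-hop traversal instead of several per-source lookups (illustrative schema).
multi_hop = """
MATCH (s:Supplier {name: $supplier})-[:SUPPLIES]->(:RawMaterial)
      <-[:MADE_FROM]-(p:Product)<-[:ORDERED]-(c:Consumer)
RETURN DISTINCT p.name AS product, c.name AS consumer
"""

with driver.session() as session:
    for record in session.run(multi_hop, supplier="Acme Metals"):
        print(record["product"], "->", record["consumer"])
driver.close()
```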
The other big challenge this solves is giving the right context. I can give minimal context to the LLMs for them to do the next step. That's the reason we say it's AI-ready data: you have all the connections and context within the database. Think of it as a left and right brain. The right brain is basically your language, reasoning, and creativity, and the left brain is basically your knowledge layer. You get context enrichment, and obviously you get more explainable AI.
GraphRAG in Action: Superior Accuracy and Beyond Retrieval
Coming back to RAG, this is part of context engineering. You're trying to provide as much context as possible. That's the whole idea. Traditionally, you have a retriever where you retrieve information from a vector store, which can be a NoSQL or vector database. With a graph, you replace that with a graph database, and that's GraphRAG. Since we now have the connected view of all the information, you have the context. I can provide much richer context to the LLMs. That's the whole idea with GraphRAG.
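A minimal GraphRAG retrieval step could look like the sketch below: query the graph for the facts connected to an entity in the question, flatten them into text, and pass that as context to whichever LLM you use. The schema, the entity lookup, and the prompt layout are assumptions, and the actual LLM call is left as a placeholder.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

def graph_context(entity: str) -> str:
    """Return the facts directly connected to an entity as plain text."""
    query = """
    MATCH (n {name: $entity})-[r]-(m)
    RETURN n.name AS source, type(r) AS rel, m.name AS target
    LIMIT 25
    """
    with driver.session() as session:
        return "\n".join(f"{row['source']} {row['rel']} {row['target']}"
                         for row in session.run(query, entity=entity))

question = "Which products depend on Acme Metals?"
prompt = f"Context:\n{graph_context('Acme Metals')}\n\nQuestion: {question}"
# `prompt` would now be sent to the LLM of your choice (for example through Amazon Bedrock).
driver.close()
```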
Now, for an example, let's go back to the supply chain and the same question I asked before: what's the best route from point A to point B? With a graph, I don't have to dig deeper into ERP solutions, spreadsheets, dashboards, and all of that. I can just do a single query asking for the shortest path. That's a query, that's an algorithm I can use, and you get the answer. It's as simple as that. That makes it much easier.
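For the routing question itself, the whole multi-source investigation can collapse into one path query. This sketch assumes Location nodes connected by ROUTE relationships; the node names are illustrative.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# Best route from A to B as a single shortest-path query (illustrative schema).
shortest_route = """
MATCH (a:Location {name: $origin}), (b:Location {name: $destination}),
      p = shortestPath((a)-[:ROUTE*..15]-(b))
RETURN [stop IN nodes(p) | stop.name] AS stops, length(p) AS hops
"""

with driver.session() as session:
    record = session.run(shortest_route, origin="Plant A", destination="Warehouse B").single()
    if record:
        print(" -> ".join(record["stops"]), f"({record['hops']} hops)")
driver.close()
```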
You also have other information that you can retrieve. You can do pattern matching and path finding, and there are community detection algorithms you can run on the graph database. That gives you an added benefit of using a graph database. Just to touch on the architecture here: you have a data platform that can be anything; since we're at an AWS conference, it can be Aurora, Redshift, S3 buckets, and so on. You have the LLM, and the knowledge graph sits in between as a context layer or a knowledge layer.
A lot of companies project data from their traditional data platforms into the knowledge graph. You don't have to project everything. You can project just the part of the data where you want to run algorithms or where you need the context from, and that can act as context for the LLMs. A lot of companies are doing that. That just gives you an idea of how you can use knowledge graphs. Knowledge graphs also lead to more accuracy. This is not me saying it; this is a public report which you can access. They ran a study comparing knowledge graphs against NoSQL and SQL databases, and they found that the responses they got from the LLMs were three times more accurate.
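A sketch of that projection pattern: take rows already pulled from a relational source (for example the result of an Aurora query, fetched with whatever client you normally use) and merge just the slice you need into Neo4j as the context layer. The column names and labels are assumptions.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# Rows as you might receive them from Aurora, Redshift, or S3; only the part you
# want to run graph algorithms on, or need context from, has to be projected.
rows = [
    {"supplier": "Acme Metals", "material": "Aluminum sheet", "lead_time_days": 12},
    {"supplier": "Borealis Polymers", "material": "ABS resin", "lead_time_days": 30},
]

project = """
UNWIND $rows AS row
MERGE (s:Supplier {name: row.supplier})
MERGE (m:RawMaterial {name: row.material})
MERGE (s)-[r:SUPPLIES]->(m)
SET r.lead_time_days = row.lead_time_days
"""

with driver.session() as session:
    session.run(project, rows=rows)
driver.close()
```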
Just to level set, why is that? Let's go to the semantics of it. Let's say you have a very simple example of apples and oranges. On the left, that's how it's stored in the graph database. I don't have a vector representation. That's how it's stored. On the right, it's a vector database with vector embeddings. That's how you store the information. To do a search, you probably do a Euclidean or a cosine similarity search.
With a graph database, search is very easy. You can simply query the data. You can do much more with it, such as pattern matching and path finding. Imagine now that you have all of this information in one place. Let's say you have a supplier graph, product graph, and consumer graph. Now you can really analyze the data and do much more with it, not only providing context to LLMs, but unlocking additional capabilities as well. That is the essence of the advantage of using a graph database.
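To make that contrast concrete: a vector store answers "what relates to apple?" by comparing embeddings, typically with cosine similarity, while the graph answers it by matching an explicitly stored relationship. Both snippets below are illustrative; the toy embeddings and the Fruit/Category modeling are made up.

```python
import math

# Vector-store view: relatedness is computed from embeddings (toy 3-d vectors here).
apple, orange = [0.9, 0.1, 0.3], [0.8, 0.2, 0.35]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(f"cosine(apple, orange) = {cosine(apple, orange):.3f}")

# Graph view: the relationship is stored explicitly, so retrieval is a direct match
# (illustrative schema linking both fruits to a shared Category node).
related_fruit = """
MATCH (:Fruit {name: "apple"})-[:IS_A]->(c:Category)<-[:IS_A]-(other:Fruit)
RETURN other.name AS related, c.name AS category
"""
```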
This is my last slide. Graphs go beyond GraphRAG. You have context engineering, where you provide the right level of context. This is possible because all the information is stored within the graph database. Additionally, when you interact with LLMs, all the user interactions can be stored as context as well. Within the RAG architecture, after you receive a response, you can store those interactions as properties, relationships, or nodes within the graph database, giving you a memory that you can use as context going forward.
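A sketch of that memory pattern: after each response, write the interaction back into the graph as a node attached to the user, so that later retrievals can include it as context. The labels and properties here are assumptions.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# Persist one prompt/response pair as memory attached to the user (illustrative model).
remember = """
MERGE (u:User {id: $user_id})
CREATE (i:Interaction {prompt: $prompt, response: $response, at: datetime()})
MERGE (u)-[:ASKED]->(i)
"""

with driver.session() as session:
    session.run(remember, user_id="u-42",
                prompt="What is the lead time for product P-100?",
                response="About 12 days, driven by the Acme Metals aluminum supply.")
driver.close()
```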
Beyond that, you can obviously visualize it. Another thing we didn't discuss is the Cypher query language, which is a GQL-style query language; GQL is the graph query language, just like SQL for relational data. Using it, you can do many things like pattern matching, path finding, and so on. That really allows you to visualize and explore the data even further. I think that's all I had. If you have any questions, we're at booth 1212, and we can go through a demo. I didn't have time for the demo today, but if you come to our booth, we'll walk through it. We also have our agents and other MCP servers as well. Thanks for coming.
; This article is entirely auto-generated using Amazon Bedrock.