🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!
Overview
📖 AWS re:Invent 2025 - Grounding GenAI on Enterprise Data with AWS AgentCore + Coveo (MAM221)
In this video, Nicolas Bordeleau from Coveo explains how to ground GenAI with enterprise data using AgentCore and Coveo. He discusses why LLMs need grounding for factual accuracy, contextual relevance, traceability, and reducing hallucinations. Key topics include what makes a good retriever (depth of knowledge, relevance quality, execution speed), Coveo's MCP toolset offering passage retrieval and document retrieval, and integration architecture with AWS Bedrock AgentCore. The session emphasizes that effective prompting and precise MCP tool descriptions are critical, with specific guidance on tool naming conventions and front-loading important information in descriptions.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Coveo's Enterprise AI Solutions and the Critical Role of Data in GenAI
Hi everyone, thanks for taking some time from your busy re:Invent schedule to come and listen to this session. We're going to talk today about grounding GenAI with enterprise data, working with AgentCore and Coveo. I'm Nicolas Bordeleau. I work at Coveo. I work in the Product Relations team. Looking forward to giving you some information around grounding on enterprise data.
So the agenda for today, we're going to be talking about why grounding is so important. A couple of notions to cover there. Also a bit of architecture about how to integrate Coveo with Bedrock. The secret sauce, no spoiler alert, it's going to be prompting. Everything is about prompting in the LLM world and a few next steps after that if you want to learn more.
I wanted to start, not to brag, but to set the stage a bit so that you understand who we are and what we do, and to lend some credibility to what I have to say. We are an enterprise search company. We've been working with many customers, helping them get GenAI to production. This is an example of Dell. Dell is using Coveo to power multiple of their portals. If you want to buy a laptop or other equipment from them, it's going to be powered by Coveo. If you go and look at the support section, it's also powered by Coveo. They have really complex products, and they use us to provide answers to their users when they try to solve their issues.
Same thing with NVIDIA. NVIDIA has all the money in the world. They could be building that solution. They decided to partner with Coveo to offer question answering to their customers through the Coveo solution. Intuit decided to integrate Coveo inside of their own application. So if you're in Intuit, if you're looking to find how to work with the product, if you have questions, it's also going to be Coveo there. And Vanguard, in the financial services area, is also a big customer of Coveo. They use Coveo all across the board, internal and external, and that's an example of their Personal Investor portal. If you're looking to get information around their products, not investing information, but information around their products, that's going to be provided by Coveo. So we help our customers go to production with GenAI.
We built a whole solution, but we also have options for customers who want to integrate with AWS services and build their own solution. What I've been talking about so far is the left side of that slide, where we do everything: we index your content, build an index, ground the prompt, build the prompt, and provide the UI components so you can deliver that on whatever portal you want. We also decided to offer the platform as a retriever on its own, with integration points into Bedrock, AgentCore, Q Business, and Quick Suite. That's what we're going to cover today: how to integrate Coveo with the second one, AgentCore, though it's going to be pretty similar if you want to work with Quick Suite or other solutions from AWS.
That's a slide I stole from AWS that they presented a couple of weeks ago. I thought it was interesting. I fully agree with what's in there. Agentic is probably the future that's going to enable the LLMs to be fully used to their full potential. I was really interested to see that in any case, there's always a dependency on data. If you're working with GenAI, you need to ground the LLM based on enterprise data. You need to ground them so that they don't hallucinate. So there's a strong dependency on data. This is where we come into play. Same thing for GenAI. There's always a strong dependency on tools and data so that these agents or those models know how to behave and know where to get fresh information. So let's jump into the meat of this presentation, why LLMs need to be grounded.
Why LLMs Need Grounding: From Factual Accuracy to Reducing Hallucination
They need to be factually accurate. For LLMs to be factually accurate, you need to provide them some information. They've been trained on data that is dated, and they don't know what's true and what's not. If you ask a question, they're going to answer with whatever information they were trained on. If that's not what your brand thinks should be told to your customers, they don't care; they're just going to hand that information back to the user trying to get an answer. If you ground them, you're giving the LLM a source of truth, and it's going to be able to provide more accurate information and be factually accurate.
If you want your LLMs to be also contextually relevant, you need to ground them. LLMs don't know much about the users on the other side asking the question. You want to be able to provide project information, user history, those kinds of things need to be added into the prompt. That can be done by grounding the LLM to make it easier for you.
Traceability and trust. When you ground an LLM, you can now do source attribution. If you want your users to be able to trust what they read, you want them to be able to see where that information is coming from. If you don't ground an LLM, the information is coming from the LLM itself, and there's no way to trace back where this information is coming from.
When you are grounding an LLM, you're basically providing them the information that they should be using to give back an answer. This is where you can do source attribution, and then the users can navigate those links and validate if the information is accurate, and they can gain confidence in the system. Dynamic knowledge update is another important one. Basically, LLMs have been trained on a set of data that is fixed in time. If you want to be able to provide them updates on data, you need to provide that as grounding information.
So maybe the information that you want to expose in that LLM is already available out there, is already public, yes, but it's dated to the last time the model has been trained. So you need to be able to provide grounding information to those models for those reasons. The holy grail of grounding is to reduce hallucination. You want LLMs to be able to answer factually with information that you trust so that they don't provide false answers to your customers.
LLMs are really good at lying and having us believe that the information that they provide is true. So by grounding them, you're able to reduce the amount of hallucination that they are doing. And in enterprise, it's just mandatory. There's not enough information about your enterprise available out there, so the models don't know about your data. You need to ground them so that they are fully accurate.
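As a minimal sketch of what grounding with source attribution looks like at the prompt level, here is one way to inline retrieved passages; the field names, wording, and example URL are my own illustrative assumptions, not Coveo's API.

```python
# Minimal sketch of grounding with source attribution: retrieved passages
# are numbered and inlined, so the model can cite [n] and the UI can link
# back to the documents. Field names and wording are assumptions.

def build_grounded_prompt(question: str, passages: list[dict]) -> str:
    context = "\n".join(
        f"[{i}] ({p['source_url']}) {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using ONLY the passages below and cite them as [n].\n"
        "If the passages don't contain the answer, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

passages = [
    {"text": "Hold the power button for 10 seconds to reset the device.",
     "source_url": "https://example.com/kb/reset"},
]
prompt = build_grounded_prompt("How do I reset the device?", passages)
```

Because each passage carries its source URL, the answer can cite `[1]` and the user can follow the link to validate the information, which is exactly the trust loop described above.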
What Makes a Good Retriever: Key Capabilities and Coveo's MCP Toolset
Now a bit more on what makes a good retriever. A good retriever needs to have a good depth of knowledge. You need to be able to look at a large amount of data. It's fairly easy to build a vector database to be able to get a small set of data, to be able to ground your model based on a small set of data. But what we see with users these days is that they go and talk with an LLM, they ask the first question, and then they don't go back.
In the Google days, you were looking at a result and then you were navigating that result, and then you were gone. You were on your own trying to find the information. With an LLM, people go there and they ask one question, they refine, they ask a second question, so you don't know exactly what's the scope of what they're going to be looking for. So you want to be able to provide a larger set of information so that when they talk with that LLM, they're always able to get an answer out of that LLM.
They don't end up in a dead end where there's no information to be retrieved. Just like the LLM needs to be grounded, a good retriever needs to be contextually aware. You want the retriever to be able to know about the users in front of you. You need that information to be able to personalize the information being returned based on who you are, based on what you have access to, based on what you've done before.
Retrievers need to be able to take that information into account and retrieve a set of information that's going to be used to ground the LLM based on who you are and what you're currently trying to do. It's extremely important that retrievers are contextually aware. Relevance quality is probably the most important one here, because you are, in fact, providing information to the LLM before inference time. You're basically telling the LLM to use only its linguistic capacity.
You don't want it to use its own information. You're basically telling it to answer the user question or decide a course of action based on the information that you provide. Don't use anything else. Use what I'm providing you here. So if what you're providing is not relevant, is not accurate, then you're going to provide false answers, and that's by design. So relevance quality is extremely important when you are working with a retriever.
Execution speed is also important. Retrieval, the retrieval part of a RAG pipeline, happens at the first stage, basically. So when you talk with LLMs these days, if they're not grounded, the answers are going to be coming fast, and then you're used to seeing the answer being streamed. So as the answer is generated, it's being returned to you, and you consume that information.
When you are grounding an LLM, that information is retrieved first. So there's a first step where the user is basically waiting on the LLM to start to generate an answer. So you need that retriever to be really fast so that users can start to see the answer as soon as possible. Format supported is important. You can do multiple things with a retriever. You can do multiple things with an LLM. You can do deep research. You can ask them questions to guide you toward exploration.
The classic way retrievers work these days is by returning chunks of information, passages of information, basically parts of information that are useful for the LLM to be able to answer the question being asked. But sometimes you just want to have links so that people can go and can navigate those links to do further exploration. Sometimes you want to do deep research or you want to answer a really complex question, and you don't need some passages of a document.
You need the whole document so that the LLM can look at all the information, make up its own mind, and answer the full question. So a retriever that can provide various formats of information is extremely important as well. Finally, being concise and precise, returning the right information in the shortest format possible, makes the job of the LLM easier, because it's easier to consume smaller, more focused portions of information. The LLM doesn't have to decipher what you've returned; it already gets the right passage of information to answer the question. It's also going to be cheaper for you. The more information you put in the prompt, the more you pay for input tokens. So with a retriever that is concise and precise, you end up paying less for LLM inference. Input tokens aren't the most expensive ones, but in the end it shows up anyway.
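To make the token-cost point concrete, here is a back-of-the-envelope calculation; the per-token price and query volume are made-up placeholders, not real Bedrock pricing.

```python
# Back-of-the-envelope input-token cost. The per-token price and query
# volume below are made-up placeholders, not real pricing.

PRICE_PER_1K_INPUT = 0.003   # hypothetical $ per 1K input tokens
QUERIES_PER_MONTH = 100_000

def monthly_input_cost(tokens_per_prompt: int) -> float:
    """Monthly spend on input tokens for grounded prompts of a given size."""
    return tokens_per_prompt / 1000 * PRICE_PER_1K_INPUT * QUERIES_PER_MONTH

verbose = monthly_input_cost(8000)   # whole documents stuffed into the prompt
concise = monthly_input_cost(1500)   # focused passages only
# verbose -> 2400.0, concise -> ~450.0: in this made-up scenario, concise
# grounding cuts the input-token line item by more than 5x.
```

The exact numbers don't matter; the point is that prompt size multiplies across every query, so a retriever that returns focused passages instead of whole documents compounds into real savings.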
We offer a toolset of retrieval tools through MCP. In that toolset we have passage retrieval: we're able to extract passages from your documents and retrieve them so you can use them to ground whatever you're trying to build with an LLM. We also have an answer generation tool in that MCP server. If you're looking to get an answer and use Coveo more as a question-answering agent, you can skip building the prompt yourself, simply get answers from Coveo as an API, and provide those answers to your LLM on the other side, which might be dedicated to something simpler than answering complex questions.
We also have search and document retrieval. So if you just want to get a list of results for users to explore and do their own things with the data, we can also offer that. And also full document retrieval, as I was saying before, for deeper research and for more complex questions, getting access to the full document, not just passages, not just stacking passages one to another. Getting the full document, let's say a procedure to rebuild a complex engine or stuff like that, you can get those from Coveo, making the job of the LLM much easier.
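To make the toolset concrete, here is what four MCP tool definitions along these lines might look like; the names and schemas are illustrative assumptions, not Coveo's actual MCP server contract.

```python
# Hypothetical MCP tool definitions mirroring the four capabilities
# described above. Names and schemas are illustrative assumptions,
# not Coveo's actual MCP server contract.

QUERY_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}

TOOLS = [
    {"name": "passage_retrieval",
     "description": "Retrieve short, focused passages relevant to a query, for grounding an LLM answer.",
     "inputSchema": QUERY_SCHEMA},
    {"name": "answer_generation",
     "description": "Return a generated answer to a question, with cited sources.",
     "inputSchema": QUERY_SCHEMA},
    {"name": "search",
     "description": "Return a ranked list of result links for users to explore.",
     "inputSchema": QUERY_SCHEMA},
    {"name": "full_document_retrieval",
     "description": "Retrieve the complete content of a document, for deep research or complex questions.",
     "inputSchema": QUERY_SCHEMA},
]
```

Each tool maps to one of the formats discussed above: passages for grounding, a finished answer, links for exploration, and full documents for deep research.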
Building Agents with AgentCore: Architecture, Prompting Best Practices, and Next Steps
This is an architecture that we suggest to our customers when they want to get started to build an agent. Pretty simple, nothing groundbreaking in there. In the middle you have an agent. That agent obviously uses an LLM. There is long-term and short-term memory at the top so that they can have a session. They can work in the context of a few interactions. You basically have a conversation with that agent.
That agent is connected via Gateway, a service from AgentCore, to the Coveo MCP server, and on the side you have all the tools I've been talking about, offered via that MCP server. The gateway registers those tools, so the agent now has a set of tools it can use to do whatever job you want it to do. We're also leveraging the identity provider from AgentCore so that actions performed on Coveo are performed as the user who's authenticated on the other side.
So if I ask a question to an agent and I don't have access to some specific part of the information in the Coveo index, that information won't come through to the agent. There's no leakage of information. That's also an important part to add in there, to get more contextual information based on who you are, basically.
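The effect of that identity integration can be sketched as a permission trim at retrieval time; in practice Coveo enforces this server-side via the AgentCore identity provider, and the document and group shapes below are my own illustrative assumptions.

```python
# Sketch of the permission-trimming effect described above: documents
# the authenticated user cannot access never reach the agent's prompt.
# In practice this happens server-side; the document shape here is an
# illustrative assumption.

def trim_by_permissions(docs: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only documents whose allowed groups intersect the user's."""
    return [d for d in docs if d["allowed_groups"] & user_groups]

docs = [
    {"title": "Public FAQ", "allowed_groups": {"everyone"}},
    {"title": "Internal runbook", "allowed_groups": {"sre"}},
]
visible = trim_by_permissions(docs, {"everyone"})
# Only "Public FAQ" survives; "Internal runbook" is never sent to the LLM.
```

Filtering before the prompt is assembled is what prevents leakage: content the user can't see is never in the context window, so the model can't repeat it.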
The secret sauce is a combination of prompt and MCP description. You are building an agent and giving it a set of tools, so you need to be extremely explicit about what those tools are, when to use them, what they're for, and how they work. In the end it's a model that's going to use those descriptions to decide what to do and how to use your tools.
So this is a kind of meta prompt that we worked on with a few customers. Global directive: who you are, what's the main job of the agent. That's up to you to decide. But you want to talk about grounding, memory, and sources. If grounding is available, say that it's done with the XYZ tools. You need to be explicit about that.
Make a clear distinction also between memory and fresh information from the retriever. So it's another source of information. Memory is also information, much more limited, but it's another source, so you need to be explicit when to use both. If you want your sources to be cited, you also need to be explicit around that. So it's a bit like taking a young child by the hand, but in the end you do it once. You test it multiple times, but you do it once, and at some point the agent becomes autonomous and is able to use those tools autonomously.
We have some questions that go as far as defining some types of questions, what is coming from memory, what is coming from retrieval. So you can go much further, but if you build something good at the top, that's a really good starting point to be able to use a retriever in a good way.
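A meta prompt along the lines described above might look like the following sketch; the wording, the product name, and the tool name `passage_retrieval` are my own illustrative assumptions, not Coveo's recommended prompt.

```python
# Illustrative meta prompt: a global directive, then explicit rules for
# grounding, memory, and source citation. Wording, product name, and
# tool name are assumptions, not Coveo's recommended prompt.

SYSTEM_PROMPT = """\
You are a support agent for ACME products. Your job is to resolve
customer questions about ACME hardware and software.

Grounding: for any product question, call the passage_retrieval tool and
answer ONLY from the passages it returns. Never answer product questions
from your own training data.

Memory: use conversation memory only for user preferences and earlier
turns in this session. Memory is never a source of product facts; fresh
facts always come from retrieval.

Sources: cite the source URL of every passage you rely on. If retrieval
returns nothing relevant, say you don't know.
"""
```

Note how the memory and grounding sections draw exactly the distinction the talk calls for: memory is context, retrieval is the source of facts, and citation is mandatory.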
The other part is the MCP description. You have an agent, and you've told it when to use the tools and what to do with them. On the other side you have an MCP server that contains those tools. What we provide by default is pretty simple. There's a retriever, which is Coveo. You have tools for search, for retrieval, and for the full document, but we don't know what you're exposing in your index on the other side. So you need to be explicit about what's going to be available through these tools.
First, tool naming: don't use dots in tool names. We see so many customers confused by dots in the names of their tools, so avoid them and go with snake_case, or maybe dashes. Tool naming is quite important. The most important thing is probably having good descriptions for your tools. Stick to concise descriptions. Once again, those are going to be used by an LLM on your side, so the more precise your tool descriptions are, the easier it's going to be for the LLM to use those tools properly.
Stick to one to two sentences and front-load the information. If the tool is to create a case and requires authentication, start with "creating a case requires authentication" as the primary information, and put secondary details afterward. Use verbs and specify the type of object that's going to be retrieved. There are lots of guidelines we can provide more information around, but defining your tool the right way is going to make a difference for your LLM agent to use them.
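The naming and description guidelines above can be captured in a small lint; the specific checks and thresholds are my own illustrative choices, not an official MCP rule set.

```python
# A small lint for the tool naming/description guidelines above. The
# checks and thresholds are illustrative choices, not an official rule.
import re

def lint_tool(name: str, description: str) -> list[str]:
    problems = []
    if "." in name:
        problems.append("avoid dots in tool names")
    elif not re.fullmatch(r"[a-z][a-z0-9_-]*", name):
        problems.append("use snake_case (or dashes)")
    # Crude sentence count by periods: flag descriptions over two sentences.
    if description.count(".") > 2:
        problems.append("keep the description to one or two sentences")
    return problems

lint_tool("cases.create", "Creates a case.")
# -> ["avoid dots in tool names"]
lint_tool("create_case",
          "Creating a case requires authentication. Returns the new case ID.")
# -> []
```

The second example also front-loads the most important fact (authentication is required) in the first sentence, as recommended above.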
Also, regarding schema versus description, when you use an MCP server, you get a whole schema, a JSON schema for that MCP server, which contains descriptions for the tools. The whole schema is what's going to be used at runtime for the agent on the other side to call your tools. So procedure descriptions, arguments, and all the things that are required to call your tools are going to be in there. But the only part that the LLM is going to use to basically decide when to use and what to use as tools are going to be the descriptions. So they are part of the same bundle, but they're used for different things. Make sure that you provide good descriptions for your tools.
That's basically what I had planned for today. We have a booth near the Atlassian booth; if you walk around the Atlassian booth, you'll find Coveo's booth, 1529. We also have an AI Masterclass series of webinars; I think there's one roughly every two weeks or every month. You can use the QR code here, or go to Coveo.com to find the latest AI Masterclass.
And I have to tell you to go to the app and fill out the survey. It would really be appreciated, to let us know how to improve. There are a few more minutes if people have questions. I'm happy to answer them, or if you want to come talk, I'm also available. I'll be at the booth for the rest of the day as well. Thank you.
This article is entirely auto-generated using Amazon Bedrock.