
Kazuya


AWS re:Invent 2025 - Supercharge app intelligence using gen AI with Amazon DocumentDB (DAT313)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

📖 AWS re:Invent 2025 - Supercharge app intelligence using gen AI with Amazon DocumentDB (DAT313)

In this video, Cody Allen and Doug Bonser demonstrate three generative AI access patterns for Amazon DocumentDB. They showcase a Bedrock-powered TSQL plugin for mongosh that translates SQL queries to MQL, helping relational database developers work with DocumentDB. The session covers RAG architectures using vector embeddings, demonstrating a chatbot built on DocumentDB's developer documentation with HNSW indexes and cosine similarity. They also present Model Context Protocol (MCP) servers for exploring DocumentDB data through natural language in Visual Studio Code. Best practices include choosing between IVFFlat and HNSW vector indexes, creating indexes before data insertion, and optimizing recall rates versus query performance based on specific requirements.


Note: This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Supercharging App Intelligence on DocumentDB with Generative AI

Just to make sure you're in the right session, this is DAT 313. We're going to talk about supercharging app intelligence on DocumentDB with, get ready, a brand new concept you haven't heard of before: generative AI. I know this is new. We'll walk you through it and explain what it is. Raise your hand if you haven't been to 17 sessions today about Gen AI, right, since lunch? No, I'm kidding.

Thumbnail 30

So what we're going to talk about is a couple of different things. We're going to go through Gen AI. We're going to introduce ourselves in a second, I promise. We're going to talk about generative AI, but from a database perspective. We want to look at Gen AI from DocumentDB, from your data stores. We're going to talk about the foundation of these on databases, on your vector embeddings. We're going to go through the important parts of this. We're going to kind of explain the foundation of everything we're going to show you, and that's those access patterns.

I'm Cody Allen. I'm a Principal Solutions Architect with DocumentDB. With me is Doug Bonser, Senior DocumentDB Solutions Architect. What Doug and I do is we sit down with you. We talk to customers every single day. It's our favorite thing to do, actually, talking about DocumentDB. In these conversations, we talk about Gen AI and what they're doing with Gen AI, the proof of concepts that they're doing, the enterprise applications they're rolling out, and we want to share that with you. We want to tell you what we're seeing. We want to show you some of the tools, walk you through some demos of what customers are doing and what we're doing with Gen AI. We're going to go through those access patterns, and we're going to finish off with best practices.

The Enterprise Impact of Gen AI and the Critical Role of Data

It's overwhelming, to say the least, building new generative AI applications. There's a lot of variables, a lot of different factors that go into it, and we're going to leave you with some best practices, things that you can do to go back and start building these applications. With that said, let's go through a background of what Gen AI is. I'm kidding. You know what Gen AI is already.

Thumbnail 110

What's interesting is there was a McKinsey study that said that Gen AI is going to add between 2.6 and 4.4 trillion dollars to the global economy annually. That's nuts. I can't even fathom that type of number. But really, what this is saying is that Gen AI apps are becoming an enterprise initiative. Everybody's talking about it. Who here works for a company that is not talking about Gen AI? Yeah, exactly. Everybody's focusing on this, finding what they can do because it's disruptive, right?

Thumbnail 160

There's this massive amount of models that are available both on the cloud and that you can run locally. We'll show some of that. But the lifeblood of all of these Gen AI applications is data: secure data, enriched data, your data. That's where it all starts, and that's the most important thing. Doug and I were up here last year talking to you about generative AI on Amazon DocumentDB, going really deep on some vector indexes and how you utilize those. And the thing that we told you then was that your data is a differentiator. And guess what? Nothing has changed. Your data is still the differentiator. That's what makes these Gen AI applications that are going to be unique to your business.

Thumbnail 180

Now, generative AI applications are seen as a new application type that sits on top of your existing data foundation. That means that you want to plug into your existing data sources: your data lakes, your data warehouses, your external or hybrid data stores, your document data stores, right, in order to build these applications. Now, customers have told us they don't want to create new data architectures for what is basically a new application set. They want to take advantage of the systems that they already have, the systems they're already using in production environments. They want to leverage their existing architectures to create these new Gen AI workflows, and they want these new Gen AI workflows and the applications to follow their enterprise rules that they've already established.

Thumbnail 230

Now, keeping vectors as close as possible to that underlying data store is really going to simplify this architecture. It's going to minimize the data movement, the ETL process that you have to do. It's going to improve performance, because there are fewer hops along the way. It even has the advantage of decreasing licensing costs, since you're not introducing new products just to support new applications. And speaking of that, when you do start adding these new types of application layers, guess what? You have to learn new APIs. You have to learn new SDKs. You have the overhead of a new programming language in order to implement these.

When you put a vector store on top of an existing database, you leverage the knowledge that you already have. For example, if your organization is familiar with Redis and stores its data there, MemoryDB's vector functionality would be a good fit. This goes for Aurora. This goes for RDS and OpenSearch and Neptune Analytics, and guess what? The reason you're here: Amazon DocumentDB. Generally, this allows you to avoid introducing a new database component just to be able to build an AI application, and you get that confidence knowing that your existing data stores are already proven in production. They already meet your security requirements. They already meet your availability, your storage, and your compute requirements, because you're building on top of them.

Thumbnail 290

Thumbnail 300

Understanding Vector Embeddings: Translating Text into Machine-Readable Context

With that being said, let's go to that first piece I was telling you about, that foundational piece of vector embeddings. This is the key component of vectors on databases. What is a vector embedding?

Thumbnail 330

Well, it's just a numerical representation of your text, even your videos and your photos, but for the sake of DocumentDB, we're focusing on your text. We all can look at what's on the left and understand that. We can read those words and understand the context, but computers can't. So what we have to do is turn it into numbers. Machines understand numbers, and we have to turn that text into numbers. What this does is it allows these machines to understand context in the way that we understand context. Words have different meanings when they're close to other words. When we ask questions in a certain cadence or a certain pattern, it might change the meaning and therefore change the results that we get back. It's not a binary search where I'm just searching for a name, an address, or an email address. We're looking for context, that natural query language.
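The idea that machines understand numbers rather than words can be made concrete with a toy example: once two texts have been turned into vectors, a distance measure like cosine similarity tells us how close their meanings are. The three-dimensional vectors below are purely illustrative stand-ins; real embedding models emit hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 means same direction (similar meaning),
    near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- real models emit far more dimensions.
v_dog = [0.9, 0.1, 0.0]
v_puppy = [0.8, 0.2, 0.1]
v_invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(v_dog, v_puppy))    # high -- similar meaning
print(cosine_similarity(v_dog, v_invoice))  # low -- unrelated
```

This is the same comparison a vector index performs at scale, just approximated so it doesn't have to score every document.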

Thumbnail 350

Now, like I said, Doug and I, we talk to customers all the time, and what we see customers doing is building these Generative AI proofs of concept and rolling things into production. But really their goal in doing this is to improve their customer experience. They want to improve their employee productivity. We see this a lot internally at AWS. They want to create brand new content, or they even want to improve their business operations. This is across every business that we talk to, whether it's financial or engineering or customer support or sales or marketing. Customers are running these pilot programs just to try it out, see what they can do. They're experimenting with Generative AI.

We have very strategic customers that are building chatbots. They're doing virtual assistants. They're doing conversational search and natural query language and code generation. Man, I used to be a terrible programmer before Generative AI. Now I'm a terrible programmer with Generative AI, but Generative AI fixes it for me. It's wonderful. I'm amazing, I believe. I don't think so.

Thumbnail 410

Thumbnail 420

First Access Pattern: Amazon Bedrock-Powered DocumentDB TSQL Plugin for mongosh

Let's talk about these access patterns. What are some things that you can do with this? Well, the first thing we're going to look at is building a tool that helps us with DocumentDB using some of our Generative AI tools. We built this Amazon Bedrock powered DocumentDB TSQL plugin for mongosh. Whew, that rolls off the tongue. It's like 72 characters long. Clearly I'm not in marketing.

Who here is a relational database person? Either a DBA or that's what you know? Me too. SQL Server, Oracle, MySQL, Postgres, that's my bread and butter. I look at this, minus the flickering lines there, I look at this and this is home to me. This is a warm cup of coffee on a nice cold day in a nook. That makes sense. Holy cow, that no longer makes sense. Hey, it makes sense again. SELECT star FROM table joined to this other table WHERE field equals field. So imagine when I started my MongoDB journey, my DocumentDB journey, and I had to learn MongoDB Query Language, MQL, and that bread and butter, nice cup of coffee became this. Yes, exactly. It just got all skewed. It doesn't make sense to me anymore. I have to do the lookup. What the heck's a dollar lookup? I have to do this dollar project. I'm not a project manager. Why do I have to care about project?

Thumbnail 490

Thumbnail 500

It didn't get any easier when I had to do update statements. Again, it's home. I get this. It feels good. Even this far gone from it, it feels good. And then I had to go do something like this in MQL. I have to do that dollar lookup again for each. I have to do a function to update this. My simple lizard brain, my relational brain didn't get this.

Thumbnail 510

So we created this tool that translates these TSQL commands into DocumentDB commands within the mongosh environment, and they're set up to automatically handle all the supported APIs, operators, and data types in DocumentDB. And guess what? We actually made two versions. We have one that reaches out to Bedrock, and we have one that you can customize the prompt and the LLM for that uses Ollama. So you can actually use this locally on your machine if you don't want to reach out to Bedrock. Dramatic pause.

Thumbnail 540

Thumbnail 560

Thumbnail 570

Thumbnail 580

Let's go to a demo. All right, so this is available out on our GitHub repository. And we have our two different packages here, and we're going to start off by looking at our Ollama plugin. We go in that directory. And we're going to run our shell script to set this up, and what this is going to do is install all of our dependencies and it's going to download an Ollama LLM for us, and we'll see that in just one second. It handles compatibility using our compatibility tool to reference what operators it should use in these mongosh commands against DocumentDB. And once we do that, we can see that it's installed the CodeLlama 7B library. Now, it's a very large library, 3.8 gigs. It takes a while. It's not as fast in the real world as it is in the demo, but that allows us to make all these calls locally. We don't have to reach out to Bedrock or out to the internet at all.

We can see that it was installed. We have that Ollama powered plugin, and within mongosh we use this TSQL wrapper to do that query. So let's go ahead and launch mongosh. It's going to call on that JS file that tells us to use it, that it's loaded.

Thumbnail 610

Thumbnail 630

Thumbnail 640

Thumbnail 650

We have a couple of namespaces for this demo. We have some customers, some orders, some products, about 100 documents in the customers and orders. We have a products table that combines the customers and orders together, that nasty dollar lookup thing we were looking at before, and what we'll do is we'll run this TSQL wrapper, and we're going to do a T-SQL select star from table where field equals value, right? Simple, makes sense. Auto execute false at the end, that just means don't actually run that, just translate it for me. So we can see it's review mode, the commands are not executed, and there is the MQL. So we're just going to copy that, paste that in, and guess what, there's no missing quotes, missing curly brackets, it just works. That would have taken me three tries. It's like plugging in a USB port. I always get it the third time.

Thumbnail 660

Thumbnail 670

Thumbnail 680

We're going to make it a little more complex. We're going to do select multiple fields instead of select star. We're going to add an extra where clause to this, and we're going to say, hey, translate this for me, all within the MongoDB Shell environment. Again, using Ollama locally, and here we have all these extra ones and curly brackets and quotes. We're going to copy that. Hopefully the syntax is correct. Guess what, it's a recorded demo, so it's going to work, and we have our value there. Let's take it a little bit further. So we did select star, we did select fields, let's do a select count from this table. Same criteria, same filter criteria, we're not going to execute it, we just want it to translate.

Thumbnail 700

Thumbnail 710

And when we run this, we don't get the find operator back. We get an aggregation. Oh man, this would have taken me like 30 minutes to mistype every single time. But now I can just copy that, paste that in, and I get my results, 75. It did a dollar match with a dollar count. Like I said, you can customize the LLM with this tool. So what we're going to do is instead of using that CodeLlama 7B LLM, we're going to use that 13B, about twice the size, 7.4 gigs versus that 3.8. Again, the trade-off here is you use a lot of space locally, but you're doing it locally. Again, demo, this is preloaded, might take you a minute to download that on your machine.

Thumbnail 740

Thumbnail 760

Thumbnail 770

We need to go update our JavaScript file that MongoDB Shell is calling on, so tell it instead of using that CodeLlama 7B library, let's switch over to the CodeLlama 13B. Once we change that, the cool thing is we don't have to recompile anything, right? This is just a flat file within JavaScript, and when we launch MongoDB Shell, it's going to be referencing this new library for us. So we'll launch MongoDB Shell. We'll go back into that demo namespace again, the same place we were before. You can see nothing's changed here. We're still using Ollama. And we'll run the exact same query we just ran, that select count IDs from table with the two where clauses. Remember we had a dollar match and we had a dollar count, slightly different. Dollar match, dollar add fields, dollar match, dollar count. But when we run that, we get the same results, we still get those 75 results back. So customize that LLM.

Thumbnail 800

Thumbnail 810

Well, let's switch over. That was the local version. Let's switch over to the Bedrock version, and just like before, we're going to run our shell script to install the dependencies on this, and this one's going to go a lot faster because you're not installing any local LLM. It's going to reach out to Bedrock. When we run this exact same process as before, it's going to ask us where is that compatibility tool file so it can tell us, give us recommendations based on supported APIs. We plug that in, it very quickly installs everything else. Now our steps, our next steps are slightly different. The first thing we have to do is run AWS configure because we are making calls to Bedrock with this, and we have to have credentials to do that. That second item, ensure you have access, you don't have to do that anymore. About two or three weeks ago we changed that and you have access to all models by default, so it's old already. We need to update that. But you can see it's using Claude 3 for this.

Thumbnail 840

Thumbnail 850

Thumbnail 870

The interaction is exactly the same. We have a TSQL wrapper, so we'll go back into MongoDB Shell, and we can see when we connect that we are now using the Bedrock-powered plugin. We're going to go into the same namespace. We'll run that same query, the count ID from table with these two where clauses. Now we don't have an aggregate, we have a count document, slightly different. So you can see we're getting slightly different results from each of these, but we have the same number of queries. What's cool about this, oh, I forgot about this one. So this one you can see we got results. We left that auto execute false off of it. So when you leave that off of it, not only is it going to give you the translation, it's going to give you the results as well.

Thumbnail 890

Thumbnail 900

This is pretty powerful for somebody like me who cannot write an MQL statement correctly the first 12 times. It saves me a lot of time, and this can get pretty gnarly, right? We can put some pretty big TSQL statements in here and it'll translate them for us. So for example, we have this select star with a bunch of wheres and AND statements and an order by, and there we go. It turns that into a find. It has some regex searches for us, it has some dollar in lists for us, it has some sort criteria. It takes care of all of this for us, so very quickly, instead of Cody crying and being upset and yelling at the dogs and children for not knowing what he's doing, we have the results back there.

Thumbnail 920

So this is really an example of how you can use these Gen AI tools with DocumentDB to improve your developer experience, right? Make things easier and more productive for those internal teams.

Thumbnail 940

Second Access Pattern: RAG Architectures and the Power of Your Data

Next thing we're going to talk about are RAG architectures. Again, dramatic pause, because I'm losing my voice. Now, earlier I told you that there is a key element that is the differentiator for these new Gen AI apps. That key differentiator is your data, right? Your data is what makes Gen AI applications that are unique to your business. Everybody here has access to the exact same foundational models just like we saw. You can get Llama models, you can get Bedrock models, right? But only the folks here that are using their data to build AI apps are going to be creating real value. They're going to be creating something that you can build on that is going to improve your customers' experience and your employees' experience, right?

Your data is the differentiator between just generic apps, something like ChatGPT, and something that knows your business and your customers extremely well. But the good thing is you don't have to build your own model. We have some customers that have three to four decades of documents. They have billions of documents that they can refer back to, and they can hire data scientists to create LLMs. You don't have to do that, right? You can use your organizational data to augment these foundation models through a process called Retrieval Augmented Generation, or RAG. And to get very, very technical with you, what does that mean? You fetch the relevant information, you add that to your context, and you generate a response. That's some level 400 stuff. I'll try to keep it down from there, folks. I apologize.

Thumbnail 1020

Give you an example. Let's say that you have this online shoe store and you want to help out with an interactive agent. For example, they say, can I return my shoes? I want to get a refund. This is where your operational data store comes in. So the agent is going to do a fact lookup against your database that has your inventory or your database that holds that order information to get the relevant details. Then it's going to do a vector similarity search against your collection that holds your policy documents. Maybe there's a fee that you charge if they've ordered it more than 30 days ago, or maybe they get a free upgrade if it's within seven days, whatever it is, right? You're doing a similarity search against your data store as that knowledge repository to be able to make a decision to replace their shoe.

Thumbnail 1090

Thumbnail 1100

Thumbnail 1110

Thumbnail 1120

That's high level. Let's look at the backend. Let's look at the architecture and kind of walk through that. So number one, that user is going to ask a question there. You're going to have a prompt that handles that, but this is where the interaction starts. Next, there is going to be a repository to hold the conversation history. A lot of times that conversation was part of an ongoing conversation. You have to take the kids to lacrosse practice. You have to go cook dinner or whatever, right? You get distracted, you come back, you want to hold on to that. Next, the application needs to query for your situational data, and that's going to be out of your data store. That's what we were talking about earlier, like the inventory, the order status. Application's going to tokenize that original question using an LLM to generate the question embedding. With that, it's going to perform that similarity search in the vector data store and that's going to use some kind of algorithm, approximate nearest neighbor. We'll see that later. Then it's going to synthesize all of that into an engineered prompt to send back to the LLM to get a response. We have to go update that conversation again and guess what? We end by returning a response. It's kind of a data flow that we go through there.

Thumbnail 1130

Now, this is where DocumentDB comes in. So with its support of vector search through vector indexes, your vector search and your data source search are going to run against the same repository because DocumentDB is your operational data store. And on top of that, even though the conversation history is up there by itself all by its lonesome, that can sit in DocumentDB. You just severely decrease the complexity of that architecture with that support of vector indexes on Amazon DocumentDB.

Thumbnail 1170

Thumbnail 1180

Thumbnail 1190

Building a DocumentDB Chatbot: A Comprehensive RAG Demo

Let's go into a demo. Let's take a look at this one. So here, have a black screen. It's very nice. There we go. Here what we're going to do is create a DocumentDB chatbot. Hopefully. First thing we got to do is install some libraries. So the two big libraries we're going to use is that Gradio, and that's just for interaction with it, just for the demo, and LangChain. LangChain is what we're going to use to create our vectors that are going to go into Amazon DocumentDB. Once we have those established, we have to set up some variables for the vector itself. We have to say how many embeddings we need. We have to set some parameters for the index that we'll create in just one second. We're going to touch on those values a little bit later. We won't read too much into it at this point. Then we have to establish the index that we're going to create on Amazon DocumentDB.

Thumbnail 1210

Thumbnail 1220

We're going to create an HNSW index on Amazon DocumentDB with cosine similarity, and we're going to pass in some of those values that we established earlier. Once we have that established, we can go ahead and create our Mongo client connection to our DocumentDB database. This is using PyMongo, the MongoDB API. We're just setting our pool size and our timeouts.

Thumbnail 1240

Thumbnail 1250

Then we have to chunk up our data. We have to put our data into DocumentDB with the vectors. What we're going to do is chunk our data into 1000 characters with 200 characters of overlap because we want to keep the meaning. We don't want to lose the meaning. What's the data that we're going to be processing? Well, we have the entire 1500-page developer guide of DocumentDB. We have our data modeling guide in PDF, and that's what we're using LangChain for. We're going to break those apart into 1000 character chunks with 200 characters of overlap and then create an embedding off those.

Thumbnail 1260

Thumbnail 1270

On top of that, we're going to parse every single DocumentDB blog that's out there, and we're going to grab all that information as well and put that into 1000 character chunks. Once we have all that, we're going to take it and feed that through our model. Oh yeah, we're also going to get the pricing page and the FAQ page and the features page, right? Think of this as your operational data. Now all this is public, but for the sake of this demo, imagine this is your internal information.

Thumbnail 1290

Thumbnail 1300

Thumbnail 1310

We're going to pass that through that Titan Embedding Text V2 model in order to create those embeddings. We're going to keep the original text document, the 1000 characters. We're going to keep the vector content, all those numbers of those characters, and we're going to have metadata. Where did this come from? What page is it from? Things like that. We'll look at it in just one second. Once we have all that, we'll feed that through that model. It'll chunk it up, it'll go through the PDFs, it'll go through the blogs, and it will store all of that within DocumentDB and create that index for us so we can start searching it.

Thumbnail 1320

Thumbnail 1330

Thumbnail 1340

Now, this is our prompt. Here we're telling that chat agent what it's going to do. Hey, you are a Q&A assistant specializing in DocumentDB. These are the rules, right? You have to identify technical details. You have to ensure the answer is accurate. We have to tell it to be clear and be concise. This one, pause here, this one's hilarious to me. Be friendly and helpful. I want somebody to take this and put be unfriendly and unhelpful and give me the results. Get my contact information, change that, and send me the results because I really want to go back and do this myself. What happens if you tell it? Will it actually give me wrong answers? That'd be fantastic.

Thumbnail 1360

Thumbnail 1380

Anyway, so that's part of the prompt. We've even given examples like here are some examples of how you could respond. We're just giving the template of how you should react to the person that you're talking to. Once you have all that, we're setting some thresholds about how precise we want our answers to be. The score threshold of 0.8 means we want you to be at least 80% accurate in those responses you get back. We set our history. Remember we have a conversation history. We want you to remember the past six things that we talked about. And here, this is just Gradio. This is just the interface which we'll see in a second.

Thumbnail 1390

Thumbnail 1400

But before we go into Gradio, let's look at those documents in DocumentDB. We'll go into that collection that's storing all these, and we're going to look at some projections. We have about 5400 documents from those PDFs and all those blogs. They got parsed into 5400 different documents. Those documents look like this. You have your text content. There's that metadata that tells you where it came from. For example, this one came from the developer guide PDF. We'll skip a few and look at a different one. This one is about vector embeddings and it came from, funny enough, a blog about vector embeddings. How topical. Again, this is a demo pre-recorded. This is all set up. It's not a coincidence.

Thumbnail 1430

Thumbnail 1440

Let's look at a full document. What I'm doing is I'm just showing you a couple of fields, but if we look at the entire document of what these documents look like when they're in DocumentDB, we have our text content. There are those numbers. They mean nothing to us. That's what the computer is using for that vector index. We have our metadata, the title, where it came from, the page, the link.

Thumbnail 1450

Thumbnail 1460

Thumbnail 1470

So here, we're going to ask it a very simple question. Does DocumentDB offer serverless instances? This is simple, right? I can just read the dev guide and figure this out, but hey, I have a chatbot. It's fun, it's exciting, it's supposed to be helpful. Remember, I want to see what it's like when it's not helpful. We can make it a little bit more complex, right? We can say, how do I audit logins in the cluster, right? I need to know the user that connected. I need to know the time and the date that they connected, right? Again, this vector search is going to come back and it's going to give me the exact answer. It's going to say, hey, do this to enable it. And by the way, when you enable it, these logs are going to go to CloudWatch, and these are the fields in CloudWatch that you're going to have. And by the way, here's a query that you can run in CloudWatch. If there's something I'm worse at than MQL, it's CloudWatch filtering. And so there you go, you have that CloudWatch filter there.

Increasing the complexity, we're going to ask about indexes. I need a partial index on my customer collection. It filters for this address, state of Texas. Any Texans here? Yee haw, I'm glad.

Thumbnail 1510

The system parses the query, looks through it again using our data to augment that LLM, and provides us the information. It tells us how to accomplish the task, instructing us to use a specific partial filtered expression. What's particularly useful is that it explains exactly what it's doing and highlights the key points, giving us additional information. This response can be customized by your prompt.

Thumbnail 1540

Thumbnail 1550

Next, we ask it to join our orders and customers collection by our top three customers. This was that query we saw earlier. We saw the TSQL tool that can do something similar, but here we're doing it in natural language to get this answer. What I like about this one is that it walks us through each stage of that aggregation, so my old relational brain can understand what it's doing. It starts off by doing a dollar sign lookup, then a dollar sign unwind, followed by a group operation. It uses all that knowledge base of what we fed it to give us this answer that's specific to DocumentDB.

Thumbnail 1560

Thumbnail 1570

We can get more complex with our questions. The next question we're going to ask, whoever's running this demo is terrible, come on, hurry up. Here's a fun one. We have an e-commerce platform currently using Postgres with separate tables for products, inventories, orders, shipping, and customer reviews, and there are multiple variants and categories. I have one hundred thousand SKUs. Watch this on YouTube later and pause the video if you want to read all that. I'm not going to go through it, but we have this very complex question that we're going to ask it.

Thumbnail 1590

Thumbnail 1600

Let's feed that and see what it tells us. There we go. Remember, one of the documents we fed it was a schema design document for DocumentDB, and it's going to use that to come back and tell us this is what your product collection should look like. This is what your product review collection should look like, this is what your inventory document and your orders collection should be. Put this in the hands of your developers, and it's going to streamline their interaction with DocumentDB and document databases.

Thumbnail 1610

Thumbnail 1620

We'll ask another question about a more complex situation. We're saying that we have a gaming platform that tracks player progress and achievements, and we have fifteen normalized tables and a million daily player sessions. We're feeding in information about our rate of operations, and it's taking this into consideration. These are the things I need: real-time player leaderboards, cross-game achievement tracking, and in-game purchase history. We're giving all these requirements and asking it to figure this out for me. This is why I'm such a good programmer, because I have tools like this.

Thumbnail 1640

Thumbnail 1650

It does exactly that. It gives us a schema, tells us how to interact with it, tells us where to store the data, the schema to store the data, and how that data is going to interact with each other. It takes all that information we fed it to give us a customized response for DocumentDB.

Thumbnail 1660

Thumbnail 1690

The last one I want to show you is one I think gets overlooked a lot, something that's really powerful. We work with a lot of very large enterprises that work across the globe. We have companies that are based in Nebraska with developers in Spain and London. We have customers that are headquartered in Chile with developers in Argentina and Brazil. It's multilingual, and that's the great thing about this. For example, I'm asking what language is this, does anybody know, anybody speak this? Portuguese. This is asking if DocumentDB is supported in the São Paulo region. What it's going to do is translate that for us, go to our operational data source, and say yes, DocumentDB is supported in the São Paulo region.

Now, if you have developers across the globe using multiple languages, they can interact with a tool like this in their native language and it handles that. It's going to answer them the same way it answers your person in Nebraska, in Spain, in England, in Chile, and Argentina. You get those same answers. For the next access pattern, I'm going to hand this over to Doug. All right, everybody still hear me? All right, thanks Cody.

Thumbnail 1720

Third Access Pattern: Model Context Protocol (MCP) Servers Explained

We're going to talk about a third access pattern. Cody talked about the TSQL plugin for Mongo Shell and the RAG architecture. Now we're going to talk about Model Context Protocol, or MCP, servers. How many of you have not heard of MCP? A few, so most folks are familiar with it, some not so familiar. I'll do a little bit of introduction about MCP and how this relates to DocumentDB. To put it into context, like with RAG and the TSQL plugin, maybe you know how to access the data and you're familiar with querying data through Mongo APIs, but you've got data you really need to dive deep into. It's like, where do I start? I can start writing a bunch of queries and things like that, but that's going to take time. Let's take a look at MCP servers and how that might help you.

Thumbnail 1780

Just a little bit about MCP, Model Context Protocol servers. If you're not familiar with it, a good way to think of MCP is it's a way to provide seamless integration between your agentic AI components and applications with existing tools or systems. Even simpler, it's just a standardized way to access capabilities that you already have. It is a client-server architecture, and as you see on the screen here, there are a couple of components.

There are four main things to be aware of. First, there's a host. The host is an application that coordinates and manages multiple MCP clients. A host could be an agentic AI application that you've developed. A host may be Visual Studio Code with a client running in it, which is what you'll see in the demo. The point is there are a lot of different hosts, but they host MCP clients.

What MCP clients do is maintain a one-to-one connection with an MCP server. Your host will understand that it needs to reach out to this server for it to take some action. That will create a client, establish that connection, and now the host is going to keep track of all of the different clients and which servers they're connected to. The server is a lightweight program that exposes some capabilities, typically not a lot, through a standard interface, a JSON-RPC interface. Over on the far right are your existing data sources. They could be local sources like databases or files, or they could be remote data sources, maybe a web service or another API, but the point is MCP servers can access all kinds of sources, whether locally or remotely.

Thumbnail 1900

I want to focus on MCP servers here just a little bit more. MCP servers have three different kinds of features: resources, prompts, and tools. Resources are read-only persistent data that the server can expose to clients. Think of it maybe as some files that you have, some static files you want to be able to expose through an MCP server. You can do that. You can define a resource in your MCP server, and now the clients that are running in the host, through maybe a natural language interface, you can ask it, hey, who's the engineering lead for the mobile app team? You have all of the HR information in the static files you've exposed as a resource through an MCP server, and now you can have that kind of interaction with the static resources. Resources don't perform any kind of actions. They're not going to take any action on any other resources or external systems. They're simply read-only.

The second feature of MCP servers is prompts. As you see here, they're predefined instructions. They're templates that the server provides to the clients to understand what tools do you provide, what capabilities do you provide, what does the interface look like, what information do I need to give to you to take this action, what information do you give back to me. Then the tools, these are the things that will actually do the work, so they will take some sort of action. They could read, they could write, but again they're defined by a schema, so it's well-defined what this tool does and the interface to the tool. A really important point on the bottom is that by default, execution requires explicit user approval, because most likely you don't want your agentic AI tool just running off and doing all kinds of things, making changes, dropping tables, changing data. Now, as you'll see in the demo, you can set things to auto-approve, but by default with MCP servers you do need to provide explicit approval.

Thumbnail 2040

Now let's just focus on maybe a particular MCP server. This is actually what you'll see in the demo. It's maybe a condensed view of the Amazon DocumentDB MCP server, and we'll have a QR code with a link to it at the end. But as you see here, we have a client that's running in a host. That client connects to the MCP server, the Amazon DocumentDB MCP server, which has all of the different tools that it exposes and the logic needed to implement those tools. For example, there's a tool to allow you to connect to an Amazon DocumentDB cluster. There's a tool that will allow you to list all of the databases in the cluster, list all of the collections in the databases. There's a tool that will allow you to query the data in the database. Again, the idea is it makes it easier for you to maybe start exploring a new dataset or a database that maybe you don't know a whole lot about, what's in there, how can I use it.
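As a rough illustration of the shape of this interaction (this is not the actual Amazon DocumentDB MCP server implementation), an MCP server ultimately dispatches JSON-RPC-style tool calls to handlers. The sketch below uses stubbed tools whose names mirror the session's examples:

```python
import json

# Toy dispatcher illustrating how an MCP server exposes named tools behind
# a JSON-RPC-style interface. Tool names echo the session's examples
# ("connect", "listCollections"), but the implementations are stubs, not
# the real awslabs DocumentDB MCP server.

TOOLS = {
    "connect": lambda args: {"status": "connected",
                             "cluster": args.get("endpoint", "unknown")},
    "listCollections": lambda args: {"collections":
                                     ["customers", "orders", "products"]},
}

def handle_request(raw):
    """Dispatch one 'tools/call'-style request to a stub tool handler."""
    req = json.loads(raw)
    tool = TOOLS[req["params"]["name"]]
    result = tool(req["params"].get("arguments", {}))
    # Echo the request id back, as JSON-RPC responses do.
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

# Example call:
# handle_request('{"jsonrpc":"2.0","id":1,"method":"tools/call",'
#                '"params":{"name":"listCollections"}}')
```

In a real server each tool also publishes a schema describing its inputs and outputs, which is what lets the host ask the user for approval before a tool runs.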

Thumbnail 2100

MCP Server Demo: Analyzing E-Commerce Sales Trends with Cline and Visual Studio Code

So with that, let's take a look at this demo here. In this demo I'm using Cline in Visual Studio Code. That's my host, and I'm connecting to the AWS DocumentDB MCP server.

If you see at the very bottom there, I've got it configured. I am using the Anthropic Claude 3.7 Sonnet model. Oh, I forgot to press play. So we're going to take a look at the MCP server itself, right?

Thumbnail 2130

Thumbnail 2140

There's an Amazon DocumentDB MCP server. In this case, it exposes 16 different tools. I'm not going to go through all of them, but for example, you see one here, a connect tool to allow you to connect to an Amazon DocumentDB cluster. This is already set to auto approve because I've been using it, and that's okay to connect to a cluster. It's not destructive in any way, but again, they will default those as unchecked. Then you can see a description of what that tool does.

Thumbnail 2160

Thumbnail 2170

Here's another tool, a disconnect tool that will go ahead and disconnect from the database. There's a find tool. This one I don't have auto approved. Maybe I'm not comfortable with letting it run queries before I take a look at them. So again, the point is through your host you'll have this option to approve, auto approve or not.

Thumbnail 2180

Thumbnail 2190

Thumbnail 2210

So now your role is this: you're a developer at an e-commerce company, and you're tasked with analyzing some sales trends from past holiday seasons. Your first step is, well, what's in this database, right? So you can ask through Cline what collections are in this database. Through the MCP server, you can see it's gone out and taken a look at it. It's like, okay, I've got addresses, customers, orders, products, reviews. This looks like it has the information I would need to find what I want to.

Thumbnail 2230

Thumbnail 2240

Thumbnail 2250

So I'll just ask it, okay, well, what are the most popular products over the past few years during the holiday season? Now it goes off and starts working. We're running a new tool that we haven't run before, so the server exposes an analyze schema tool to take a look at the documents and what's in there. I go ahead and auto approve it because I'm okay with letting it just do this every time. So you can see it's going through. It's checking the orders collection, the products collection, the customers collection, taking a look at what's in there. After it goes off and does this a little bit, it'll give us some information.

Thumbnail 2270

Thumbnail 2280

Thumbnail 2290

Now here we go, another new tool, an aggregate tool, right? The MCP server is smart enough to know I've got to now do some aggregations to figure out what are the top products. So I'm going to give an approval to do that, and then we can see here what it's thinking, right? I'm going to group the orders by state, and I'm going to calculate the total revenue per state. Go ahead and sort those and again get the top products and get the details about those products. So I go ahead and approve that, and it'll continue through the process. You can see it's going by pretty quick, right, but you can see what it's doing, the different aggregations and some of the information that's coming back.

Thumbnail 2300

Thumbnail 2310

Thumbnail 2320

So now it's got some of the top products. It's trying to do a find to get the information about these products, but again, I need to approve it because the default is it won't run without approval. So I'm going to set it to auto approve so the next time it'll just carry on. So it's going to go ahead and execute that find. And we see now that it's processing a lot, and here's a report.

Thumbnail 2340

Basically, it took all the information it got from DocumentDB and ran it through the LLM to create this report. So we can see the top five states by revenue, the most popular product in each of these states based on quantities. We see Mississippi likes the immediate tub, Maine likes the puny sandpaper, and so on. So we've got all the information that we need there.

Thumbnail 2350

Thumbnail 2360

Thumbnail 2370

Thumbnail 2390

Now while you were doing this, you realized, okay, you're going to need to do this every year. So if you're like me, you'll forget how you did this. So just say, hey, give me some Python code to do this so you can ask Cline to do that. You can see it's creating the Python code needed to generate this report. It's generating quite a bit of code, but you can see when you look at it all of the code to handle the different steps, all right? Code to find the states with the top revenue, and then to find, okay, based on state, what's the popular products in that state, right? And you can see the aggregations and queries there for a product.

Thumbnail 2400

Give me the product details, right? So you see all of the queries and aggregations that it created to create this report. It's now available in this code. Of course, you'll review the code before you run it, but you've got a really good solid starting point.
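For reference, a minimal sketch of the kind of aggregation such generated code would contain, assuming hypothetical field names like `shippingAddress.state` and `total` (the actual generated script and schema were not shown in full):

```python
# Sketch of the first step of the generated report: group orders by state,
# sum revenue, keep the top five states. Field names are assumptions.

def revenue_by_state_pipeline(top_n=5):
    return [
        # Sum order totals per shipping state.
        {"$group": {
            "_id": "$shippingAddress.state",
            "totalRevenue": {"$sum": "$total"},
        }},
        # Highest-revenue states first, then truncate to the top N.
        {"$sort": {"totalRevenue": -1}},
        {"$limit": top_n},
    ]

# Usage: top_states = list(db["orders"].aggregate(revenue_by_state_pipeline()))
# A follow-up query per state would then find its most popular product.
```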

Thumbnail 2410

Thumbnail 2420

Thumbnail 2430

Thumbnail 2440

Thumbnail 2450

Thumbnail 2460

Now while you were doing this, your boss came along and said, hey, I know you're working on this. Can you tell us what's going to be hot this holiday season? So again, through natural language, you can just ask it: based on past products, past performance, past sales, what is going to be popular this holiday season? You see, it's trying a lot of different things. Sometimes you'll see things that maybe don't work, so it gets an error, and it'll go try something else. But the point is it'll get to where it does the analysis, and now it's creating a report of, based on the past couple of years of sales, this is what's expected to be popular this holiday season. So you can see that through MCP servers and agentic AI types of solutions, you can relatively easily go in, explore the data, generate code, and get a good starting point for your applications.

Thumbnail 2470

Thumbnail 2480

Thumbnail 2500

Best Practices: Choosing Between IVFFlat and HNSW Vector Indexes

With that, let's talk about some best practices in the time that we've got left here. This is the first time you're seeing these in these presentations, but if you're familiar with vector embeddings on Amazon DocumentDB, we support two different types of vector indexes: IVFFlat and HNSW. You've got two to choose from, so which one do I choose? There are some trade-offs between them, but at a high level, with IVFFlat the indexes are smaller and they use less memory. But they require pre-populated data, so you have to already have your data set loaded before you can create your indexes, and if you do have a workload where that information is being updated or you're adding more information, you will need to rebuild those indexes.

Thumbnail 2550

HNSW, on the other hand, has slower index build times and uses more memory, but there's better accuracy and lower latency, and you don't need to rebuild those indexes as your data changes. So again, which one do you choose? Well, it's going to depend on your requirements, but typically HNSW is probably a good place to start. To sum it up: if you need the fastest indexing, IVFFlat is a good option. If you want something easy to manage, with better performance and better recall, HNSW.
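As a sketch of what creating each index type looks like, DocumentDB vector indexes are declared with a `vectorOptions` document on the index definition. The helper below builds the raw `createIndexes` command; the collection/field names and tuning values (`m`, `efConstruction`, `lists`, 1536 dimensions) are illustrative defaults, not recommendations from the session:

```python
def vector_index_command(collection, index_type="hnsw", dimensions=1536):
    """Build a createIndexes command for a DocumentDB vector index.

    The shape follows the Amazon DocumentDB vector search docs; specific
    names and tuning values here are illustrative assumptions.
    """
    vector_options = {
        "type": index_type,
        "dimensions": dimensions,
        "similarity": "cosine",
    }
    if index_type == "hnsw":
        # m: connections per graph node; efConstruction: build-time
        # candidate list size. Larger values = better recall, slower builds.
        vector_options.update({"m": 16, "efConstruction": 64})
    else:  # ivfflat
        # lists: number of clusters (the "apartment buildings" analogy
        # used later in the talk) the vectors are partitioned into.
        vector_options["lists"] = 100
    return {
        "createIndexes": collection,
        "indexes": [{
            "name": f"{index_type}_vector_idx",
            "key": {"embedding": "vector"},
            "vectorOptions": vector_options,
        }],
    }

# With pymongo: db.command(vector_index_command("docs", "hnsw"))
```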

Thumbnail 2560

Just a little bit of information about index build times, or index creation times. There are some pretty significant differences, so if you're going to be creating indexes a lot, this could factor into it. This particular test was run on an 8XL instance, and what you see here is the number of embeddings. Remember, Cody showed you the human-readable text and then the arrays of numbers, those embeddings, and so we've got six different embedding counts. The green bar is IVFFlat. The other bar is HNSW. So you see, like we said on the other slide, IVFFlat build times are much lower compared to HNSW. The scale here is in seconds, so this is a good visual representation of the difference in build times between the two index types.

Thumbnail 2620

Now, HNSW. Remember, with IVFFlat you can only build those indexes on pre-existing data, so that's your only option, but those index build times are pretty quick. With HNSW, you do have options. You can create the index first and it will be built as you load the data, or you can also create it after. What you see here, that low purple line, is if you create the index first. Those indexes are going to build much faster because as you're adding the data, you're updating the index. If you're building it after the fact, it's going to take longer. Honestly, it's the same as the recommendation with any index you create on DocumentDB: if you can, create it before you load your data, because you'll be able to keep it updated much faster than if you're trying to build it after the fact.

Thumbnail 2670

So a couple of decision points on IVFFlat versus HNSW. If I need exact nearest neighbor search or 100% recall, meaning I need to find the exact match, don't use vector indexes. Just use a normal index for that, because you're basically just doing a point lookup and an equality match. If you want the fastest indexing, IVFFlat. Easiest management and frequent updates, HNSW. Higher performance and recall rates, HNSW.

Thumbnail 2720

So yeah, this is a good slide to take a picture of if you want, or like Cody said, pause the video when it's out on YouTube. Vector embeddings take up space in your documents, right? I mean this is great. We can add these vector embeddings, but there is a finite amount of space. There is a limit to the size of documents in DocumentDB, right? It is 16 megabytes. So if you've got a lot of embeddings and a lot of metadata, hopefully you're still not near the 16 megabytes, but the point is there is a limit. So you do want to keep in mind the impact on the size of your documents for these vector embeddings.

So, pretty simple formula. You see it on the top right here: the key length, plus 1, plus dimensions times 13. The key length in this case is "embedding", so key length plus 1, plus dimensions times 13. Where does the 13 come from? Each element in that array is an 8-byte double plus a few bytes of per-element BSON overhead, and the array itself carries a 4-byte int32 length at the front and a 1-byte terminator at the end. All that together gives you the size of that field. Or if you just look at the chart, you can see how 100 dimensions is about 1.3 KB, and 1,000 is about 10 times as much, as you can imagine. So again, just something to keep in mind: yes, it's great to have a lot of dimensions in your vectors to get better matches, but they're going to take up more space.
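That formula is easy to sanity-check in code. The sketch below implements the approximation exactly as stated in the talk (key length + 1 + dimensions × 13); it's an estimate of BSON overhead, not an exact byte count:

```python
def embedding_field_size_bytes(key, dimensions):
    """Approximate BSON size of a vector field, per the session's formula:
    len(key) + 1 + (dimensions * 13). The ~13 bytes per element covers
    the 8-byte double plus per-element BSON overhead."""
    return len(key) + 1 + dimensions * 13

# embedding_field_size_bytes("embedding", 100)  -> 1310 bytes, i.e. ~1.3 KB,
# matching the chart; 1,000 dimensions is roughly ten times that.
```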

Thumbnail 2820

So there's a lot of things to think about, right? Which vector index type to use, all of the different settings on those, how many dimensions, number of connections, and so on. But a better way to think about this is to take a step back. What are your requirements? How many queries per second do you need to support? What's your query latency? Does this thing need to run in 1 second, less than 1 second, 30 seconds? What's your recall rate? Does it always have to be an exact match? Again, a vector index is probably not the way to go there. Do you need maybe a 90% recall rate, or is 50% good enough? Depending on the answers to those, it's going to influence which vector index you use, IVFFlat or HNSW. Ingestion time matters too: IVFFlat index builds are really fast on existing data, so if low ingestion time is a priority, maybe you want to go with IVFFlat, and the same goes for index build time. So these are the things. Better to flip it around: think about what your requirements are, and work from those into your index type and your settings.

Thumbnail 2900

All right, so just a couple more slides here. Some tests on query performance, right, on the different types of indexes. So this is IVFFlat index. What we're showing here is along the bottom, the 20, 40, 80, 400, 600 are the number of probes or the number of lists to search. So an analogy to think of maybe with IVFFlat, you've got people living in apartment buildings, right, and there's a finite number of apartment buildings. Let's say 20 apartment buildings. More people, there's just going to be more people in each apartment building. So that apartment building is like that list, right? So you've got 20 groupings of embeddings, 40 groupings of embeddings, 80, and so on. That's across the bottom there.

Thumbnail 2960

So what you see now on the queries per second: certainly, the fewer groupings, the fewer probes, the faster those queries are, because there's less to search, but our recall is lower, about 85%. As you increase the number of probes, or the number of groups, the queries per second go down, but the recall goes up. But as you see, in this case, once you get to about 400, it's kind of the point of diminishing returns. Yes, you can go to 600 and get a little bit better recall, but you're getting fewer queries per second. So just keep in mind you don't need to always go to the maximum, and it's going to vary based on your data set and your embeddings. But just realize there's a point where you probably don't need to go any further. Same thing with HNSW.

Thumbnail 3010

Different terms are used in HNSW. There, it's called efSearch. This is the list size when you're querying, how many candidates you're trying to look at. You see at the top some information about the index there, but it's a similar idea. Not to go into all the details too much, but again, the smaller the lists that you're considering, the higher the queries per second. You can see much higher queries per second: now we're in the thousands of queries per second with HNSW. They're more performant.

Thumbnail 3070

But the recall, maybe that's good enough for your use case, maybe not. Crank up efSearch, but again, you get to a point where the recall gets close to 1, and no matter how much further you go with efSearch, it doesn't really go up much, while your queries per second go down. The way to sum it up is just test, find the value where it's close enough to what you need, and go with that. Don't always just max it out, because it's not necessary.
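As a sketch of how these knobs appear in a query, Amazon DocumentDB exposes vector search through a `$search` aggregation stage, where `probes` applies to IVFFlat indexes and `efSearch` to HNSW. The field path and values below are illustrative assumptions:

```python
def vector_search_stage(query_vector, k=5, probes=None, ef_search=None):
    """Build a DocumentDB $search vectorSearch aggregation stage.

    Pass `probes` when querying an IVFFlat index, or `ef_search` for HNSW.
    Values like probes=400 are the kind shown in the talk's recall-vs-QPS
    charts; tune them against your own data set.
    """
    search = {
        "vectorSearch": {
            "vector": query_vector,
            "path": "embedding",       # assumed field name
            "similarity": "cosine",
            "k": k,                    # number of nearest neighbors to return
        }
    }
    if probes is not None:
        search["vectorSearch"]["probes"] = probes
    if ef_search is not None:
        search["vectorSearch"]["efSearch"] = ef_search
    return {"$search": search}

# Usage: results = list(db["docs"].aggregate([vector_search_stage(qvec, probes=400)]))
```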

Key Takeaways: Commit to Experimentation and Leverage Your Data Differentiator

All right, so the best way to sum all of this up. You've seen a lot of information, you've seen three demos, I've talked about different types of indexes and all of these things. The best way to sum this up is the first point there, commit to experimentation early. Just start using these things if you haven't. If you haven't already started using this, start using it, see what it can do, get familiar with it. I know myself, I didn't know anything about MCP servers, and when I started using them, I'm like, holy cow, this is amazing. I love it. So work with it. The more you use it, the more you're going to find use cases for it, ways to solve problems. So start playing with it, start using it.

To the second point there, you may find new ways of working, maybe new ways of developing, new ways of solving problems with these technologies. But at the end of the day, one of the main takeaways here is this is why vector support on DocumentDB is so important. Like Cody said, your data is a differentiator. We're allowing you now to store these vector embeddings with your data in DocumentDB. You've got everything you need right there to allow you to do all of these things that we've showed you here today.

Thumbnail 3160

Thumbnail 3180

If you want to take a picture of that real quick, there are QR codes that'll take you to the Amazon DocumentDB MCP server and the Mongo Shell TSQL plugin. And I realized my mistake here. I forgot to emphasize something very important early on, and what I was supposed to emphasize early on is for HNSW indexes, you should always create your indexes before inserting your data on a new namespace. Very important. So I don't know if you've been to one of these sessions like this, but if you go to the booth and maybe you tell somebody that, hey, always create your indexes before inserting your data in your namespaces, you might be able to get a hoodie.

All right, so with that, thank you all very much for your time and your attention. Please give us feedback, and be candid. We want candid feedback because we want to make these better. We think we're telling people what they want to hear, but maybe we're not. Maybe we told you stuff that you don't find that helpful. Let us know so we can adjust and continue to make these better. We do take a look at the feedback and we do take it seriously, so please provide that. But again, thank you very much. On behalf of Cody, enjoy the rest of your stay here at re:Invent and have a good evening.


; This article is entirely auto-generated using Amazon Bedrock.
