
Kazuya

AWS re:Invent 2025 - What's new in search, observability, and vector databases w/ OpenSearch (ANT201)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

📖 AWS re:Invent 2025 - What's new in search, observability, and vector databases w/ OpenSearch (ANT201)

In this video, AWS leaders Carl Meadows and Mukul Karnik, along with NVIDIA's Corey Nolet, present major OpenSearch innovations. Key highlights include OpenSearch reaching 1.3 billion downloads with 11x performance improvements since the fork, Amazon OpenSearch Service processing over 10 trillion requests monthly for 100,000+ customers, and new capabilities like Automatic Semantic Enrichment for simplified hybrid search, PPL query language enhancements with 39 new features, and S3 vectors integration for trillion-scale deployments. The collaboration with NVIDIA delivers 20x faster index builds using cuVS library and GPU acceleration, achieving 14x end-to-end speedup with 12x cost reduction. Additional announcements include Auto Optimize for automated vector tuning, MCP server support for agentic AI, OpenSearch Optimized OR2/OM2 instances with 70% indexing improvements, derived source feature reducing storage by 40%, and Cluster Insights for performance monitoring.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction to OpenSearch: Use Cases and Architecture

Thanks everybody for coming out. We're here to talk about what's new in OpenSearch and the OpenSearch Service. I'm Carl Meadows, Director of Product here at AWS. With me, I have Mukul Karnik, who will be coming up, who's our Director, and Corey Nolet from Nvidia is going to talk to us about how they've helped us with OpenSearch as well.

Thumbnail 30

Before we jump in, it's important to talk about what you can do with this OpenSearch thing. OpenSearch is a very popular platform for a number of use cases. First and foremost, as the name implies, it's a search engine, and so it is super popular for building search applications. More and more, search applications are becoming the foundation for generative AI and AI-powered applications that leverage OpenSearch as a search backend. Additionally, it's very popular for analytics use cases: since it's a very flexible engine, the data is highly indexed and fast, and you can find the needle in the haystack. It's very popular for observability, security analytics, and general real-time analytical use cases where you can do aggregations and searches on the fly and build deep insights into your data.

Thumbnail 90

When we talk about OpenSearch, there's actually a set of open source technologies that power OpenSearch, and then there are the AWS services that we provide for those open source projects. The primary open source pieces are Data Prepper, an ingestion pipeline that sits in front of OpenSearch, which we deliver as open source and as the OpenSearch Ingestion service. Then there's OpenSearch itself, the core data and analysis engine, which is delivered via Amazon OpenSearch Service and our serverless offering. The front end for data visualization and exploration is powered by OpenSearch Dashboards. In the service, we deliver this as OpenSearch UI, which is a centralized dashboard that you can use to connect to any of your OpenSearch Service domains or collections, which we'll talk about. Before we jump into the service, I want to talk a little about the open source.

Thumbnail 160

OpenSearch Open Source Foundation: Community Growth and Performance Improvements

OpenSearch is an open source project. Last year, we transitioned stewardship of the project from AWS to the Linux Foundation, and it's been a really great year. We launched the OpenSearch Software Foundation to support the project in the Linux Foundation. We are a premier member. Recently, IBM joined us as a premier member. We have a number of other community members also helping us support this project. The folks supporting the project are actually independent from the people contributing to the project and the actual maintainers of the project. Anyone can participate, and technical merit drives the governance of the project. These folks are the ones that are helping us put together events, do trainings, and really spread the word about OpenSearch to help make sure that we've got a broad, thriving community. In the last year, we've been really thrilled with the progress that we've had with OpenSearch.

Thumbnail 220

Since the initial fork, we're now up to 1.3 billion downloads, which is a lot in four years. We've got, in the last year, over 3,000 active contributions from 400 different organizations. It's not just the number of contributions, it's the deep, meaningful contributions that we're starting to get into the project that's really driving a lot of excitement and momentum around the project.

Thumbnail 250

As evidenced, user communities have sprung up across the globe. If you're in any of these locations, you can look up and find the user community, and you can go to local events and participate with the other folks that are interested in OpenSearch. If there's not a location there, talk to us. We can help you set up a user conference. It's really exciting to see this community growing.

Thumbnail 280

When you think about why people are gravitating to OpenSearch, choosing a platform that's going to be as important to your business as your core search engine and your core analytics engine, these are multi-year bets. You're making a large investment, and so it's really important that you feel like it's innovating, that it's sustainable. I think that innovation, as you can see by the progressively more and more meaningful features we've been able to land in OpenSearch every year, gives people confidence that this is the horse they want to bet on.

This is the project they want to be involved in because of all this innovation they're getting for free by participating in this open source project. So I'm really excited about where OpenSearch is headed and all the things we're doing there.

Thumbnail 330

Another reason is that since we started the fork, we've taken a deep look into the engine on how we can really improve performance. No data engine ever went wrong by being cheaper, faster, or more resilient. Since the 1.X line, which is essentially when we started the fork, to version 3.3, which is the most current release on the service—actually, the most current release is 3.4, but it doesn't release for another couple of weeks—we've had an 11x improvement in overall search and aggregations. We've also seen a 2.5x increase in our vector search functionality. We're not done yet, so we're going to continue to drive greater efficiency, better performance, and lower costs on the engine as we increase capabilities, which we'll talk more about as we go. I'm really excited about what's going on upstream.

Thumbnail 380

Amazon OpenSearch Service: Operational Excellence at Scale

Here at AWS, we take that project and deliver it to you as Amazon OpenSearch Service, which may be a little biased, but I think is absolutely the best way to consume the open source project. We wrap the service up and provide you operational simplicity, all of the features and benefits you get in the open source with none of the pain of managing it. We offer cost efficiency, and it's deeply integrated into the AWS ecosystem, which makes it super easy for you to integrate into the rest of your AWS operations.

Thumbnail 420

As evidenced, over 100,000 active customers use the service today. We process more than 10 trillion requests per month on the service. Compared to self-managed, it gives you patching, one-click upgrades, 24/7 monitoring, self-healing, and no downtime. You can completely change your topology with a click of a button and no downtime—we'll do a blue-green deployment and move you to a completely new environment. It provides deep security features like fine-grained access control, encryption keys, and audit logging, all the things that you need to run an enterprise-grade application. We support multi-AZ deployments with an SLA up to four nines on a single cluster. We've built a lot of innovation on the product as well, on top of the open source, for delivering great economics, such as our UltraWarm features and specialized instance types like the OpenSearch Optimized instances that we'll talk about, which give you a fully resilient environment. So if you're running OpenSearch, I believe we're the place to do it.

Thumbnail 500

OpenSearch Ingestion: Serverless Data Pipeline with Enhanced Connectivity

As we jump in, when you think about the landscape most simplistically, you've got a bunch of data that could be in the form of documents, logs, or in a transactional system. It could be embeddings, it could be pictures and videos, and other things we could put on here now too. I want to take that data, process it, and then eventually land it on OpenSearch Service and then vend out the use cases from OpenSearch Service to my end customers.

Thumbnail 530

When we talk about that first part, the data ingestion part, I mentioned Data Prepper and the OpenSearch Ingestion service. It's a very simple serverless capability that allows you to select a source. It provides a buffer and processing and transformation on that data, and then we'll sink that data into OpenSearch, into a managed cluster or a serverless collection. It can also sink data into Amazon S3 if you want to do routing such that all your raw data goes into S3 and you just want to put aggregations into OpenSearch, or things like that. You can do that in OpenSearch Ingestion very easily.
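To make that shape concrete, here is a minimal sketch of creating such a pipeline programmatically. The pipeline body follows the Data Prepper style of source, processor, and sink sections and is submitted through the OpenSearch Ingestion (OSIS) API with boto3; the endpoint, index name, and role ARN are placeholders, and the exact option names should be checked against the current Data Prepper and OSIS documentation.

```python
import boto3

# Hypothetical pipeline: an HTTP source, a grok processor, and an OpenSearch sink.
# Option names follow the Data Prepper style; verify them against the current docs.
pipeline_body = """
version: "2"
log-pipeline:
  source:
    http:
      path: "/logs/ingest"
  processor:
    - grok:
        match:
          message: ["%{COMMONAPACHELOG}"]
  sink:
    - opensearch:
        hosts: ["https://search-my-domain.us-east-1.es.amazonaws.com"]
        index: "application-logs"
        aws:
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::123456789012:role/osis-pipeline-role"  # placeholder
"""

osis = boto3.client("osis", region_name="us-east-1")

# Min/Max OpenSearch Compute Units (OCUs) bound the serverless auto-scaling range.
response = osis.create_pipeline(
    PipelineName="log-pipeline",
    MinUnits=1,
    MaxUnits=4,
    PipelineConfigurationBody=pipeline_body,
)
print(response["Pipeline"]["Status"])
```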

Thumbnail 570

Thumbnail 580

Some of the things we've done in the last year with OpenSearch Ingestion—I mentioned it's a serverless product, so it's powered by OpenSearch Compute Units. You don't have to provision anything. These automatically provision for you and scale up and down based on the traffic that they're processing. We increased their memory to 15 gigabytes at no additional cost, so that even the largest aggregations or complex computing jobs can be done on those pipelines without having to scale out additional pipelines. We also enhanced the auto-scaling so that it's more responsive to different types of changes in demand by getting more signals.

Thumbnail 620

When you look across the capabilities of OpenSearch Ingestion, you'll see there's a wide range of sources that we support. It supports both push and pull sources, so you can push to it through an HTTP endpoint or an OTEL endpoint, but it can also pull from systems like DynamoDB, DocumentDB, RDS Aurora, RDS Postgres, Elasticsearch itself, and OpenSearch itself. One of the most popular use cases is actually pulling data out of S3 and processing it in OpenSearch.

We've added RDS and Aurora support, as well as Jira and Confluence connectors, and we're going to keep adding connectors to make it really easy to get data into OpenSearch. Once the data is in the pipeline, you can do a large amount of processing on that data to enrich, select entries, do conditional routing, drop, aggregate, and perform OTEL processing. We added a Lambda processor this year, which allows you to call out and do anything you can write in a Lambda function on that data stream. If none of these processors work or you want to do something different, you can leverage that Lambda to enrich the data from another data source outside the service, or you could write a custom connector.

Thumbnail 710

We also added batch AI inference to make it really cost-effective to build embeddings off this pipeline. Then, as I mentioned, you can sink that data into a managed cluster, a serverless collection, or S3. Another feature that I think has really improved usability, which we launched this year, is an improved user experience that provides a guided visual workflow. It turns out that not everybody wants to write YAML, so making these processors easier to configure gives you high confidence that the permissions are set up right, that the processors are configured properly, and that, with a few clicks, the data is going to be delivered in the format you want while building these pipelines.

Log Analytics with OpenSearch UI and PPL: From Discovery to Root Cause Analysis

Alright, and so with that, I'm going to turn it over to Mukul, who is going to walk us through logs. Yeah, thanks Carl. Folks can hear me, right? Yep, cool. So one of the impressive things in Carl's talk was just the sheer momentum we're seeing in OpenSearch. Getting to 1.3 billion downloads in four years is an achievement, so the project is really growing fast, and we're seeing the momentum grow even faster. So really excited to see that.

Thumbnail 800

Thumbnail 810

Thumbnail 820

Before I get into logs, and after that search, I wanted to get a sense of how many folks have used OpenSearch for logs use cases. Okay, so a few out there. And how about search and vectors? Okay, so a bit more on the search and vector side. Good to see that. So, as we know, log data continues to increase, and with generative AI and agents, it's increasing even more. So why do we collect all this data? We collect all this data because downtime happens in software, and when there is downtime, you want to reduce the MTTR, or the mean time to respond and come back online. To do that, you need to debug and get to the root cause quickly, and that's why you need all of the telemetry to get to that root cause.

Thumbnail 860

I think Gartner did a survey a couple of years back and found that on average, a company has 87 hours of downtime and spends about $3.6 million in lost revenue or sales. So it's really important to reduce that downtime and get to the root cause quickly. To do that, you typically need a single pane of glass. The log data can be in different places. Some data can be in CloudWatch, some of the data is in OpenSearch, and some of the data can be in S3 because it's just too expensive to index it.

What we launched last year is OpenSearch UI, which takes the OpenSearch Dashboards open-source project and delivers it as a fully managed SaaS experience. It now connects not just to OpenSearch, where you can connect to multiple OpenSearch clusters, but also to data in CloudWatch, S3, and Security Lake, giving you a singular view from this one endpoint where you can visualize the data in a single dashboard.

Thumbnail 920

So it's really powerful to be able to do that, because data exists in different places and it's expensive to bring it into one place. This is a set of new things that we've launched, like the CloudWatch integration and the Security Lake integration. We've also been improving the experience of the OpenSearch Dashboards UI. So, how many folks here are familiar with the Discover experience? Not many, I guess, but we've been really improving this experience, and the key area where we've improved is being able to type in a query, which is a PPL query, and I'll jump into what PPL is. You can get visualizations as well as logs, so you can get the raw data, which is the logs, or you can get aggregations and visualizations, and you can see trends within your logs pretty quickly with this interface.

Thumbnail 970

Thumbnail 1000

And the key power here is the PPL queries that we've implemented. PPL is a Piped Processing Language. It supports different kinds of commands, such as search and deduplication, to extract, filter, and transform log data. With this, you're able to deep dive into logs and get the insight that you're looking for. Over the last year we've spent a lot of focus on improving the PPL capabilities, adding a lot of new analytical capabilities to PPL. We've added the ability to join indices, so if you have log data in two different indices, you can now join them using PPL. You can do additional lookups and filters, so it's a pretty powerful set of capabilities.
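As a concrete illustration of what a PPL query looks like in practice (not taken from the demo), the SQL/PPL plugin exposes a `_plugins/_ppl` endpoint that you can call directly. This sketch uses opensearch-py against a hypothetical `app-logs` index with `level` and `service` fields; the endpoint and credentials are placeholders.

```python
from opensearchpy import OpenSearch

# Placeholder endpoint and credentials; adjust for your own domain.
client = OpenSearch(
    hosts=[{"host": "search-my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("admin", "admin"),
    use_ssl=True,
)

# Filter to error-level events and count them per service, most affected first.
ppl = """
source = app-logs
| where level = 'ERROR'
| stats count() as error_count by service
| sort - error_count
"""

# The SQL/PPL plugin exposes PPL at the _plugins/_ppl endpoint.
result = client.transport.perform_request("POST", "/_plugins/_ppl", body={"query": ppl})
for row in result["datarows"]:
    print(row)
```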

Thumbnail 1050

Just this year we've launched about 39 different capabilities, and in the OpenSearch 3.3 version that was released last week, you'll find that all of these capabilities are available, so it would be a really good time to try them out. I'm going to jump into a demo of how the new dashboard experience, along with Discover and these PPL capabilities, can help you get to what you're looking for in your logs. So here's the new OpenSearch Dashboards experience we launched late last year, and since then we've been improving on it. You have this concept of workspaces. Think of workspaces as a unit of collaboration: if you have a team and you want to collaborate on an observability use case, then you can create an observability workspace.

Thumbnail 1090

Thumbnail 1100

Thumbnail 1110

Thumbnail 1130

We have other workspaces like Security Analytics and search workspaces, but let's click into an observability workspace and see how things have changed. When you click on it, you land on a newly designed Discover experience where, at the top, you have a place where you can enter the PPL commands, and then you can quickly look at the logs and visualize the trends of the data in those logs. So that's at a high level how it works. Now let's say you have an issue going on. You can quickly type in a PPL query where you're searching for errors in your logs; that's effectively what you're doing. You type in that query and then you're able to visualize, okay, here are some of the errors in my logs, and then you can click on the AI summary and understand an LLM-summarized view of what is going on in your logs.

Thumbnail 1140

Thumbnail 1160

Thumbnail 1170

Thumbnail 1180

Thumbnail 1190

Thumbnail 1200

As you read through that, you quickly realize that the errors in your logs are actually nested in a body message, so it's becoming hard to debug what's going on, and you want to extract the error message from the body. You can go and update your PPL command to extract fields from the JSON object in your logs, and with that updated command you're able to better pinpoint all the errors. Initially it looked like there were many different log messages, but it boils down to this target error message. So now you want to figure out: okay, there's an error happening, but is it a big problem or not? You can quickly run stats and find out how often this is happening. It looks like it's happened 70,000 times, so it's a significant issue, and you're probably going to want to know what's going on. So now, what do you do? You want to ideally

Thumbnail 1210

Thumbnail 1220

find out which service is causing this issue and how can you get to the owner of the service. You can join this logs data with maybe service metadata that you have, and with that you're able to find out, okay, these error messages are actually coming from the load generator service, so maybe it's not as critical as a production system, but still, you can quickly find out what's causing these errors.

Thumbnail 1240

Thumbnail 1260

Thumbnail 1270

Then you're able to take those error messages and find out, okay, when did they start, so you can have a time span and find out these error messages start at a particular time and you can use that to engage the owner of the service and get it resolved. You can also take the visualizations that you've generated here and add them to your dashboard so that way the next time you can just directly go to your dashboard and look at the errors. So it's a pretty powerful way to get to the root cause of the errors that are happening in the system and debug it using PPL.
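For intuition, here is a hedged sketch of what that debugging progression might look like as PPL, written as Python strings you could send to the `_plugins/_ppl` endpoint as in the earlier example. The index and field names are hypothetical, the `parse` regex is illustrative, and the exact field-extraction and join syntax varies by PPL version, so verify the commands against the documentation.

```python
# Hypothetical index and field names; the regex and commands are illustrative only
# and should be checked against the PPL reference for your OpenSearch version.

# 1. Surface the error events from the raw logs.
find_errors = "source = otel-logs | where level = 'ERROR' | fields timestamp, body"

# 2. Pull a structured error code out of the nested body message with `parse`,
#    then count how often each one occurs.
count_by_code = (
    "source = otel-logs | where level = 'ERROR' "
    "| parse body '.*code=(?<error_code>[A-Z_]+).*' "
    "| stats count() as occurrences by error_code"
)

# 3. Bucket the occurrences over time to see when the problem started.
when_it_started = (
    "source = otel-logs | where level = 'ERROR' "
    "| stats count() by span(timestamp, 1h)"
)
```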

Thumbnail 1290

Thumbnail 1300

Thumbnail 1310

Now, many users are not familiar with PPL, so we also support natural language, which is pretty intuitive: you can just type in a natural language query, and it generates the PPL and of course gives you back the response. One of the things we've also been doing is adding MCP capabilities to OpenSearch, which I'll talk about a little bit later. So if you have an OpenSearch backend and, let's say, an agent that you've built that's looking through all your logs and different systems, you can use the MCP server of OpenSearch and pass in these natural language questions or PPL queries. The rich analytical capabilities, all the functions, the joins, all of that are available through that MCP integration, and so you're able to get to that root cause pretty quickly without having to even interact with OpenSearch Dashboards. So these are pretty powerful core capabilities that we can all leverage for debugging logs.

Thumbnail 1350

Thumbnail 1360

Thumbnail 1380

Thumbnail 1410

Evolution of Search: From Keyword to Hybrid Search with Automatic Semantic Enrichment

So that's the logs overview. A lot of improvements, and many coming over the next year as well, so if you have not looked at OpenSearch for logs, you should definitely look at it. Now, on to search and AI. There's a lot of innovation going on in generative AI, a lot of excitement, a lot of new models coming out, and search is changing in this world of generative AI. If you think about it, at the core, search is a problem of information retrieval. You have different kinds of information: documents, images, and videos. The goal of search is to be able to extract that information so that you can get your insights quickly. To do that there are several techniques, right? If you look back about 20 years, keyword search was a big thing, and typing in keywords and finding relevant information was really useful.

Then as machine learning models came along, you were able to do semantic search, and with that you're able to find not only the exact keywords but also sentences with semantically similar meaning, so it was very helpful for getting semantically relevant information from your data. Then we realized that keyword search actually works really well for 70 to 80% of use cases, so combining keyword search with semantic search into hybrid search was the next innovation in search. These days we see hybrid search deployments across many customers. Many customers are using hybrid search for all kinds of use cases, whether they're searching for data in their internal systems or powering their external-facing user applications, so a lot of use of hybrid search across different systems.

Thumbnail 1500

We also see customers using OpenSearch for new use cases, including agentic search where they want to not only leverage that data that's available in OpenSearch but also leverage data in other systems outside OpenSearch. So we'll go into details of all of this, but that's how search is evolving and a lot of innovation happening in a very short time frame. So let's look at a typical search workflow these days, right?

You have your data or information, images and documents. You typically ingest and index that into OpenSearch. You also use a machine learning model to generate vector embeddings, and you store those vector embeddings in OpenSearch. So you'll have a lexical index and you'll have a vector index, and you store that information in OpenSearch. Then when you get a query, you're able to use both these indices to retrieve the relevant documents, do re-ranking if necessary, and surface those documents or results to your user. So that's a typical search workflow that we see.

The challenge is it is pretty complicated. You have to figure out which model to use, how to host the model, then you have to generate all these embeddings, store them in a separate index along with the lexical index, and then figure out how to combine these results in an intuitive way. So it's a pretty complicated setup.
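As a rough sketch of the query side of that workflow, OpenSearch's hybrid query can combine a lexical clause and a k-NN clause, with a search pipeline normalizing and blending the two score distributions. The index, field names, and pipeline name below are placeholders, it assumes the index already has a `knn_vector` field, and in practice the query embedding would come from your embedding model.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://localhost:9200"],
                    http_auth=("admin", "admin"), verify_certs=False)

# Search pipeline that normalizes lexical and vector scores and averages them.
client.transport.perform_request(
    "PUT", "/_search/pipeline/hybrid-pipeline",
    body={
        "phase_results_processors": [{
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {"technique": "arithmetic_mean"},
            }
        }]
    },
)

query_vector = [0.0] * 384  # placeholder: embed the query text with your model

response = client.search(
    index="products",                               # assumes a knn_vector field 'title_vector'
    params={"search_pipeline": "hybrid-pipeline"},
    body={
        "size": 10,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"title": "trail running shoes"}},                   # lexical
                    {"knn": {"title_vector": {"vector": query_vector, "k": 10}}},  # semantic
                ]
            }
        },
    },
)
print([hit["_id"] for hit in response["hits"]["hits"]])
```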

Thumbnail 1580

One of the things we launched pretty recently was Automatic Semantic Enrichment. Think of it as an out-of-the-box semantic search and hybrid search capability for your use case. In this case, you're simply sending your documents to OpenSearch. In the background, we are using a sparse neural model that we host and manage, and we generate the vector embeddings and the semantically enriched documents that we store along with your data. So you don't have to worry about hosting, you don't have to worry about paying for the GPUs when you're not indexing. So it's pay-as-you-go, but it enriches your document. Then when you're querying, you're able to leverage the semantically enriched index and get better search results.

What we've seen in our testing, and we've used different benchmarks, is that this technique actually does really well compared to even the other embedding models that are out there. So it's a really easy way to get started with semantic search and hybrid search for your application. If you're not using any of these techniques, there's a very simple way to get started.

Thumbnail 1660

Let's dive deep and understand how this semantic enrichment works. Let's say you have a text document. In this case, we have picked the Cricket World Cup, which happened last year, and we've got a bunch of documents related to that. During ingestion, when you ingest those documents, we call a machine learning model to expand the vocabulary and generate semantically similar words to what is in the document, and we store that along with your regular data and index. When the query comes, we are able to leverage this semantically enhanced index (think of it as a synonym dictionary) to give really relevant results for your use cases. So it's a pretty powerful technique and a pretty simple way to improve the accuracy of your search results.
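In the managed service this enrichment happens automatically, but for intuition, the open-source analog is OpenSearch's neural sparse search: a sparse encoding model expands each document into weighted terms stored in a `rank_features` field, and a `neural_sparse` query expands the query the same way at search time. The model ID, index, and field names below are placeholders, and this is a sketch of the open-source plumbing rather than of the Automatic Semantic Enrichment feature itself.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://localhost:9200"],
                    http_auth=("admin", "admin"), verify_certs=False)

MODEL_ID = "<sparse-encoding-model-id>"  # placeholder: a deployed sparse encoding model

# Ingest pipeline: expand each document's text into weighted terms at index time.
client.ingest.put_pipeline(
    id="sparse-enrich",
    body={"processors": [{"sparse_encoding": {
        "model_id": MODEL_ID,
        "field_map": {"passage_text": "passage_embedding"},
    }}]},
)

# Index: the raw text plus a rank_features field holding the expanded terms.
client.indices.create(
    index="articles",
    body={
        "settings": {"index.default_pipeline": "sparse-enrich"},
        "mappings": {"properties": {
            "passage_text": {"type": "text"},
            "passage_embedding": {"type": "rank_features"},
        }},
    },
)

client.index(index="articles",
             body={"passage_text": "India won the Cricket World Cup final."},
             refresh=True)

# Query: the same model expands the query text, and overlapping weighted terms
# drive relevance, much like an automatically generated synonym dictionary.
results = client.search(
    index="articles",
    body={"query": {"neural_sparse": {"passage_embedding": {
        "query_text": "who won the world cup",
        "model_id": MODEL_ID,
    }}}},
)
print(results["hits"]["hits"])
```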

Thumbnail 1720

Thumbnail 1760

Scaling Vector Search to Trillion Scale: Disk-Optimized Mode and S3 Vectors Integration

So let's talk about OpenSearch as a vector engine. All these machine learning models generate vector embeddings, and you want to store these vector embeddings someplace, and OpenSearch is a pretty popular vector database these days. The reason is we started building vector capabilities in OpenSearch in 2019, and at that time there were relatively few machine learning use cases, things like personalization and similarity search. Maybe 10 million vectors were sufficient, and so OpenSearch started with that. As the years progressed, in 2021 and 2022, that grew to a billion vectors, and we were supporting more of the BERT kind of models that were out there, still a reasonably small size.

Thumbnail 1780

Then in 2023 and 2024, as Generative AI became really popular, we started seeing a lot of use cases for larger workloads, and OpenSearch was now supporting up to 100 billion vectors. Amazon's fraud detection system uses OpenSearch for up to 70 billion vectors, and so it's a pretty large use case where OpenSearch scales really well.

Thumbnail 1810

More recently, many customers want to index and create vector embeddings for all their documents, and that makes sense: all these models are pretty powerful, and you want to leverage them and get the most out of your data. To do that, you want to generate these vector embeddings. We now support up to 1 trillion scale on OpenSearch, which is really useful for large-scale vector embedding use cases.

Thumbnail 1840

But doing that at that kind of scale can be expensive, right? So you've got to be able to do it in a cost-effective manner and quickly. To do it cost-effectively, of course, if you put all of your data in memory and use exact K-NN search, you're going to get the best results, but it's the most expensive. The next thing you can do is use an approximate technique, keep the data still in memory, and that approximate technique will help you reduce the amount of data and amount of compute you want to use and reduce cost.
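For reference, here is a minimal sketch of the in-memory approximate option: a `knn_vector` field backed by an HNSW graph (via the Faiss engine) and an approximate k-NN query. The index name, dimension, and parameters are placeholders to adapt to your embedding model.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://localhost:9200"],
                    http_auth=("admin", "admin"), verify_certs=False)

# In-memory approximate search: an HNSW graph built by the Faiss engine.
client.indices.create(
    index="docs-vectors",
    body={
        "settings": {"index.knn": True},
        "mappings": {"properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 384,                # must match your embedding model
                "method": {
                    "name": "hnsw",
                    "engine": "faiss",
                    "space_type": "l2",
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            },
        }},
    },
)

# Approximate k-NN query against the HNSW graph.
query_vector = [0.0] * 384                       # placeholder query embedding
response = client.search(
    index="docs-vectors",
    body={"size": 10,
          "query": {"knn": {"embedding": {"vector": query_vector, "k": 10}}}},
)
print(response["hits"]["hits"])
```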

Then we introduced the disk mode early this year that lets you keep the vector data on disk and only keep quantized data in memory, and that reduces your cost even further. Just yesterday we announced the general availability of S3 vectors and the OpenSearch integration with S3 vectors, and with that you can keep your data all in S3, the vector data, and reduce your cost even more. So as we get to this trillion scale, there are different ways to reduce cost and get to that scale.

Thumbnail 1920

So let's look at the disk-optimized vector mode, right? The way this works is you take the high-dimensional vectors that you have and you use different byte quantization techniques to generate vectors of lower fidelity. So you get up to 16 or 32 times compression, and you keep those vectors in memory and you keep the high-precision vectors on disk. Then when you get a query, you first do a K-NN search or approximate search in memory for the byte-quantized vectors, get maybe 1,000 results instead of the 10 results, and for those 1,000 results you retrieve the exact high-precision vectors and do an exact K-NN.

So with that, it's a two-pass kind of method, but you still get the higher recall that you're looking for, and you don't have to keep all of your data in memory. The only caveat is that it adds a little bit more latency. So if your workload can tolerate a little higher latency, this works out pretty well.
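As a hedged sketch of what that disk-optimized mode looks like in practice, the mapping and query below follow the disk-based vector search parameters (`mode`, `compression_level`, and search-time oversampling with rescoring); treat the exact parameter names as something to verify for your OpenSearch version, and the dimension and values as placeholders.

```python
# Disk-optimized variant of a knn_vector mapping: heavily quantized vectors stay in
# memory, full-precision vectors stay on disk, and searches rescore an oversampled
# candidate set. Parameter names follow the disk-based vector search docs; verify
# them for your version. Pass these bodies to indices.create / search as before.
disk_optimized_index = {
    "settings": {"index.knn": True},
    "mappings": {"properties": {
        "embedding": {
            "type": "knn_vector",
            "dimension": 384,
            "space_type": "l2",
            "mode": "on_disk",            # keep full-precision vectors on disk
            "compression_level": "32x",   # 32x-compressed vectors held in memory
        },
    }},
}

disk_optimized_query = {
    "size": 10,
    "query": {"knn": {"embedding": {
        "vector": [0.0] * 384,            # placeholder query embedding
        "k": 10,
        # First pass: oversample candidates from the quantized in-memory vectors;
        # second pass: rescore them against the full-precision vectors on disk.
        "rescore": {"oversample_factor": 3.0},
    }}},
}
```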

Thumbnail 2000

The other way to do this is to use S3 vectors, which we just announced, and there are two kinds of integrations we have with S3 vectors. One is you provision a cluster and you can now pick S3 engine as a vector engine in OpenSearch. Once you pick S3, when you send us the vector embeddings, we'll store them in S3 instead of storing them in OpenSearch. You can still keep the lexical index in OpenSearch. That way you have the vector embeddings in S3, the lexical index in OpenSearch, and when you get a query you're able to do the vector search using S3. You're doing the regular search using OpenSearch, and you can combine those results and still do hybrid search and get all of the rich OpenSearch functionality while getting the low cost of S3 vectors.

This works really well for use cases where you don't have high throughput. At higher throughput you may want to keep all of your vector data in OpenSearch as well, but for low to mid kind of queries per second use cases, the S3 vectors will give you an improvement in cost. So that's one way to use S3 vectors if you want to.

The other way to use S3 vectors is to keep the data in S3 and bring a subset of it into OpenSearch. So you could choose, for example, to bring only the latest data, or data for a particular category or segment, into OpenSearch and use that for vector search. That way you're not bringing all of this information into OpenSearch, and that's another way to use OpenSearch in combination with S3 vectors. So these are the two kinds of integrations we have available.

Thumbnail 2120

Thumbnail 2130

GPU-Accelerated Vector Indexing: NVIDIA cuVS Partnership for Cost-Efficient Performance

With that, I want to bring Corey on. He's going to talk about how NVIDIA has helped OpenSearch scale to a trillion vectors. Awesome, thank you, Mukul. Hey everybody, can you raise your hand if you can hear me? Just a quick sound check. Awesome. Nice, kept these seats full. Yeah, Corey Nolet. I'm a principal architect for vector search and various machine learning libraries at NVIDIA.

It's no surprise that vectors are the language of AI today, right? Unstructured data, as Mukul pointed out, at one-trillion-vector scale is becoming more commonplace. You know, even just as recently as six months ago, I would hear trillion scale thrown around by maybe two or three different organizations, and I'd hear it maybe once every three or four months. Now I'm hearing it literally weekly. People are really paying close attention to this. A lot of organizations are in the process of kind of dipping their toe in the water, and we're really seeing this trending up. If we can get to 100 billion scale, we can get to one trillion scale, right? If we can get to 10 billion scale, we can get to 100 billion scale. So the growth we've been seeing since about 2017 has been exponential.

Thumbnail 2180

What a lot of people don't realize, and Mukul kind of alluded to this as well, is that vector search indexes are not like traditional database indexes, right? They're approximate, and when you make something approximate, that now means that you have to model it, right, which now means that it's a machine learning model of some sort. So, you know, traditional database indexes, if we don't tune them properly, right, we still get the correct results. We just may not get them back quite as fast, right? Vector search models, we have to consider the trade-offs, right? For a more accurate model, we may not necessarily get the best indexing throughput, right? We may not necessarily get the best search throughput, and we have to make these trade-offs. Well, these trade-offs can be fairly expensive. They can have a huge impact on both cost and performance. That's the big deal here, right? And so by utilizing GPUs, we can try to work with some of those trade-offs and make them more manageable. We can make them more cost efficient. We can make them higher performance, right, especially at trillion scale.

Thumbnail 2240

And so our partnership with the AWS OpenSearch team kind of formed around these four challenges that we're finding, right? And I'm going to walk through in the next eight minutes or so how we constructed our solution to address these four challenges. The first challenge being index builds, right? Indexing one trillion vectors, building machine learning models for one trillion vectors can take a long time, as you can imagine, right? It doesn't necessarily scale linearly in all cases, right? Interoperability is a big deal too. Something a lot of folks don't realize is in agentic AI and RAG workloads, most of the time, at least at the present moment with our current technology, the actual vector search lookup is not often the bottleneck, right? When you have an LLM in the mix and you might be reaching out to a different remote service to inference with that LLM, or even if the LLM is local and it takes several hundred milliseconds, that's usually orders of magnitude longer than the vector search. If I could do the vector search in zero time, I'm gaining nothing, right? So having a GPU to do that, allocated for that period of time and sitting idle for most of the time doesn't make sense, right? So it's important that we're able to interoperate, that we can build indexes really, really fast on the GPU and not lock you into GPU for search.

Thumbnail 2350

As Mukul also pointed out, mixed types are a big deal, right? We're not often just doing a semantic search. We might need to do a structured search, combine that with a semantic search to improve our results or even just to make our results, right? We might need to query from, you know, from geo coordinates within a certain bounding box and do a filter by a certain age group before we do our semantic search. And then of course last being cost efficiency. We don't want to have to spend a lot more money to be able to do this on the GPU, right? We want to be able to really have a mix of both. So we've been working for the past couple of years with our AWS OpenSearch team to solve these four challenges.

Thumbnail 2370

So I'm going to put the solution together into a little layered stack like this. I'm going to start with the index build, which you probably recognize as green to signify NVIDIA, so it probably means that it's being done on the GPU. A couple years ago we created a library called cuVS, using our standard cu prefix, for CUDA Vector Search. Why another library? Well, CUDA can be challenging to write. It can be expensive to write because it's challenging. It can be very low level. You can spend a lot of time optimizing an algorithm in CUDA, and then a new architecture comes out that introduces several new instructions, or a new library comes out with a nice abstraction. Now you've got to rewrite or refactor your code, so you're constantly playing this catch-up game. So one of the big benefits is having a library that you can just pull off the shelf, one that provides the building blocks for implementing vector search, whether that's directly in an application or inside of a database.

It allows the manufacturers of the GPU and the creators of the CUDA versions to maintain this going forward, making sure that you're always getting the best performance and the best cost efficiency out of the hardware and the software. It's fully open source and Apache 2 licensed. The goal is to provide both the building blocks and end-to-end algorithms, so it can be used in applications but also integrated into databases. We build on top of all the foundational libraries that you would probably be used to if you were doing CUDA development.

Thumbnail 2450

Most of the algorithms that folks are using today, especially off the shelf in a lot of the databases, are on CPU. We're working to change this, but most of the algorithms that you're using today are on CPU. One of the standard algorithms is the HNSW algorithm, a graph-based algorithm with really fast search but not so fast build. HNSW is not foundationally a GPU-centric algorithm. It utilizes multiple threads locking on a centralized data structure to try to have low latency for insertions into a graph. We kind of had to go back to the drawing board, and we built an algorithm that we call CAGRA from the ground up for the GPU to be able to do the construction of this graph in one big batch with minimal to no locking.

CAGRA is proving to be a pretty useful algorithm. It's a little bit different fundamentally. The H in HNSW stands for hierarchical, but CAGRA is not hierarchical. It's a flat graph. The SW stands for small world, and CAGRA is not a small world graph. But it turns out that it is navigable enough that we can convert that into an HNSW graph. Now we find that as the number of dimensions increase, like today's embedding models are getting kind of out of control with the number of dimensions that we have to use. We can compress them down and make it a little bit better, but they're still kind of out of control though.

Thumbnail 2550

Thumbnail 2560

We noticed that the gap increases when we're building indexes as the dimensionality increases and as the scale increases. The more vectors that we need to index, and also as the quality of the model increases, we notice that gap increasing even more. So if I want a model that can give me 99% recall versus something that gives me like 80 or 85%, then we're going to notice that gap increasing even more. Unsurprisingly, the lowest rung on our stack diagram here would be the cuVS library that we've provided to solve the challenge of index builds.

For the challenge of interoperability, we can now convert this CAGRA graph into an HNSW graph on the CPU so that we can search on the CPU without losing any quality and without losing any latency. This bottom chart here can show you that you can actually benefit in some ways on getting better latency by doing this conversion. But this is a big deal because we can build indexes 20x faster or more on the GPU, and then we can convert them so you're not really losing anything. That still presents a little bit of a challenge though.

Thumbnail 2600

The Faiss library has for several years now pioneered vector search on the GPU before we called it vector search. It was approximate nearest neighbors, and this was an approximate nearest neighbors library that came out of Meta. They've done this stuff for a long time and have done a great job at it. Their algorithms are really well optimized for both CPU and GPU. We have been collaborating with the Faiss folks for about the past three or four years. We will eventually swap the Faiss classical GPU backend out for the cuVS backend, and we're working towards that. But at the present time, we have a backend for cuVS for Faiss, and this provides that seamless interoperability so you can build an index on the GPU and then search the index on the CPU.
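Outside of the managed service, the cuVS building blocks Corey describes can be used directly. The sketch below builds a CAGRA graph on the GPU and converts it for CPU-side HNSW search; the module paths and signatures are my reading of the cuVS Python docs and should be double-checked against the version you install, and the data is random placeholder data.

```python
import numpy as np
import cupy as cp
from cuvs.neighbors import cagra, hnsw  # module paths assumed from the cuVS docs

# Random data standing in for real embeddings (n_vectors x n_dims), moved to the GPU.
dataset = cp.asarray(np.random.random((100_000, 384)).astype(np.float32))

# Build the CAGRA graph on the GPU in one large batch (no per-insert locking).
gpu_index = cagra.build(cagra.IndexParams(graph_degree=64), dataset)

# Convert the flat CAGRA graph into an HNSW-searchable index so the CPU can serve
# queries and the GPU can be released once the build is done. The exact signatures
# of from_cagra/search vary between cuVS releases, so check your installed version.
cpu_index = hnsw.from_cagra(hnsw.IndexParams(), gpu_index)

queries = np.random.random((10, 384)).astype(np.float32)
distances, neighbors = hnsw.search(hnsw.SearchParams(ef=128), cpu_index, queries, k=10)
```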

Thumbnail 2650

This is another one of those missing pieces, this interoperability piece, that we can put in our rung right here. So the Faiss library kind of became our solution to that. It was kind of delightful to find out that AWS OpenSearch had already invested in a Faiss backend for OpenSearch, so they were able to reap the benefits of that interoperability mostly by flipping a switch to now use the cuVS backend on GPU for building indexes.

Thumbnail 2670

Mixed workloads is a big deal. It's not enough to say that I can do a semantic search now. I have to have a solution that's going to let me do that hybrid in between, doing the structured search, doing the semantic search, and doing the sparse lexical search along with the semantic search. The Amazon OpenSearch Service has enabled this. They have pulled out the standard approach of building the index and then searching the index in the same process. They've pulled this out into separate processes now so we can offload index building to a different instance if we need to, and that's really a big deal.

It's not good to have a GPU that I have to pay for all the time if it's going to sit idle most of the time. There are many reasons why I might need to build and rebuild an index, especially in the foundational architecture here, which is Lucene. I might want to adopt a new model, which means I need to rebuild my indexes. I might want to do some level of tuning of the parameters so I can find the recall and latency trade-off that I need, which means I need to build new models and reindex. So for the mixed types, we've adopted the OpenSearch Service.

Thumbnail 2740

Now this is kind of the big cinematic climax here, right? In order for me to be able to extract a cost benefit out of this, I need to be able to give the GPU back when I'm done. I'm not building indexes constantly. I might be building indexes constantly, and at that point that's great because I can have my index service running. However, most people are not building indexes constantly. They might have a continuous ingest going at a fairly low volume. But the ability to give that hardware back when I'm done is what really makes this cost efficient, and this is a big deal. This is something that our OpenSearch friends, our collaboration with the OpenSearch Service developers, has done that is completely novel. This hasn't been done up till this point, and we're finding an extreme benefit here.

Thumbnail 2790

So there's some benchmarks here showing cost and speed over on the right. End to end, if I'm just building the index, I can get a 20x speed up, and that's not including having to ship the data to a serverless infrastructure or having to ship the model back afterwards. Everything said and done, end to end, I still see a 14x speed up. I'm seeing 12x cost benefits out of this. This is a big deal.

Thumbnail 2820

So kind of putting this all together, I can get faster index builds with the cuVS library on the GPU. I can build on the GPU and I can search on the CPU with the Faiss library. Putting this all into the OpenSearch Service allows me to do this with mixed types with an end-to-end database, and then having the new OpenSearch Serverless GPU allows me to give back the GPU computing when I'm done so I can reap the benefits without having to pay for it running all the time.

Thumbnail 2850

So summarizing this again, these are the four challenges that we've solved here. What we're noticing with cost efficiency, which I think was just announced at Matt Garman's keynote, is that we're seeing up to 10x faster performance. That's an average here at about 375% lower cost.

Thumbnail 2870

AI-Powered Search Capabilities: Auto Optimize, MCP Server, and Agentic Memory

So thank you everybody. Thanks Corey. Really exciting to partner with NVIDIA on this. We are seeing some amazing outcomes. Like one metric now is that we can build a billion-scale vector index in under an hour. Previously it would take maybe even sometimes days to build, and now we can build it under an hour. So really exciting to see that. Let's continue on some of the search innovations.

Thumbnail 2900

Thumbnail 2920

Thumbnail 2930

So as you all probably know, building a vector index can be complicated. Corey talked about the trade-offs between latency and recall, and then you want to also factor in costs. You have all these different parameters and different modes to configure, and so it can become challenging. Usually you have to build it, evaluate, and then again, okay, this is not working, so you go back to the drawing board, and it's just kind of rinse and repeat of different parameters, and that can take time.

Thumbnail 2940

Typically what you're looking for is a trade-off between latency and recall, and cost is a factor as well. The challenge is that these parameters behave differently on your data. If your workload and your data have certain characteristics, they will behave differently than other workloads, so you cannot even generalize. You really need to use your data to figure out what parameters work well to give you the recall and the latency profile that you're looking for.

Thumbnail 2970

Thumbnail 2980

To kind of address some of these challenges, what we launched yesterday is Auto Optimize. So Auto Optimize is a workflow where you upload your data. We take that data and then run through a bunch of different experiments to find out which combination of latency and recall that you've specified gives you the lowest cost option for that. So if you're okay with higher latency, we'll recommend a disk-optimized mode. If you want really high recall,

we'll recommend some particular hyperparameter configuration. We'll run all the different configurations, and this job will provide you an output that you can directly apply to your cluster and get the best outcome that you're looking for in your vector tuning without having to spend days trying to figure out what to do. So this will really help accelerate your proof of concept to production for vector use cases.

Thumbnail 3050

As I was saying, we parallelize all the different combinations that you want to evaluate with the GPU acceleration capabilities. We are able to really do that quickly and give you an output within an hour. So that's kind of a pretty important launch.

Thumbnail 3060

Looking back at the overall OpenSearch stack, the stack has come a long way from traditional search. At the bottom of the stack, you have Lucene and the different engines for OpenSearch, and now S3 vectors as a new engine. Then you have different use cases. You have all the search use cases like hybrid search, multimodal search, and semantic search. And then, of course, you have the vector database capabilities of OpenSearch, where you can do exact K-NN, approximate K-NN, and tiered storage.

Thumbnail 3130

We are also building many of the AI-powered use cases, so agentic capabilities in OpenSearch, MCP server, and I'll talk about some of that. And we are also building tools to be able to improve your experience, so being able to have connectors that connect out to different services and AI Workbench where you can build different workflows. So a lot of this is now part of OpenSearch. You all should try it out for your next application. OpenSearch has come a long way from just a traditional search engine.

Thumbnail 3160

One of the areas we are seeing a lot of interesting innovation is in the agent world. Search is playing a pretty critical role as you all build agents. What we are seeing is agents need context, and agents are very iterative. You ask a question, and they're pretty iterative in how they decide what plan to use and execute. Agentic search is very different from a RAG kind of use case.

A traditional RAG use case is you have a query, you go to a vector database, get additional context, and you take that context and pass it to an LLM. But with agentic search, what you're able to do is, given a query, you take that query and you do some amount of reasoning. You use different MCP tools to get additional context. Then you use some data from short-term memory and long-term memory to get more context. Maybe call an LLM, get a plan back, refine that plan based on the additional context, and so it's an agentic loop that you have to execute. This requires a lot more tools than what we have.
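Purely as a conceptual sketch of that loop (not an OpenSearch API), the structure looks roughly like this, with hypothetical stand-ins for the LLM, the MCP tools, and the memory store:

```python
def agentic_search(query, tools, memory, llm, max_turns=5):
    """Conceptual agentic loop: recall context, call tools iteratively, refine, answer."""
    state = {
        "query": query,
        "context": memory.recall(query),   # hypothetical short/long-term memory lookup
        "observations": [],
    }

    for _ in range(max_turns):
        # The LLM proposes the next step: either call a tool (e.g. an MCP search
        # tool backed by OpenSearch) or produce a final answer.
        step = llm.plan(state)             # hypothetical LLM call returning a dict
        if step["action"] == "answer":
            memory.remember(query, step["answer"])   # persist what was learned
            return step["answer"]
        observation = tools[step["tool"]](step["input"])
        state["observations"].append(observation)

    # Out of turns: ask the LLM to answer with whatever context was gathered.
    return llm.plan({**state, "force_answer": True})["answer"]
```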

Thumbnail 3220

Thumbnail 3230

I'm excited to announce that we have many of these tools available. So we have an MCP server, we have agentic memory capabilities, and we also have some specialized agents that we built in OpenSearch. The MCP server is pretty standard. I think most of you, almost everyone, probably knows how an MCP server works, but OpenSearch has an MCP server capability that you all can use. We have different sets of tools like list index and search index that you can directly call using the MCP protocol, and your agents can directly access your data in OpenSearch. We integrate and provide authentication and integrate with popular frameworks out there.

Thumbnail 3260

We also launched agentic memory capabilities in OpenSearch. So let's say you're using OpenSearch for vector capabilities and you want to store either short-term or long-term context for your agents. You can now do that with OpenSearch, and you're able to search through that memory using OpenSearch's search capabilities. So it's a pretty powerful way to store both the short-term as well as long-term memory, and we have different ways to manage the temporal capabilities. So if you have time-sensitive data, you can delete that, so it helps you manage that as well.

Thumbnail 3300

Finally, we've also launched three different agents. We've launched the Flow agent with OpenSearch, which gives you a sequential kind of execution capability. So if you have a single-turn workflow for search, you can use the Flow agent.

You can use a conversational agent for more of a multi-turn capability, and the LLM in this case reasons about which tools to use and gives you that ability. We've also launched a plan, execute, and reflect agent that leverages the LLM to plan an execution flow, get additional context, and do the deep research work that some agents do, so you can build a deep research agent using that capability. So some pretty powerful capabilities in the latest version of OpenSearch for building agentic search capabilities.
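For a sense of how this is wired up, ML Commons exposes agent registration and execution APIs; the sketch below registers a simple flow agent with two built-in tools and runs it. The endpoint paths, agent type, and tool names are my recollection of the ML Commons agent framework and should be verified against your OpenSearch version.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["https://localhost:9200"],
                    http_auth=("admin", "admin"), verify_certs=False)

# Register a simple flow agent with two built-in tools. Endpoint paths, agent type,
# and tool names follow the ML Commons agent framework as I recall it; verify them
# against your OpenSearch version before relying on this.
agent = client.transport.perform_request(
    "POST", "/_plugins/_ml/agents/_register",
    body={
        "name": "log-exploration-agent",
        "type": "flow",
        "tools": [
            {"type": "ListIndexTool"},    # enumerate indices the agent can see
            {"type": "SearchIndexTool"},  # run searches against a chosen index
        ],
    },
)
agent_id = agent["agent_id"]

# Execute the agent with a natural-language question as its input parameter.
result = client.transport.perform_request(
    "POST", f"/_plugins/_ml/agents/{agent_id}/_execute",
    body={"parameters": {"question": "Which indices hold yesterday's error logs?"}},
)
print(result)
```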

Thumbnail 3360

Service Infrastructure Enhancements: Cluster Insights, Optimized Instances, and Serverless Expansion

With that, Carl can talk about the infrastructure enhancements. Thank you, Mukul. You guys can still hear me, right? So we've talked a lot about the capabilities in OpenSearch. Before we close, I'm going to talk a little bit more about the benefits of these on the service. I mentioned the benefits of Amazon OpenSearch Service and the scale, you know, these 1 trillion vector indexes. We do support up to 1,000 nodes in a single cluster, up to 25 petabytes in a single cluster, with a high-availability 99.99% SLA on a single cluster, durability of 11 nines when you use OpenSearch Optimized Instances, which are S3 backed, and all of the performance benefits that we highlighted earlier.

Thumbnail 3400

Another feature I'm really excited about is Cluster Insights, which we launched very recently. When you're managing a cluster, Cluster Insights monitors the cluster. It can give you insights into how nodes are performing and how shards are performing, and give you recommendations for better tuning and optimizing your cluster and your queries, including hot queries and hot shards. All this just makes it much easier for you to make sure your cluster is operating at peak performance, with detailed recommendations and insights. This is now available for any cluster running 2.17 or greater. You'll see a Cluster Insights link that takes you to the OpenSearch UI, where this panel will show up automatically.

Thumbnail 3450

Few other things I wanted to call out. The OpenSearch Optimized Instances, I mentioned this before, these are high indexing, high throughput instances that are backed by S3. We launched the OR1 last year. This year we launched our second generation with the OR2, and we also launched our first M series of the OpenSearch Optimized, the OM2. The OR2 has a 70% improvement in indexing throughput over an R7G. The OM2 has a 66% improvement over an M7G, and we'll continue to see improvements on those generations, so definitely check those out if you have not already and you have high indexing workloads.

We also added a derived source feature. What this does is cut the storage in your cluster by up to 40%. OpenSearch traditionally keeps the source, which is the raw JSON data, around. It needed that JSON data whenever it had to do segment merges, shard re-indexing, and other cluster operations. With this, we drop that. We actually don't use the JSON data for any of those operations anymore; we perform those operations based off of the pre-indexed data, or the derived source. This not only means you don't have to waste 40% of your cluster on that source data anymore, it's also much faster, with a 20% improvement in indexing and merges, because the data is already pre-indexed, so we don't have to index it again to do those operations. So definitely just check that box and turn that on; it will save you quite a bit of expense.

Thumbnail 3570

I also want to call out the custom plugins feature we launched last year. We added scripting plugins. If you have custom plugins, you can now run them on the service, and we're continuing to expand the capabilities of the classes we support there. So love to get your feedback on those features. I'm running pretty close on time, so I'll go quickly. On the Amazon OpenSearch Serverless, which is the easiest way to get started with OpenSearch, with the serverless, I just create a collection. I don't have to worry about shards. I don't have to worry about sizing, so it's by far the easiest way.

Thumbnail 3590

We're continuing to expand and mature the serverless product. We now support up to 100 terabytes in a single collection for time series data. We've expanded out to 22 regions. We added some key features like being able to audit data plane calls in CloudTrail, and snapshot restore, and we're going to continue to invest in the serverless platform. Very excited about where that's going.

Thumbnail 3610

Thumbnail 3660

So to wrap up, if you want to learn more, there's a link to our skills content for deeper learning on this topic and lots of other OpenSearch topics. Also, you know, we're at our booth. There's a booth at D23 in the AWS Village, and there's actually also an OpenSearch Project booth down there in the expo. Come down, say hi. We'd love to learn more about what you are doing and any questions that you have. And last but not least, I really thank everybody for coming, and if you could please fill out that survey, that data is really valuable for us. We want to make sure that we learn the good, the bad, and are delivering the best possible content for you here. So please take the time to fill out the survey if you could. And with that, I appreciate everybody sticking around for the hour. I'm really excited to talk about OpenSearch and to see what you all do with it.


This article is entirely auto-generated using Amazon Bedrock.
