babuvenky76 for MongoDB

Posted on Jan 19

Semantic Search API: MongoDB Atlas Vector Search With Amazon Bedrock & AWS Serverless


Authors:
Amorosi, Andrea (Senior Solutions Architect at AWS)
Vogel, Pascal (Solutions Architect at AWS)
Doshi, Akash (Solutions Architect at AWS)

Contributor:
Babu Srinivasan (Senior Partner Solutions Architect at MongoDB)

Searching through large volumes of unstructured data to find the most relevant information is critical to many applications. However, traditional keyword-based search approaches often fall short when dealing with complex natural language queries.

Semantic search overcomes this challenge by interpreting the meaning and intent behind a search query rather than matching keywords alone. Taking intent and meaning into account improves the accuracy and relevance of search results. Semantic search also handles complex natural language queries and recognizes that the same word or phrase can mean different things in different contexts.

These capabilities make semantic search a powerful approach for many search use cases, including enterprise knowledge, legal and medical documents, e-commerce products, and media libraries.

MongoDB Atlas Vector Search simplifies building semantic search by combining the operational database and vector search in a single, fully managed platform. It exposes a native MongoDB interface and works with large language models (LLMs) through popular frameworks.

Amazon Bedrock is a serverless service that provides access to a range of high-performing foundation models (FMs), including LLMs, from leading AI companies such as Amazon, AI21 Labs, Anthropic, Cohere, Meta, and Stability AI, through a single API.

By using Amazon Bedrock to generate vector embeddings and storing them in MongoDB Atlas, you can quickly build semantic search applications. Combined with cloud-native design patterns, these technologies form a semantic search back end that captures the nuances of language. Users can query information in natural language and discover highly relevant results, even when the query and the stored keywords don't match exactly.

With Amazon Bedrock and MongoDB Atlas, you benefit from comprehensive data protection and privacy. You can use AWS PrivateLink to establish private connectivity between your Amazon Virtual Private Cloud (Amazon VPC) and these managed services without exposing your traffic to the internet.

This tutorial walks through an architecture for a scalable and secure semantic search API built with MongoDB Atlas Vector Search, Amazon Bedrock, and AWS serverless services. The accompanying GitHub repository contains the code and detailed deployment instructions to get you started.

Solution Overview

The solution presented in this tutorial has two main features:

Generating vector embeddings (represented as 1, 2, 3, and 4 in the diagram)
Performing the semantic search (represented as A, B, and C in the diagram)

To generate vector embeddings:

1. The Create Embeddings AWS Lambda function can be invoked through an Amazon API Gateway REST API to generate an initial set of vector embeddings for documents stored in the MongoDB Atlas database.
2. Ongoing database changes are captured by Atlas Triggers and published to an Amazon EventBridge event bus with an Amazon Simple Queue Service (Amazon SQS) queue as the target.
3. The Ingestion Lambda function receives change events from the SQS queue through a Lambda event source mapping. It generates new embeddings, or updates existing ones, using the Titan Embeddings model via Amazon Bedrock.
4. The new or updated embeddings are stored in MongoDB Atlas through the private interface endpoint connection. AWS Secrets Manager stores the database credentials securely.

To perform semantic search:

A. Users submit their search queries to an API endpoint provided by the API Gateway REST API.
B. The Search Lambda function generates an embedding of the search query using the Titan Embeddings model via Amazon Bedrock. To ensure private connectivity, it uses an interface endpoint provided by AWS PrivateLink.
C. The Search function then performs a semantic search on the MongoDB Atlas vector search index using the interface endpoint for AWS PrivateLink. Results are returned to the client through the API Gateway.

The following sections describe these key architectural elements in more detail.

Generating vector embeddings with Amazon Bedrock and Titan Embeddings

This post uses the movies collection in the sample_mflix database to illustrate the concepts. You can load this database into your cluster as part of the MongoDB Atlas sample data. Each document in the movies collection contains details on a single movie, such as title, runtime, release date, genre, and IMDb rating. It also contains a plot field with a short summary of the movie's plot. Suppose you want to enable semantic search on this plot field so that your users can discover movies with natural language queries.

Semantic search relies on vector embeddings, which represent words or phrases as numerical vectors of a fixed size. Because contextually similar words and phrases produce similar vector representations, these vectors capture the meaning of a text. Semantically similar text is mapped to nearby points in the vector space, which allows semantic search algorithms to identify relevant results. As a first step, you need to generate vector embeddings for the text stored in the plot field of each document.
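To make "similar vector representations" concrete: the vector index defined later in this post uses cosine similarity, which compares the direction of two vectors. The following minimal JavaScript sketch computes that metric for toy three-dimensional vectors. The function name cosineSimilarity is illustrative; in the actual solution, Atlas Vector Search performs this comparison inside the index.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; higher means the vectors point in a more similar direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for zero vectors");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors; real Titan embeddings have 1,536 dimensions.
console.log(cosineSimilarity([1, 0, 0], [1, 0, 0])); // 1 (same direction)
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0 (unrelated)
```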

Amazon Bedrock supports generating vector embeddings with the Titan Embeddings model (amazon.titan-embed-text-v1). This model accepts input text of up to 8K tokens and returns vectors with 1,536 dimensions. At the time of writing, Atlas Vector Search supports indexing vector embeddings with up to 2,048 dimensions, so Titan's output fits within that limit.

This solution uses the AWS SDK for JavaScript v3 in the Search Lambda function to connect to the embedding model in Amazon Bedrock using the BedrockRuntimeClient.

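import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

// Picks up the Region and credentials of the Lambda execution environment.
const client = new BedrockRuntimeClient();

const inputText = "Text to create embeddings for.";

const input = {
  modelId: "amazon.titan-embed-text-v1",
  contentType: "application/json",
  accept: "*/*",
  body: JSON.stringify({
    inputText,
  }),
};

// Top-level await requires an ES module handler (for example, index.mjs).
const command = new InvokeModelCommand(input);
const response = await client.send(command);

// The response body is a byte array containing JSON with an `embedding` array.
const { embedding } = JSON.parse(new TextDecoder().decode(response.body));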

After receiving the vector embeddings from Amazon Bedrock, the Lambda function uses the MongoDB driver for Node.js to store the generated vector embeddings for the plot field in a new plot_embedding field in the MongoDB document.
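A minimal sketch of that write step, assuming the function collects (_id, embedding) pairs and writes them in one round trip with the driver's bulkWrite method. The helper name buildEmbeddingUpdates is hypothetical and not taken from the accompanying repository.

```javascript
// Hypothetical helper: turns (document _id, embedding) pairs into updateOne
// operations in the shape accepted by the MongoDB Node.js driver's bulkWrite().
function buildEmbeddingUpdates(results) {
  return results.map(({ _id, embedding }) => {
    if (!Array.isArray(embedding) || embedding.length === 0) {
      throw new Error(`Missing embedding for document ${_id}`);
    }
    return {
      updateOne: {
        filter: { _id },
        // Store the vector next to the source text in the plot_embedding field.
        update: { $set: { plot_embedding: embedding } },
      },
    };
  });
}

// Usage with the driver (not executed here):
// const ops = buildEmbeddingUpdates([{ _id: doc._id, embedding }]);
// await client.db("sample_mflix").collection("movies").bulkWrite(ops, { ordered: false });
```

Using ordered: false lets the remaining updates proceed if one of them fails.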

All the Lambda functions used in this solution securely connect from an isolated VPC to Amazon Bedrock and MongoDB Atlas using VPC interface endpoints provided by AWS PrivateLink. This enables access to both MongoDB Atlas and Amazon Bedrock as if they were in your VPC, without the use of an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. The path between a VPC endpoint and an AWS or AWS-based service stays within AWS and does not traverse the Internet.

Indexing vector embeddings and performing the semantic search with Atlas Vector Search

To store the vector embeddings of the plot text in the plot_embedding field, you can use a knnVector type field in MongoDB Atlas. The vector field is represented as an array of numbers (BSON int32, int64, or double data types only).
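Because the index expects a fixed number of dimensions, it can help to validate each vector before writing it. The sketch below is a hypothetical check, assuming the 1,536-dimension index configuration shown next. The Node.js driver serializes JavaScript numbers as BSON int32 or double, both of which the vector field accepts.

```javascript
// Dimensions configured for plot_embedding in the index definition;
// Titan Embeddings (amazon.titan-embed-text-v1) returns 1,536 values.
const PLOT_EMBEDDING_DIMENSIONS = 1536;

// Hypothetical check: true when `value` is a plain array of finite numbers
// whose length matches the number of dimensions the index expects.
function isValidEmbedding(value, dimensions = PLOT_EMBEDDING_DIMENSIONS) {
  return (
    Array.isArray(value) &&
    value.length === dimensions &&
    value.every((x) => typeof x === "number" && Number.isFinite(x))
  );
}
```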

Next, you need to index the vector embeddings stored in the plot_embedding field of each document. MongoDB Atlas enables you to define a vector search index on knnVector type fields with the following configuration:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "plot_embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}

To perform search queries on this index, you can use a $vectorSearch aggregation pipeline stage. This search query compares the similarity of the vectors stored in the plot_embedding field with the vector representation of the search query submitted by the user. It uses an approximate nearest neighbor search approach.

A $vectorSearch stage for this index can then look as follows:

{
  "$vectorSearch": {
    "index": "plot_embedding_index",
    "path": "plot_embedding",
    "queryVector": [<array-of-numbers>],
    "numCandidates": 50,
    "limit": 3
  }
}

See the Vector Search Queries documentation for a detailed description of fields.
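In the Search Lambda function, this stage could be followed by a projection that returns only the fields the API needs. The sketch below is one way to assemble such a pipeline, assuming the index name from the example above and using the vectorSearchScore metadata to populate a score field. The helper name buildSearchPipeline is hypothetical.

```javascript
// Hypothetical helper: a $vectorSearch stage followed by a $project stage that
// returns the fields shown in the API response, including the relevance score.
function buildSearchPipeline(queryVector, { limit = 3, numCandidates = 50 } = {}) {
  // $vectorSearch requires numCandidates to be greater than or equal to limit.
  if (numCandidates < limit) {
    throw new Error("numCandidates must be greater than or equal to limit");
  }
  return [
    {
      $vectorSearch: {
        index: "plot_embedding_index",
        path: "plot_embedding",
        queryVector,
        numCandidates,
        limit,
      },
    },
    {
      $project: {
        _id: 1,
        title: 1,
        plot: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// Usage with the Node.js driver (not executed here):
// const results = await collection.aggregate(buildSearchPipeline(embedding)).toArray();
```

Raising numCandidates relative to limit makes the approximate search consider more candidates, trading latency for recall.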

Change data capture with Atlas Triggers and Amazon EventBridge

Data is rarely static. To keep new and updated documents searchable, you can set up a process that automatically embeds new documents and re-embeds fields that change. For example, in the movies dataset, you may need to update the plot of some movies, which in turn requires updating the plot_embedding field of those documents.

Atlas Triggers allow you to execute server-side logic in response to database events or on a schedule. Database triggers are a type of Atlas trigger that allows you to execute server-side logic whenever a document is added, updated, or removed in a linked Atlas cluster.

There are several ways to configure the types of events that cause a trigger to be executed. First, you can select one or more database change events (INSERT, UPDATE, REPLACE, and DELETE). Second, you can provide a match expression to further filter events based on their properties.

A database trigger can either execute a serverless function with your JavaScript code or send trigger events to an Amazon EventBridge partner event bus.

In this sample application, all INSERT, UPDATE, and REPLACE change events are sent to an EventBridge event bus and placed on an Amazon Simple Queue Service (Amazon SQS) queue. From there, the Ingestion Lambda function consumes batches of change events through a Lambda event source mapping and creates or updates the embedding in the plot_embedding document field.
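A minimal sketch of such a consumer, assuming each SQS message body is the EventBridge event JSON and that the Atlas change event sits in its detail field. The embedAndStore function is a hypothetical placeholder for the Bedrock call and the MongoDB write. Reporting per-message failures lets SQS retry only the failed messages, which requires enabling ReportBatchItemFailures on the event source mapping.

```javascript
// Hypothetical factory for an SQS-triggered Lambda handler.
// `embedAndStore(changeEvent)` embeds the plot and writes plot_embedding.
function makeIngestionHandler(embedAndStore) {
  return async (event) => {
    const batchItemFailures = [];
    for (const record of event.Records) {
      try {
        // Assumption: the Atlas change event is in the EventBridge `detail` field.
        const changeEvent = JSON.parse(record.body).detail;
        await embedAndStore(changeEvent);
      } catch (err) {
        // Mark only this message as failed so SQS retries it
        // without reprocessing the rest of the batch.
        console.error(`Failed to process message ${record.messageId}`, err);
        batchItemFailures.push({ itemIdentifier: record.messageId });
      }
    }
    // Partial batch response format understood by Lambda event source mappings.
    return { batchItemFailures };
  };
}
```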

Use a match expression to forward only the database events that require a new embedding. For example, the following expression matches update events in which the plot field changed:

{"updateDescription.updatedFields.plot":{"$exists":true}}
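This expression matches only update events: insert events carry no updateDescription, so forwarding inserts as well would need an additional condition on operationType. Because the Ingestion function's own write touches plot_embedding but not plot, that write does not trigger another embedding run. As a hypothetical second guard in the ingestion code, the following sketch expresses the intended decision per change event, including inserts of documents without a plot_embedding yet. The function name shouldEmbed is illustrative.

```javascript
// Hypothetical guard: decide whether a change event needs a (re-)embedding.
function shouldEmbed(changeEvent) {
  switch (changeEvent.operationType) {
    case "insert": {
      // New document: embed if it has a plot but no embedding yet.
      const doc = changeEvent.fullDocument;
      return typeof doc?.plot === "string" && doc.plot_embedding === undefined;
    }
    case "replace":
      // Whole document replaced: the plot may have changed, so re-embed.
      return typeof changeEvent.fullDocument?.plot === "string";
    case "update":
      // Same condition as the match expression: plot itself was updated.
      return Object.prototype.hasOwnProperty.call(
        changeEvent.updateDescription?.updatedFields ?? {},
        "plot"
      );
    default:
      return false;
  }
}
```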

Serverless semantic search API with Amazon API Gateway and AWS Lambda

Finally, you need a scalable and secure API endpoint that you can integrate with your applications and expose to clients. This solution creates a REST API endpoint with Amazon API Gateway, a fully managed service for creating, publishing, maintaining, monitoring, and securing APIs at any scale. API Gateway offers multiple authentication options, built-in caching, request validation, and other features that you can configure to integrate this semantic search solution into your project. Because API Gateway is serverless, you benefit from automatic scaling, built-in high availability, and a pay-for-use billing model.

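Clients send search requests to the /search endpoint of the REST API and receive a list of relevant search results in response. The following request is signed with AWS Signature Version 4 (SigV4), which curl supports through its --aws-sigv4 option: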

curl --request POST \
  'https://<API endpoint>.execute-api.us-east-1.amazonaws.com/prod/search' \
  --aws-sigv4 "aws:amz:us-east-1:execute-api" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  --header "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  --header 'Accept: application/json' \
  --data '{ "query": "sports" }' \
  | jq .

The response to this request contains three movies, matching the limit of 3 in the example query, each with its _id, title, plot, and score fields:

[
  {
    "_id": "573a1398f29313caabcea388",
    "plot": "Molly is a high school track coach who knows just as much about football as anyone else on the planet. When the football coach's position becomes vacant, she applies for the job, despite ...",
    "title": "Wildcats",
    "score": 0.7063020467758179
  },
  {
    "_id": "573a1397f29313caabce879f",
    "plot": "It started as a friendly meeting between 4 old buddies with their basketball coach and ended up in revealing the truth about their relationship. The meeting forces the five men to reveal ...",
    "title": "That Championship Season",
    "score": 0.6836512088775635
  },
  {
    "_id": "573a1394f29313caabcdf0a6",
    "plot": "Pat's a brilliant athlete, except when her domineering fiance is around. The lady's golf championship is in her reach until she gets flustered by his presence at the final holes. He wants ...",
    "title": "Pat and Mike",
    "score": 0.6823728084564209
  }
]

Under the hood, incoming search requests are routed from the API Gateway to the Search Lambda function using a Lambda proxy integration.
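With a Lambda proxy integration, API Gateway passes the raw HTTP request to the function and expects a response object with statusCode, headers, and body. The sketch below shows that contract for the /search endpoint, assuming a JSON body of the form {"query": "..."} as in the curl example. embedQuery and runVectorSearch are hypothetical injected functions standing in for the Bedrock call and the $vectorSearch aggregation; errors they throw are not caught here and typically reach the client as a 5xx response.

```javascript
// Hypothetical factory for the Search Lambda handler (Lambda proxy integration).
function makeSearchHandler(embedQuery, runVectorSearch) {
  const respond = (statusCode, payload) => ({
    statusCode,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });

  return async (event) => {
    let query;
    try {
      // Proxy events carry the body as a string, optionally base64-encoded.
      const raw = event.isBase64Encoded
        ? Buffer.from(event.body ?? "", "base64").toString("utf8")
        : event.body ?? "";
      ({ query } = JSON.parse(raw));
    } catch {
      return respond(400, { message: "Request body must be valid JSON" });
    }
    if (typeof query !== "string" || query.trim() === "") {
      return respond(400, { message: "Field 'query' must be a non-empty string" });
    }
    const queryVector = await embedQuery(query); // Bedrock embedding
    const results = await runVectorSearch(queryVector); // $vectorSearch
    return respond(200, results);
  };
}
```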

Because embeddings need to be generated only when data is added or updated, event-driven computing with AWS Lambda lets you trigger embedding generation on demand rather than running it continuously. AWS Lambda is a serverless computing service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.

Scaling and extending the solution

This solution serves as a blueprint that you can enhance and extend to build your own semantic search use cases with MongoDB Atlas and Amazon Bedrock. Keep the following considerations in mind when scaling the solution for production.

The default Amazon Bedrock quotas limit the rate of the API operations performed in this example application. For instance, at the time of writing, the default quotas allow up to 2,000 requests per minute and up to 300,000 processed tokens per minute when invoking the Amazon Titan Embeddings model. Depending on the volume and size of your embedding API calls, you may need to configure provisioned throughput, which provides a higher level of throughput for a fixed cost.
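Because both quotas apply at the same time, whichever one your workload reaches first caps the throughput. The following sketch estimates that cap and the time needed for an initial backfill, assuming one document per InvokeModel call (Titan Embeddings v1 embeds a single inputText per request) and the default values quoted above. The function names and the 10,000-document figure are hypothetical; check the current quotas for your Region.

```javascript
// Sustainable requests per minute under a request quota and a token quota.
function maxRequestsPerMinute(
  avgTokensPerRequest,
  { requestsPerMinute = 2000, tokensPerMinute = 300000 } = {}
) {
  if (avgTokensPerRequest <= 0) throw new Error("avgTokensPerRequest must be positive");
  // Whichever quota is reached first is the binding constraint.
  return Math.min(requestsPerMinute, Math.floor(tokensPerMinute / avgTokensPerRequest));
}

// Minutes needed to embed `documentCount` documents, one request per document.
function minutesToEmbed(documentCount, avgTokensPerRequest, quotas) {
  return Math.ceil(documentCount / maxRequestsPerMinute(avgTokensPerRequest, quotas));
}

console.log(maxRequestsPerMinute(100)); // 2000 (request quota binds)
console.log(maxRequestsPerMinute(500)); // 600 (token quota binds)
console.log(minutesToEmbed(10000, 500)); // 17
```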

With automatic scaling, built-in high availability, and a pay-for-use billing model, AWS Lambda is well suited as a compute platform for embedding workloads. To ensure your Lambda functions can handle large numbers of invocations, such as when ingesting large amounts of data at once, manage function concurrency by configuring reserved concurrency and provisioned concurrency. For more information about scaling Lambda functions and configuring reserved and provisioned concurrency, see the AWS Lambda Developer Guide.

Consider enabling API Gateway caching to increase the responsiveness of the integration and to optimize the cost of repeat requests. Also, set up access logging for the API Gateway with Amazon CloudWatch to keep a record of who accessed your API endpoint and how. For an overview of security recommendations for API Gateway, see Security best practices in Amazon API Gateway.

The integration presented in this tutorial follows security best practices such as storing your MongoDB credentials in Secrets Manager and utilizing IAM to secure access to resources in your AWS account. To protect your MongoDB account, you should regularly rotate your MongoDB credentials and update them in Secrets Manager.
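Rotation only helps if running functions eventually see the new credentials. One pattern is to cache the secret across warm invocations with a time-to-live. The following is a minimal sketch under that assumption: makeCachedSecretGetter is a hypothetical helper, and fetchSecret would wrap a Secrets Manager GetSecretValue call. An already open MongoClient keeps its existing connections, so the function may also need to reconnect after a rotation.

```javascript
// Hypothetical helper: caches a secret across warm Lambda invocations and
// refetches it once `ttlMs` has elapsed, so rotated credentials are picked up.
// `now` is injectable to make the expiry logic easy to test.
function makeCachedSecretGetter(fetchSecret, ttlMs, now = Date.now) {
  let cached;
  let fetchedAt = -Infinity;
  return async () => {
    if (cached === undefined || now() - fetchedAt >= ttlMs) {
      cached = await fetchSecret();
      fetchedAt = now();
    }
    return cached;
  };
}

// Usage sketch (not executed here); `secretsClient` and `secretArn` are hypothetical:
// const getMongoUri = makeCachedSecretGetter(
//   async () => (await secretsClient.send(new GetSecretValueCommand({ SecretId: secretArn }))).SecretString,
//   5 * 60 * 1000
// );
```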

Conclusion

This article demonstrated how to use MongoDB Atlas Vector Search, Amazon Bedrock, and AWS serverless services to build a secure and scalable semantic search API. With this approach, you use MongoDB Atlas not only to store your data sets but also to unlock more value from them by combining Atlas Vector Search with Amazon Bedrock's serverless APIs.

The associated GitHub repository contains the solution source code and detailed deployment instructions to get you started. Open a GitHub issue to provide your feedback or create a pull request to extend the solution.

See the MongoDB Atlas Vector Search documentation for more information and tutorials.