<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: babuvenky76</title>
    <description>The latest articles on DEV Community by babuvenky76 (@babuvenky76).</description>
    <link>https://dev.to/babuvenky76</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F775714%2Fdf0faa8b-ec46-42f4-a459-947ec69c8e3e.png</url>
      <title>DEV Community: babuvenky76</title>
      <link>https://dev.to/babuvenky76</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/babuvenky76"/>
    <language>en</language>
    <item>
      <title>Build Your Movie Recommendation System Using Amazon Personalize, MongoDB Atlas, and AWS Glue</title>
      <dc:creator>babuvenky76</dc:creator>
      <pubDate>Fri, 12 Apr 2024 07:02:12 +0000</pubDate>
      <link>https://dev.to/babuvenky76/build-your-movie-recommendation-system-using-amazon-personalize-mongodb-atlas-and-aws-glue-4o91</link>
      <guid>https://dev.to/babuvenky76/build-your-movie-recommendation-system-using-amazon-personalize-mongodb-atlas-and-aws-glue-4o91</guid>
      <description>&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt;&lt;br&gt;
Siddharth Joshi, (Technical Account Manager at AWS)&lt;br&gt;
Sornavel Perumal (Technical Account Manager at AWS)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contributor:&lt;/strong&gt;&lt;br&gt;
Babu Srinivasan (Senior Partner Solutions Architect at MongoDB)&lt;/p&gt;

&lt;p&gt;In today's data-driven world, personalized recommendations have become an integral part of enhancing user experiences. With the power of cloud computing and advanced database solutions, building your own personalized movie recommendation system is now more achievable than ever. In this article, we'll explore the integration of MongoDB Atlas, AWS Glue, and Amazon Personalize to create a robust and scalable recommendation engine.&lt;/p&gt;
&lt;h2&gt;
  
  
  Understanding the components
&lt;/h2&gt;

&lt;p&gt;Before diving into the integration process, let's briefly understand the key components involved in our movie recommendation system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.mongodb.com/atlas/database"&gt;MongoDB Atlas&lt;/a&gt; is a fully managed, cloud-based database service that enables seamless deployment, scaling, and maintenance of MongoDB databases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/glue/"&gt;AWS Glue&lt;/a&gt; is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load data for analysis. It helps bridge the gap between our MongoDB Atlas data and the services we'll use for recommendation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/pm/personalize/"&gt;Amazon Personalize&lt;/a&gt; is a machine learning service that makes it easy to build, train, and deploy personalized recommendation models. It will analyze the data from MongoDB Atlas and generate personalized movie recommendations for users&lt;/p&gt;
&lt;h2&gt;
  
  
  Reference architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbsrgvgrnmw37ir91ibt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbsrgvgrnmw37ir91ibt.png" alt="Image description" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This architecture seamlessly ingests data from MongoDB Atlas to power personalized recommendations. The AWS Glue Spark job extracts the data, transforms it (filtering, cleaning, joining), and loads it into S3. This prepared data becomes the foundation for your chosen AI/ML service (SageMaker, Personalize, etc.), enabling highly accurate and personalized recommendations.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/free/free-tier/"&gt;AWS account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/marketplace/pp/prodview-pp445qepfdy34?trk=bee522df-5e20-4004-9407-60ca7f22e092&amp;amp;sc_channel=el"&gt;MongoDB Atlas&lt;/a&gt; free cluster&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html"&gt;AWS CLI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tutorial is easiest to follow if you already have a good understanding of &lt;a href="https://www.mongodb.com/atlas/database"&gt;MongoDB Atlas services&lt;/a&gt; and the Amazon Web Services (AWS) components shown in the architecture diagram above.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting up MongoDB Atlas for movie data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Begin by creating a MongoDB Atlas database to store information about movies, genres, and user interactions. Populate the database with relevant data, ensuring it is well-structured for the recommendation model.&lt;br&gt;
For this article, we will be using the &lt;a href="https://grouplens.org/datasets/movielens/"&gt;MovieLens&lt;/a&gt; dataset. &lt;/p&gt;

&lt;p&gt;a. If you do not already have one, you can &lt;a href="https://aws.amazon.com/marketplace/pp/prodview-pp445qepfdy34?trk=bee522df-5e20-4004-9407-60ca7f22e092&amp;amp;sc_channel=el"&gt;sign up&lt;/a&gt; for a MongoDB Atlas account. &lt;/p&gt;

&lt;p&gt;b. Create a &lt;a href="https://www.mongodb.com/docs/atlas/atlas-ui/databases/#create-a-database"&gt;database&lt;/a&gt; named &lt;strong&gt;movielens&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;c. Note down your MongoDB Atlas &lt;a href="https://www.mongodb.com/docs/guides/atlas/connection-string/"&gt;connection&lt;/a&gt; URI.&lt;/p&gt;

&lt;p&gt;d. &lt;a href="https://files.grouplens.org/datasets/movielens/ml-latest-small.zip"&gt;Download&lt;/a&gt; the MovieLens dataset. &lt;/p&gt;

&lt;p&gt;e. Unzip the file locally and run the &lt;a href="https://github.com/siddharj-amz/mongo-personalize-recommender/blob/main/load_data_to_atlas.py"&gt;Python script&lt;/a&gt; to upload data to MongoDB Atlas. (Replace the connection URI and &lt;strong&gt;path_to_extracted_files&lt;/strong&gt; placeholders.)&lt;/p&gt;
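&lt;p&gt;As a rough sketch of what that upload step does (an illustration only, assuming the standard MovieLens file names; the connection URI and data directory are placeholders you must fill in):&lt;/p&gt;

```python
# Minimal sketch of the upload step: parse each MovieLens CSV and insert the
# rows into a matching collection in the "movielens" database.
import csv
from pathlib import Path

def csv_to_docs(csv_text):
    """Turn CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(csv_text.splitlines()))

def load_movielens(connection_uri, data_dir):
    from pymongo import MongoClient  # pip install pymongo
    db = MongoClient(connection_uri)["movielens"]
    for name in ("movies", "ratings", "tags", "links"):
        docs = csv_to_docs(Path(data_dir, f"{name}.csv").read_text())
        if docs:
            db[name].insert_many(docs)
```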
&lt;h2&gt;
  
  
  Using AWS Glue for data preparation
&lt;/h2&gt;

&lt;p&gt;AWS Glue comes into play for ETL operations. Create a Glue job to extract data from MongoDB, transform it into a suitable format for training the recommendation model, and load it into an Amazon S3 bucket.&lt;/p&gt;

&lt;p&gt;a. Create an &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html"&gt;S3 bucket&lt;/a&gt; to store the processed file from Glue.&lt;/p&gt;

&lt;p&gt;b. Store your MongoDB connection credentials in AWS Secrets Manager.&lt;/p&gt;

&lt;p&gt;c. Create a new AWS Glue Studio job with the Spark script editor option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F44y8f6bzlvlsxu0p8oez.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F44y8f6bzlvlsxu0p8oez.png" alt="Image description" width="800" height="217"&gt;&lt;/a&gt;From the AWS Glue Studio console, select jobs from the menu and select “Script editor.” &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3rh5qd9cypi4h4eolrf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3rh5qd9cypi4h4eolrf.png" alt="Image description" width="800" height="333"&gt;&lt;/a&gt;&lt;br&gt;
Select the Spark option from the dropdown menu and click &lt;strong&gt;Create script&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;d. &lt;a href="https://aws.amazon.com/blogs/big-data/compose-your-etl-jobs-for-mongodb-atlas-with-aws-glue/"&gt;Create an ETL job&lt;/a&gt; using Glue. &lt;a href="https://github.com/siddharj-amz/mongo-personalize-recommender/blob/main/pyspark_atlas_to_s3.py"&gt;Replace&lt;/a&gt; the Python script. &lt;/p&gt;

&lt;p&gt;e. Specify input arguments. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key : Value&lt;/strong&gt;&lt;br&gt;
--BUCKET_NAME : &amp;lt;bucket_name&amp;gt;&lt;br&gt;
--OUTPUT_FILENAME1 :  ratings&lt;br&gt;
--OUTPUT_FILENAME2 :  items&lt;br&gt;
--COLLECTION_NAME3 :  movies&lt;br&gt;
--COLLECTION_NAME2 :  tags&lt;br&gt;
--COLLECTION_NAME1 :  ratings&lt;br&gt;
--SECRET_NAME      :  &amp;lt;name_of_secret&amp;gt;&lt;/p&gt;

&lt;p&gt;f. Run the job.&lt;/p&gt;
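&lt;p&gt;The overall shape of such a job looks roughly like this (a sketch, not the exact linked script; the "mongodb" source format and option names depend on the MongoDB Spark connector version attached to the job):&lt;/p&gt;

```python
# Sketch of the Glue Spark job: read a collection from Atlas, rename columns to
# what Amazon Personalize expects for interactions, and write CSV to S3.

def s3_path(bucket, filename):
    """S3 output location for one prepared file."""
    return f"s3://{bucket}/{filename}/"

def export_interactions(spark, args, mongo_uri):
    ratings = (spark.read.format("mongodb")            # MongoDB Spark connector
               .option("connection.uri", mongo_uri)    # from Secrets Manager
               .option("database", "movielens")
               .option("collection", args["COLLECTION_NAME1"])
               .load())
    # Personalize interactions datasets require USER_ID, ITEM_ID, TIMESTAMP.
    interactions = ratings.selectExpr(
        "userId as USER_ID", "movieId as ITEM_ID", "timestamp as TIMESTAMP")
    (interactions.write.mode("overwrite")
     .option("header", True)
     .csv(s3_path(args["BUCKET_NAME"], args["OUTPUT_FILENAME1"])))
```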
&lt;h2&gt;
  
  
  Create a dataset group and an interactions dataset in Amazon Personalize
&lt;/h2&gt;

&lt;p&gt;a. Go to Amazon Personalize in your AWS console.&lt;br&gt;
b. In the left navigation pane, click on &lt;strong&gt;Dataset groups&lt;/strong&gt;.&lt;br&gt;
c. Click the &lt;strong&gt;Create dataset group&lt;/strong&gt; button. Enter &lt;code&gt;movie-datasetgroup&lt;/code&gt; as the name for your dataset group. Select &lt;strong&gt;Video on demand&lt;/strong&gt; as the Domain. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfkdfj4k4fz9zssxgpxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyfkdfj4k4fz9zssxgpxc.png" alt="Image description" width="800" height="507"&gt;&lt;/a&gt;&lt;br&gt;
In Amazon Personalize, click &lt;strong&gt;Create dataset group&lt;/strong&gt;, provide the name of your dataset, and select the “Video on demand” option. Click &lt;strong&gt;Create group&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;d. After creating the dataset group, you need to add datasets to it. Click on the dataset group you just created.&lt;br&gt;
e. Click on the &lt;strong&gt;Create dataset&lt;/strong&gt; button, and select &lt;strong&gt;Item interactions dataset&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fly3b6xsnkc53ysiatiwh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fly3b6xsnkc53ysiatiwh.png" alt="Image description" width="800" height="222"&gt;&lt;/a&gt;&lt;br&gt;
Select the “Item interactions dataset” from the dropdown menu.&lt;/p&gt;

&lt;p&gt;f. Select Import data directly into Amazon Personalize datasets as the Import method.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0ay32vro0zq4o60fdui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0ay32vro0zq4o60fdui.png" alt="Image description" width="800" height="291"&gt;&lt;/a&gt;&lt;br&gt;
Choose the “Import data directly into Amazon Personalize datasets.” &lt;/p&gt;

&lt;p&gt;g. Provide &lt;code&gt;movie-interactions&lt;/code&gt; as Dataset name and Schema name. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfh0sxdk96p0l43u5g9g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsfh0sxdk96p0l43u5g9g.png" alt="Image description" width="800" height="337"&gt;&lt;/a&gt;&lt;br&gt;
Give a name to the Dataset and select “Create a new domain schema by modifying the existing default schema for your domain.”&lt;/p&gt;

&lt;p&gt;h. To configure your dataset import job, select &lt;strong&gt;Import data from S3&lt;/strong&gt; and provide a name for the import job. Specify the path of ratings.csv in your S3 bucket as the data location, and specify an IAM role that has access to the S3 bucket.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y26ttszg1kqr7lumd2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y26ttszg1kqr7lumd2y.png" alt="Image description" width="800" height="348"&gt;&lt;/a&gt;&lt;br&gt;
Select “Import data from S3.” &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gbr2r803edjr0a3fyil.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1gbr2r803edjr0a3fyil.png" alt="Image description" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
Select the S3 data location.&lt;/p&gt;
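&lt;p&gt;If you prefer scripting these console steps, the equivalent boto3 calls look roughly like this (a sketch; the names are illustrative, and the IAM role must allow Amazon Personalize to read from your bucket):&lt;/p&gt;

```python
# Sketch: create the domain dataset group, then start an import job that loads
# ratings.csv from S3 into an already-created interactions dataset.

def import_job_params(job_name, dataset_arn, s3_uri, role_arn):
    """Assemble the arguments for personalize.create_dataset_import_job."""
    return {
        "jobName": job_name,
        "datasetArn": dataset_arn,
        "dataSource": {"dataLocation": s3_uri},
        "roleArn": role_arn,
    }

def create_group_and_import(interactions_dataset_arn, s3_uri, role_arn):
    import boto3
    personalize = boto3.client("personalize")
    group_arn = personalize.create_dataset_group(
        name="movie-datasetgroup", domain="VIDEO_ON_DEMAND")["datasetGroupArn"]
    personalize.create_dataset_import_job(**import_job_params(
        "movie-interactions-import", interactions_dataset_arn,
        s3_uri, role_arn))
    return group_arn
```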
&lt;h2&gt;
  
  
  Create the User dataset
&lt;/h2&gt;

&lt;p&gt;The User dataset lists all the users in the system. MovieLens does not provide a user dataset, so we will use one that has been created for this post. In the real world, this dataset would come from your application. &lt;/p&gt;

&lt;p&gt;a. Copy the &lt;a href="https://github.com/siddharj-amz/mongo-personalize-recommender/blob/main/users.csv"&gt;users.csv&lt;/a&gt; file and upload it to the S3 bucket you created earlier. &lt;br&gt;
b. Follow the steps above to create the User dataset.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4x75pejpu58hmunkzn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4x75pejpu58hmunkzn4.png" alt="Image description" width="800" height="252"&gt;&lt;/a&gt;&lt;br&gt;
Select “Users dataset” from the dropdown menu.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3kmw8khh30xh6mpqidc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3kmw8khh30xh6mpqidc.png" alt="Image description" width="800" height="374"&gt;&lt;/a&gt;&lt;br&gt;
Choose “Import data directly into Amazon Personalize datasets.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jwrzfmi55j56rjr7ac4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8jwrzfmi55j56rjr7ac4.png" alt="Image description" width="800" height="454"&gt;&lt;/a&gt;&lt;br&gt;
Provide a name for the Dataset and choose “Create a new domain schema by modifying the existing default schema for your domain.”&lt;/p&gt;

&lt;p&gt;c. Ensure that the Schema definition looks like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
“type” : “record”,
“name” : “Users”,
“namespace”: “com.amazonaws.personalize.schema”,
“fields” : [
       { “name”: “USER_ID”, “type”: “string”},
       { “name”: “SUBSCRIPTION_MODEL”, “type”: “string”,”categorical”: true},
               ],
“version”:”1.0”
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;d. Provide S3 as the data import source and the S3 path for users.csv as the data location. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8ql4g8ulb765dm9d502.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8ql4g8ulb765dm9d502.png" alt="Image description" width="800" height="318"&gt;&lt;/a&gt;&lt;br&gt;
Select the “Import data from S3” option and provide the name to “Dataset import job name.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgazfhvoc9gbt0zros0zw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgazfhvoc9gbt0zros0zw.png" alt="Image description" width="800" height="487"&gt;&lt;/a&gt;&lt;br&gt;
Select the S3 bucket location and provide the IAM role.&lt;/p&gt;
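&lt;p&gt;The same schema can also be registered programmatically (a sketch; boto3 expects the schema as a JSON string, and domain schemas take a &lt;code&gt;domain&lt;/code&gt; argument):&lt;/p&gt;

```python
# Sketch: register the Users schema shown above and create the Users dataset
# inside the existing domain dataset group.
import json

USERS_SCHEMA = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "SUBSCRIPTION_MODEL", "type": "string", "categorical": True},
    ],
    "version": "1.0",
}

def create_users_dataset(dataset_group_arn):
    import boto3
    personalize = boto3.client("personalize")
    schema_arn = personalize.create_schema(
        name="movie-users-schema",
        schema=json.dumps(USERS_SCHEMA),
        domain="VIDEO_ON_DEMAND")["schemaArn"]
    return personalize.create_dataset(
        name="movie-users",
        schemaArn=schema_arn,
        datasetGroupArn=dataset_group_arn,
        datasetType="Users")["datasetArn"]
```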

&lt;h2&gt;
  
  
  Create the Items dataset
&lt;/h2&gt;

&lt;p&gt;The Items dataset refers to the list of all the movies available in our application. Our Glue ETL job has converted the MongoDB collection “movies” into a .csv file in a format usable with Amazon Personalize.&lt;br&gt;
Follow similar steps to those above to create the Items dataset, providing the path of items.csv in your S3 bucket as the data location.&lt;/p&gt;

&lt;p&gt;a. Select the “Items dataset” from the dropdown menu.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt2wv0nw8vxeue3enw7g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnt2wv0nw8vxeue3enw7g.png" alt="Image description" width="800" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;b. Select the “Import data directly into Amazon Personalize datasets” option.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvhjlhuypgzkazwd32lc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbvhjlhuypgzkazwd32lc.png" alt="Image description" width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. Provide a name — “movie-item” — to the Dataset name and select the “Create a new domain schema by modifying the existing default schema for your domain” option.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxsst6rhlyrjt3px1krl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxsst6rhlyrjt3px1krl.png" alt="Image description" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. Select the “Import data from S3” option and provide the Dataset import job name — “movie-ds-item.”&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2yuqtet3son8hvwqvc36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2yuqtet3son8hvwqvc36.png" alt="Image description" width="800" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the S3 location for the Data location and provide the IAM Role.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7da22dqixydw2ru6fct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs7da22dqixydw2ru6fct.png" alt="Image description" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. Before proceeding to the next step, you should wait until all three datasets become active.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6wqchy8x82vvey70dbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6wqchy8x82vvey70dbd.png" alt="Image description" width="800" height="245"&gt;&lt;/a&gt;&lt;br&gt;
Ensure you can see “3/3 datasets active” in green.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run data analysis
&lt;/h2&gt;

&lt;p&gt;Now, use Amazon Personalize to analyze the imported data: the Users, Item interactions, and Items datasets.&lt;br&gt;
Start the data analysis by clicking on &lt;strong&gt;Run data analysis&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra5ajitsv42m1m08b59z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fra5ajitsv42m1m08b59z.png" alt="Image description" width="800" height="173"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Ensure the data analysis run has been completed successfully.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xm9z1usk55zwjgaf2h5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xm9z1usk55zwjgaf2h5.png" alt="Image description" width="800" height="407"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create recommenders
&lt;/h2&gt;

&lt;p&gt;Create the recommenders after the Domain dataset group is created successfully. A recommender is a Domain dataset group resource that generates recommendations. Use a recommender in the application to get real-time recommendations with the &lt;a href="https://docs.aws.amazon.com/personalize/latest/dg/API_RS_GetRecommendations.html"&gt;GetRecommendations&lt;/a&gt; operation.&lt;/p&gt;

&lt;p&gt;a. Select &lt;strong&gt;Use video on demand recommenders&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fot8taj16jj8ugwnfl1h4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fot8taj16jj8ugwnfl1h4.png" alt="Image description" width="800" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;b. For the use case, select &lt;strong&gt;Because you watched X&lt;/strong&gt; and provide a name to the recommender. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2donq3bo217zyo6jr2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk2donq3bo217zyo6jr2k.png" alt="Image description" width="800" height="582"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. You can leave Advanced configuration as the default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluud28eicf1o0g4zz1e1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fluud28eicf1o0g4zz1e1.png" alt="Image description" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. Review the configuration and click on &lt;strong&gt;Create recommenders&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xsrq1cvm4s0v4o4rrsv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7xsrq1cvm4s0v4o4rrsv.png" alt="Image description" width="800" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. Before proceeding to the next step, please wait until the recommender becomes active.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6we8wob62so59yf5i9xy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6we8wob62so59yf5i9xy.png" alt="Image description" width="800" height="198"&gt;&lt;/a&gt;&lt;br&gt;
Ensure the status is “Active” for the movie-recommender.&lt;/p&gt;
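&lt;p&gt;The same flow with boto3 (a sketch; the recipe ARN shown is the video-on-demand “Because you watched X” use case, and creation is asynchronous, so poll until the recommender is ACTIVE before using it):&lt;/p&gt;

```python
# Sketch: create the "Because you watched X" recommender in the dataset group.

VOD_BYW_RECIPE = "arn:aws:personalize:::recipe/aws-vod-because-you-watched-x"

def recommender_params(name, dataset_group_arn):
    """Assemble the arguments for personalize.create_recommender."""
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": VOD_BYW_RECIPE,
    }

def create_movie_recommender(dataset_group_arn):
    import boto3
    personalize = boto3.client("personalize")
    resp = personalize.create_recommender(
        **recommender_params("movie-recommender", dataset_group_arn))
    return resp["recommenderArn"]  # wait for status ACTIVE before querying
```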

&lt;h2&gt;
  
  
  Test recommender
&lt;/h2&gt;

&lt;p&gt;Now that we have created a recommender, we are ready to get recommendations. In a real-world scenario, our application would send requests to Amazon Personalize and receive recommendations. For this post, we will test it using the Amazon Personalize console. &lt;br&gt;
a. Go to the Amazon Personalize console.&lt;br&gt;
b. Click on &lt;strong&gt;Recommenders&lt;/strong&gt; under the movie-datasetgroup and select &lt;strong&gt;movie-recommender&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3du1x7nwcjf3owm9toqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3du1x7nwcjf3owm9toqk.png" alt="Image description" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;c. Click on &lt;strong&gt;Test&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24b9o9nz97phoakl3y7c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F24b9o9nz97phoakl3y7c.png" alt="Image description" width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;d. Enter a valid user ID and movie ID (Item ID), and click on &lt;strong&gt;Get recommendations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbu19yjsm0zi54gztuxj6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbu19yjsm0zi54gztuxj6.png" alt="Image description" width="800" height="638"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;e. The recommender returns the recommendations as a list of movie IDs. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0c4cs0zulp3ggz8r98j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm0c4cs0zulp3ggz8r98j.png" alt="Image description" width="800" height="738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a real-world scenario, your application would map these movie IDs to movie titles and show them as recommendations to users. &lt;/p&gt;
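&lt;p&gt;A hedged sketch of that mapping: fetch recommended IDs with the GetRecommendations runtime API, then look the titles up in the Atlas “movies” collection (the recommender ARN and connection URI are placeholders):&lt;/p&gt;

```python
# Sketch: get recommendations for a user/item, then resolve movie IDs to titles.

def recommended_ids(response):
    """Extract the item IDs from a GetRecommendations response."""
    return [item["itemId"] for item in response.get("itemList", [])]

def recommend_titles(recommender_arn, user_id, item_id, connection_uri):
    import boto3
    runtime = boto3.client("personalize-runtime")
    response = runtime.get_recommendations(
        recommenderArn=recommender_arn, userId=user_id, itemId=item_id)
    ids = recommended_ids(response)
    from pymongo import MongoClient  # pip install pymongo
    movies = MongoClient(connection_uri)["movielens"]["movies"]
    return [doc["title"] for doc in movies.find({"movieId": {"$in": ids}})]
```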

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this post, we explored the integration of MongoDB, AWS Glue, and Amazon Personalize to build a personalized movie recommendation system. This powerful combination allows you to leverage the flexibility of MongoDB, the data preparation capabilities of AWS Glue, and the machine learning prowess of Amazon Personalize to deliver a tailored and engaging user experience. As you embark on your journey to enhance user engagement, this integration offers a scalable and efficient solution for building recommendation systems in various domains.&lt;/p&gt;

&lt;p&gt;Refer to the following links for further reading:&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-intro-tutorial.html"&gt;Writing an AWS Glue for Spark script&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.mongodb.com/docs/atlas/create-database-deployment/"&gt;MongoDB Atlas database&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.mongodb.com/community/forums/"&gt;MongoDB Community Forum&lt;/a&gt;&lt;/p&gt;

</description>
      <category>atlasvectorsearch</category>
      <category>generativeai</category>
      <category>aiml</category>
      <category>aws</category>
    </item>
    <item>
      <title>Developer's Guide: Crafting API-Driven Apps with MongoDB Atlas Using AWS CDK, API Gateway, and Lambda</title>
      <dc:creator>babuvenky76</dc:creator>
      <pubDate>Wed, 24 Jan 2024 08:23:47 +0000</pubDate>
      <link>https://dev.to/mongodb/developers-guide-crafting-api-driven-apps-with-mongodb-atlas-using-aws-cdk-api-gateway-and-lambda-5ab9</link>
      <guid>https://dev.to/mongodb/developers-guide-crafting-api-driven-apps-with-mongodb-atlas-using-aws-cdk-api-gateway-and-lambda-5ab9</guid>
      <description>&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/deepti-chugh-42631015/" rel="noopener noreferrer"&gt;Deepti Chugh&lt;/a&gt; (Sr Partner Success SA at AWS)&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/bharathsuresh/" rel="noopener noreferrer"&gt;Bharath S&lt;/a&gt; (Senior Partner Solutions Architect ISVs at AWS)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contributor:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/babusrinivasan/" rel="noopener noreferrer"&gt;Babu Srinivasan&lt;/a&gt;  (Senior Partner Solutions Architect at MongoDB) &lt;/p&gt;

&lt;p&gt;Welcome to our technical blog, where we unveil a step-by-step guide to deploying a robust REST API powered by Lambda functions, expertly bridging the gap between MongoDB Atlas and AWS, all with the added convenience of automation through the AWS Cloud Development Kit (CDK). Our mission is to empower developers like you to seamlessly integrate MongoDB Atlas with AWS API Gateway, all while implementing authentication via Cognito User Pools. If you're ready to embark on a journey that not only streamlines the process of building modern API-driven applications but also leverages the power of automation, you're in the right place. Let's dive into the details and unlock the potential of this dynamic integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Will You Build?&lt;/strong&gt;&lt;br&gt;
This solution comprises the following AWS and MongoDB resources, which are deployed using the AWS Cloud Development Kit (CDK):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In the MongoDB SaaS account:&lt;br&gt;
- A MongoDB cluster&lt;br&gt;
- A MongoDB project&lt;br&gt;
- A MongoDB database user&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the AWS customer account:&lt;br&gt;
- Amazon Cognito user pool – user directory for authentication and authorization&lt;br&gt;
- AWS Secrets Manager – for keeping the MongoDB database credentials&lt;br&gt;
- Amazon API Gateway – acts as the "front door" for applications to access data, business logic, or functionality from your backend services&lt;br&gt;
- AWS Lambda function – connects to the MongoDB database using PyMongo, the Python driver for MongoDB&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
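&lt;p&gt;A minimal sketch of how these pieces fit together inside the Lambda function (the secret ID, database, and collection names are assumptions for illustration; the response shape is the standard Lambda proxy-integration format that API Gateway expects):&lt;/p&gt;

```python
import json


def api_response(documents, status=200):
    """Shape a Lambda proxy-integration response for API Gateway.

    API Gateway expects statusCode/headers/body; default=str lets
    json.dumps serialize BSON ObjectId values returned by PyMongo.
    """
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(documents, default=str),
    }


# Sketch of the handler itself (runtime-only; requires AWS credentials and
# a reachable Atlas cluster, so it is shown as comments):
#
#   import boto3
#   from pymongo import MongoClient
#   uri = boto3.client("secretsmanager").get_secret_value(
#       SecretId="ATLAS_DB_URI")["SecretString"]
#   collection = MongoClient(uri)["sample_db"]["sample_collection"]
#
#   def lambda_handler(event, context):
#       return api_response(list(collection.find({}, limit=10)))
```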

&lt;p&gt;&lt;strong&gt;Reference architecture&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1j52v4wlmu7k0n71dyh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1j52v4wlmu7k0n71dyh.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above figure, the users call the API gateway endpoint to access MongoDB Atlas by invoking the AWS Lambda function. The user is authenticated by Amazon Cognito services. The credentials are stored in AWS Secrets Manager and the entire setup can be automated using the AWS CDK. MongoDB Atlas resides in a distinct Atlas VPC, fully administered by MongoDB. It is accessed securely through a private link for enhanced security.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation Steps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This solution uses AWS CDK to deploy on AWS. The first step creates the MongoDB cluster and database; the AWS resources are then deployed on top.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html" rel="noopener noreferrer"&gt;AWS CDK&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://docs.npmjs.com/downloading-and-installing-node-js-and-npm" rel="noopener noreferrer"&gt;NPM&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; &lt;a href="https://account.mongodb.com/account/login" rel="noopener noreferrer"&gt;MongoDB Atlas Account&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; AWS Account and &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" rel="noopener noreferrer"&gt;AWS CLI&lt;/a&gt; Installed and Configured&lt;/li&gt;
&lt;li&gt; Activate MongoDB Atlas &lt;a href="https://aws-ia.github.io/cfn-ps-mongodb-atlas/" rel="noopener noreferrer"&gt;CloudFormation resources&lt;/a&gt; in your AWS account with sufficient permissions&lt;/li&gt;
&lt;li&gt; Store MongoDB Atlas programmatic API Keys in AWS Secrets Manager. You can find more details about these in &lt;a href="https://www.mongodb.com/developer/products/atlas/deploy-mongodb-atlas-aws-cdk-typescript/" rel="noopener noreferrer"&gt;MongoDB's tutorial&lt;/a&gt; or the GitHub &lt;a href="https://github.com/mongodb/awscdk-resources-mongodbatlas" rel="noopener noreferrer"&gt;repository&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 0: Initialize the CDK Project&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the IDE of your choice — Cloud9, VS Code, etc.&lt;/li&gt;
&lt;li&gt;Execute the below commands to initialize the environment.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

#Get the application code
    git clone https://github.com/mongodb-partners/Microservice_Application_with_MongoDBAtlas_AWSCDK_APIGW_Lambda.git
    cd aws_mongodb_sample_dir

# If you DONT have cdk installed
    npm install -g aws-cdk


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Make sure you are in the root directory
    python3 -m venv .venv
    source .venv/bin/activate
    pip3 install -r requirements.txt


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Deploy MongoDB Atlas and AWS resources&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Set up the &lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html" rel="noopener noreferrer"&gt;AWS CLI and connect to the session&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the MongoDB Atlas Organization ID (see &lt;a href="https://www.mongodb.com/docs/atlas/access/orgs-create-view-edit-delete/#view-organizations" rel="noopener noreferrer"&gt;View Organizations&lt;/a&gt;); it will be used in the next step.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the below commands to install the Python dependencies included with this sample.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

#Install Dependencies for Lambda Function
    cd aws_mongodb_sample
    pip install --target ./dependencies pymongo
    cd ..

# Set Environment Variables
    export ORG_ID="&amp;lt;ORG_ID&amp;gt;"
    export MONGODB_USER="&amp;lt;MONGODB_USER&amp;gt;"
    export MONGODB_PASSWORD="&amp;lt;MONGODB_PASSWORD&amp;gt;"

    cdk bootstrap aws://&amp;lt;ACCOUNT_NUMBER&amp;gt;/&amp;lt;AWS-REGION&amp;gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;Run the below commands to deploy the CDK template.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

cdk synth
cdk deploy --all


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Copy the API Gateway endpoint output from the terminal, as you will need it when testing the API gateway. Alternatively, you can copy it from the stack outputs in the CloudFormation console.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Explore the Deployed Resources&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once the CDK deployment completes, go to the AWS Console and verify the following resources:&lt;br&gt;
1)    MongoDB::Atlas::Cluster&lt;br&gt;
2)    MongoDB::Atlas::Project&lt;br&gt;
3)    MongoDB::Atlas::DatabaseUser&lt;br&gt;
4)    MongoDB::Atlas::ProjectIpAccessList&lt;br&gt;
5)    Secret for storing ATLAS DB URI&lt;br&gt;
6)    Cognito User Pool&lt;br&gt;
7)    Lambda&lt;br&gt;
8)    API Gateway&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Test the Resources&lt;/strong&gt;&lt;br&gt;
1)    Navigate to the Cognito user pool and copy the user pool ID and the client ID (found in the App Integration tab)&lt;/p&gt;

&lt;p&gt;2)    Open Cloud Shell and create a user with the command below:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

aws cognito-idp admin-create-user --user-pool-id  &amp;lt;YOUR_USER_POOL_ID&amp;gt;  --username apigwtest


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;3)    Because the user was created by an admin, its password must be force-changed; set a permanent password by running the below command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

aws cognito-idp admin-set-user-password --user-pool-id &amp;lt;YOUR_USER_POOL_ID&amp;gt;  --username apigwtest  --password &amp;lt;YOUR_PASSWORD&amp;gt; --permanent


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;4)    Replace the user pool ID and client ID copied in the above step. Also, replace the password of the user created above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

aws cognito-idp admin-initiate-auth --user-pool-id &amp;lt;YOUR_USER_POOL_ID&amp;gt; --client-id &amp;lt;CLIENT_ID&amp;gt;  --auth-flow ADMIN_NO_SRP_AUTH --auth-parameters USERNAME=apigwtest,PASSWORD=&amp;lt;YOUR_PASSWORD&amp;gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;5)    Copy the ID token returned by the above step and run the below command to test the API. Copy the API_GATEWAY_ENDPOINT from the API Gateway console (API Gateway --&amp;gt; APIs --&amp;gt; ApiGateway (xxxxxx) --&amp;gt; Stages).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --location --request GET 'https://&amp;lt;API_GATEWAY_ENDPOINT&amp;gt;.execute-api.us-east-1.amazonaws.com/dev' --header 'Content-Type: application/json' --header 'Authorization: &amp;lt;ID_TOKEN&amp;gt;'


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As we wrap up our journey into the world of modern API-driven applications, we hope this blog has illuminated the path to seamless integration. With AWS CDK, MongoDB Atlas, Cognito, and Lambda at your disposal, you're armed with the tools to craft dynamic, efficient, and scalable applications. The power of these technologies lies in your hands, and we encourage you to roll up your sleeves, dig into the code, and embark on your development adventure. The possibilities are boundless, and your next innovative application could be just a few lines of code away. So, go ahead and explore, experiment, and turn your ideas into reality with the combination of AWS CDK, MongoDB Atlas, Cognito, and Lambda. Your journey is just beginning, and the future of application development is at your fingertips.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try Out&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/mongodb/awscdk-resources-mongodbatlas" rel="noopener noreferrer"&gt;AWS CDK for MongoDB Atlas&lt;/a&gt;, &lt;br&gt;
&lt;a href="https://aws.amazon.com/pm/cognito/" rel="noopener noreferrer"&gt;Amazon Cognito&lt;/a&gt;, and &lt;br&gt;
&lt;a href="https://aws.amazon.com/pm/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rollback&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

cdk destroy --all


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Cost and Licenses&lt;/strong&gt;&lt;br&gt;
There is no cost to use this Partner Solution, but you will be billed for any AWS services or resources that this Partner Solution deploys. For more information, refer to the AWS Partner Solution General Information Guide.&lt;br&gt;
This Partner Solution deploys MongoDB Atlas resources with the latest stable MongoDB enterprise version, which is licensed and distributed under the Server Side Public License (SSPL).&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>serverless</category>
      <category>awscdk</category>
      <category>applicationmodernization</category>
    </item>
    <item>
      <title>Semantic Search API: MongoDB Atlas Vector Search With Amazon Bedrock &amp; AWS Serverless</title>
      <dc:creator>babuvenky76</dc:creator>
      <pubDate>Fri, 19 Jan 2024 08:19:00 +0000</pubDate>
      <link>https://dev.to/mongodb/semantic-search-api-mongodb-atlas-vector-search-with-amazon-bedrock-aws-serverless-31in</link>
      <guid>https://dev.to/mongodb/semantic-search-api-mongodb-atlas-vector-search-with-amazon-bedrock-aws-serverless-31in</guid>
      <description>&lt;p&gt;&lt;strong&gt;Authors:&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://www.linkedin.com/in/dreamorosi/" rel="noopener noreferrer"&gt;Amorosi, Andrea&lt;/a&gt; (Senior Solutions Architect at AWS)&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/pascal-vogel/" rel="noopener noreferrer"&gt;Vogel, Pascal&lt;/a&gt; (Solutions Architect at AWS)&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/akash-doshi-aws/" rel="noopener noreferrer"&gt;Doshi, Akash&lt;/a&gt; (Solutions Architect at AWS)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contributor:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/babusrinivasan/" rel="noopener noreferrer"&gt;Babu Srinivasan&lt;/a&gt; (Senior Partner Solutions Architect at MongoDB)&lt;/p&gt;

&lt;p&gt;Searching through large volumes of unstructured data to find the most relevant information is critical to many applications. However, traditional keyword-based search approaches often fall short when dealing with complex natural language queries.&lt;/p&gt;

&lt;p&gt;Semantic search overcomes this challenge by understanding the meaning and purpose behind search queries. This comprehension improves the accuracy and relevance of search results by taking into account intent and meaning. Semantic search can be used with complex natural language queries and provides a contextual understanding of words and phrases based on different meanings in different situations.&lt;/p&gt;

&lt;p&gt;These capabilities make semantic search a powerful approach for many search use cases, including enterprise knowledge, legal and medical documents, e-commerce products, and media libraries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.mongodb.com/products/platform/atlas-vector-search" rel="noopener noreferrer"&gt;MongoDB Atlas Vector Search&lt;/a&gt; makes it easy to build semantic search by integrating the operational database and vector search into a single, fully managed platform with a native MongoDB interface that leverages large language models (LLMs) through popular frameworks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; provides access to a range of high-performing foundation models (FMs), including LLMs, developed by leading AI companies such as Amazon, AI21 Labs, Anthropic, Cohere, Meta, and Stability AI. Amazon Bedrock is a serverless service that provides access to a variety of foundation models through a single API.&lt;/p&gt;

&lt;p&gt;By using Amazon Bedrock to generate vector embeddings and storing them in MongoDB Atlas, you can quickly build powerful semantic search applications. Combining these technologies with cloud-native design patterns unlocks an intelligent semantic search back end that understands the nuances of language. It allows users to query information in natural language and discover highly relevant results — even if the query and keywords don’t match exactly.&lt;/p&gt;

&lt;p&gt;With Amazon Bedrock and MongoDB Atlas, you benefit from comprehensive data protection and privacy. You can use &lt;a href="https://aws.amazon.com/privatelink/" rel="noopener noreferrer"&gt;AWS PrivateLink&lt;/a&gt; to establish private connectivity from these managed services to your &lt;a href="https://aws.amazon.com/vpc/" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud (Amazon VPC)&lt;/a&gt; without exposing your traffic to the Internet.&lt;/p&gt;

&lt;p&gt;This tutorial walks through an architecture for a scalable and secure semantic search API built using MongoDB Atlas Vector Search, Amazon Bedrock, and &lt;a href="https://aws.amazon.com/serverless/" rel="noopener noreferrer"&gt;AWS serverless services&lt;/a&gt;. The accompanying &lt;a href="https://github.com/mongodb-partners/Semantic_Search_API_MongoDB_Atlas_Vector_Search_and_Amazon_Bedrock" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; contains code and detailed deployment details to get you started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14a23q4tp7ig9r84bom8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14a23q4tp7ig9r84bom8.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The solution presented in this tutorial has two main features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generating vector embeddings (represented as 1, 2, 3, and 4 in the diagram)&lt;/li&gt;
&lt;li&gt;Performing the semantic search (represented as A, B, and C in the diagram)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;To generate vector embeddings:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Create Embeddings &lt;a href="https://aws.amazon.com/lambda/" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt; function can be invoked via an &lt;a href="https://aws.amazon.com/api-gateway/" rel="noopener noreferrer"&gt;Amazon API Gateway&lt;/a&gt; REST API to generate an initial set of vector embeddings for documents stored in the MongoDB Atlas database.&lt;/li&gt;
&lt;li&gt;Ongoing database changes are captured and published to an &lt;a href="https://aws.amazon.com/eventbridge/" rel="noopener noreferrer"&gt;Amazon EventBridge&lt;/a&gt; event bus with an &lt;a href="https://aws.amazon.com/sqs/" rel="noopener noreferrer"&gt;Amazon Simple Queue Service (Amazon SQS)&lt;/a&gt; queue as the target. &lt;/li&gt;
&lt;li&gt;The Ingestion Lambda function receives change events from the SQS queue using &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html" rel="noopener noreferrer"&gt;Lambda event source mappings&lt;/a&gt;. It generates new or updates existing embeddings using the Titan Embeddings model via Amazon Bedrock.&lt;/li&gt;
&lt;li&gt;The new or updated embeddings are stored in MongoDB Atlas via the private interface endpoint connection. &lt;a href="https://aws.amazon.com/secrets-manager/" rel="noopener noreferrer"&gt;AWS Secrets Manager&lt;/a&gt; is used for secure secret storage.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;To perform semantic search:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A. Users submit their search queries to an API endpoint provided by the API Gateway REST API.&lt;br&gt;
B. The Search Lambda function generates an embedding of the search query using the &lt;a href="https://aws.amazon.com/bedrock/titan/" rel="noopener noreferrer"&gt;Titan Embeddings&lt;/a&gt; model via Amazon Bedrock. To ensure private connectivity, it uses an interface endpoint provided by AWS PrivateLink.&lt;br&gt;
C. The Search function then performs a semantic search on the MongoDB Atlas vector search index using the &lt;a href="https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html" rel="noopener noreferrer"&gt;interface endpoint&lt;/a&gt; for AWS PrivateLink. Results are returned to the client through the API Gateway.&lt;/p&gt;

&lt;p&gt;The following sections describe these key architectural elements in more detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generating vector embeddings with Amazon Bedrock and Titan Embeddings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This post uses the &lt;code&gt;movies&lt;/code&gt; collection in the &lt;a href="https://www.mongodb.com/docs/atlas/sample-data/sample-mflix/" rel="noopener noreferrer"&gt;sample_mflix&lt;/a&gt; database as an example to illustrate the presented concepts. You can easily load this database as &lt;a href="https://www.mongodb.com/docs/atlas/sample-data/" rel="noopener noreferrer"&gt;MongoDB sample data&lt;/a&gt;. Each document in the &lt;code&gt;movies&lt;/code&gt; collection contains details on a single movie, such as title, runtime length, release date, genre, and IMDb rating. It also contains a &lt;code&gt;plot&lt;/code&gt; field with a short summary of the movie’s plot. Let’s assume you want to enable semantic search on this &lt;code&gt;plot&lt;/code&gt; field to allow your users to discover movies using natural language queries.&lt;/p&gt;

&lt;p&gt;Semantic search relies on &lt;a href="https://en.wikipedia.org/wiki/Word_embedding" rel="noopener noreferrer"&gt;vector embeddings&lt;/a&gt; which convert words or phrases into numerical vectors of fixed size. As contextually similar words and phrases also produce similar vector representations, these vectors can capture the meaning of a text. Semantically similar words are mapped to proximate points in the vector space which allows semantic search algorithms to identify relevant search results. As a first step, you need to generate vector embeddings for the text stored in the &lt;code&gt;plot&lt;/code&gt; field of each document.&lt;/p&gt;

&lt;p&gt;Amazon Bedrock supports generating vector embeddings using the &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-service.html#models-supported" rel="noopener noreferrer"&gt;Titan Embeddings model&lt;/a&gt; (&lt;code&gt;amazon.titan-embed-text-v1&lt;/code&gt;). This model can generate embeddings for a maximum input text of 8K tokens and generates vectors with up to 1536 dimensions. Atlas Vector Search currently supports indexing vector embeddings &lt;a href="https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/#configure-fts-field-type-field-properties" rel="noopener noreferrer"&gt;with up to 2048 dimensions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This solution uses the AWS SDK for JavaScript v3 in the Search Lambda function to connect to the embedding model in Amazon Bedrock using the BedrockRuntimeClient.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

import { BedrockRuntimeClient, InvokeModelCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient();

const inputText = "Text to create embeddings for.";

const input = {
  modelId: "amazon.titan-embed-text-v1", 
  contentType: "application/json",
  accept: "*/*",
  body: JSON.stringify({
    inputText,
  }),
};

const command = new InvokeModelCommand(input);
const response = await client.send(command);

// The response body is a JSON byte payload containing an `embedding` array.
const { embedding } = JSON.parse(new TextDecoder().decode(response.body));


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After receiving the vector embeddings from Amazon Bedrock, the Lambda function uses the &lt;a href="https://www.mongodb.com/docs/drivers/node/current/" rel="noopener noreferrer"&gt;MongoDB driver for Node.js&lt;/a&gt; to store the generated vector embeddings for the &lt;code&gt;plot&lt;/code&gt; field in a new &lt;code&gt;plot_embedding&lt;/code&gt; field in the MongoDB document.&lt;/p&gt;
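&lt;p&gt;The same two steps, invoking the model and persisting the vector, can be sketched in Python with boto3 and PyMongo (the solution itself uses the AWS SDK for JavaScript and the Node.js driver; the collection handle and document variables here are illustrative):&lt;/p&gt;

```python
import json


def parse_titan_embedding(response_body_bytes):
    """Extract the embedding array from a Titan Embeddings response body."""
    return json.loads(response_body_bytes)["embedding"]


def embedding_update(embedding):
    """Build the update that stores the vector in the plot_embedding field."""
    return {"$set": {"plot_embedding": list(embedding)}}


# Sketch of the runtime flow (requires AWS credentials and an Atlas
# connection, so it is shown as comments):
#
#   import boto3
#   from pymongo import MongoClient
#   bedrock = boto3.client("bedrock-runtime")
#   resp = bedrock.invoke_model(
#       modelId="amazon.titan-embed-text-v1",
#       body=json.dumps({"inputText": doc["plot"]}))
#   vector = parse_titan_embedding(resp["body"].read())
#   movies.update_one({"_id": doc["_id"]}, embedding_update(vector))
```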

&lt;p&gt;All the Lambda functions used in this solution securely connect from an isolated VPC to Amazon Bedrock and MongoDB Atlas using VPC interface endpoints provided by AWS PrivateLink. This enables access to both MongoDB Atlas and Amazon Bedrock as if they were in your VPC, without the use of an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. The path between a VPC endpoint and an AWS or AWS-based service stays within AWS and does not traverse the Internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Indexing vector embeddings and performing the semantic search with Atlas Vector Search&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To store the vector embeddings of the &lt;code&gt;plot&lt;/code&gt; text in the &lt;code&gt;plot_embedding&lt;/code&gt; field, you can use a &lt;a href="https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/#std-label-fts-data-types-knn-vector" rel="noopener noreferrer"&gt;knnVector&lt;/a&gt; type field in MongoDB Atlas. The vector field is represented as an array of numbers (BSON int32, int64, or double data types only).&lt;/p&gt;

&lt;p&gt;Next, you need to index the vector embeddings stored in the &lt;code&gt;plot_embedding&lt;/code&gt; field of each document. MongoDB Atlas enables you to &lt;a href="https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/#define-the-index-for-the-fts-field-type-type" rel="noopener noreferrer"&gt;define a vector search index&lt;/a&gt; on &lt;code&gt;knnVector&lt;/code&gt; type fields with the following configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "plot_embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To perform search queries on this index, you can use a &lt;code&gt;$vectorSearch&lt;/code&gt; &lt;a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#mongodb-pipeline-pipe.-vectorSearch" rel="noopener noreferrer"&gt;aggregation pipeline&lt;/a&gt; stage. This search query compares the similarity of the vectors stored in the &lt;code&gt;plot_embedding&lt;/code&gt; field with the vector representation of the search query submitted by the user. It uses an &lt;a href="https://en.wikipedia.org/wiki/Nearest_neighbor_search" rel="noopener noreferrer"&gt;approximate nearest neighbor search&lt;/a&gt; approach.&lt;/p&gt;

&lt;p&gt;A query can then look as follows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{
  "$vectorSearch": {
    "index": "plot_embedding_index",
    "path": "plot_embedding",
    "queryVector": [&amp;lt;array-of-numbers&amp;gt;],
    "numCandidates": 50,
    "limit": 3
  }
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;See the &lt;a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#fields" rel="noopener noreferrer"&gt;Vector Search Queries documentation&lt;/a&gt; for a detailed description of fields.&lt;/p&gt;
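&lt;p&gt;A sketch of how the Search function might assemble this stage in Python (the solution itself uses the Node.js driver; the added &lt;code&gt;$project&lt;/code&gt; stage, which surfaces the relevance score via &lt;code&gt;$meta: "vectorSearchScore"&lt;/code&gt;, is an assumption consistent with the sample response shown later in this post):&lt;/p&gt;

```python
def vector_search_pipeline(query_vector, limit=3, num_candidates=50):
    """Build an aggregation pipeline for a $vectorSearch query.

    Index and field names match the vector index configuration shown above.
    """
    return [
        {
            "$vectorSearch": {
                "index": "plot_embedding_index",
                "path": "plot_embedding",
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Return only the fields the client needs, plus the search score.
            "$project": {
                "title": 1,
                "plot": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# collection.aggregate(vector_search_pipeline(query_embedding)) would then
# yield the ranked documents.
```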

&lt;p&gt;&lt;strong&gt;Change data capture with Atlas Triggers and Amazon EventBridge&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data is rarely static. To make new documents and documents where fields are updated searchable by semantic search, you can set up a process for automatically embedding new and re-embedding updated fields. For example, in the case of the &lt;code&gt;movies&lt;/code&gt; dataset, you may need to update the plot of some of the movies, which in turn requires an update to the &lt;code&gt;plot_embedding&lt;/code&gt; field for the document.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.mongodb.com/docs/atlas/triggers/" rel="noopener noreferrer"&gt;Atlas Triggers&lt;/a&gt; allow you to execute server-side logic in response to database events or on a schedule. Database triggers are a type of Atlas trigger that allows you to execute server-side logic whenever a document is added, updated, or removed in a linked Atlas cluster.&lt;/p&gt;

&lt;p&gt;There are several ways to configure the types of events that cause a trigger to be executed. First, you can select one or more &lt;a href="https://www.mongodb.com/docs/atlas/triggers/trigger-configuration/#std-label-database-event-operation-types" rel="noopener noreferrer"&gt;database change events&lt;/a&gt; (&lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;REPLACE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt;). Second, you can provide a &lt;a href="https://www.mongodb.com/docs/master/reference/operator/aggregation/match/#mongodb-pipeline-pipe.-match" rel="noopener noreferrer"&gt;match expression&lt;/a&gt; to further filter events based on their properties.&lt;/p&gt;

&lt;p&gt;A database trigger can either execute a &lt;a href="https://www.mongodb.com/docs/atlas/app-services/functions/" rel="noopener noreferrer"&gt;serverless function&lt;/a&gt; with your JavaScript code or send trigger events to an Amazon EventBridge &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-saas.html" rel="noopener noreferrer"&gt;partner event bus&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the case of this sample application, all &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;REPLACE&lt;/code&gt; change events are sent to an EventBridge event bus and placed on an Amazon Simple Queue Service (Amazon SQS) queue. From there, the Ingestion Lambda function consumes batches of change events via &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/invocation-eventsourcemapping.html" rel="noopener noreferrer"&gt;Lambda event source mappings&lt;/a&gt; and creates or updates embeddings in the &lt;code&gt;plot_embedding&lt;/code&gt; document field.&lt;/p&gt;

&lt;p&gt;Use a &lt;a href="https://www.mongodb.com/docs/manual/reference/operator/aggregation/match/" rel="noopener noreferrer"&gt;match expression&lt;/a&gt; to filter the update events, forwarding a change only if the document's &lt;code&gt;plot&lt;/code&gt; field has changed (newly inserted documents arrive via the &lt;code&gt;INSERT&lt;/code&gt; event and always need an embedding):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{"updateDescription.updatedFields.plot":{"$exists":true}}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
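&lt;p&gt;On the consuming side, the Ingestion Lambda function applies equivalent logic when it unpacks the batched change events. A minimal Python sketch, with the EventBridge/SQS envelope simplified for illustration:&lt;/p&gt;

```python
def needs_reembedding(change_event):
    """Decide whether a change event requires (re)generating the embedding.

    Mirrors the filtering logic above: inserts and replaces always carry a
    fresh document; updates qualify only when the plot field changed.
    (The EventBridge/SQS envelope is simplified for illustration.)
    """
    op = change_event.get("operationType")
    if op in ("insert", "replace"):
        return True
    if op == "update":
        updated = change_event.get("updateDescription", {}).get("updatedFields", {})
        return "plot" in updated
    return False
```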

&lt;p&gt;&lt;strong&gt;Serverless semantic search API with Amazon API Gateway and AWS Lambda&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, you need a scalable and secure API endpoint that you can integrate with your applications and expose to clients. This solution creates a REST API endpoint using Amazon API Gateway. Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. API Gateway offers multiple &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html" rel="noopener noreferrer"&gt;authentication options&lt;/a&gt;, built-in &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html" rel="noopener noreferrer"&gt;caching&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-method-request-validation.html" rel="noopener noreferrer"&gt;request validation&lt;/a&gt;, and many &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-rest-api.html" rel="noopener noreferrer"&gt;other features&lt;/a&gt; that you can configure to integrate this semantic search solution into your project. As a serverless service, you benefit from automatic scaling, built-in high availability, and a pay-for-use billing model.&lt;/p&gt;

&lt;p&gt;Clients send search requests to the &lt;code&gt;/search&lt;/code&gt; endpoint of the REST API and receive a list of relevant search results in response:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

curl --request POST \
  'https://&amp;lt;API endpoint&amp;gt;.execute-api.us-east-1.amazonaws.com/prod/search' \
  --aws-sigv4 "aws:amz:us-east-1:execute-api" \
  --user "${AWS_ACCESS_KEY_ID}:${AWS_SECRET_ACCESS_KEY}" \
  --header "x-amz-security-token: ${AWS_SESSION_TOKEN}" \
  --header 'Accept: application/json' \
  --data '{ "query": "sports" }' \
  | jq .


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The response for this particular request contains the three highest-scoring matches, each with the &lt;code&gt;_id&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, &lt;code&gt;plot&lt;/code&gt;, and &lt;code&gt;score&lt;/code&gt; fields:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[
  {
    "_id": "573a1398f29313caabcea388",
    "plot": "Molly is a high school track coach who knows just as much about football as anyone else on the planet. When the football coach's position becomes vacant, she applies for the job, despite ...",
    "title": "Wildcats",
    "score": 0.7063020467758179
  },
  {
    "_id": "573a1397f29313caabce879f",
    "plot": "It started as a friendly meeting between 4 old buddies with their basketball coach and ended up in revealing the truth about their relationship. The meeting forces the five men to reveal ...",
    "title": "That Championship Season",
    "score": 0.6836512088775635
  },
  {
    "_id": "573a1394f29313caabcdf0a6",
    "plot": "Pat's a brilliant athlete, except when her domineering fiance is around. The lady's golf championship is in her reach until she gets flustered by his presence at the final holes. He wants ...",
    "title": "Pat and Mike",
    "score": 0.6823728084564209
  }
]


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under the hood, incoming search requests are routed from API Gateway to the Search Lambda function using a &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-lambda-proxy-integrations.html" rel="noopener noreferrer"&gt;Lambda proxy integration&lt;/a&gt;.&lt;/p&gt;
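
&lt;p&gt;With a proxy integration, the handler receives the raw HTTP request as a JSON event and must return a status code, headers, and a string body. The following sketch shows the shape of such a handler for &lt;code&gt;POST /search&lt;/code&gt;; the &lt;code&gt;vector_search&lt;/code&gt; stub and its canned result stand in for embedding the query with Bedrock and running an Atlas &lt;code&gt;$vectorSearch&lt;/code&gt; aggregation:&lt;/p&gt;

```python
import json

def vector_search(query, limit=3):
    """Stand-in for the real lookup: embed the query via Bedrock, then run
    a $vectorSearch aggregation against the Atlas collection."""
    return [{"title": "Wildcats", "score": 0.71}][:limit]

def handler(event, context=None):
    """Lambda proxy integration handler for POST /search."""
    try:
        query = json.loads(event.get("body") or "{}")["query"]
    except (ValueError, KeyError):
        # Malformed or incomplete request bodies get a 400 response.
        return {"statusCode": 400,
                "headers": {"Content-Type": "application/json"},
                "body": json.dumps({"message": "expected a JSON body with a 'query' field"})}
    return {"statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(vector_search(query))}
```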

&lt;p&gt;Because embeddings only need to be generated when new data is added or data is updated, event-driven computing with AWS Lambda allows embedding generation to be triggered on-demand rather than running continuously. AWS Lambda is a serverless computing service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling and extending the solution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This solution serves as a blueprint that you can enhance and extend to build your own use cases based on semantic search with MongoDB Atlas and Amazon Bedrock. Keep the following considerations in mind when scaling this solution for production.&lt;/p&gt;

&lt;p&gt;The default &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/quotas.html" rel="noopener noreferrer"&gt;Amazon Bedrock quotas&lt;/a&gt; implement rate limits for the API operations performed in this example application. For instance, the default quotas allow 2,000 requests per minute or 300,000 tokens processed per minute to invoke the Amazon Titan Embeddings model. Depending on the volume and size of your embedding API calls, you may need to configure &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html" rel="noopener noreferrer"&gt;provisioned throughput&lt;/a&gt; to get a higher level of throughput for a fixed cost.&lt;/p&gt;
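
&lt;p&gt;When these quotas are exceeded, the Bedrock runtime throttles requests, so callers should retry with exponential backoff. The following is a generic sketch of that pattern (the exception type and retry budget are illustrative assumptions, not values from this solution):&lt;/p&gt;

```python
import random
import time

def backoff_delay(attempt, base=0.5, cap=20.0):
    """Exponential backoff with full jitter: the delay grows with the
    attempt number but is capped and randomized to spread out retries."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def invoke_with_retries(call, max_attempts=5):
    """Retry 'call' (e.g. a wrapper around a bedrock-runtime invoke_model
    request) when it raises a throttling error; re-raise once the retry
    budget is exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RuntimeError:  # stand-in for a throttling exception
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))
```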

&lt;p&gt;With automatic scaling, built-in high availability, and a pay-for-use billing model, AWS Lambda is well-suited as a computing platform for embedding workloads. To ensure that your Lambda functions can handle bursts of invocations, such as ingesting large amounts of data at once, manage function concurrency appropriately by configuring reserved and provisioned concurrency. For more information about scaling Lambda functions and configuring reserved and provisioned concurrency, see the AWS Lambda Developer Guide.&lt;/p&gt;

&lt;p&gt;Consider enabling &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-caching.html" rel="noopener noreferrer"&gt;API Gateway caching&lt;/a&gt; to increase the responsiveness of the integration and to optimize the cost of repeat requests. Also, set up access logging for the API Gateway with &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/set-up-logging.html" rel="noopener noreferrer"&gt;Amazon CloudWatch&lt;/a&gt; to keep a record of who accessed your API endpoint and how. For an overview of security recommendations for API Gateway, see &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/security-best-practices.html" rel="noopener noreferrer"&gt;security best practices in Amazon API Gateway&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The integration presented in this tutorial follows security best practices such as storing your MongoDB credentials in Secrets Manager and utilizing IAM to secure access to resources in your AWS account. To protect your MongoDB account, you should regularly rotate your MongoDB credentials and update them in Secrets Manager.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article demonstrates how to use MongoDB Atlas Vector Search, Amazon Bedrock, and AWS serverless services to build a secure and scalable semantic search API. This approach allows you to not only use MongoDB Atlas to store your data sets but also to unlock more value by using Atlas Vector Search alongside Amazon Bedrock's serverless API integrations.&lt;/p&gt;

&lt;p&gt;The associated &lt;a href="https://github.com/mongodb-partners/Semantic_Search_API_MongoDB_Atlas_Vector_Search_and_Amazon_Bedrock" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt; contains the solution source code and detailed deployment instructions to get you started. Open a GitHub issue to provide your feedback or create a pull request to extend the solution.&lt;/p&gt;

&lt;p&gt;See the &lt;a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/" rel="noopener noreferrer"&gt;MongoDB Atlas Vector Search documentation&lt;/a&gt; for more information and tutorials.&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>vectorsearch</category>
      <category>serverless</category>
      <category>bedrock</category>
    </item>
  </channel>
</rss>
