

From Coffee Products to AI Search: Building a Serverless Semantic Search Architecture with Amazon S3 Vectors and Bedrock

In recent months, we have increasingly incorporated artificial intelligence into our solutions, and with it a recurring need has emerged: efficiently searching and querying our own data using natural language.

Use cases such as semantic search or building solutions based on Retrieval-Augmented Generation (RAG) are no longer optional. Today, we need to understand the meaning of text, combine it with structured filters, and do so in an efficient and scalable way.
In this article, I explore a recent alternative within the AWS ecosystem: Amazon S3 Vectors 🪣, a serverless approach for vector storage and querying that aims to balance scalability, simplicity, and cost.

To make it more concrete (and a bit more entertaining), we will work with a dataset of coffee products ☕ and build a complete flow that goes from generating embeddings with Amazon Bedrock 🧠 to an application deployed on AWS with Streamlit ✨, which allows natural language searches combined with filters.


A quick note on embeddings and semantic search

Before diving into the implementation, it is worth briefly clarifying two key concepts used throughout this tutorial:

  • Embeddings are numerical representations of text that capture semantic meaning. Instead of relying on exact word matching, embeddings map text into high-dimensional vector spaces where semantically similar pieces of text are positioned closer together. This representation allows systems to reason about intent and context rather than purely lexical similarity.

  • Semantic search builds on top of embeddings by retrieving results based on meaning rather than exact terms. A user query is first transformed into an embedding and then compared against stored vectors using similarity metrics such as cosine or Euclidean distance. This approach enables more flexible, intent-aware searches and can be further refined by combining semantic similarity with structured metadata filters to improve precision and relevance.
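
To make this more tangible, here is a minimal sketch (using NumPy and toy three-dimensional vectors instead of real 1,024-dimensional embeddings) of the two similarity metrics used later in this tutorial:

import numpy as np

# Toy "embeddings" (real Titan vectors have 1,024 dimensions)
doc_vector = np.array([0.12, 0.85, 0.31])
query_vector = np.array([0.10, 0.80, 0.35])

# Cosine similarity: based on the angle between the vectors (common for text embeddings)
cosine_similarity = np.dot(doc_vector, query_vector) / (
    np.linalg.norm(doc_vector) * np.linalg.norm(query_vector)
)

# Euclidean distance: straight-line distance between the two points in space
euclidean_distance = np.linalg.norm(doc_vector - query_vector)

print(f"Cosine similarity: {cosine_similarity:.4f}")
print(f"Euclidean distance: {euclidean_distance:.4f}")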


What is Amazon S3 Vectors?

Amazon S3 Vectors is a new type of storage within Amazon S3 designed specifically to natively store and query vectors.
In addition to storing vectors, this type of bucket allows associating structured metadata, which enables queries that combine semantic search with filters on those attributes.
Vector buckets support searches based on distance metrics, such as:

  • Cosine similarity: measures how similar two vectors are based on the angle between them, and is very common in text embeddings.
  • Euclidean distance: measures the “geometric” (straight-line) distance between two vectors in space.

Unlike traditional vector databases, Amazon S3 Vectors makes it possible to implement a fully serverless architecture, achieving a good balance between scalability, operational simplicity, and cost.


How do vectors work in Amazon S3?

Amazon S3 Vectors is based on the following main components:

🪣 1. Vector buckets
These are specialized buckets optimized for vector storage.
They support encryption and organize data internally through vector indexes, which enables efficient large-scale searches.

🧭 2. Vector indexes
An index defines how vectors are stored and queried within the bucket.
In addition to the vector, it allows associating metadata, which can later be used in queries through filters with a syntax similar to well-known operators, such as those used in MongoDB.

🔍 3. Queries
Queries are based on similarity searches, using the distance metric configured when creating the index, such as cosine or Euclidean.
These searches can be combined with metadata filters to refine results and reduce ambiguities.

⚙️ 4. API
Amazon S3 Vectors exposes an API that allows querying data through operations such as QueryVectors.
These queries can be executed using tools like the AWS CLI or Boto3, combining a query vector with metadata-based filters and parameters such as the number of results to return or whether to include the distance between vectors.
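
As a quick preview of what such a call looks like with Boto3 (a minimal sketch with placeholder names and a dummy query vector; the full helper class used in this tutorial is shown in the implementation steps below):

import boto3

s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# The query embedding must come from the same model used at indexing time;
# here it is only a placeholder with the right dimensionality.
query_embedding = [0.0] * 1024

response = s3vectors.query_vectors(
    vectorBucketName="coffee-products-tutorial",
    indexName="idx-coffee-products",
    queryVector={"float32": query_embedding},
    topK=3,
    filter={"average": {"$gte": 4.2}},  # optional metadata filter
    returnDistance=True,
    returnMetadata=True,
)
print(response["vectors"])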


Process Flow

The complete workflow to implement semantic search using Amazon S3 Vectors is divided into three main stages:

1️⃣ Generate Vector Embeddings

The process starts from the input documents. These documents are sent to an embeddings model, in this case Amazon Titan through Amazon Bedrock, which transforms the text into numerical vectors.
At this stage, not only are the vectors generated, but metadata describing each document is also associated.

2️⃣ Store Vector Data

The generated vectors, together with their metadata, are stored in an S3 Vector Bucket.
Within the bucket, the data is organized through one or more vector indexes, defined with a specific distance metric.
Being integrated into AWS, this data can be consumed by other services such as Amazon Bedrock, Amazon SageMaker, or Amazon OpenSearch.

3️⃣ Semantic Search via Vector Index

To perform a search, a natural language query is transformed again into a vector using the same embeddings model.
This query vector, together with metadata filters and the topK parameter, is used to query the vector index and retrieve the most semantically similar results.


Reference Architecture

In this tutorial, the use case is based on processing data initially stored in JSON format, which is transformed into Parquet as part of a data preparation workflow. From this processed data, the Amazon Titan model is invoked through Amazon Bedrock to generate embeddings, which are then stored together with their metadata in an Amazon S3 Vectors bucket, thus enabling semantic queries over the information.

Data processing is carried out through an AWS Glue job in Python, where a typical data-cleaning stage of any production data pipeline is implemented. In this phase, only the relevant fields are selected, text descriptions are normalized and corrected when necessary, and only after this cleaning is completed is the Titan model invoked. This approach helps optimize costs and performance by avoiding unnecessary model calls on data that will not be used later.

Finally, the data stored in the vector bucket is consumed by an application developed with Streamlit, which is deployed on AWS Elastic Beanstalk within a VPC. The application allows user queries to be transformed back into embeddings and used to query the vector index, combining semantic search with metadata-based filters, while access to services and system observability are managed through IAM roles and CloudWatch Logs.


Amazon Bedrock and Amazon Titan

Amazon Bedrock is a fully managed service that allows developers to build, deploy, and scale applications powered by artificial intelligence without the need to manage infrastructure. Through a unified API, Bedrock provides access to foundation models from different providers, making their integration into cloud architectures simple and secure.

For this tutorial, we use Amazon Titan Text Embeddings V2, a model available in Bedrock that can process up to 8,192 tokens or 50,000 characters and generate 1,024-dimensional vectors. This model is optimized for information retrieval tasks, semantic search, similarity measurement, and clustering, making it a suitable choice for RAG scenarios and large-scale text analysis.
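
As a reference, invoking the model through the Bedrock Runtime API looks roughly like this (a minimal sketch; only inputText is strictly required, while dimensions and normalize are optional fields of the Titan V2 request schema and are not used elsewhere in this tutorial):

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({
        "inputText": "smooth medium roast with chocolate notes",
        "dimensions": 1024,   # must match the dimension of the vector index
        "normalize": True     # normalized vectors pair well with cosine similarity
    })
)

embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # 1024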


Amazon Elastic Beanstalk

Amazon Elastic Beanstalk is a managed service that allows you to deploy and run web applications without the need to directly manage the underlying infrastructure. It automatically handles resource provisioning, load balancing, scaling, and monitoring, allowing the focus to remain on application development rather than operations.
In this tutorial, we use Elastic Beanstalk to deploy the application developed with Streamlit, taking advantage of its native integration with services such as EC2, Auto Scaling, and CloudWatch, which enables a fast, secure, and scalable deployment.



📊 Dataset

The dataset used in this tutorial was obtained from the Amazon Reviews 2023 project, presented in the paper Bridging Language and Items for Retrieval and Recommendation (Hou et al., 2024). This dataset contains reviews and metadata for Amazon products, including titles, descriptions, categories, stores, and ratings.
For this use case, only the “Grocery_and_Gourmet_Food” category was selected, and within it, products related to coffee were filtered. This allows us to work with rich textual information and structured attributes that are ideal for semantic search scenarios.
The project repository includes both the filtered coffee product datasets and the already processed versions containing vector embeddings, making it easier to reproduce the tutorial and analyze the complete workflow.


Use Case

The use case presented in this tutorial starts from a simple but representative scenario: a user who wants to query coffee products using natural language, exploring the available catalog in a more flexible and intuitive way than a traditional search.

To enable this type of query, different textual attributes of the product are used, such as the title, description, and category, which helps better capture user intent. Within the dataset, several coffee-related categories are included, such as Coffee, Instant Coffee, Ground Coffee, Whole Coffee Beans, Single-Serve Capsules & Pods, Iced Coffee & Cold-Brew, among others.

Based on this, an application is designed in which the user can interact primarily through natural language, while complementing the search with structured filters to reduce ambiguities. These filters include, for example, product rating, store name (a detail that users often do not know or remember precisely), and price, allowing more accurate and relevant results without relying exclusively on a textual query.


Prerequisites

(1) 🗂️ Code repository

To follow this tutorial, it is necessary to clone the project repository, where the complete solution code is available.
In the following sections, the most relevant aspects of the implementation and design decisions are highlighted, rather than providing an exhaustive walkthrough of the entire source code.
If you find this tutorial useful, do not forget to leave a star ⭐️ on the repository and follow me to receive notifications about new articles. Your support helps keep creating valuable technical content for the community 🚀

GitHub repository: RominaElenaMendezEscobar / s3-vector-coffee-tutorial

S3 Vector tutorial using cafe data and creating a Streamlit app deployed on Elastic Beanstalk



(2) 🪣 Create Amazon S3 buckets

As part of this workflow, we need two Amazon S3 buckets:

  • A standard bucket to store raw and processed data.
  • An Amazon S3 Vectors bucket to store vectors and their metadata.

In this tutorial, the following names are used as references:
AWS_BUCKET_NAME = "coffee-products-tutorial-full-data"
AWS_BUCKET_VECTOR_NAME = "coffee-products-tutorial"
AWS_INDEX_VECTOR_NAME = "idx-coffee-products"

(2.1) 🪣 Creating the S3 Vectors bucket

The first step is to create the vector bucket from the Amazon S3 console: in the Vector buckets section, select Create vector bucket and define a unique name for the bucket.
In the encryption configuration, you can use Amazon S3–managed encryption (SSE-S3), which is sufficient for this use case. It is worth noting that this setting cannot be modified later, so it is important to define it correctly from the beginning.

(2.2) 🧭 Creating the vector index

Once the bucket is created, the next step is to define a vector index, which will be responsible for organizing and querying the vectors efficiently.

During this configuration, three key aspects must be specified:

  • Index name, which must be unique within the bucket.
  • Vector dimension, which must match the output of the embeddings model (in this case, 1,024 dimensions for Amazon Titan).
  • Distance metric, where you can choose between cosine or Euclidean. For text embeddings, cosine similarity is usually the most commonly used option.

Like the bucket, the index also inherits the encryption configuration, and this cannot be modified once it has been created.
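
If you prefer to script this step instead of using the console, a minimal sketch with Boto3 could look like the following (the parameter names reflect my reading of the s3vectors API, so verify them against the current documentation):

import boto3

s3vectors = boto3.client("s3vectors", region_name="us-east-1")

# Create the vector bucket (SSE-S3 encryption is used by default)
s3vectors.create_vector_bucket(vectorBucketName="coffee-products-tutorial")

# Create the index: the dimension must match the embeddings model (1,024 for Titan V2)
s3vectors.create_index(
    vectorBucketName="coffee-products-tutorial",
    indexName="idx-coffee-products",
    dataType="float32",
    dimension=1024,
    distanceMetric="cosine",
)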


(3) 🔐 Policies

To work on this project, it is necessary to configure a set of IAM policies that allow access to the different services involved in the workflow.

In particular, the following are required:

  • Amazon Titan policy: allows invoking the Amazon Titan embeddings model through Amazon Bedrock to generate vectors from text.
  • Amazon S3 policy: enables reading and writing data in the Amazon S3 bucket used to store raw and processed data.
  • Amazon S3 Vectors policy: allows writing and querying vectors, along with their metadata, in the Amazon S3 Vectors bucket.

Finally, these policies are attached to an IAM role that is used by the application deployed on AWS Elastic Beanstalk, ensuring controlled and secure access to the required resources.

All the policies mentioned are available in the project repository.
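
As an orientation only, a policy for the vector bucket could look like the sketch below. The action names and the resource ARN are illustrative assumptions; the exact policies to use are the ones included in the repository.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CoffeeVectorBucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3vectors:PutVectors",
        "s3vectors:QueryVectors",
        "s3vectors:GetVectors",
        "s3vectors:ListVectors"
      ],
      "Resource": "arn:aws:s3vectors:<region>:<account-id>:bucket/coffee-products-tutorial/*"
    }
  ]
}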


🛠️ Implementation Guide

✅ Step 1: Dataset

As mentioned earlier, we start from a dataset in JSON format, which we download and then process into Parquet, since this format is more efficient for reading, storage, and processing in data pipelines.
The dataset used in this tutorial is available in my repository, inside the data/ folder.
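
A minimal sketch of this preparation step with pandas could look like this (the file names and the category filter are illustrative; the actual processing is in the repository):

import pandas as pd

# Load the raw product metadata (JSON Lines format); the file name is illustrative
df = pd.read_json("meta_Grocery_and_Gourmet_Food.jsonl", lines=True)

# Keep only coffee-related products based on their category labels
coffee_mask = df["categories"].astype(str).str.contains("coffee", case=False, na=False)
df_coffee = df[coffee_mask]

# Parquet is more efficient for storage and downstream reads
df_coffee.to_parquet("coffee_products.parquet", index=False)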


⚙️ Step 2: Process data (embedding generation)

To generate the embeddings, we use a class that I created to simplify the code and encapsulate the interaction with Amazon Bedrock. By default, the class uses the amazon.titan-embed-text-v2:0 model, although the design allows it to be easily changed if you want to try another model.

This class includes three main methods:

  • create_client(): creates the Bedrock Runtime client with Boto3, using region and credentials.
  • get_embeddings(text): invokes the Titan model by sending the text and returns the generated vector.
  • generate_embeddings_batch(texts): generates embeddings in batches by iterating over a list of texts and showing progress with tqdm.

import json

import boto3
from tqdm import tqdm


class EmbeddingsGenerator:
    def __init__(self,
                 MODEL_NAME: str = 'amazon.titan-embed-text-v2:0',
                 AWS_ACCESS_KEY_ID: str = '',
                 AWS_SECRET_ACCESS_KEY: str = '',
                 AWS_REGION: str = ''
                 ):
        self.MODEL_NAME = MODEL_NAME
        self.AWS_ACCESS_KEY_ID = AWS_ACCESS_KEY_ID
        self.AWS_SECRET_ACCESS_KEY = AWS_SECRET_ACCESS_KEY
        self.AWS_REGION = AWS_REGION

    def create_client(self):
        """Create the Bedrock Runtime client with Boto3."""
        client = boto3.client(
            service_name='bedrock-runtime',
            region_name=self.AWS_REGION,
            aws_access_key_id=self.AWS_ACCESS_KEY_ID,
            aws_secret_access_key=self.AWS_SECRET_ACCESS_KEY
        )
        return client

    def get_embeddings(self, text: str):
        """Invoke the Titan model and return the generated embedding vector."""
        client = self.create_client()
        response = client.invoke_model(
            modelId=self.MODEL_NAME,
            body=json.dumps({
                "inputText": text
            })
        )
        response_body = json.loads(response['body'].read())
        embeddings = response_body['embedding']
        return embeddings

    def generate_embeddings_batch(self, texts: list):
        """Generate embeddings for a list of texts, showing progress with tqdm."""
        embeddings_list = []
        for text in tqdm(texts):
            embeddings = self.get_embeddings(text)
            embeddings_list.append(embeddings)
        return embeddings_list

To run it locally, you need a .env file with your credentials and region:

AWS_ACCESS_KEY=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY
AWS_REGION=YOUR_AWS_REGION

And a minimal usage example would be the following:

import os
import boto3
from dotenv import load_dotenv


load_dotenv()


AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY')
AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
AWS_REGION = os.getenv('AWS_REGION')


emb_generator = EmbeddingsGenerator(
    AWS_ACCESS_KEY_ID=AWS_ACCESS_KEY_ID,
    AWS_SECRET_ACCESS_KEY=AWS_SECRET_ACCESS_KEY,
    AWS_REGION=AWS_REGION
)


input_text = "instant coffee sweet creamy vanilla flavor"
query_embedding = emb_generator.get_embeddings(text=input_text)


🪣 Step 3: Store data (S3 + S3 Vectors)

To simplify data ingestion, I created an S3 class that encapsulates access to both the standard S3 bucket and the Amazon S3 Vectors bucket. The idea is to keep the code clean and reusable, separating connection logic from write logic.

This class includes four main methods:

  • create_client(): creates a Boto3 client for the specified service (s3 or s3vectors).
  • upload_file(): uploads files to the standard S3 bucket (useful for raw and processed data).
  • upload_vector_data(): loads vectors into S3 Vectors using put_vectors, sending them in batches to respect the per-request limit.
  • query_embedding(): enables semantic search by querying the vector index using an embedding and optional metadata filters, returning the most relevant results ranked by similarity.

import boto3
from tqdm import tqdm


class S3:
    """Class to handle S3 operations including uploading files and vector data"""
    def __init__(self,
                 AWS_ACCESS_KEY_ID: str = '',
                 AWS_SECRET_ACCESS_KEY: str = '',
                 AWS_REGION: str = '',
                 AWS_BUCKET_NAME: str = '',
                 AWS_BUCKET_VECTOR_NAME: str = '',
                 AWS_INDEX_VECTOR_NAME: str = ''
                 ):
        self.AWS_ACCESS_KEY_ID = AWS_ACCESS_KEY_ID
        self.AWS_SECRET_ACCESS_KEY = AWS_SECRET_ACCESS_KEY
        self.AWS_REGION = AWS_REGION
        self.AWS_BUCKET_NAME = AWS_BUCKET_NAME
        self.AWS_BUCKET_VECTOR_NAME = AWS_BUCKET_VECTOR_NAME
        self.AWS_INDEX_VECTOR_NAME = AWS_INDEX_VECTOR_NAME

    def create_client(self, service_name: str = 's3'):
        """Create a boto3 client for the specified AWS service ('s3' or 's3vectors')."""
        s3_client = boto3.client(
            service_name=service_name,
            region_name=self.AWS_REGION,
            aws_access_key_id=self.AWS_ACCESS_KEY_ID,
            aws_secret_access_key=self.AWS_SECRET_ACCESS_KEY
        )
        return s3_client

    def upload_file(self, file_name: str, object_name: str):
        """Upload a file to the standard S3 bucket."""
        s3_client = self.create_client()
        s3_client.upload_file(Filename=file_name, Bucket=self.AWS_BUCKET_NAME, Key=object_name)
        print(f"File {file_name} uploaded to bucket {self.AWS_BUCKET_NAME} as {object_name}")

    def upload_vector_data(self, data: list, batch_size: int = 100):
        """
        Upload vector data to S3 Vectors in batches, with tqdm for progress tracking.
        batch_size: number of vectors per batch, to avoid exceeding the per-request limit.
        """
        s3_vector_client = self.create_client(service_name='s3vectors')

        # Helper for chunking data into batches
        def chunked(lst, size):
            for i in range(0, len(lst), size):
                yield lst[i:i + size]

        batches = list(chunked(data, batch_size))

        # Show the progress of the upload
        for i, batch in enumerate(tqdm(batches, desc="Uploading batches"), start=1):
            try:
                s3_vector_client.put_vectors(
                    vectorBucketName=self.AWS_BUCKET_VECTOR_NAME,
                    indexName=self.AWS_INDEX_VECTOR_NAME,
                    vectors=batch
                )
            except Exception as e:
                print(f"Error uploading batch {i}: {e}")

    def query_embedding(self,
                        query_embedding: list,
                        filter_data: dict = None,
                        top_k: int = 3):
        """Perform a complete search with a query embedding and optional metadata filters."""
        s3_vector_client = self.create_client(service_name='s3vectors')

        # Prepare base parameters
        query_params = {
            "vectorBucketName": self.AWS_BUCKET_VECTOR_NAME,
            "indexName": self.AWS_INDEX_VECTOR_NAME,
            "queryVector": {"float32": query_embedding},
            "topK": top_k,
            "returnDistance": True,
            "returnMetadata": True
        }

        # Only add the filter if one was provided
        if filter_data:
            query_params["filter"] = filter_data

        # Execute the search
        query_result = s3_vector_client.query_vectors(**query_params)
        return query_result['vectors']

To upload vectors to S3 Vectors, we first need to build the structure expected by put_vectors. Each item must include a key (a unique identifier in string format), the vector in data.float32, and a metadata object with the attributes that we will later use as filters in queries.
In addition, since no more than 100 vectors can be sent per request, the upload is performed in batches controlled by the batch_size parameter.

vector_data = []

for i in range(data_coffee_filter.shape[0]):
    vector_data.append({
        "key": str(data_coffee_filter['id'][i]),  # the key must always be a string
        "data": {
            "float32": data_coffee_filter['embeddings'][i]
        },
        "metadata": {
            "average": float(data_coffee_filter['average_rating'][i]),
            "rating_number": int(data_coffee_filter['rating_number'][i]),
            "price": float(data_coffee_filter['price'][i]),
            "shop_name": str(data_coffee_filter['shop_name'][i])
        }
    })

s3 = S3(
    AWS_ACCESS_KEY_ID=AWS_ACCESS_KEY_ID,
    AWS_SECRET_ACCESS_KEY=AWS_SECRET_ACCESS_KEY,
    AWS_REGION=AWS_REGION,
    AWS_BUCKET_NAME=AWS_BUCKET_NAME,
    AWS_BUCKET_VECTOR_NAME=AWS_BUCKET_VECTOR_NAME,
    AWS_INDEX_VECTOR_NAME=AWS_INDEX_VECTOR_NAME
)

s3.upload_vector_data(vector_data)

🔍 Step 4: Retrieve (QueryVectors + filters)

To retrieve results from Amazon S3 Vectors, the flow is always the same. First, we convert a natural language query into an embedding (vector) using the same model that was used during indexing. Then, we execute query_vectors, passing that vector as queryVector. From there, the service returns the top K most similar vectors according to the distance metric configured in the index (cosine or Euclidean). Optionally, we can apply metadata filters to reduce ambiguity and improve precision.

The most important query_vectors parameters are:

  • queryVector: the embedding of the search text (in the format {"float32": [...]}).
  • topK: how many results we want to retrieve.
  • filter: filters based on the metadata stored together with the vector (for example shop_name, average, price).
  • returnDistance: whether to return the distance or similarity for each result. This is useful for applying a threshold and discarding results that are close but not very relevant.
  • returnMetadata: whether to also return the metadata associated with the vector, to display information in the app or apply additional logic.

To reduce the complexity of query implementation, a helper method is provided and encapsulated within the S3 utility class. This abstraction centralizes the interaction with Amazon S3 Vectors, simplifying semantic search execution and making the codebase cleaner, more reusable, and easier to maintain.



Query Examples with Metadata Filters

🔎 Query by Single Metadata Field (Exact Match)

Example: filter by shop_name

s3.query_embedding(query_embedding=query_embedding,
                   filter_data={"shop_name": "nescafé"})

response

[{'distance': 0.41610199213027954,
  'key': 'de46725d-ef52-47ca-80e2-f1ba82c0353d',
  'metadata': {'price': 11.48,
   'shop_name': 'nescafé',
   'average': 4.4,
   'rating_number': 248}},
 {'distance': 0.47703248262405396,
  'key': '03915b9f-e592-40ec-b806-bd06b4213d90',
  'metadata': {'price': 13.4,
   'average': 3.6,
   'shop_name': 'nescafé',
   'rating_number': 471}},
 {'distance': 0.514411211013794,
  'key': '5037ea28-b789-427a-9b1f-d825ad68dd2d',
  'metadata': {'rating_number': 3052,
   'shop_name': 'nescafé',
   'average': 4.4,
   'price': 17.75}}]


🔢 Query Using Comparison Operators

In filters, you can use comparison operators, for example:

  • $gt: greater than
  • $gte: greater than or equal
  • (and others such as $lt, $lte, $eq, $ne depending on the case)

Here you can find more information about the operators you can use:
https://docs.aws.amazon.com/es_es/AmazonS3/latest/userguide/s3-vectors-metadata-filtering.html

Example: average rating greater than or equal to 4.2

s3.query_embedding(query_embedding=query_embedding,
                   filter_data={"average": {"$gte": 4.2}})

🔗 Query with Combined Conditions

When you need more than one condition, you can combine filters with:

  • $and: logical AND between multiple conditions
  • $or: logical OR between multiple conditions

Example: average rating ≥ 4.2 AND price ≤ 20

s3.query_embedding(query_embedding=query_embedding,
                   filter_data={
                       "$and": [
                           {"average": {"$gte": 4.2}},
                           {"price": {"$lte": 20.0}}
                       ]
                   })

🖥️ Step 5: App (Streamlit)

While developing this tutorial, I realized that although it is possible to run the entire flow directly from Python code, it is not the most comfortable approach for an end user. For this reason, I decided to build a web application using Streamlit, a framework that allows you to create interactive interfaces in Python with very few lines of code.

In the repository, you will find a single file called app.py, which contains all the application logic. This makes it easy to clearly see how embedding generation, querying Amazon S3 Vectors, and result visualization are integrated, while keeping the focus on a simple and straightforward flow.

Streamlit provides an API with many interactive components such as text inputs, selectors, sliders, and chat-oriented elements. These components are ideal for this use case. For more details about the available components, you can check the official documentation:
https://docs.streamlit.io/develop/api-reference
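
To give an idea of how compact this can be, here is a stripped-down sketch of the search flow (assuming the EmbeddingsGenerator and S3 helpers shown earlier are already instantiated in app.py as emb_generator and s3; the widget labels and filter values are illustrative):

import streamlit as st

st.title("☕ Coffee Semantic Search")

# Natural language query plus structured filters
query_text = st.text_input("Describe the coffee you are looking for")
min_rating = st.slider("Minimum rating", 1.0, 5.0, 4.0, step=0.1)
top_k = st.slider("Number of results", 1, 10, 3)

if st.button("Search") and query_text:
    with st.spinner("Searching..."):
        # emb_generator and s3 are the helper classes defined earlier in this tutorial
        query_embedding = emb_generator.get_embeddings(text=query_text)
        results = s3.query_embedding(
            query_embedding=query_embedding,
            filter_data={"average": {"$gte": min_rating}},
            top_k=top_k,
        )
    for item in results:
        st.write(item["metadata"], f"Distance: {item['distance']:.3f}")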


🚀 Step 6: Configure the project to deploy the app (Elastic Beanstalk)

To deploy the application on AWS Elastic Beanstalk, we will package the project into a .zip with a specific structure. Beanstalk uses these files to configure the environment, install dependencies, and define how the app is executed when the instance starts.

    app.zip
    |__ 📂.ebextensions/
    |    |__ 📄 iam-role.config
    |    |__ 📄 securitygroup.config
    |__ 📂img/
    |    |__🏞️ preview_app.png
    |__ 📄 .ebignore
    |__ 📄 app.py
    |__ 📄 Procfile
    |__ 📄requirements.txt

📁 .ebextensions/iam-role.config (instance IAM role)

This file configures which IAM Instance Profile the Elastic Beanstalk instance will use. It is key because that role is what allows your app to have permissions to invoke Bedrock and query S3 and S3 Vectors (based on the policies you defined).

option_settings:
  aws:autoscaling:launchconfiguration:
    IamInstanceProfile: ElasticBeanstalk-CoffeeApp-Role

🔒 .ebextensions/securitygroup.config (restrict access by IP)

By default, the app is publicly accessible (depending on how the environment is configured). Here, this configuration restricts access to the application to your IP only, by adding inbound rules to the Beanstalk security group for HTTP (80) and HTTPS (443). This is useful in test environments or demos to prevent unwanted access.

Tip: you can get your public IP by searching “what is my ip” and replacing <your_ip> in the configuration below with it.

Resources:
  httpSecurityGroupIngress: 
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: {"Fn::GetAtt" : ["AWSEBSecurityGroup", "GroupId"]}
      IpProtocol: tcp
      ToPort: 80
      FromPort: 80
      CidrIp: <your_ip>/32

  httpsSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: {"Fn::GetAtt" : ["AWSEBSecurityGroup", "GroupId"]}
      IpProtocol: tcp
      ToPort: 443
      FromPort: 443
      CidrIp: <your_ip>/32

🚫 .ebignore

This file works like a .gitignore, but for deployment. It indicates which files should not be uploaded to Elastic Beanstalk. This helps avoid including credentials, system junk, or unnecessary files that increase the package size.
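
As an illustration (the actual file is in the repository), a typical .ebignore for this project might exclude entries like:

.env
.git/
.venv/
__pycache__/
*.pyc
data/
notebooks/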


🖥️ app.py (Streamlit application)

This is the main application file, where the Streamlit interface and the logic to generate embeddings, query S3 Vectors, and display results are defined. In this tutorial, the entire app lives in this single file to keep it simple and easy to follow.

🧾 Procfile (startup command)

Elastic Beanstalk needs to know which command to run to start your application. The Procfile defines that entrypoint. In this case, we start Streamlit listening on 0.0.0.0 to accept external traffic, and using a port defined for the environment.

web: streamlit run app.py --server.port=8000 --server.address=0.0.0.0

📦 requirements.txt (dependencies)

This file lists the libraries required for the app to run. Beanstalk installs them automatically during deployment.
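
Based on the libraries used throughout this tutorial, it would contain at least something like the following (versions unpinned here for brevity; the repository has the exact list):

streamlit
boto3
pandas
python-dotenv
tqdm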


🚀 Step 7: Deploy the solution

(1) Create a new application

In this step, a new application is created in AWS Elastic Beanstalk, which acts as the logical container for the project.
You only need to define an application name and, optionally, a short description.


(2) Environment

In this step, the environment where the application will be deployed is configured. For this use case, a Web server environment is selected, since it is a web application built with Streamlit that exposes an HTTP interface for users.

By default, Elastic Beanstalk suggests an environment name based on the application name, which is sufficient for this tutorial. This environment will be responsible for running the app, handling traffic, and applying scaling and monitoring configurations in the following steps.

(2) Environment – Step 1: Configure environment

In this step, the basic environment parameters are defined:

  • Environment tier: select Web server environment, since the application exposes a web interface over HTTP.
  • Application name: automatically filled with the name defined in the previously created application.
  • Environment name: name of the environment; the default suggested value can be used.
  • Domain: can be left empty so that Elastic Beanstalk automatically generates the subdomain.
  • Platform:

    • Platform: Python
    • Platform branch: Python 3.11 running on 64bit Amazon Linux 2023
    • Platform version: leave the default recommended version.
  • Application code:

    • Select Upload your code.
    • Upload the .zip file generated previously.
  • Presets: Select Single instance (free tier eligible) for this tutorial.


(2) Environment – Step 2: Configure service access

In this step, the IAM roles that allow Elastic Beanstalk and EC2 instances to access AWS resources are configured:

  • Service role: the role that Elastic Beanstalk uses to create and manage the environment (Auto Scaling, Load Balancer, logs, etc.).
  • EC2 instance profile: the role used by the EC2 instances where the application runs. This role must include the necessary policies to access Amazon Bedrock, Amazon S3, and Amazon S3 Vectors.
  • EC2 key pair (optional): can be omitted if SSH access to the instances is not required.

With this configuration, the application is correctly authorized to interact with AWS services in a secure manner.

(2) Environment – Step 3: Set up networking, database, and tags (optional)

In this step, the network where the environment will run is configured. For this tutorial, the default VPC values are used, making only the following adjustments:

  • VPC: select the account’s default VPC to simplify the configuration.
  • Public IP address: enable it so the application is accessible from the Internet.
  • Instance subnets: select two subnets in different Availability Zones. Selecting more than one subnet allows Elastic Beanstalk to distribute instances across multiple Availability Zones, improving resilience and fault tolerance, even when using a simple deployment for tests or demos.

The remaining options (database and tags) can be left unconfigured for this use case.


(2) Environment – Step 4: Configure instance traffic and scaling

In this step, how the application runs and what type of resources it uses are defined:

  • Environment type: select Single instance, which is sufficient for this tutorial and helps reduce costs.
  • Fleet composition: use On-Demand instance, avoiding the complexity of Spot instances.
  • Architecture: choose x86_64 to ensure compatibility with all Python dependencies.
  • Instance type: select a lightweight type such as t3.small, suitable for running a low-consumption Streamlit application.
  • Monitoring and metadata: keep the default values, enabling CloudWatch metrics and using IMDSv2.

This configuration allows the application to be deployed in a simple, stable, and cost-effective way, ideal for tests, demos, and development environments.


(2) Environment – Step 5: Configure updates, monitoring, and logging

In this step, monitoring, update, and observability options for the environment are configured:

  • Monitoring: enable basic or enhanced monitoring so Elastic Beanstalk reports instance metrics to CloudWatch.
  • Health reporting: allows you to visualize the application status and detect failures early.
  • Managed platform updates: automatic environment updates (minor and patch) can be enabled during a defined weekly window.
  • Email notifications: allows configuring an email address to receive notifications about relevant environment events.
  • Rolling updates and deployments: defines how deployments and configuration changes are applied (for this tutorial, default values can be used).
  • Logs: enable sending instance logs to CloudWatch Logs to facilitate debugging and observability.
  • Environment properties: here you can define environment variables required by the application (for example AWS region, bucket names, or other configuration values the app needs).

With this configuration, the environment is prepared to operate in a stable and observable way, with controlled updates and no additional adjustments required for this use case.


🚀 Step 8: Validate the application deployment (Elastic Beanstalk)

Once the application is deployed, it is important to validate that everything is working correctly:

(1) Environment status

The first step is to verify that the environment status is Health: OK. This indicates that Elastic Beanstalk was able to start the application correctly and that no critical errors were detected during deployment.

(2) Application access

If the status is correct, you can click on the environment domain to access the application from the browser and confirm that the Streamlit interface loads correctly.

(3) Log review

If the application does not work as expected or the status is not OK, go to the Logs tab. From there, you can request logs, and it is recommended to download the last 100 records to make error analysis easier.

(4) Deploy a new version

If an issue is detected in the logs and the code needs to be fixed, you can deploy a new version using the Upload and deploy button. In this step, you only need to upload the updated .zip file and assign a new application version.


🧩 Conclusions

This tutorial presents a complete workflow for processing and querying data through semantic search, where it is essential not to lose sight of best practices in data cleaning and the correct definition of metadata. Metadata plays a fundamental role in guiding searches, reducing the amount of information queried, and significantly improving the relevance of results.

During the tests performed, query performance was notably fast, to the point that in some cases the spinner implemented in the application barely had time to appear. This shows that Amazon S3 Vectors can deliver suitable performance even for interactive, end-user–oriented scenarios.

When exploring the Boto3 API, it becomes apparent that some features commonly found in traditional databases are still missing, such as aggregated statistics or an equivalent of count(*). Currently, to determine the number of stored vectors, it is necessary to use operations like list_vectors with pagination. This suggests that, as a relatively new feature, there are clear opportunities for improvement in future versions of the service.
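
For instance, counting the stored vectors currently requires paginating through list_vectors manually, roughly as in the sketch below (bucket and index names are the ones used in this tutorial; verify the parameter names against the s3vectors API documentation):

import boto3

s3vectors = boto3.client("s3vectors", region_name="us-east-1")

total = 0
next_token = None
while True:
    params = {
        "vectorBucketName": "coffee-products-tutorial",
        "indexName": "idx-coffee-products",
    }
    if next_token:
        params["nextToken"] = next_token

    page = s3vectors.list_vectors(**params)
    total += len(page.get("vectors", []))

    next_token = page.get("nextToken")
    if not next_token:
        break

print(f"Stored vectors: {total}")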

On the other hand, AWS Elastic Beanstalk proves to be a very good solution for deploying this type of application quickly and easily. However, in production scenarios, combining it with tools such as Terraform and CI/CD pipelines would allow deployments to be automated and manual intervention to be further reduced. In this tutorial, a console-based deployment was chosen to keep complexity under control and focus on the main use case.

Finally, this approach demonstrates how unstructured text analysis use cases, combined with structured data, offer a very compelling balance. In particular, building a chat-like interface that does not rely exclusively on natural language, but also incorporates explicit filters, makes it possible to create a hybrid model that improves precision, reduces ambiguity, and enriches the search experience.


📚 References

  • Hou et al. (2024). Bridging Language and Items for Retrieval and Recommendation. Source of the Amazon Reviews 2023 dataset used in this tutorial.
