mgbec for AWS Community Builders

I go by the name of Vector — Using AWS S3 vector storage for cost effective and performant Retrieval Augmented Generation

We’re seeing a rapid expansion in methods to empower GenAI, including many ways to help our systems keep their datasets current and completely applicable to their use case. One of the classic and adaptable ways to do this is with RAG (Retrieval Augmented Generation) functionality.

This capability has been available with AWS Bedrock Knowledge Bases for quite a while — https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html. Knowledge Bases use vector storage under the hood. A vector database is a specialized database that stores both structured and unstructured data (text, images, audio) as numerical arrays called vector embeddings, letting you perform extremely fast similarity searches based on meaning, not just keywords.
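
To make "similarity based on meaning" concrete, here is a tiny illustrative sketch (not part of the project code): once text has been turned into embedding vectors, "closeness" is just a distance calculation, and a vector database does this at scale across millions or billions of vectors.

# Toy illustration of cosine similarity between made-up embedding vectors.
# Real embeddings come from a model (BGE-M3 later in this article) and have hundreds of dimensions.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

password_reset = np.array([0.9, 0.1, 0.3])   # pretend embedding of "How do I reset my password?"
reset_guide    = np.array([0.85, 0.2, 0.25]) # pretend embedding of "Password reset instructions"
sales_report   = np.array([0.1, 0.9, 0.7])   # pretend embedding of "Q3 sales report"

print(cosine_similarity(password_reset, reset_guide))  # high score: similar meaning
print(cosine_similarity(password_reset, sales_report)) # lower score: unrelated content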

Why would you want to use a Vector database?

  • Semantic Search: Understands context and meaning, not just keywords.
  • Unstructured Data Handling: Manages complex data like images, audio, and documents by representing them as vectors, allowing similarity searches.
  • AI/ML Enablement: Lets you include specific business knowledge or data that is more up to date than a previously trained model.
  • Scalable & Fast: Designed for quick "nearest neighbor" similarity searches across billions of items.

AWS Bedrock has had the OpenSearch Serverless capability for quite a while, but there are many other options available. In this article, I will walk through creating a very economical vector database using AWS S3 Vectors and demonstrate its usefulness with a quick project.

PROJECT

PREREQUISITES
  • AWS CLI configured with appropriate permissions
  • Terraform >= 1.5
  • Python 3.12 with the uv package manager
  • Docker Desktop (for Lambda packaging)

1. PERMISSIONS

Our first step is making sure we have the AWS permissions to create our project.

I created an identity-based policy similar to the Administrative access policy shown here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors-iam-policies.html. I then created a group for my project and attached this policy to the group.

I added my IAM user to this group and was ready for the next step.

2. SAGEMAKER EMBEDDING ENDPOINT

For this particular case, I am going to use Terraform to create a SageMaker embedding endpoint using a model from Hugging Face. A SageMaker endpoint is a secure HTTPS URL that hosts a trained machine learning model as a managed, scalable API, with SageMaker handling the underlying infrastructure like servers and auto-scaling.

AWS SageMaker gives us a great deal of flexibility with model usage. AWS provides prebuilt inference images (Deep Learning Containers / SageMaker prebuilt images) in region-specific ECR registries or the public ECR gallery. The Hugging Face SageMaker inference container reads the Hugging Face model ID and pulls that model from the Hugging Face Hub when the container starts.

main.tf creates an IAM role, a model definition, a serverless endpoint configuration, and a live endpoint for the embedding service. The serverless architecture scales to zero when not in use. (https://github.com/mgbec/despicable-me/blob/main/main.tf)

My variables.tf specifies the AWS region, the SageMaker container URI, and the embedding model I am using in this case: BAAI/bge-m3. (https://github.com/mgbec/despicable-me/blob/main/variables.tf)

Your Terraform outputs will give you the sagemaker_endpoint_arn and sagemaker_endpoint_name. You will want to add the endpoint name to your .env file similar to:

SAGEMAKER_ENDPOINT=despme-embedding-endpoint
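
To sanity-check the endpoint before wiring anything else up, you can invoke it directly with boto3. This is a minimal sketch; the exact request and response shape depends on the Hugging Face inference container version, so treat the payload format as an assumption to verify against your endpoint.

# Quick smoke test of the embedding endpoint (sketch; payload/response format may vary
# by Hugging Face inference container version).
import json
import boto3

ENDPOINT_NAME = "despme-embedding-endpoint"  # from your .env / Terraform output

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Body=json.dumps({"inputs": "Minions love bananas"}),
)
result = json.loads(response["Body"].read())
# For a feature-extraction model this is typically a nested list of floats;
# the length of the innermost vector is your embedding dimension.
print(result)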

3. VECTOR BUCKET

We get to create our S3 vector bucket now. In the console, under S3, I am naming my bucket "my-despicable-bucket12212025".

You could specify the type of encryption for the new bucket, but I am going to leave it with the default.

Add the bucket name to your .env file:

VECTOR_BUCKET=my-despicable-bucket12212025
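
If you prefer to script this step instead of clicking through the console, recent boto3 releases expose an "s3vectors" client. The call below reflects my understanding of that API at the time of writing, so treat the client and parameter names as assumptions and check the current SDK documentation.

# Sketch: create the vector bucket programmatically (assumes a recent boto3 with the
# "s3vectors" client; parameter names may differ in your SDK version).
import boto3

s3vectors = boto3.client("s3vectors")
s3vectors.create_vector_bucket(vectorBucketName="my-despicable-bucket12212025")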

4. INDEX

You’ll need to create an index for your vector bucket. The index is like an index in a book and will organize everything in the vector bucket for faster searches. My index is named despme-index.

Update this in your .env file.

5. DIMENSION OF EMBEDDING MODEL

What is the dimension? The dimension refers to the number of numerical values used to represent an item (like a word, image, or product) as a vector, capturing its meaning and relationships. Higher dimensions often mean richer context but more computation, while lower dimensions are faster but might miss nuances. The value you put in the dimension field will partially depend on your model. For example, the Qwen3-Embedding model supports user-defined output dimensions ranging from 32 to 1024, while OpenAI’s text-embedding-3-large model defaults to 3,072 dimensions. Some generalities for use cases are:

  • 128–300 Dimensions: Good for simpler tasks, keyword matching, or smaller datasets; models like Word2Vec use around 300.
  • 512–1024 Dimensions: Excellent for complex tasks like semantic search in NLP, capturing richer meaning, often a sweet spot for modern models.
  • 1024+ Dimensions: Used by very powerful models (like text-embedding-3-large), offering high accuracy but requiring more storage and computation.

I am using BGE-M3 at 384 dimensions, but it is capable of a larger number of dimensions.
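
Putting the index name and dimension together, here is a hedged sketch of creating the index programmatically with the same s3vectors client instead of the console (again, verify the exact parameters against the current boto3 documentation):

# Sketch: create a 384-dimension index with cosine distance (parameter names assumed
# from the S3 Vectors API; confirm against your boto3 version).
import boto3

s3vectors = boto3.client("s3vectors")
s3vectors.create_index(
    vectorBucketName="my-despicable-bucket12212025",
    indexName="despme-index",
    dataType="float32",
    dimension=384,
    distanceMetric="cosine",
)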

6. LAMBDA FUNCTION

We need to create a Lambda function to ingest our data into our vector bucket.

package.py (https://github.com/mgbec/despicable-me/blob/main/ingest/package.py) bundles your AWS Lambda function's code and all of its required dependencies (libraries, configuration files, etc.) into a deployment package that you deploy to the AWS Lambda service.

You can run the packaging process with uv: "uv run package.py".

The output of this is a zip file with all of the pieces required for the Lambda function that we will deploy through Terraform in the next step.
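
The repo's package.py is the real implementation; conceptually, building a Lambda deployment package boils down to installing the dependencies into a build directory and zipping them together with your handler, roughly like this sketch (the requirements.txt and lambda_function.py names are placeholders, not necessarily what the repo uses):

# Conceptual sketch of Lambda packaging, not the repo's package.py: install dependencies
# into a build directory, copy in the handler, and zip everything up.
import shutil
import subprocess

BUILD_DIR = "build"
subprocess.run(["pip", "install", "-r", "requirements.txt", "--target", BUILD_DIR], check=True)
shutil.copy("lambda_function.py", BUILD_DIR)  # placeholder handler file name
shutil.make_archive("lambda_package", "zip", BUILD_DIR)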

7. INGESTION

Now, we need to set up Terraform to deploy the rest of the infrastructure for our ingestion pipeline. The main files we will talk about here are:

terraform.tfvars: this specifies your AWS region for the ingestion infrastructure, your SageMaker endpoint name, and your S3 Vectors index name.

https://github.com/mgbec/despicable-me/blob/main/ingest/terraform/terraform.tfvars

main.tf:
  • Creates IAM permissions for the Lambda (to write to CloudWatch, access our S3 bucket, call the SageMaker embedding endpoint, and perform S3 Vectors operations)
  • Adds some settings for our S3 vector bucket
  • Creates our Lambda function for ingestion using environment variables
  • Creates an API Gateway, Lambda integration, and API stage

my version — https://github.com/mgbec/despicable-me/blob/main/ingest/terraform/main.tf
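
The repo has the complete handler, but to show how the pieces fit together, here is a stripped-down sketch of the ingestion flow: API Gateway hands the Lambda a JSON body, the Lambda gets an embedding from the SageMaker endpoint, and the vector plus its metadata is written into the S3 Vectors index. The s3vectors calls, the {"inputs": text} request contract, and the VECTOR_INDEX variable name are assumptions to verify, not the repo's exact code.

# Sketch of the ingestion flow, not the repo's exact handler. Assumes the "s3vectors"
# boto3 client and a simple {"inputs": text} contract with the embedding endpoint.
import json
import os
import uuid
import boto3

sm = boto3.client("sagemaker-runtime")
s3vectors = boto3.client("s3vectors")

def handler(event, context):
    body = json.loads(event["body"])
    text = body["content"]

    # 1. Embed the document text with the SageMaker endpoint
    resp = sm.invoke_endpoint(
        EndpointName=os.environ["SAGEMAKER_ENDPOINT"],
        ContentType="application/json",
        Body=json.dumps({"inputs": text}),
    )
    embedding = json.loads(resp["Body"].read())  # flatten/pool as needed for your model

    # 2. Write the vector and its metadata into the S3 Vectors index
    s3vectors.put_vectors(
        vectorBucketName=os.environ["VECTOR_BUCKET"],
        indexName=os.environ["VECTOR_INDEX"],
        vectors=[{
            "key": str(uuid.uuid4()),
            "data": {"float32": embedding},
            "metadata": body.get("metadata", {}),
        }],
    )
    return {"statusCode": 200, "body": json.dumps({"status": "ingested"})}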

Run terraform init and terraform apply, then add these output values to your .env file:
VECTOR_BUCKET=
DESPME_API_ENDPOINT=
DESPME_API_KEY=

8. TEST INGEST

Can you send documents via the API?

curl -X POST https://xyz.execute-api.us-east-1.amazonaws.com/prod/ingest \
-H "x-api-key: Put your API Key here" \
-H "Content-Type: application/json" \
-d '{"content": "Test document", "metadata": {"source": "test"}}'

9. TEST SEARCH

curl -X POST https://your-api-gateway-url/search \
-H "x-api-key: your-api-key" \
-H "Content-Type: application/json" \
-d '{
"query": "escape the Moon",
"k": 5
}'

The score (0–1) indicates similarity — higher scores mean more relevant matches. You can use your very cost-effective vector database in a number of ways; one quick way to make use of it is adding it to your project in Bedrock. You can put it into any scenario that requires an updated source of information that can be queried with natural language. It is also easy to amend or add to the knowledge base as your information changes.
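
If you would rather script the query than use curl, a few lines of Python against the same API work too. The response field names here ("results", "score", "content") are assumptions based on the description above, so adjust them to whatever your deployment actually returns.

# Sketch: query the search API and keep only higher-confidence matches.
# Field names ("results", "score", "content") are assumptions to verify against your API.
import os
import requests

resp = requests.post(
    f"{os.environ['DESPME_API_ENDPOINT']}/search",  # assumes the stage path is included in the env var
    headers={"x-api-key": os.environ["DESPME_API_KEY"]},
    json={"query": "escape the Moon", "k": 5},
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("results", []):
    if hit["score"] >= 0.5:  # arbitrary threshold; tune for your use case
        print(round(hit["score"], 3), hit.get("content", ""))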

There are some other scripts we can test with in my repo: https://github.com/mgbec/despicable-me/tree/main/ingest/scripts
  • check_model_dimensions.py
  • search_despicable_me.py
  • test_api_gateway.py
  • test_despicable_me_docs.py

SECURITY

We need to think about the security of our pipeline, of course.

ENCRYPTION and DATA SECURITY — There are quite a few interesting encryption techniques to consider. Distance-preserving encryption is a property-preserving approach that encrypts data, often vectors, while maintaining the relative distances between them, allowing functions like nearest neighbor search and clustering on encrypted data without decryption. Homomorphic encryption is a cryptographic method that allows computations (like addition and multiplication) directly on encrypted data without decryption, producing an encrypted result that yields the same outcome as if the operations were done on the original plain data. As much as I like to read about these techniques, I am leaving the details to AWS.

AWS Vector databases are encrypted at rest and in transit. Additionally, for data security, Bedrock Guardrails and Amazon Comprehend can automatically identify and redact or mask sensitive information (PII) before it is stored in the vector database.
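
As a concrete example of that last point, Amazon Comprehend's PII detection can be dropped into the ingestion path before the embedding call. This sketch masks detected entities; how aggressively you redact (mask, drop, or tokenize) is up to your use case.

# Sketch: mask PII with Amazon Comprehend before a document is embedded and stored.
import boto3

comprehend = boto3.client("comprehend")

def mask_pii(text: str) -> str:
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    # Replace detected spans from the end of the string so earlier offsets stay valid
    for e in sorted(entities, key=lambda x: x["BeginOffset"], reverse=True):
        text = text[:e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

print(mask_pii("Gru's email is gru@example.com and his phone is 555-0100."))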

API GATEWAY — API key in use with rate limiting, burst limit, and quotas

IDENTITY and ACCESS MANAGEMENT (IAM) — Lambda can only access its specific bucket and SageMaker endpoint, the SageMaker role is limited to model execution, and no cross-service or cross-account access is allowed.

NETWORK SECURITY — Vector databases can be deployed within an Amazon Virtual Private Cloud (VPC), which creates a private, isolated network environment. VPC endpoints ensure that traffic to and from the database remains within the AWS network and does not traverse the public internet. Security groups and services like Shield control inbound and outbound traffic.

MONITORING and COMPLIANCE — AWS CloudTrail logs API calls and operations, providing an audit trail for monitoring and compliance requirements. Amazon GuardDuty monitors VPC flow logs and CloudTrail events for anomalous patterns and potential security threats. API Gateway request/response and S3 access logging provides more detail. AWS services adhere to a wide range of compliance certifications, which can help keep our auditor friends happy.

Acknowledgments

  • BGE-M3 Model : Beijing Academy of Artificial Intelligence
  • AWS S3 Vectors : Cost-effective vector database solution
  • Despicable Me Universe : Universal Pictures and Illumination Entertainment
  • Course Inspiration : “Generative and Agentic AI in Production” by Ed Donner https://www.udemy.com/course/generative-and-agentic-ai-in-production
