Introduction
In the AI era, semantic search is an important tool to have in your toolbox. There are many options for storing embeddings, and still room on the market for more. The recently released S3 Vector Buckets look interesting as a fully serverless option for storing and querying vectors.
Goal
In this blog post, I will build an HTTP endpoint to query books by their descriptions. The request will also include an optional category field, so callers can filter results to a given genre. I would like to check whether S3 Vectors can offer acceptable latency for a user-facing service.
Architecture
Creating a bucket
CODE IS AVAILABLE IN THIS REPO
As S3 Vector storage is in preview, there is no CloudFormation support yet. A bucket and index can be created in the console. I followed the user guide, created a new bucket, and then added the books index inside it.
When creating an index, we need to decide upfront which distance metric we are going to use - Cosine or Euclidean. They use different logic to look for the nearest neighbors, and this matters because the setting can't be changed later on. In real life, we would probably create two buckets and test the search on a data sample to find out which approach works better for our case. For this blog, I picked cosine.
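To build intuition for the difference, here is a minimal, dependency-free sketch of both metrics. This is not what S3 Vectors runs server-side - just the underlying math:

```rust
// Cosine distance cares only about the angle between vectors,
// while Euclidean distance also cares about their magnitude.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

fn main() {
    // `b` points in the same direction as `a`, just scaled by 2.
    let a = [1.0, 2.0, 3.0];
    let b = [2.0, 4.0, 6.0];
    // Cosine sees them as identical (distance ~0)...
    println!("cosine:    {:.4}", cosine_distance(&a, &b));
    // ...while Euclidean sees them as far apart.
    println!("euclidean: {:.4}", euclidean_distance(&a, &b));
}
```

For text embeddings, where direction tends to carry the meaning, cosine is the usual default - but the only way to be sure for your data is to measure.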
As for the vectors' dimension, I stuck to 1024, as I planned to use Titan Text Embeddings V2 for generating embeddings.
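For reproducibility, the same setup can also be scripted with the AWS CLI. This is a sketch based on the s3vectors preview commands - since the service is in preview, verify the flags against the current docs:

```shell
# Create the vector bucket (names match the ones used in this post).
aws s3vectors create-vector-bucket \
  --vector-bucket-name s3-vector-books

# Create the index. Remember: data type, dimension,
# and distance metric are immutable once set.
aws s3vectors create-index \
  --vector-bucket-name s3-vector-books \
  --index-name books \
  --data-type float32 \
  --dimension 1024 \
  --distance-metric cosine
```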
Populate data
As a data source, I took the books dataset from Kaggle and used the first 1k rows.
AWS provides a handy CLI tool to populate vectors; however, I needed to build the integration for my Lambda handler anyway, so I created the uploader script in Rust.
When prototyping in Rust, I like to use unwrap() everywhere and implement proper error handling later on. You have been warned.
Let's start by defining domain types:
// rust_app/common/src/models.rs
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Serialize, Debug, Clone)]
pub struct Book {
pub isbn13: u128,
pub title: String,
pub authors: String,
pub categories: String,
pub description: String,
}
#[derive(Debug, Serialize)]
pub struct BookWithEmbeddings {
pub item: Book,
pub embeddings: Vec<f32>
}
In the uploader script, I read the csv file stored locally and get embeddings from Titan:
// rust_app/vector_updater/src/main.rs
//...
let mut reader = csv::Reader::from_path("books2.csv").unwrap();
let books: Vec<Book> = reader.deserialize()
.map(|res| res.unwrap())
.collect();
println!("extracted {} books", books.len());
// Initialize AWS config
let config = aws_config::from_env()
.region("us-east-1")
.load()
.await;
let embeddings_svc = EmbeddingsService::new(&config).await;
let books_for_upload = books.iter()
.filter(|b| !b.description.is_empty())
.cloned()
.collect::<Vec<_>>();
println!("getting embeddings for {} books", books_for_upload.len());
let embeddings = books_for_upload
.iter()
.map(|book| get_book_with_embeddings(&embeddings_svc, book))
.collect::<Vec<_>>();
let books_with_embeddings = stream::iter(embeddings)
.buffer_unordered(MAX_CONCURENCY)
.collect::<Vec<_>>()
.await;
println!("{:?}", books_with_embeddings.len());
Embeddings are fetched from the Bedrock API using the following method:
// rust_app/common/src/embeddings_svc.rs
pub async fn get_embeddings(&self, query: String) -> TitanResponse {
let body = json!({
"inputText": query
});
let body_blob = Blob::new(serde_json::to_vec(&body).unwrap());
let response = self.client
.invoke_model()
.model_id("amazon.titan-embed-text-v2:0")
.body(body_blob)
.send()
.await
.unwrap();
let bytes = response.body().clone().into_inner();
let result: TitanResponse = serde_json::from_slice(&bytes).unwrap();
result
}
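For reference, the Titan Text Embeddings V2 exchange is a small JSON payload; the TitanResponse type (not shown above) simply deserializes the response. The shapes look roughly like this (values are illustrative; the model also accepts optional dimensions and normalize request parameters that I'm not using here). The request:

```json
{ "inputText": "space exploration" }
```

And the response:

```json
{ "embedding": [0.0123, -0.0456], "inputTextTokenCount": 3 }
```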
And now, once I have the books with generated embeddings, I upload the data to the S3 Vector Bucket:
// rust_app/vector_updater/src/main.rs
let s3_vectors_client = aws_sdk_s3vectors::Client::new(&config);
let put_vectors_calls = books_with_embeddings
.iter()
.map(|be| upload_vectors(&s3_vectors_client, be))
.collect::<Vec<_>>();
println!("Uploading {} records to S3", put_vectors_calls.len());
let _ = stream::iter(put_vectors_calls)
.buffer_unordered(MAX_CONCURENCY)
.collect::<Vec<_>>()
.await;
The upload_vectors method:
// rust_app/vector_updater/src/main.rs
pub async fn upload_vectors(s3vectors_client: &aws_sdk_s3vectors::Client, book_with_embeddings: &BookWithEmbeddings) -> () {
let mut metadata = HashMap::new();
metadata.insert("source_text".to_string(), Document::String(book_with_embeddings.item.description.clone()));
metadata.insert("categories".to_string(), Document::String(book_with_embeddings.item.categories.clone()));
metadata.insert("title".to_string(), Document::String(book_with_embeddings.item.title.clone()));
metadata.insert("authors".to_string(), Document::String(book_with_embeddings.item.authors.clone()));
let vector_input =
PutInputVector::builder()
.key(book_with_embeddings.item.isbn13.to_string())
.data(VectorData::Float32(book_with_embeddings.embeddings.clone()))
.metadata(Document::Object(metadata))
.build()
.unwrap();
let _result = s3vectors_client
.put_vectors()
.vector_bucket_name(S3_NAME)
.index_name(INDEX_NAME)
.vectors(vector_input)
.send()
.await
.unwrap();
tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
}
Searchable metadata is created from the Book details, and the isbn13 is used as the object key. I added the sleep statement so I wouldn't hit AWS rate limits when sending items.
Once the items were uploaded, I validated the process using the s3vectors-embed tool:
s3vectors-embed query \
--vector-bucket-name s3-vector-books \
--index-name books \
--model-id amazon.titan-embed-text-v2:0 \
--query-input "space exploration" \
--k 10
Query data with Lambda
The lambda handler is pretty simple:
use aws_config::BehaviorVersion;
use common::{embeddings_svc::EmbeddingsService, s3Vector_svc::S3VectorsService};
use lambda_http::{run, service_fn, Error, IntoResponse, Request, RequestPayloadExt, Response};
use serde::Deserialize;
#[derive(Deserialize)]
struct Query {
query: String,
category: Option<String>
}
async fn function_handler(embeddings_svc: &EmbeddingsService, s3_vector_svc: &S3VectorsService, event: Request) -> Result<impl IntoResponse, Error> {
let query = event.payload::<Query>()?.unwrap();
let embeddings = embeddings_svc.get_embeddings(query.query.clone()).await;
let books = s3_vector_svc.query_books(embeddings.embedding, query.category).await;
let resp = Response::builder()
.status(200)
.header("content-type", "application/json")
.body(serde_json::to_string(&books).unwrap())
.map_err(Box::new)?;
Ok(resp)
}
#[tokio::main]
async fn main() -> Result<(), Error> {
tracing_subscriber::fmt()
.with_max_level(tracing::Level::INFO)
// disable printing the name of the module in every log line.
.with_target(false)
// disabling time is handy because CloudWatch will add the ingestion time.
.without_time()
.init();
let config = aws_config::defaults(BehaviorVersion::latest()).region("us-east-1").load().await;
let embeddings_svc = EmbeddingsService::new(&config).await;
let s3_vector_svc = S3VectorsService::new(&config).await;
run(service_fn(|ev| {
function_handler(&embeddings_svc, &s3_vector_svc, ev)
})).await
}
I expect to receive the query and an optional category to filter the results. S3 Vectors provides basic metadata filtering functionality, which can be super useful.
To query S3, I get the embeddings from Bedrock and pass them to the query_vectors call.
// rust_app/common/src/s3Vector_svc.rs
pub async fn query_books(&self, vector: Vec<f32>, category: Option<String>) -> Vec<Book> {
let data = VectorData::Float32(vector);
let category_filter = category
.map(|category| {
let mut category_json = HashMap::new();
category_json.insert("categories".to_string(), Document::String(category));
Document::Object(category_json)
});
let request = self.client
.query_vectors()
.vector_bucket_name(S3_NAME)
.index_name(INDEX_NAME)
.top_k(5)
.query_vector(data)
.set_filter(category_filter)
.return_metadata(true);
let result = request
.send()
.await
.unwrap();
let data = result
.vectors()
.iter()
.map(|vector| {
let id = vector.key().parse().unwrap();
let metadata = vector.metadata().unwrap().as_object().unwrap();
let source_text = metadata.get("source_text").unwrap().as_string().unwrap().to_string();
let title = metadata.get("title").unwrap().as_string().unwrap().to_string();
let authors = metadata.get("authors").unwrap().as_string().unwrap().to_string();
let categories = metadata.get("categories").unwrap().as_string().unwrap().to_string();
Book { isbn13: id, title, authors, categories, description: source_text }
})
.collect::<Vec<_>>();
println!("Result: {:?}", result);
data
}
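The category filter built above is the simplest possible form - an implicit equality match on a single metadata key. Serialized, the Document handed to set_filter corresponds to JSON like this ("Fiction" is a hypothetical value; S3 Vectors also documents richer operators such as $and and $or for combining conditions - check the current docs for the supported set):

```json
{ "categories": "Fiction" }
```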
As you see, there are unwrap() calls everywhere. I truly believe it is the way to prototype in Rust.
The only interesting thing in this code is that the Rust s3vectors SDK uses the generic Document type from Smithy.
For local testing, I use the cargo lambda tool. Once it works locally, I deploy it to AWS.
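My local loop looks roughly like this (cargo lambda subcommands as of the version I used; check `cargo lambda --help` for your version):

```shell
# Start a local emulator that rebuilds the handler on change.
cargo lambda watch

# In a second terminal, invoke it with a sample payload.
cargo lambda invoke handler \
  --data-ascii '{"query": "space exploration"}'
```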
Deploy and test
I used a generic SAM template for the Rust project. The only catch was that it didn't work for me with the workspace setup in the Rust project, which is strange, because I use AWS CDK in my other projects and face no such issues. Anyway, I build the project in a separate step and point the handler definition in SAM to the binary.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
s3_vector
Sample SAM Template for s3_vector
Globals:
Function:
Timeout: 3
MemorySize: 128
Resources:
FindBooksFunction:
Type: AWS::Serverless::Function
Properties:
CodeUri: ./rust_app/target/lambda/handler
Handler: bootstrap
Runtime: provided.al2023
Architectures:
- x86_64
Policies:
- Version: "2012-10-17"
Statement:
- Effect: Allow
Action: bedrock:InvokeModel
Resource: "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0"
- Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- s3vectors:QueryVectors
- s3vectors:GetVectors
Resource: "arn:aws:s3vectors:us-east-1:${AWS::AccountId}:bucket/s3-vector-books/index/books"
Events:
GetBooks:
Type: Api
Properties:
Path: /books
Method: post
Outputs:
FindBooksApi:
Description: "API Gateway endpoint URL for Prod stage for find books"
Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/books"
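Once deployed, the endpoint can be exercised with curl. The host below is a placeholder - use the FindBooksApi output from your own stack:

```shell
# Replace the host with the FindBooksApi output of your deployment.
curl -s -X POST \
  "https://<api-id>.execute-api.us-east-1.amazonaws.com/Prod/books" \
  -H "Content-Type: application/json" \
  -d '{"query": "space exploration", "category": "Fiction"}'
```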
I was very curious about how much latency I should expect. It turned out that even with a cold start, the Lambda finishes in under one second.
A warm invocation takes around 300 ms. Adding the category filter doesn't noticeably impact the query time.
To be honest, those numbers are quite amazing for this kind of serverless workload.
Summary
CODE IS AVAILABLE IN THIS REPO
The most amazing thing about S3 Vector Buckets is that they don't require any compute on your side. You don't need clusters or instances to query vectors, which allows a fully serverless approach.
When combined with AWS Lambda written in Rust, the overall overhead of performing semantic search becomes relatively small. The documentation promises sub-second query performance, and in my case, I was able to complete the whole request under one second when dealing with a cold start.
There are, of course, some trade-offs. Even though the metadata filtering is pretty handy, it doesn't allow more complex queries. If the similarity search needs to be done in the wider context, S3 Vector buckets might not be enough.
Anyway, it will be interesting to see which direction this project takes before it becomes generally available.