Serverless Semantic Search: A Hands-On Guide to AWS S3 Vector Buckets with Rust

Introduction

In the AI era, semantic search is an important tool to have under your belt. There are many options for storing embeddings, and there is room on the market for even more. The recently released S3 Vector Buckets look interesting as a fully serverless option for storing and querying vectors.

Goal

In this blog post, I will build an HTTP endpoint that queries books by their descriptions. The request will also include an optional category field, so callers can filter the results to a given genre. I would like to check whether S3 Vectors can offer acceptable latency for a user-facing service.

Architecture

Creating a bucket

CODE IS AVAILABLE IN THIS REPO

As S3 Vector storage is in preview, there is no CloudFormation support yet. A bucket and index can be created using the console. I followed the user guide and created a new bucket. Then I added the books index inside.

When creating an index, we need to decide upfront which distance metric we are going to use - Cosine or Euclidean. They use different logic to find the nearest neighbors, and this matters because the setting can't be changed later on. In real life, we would probably create two buckets and test the search on a data sample to find out which approach works better for our case. For this blog, I picked cosine.
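To build intuition for that choice, here is a tiny standalone sketch of cosine similarity (not part of the project - S3 Vectors computes this server-side):

```rust
/// Cosine similarity compares the angle between two vectors and ignores
/// their magnitude; Euclidean distance compares absolute positions instead.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Vectors pointing the same way score 1.0; orthogonal ones score 0.0.
    println!("{}", cosine_similarity(&[1.0, 0.0], &[2.0, 0.0])); // 1
    println!("{}", cosine_similarity(&[1.0, 0.0], &[0.0, 1.0])); // 0
}
```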

As for the vector dimensions, I stuck with 1024, as I planned to use Titan Text Embeddings V2 for generating embeddings.

Populate data

As a data source, I took a books dataset from Kaggle and used the first 1,000 rows.

AWS provides a handy CLI tool to populate vectors; however, I needed to build these integrations for my lambda handler anyway, so I created the uploader script in Rust.

When prototyping with Rust, I like to use unwrap everywhere and implement proper error handling later on. You were warned.

Let's start by defining domain types:

// rust_app/common/src/models.rs

use serde::{Deserialize, Serialize};

#[derive(Deserialize, Serialize, Debug, Clone)]
pub struct Book {
    pub isbn13: u128,
    pub title: String,
    pub authors: String,
    pub categories: String,
    pub description: String,
}

#[derive(Debug, Serialize)]
pub struct BookWithEmbeddings {
    pub item: Book,
    pub embeddings: Vec<f32>
}

In the uploader script, I read the CSV file stored locally and get the embeddings from Titan:

// rust_app/vector_updater/src/main.rs

//...

let mut reader = csv::Reader::from_path("books2.csv").unwrap();

// Deserialize every CSV row into a Book, panicking on malformed rows.
let books: Vec<Book> = reader.deserialize()
    .map(|res| res.unwrap())
    .collect();

println!("extracted {} books", books.len());

// Initialize AWS config
let config = aws_config::from_env()
    .region("us-east-1")
    .load()
    .await;

let embeddings_svc = EmbeddingsService::new(&config).await;

// Skip books without a description - there is nothing to embed for them.
let books_for_upload = books.iter()
    .filter(|b| !b.description.is_empty())
    .cloned()
    .collect::<Vec<_>>();

println!("getting embeddings for {} books", books_for_upload.len());

// Build the futures first, then drive them with bounded concurrency.
let embeddings = books_for_upload
    .iter()
    .map(|book| get_book_with_embeddings(&embeddings_svc, book))
    .collect::<Vec<_>>();

let books_with_embeddings = stream::iter(embeddings)
    .buffer_unordered(MAX_CONCURENCY)
    .collect::<Vec<_>>()
    .await;

println!("{:?}", books_with_embeddings.len());

Embeddings are fetched from the Bedrock API using the following method:

// rust_app/common/src/embeddings_svc.rs

pub async fn get_embeddings(&self, query: String) -> TitanResponse {
    // Titan expects a JSON body with the text to embed under "inputText".
    let body = json!({
        "inputText": query
    });

    let body_blob = Blob::new(serde_json::to_vec(&body).unwrap());

    let response = self.client
        .invoke_model()
        .model_id("amazon.titan-embed-text-v2:0")
        .body(body_blob)
        .send()
        .await
        .unwrap();

    // The response body is raw bytes containing the Titan JSON payload.
    let bytes = response.body().clone().into_inner();

    let result: TitanResponse = serde_json::from_slice(&bytes).unwrap();

    result
}

And now, once I have the books with generated embeddings, I upload the data to the S3 Vector bucket:

// rust_app/vector_updater/src/main.rs
let s3_vectors_client = aws_sdk_s3vectors::Client::new(&config);

let put_vectors_calls = books_with_embeddings
    .iter()
    .map(|be| upload_vectors(&s3_vectors_client, be))
    .collect::<Vec<_>>();

println!("Uploading {} records to S3", put_vectors_calls.len());

let _ = stream::iter(put_vectors_calls)
    .buffer_unordered(MAX_CONCURENCY)
    .collect::<Vec<_>>()
    .await;

The upload_vectors method:

// rust_app/vector_updater/src/main.rs

pub async fn upload_vectors(
    s3vectors_client: &aws_sdk_s3vectors::Client,
    book_with_embeddings: &BookWithEmbeddings,
) {
    // Metadata is stored alongside the vector and can be returned or filtered on.
    let mut metadata = HashMap::new();
    metadata.insert("source_text".to_string(), Document::String(book_with_embeddings.item.description.clone()));
    metadata.insert("categories".to_string(), Document::String(book_with_embeddings.item.categories.clone()));
    metadata.insert("title".to_string(), Document::String(book_with_embeddings.item.title.clone()));
    metadata.insert("authors".to_string(), Document::String(book_with_embeddings.item.authors.clone()));

    // The ISBN serves as the vector key, so re-uploads overwrite the same record.
    let vector_input = PutInputVector::builder()
        .key(book_with_embeddings.item.isbn13.to_string())
        .data(VectorData::Float32(book_with_embeddings.embeddings.clone()))
        .metadata(Document::Object(metadata))
        .build()
        .unwrap();

    let _result = s3vectors_client
        .put_vectors()
        .vector_bucket_name(S3_NAME)
        .index_name(INDEX_NAME)
        .vectors(vector_input)
        .send()
        .await
        .unwrap();

    // Crude throttle to stay under the AWS rate limits.
    tokio::time::sleep(tokio::time::Duration::from_secs(1)).await;
}

Searchable metadata is created from the Book details, and the isbn13 is used as the object key. I added the sleep statement so I wouldn't hit AWS rate limits while sending items.

Once the items were uploaded, I validated the process using the s3vectors-embed tool:

s3vectors-embed query \
  --vector-bucket-name s3-vector-books \
  --index-name books \
  --model-id amazon.titan-embed-text-v2:0 \
  --query-input "space exploration" \
  --k 10

Query data with Lambda

The lambda handler is pretty simple:

use aws_config::BehaviorVersion;
use common::{embeddings_svc::EmbeddingsService, s3Vector_svc::S3VectorsService};
use lambda_http::{run, service_fn, Error, IntoResponse, Request, RequestPayloadExt, Response};

use serde::Deserialize;

#[derive(Deserialize)]
struct Query {
    query: String,
    category: Option<String>
}

async fn function_handler(embeddings_svc: &EmbeddingsService, s3_vector_svc: &S3VectorsService, event: Request) -> Result<impl IntoResponse, Error> {

    let query = event.payload::<Query>()?.unwrap();

    let embeddings = embeddings_svc.get_embeddings(query.query.clone()).await;

    let books = s3_vector_svc.query_books(embeddings.embedding, query.category).await;


    let resp = Response::builder()
        .status(200)
        .header("content-type", "application/json")
        .body(serde_json::to_string(&books).unwrap())
        .map_err(Box::new)?;

    Ok(resp)
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::INFO)
        // disable printing the name of the module in every log line.
        .with_target(false)
        // disabling time is handy because CloudWatch will add the ingestion time.
        .without_time()
        .init();

    let config = aws_config::defaults(BehaviorVersion::latest()).region("us-east-1").load().await;

    let embeddings_svc = EmbeddingsService::new(&config).await;

    let s3_vector_svc = S3VectorsService::new(&config).await;

    run(service_fn(|ev| {
        function_handler(&embeddings_svc, &s3_vector_svc, ev)
    })).await
}

I expect to receive the query and an optional category to filter results. S3 Vectors provides basic metadata filtering functionality, which can be super useful.

To query S3, I take the embeddings returned by Bedrock and pass them to the query_vectors call.

// rust_app/common/src/s3Vector_svc.rs
pub async fn query_books(&self, vector: Vec<f32>, category: Option<String>) -> Vec<Book> {
    let data = VectorData::Float32(vector);

    // Build an equality filter on the "categories" metadata field, if requested.
    let category_filter = category
        .map(|category| {
            let mut category_json = HashMap::new();
            category_json.insert("categories".to_string(), Document::String(category));
            Document::Object(category_json)
        });

    let request = self.client
        .query_vectors()
        .vector_bucket_name(S3_NAME)
        .index_name(INDEX_NAME)
        .top_k(5)
        .query_vector(data)
        .set_filter(category_filter)
        .return_metadata(true);

    let result = request
        .send()
        .await
        .unwrap();

    // Rebuild Book values from the vector key and the returned metadata.
    let data = result
        .vectors()
        .iter()
        .map(|vector| {
            let id = vector.key().parse().unwrap();
            let metadata = vector.metadata().unwrap().as_object().unwrap();
            let source_text = metadata.get("source_text").unwrap().as_string().unwrap().to_string();
            let title = metadata.get("title").unwrap().as_string().unwrap().to_string();
            let authors = metadata.get("authors").unwrap().as_string().unwrap().to_string();
            let categories = metadata.get("categories").unwrap().as_string().unwrap().to_string();

            Book { isbn13: id, title, authors, categories, description: source_text }
        })
        .collect::<Vec<_>>();

    println!("Result: {:?}", result);

    data
}

As you can see, there are unwrap() calls everywhere. I truly believe that's the way to prototype in Rust.
The only interesting thing in this code is that the Rust s3vectors SDK uses the generic Document type from Smithy.

For local testing, I use the cargo lambda tool. Once everything works locally, I deploy to AWS.

Deploy and test

I used a generic SAM template for the Rust project. The only catch is that it didn't work for me with the Cargo workspace in the Rust project. That's strange, because I use AWS CDK in my projects and face no such issues. Anyway, I build the project in a separate step and point the handler definition in SAM to the resulting binary.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  s3_vector

  Sample SAM Template for s3_vector

Globals:
  Function:
    Timeout: 3
    MemorySize: 128


Resources:
  FindBooksFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./rust_app/target/lambda/handler
      Handler: bootstrap
      Runtime: provided.al2023
      Architectures:
        - x86_64
      Policies: 
        - Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action: bedrock:InvokeModel
              Resource: "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0"
        - Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action: 
                - s3vectors:QueryVectors
                - s3vectors:GetVectors
              Resource: !Sub "arn:aws:s3vectors:us-east-1:${AWS::AccountId}:bucket/s3-vector-books/index/books"
      Events:
        GetBooks:
          Type: Api 
          Properties:
            Path: /books
            Method: post

Outputs:
  FindBooksApi:
    Description: "API Gateway endpoint URL for Prod stage for find books"
    Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/books"

I was very curious how much latency I should expect. It turned out that even with a cold start, the Lambda finishes under one second.

A warm invocation takes around 300 ms, and adding a category filter doesn't noticeably impact the query.

To be honest, those numbers are quite amazing for this kind of serverless workload.

Summary

CODE IS AVAILABLE IN THIS REPO

The most amazing thing about S3 Vector buckets is that they don't require any compute. You don't need clusters or instances to query vectors, which allows a fully serverless approach.
When combined with an AWS Lambda written in Rust, the overall overhead of performing a semantic search becomes relatively small. The documentation promises sub-second query performance, and in my case, the whole request completed under one second even with a cold start.

There are, of course, some trade-offs. Even though the metadata filtering is pretty handy, it doesn't allow more complex queries. If the similarity search needs to run in a wider context, S3 Vector buckets might not be enough.

Anyway, it is interesting to see what direction this project takes before becoming generally available.
