Introduction
A vector store is a database that stores and queries vector embeddings: numerical representations of data such as words, images, and videos that capture their semantic meaning. Vector stores also implement one or more Approximate Nearest Neighbor (ANN) algorithms, making it possible to retrieve the stored vector embeddings that are most semantically similar to an embedded query.
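To make "semantically similar" concrete: similarity between two embeddings is usually measured with a metric such as cosine similarity or Euclidean distance. The short, illustrative Go sketch below (not part of the tutorial's codebase) computes cosine similarity between two vectors; embeddings that encode similar meanings score close to 1.
package main

import (
    "fmt"
    "math"
)

// cosineSimilarity returns a value in [-1, 1]; values closer to 1
// mean the two embeddings point in similar directions.
func cosineSimilarity(a, b []float32) float64 {
    var dot, normA, normB float64
    for i := range a {
        dot += float64(a[i]) * float64(b[i])
        normA += float64(a[i]) * float64(a[i])
        normB += float64(b[i]) * float64(b[i])
    }
    return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
    // toy 3-dimensional vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions
    fmt.Println(cosineSimilarity([]float32{1, 2, 3}, []float32{2, 4, 6})) // 1 (same direction)
}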
In this tutorial, you will learn how to:
- Generate text embeddings with the all-MiniLM-L6-v2 sentence transformer model served with TorchServe.
- Turn a PostgreSQL database into a vector store with pgVector.
- Store and query vector embeddings in the PostgreSQL vector store with Go.
The code samples in this tutorial are in Go and Python, so basic knowledge of both languages is needed to follow along.
Architecture
The application will allow a user to perform two main actions.
- Generate and store text embeddings in the vector store.
- Perform a semantic vector search.
To generate and add an embedding to the vector store:
- The user sends the text to the Go backend server.
- The server sends the text to the all-MiniLM-L6-v2 sentence transformer model to generate embeddings.
- The server stores the generated embeddings in the PostgreSQL vector database.
To perform semantic vector search:
- The user sends the text to the Go backend server.
- The server sends the text to the all-MiniLM-L6-v2 sentence transformer model to generate embeddings.
- The server sends the embedding to the PostgreSQL database, retrieves the five embeddings most similar to it, and returns them to the user.
Prerequisites
This tutorial assumes that you have:
- Docker installed and running. Follow the instructions at https://docs.docker.com/get-docker/ to set up Docker on your local computer.
- Go and the gRPC code-generation tooling (protoc with the protoc-gen-go and protoc-gen-go-grpc plugins) installed.
- PostgreSQL with the pgVector extension installed and running. Follow the instructions at https://github.com/pgvector/pgvector?tab=readme-ov-file#installation-notes to install pgVector in Postgres.
Set up the embedding server with TorchServe and all-MiniLM-L6-v2
The first step in this tutorial is to serve the all-MiniLM-L6-v2 model with TorchServe. We will be using the excellent torchserve-all-minilm-l6-v2 example (https://github.com/clems4ever/torchserve-all-minilm-l6-v2/tree/main) by clems4ever, with a few small changes so that it works over gRPC.
Open the handler.py file and change the preprocess function to the following:
def preprocess(self, data):
    # unpack the request payload
    text = data[0].get('input')
    if text is None:
        text = data[0].get('data')
    if text is not None:
        texts = str(text, 'UTF-8')
        logger.info(texts)
        logger.info('Text provided')
        return self.preprocess_text(texts)
    # fall back to pre-computed token encodings
    encodings = data[0].get('encodings')
    if encodings is not None:
        logger.info('Encodings provided')
        return transformers.BatchEncoding(data={k: torch.tensor(v) for k, v in encodings.items()})
    raise Exception("unsupported payload")
Build and run the Dockerfile to start the embedding server. You can now access the gRPC inference API on port 7070 (the gRPC management API runs on port 7071).
Set up Postgres as a vector database using pgVector
After creating an embedding server using all-MiniLM-L6-v2 with TorchServe, the next step is configuring Postgres to work as a vector database using pgVector. This tutorial assumes you've installed the pgVector extension in your Postgres database. If you haven't, see the installation notes at https://github.com/pgvector/pgvector?tab=readme-ov-file#installation-notes for instructions on adding pgVector to Postgres.
Enable the pgVector extension in your database.
CREATE EXTENSION IF NOT EXISTS vector;
Create a new table to store your embeddings. The id column is a uuid because the Go server generates a UUID for each row, and the embedding column is declared as vector(384) because all-MiniLM-L6-v2 produces 384-dimensional embeddings.
CREATE TABLE IF NOT EXISTS embeddings (
    id uuid PRIMARY KEY,
    embedding vector(384),
    text text,
    created_at timestamptz DEFAULT now()
);
Set up the TorchServe gRPC client
TorchServe provides the following gRPC APIs for interacting with the server.
- Ping: Gets the health status of the running server.
- Predictions: Gets predictions from a served model.
- StreamPredictions: Gets server-side streaming predictions from a served model.
We will be using the Predictions API to convert our text into embeddings for this tutorial.
Create an inference.proto file and add the following to it.
syntax = "proto3";

package org.pytorch.serve.grpc.inference;

import "google/protobuf/empty.proto";

option java_multiple_files = true;
option go_package = "go-server/grpc";

message PredictionsRequest {
    // Name of model.
    string model_name = 1; //required

    // Version of model to run prediction on.
    string model_version = 2; //optional

    // Input data for model prediction.
    map<string, bytes> input = 3; //required
}

message PredictionResponse {
    // Prediction returned by the model.
    bytes prediction = 1;
}

message TorchServeHealthResponse {
    // TorchServe health status.
    string health = 1;
}

service InferenceAPIsService {
    // Check the health of the TorchServe server.
    rpc Ping(google.protobuf.Empty) returns (TorchServeHealthResponse) {}

    // Predictions entry point to get inference using the default model version.
    rpc Predictions(PredictionsRequest) returns (PredictionResponse) {}
}
Generate the gRPC client code with the following command.
protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative <path/to/inference.proto>
Replace <path/to/inference.proto> with the path to your inference.proto file. An inference.pb.go and an inference_grpc.pb.go file should appear next to it.
Note - do not edit the generated code in these two files. You can learn more about setting up a gRPC client for TorchServe at https://pytorch.org/serve/grpc_api.html.
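To sanity-check the generated client against a running TorchServe instance, you can call the Ping RPC. This is a minimal sketch; it assumes your Go module is named go-server, the generated files live in the grpc folder (as set up below), and the gRPC inference API is reachable on localhost:7070.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    embed_grpc "go-server/grpc"

    "google.golang.org/grpc"
    "google.golang.org/protobuf/types/known/emptypb"
)

func main() {
    // dial TorchServe's gRPC inference endpoint
    conn, err := grpc.Dial("localhost:7070", grpc.WithInsecure(), grpc.WithBlock())
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := embed_grpc.NewInferenceAPIsServiceClient(conn)
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    // Ping returns the health status of the TorchServe server
    res, err := client.Ping(ctx, &emptypb.Empty{})
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("TorchServe health:", res.GetHealth())
}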
Set up the Go server to generate and store embeddings
In this section, you will create a Go server that connects to TorchServe via gRPC, converts text to embeddings, and stores the generated embeddings in the Postgres vector database. You can find the code for this section in the following repository - https://github.com/Quadrisheriff/Go-Embedding-Server/tree/master/go-server.
Create a Go project with the following file structure:
go-server
  grpc
  internal
    embedding.go
    handler.go
    logger.go
    repository.go
    service.go
  main.go
- grpc - contains the generated TorchServe gRPC client code.
- internal - contains the internal logic of the backend server.
- main.go - the application's entry point.
Move your generated gRPC files to the grpc folder. Then add the following schema to the embedding.go file.
package internal

import (
    "time"
)

type Embedding struct {
    Embedding []float32 `json:"embedding"`
    Text      string    `json:"text"`
    CreatedAt time.Time `json:"time"`
    ID        string    `json:"id"`
}

type EmbeddingRequest struct {
    Text string `json:"text"`
}
Then, add the following logging code to the logger.go file.
package internal

import (
    "os"

    log "github.com/sirupsen/logrus"
)

type Logger struct {
}

func (logger *Logger) LoggerInit() {
    log.SetFormatter(&log.TextFormatter{
        FullTimestamp:   true,
        TimestampFormat: "2006-01-02 15:04:05.000",
    })
    log.SetOutput(os.Stdout)
    log.SetLevel(log.InfoLevel)
}

func (logger Logger) LogDebug(args ...interface{}) {
    log.Debug(args...)
}

func (logger Logger) LogInfo(args ...interface{}) {
    log.Info(args...)
}

func (logger Logger) LogWarn(args ...interface{}) {
    log.Warn(args...)
}

func (logger Logger) LogError(args ...interface{}) {
    log.Error(args...)
}

func (logger Logger) LogPanic(args ...interface{}) {
    log.Panic(args...)
}
Generate and store embeddings in the vector database
We will be using the Repository pattern to decouple our database logic from our application logic. First, add the following code to the repository.go file.
package internal

import (
    "context"
    "database/sql"
    "time"

    "github.com/google/uuid"
    "github.com/pgvector/pgvector-go"
    "github.com/pkg/errors"
)

type Repository struct {
    db *sql.DB
}

func NewRepository(db *sql.DB) *Repository {
    return &Repository{db: db}
}

// store embeddings in database
func (r *Repository) StoreEmbeddingsInDB(ctx context.Context, embedding Embedding) error {
    stmnt := "insert into embeddings (id, text, created_at, embedding) values ($1, $2, $3, $4)"
    _, err := r.db.ExecContext(ctx, stmnt, uuid.NewString(), embedding.Text, time.Now(), pgvector.NewVector(embedding.Embedding))
    if err != nil {
        return errors.Wrap(err, "cannot store embeddings in db currently")
    }
    return nil
}
In this code, we implemented the database repository and created a function, StoreEmbeddingsInDB, that accepts an Embedding and stores it in our vector database.
Next, add the following code to the service.go file.
package internal

import (
    "context"

    embed_grpc "go-server/grpc"

    "github.com/goccy/go-json"
    "google.golang.org/grpc"
)

type Service struct {
    repository     *Repository
    embedding_grpc embed_grpc.InferenceAPIsServiceClient
    logger         Logger
}

func NewService(repository *Repository, conn *grpc.ClientConn, logger Logger) *Service {
    embedding_grpc := embed_grpc.NewInferenceAPIsServiceClient(conn)
    return &Service{repository: repository, embedding_grpc: embedding_grpc, logger: logger}
}

// generate and store embeddings
// @todo - check if text exceeds token limit before generating embedding
func (s *Service) GenerateAndStoreTextEmbeddings(ctx context.Context, text EmbeddingRequest) error {
    var text_embedding Embedding

    // generate embeddings
    s.logger.LogInfo("generating text embeddings...")
    results, err := s.PerformTextEmbedding(ctx, text.Text)
    if err != nil {
        s.logger.LogError("cannot generate text embeddings", err.Error())
        return err
    }

    // decode the JSON-encoded prediction bytes into a slice of vectors
    embeds := results.GetPrediction()
    var embeddings [][]float32
    if err := json.Unmarshal(embeds, &embeddings); err != nil {
        s.logger.LogError("cannot decode embeddings", err.Error())
        return err
    }

    text_embedding.Text = text.Text
    text_embedding.Embedding = embeddings[0]

    // store embeddings in db
    s.logger.LogInfo("storing text embeddings...")
    return s.StoreEmbeddings(ctx, text_embedding)
}

// perform text embedding
func (s *Service) PerformTextEmbedding(ctx context.Context, text string) (*embed_grpc.PredictionResponse, error) {
    input := &embed_grpc.PredictionsRequest{
        ModelName: "my_model",
        Input:     map[string][]byte{"input": []byte(text)},
    }
    res, err := s.embedding_grpc.Predictions(ctx, input)
    if err != nil {
        s.logger.LogError(err.Error())
        return &embed_grpc.PredictionResponse{}, err
    }
    return res, nil
}

// store embeddings in db
func (s *Service) StoreEmbeddings(ctx context.Context, embeddings Embedding) error {
    return s.repository.StoreEmbeddingsInDB(ctx, embeddings)
}
The service code has three functions for generating and storing embeddings:
- PerformTextEmbedding - takes a text string and generates its embedding with the TorchServe server using the all-MiniLM-L6-v2 model.
- StoreEmbeddings - stores the generated embeddings in our Postgres vector store.
- GenerateAndStoreTextEmbeddings - generates and stores a text embedding by calling the PerformTextEmbedding and StoreEmbeddings functions.
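For intuition on the decoding step above: the Predictions RPC returns the embedding as JSON-encoded bytes, which the service unmarshals into a [][]float32. Here is a standalone sketch of that step with a made-up, truncated payload (real all-MiniLM-L6-v2 vectors have 384 dimensions).
package main

import (
    "fmt"

    "github.com/goccy/go-json"
)

func main() {
    // a made-up, truncated prediction payload: one vector per input text
    raw := []byte(`[[0.0123, -0.0456, 0.0789]]`)

    var embeddings [][]float32
    if err := json.Unmarshal(raw, &embeddings); err != nil {
        panic(err)
    }
    fmt.Println(embeddings[0]) // [0.0123 -0.0456 0.0789]
}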
Perform semantic search
Semantic search uses the intent and context behind a query, rather than exact keyword matches, to return more relevant results. For this tutorial, we will set up our server to retrieve from our Postgres vector store the five texts most similar to the searched text.
Add the following code to your repository.go file.
// retrieve the top 5 most similar embeddings from the database
func (r *Repository) RetrieveFiveSimilarEmbedding(ctx context.Context, embedding []float32) ([]Embedding, error) {
    stmnt := "select id, text, created_at, embedding from embeddings ORDER BY embedding <-> $1 LIMIT 5"
    rows, err := r.db.QueryContext(ctx, stmnt, pgvector.NewVector(embedding))
    if err != nil {
        return []Embedding{}, errors.Wrap(err, "cannot retrieve embeddings from db at the moment")
    }
    defer rows.Close()

    var embeds []Embedding
    for rows.Next() {
        var embed Embedding
        // scan the vector column into a pgvector.Vector, then convert it to []float32
        var vec pgvector.Vector
        err = rows.Scan(&embed.ID, &embed.Text, &embed.CreatedAt, &vec)
        if err != nil {
            return []Embedding{}, errors.Wrap(err, "cannot retrieve embeddings from db at the moment")
        }
        embed.Embedding = vec.Slice()
        embeds = append(embeds, embed)
    }
    return embeds, nil
}
The repository function takes an embedding and returns the five most similar embeddings in the vector database. The <-> operator is pgVector's Euclidean (L2) distance operator; pgVector also supports cosine distance (<=>) and negative inner product (<#>). Next, add the following to your service.go file.
// retrieve five similar embeddings from db
func (s *Service) RetrieveFiveSimilarEmbeddingService(ctx context.Context, text string) ([]Embedding, error) {
    results, err := s.PerformTextEmbedding(ctx, text)
    if err != nil {
        s.logger.LogError(err.Error())
        return []Embedding{}, err
    }

    embeds := results.GetPrediction()
    var embeddings [][]float32
    if err := json.Unmarshal(embeds, &embeddings); err != nil {
        s.logger.LogError(err.Error())
        return []Embedding{}, err
    }
    return s.repository.RetrieveFiveSimilarEmbedding(ctx, embeddings[0])
}
In the service code, we convert the text to an embedding and then retrieve the five most similar embeddings from the database with the RetrieveFiveSimilarEmbedding repository function.
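One optional improvement before wiring up the HTTP layer: by default, pgVector executes these queries as exact nearest-neighbor scans over every row. To get the approximate nearest neighbor (ANN) behavior mentioned in the introduction, you can add an index on the embedding column. Below is a minimal sketch of a hypothetical helper (not part of the original codebase, and assuming pgvector 0.5.0 or later for HNSW support) that you could add to repository.go and call once at startup.
// EnsureIndex creates an HNSW index so similarity queries use approximate
// nearest-neighbor search instead of an exact scan; vector_l2_ops matches
// the <-> (L2 distance) operator used in the query above.
func (r *Repository) EnsureIndex(ctx context.Context) error {
    _, err := r.db.ExecContext(ctx, "CREATE INDEX IF NOT EXISTS embeddings_embedding_idx ON embeddings USING hnsw (embedding vector_l2_ops)")
    return errors.Wrap(err, "cannot create hnsw index")
}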
Set up the Gin server
For this project, we will be using two handlers:
- EmbedTexts - adds new texts to our vector store.
- PerformSemanticSearch - performs a semantic search.
Add the following to the handler.go file to create the two handlers.
package internal

import (
    "errors"
    "net/http"

    "github.com/gin-gonic/gin"
)

var (
    ErrBadRequest           = errors.New("error with api request body")
    ErrCannotPerformRequest = errors.New("error cannot perform request currently")
)

type Handler struct {
    service *Service
    logger  Logger
}

func NewHandler(service *Service, logger Logger) *Handler {
    return &Handler{service: service, logger: logger}
}

func (h *Handler) EmbedTexts(c *gin.Context) {
    var input EmbeddingRequest
    if err := c.ShouldBindJSON(&input); err != nil {
        h.logger.LogError(err.Error())
        // return the error message; a bare error value would serialize to "{}"
        c.JSON(http.StatusBadRequest, ErrBadRequest.Error())
        return
    }

    err := h.service.GenerateAndStoreTextEmbeddings(c, input)
    if err != nil {
        h.logger.LogError(err.Error())
        c.JSON(http.StatusInternalServerError, ErrCannotPerformRequest.Error())
        return
    }
    c.JSON(http.StatusOK, "text embedding successful")
}

func (h *Handler) PerformSemanticSearch(c *gin.Context) {
    var input EmbeddingRequest
    if err := c.ShouldBindJSON(&input); err != nil {
        h.logger.LogError(err.Error())
        c.JSON(http.StatusBadRequest, ErrBadRequest.Error())
        return
    }

    response, err := h.service.RetrieveFiveSimilarEmbeddingService(c, input.Text)
    if err != nil {
        h.logger.LogError(err.Error())
        c.JSON(http.StatusInternalServerError, ErrCannotPerformRequest.Error())
        return
    }
    c.JSON(http.StatusOK, response)
}
Set up the application's entry point
Add the following to your main.go file.
package main

import (
    "database/sql"
    "fmt"

    "go-server/internal"

    "github.com/gin-gonic/gin"
    _ "github.com/lib/pq"
    "google.golang.org/grpc"
)

const (
    DBHOST     = "<db_host_value>"
    DBUSER     = "<db_user_value>"
    DBPASSWORD = "<db_password_value>"
    DBNAME     = "<db_name_value>"
    // TorchServe gRPC inference address; grpc.Dial expects host:port with no scheme prefix
    PYTORCHPORT = "localhost:7070"
    PORT        = ":8090"
)

func main() {
    logger := internal.Logger{}
    logger.LoggerInit()

    db_url := fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=%s sslmode=disable", DBHOST, 5432, DBUSER, DBPASSWORD, DBNAME)

    // connect to database
    db, err := sql.Open("postgres", db_url)
    if err != nil {
        logger.LogError(err.Error())
        return
    }
    defer db.Close()

    // ping database
    err = db.Ping()
    if err != nil {
        logger.LogError(err.Error())
        return
    }
    logger.LogInfo("connected to database successfully.")

    // connect to the TorchServe gRPC server
    conn, err := grpc.Dial(PYTORCHPORT, grpc.WithInsecure(), grpc.WithBlock())
    if err != nil {
        logger.LogError(err.Error())
        return
    }
    defer conn.Close()
    logger.LogInfo("connected to pytorch grpc server successfully.")

    var (
        repository = internal.NewRepository(db)
        service    = internal.NewService(repository, conn, logger)
        handler    = internal.NewHandler(service, logger)
    )

    // set up the gin server
    r := gin.Default()
    r.POST("/embed", handler.EmbedTexts)
    r.POST("/search", handler.PerformSemanticSearch)
    r.Run(PORT)
}
Change the following values:
- "<db_host_value>" - your Postgres database host.
- "<db_user_value>" - your Postgres database username.
- "<db_password_value>" - the password for that user.
- "<db_name_value>" - the name of your Postgres database.
Note - TorchServe listens on port 7070 for the gRPC Inference API. You can change the port by following the instructions at https://pytorch.org/serve/configuration.html.
Build and run the main.go file to start the Go backend server.
- To convert a text to embeddings and store it in your vector database, send a POST request to http://localhost:8090/embed with the following request body:
{
    "text": "<text_to_embed>"
}
- To perform a semantic vector search, send a POST request to http://localhost:8090/search with the following request body:
{
    "text": "<search_text>"
}
The request will return the five texts in your vector store that are most similar to the searched text.
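If you prefer to exercise the API from code rather than a REST client, here is a minimal sketch of a Go client; it assumes the backend is running locally on port 8090 and that you have already embedded some texts.
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// post sends a {"text": ...} request body to the given endpoint
func post(url, text string) (*http.Response, error) {
    body, _ := json.Marshal(map[string]string{"text": text})
    return http.Post(url, "application/json", bytes.NewReader(body))
}

func main() {
    // store a text in the vector store
    if _, err := post("http://localhost:8090/embed", "the quick brown fox"); err != nil {
        log.Fatal(err)
    }

    // search for semantically similar texts
    res, err := post("http://localhost:8090/search", "a fast auburn fox")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()

    // each result also carries "embedding" and "time" fields (see the Embedding schema)
    var results []struct {
        ID   string `json:"id"`
        Text string `json:"text"`
    }
    if err := json.NewDecoder(res.Body).Decode(&results); err != nil {
        log.Fatal(err)
    }
    for _, r := range results {
        fmt.Println(r.ID, r.Text)
    }
}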
Conclusion
You can find the complete code for this project in the following GitHub repository - https://github.com/Quadrisheriff/Go-Embedding-Server/tree/master. I've also provided a docker.compose.yml file that runs the database, TorchServe server, and backend together for you.
Important - this tutorial is for educational purposes only; do not use this code in a production environment.