Akmal Chaudhri for SingleStore

Posted on May 5, 2023 • Updated on May 23, 2023

Quick tip: Using OpenAI with SingleStoreDB to store and query vectors of Fine Food Reviews

#singlestoredb #openai #vectordatabase #gpt3

Abstract

This short article will demonstrate another example of working with OpenAI and storing and querying vectors in SingleStoreDB. In a previous article, we saw that SingleStoreDB was adept at working with vectors through its vector functions, using a local installation. This time, however, we'll use SingleStoreDB in the cloud. In a future article, we'll use Docker.

Introduction

In a great article, the author shows how to create vectors for a clustering task. The vectors are eventually stored and queried using a vector database. However, we could store and query the vectors using SingleStoreDB instead. Since the original article code is copyrighted, we'll just provide the additional code required to use SingleStoreDB, instead of a vector database. There is, however, one code correction required in the original article. The following line:

matrix = np.array(df.ada_similarity.apply(eval).to_list())

needs to be changed to:

matrix = np.array(df.ada_similarity.to_list())

Create a SingleStoreDB Cloud account

A previous article showed the steps required to create a free SingleStoreDB Cloud account. We'll use OpenAI Demo Group as our Workspace Group Name and openai-demo as our Workspace Name. We'll make a note of our password and host name.

We'll use the SQL Editor to create a new database, as follows:

CREATE DATABASE IF NOT EXISTS openai_demo;

Fill out the Notebook

In a Jupyter notebook, at a minimum, we'll need to add the following library:

!pip install singlestoredb --quiet

Next, we'll create a connection to SingleStoreDB, as follows:

import singlestoredb as s2

conn = s2.connect("admin:<password>@<host>:3306/openai_demo")

cur = conn.cursor()

We'll replace the <password> and <host> with the values from our SingleStoreDB Cloud account.

We could store all the Dataframe data in SingleStoreDB, as shown in the previous article, but we'll limit it this time to several columns. Our code below will be similar to the previous article.

First, we'll create the table:

stmt = """
    CREATE TABLE IF NOT EXISTS fine_food_reviews (
        id INT PRIMARY KEY,
        combined TEXT,
        ada_similarity BLOB
    )
"""

cur.execute(stmt)

Next, we'll insert the data into the table, as follows:

# Prepare the statement
stmt = """
    INSERT INTO fine_food_reviews (
        id,
        combined,
        ada_similarity
    )
    VALUES (
        %s,
        %s,
        JSON_ARRAY_PACK_F64(%s)
    )
"""

# Convert the DataFrame to a NumPy record array
record_arr = df.to_records(index=True)

# Set the batch size
batch_size = 100

# Iterate over the rows of the record array in batches
for i in range(0, len(record_arr), batch_size):
    batch = record_arr[i:i+batch_size]
    values = [(
        row[0],
        row[6],
        str(row[8])
    ) for row in batch]
    cur.executemany(stmt, values)

We can also use JSON_ARRAY_PACK_F32 (32-bit, IEEE standard format), instead of JSON_ARRAY_PACK_F64 (64-bit, IEEE standard format).

We'll now create a Python function that will call OpenAI and query SingleStoreDB, returning the values and scores, as follows:

from typing import Tuple, List

def search_reviews(
    query: str,
    num_rows: int = 10
) -> Tuple[List[str], List[float]]:
    """Searches Fine Foods Reviews for the given query and returns the top `num_rows` results.

    Args:
        query: The query to search for.
        num_rows: The number of results to return.

    Returns:
        A list of the top `num_rows` results.
    """

    # Get the embedding of the query
    query_embedding_response = openai.Embedding.create(
        model='text-embedding-ada-002',
        input=query,
    )
    query_embedding = query_embedding_response["data"][0]["embedding"]

    # Create the SQL statement
    stmt = """
        SELECT
            combined,
            DOT_PRODUCT_F64(JSON_ARRAY_PACK_F64(%s), ada_similarity) AS score
        FROM fine_food_reviews
        ORDER BY score DESC
        LIMIT %s
    """

    # Execute the SQL statement
    cur.execute(stmt, [str(query_embedding), num_rows])

    # Get the results
    results = cur.fetchall()

    # Separate the results into two lists
    values = [row[0] for row in results]
    scores = [row[1] for row in results]

    # Return the results
    return values, scores

Now we are ready to query the data:

values, scores = search_reviews(
    query = "I have ordered these raisins",
    num_rows = 5
)

We can tabulate the results as follows:

from tabulate import tabulate

# Combine the values and scores lists into a list of tuples
# Each tuple contains a value and its corresponding score
table_data = list(zip([value[:50] for value in values], scores))

# Add a rank column to the table data
table_data = [(i + 1,) + data for i, data in enumerate(table_data)]

# Create the table
table = tabulate(table_data, headers=["Rank", "Combined", "Score"])

# Print the table
print(table)

The output should be similar to the following:

Rank    Combined                                               Score
------  --------------------------------------------------  --------
     1  Title: Delicious!; Content: I have ordered these r  0.880185
     2  Title: Just what I expected!; Content: I bought th  0.838868
     3  Title: its delicious...; Content: I was surprised   0.820137
     4  Title: Perfect Gift; Content: I got these to give   0.814143
     5  Title: a great product and deal; Content: I looked  0.811726

A quick visual inspection shows that the results are similar to those in the original article.

Summary

In this short article, we have seen another example where SingleStoreDB can store and query vectors. In future articles, we'll look at further examples and scenarios where we might wish to perform additional queries on the database beyond just querying the vector data.

DEV Community

Quick tip: Using OpenAI with SingleStoreDB to store and query vectors of Fine Food Reviews

Abstract

Introduction

Create a SingleStoreDB Cloud account

Fill out the Notebook

Summary

Top comments (0)

Read next

A História do Linux [0/7]: As Primeiras Linguagens de Programação e a Origem dos Primeiros Sistemas Operacionais

Easy to install Python -> Mac | Fix $PATH

Lidando com valores negativos no JS

Clinical Management System