DEV Community

Akmal Chaudhri for SingleStore

Posted on • Updated on

Quick tip: Using the new VECTOR data type and Infix Operators in SingleStoreDB

Abstract

The new SingleStoreDB release v8.5 provides a number of new vector features. In this short article, we'll evaluate some of these new features, such as the new VECTOR data type and the Infix Operators. We'll test these features using the Iris data set.

Create a SingleStoreDB Cloud account

A previous article showed the steps to create a free SingleStoreDB Cloud account. We'll use the following settings:

  • Workspace Group Name: Iris Demo Group
  • Cloud Provider: AWS
  • Region: US East 1 (N. Virginia)
  • Workspace Name: iris-demo
  • Size: S-00

Create a Database and Table

In our SingleStore Cloud account, let's use the SQL Editor to create a new database. Call this iris_db, as follows:

CREATE DATABASE IF NOT EXISTS iris_db;
Enter fullscreen mode Exit fullscreen mode

We'll also create the iris table using the new VECTOR data type, as follows:

USE iris_db;

CREATE TABLE IF NOT EXISTS iris (
     vector VECTOR(4),
     species VARCHAR(20)
);
Enter fullscreen mode Exit fullscreen mode

The Iris data set contains 150 rows of data for three different species of flowers. Each flower has four columns of data: sepal_length, sepal_width, petal_length and petal_width. We can store these four column values together, as follows:

INSERT INTO iris VALUES
('[5.1,3.5,1.4,0.2]','Iris-setosa'),
('[4.9,3,1.4,0.2]','Iris-setosa'),
('[4.7,3.2,1.3,0.2]','Iris-setosa'),
('[4.6,3.1,1.5,0.2]','Iris-setosa'),
('[5,3.6,1.4,0.2]','Iris-setosa'),
...
('[6.7,3,5.2,2.3]','Iris-virginica'),
('[6.3,2.5,5,1.9]','Iris-virginica'),
('[6.5,3,5.2,2]','Iris-virginica'),
('[6.2,3.4,5.4,2.3]','Iris-virginica'),
('[5.9,3,5.1,1.8]','Iris-virginica');
Enter fullscreen mode Exit fullscreen mode

Only the first five and last five rows are shown above. The complete INSERT code listing is available in a GitHub Gist.

Query 1

We'll use the queries described in a previous article and replace DOT_PRODUCT and EUCLIDEAN_DISTANCE with the new Infix Operators <*> and <->, respectively.

First, let's try a query where we want to find the name of the flower species using an exact match for the sepal_length, sepal_width, petal_length and petal_width. We'll use the values [5.9,3,5.1,1.8] from the last row of the iris table, shown above.

Dot Product

SELECT species
FROM iris
WHERE vector <*> ('[5.9,3,5.1,1.8]' :> VECTOR(4)) = vector <*> vector;
Enter fullscreen mode Exit fullscreen mode

Euclidean Distance

SELECT species
FROM iris
WHERE vector <-> ('[5.9,3,5.1,1.8]' :> VECTOR(4)) = 0;
Enter fullscreen mode Exit fullscreen mode

Cosine Similarity

SELECT species
FROM iris
WHERE (vector <*> ('[5.9,3,5.1,1.8]' :> VECTOR(4))) / SQRT((vector <*> vector) * (('[5.9,3,5.1,1.8]' :> VECTOR(4)) <*> ('[5.9,3,5.1,1.8]' :> VECTOR(4)))) = 1;
Enter fullscreen mode Exit fullscreen mode

The result in each case should be:

+----------------+
| species        |
+----------------+
| Iris-virginica |
+----------------+
Enter fullscreen mode Exit fullscreen mode

Query 2

Now, let's use some fictitious data values [5.2,3.6,1.5,0.3] to make a prediction.

Dot Product

SELECT species
FROM iris
ORDER BY vector <*> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4)) DESC
LIMIT 1;
Enter fullscreen mode Exit fullscreen mode

The result should be:

+----------------+
| species        |
+----------------+
| Iris-virginica |
+----------------+
Enter fullscreen mode Exit fullscreen mode

Euclidean Distance

SELECT species
FROM iris
ORDER BY vector <-> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4))
LIMIT 1;
Enter fullscreen mode Exit fullscreen mode

The result should be:

+-------------+
| species     |
+-------------+
| Iris-setosa |
+-------------+
Enter fullscreen mode Exit fullscreen mode

Cosine Similarity

SELECT species
FROM iris
ORDER BY (vector <*> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4))) / SQRT((vector <*> vector) * (('[5.2,3.6,1.5,0.3]' :> VECTOR(4)) <*> ('[5.2,3.6,1.5,0.3]' :> VECTOR(4)))) DESC
LIMIT 1;
Enter fullscreen mode Exit fullscreen mode

The result should be:

+-------------+
| species     |
+-------------+
| Iris-setosa |
+-------------+
Enter fullscreen mode Exit fullscreen mode

Summary

In this short article, we've seen several examples of how to use SingleStoreDB's new VECTOR data type and the new Infix Operators <*> and <-> for DOT_PRODUCT and EUCLIDEAN_DISTANCE, respectively. We've also seen how we can easily represent Cosine Similarity by combining <*> and SQRT.

We've used the new features to find an exact match and made predictions based on new, previously unknown, values.

Top comments (0)