It’s the early 2010s, and iTunes, Apple’s music juggernaut (the precursor to Apple Music), dominates the market. Services like Pandora, Rhapsody, and Last.fm are all fighting for user attention.
The music streaming space is already crowded, and competing with a tech giant like Apple seems almost impossible.
But then, a tiny European startup emerges.
In just five years, it goes from zero to tens of millions of users, eventually becoming the number one music streaming platform in the world.
Today, on that same platform, when you click shuffle, it recommends exactly the beats you’re looking for, and you love it. Somehow, it just gets your taste.
And towards the end of every year, you receive a Wrapped recap that tells you you’ve spent 60,000 minutes listening to music.
Yep. The “tiny European startup” is… Spotify!!!
But how did Spotify do that? How did it beat a tech giant?
How did it recommend music you didn’t even know you wanted?
Well, one of the secret sauces behind Spotify’s success is vector databases.
Now, before going into details about vector databases, I want to talk about relational databases, so that we know where we were and where we are now.
Relational Database
I want you to think for a second: how would you store the song APT. by Rosé and Bruno Mars in a relational database?
Of course, you would store the audio file itself, under a name like apt.mp3, along with metadata such as the artist (Rosé, Bruno Mars), the release date, the genre, and maybe a few tags like Pop, R&B, or Dance.
And how would you search for the song APT in the database? You would perform a lexical search. It might look something like this:
```sql
SELECT * FROM songs
WHERE artist = 'Bruno Mars';

-- or a fuzzy title match:
SELECT * FROM songs
WHERE title ILIKE '%APT%';
```
This works fine, as long as you know exactly what you’re looking for.
BUT what if you want to search for “late-night city drive vibes” or “smooth, confident, feel-good energy”?
Your query will fail!
WHY?!
Because your database only understands hard-coded labels like Pop or R&B. It has no way of knowing that APT feels like a "feel-good energy" song.
Relational databases don’t understand similarity beyond what you explicitly define in columns, tags, or foreign keys.
This limitation is called the semantic gap.
So, how is this solved? Well, with the help of vector embeddings.
Vector Embeddings
Let’s start with an example.
Consider some popular songs you may have come across on the radio or TikTok, like “The Fate of Ophelia,” “Manchild,” or “End of Beginning.”
Now imagine plotting these songs on a graph based on two characteristics:
- Danceability (x-axis)
- Energy (y-axis)
After plotting them, each song gets a pair of numerical values, its position on the graph.
| Song Title | Artist | Danceability (X) | Energy (Y) |
|---|---|---|---|
| The Fate of Ophelia | Taylor Swift | 0.42 | 0.38 |
| Golden | HUNTR/X feat. EJAE | 0.81 | 0.76 |
| Ordinary | Alex Warren | 0.35 | 0.44 |
| Manchild | Sabrina Carpenter | 0.78 | 0.68 |
| Luther | Kendrick Lamar & SZA | 0.65 | 0.52 |
| End of Beginning | Djo | 0.68 | 0.45 |
So now, every song is represented as a pair of numbers, i.e. [danceability, energy].
For example:
- Manchild → [0.78, 0.68]
- Ordinary → [0.35, 0.44]
That pair of numbers is a vector, and the process of converting something abstract about a song, like its danceability and energy, into numbers is called embedding.
Putting it together
- Embeddings are numerical representations of data
- Vectors are those numbers organized into arrays (like [x, y], or even hundreds of dimensions)
So a vector embedding is simply:
Storing real-world data in the form of numbers, arranged as vectors, so that similar things end up close to each other in space.
In this space, songs with similar energy and danceability appear near each other, making it possible to recommend music based on "vibe", not just tags or keywords.
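To make “close to each other in space” concrete, here’s a small JavaScript sketch using the illustrative [danceability, energy] values from the table above. It ranks every other song by its Euclidean distance to “Manchild” — the smaller the distance, the more similar the vibe:

```javascript
// Illustrative [danceability, energy] vectors from the table above
const songs = {
  "The Fate of Ophelia": [0.42, 0.38],
  "Golden": [0.81, 0.76],
  "Ordinary": [0.35, 0.44],
  "Manchild": [0.78, 0.68],
  "Luther": [0.65, 0.52],
  "End of Beginning": [0.68, 0.45],
};

// Euclidean distance between two vectors: smaller = more similar
const distance = (a, b) =>
  Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));

// Rank every other song by how close it sits to "Manchild"
const query = songs["Manchild"];
const ranked = Object.entries(songs)
  .filter(([title]) => title !== "Manchild")
  .map(([title, vec]) => [title, distance(query, vec)])
  .sort((a, b) => a[1] - b[1]);

console.log(ranked[0][0]); // "Golden" — the closest match
```

Notice that no tags or keywords were involved: “Golden” comes out on top purely because its numbers sit nearest to “Manchild” on the graph.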
An important thing to note, though: in reality, Spotify doesn’t use just two dimensions like energy and danceability.
A real embedding might have hundreds or even thousands of dimensions, capturing tempo, rhythm patterns, vocal style, instrumentation, and more.
But the idea remains the same.
Creating Embeddings
Whenever math is involved, I get a little scared.
Plotting in 2D is fine, and 3D is manageable, but thousands of dimensions?! That’s where my brain goes, “Oh hell no.”
Thankfully, we don’t actually need to visualize or manually compute vectors. There are embedding models that do all the heavy lifting for us.
Some common embedding models are:
- CLIP for Images (and image–text similarity)
- GloVe for Text
- Wav2Vec for Audio
Frameworks like Hugging Face provide many pre-trained embedding models for text, images, audio, and more.
But no matter the data type, the process remains the same:
- Take your data (text, image, audio, etc.)
- Pass it through an embedding model
- Get back a vector embedding (an array of numbers)
Example: Text Embeddings with OpenAI
This code snippet, in JavaScript, from OpenAI converts text into a vector embedding.
(Don’t be scared by it — we will get into the code in the next blog.)
You can use other languages as well.
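A minimal version of that snippet looks something like this — assuming the official `openai` npm package, an `OPENAI_API_KEY` set in your environment, and the `text-embedding-ada-002` model (one common choice that outputs 1536 dimensions):

```javascript
import OpenAI from "openai";

// Assumes OPENAI_API_KEY is set in your environment
const openai = new OpenAI();

const response = await openai.embeddings.create({
  model: "text-embedding-ada-002",
  input: "Your text string goes here",
});

// An array of 1536 numbers representing the text
const vector = response.data[0].embedding;
console.log(vector.length); // 1536
```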
Now, as you can see, the text “Your text string goes here” is converted into a vector with 1536 dimensions.
Remember, no matter how long or short your text is, the output vector always has the same fixed size of 1536.
Vector Databases
Once you’ve created vector embeddings, the next step is storing them in a vector database.
When data is stored as embeddings, similar vectors naturally form clusters in vector space. These clusters are then indexed. The process is known as vector indexing. Indexing allows the database to retrieve similar vectors quickly and efficiently, even as the dataset grows.
At a high level, a vector database does three things:
- Stores embeddings
- Indexes them efficiently
- Retrieves similar vectors with low latency
Vector embeddings are what make semantic search, recommendations, and RAG systems possible.
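The three operations above can be sketched as a toy in-memory store. This is purely illustrative — it uses a naive brute-force scan, whereas real vector databases build approximate nearest-neighbor indexes (e.g. HNSW) so queries stay fast as the dataset grows:

```javascript
// A toy vector "database": real ones replace the linear scan
// with an approximate nearest-neighbor index (e.g. HNSW).
class TinyVectorStore {
  constructor() {
    this.records = []; // { id, vector }
  }

  // 1. Store an embedding
  add(id, vector) {
    this.records.push({ id, vector });
  }

  // 2 + 3. Retrieve the k nearest vectors by Euclidean distance
  query(vector, k = 3) {
    return this.records
      .map((r) => ({
        id: r.id,
        dist: Math.sqrt(
          r.vector.reduce((s, v, i) => s + (v - vector[i]) ** 2, 0)
        ),
      }))
      .sort((a, b) => a.dist - b.dist)
      .slice(0, k);
  }
}

const store = new TinyVectorStore();
store.add("Manchild", [0.78, 0.68]);
store.add("Ordinary", [0.35, 0.44]);
store.add("Golden", [0.81, 0.76]);

console.log(store.query([0.8, 0.7], 2).map((r) => r.id)); // nearest first
```

The interface — add vectors, then query by similarity — is the same shape you’ll see in real products like Supabase’s pgvector, just without the clever indexing.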
All of this sounds good. But so far, everything we’ve discussed lives mostly in theory.
In the next blog, we’ll get hands-on and walk through:
- Creating embeddings using OpenAI
- Storing those embeddings in Supabase
- Querying the database to find similar results
And turn that theory into code.
See you in the next blog! In the meantime, here’s what helped me put this together.
Credits:
- Scrimba
- IBM's video
- LLMs for correcting the errors.