DEV Community

Kirk Kirkconnell for Momento


What are vector embeddings?

An embedding is a representation of an object (a word, sentence, image, etc.) summarized as an array of numbers (a vector, for you mathematicians out there). Together, these numbers capture semantic information about the object and its notable features. By using a model such as GPT-4 to summarize an object’s bulky representation in numeric form, we enable efficient searches across vast volumes of information. These searches can inform and guide us by surfacing hints, context, and relationships. But what are embeddings? What do they look like?

vector1 = [ 34.5, 832.89, 21.41 ]
vector2 = [ 98.45, 145.03, 21.42 ]

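These are two vectors, each with three dimensions. A vector is an array of floating-point numbers, and an embedding is a vector that a model has produced to represent an object. Even with this limited example, we can quickly see that the third number in each array (21.41 and 21.42) is numerically close. That suggests the two objects are similar along whatever feature that dimension captures. In practice, relatedness is measured across all dimensions at once, using a metric such as cosine similarity or Euclidean distance. Now imagine this, but each vector has hundreds or thousands of dimensions, and there are hundreds of millions of vectors.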
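To make "close numerically" a bit more concrete, here is a minimal Python sketch that compares the two example vectors above. The helper names `euclidean_distance` and `cosine_similarity` are made up for illustration, and these two metrics are common choices rather than anything this post prescribes.

```python
import math

# The two example vectors from above.
vector1 = [34.5, 832.89, 21.41]
vector2 = [98.45, 145.03, 21.42]


def euclidean_distance(a, b):
    """Straight-line distance between two vectors (smaller means closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1 means more alike)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Absolute gap in each dimension: only the third one is tiny.
per_dimension_gap = [abs(x - y) for x, y in zip(vector1, vector2)]

print("per-dimension gap:", per_dimension_gap)
print("euclidean distance:", euclidean_distance(vector1, vector2))
print("cosine similarity:", cosine_similarity(vector1, vector2))
```

The per-dimension gaps show that only the third pair of numbers is nearly identical, while the two metrics fold all three dimensions into a single score. That single score is what a similarity search typically ranks by.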

That’s all well and good, but let’s think about this more with a concrete example.


Say you had an image of a man from a fashion shoot on location in Casablanca. When you insert the vector embedding generated from that image into a vector index, the image is immediately related to the city of Casablanca, as well as to photo shoots, because the floating-point numbers in their vectors are close to each other. But what other existing data in the index is the image’s embedding automatically related to? Things like Morocco, Africa, and Hollywood films from the 1940s. You get the idea.
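As a rough sketch of that idea, the Python below builds a toy "index" as a plain dictionary and ranks its entries by cosine similarity to the image's embedding. Everything here is an assumption for illustration: the three-dimensional vectors are hand-picked rather than produced by a model, the `nearest` helper is hypothetical, and a real vector index would normally use an approximate nearest-neighbor structure instead of comparing against every entry.

```python
import math

# Hypothetical, hand-picked embeddings; real ones come from a model
# and have hundreds or thousands of dimensions.
index = {
    "Casablanca (city)": [0.90, 0.10, 0.30],
    "fashion photo shoot": [0.70, 0.60, 0.10],
    "Morocco": [0.85, 0.05, 0.35],
    "spreadsheet formulas": [0.05, 0.10, 0.95],
}

# Hypothetical embedding for the photo taken in Casablanca.
query_embedding = [0.88, 0.30, 0.25]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1 means more alike)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, entries, k=3):
    """Brute-force search: score every entry, return the k most similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in entries.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


results = nearest(query_embedding, index, k=3)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these made-up numbers, the city, the country, and the photo shoot rank above the unrelated entry. Nothing in the code knows what Casablanca is; the relationships fall out of the numbers alone.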

Now that you have seen what the vector embeddings AI models create from text, images, etc. look like, what data do you have that you or your customers could benefit from searching by similarity? For example, “Give me a list of writings similar to text written by Maya Angelou.”

If you had a vector index to store all of this data, what would you create?
