This is Day 2 of my AI fundamentals learning, where I cover the following concepts:
- Vector Embeddings
- How Tokenisation and Vector Embeddings relate to each other
Vector embeddings:
- Vector embedding is the process of turning each token ID (generated during tokenisation) into a high-dimensional vector, where semantic similarity results in geometric closeness. Think of it like this: 'dog' is close to 'puppy' and also close to 'dog food', but 'dog' is not close to 'car' or 'petrol'.
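To make "geometric closeness" concrete, here is a minimal sketch using made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions); cosine similarity is the usual measure of closeness:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: ~1.0 = very similar, ~0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings, just for illustration
dog   = np.array([0.90, 0.80, 0.10])
puppy = np.array([0.85, 0.75, 0.15])
car   = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(dog, puppy))  # high  -> semantically close
print(cosine_similarity(dog, car))    # low   -> semantically distant
```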
When do we use embeddings?
- Recommendations: Suggest similar songs, videos, movies, products
- Search: Return relevant results even when the exact keywords don't match
- Clustering: Grouping related concepts together
A beginner might be confused by terms like 'vector' and 'high-dimensional'.
- This is an example of a vector: [0.9, 0.8, 0.1]. Array/list/vector all mean the same thing: 'list' is just plain English, 'array' is the programming term, and 'vector' is the math/ML term.
- High-dimensional: 'Multi-dimensional' just means more than 1 dimension - could be 2D, 3D, 10D, ... which is too vague. 'High-dimensional' in this context means hundreds or thousands of dimensions (OpenAI's text-embedding-3-small = 1536 dimensions).
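You can check that dimension count yourself. Here is a minimal sketch using OpenAI's Python SDK (it assumes `pip install openai` and an `OPENAI_API_KEY` set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="dog",
)
vector = response.data[0].embedding  # a plain Python list of floats
print(len(vector))  # 1536 for text-embedding-3-small
```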
There are 2 types of Search:
- Lexical search: matches exact words/characters
- Semantic search: matches meaning/intent
Vector embeddings enable semantic search, as the sketch below shows.
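A minimal sketch of semantic search, using hand-made toy vectors in place of a real embedding model: the query "dog" matches documents about puppies and dog food even though none of the titles contain the word "dog".

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy, hand-made embeddings standing in for a real model's output
corpus = {
    "puppy training tips":  np.array([0.88, 0.79, 0.12]),
    "used car prices":      np.array([0.12, 0.18, 0.91]),
    "best chew toy brands": np.array([0.90, 0.70, 0.20]),
}
query_vec = np.array([0.90, 0.80, 0.10])  # pretend this is the embedding of "dog"

# Rank documents by closeness to the query, highest similarity first
ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
for title, vec in ranked:
    print(title, round(cosine(query_vec, vec), 3))
```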
How are tokenisation and vector embeddings connected?
text --> tokenisation --> token IDs --> embedding lookup --> vectors --> transformer
"hello" --> [221728] --> [0.21,-0.44,...]