
Suraj Bera

Part 2: Vector Embeddings in simplest terms

This is Day 2 of my journey learning AI fundamentals, where I will cover the following concepts:

  • Vector Embeddings
  • How Tokenisation and Vector Embeddings relate to each other

Vector embeddings:

  • Vector embedding is the process of turning each token ID (generated during tokenisation) into a high-dimensional vector, where semantic similarity results in geometric closeness. Think of it like this: dog is close to puppy and also close to dog food, but dog is not close to car or petrol (see the sketch below).
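
Here is a minimal sketch of what "geometric closeness" means, using invented 3-dimensional vectors (real embeddings have far more dimensions) and plain cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: close to 1.0 means same direction, close to 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions
dog   = [0.90, 0.80, 0.10]
puppy = [0.85, 0.75, 0.15]
car   = [0.10, 0.20, 0.90]

print(cosine_similarity(dog, puppy))  # high: semantically close
print(cosine_similarity(dog, car))    # low: semantically far
```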

When do we use embeddings?

  1. Recommendations: suggest similar songs, videos, movies, and products
  2. Search: get relevant results even when keywords don't match exactly
  3. Clustering: group related concepts together
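
All three use cases boil down to the same operation: rank items by how close their vectors are to a query vector. A toy sketch, with invented item names and vectors:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Invented item vectors; a real system would get these from an embedding model
catalog = {
    "puppy":    [0.85, 0.75, 0.15],
    "dog food": [0.70, 0.60, 0.30],
    "car":      [0.10, 0.20, 0.90],
    "petrol":   [0.05, 0.15, 0.95],
}

query = [0.9, 0.8, 0.1]  # imagine this is the embedding of "dog"

# Rank every item by similarity to the query and keep the top 2
ranked = sorted(catalog, key=lambda name: cosine_similarity(query, catalog[name]), reverse=True)
print(ranked[:2])  # ['puppy', 'dog food']: recommendations for "dog"
```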

A beginner might be confused by terms like 'vector' and 'high dimensional'.

  • This is an example of a vector: [0.9, 0.8, 0.1]. Array, list, and vector all mean the same thing here: 'list' is the plain-English word, 'array' is the programming term, and 'vector' is the math/ML term.
  • High dimensional: 'multi-dimensional' just means more than one dimension (could be 2D, 3D, 10D, ...), which is too vague. 'High dimensional' in the embedding world means hundreds or thousands of dimensions (OpenAI's text-embedding-3-small produces 1536-dimensional vectors), as the sketch below shows.
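
For the curious, here is a sketch of fetching a real 1536-dimensional embedding with OpenAI's official Python SDK; it assumes the `openai` package (v1+) is installed and an `OPENAI_API_KEY` is set in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="dog",
)
vector = response.data[0].embedding

print(len(vector))  # 1536, the dimensionality mentioned above
print(vector[:5])   # first few floats of the embedding
```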

There are two types of search:

  1. Lexical search: matching exact words/characters
  2. Semantic search: matching meaning/intent

Vector embeddings are what enable semantic search.
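
A toy contrast of the two search types, with an invented `embeddings` lookup table standing in for a real embedding model:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

docs = ["my dog loves the park", "fuel prices rose again"]
query = "canine companion"

# Lexical search: exact word overlap finds nothing here
lexical_hits = [d for d in docs if any(w in d.split() for w in query.split())]
print(lexical_hits)  # []

# Semantic search: compare meaning via (toy) embeddings instead
embeddings = {
    "canine companion":       [0.90, 0.80, 0.10],
    "my dog loves the park":  [0.85, 0.70, 0.20],
    "fuel prices rose again": [0.10, 0.20, 0.90],
}
best = max(docs, key=lambda d: cosine_similarity(embeddings[query], embeddings[d]))
print(best)  # "my dog loves the park", matched by meaning rather than keywords
```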

How are tokenisation and vector embeddings connected?

text --(tokenisation)--> token IDs --(embedding lookup)--> vectors --> transformer
"hello" --> [221728] --> [0.21, -0.44, ...]

Vector Embedding Visualizer
