This is Day 2 of my AI fundamentals learning, where I cover the following concepts:
- Vector Embeddings
- How Tokenisation and Vector Embeddings relate to each other
Vector embeddings:
- Vector embedding is the process of turning each token ID (generated during tokenisation) into a high-dimensional vector, where semantic similarity results in geometric closeness. Think of it like this: 'dog' is close to 'puppy' and also close to 'dog food', but 'dog' is not close to 'car' or 'petrol'.
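To make "geometric closeness" concrete, here is a minimal sketch using made-up 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions); cosine similarity is the usual measure of closeness:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: ~1.0 = very similar, ~0 = unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings, just for illustration
dog   = np.array([0.90, 0.80, 0.10])
puppy = np.array([0.85, 0.75, 0.15])
car   = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(dog, puppy))  # high  -> semantically close
print(cosine_similarity(dog, car))    # low   -> semantically distant
```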
When do we use embeddings?
- Recommendations: Suggest similar songs, videos, movies, products
- Search: Return relevant results even when the exact keywords don't match
- Clustering: Grouping related concepts together
A beginner might be confused by terms like 'vector' and 'high-dimensional'.
- This is an example of a vector: [0.9, 0.8, 0.1]. Array/list/vector all mean the same thing: 'list' is just plain English, 'array' is the programming term, and 'vector' is the math/ML term.
- High-dimensional: 'Multi-dimensional' just means more than 1 dimension - could be 2D, 3D, 10D, ... which is too vague. 'High-dimensional' in this context means hundreds or thousands of dimensions (OpenAI's text-embedding-3-small = 1536 dimensions).
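You can check that dimension count yourself. Here is a minimal sketch using OpenAI's Python SDK (it assumes `pip install openai` and an `OPENAI_API_KEY` set in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="dog",
)
vector = response.data[0].embedding  # a plain Python list of floats
print(len(vector))  # 1536 for text-embedding-3-small
```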
There are 2 types of Search:
- Lexical search: matches exact words/characters
- Semantic search: matches meaning/intent
Vector embeddings enable semantic search, as the sketch below shows.
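A minimal sketch of semantic search, using hand-made toy vectors in place of a real embedding model: the query "dog" matches documents about puppies and dog food even though none of the titles contain the word "dog".

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy, hand-made embeddings standing in for a real model's output
corpus = {
    "puppy training tips":  np.array([0.88, 0.79, 0.12]),
    "used car prices":      np.array([0.12, 0.18, 0.91]),
    "best chew toy brands": np.array([0.90, 0.70, 0.20]),
}
query_vec = np.array([0.90, 0.80, 0.10])  # pretend this is the embedding of "dog"

# Rank documents by closeness to the query, highest similarity first
ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
for title, vec in ranked:
    print(title, round(cosine(query_vec, vec), 3))
```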
How are tokenisation and vector embeddings connected?
text --> tokenisation --> token IDs --> embedding lookup --> vectors --> transformer
"hello" --> [221728] --> [0.21,-0.44,...]