What is Embedding?
After text is split into chunks, the next step is embedding. In this step, each chunk is converted into a vector (a point in a high-dimensional vector space) so that semantic search can be performed efficiently.
Why Do We Need to Convert Chunks into Vectors?
A RAG application depends on semantic search to find the chunks that are relevant to a query, and vectors are what make that search possible.
Semantic
Example
The word feline is related to the cat family, even though the words are different. Understanding that “feline” and “cat” are related is called semantic understanding.
Similarity
When a user asks a query, semantically related chunks are returned even though the exact words in the chunks may be different.
Semantic Similarity
Semantic similarity combines:
- Intent
- Context
- Meaning
The purpose is to establish relationships between the user query and the documents stored in the RAG system. This allows the system to retrieve relevant information from the database and provide it to the LLM for further processing.
Texts that are semantically related are usually mapped to vectors that lie closer together in multi-dimensional vector space.
Cosine Similarity
To determine how close vectors are to each other, cosine similarity is commonly used.
When a user query arrives:
- The query is converted into a vector
- Cosine similarity is calculated between the query vector and stored vectors
- The closest vectors are retrieved
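The steps above can be sketched in a few lines of plain Python. The 3-dimensional vectors here are made up for illustration; a real embedding model would produce them, and the chunk texts and `query_vector` are assumptions, not output from any actual model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real models produce hundreds of dimensions.
stored = {
    "cats are small felines": [0.9, 0.1, 0.0],
    "stock prices fell today": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of the user query "tell me about cats".
query_vector = [0.8, 0.2, 0.1]

# Score every stored chunk against the query and keep the closest one.
best_chunk = max(
    stored,
    key=lambda chunk: cosine_similarity(query_vector, stored[chunk]),
)
```

With these toy values, `best_chunk` is the chunk about cats, because its vector points in almost the same direction as the query vector.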
Retrieval Methodologies
Two major retrieval methodologies are used:
1. KNN (K-Nearest Neighbors)
KNN compares the query vector with all stored vectors one by one to find the nearest neighbors.
Advantage
Exact retrieval: the true nearest neighbors are always found
Disadvantage
Slow for very large datasets
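A minimal sketch of exact KNN search, assuming cosine similarity as the score and a small in-memory dictionary of made-up vectors. The function name `knn_search` and the chunk ids are illustrative, not part of any library.

```python
import heapq
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_search(query, vectors, k):
    """Exact search: score every stored vector, then keep the k best."""
    scored = (
        (cosine_similarity(query, vec), chunk_id)
        for chunk_id, vec in vectors.items()
    )
    return heapq.nlargest(k, scored)

# Hypothetical 2-dimensional embeddings.
vectors = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.7, 0.7],
    "chunk-c": [0.0, 1.0],
    "chunk-d": [-1.0, 0.0],
}

top_two = knn_search([1.0, 0.1], vectors, k=2)
```

Every stored vector is scored once per query, which is why the cost grows in step with the size of the collection.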
2. ANN (Approximate Nearest Neighbors)
ANN builds an index (for example, a graph such as HNSW or clusters such as IVF) and searches only a subset of the stored vectors instead of comparing every single point.
This method is mainly used when:
- The document volume is huge
- Faster retrieval is required
- Time constraints exist
ANN improves retrieval speed while sacrificing a small amount of accuracy.
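One simple way to see the idea is random-hyperplane hashing, a form of locality-sensitive hashing. This is only a sketch of one ANN technique, not what any particular vector database does; the function names and the toy vectors are assumptions made for the example.

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dim, seed=0):
    """Random hyperplanes through the origin, fixed by a seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def bucket_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0
        for plane in hyperplanes
    )

def build_index(vectors, hyperplanes):
    """Group chunk ids by bucket so a query only has to look in one bucket."""
    index = defaultdict(list)
    for chunk_id, vec in vectors.items():
        index[bucket_key(vec, hyperplanes)].append(chunk_id)
    return index

def candidates(query, index, hyperplanes):
    """Return only the chunks in the query's bucket; neighbors that
    landed in other buckets are missed, which is the accuracy cost."""
    return index.get(bucket_key(query, hyperplanes), [])

# Hypothetical 3-dimensional embeddings.
vectors = {
    "chunk-a": [1.0, 0.0, 0.2],
    "chunk-b": [0.9, 0.1, 0.3],
    "chunk-c": [-1.0, 0.5, -0.2],
}
planes = make_hyperplanes(num_planes=4, dim=3)
index = build_index(vectors, planes)

# This query points in the same direction as chunk-a, so it shares its bucket.
found = candidates([2.0, 0.0, 0.4], index, planes)
```

The candidates would then be ranked with an exact measure such as cosine similarity. Vectors pointing in similar directions tend to share a bucket, but this is not guaranteed, and that is the small loss of accuracy traded for speed.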
Why Cosine Similarity Instead of Sine or Tangent?
Cosine similarity works effectively because:
If two vectors point in nearly the same direction (highly related), the cosine similarity approaches 1. As the angle between the vectors increases, the cosine similarity decreases, meaning the vectors are less related.
Why Not Sine or Tangent?
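Neither function maps cleanly onto similarity:
- Sine is close to 0 for small angles, so the most similar vectors would get the lowest scores. It is also 0 at 180 degrees, so identical and opposite vectors could not be told apart
- Tangent is unbounded: it grows without limit as the angle approaches 90 degrees and changes sign beyond it
Cosine, by contrast, stays between -1 and 1, decreases steadily as the angle grows from 0 to 180 degrees, and can be computed directly as the dot product of the two vectors divided by the product of their lengths. This makes it a more reliable way to measure semantic closeness between vectors.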
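The behavior of the three functions can be checked numerically at a few angles. This is a small illustration using Python's `math` module; the helper name `angle_scores` is made up for the example.

```python
import math

def angle_scores(degrees):
    """Cosine, sine, and tangent of the angle between two vectors."""
    theta = math.radians(degrees)
    return {
        "cos": math.cos(theta),
        "sin": math.sin(theta),
        "tan": math.tan(theta),
    }

same_direction = angle_scores(0)        # identical meaning
opposite_direction = angle_scores(180)  # opposite direction
nearly_orthogonal = angle_scores(89.9)  # almost unrelated

for name, scores in [
    ("0 degrees", same_direction),
    ("180 degrees", opposite_direction),
    ("89.9 degrees", nearly_orthogonal),
]:
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

Cosine separates the first two cases cleanly (1 versus -1), sine gives practically the same value for both, and tangent becomes very large just before 90 degrees.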
Embedding Dimensions
Embedding models can generate vectors with dimensions ranging from 256 to 3000 or more.
The dimension size depends on the embedding model and the amount of contextual information it captures.
Generally:
- Higher dimensions can capture richer semantic information but need more storage and compute
- Lower dimensions are faster and cheaper but may lose context
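The cost side of this trade-off is easy to estimate. The sketch below assumes each value is stored as a 4-byte float and ignores any index overhead; the collection size of one million chunks and the two dimension counts are only example figures.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming 4-byte floats."""
    return num_vectors * dimensions * bytes_per_value

small = index_size_bytes(1_000_000, 384)    # a compact model
large = index_size_bytes(1_000_000, 3072)   # a large model

print(f"384 dimensions:  {small / 1e9:.2f} GB")
print(f"3072 dimensions: {large / 1e9:.2f} GB")
```

Under these assumptions the larger vectors take eight times the storage, and every similarity calculation touches eight times as many numbers.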
Types of Embedding Models
The right embedding model depends on the application scenario.
1. Based on Query Type
Symmetric Models
Symmetric embedding models are used when the query and the documents are similar in structure and length.
Examples
nomic-embed-text
Qwen embeddings
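These are commonly used in semantic search systems. Many embedding models, nomic-embed-text among them, also accept task prefixes that distinguish queries from documents, so the split between symmetric and asymmetric is often a matter of how the model is used.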
Asymmetric Models
Asymmetric embedding models are used when:
- Queries are short
- Documents are long
Example
Google Gemini embedding models
These models are optimized for retrieving long documents from short queries.
2. Based on Retrieval Type
Dense Embeddings
Dense embeddings mainly focus on semantic meaning.
These embeddings generate dense vectors where most values contain meaningful information.
Examples
Cohere embedding models
OpenAI text-embedding-3 models
Advantage
Better semantic understanding
Sparse Embeddings
Sparse embeddings mainly focus on exact keyword matching.
A common choice is the BM25 (Best Match 25) ranking function, which builds on the following two quantities and adds document-length normalization:
- TF (Term Frequency)
- IDF (Inverse Document Frequency)
TF-IDF Concepts
TF (Term Frequency)
Measures how many times a word appears in a document.
IDF (Inverse Document Frequency)
Measures how rare, and therefore how informative, a word is across the entire document collection. Words that appear in most documents are considered less important.
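A minimal sketch of the two quantities on a toy corpus. This is plain TF-IDF, not BM25 itself, and it uses one common variant of each formula (length-normalized TF and a logarithmic IDF); the documents and function names are made up for the example.

```python
import math
from collections import Counter

documents = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
]
tokenized = [doc.split() for doc in documents]

def term_frequency(term, tokens):
    """Share of the document's words that are this term."""
    return Counter(tokens)[term] / len(tokens)

def inverse_document_frequency(term, corpus):
    """Log of (number of documents / documents containing the term)."""
    containing = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / containing) if containing else 0.0

def tf_idf(term, tokens, corpus):
    return term_frequency(term, tokens) * inverse_document_frequency(term, corpus)

first_doc = tokenized[0]
for term in ("the", "cat", "mat"):
    print(term, round(tf_idf(term, first_doc, tokenized), 3))
```

The word "the" appears in every document, so its IDF is zero and it contributes nothing to the score, while "mat" appears in only one document and scores highest of the three.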
Transformer Architecture
The transformer architecture was a major breakthrough for LLMs.
The original transformer design contains two main parts:
- Encoder
- Decoder
Encoder
The encoder converts input text into contextual embeddings (vectors). Many embedding models are built on encoder-style transformers.
Decoder
The decoder generates human-readable text, one token at a time, from these internal representations. Many modern LLMs use only the decoder part.
This architecture enables modern LLMs to understand and generate natural language effectively.
Choosing a Vector Database
Chroma
Open source
Easy to set up
Suitable for basic and small-scale applications
FAISS
A similarity search library (Facebook AI Similarity Search) rather than a full database server
Better for large document collections
Optimized for high-performance semantic search
Commonly used in production-scale retrieval systems