Dense embeddings have continuous numeric values, i.e. values with digits after the decimal point. Each chunk is converted to an embedding, and each embedding is a vector of numbers like [0.3455566, 0.6777779, ...]. The generated vectors are plotted in a space called the latent space. Almost every component is non-zero, so a dense vector is not dominated by values like 0.
Sparse embeddings mostly have values of 0. Rather than semantic meaning, they capture the frequency or importance of words in a text.
Ex: one-hot encoding
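To make the dense vs. sparse difference concrete, here is a minimal sketch in plain Python. The vocabulary, the helper names (`one_hot`, `sparsity`) and the dense values are made up for illustration; they do not come from a real embedding model.

```python
# Toy vocabulary (an assumption); a real one would have many thousands of entries.
vocabulary = ["cat", "dog", "fish", "bird", "horse"]

def one_hot(word, vocab):
    """Return a vector with 1 at the word's index and 0 everywhere else."""
    return [1 if word == entry else 0 for entry in vocab]

def sparsity(vector):
    """Fraction of components that are exactly zero."""
    return sum(1 for value in vector if value == 0) / len(vector)

sparse_vec = one_hot("fish", vocabulary)

# Made-up dense values, for illustration only (not produced by a real model).
dense_vec = [0.3455566, 0.6777779, -0.1200431, 0.0519872, -0.4410023]

print(sparse_vec)            # [0, 0, 1, 0, 0]
print(sparsity(sparse_vec))  # 0.8
print(sparsity(dense_vec))   # 0.0
```

With a realistic vocabulary size the one-hot vector would be almost entirely zeros, while the dense vector stays short and fully populated.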
Models for dense embeddings
1. LLM
- Embed-only LLMs are also available. The sole purpose of these LLMs is to generate embeddings. Ex: Nomic Embed, BGE.
- We can also prompt a general-purpose LLM to generate embeddings, but this is a costly operation.
2. Transformers (encoder)
Ex: MiniLM, Nomic transformers
These models are available on Hugging Face and Ollama, which also host other models.
How can we evaluate the performance of a RAG system?
For a given user query, a RAG system returns a set of matching documents. If the returned documents match our expectations, we can say it is yielding good results. Say we expect the RAG system to return documents a, b, c, d, e for a user query, and in reality it returns only a, b, d. 3 of the 5 expected documents are returned, so it meets 60% of our expectation (this ratio is commonly called recall). Just as we write unit test cases for software code, we need to write test cases for user queries to evaluate RAG systems.
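The a, b, c, d, e example can be written as a small test-style check. This is only a sketch: `recall`, `test_cases` and `fake_retriever` are hypothetical names, and `fake_retriever` stands in for whatever retrieval call the real RAG system exposes.

```python
def recall(expected, retrieved):
    """Fraction of the expected documents that were actually retrieved."""
    expected_set = set(expected)
    if not expected_set:
        raise ValueError("expected must contain at least one document")
    return len(expected_set & set(retrieved)) / len(expected_set)

# Hypothetical test cases: each query is paired with the documents we expect back.
test_cases = {
    "example query": ["a", "b", "c", "d", "e"],
}

def fake_retriever(query):
    """Stand-in for the real retriever; returns the documents from the example."""
    return ["a", "b", "d"]

for query, expected in test_cases.items():
    score = recall(expected, fake_retriever(query))
    print(f"{query!r}: recall = {score:.2f}")  # 'example query': recall = 0.60
```

In practice the dictionary would hold many queries, and a minimum acceptable recall could be asserted for each one, the same way a unit test asserts an expected value.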
Should the same embedding model be used throughout the RAG pipeline?
Yes. If we use nomic-embed-text for document vectorisation, then the same model should be used for query vectorisation as well. Suppose we use different models (one for document vectorisation and one for query vectorisation). The document vectors will then generally be plotted in one space and the query vector in another, so the distances between them are not meaningful. To avoid this, we need to use the same embedding model throughout the pipeline.
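A toy illustration of the problem, assuming two made-up "models" that simply count words but assign them to different axes. Real embedding models differ in far more ways (often even in vector length), so treat this only as a sketch of why vectors from different models are not comparable. `VOCAB_A`, `VOCAB_B`, `embed` and `cosine` are hypothetical names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two toy "models": same words, but each word lives on a different axis.
VOCAB_A = ["cat", "dog", "fish"]
VOCAB_B = ["fish", "cat", "dog"]

def embed(text, vocab):
    """Count how often each vocabulary word occurs in the text."""
    words = text.lower().split()
    return [words.count(term) for term in vocab]

doc = "cat cat dog"
query = "cat cat dog"  # identical text, so a perfect match is expected

same_model = cosine(embed(doc, VOCAB_A), embed(query, VOCAB_A))
mixed_models = cosine(embed(doc, VOCAB_A), embed(query, VOCAB_B))

print(round(same_model, 2))    # 1.0
print(round(mixed_models, 2))  # 0.4
```

Even though the document and the query are the same text, mixing the two toy models makes them look dissimilar, which is the mismatch described above.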