RAG stands for Retrieval Augmented Generation. Why do we even need RAG? To answer this, let's take a look at what LLMs and SLMs are.
LLM (Large Language Model). Data across several categories (generalized) is given as input, and from that a model is created. What is a model? To understand this, let's take the mathematical equation of a straight line:

y = mx + c

Let's take the x values to be 1, 2, 3, 4, 5 and the y values to be 2, 4, 6, 8, 10. We can pick values for m and c (here m = 2, c = 0) so the equation produces our desired y values. Instead of a simple linear equation, we could also use quadratic, cubic, or higher-order equations (with terms like x^2, x^3, etc.). When we say a model has 4B or 120B parameters, it refers to one big equation with that many tunable values. Using the input data, such a mathematical equation is created. The bigger the equation and the more data the model is trained on, the more relevant and accurate its results tend to be.
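The straight-line idea above can be sketched in a few lines of code. This toy example (not an LLM, just an illustration of "learning parameters from data") recovers m and c from the x/y values using the closed-form least-squares formulas:

```python
# A "model" is just an equation with learned parameters.
# Here the data follows y = 2x (so m = 2, c = 0), and we recover
# m and c from the data with a least-squares fit.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Closed-form least-squares slope and intercept
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
c = mean_y - m * mean_x

print(m, c)        # learned parameters: 2.0, 0.0
print(m * 6 + c)   # predict y for a new x = 6 -> 12.0
```

An LLM does the same thing at an enormous scale: billions of parameters instead of two, tuned so the equation fits the training text.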
LLMs predict the next word. If we give "hello", it may give "hello world". We can control how the output is generated by the LLM, making it more factual or more imaginative, using a setting called temperature. The lower the temperature, the more deterministic and factual the output will be. The higher the temperature, the more varied and imaginative the output becomes.
Temperature is set per query, so each request can use its own value.
SLM (Small Language Model)
Instead of training a model on vast amounts of data across all categories, training it on data from a specific domain to solve a set of tasks in that domain (like speech-to-text generation) gives what is called a small language model.
Think of it like this: LLMs are generic and SLMs are specific.
If we ask an LLM a question based on the data it was trained on, we will get a good result. But if we ask a question that is outside the scope of its training data, it will still try to answer, i.e., it makes up an answer on its own. This is called hallucination. (It won't say "I don't know" unless we explicitly prompt it to.)
Analogy: let's take the GPT-OSS model (released around 2025). If we ask the model now about the Iran-Israel war, it won't know about it, because the war falls outside the data it was trained on.
In the same way, think about this: in our company, we have data stored in docs, wikis, etc. Models out there (Gemini, Claude) won't know about it. If we can somehow link an LLM with our private data, we can use that LLM internally in our company or for personal use. This is called RAG: linking an LLM with our data and then asking the LLM questions about that data.
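The linking idea can be sketched end to end. In this minimal sketch, the documents, the word-overlap scoring, and the prompt format are all simplified stand-ins (a real RAG system retrieves with embeddings and a vector DB, covered below), but the shape of the pipeline is the same: retrieve the most relevant private document, then put it into the prompt sent to the LLM.

```python
# Minimal RAG sketch: retrieve the best-matching private document,
# then build a prompt that grounds the LLM's answer in that document.
docs = [
    "Our leave policy allows 20 paid days off per year.",
    "Deployments happen every Friday via the internal CI pipeline.",
]

def retrieve(question, documents):
    # Score each document by how many question words it shares (toy scoring).
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def build_prompt(question, context):
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How many paid leave days do we get per year?"
context = retrieve(question, docs)
prompt = build_prompt(question, context)
print(context)   # the leave-policy document is selected
# `prompt` would now be sent to an LLM via whatever API you use
```

The key point: the LLM is never retrained; relevant private data is fetched at query time and injected into the prompt.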
One approach to get an LLM to answer queries on private data is to train the LLM on that private data. This is one way, but not the only way.
Another way is to upload documents into a vector DB. Before getting deep into this, let's first ask: what is a vector? Something that has direction and magnitude. For our case, we won't be dealing with direction, only magnitude.
We will break the document into several chunks, convert each chunk into a point, and plot those points on a graph. Let's just plot apple, orange, pear, and doctor as points on a graph. Which two points are relevant here? Apple and doctor ("an apple a day keeps the doctor away"), because words that appear together in text end up close to each other on the graph.
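The words-as-points idea can be made concrete with cosine similarity. The vectors below are invented by hand purely so that apple and doctor land close together, mirroring the proverb above; real embeddings are learned from large amounts of text, not hand-written.

```python
import math
from itertools import combinations

# Toy "plot words as points": each word is a hand-made 2D vector.
# Real systems learn these embedding vectors from text.
embeddings = {
    "apple":  [0.90, 0.40],
    "orange": [0.30, 0.90],
    "pear":   [0.45, 0.80],
    "doctor": [0.88, 0.42],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means the points lie in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Find the most similar pair of words
best = max(combinations(embeddings, 2),
           key=lambda pair: cosine(embeddings[pair[0]], embeddings[pair[1]]))
print(best)   # ('apple', 'doctor')
```

A vector DB does exactly this search, just over millions of chunk vectors instead of four hand-made ones.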