Building a RAG pipeline involves two major processes:
- RAG Indexing
- RAG Query
RAG INDEXING
The indexing phase converts raw documents into structured vector representations so they can be efficiently retrieved using similarity search later.
Architecture diagram
1) Document ingestion and preprocessing
The process starts with ingesting the raw data, cleaning it, and converting it into a consistent format, i.e., transforming raw data from the Bronze layer to the Gold layer (in medallion-architecture terms).
This is the very first and most crucial step, and it deserves proper care before moving to the next stages.
Example: suppose the raw data arrives as bullet points like this:
RAW DATA
INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world
• MACHINE learning helps in prediction
• tools like PYTHON , R , SQL are used
AFTER PREPROCESSING AND NORMALISATION
Section: Introduction to Data Science
Content: Data is everywhere in today's world. Machine learning helps in prediction. Tools like Python, R, and SQL are used.
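A minimal Python sketch of that normalisation step (the cleaning rules below are my own illustration; real pipelines tune them to their data):

import re

raw = """INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world
• MACHINE learning helps in prediction
• tools like PYTHON , R , SQL are used"""

lines = raw.splitlines()
title = lines[0].strip(" !").title()           # "Introduction To Data Science"
body = []
for line in lines[1:]:
    text = line.lstrip("• ").strip()           # drop bullet markers
    text = re.sub(r"\s+,", ",", text)          # fix spaces before commas
    text = text[0].upper() + text[1:].lower()  # normalise shouting case
    # (proper-noun casing like "Python" would need a lookup table in practice)
    body.append(text.rstrip(".") + ".")

print(f"Section: {title}")
print("Content: " + " ".join(body))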
2) Chunking
Chunking means breaking large text into smaller pieces so the computer can understand and search it more effectively.
Imagine you have a 5000-page book and you want to perform Q&A on top of it. To process the context properly, you split the text content based on:
- Topics
- Headings
- Paragraphs
- Recursive patterns using delimiters like "\n\n" and "."
- Other chunking strategies
Once the chunks are ready with metadata such as:
- chunk_id
- chunk_index
they are stored in a chunking.json file (or Parquet for large-scale data), or fed directly from memory into the embedding model.
Example: chunking.json file:
[
  {
    "chunk_id": "ml_intro_chunk_0001",
    "chunk_index": 0,
    "doc_id": "machine_learning_basics",
    "section": "Introduction to Machine Learning",
    "content": "What is AI, Types of Algorithms",
    "page_start": 1,
    "page_end": 1,
    "char_start": 0,
    "word_count": 6,
    "language": "en"
  }
]
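Here is a minimal sketch of paragraph-based chunking that produces records in the shape shown above (the splitting rule is deliberately simple, and the exact ID format is illustrative):

import json

def chunk_by_paragraph(doc_id, text, section):
    """Split text on blank lines and attach metadata to each chunk.
    (A deliberately simple splitter; real pipelines often use recursive
    or token-aware strategies.)"""
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {
            "chunk_id": f"{doc_id}_chunk_{i:04d}",
            "chunk_index": i,
            "doc_id": doc_id,
            "section": section,
            "content": para,
            "word_count": len(para.split()),
            "language": "en",
        }
        for i, para in enumerate(paras)
    ]

chunks = chunk_by_paragraph(
    "machine_learning_basics",
    "What is AI, Types of Algorithms\n\nHistory of Computers",
    "Introduction to Machine Learning",
)
with open("chunking.json", "w") as f:
    json.dump(chunks, f, indent=2)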
3) Embeddings
This is where the real magic happens: all the text is converted into numbers (vectors) so that computers can work with its meaning.
Let’s say we embed the following sentence into a 3D space (real-world embedding models use far more, typically several hundred to a few thousand dimensions):
The dog and cat are friends
As shown in the image, the vectors for the words "dog" and "cat" point in similar directions (the cosine angle between them is small), while the vector for the word "cricket" points in a noticeably different direction.
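You can check this intuition with a tiny cosine-similarity computation (the 3D vectors below are made-up numbers, purely for illustration; real embeddings come from a model):

import math

# Toy 3D embeddings (made-up values purely for illustration).
vectors = {
    "dog":     [0.9, 0.8, 0.1],
    "cat":     [0.8, 0.9, 0.2],
    "cricket": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["dog"], vectors["cat"]))      # ~0.99 -> very similar
print(cosine(vectors["dog"], vectors["cricket"]))  # ~0.30 -> dissimilar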
At this stage, all document chunks are converted into vector embeddings and stored inside the vector database.
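As a sketch, here is what that step can look like with the sentence-transformers library (one popular open-source option; a plain NumPy matrix stands in for a real vector database like FAISS or Chroma):

import numpy as np
from sentence_transformers import SentenceTransformer

# Chunk texts from the earlier example (normally loaded from chunking.json).
texts = ["What is AI, Types of Algorithms", "History of Computers"]

model = SentenceTransformer("all-MiniLM-L6-v2")   # produces 384-dimensional vectors
embeddings = model.encode(texts, normalize_embeddings=True)

# A plain NumPy matrix stands in for a real vector database here.
index = np.asarray(embeddings)
print(index.shape)  # (2, 384)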
The indexing phase is now complete. The system has built a searchable semantic space.
Now let’s understand how the system responds when a user submits a query.
RAG QUERY
When a user submits a query, it is first converted into an embedding using the same model used during indexing. The most similar chunks are then retrieved from the vector database and passed to an LLM for reasoning and answer generation.
Query Processing Diagram
Step 1: Convert User Query to Embedding
Convert the user query into a vector embedding using the same model that was used to embed and store the documents.
Step 2: Similarity Search
Once the user query is converted into a vector, it is compared with all the stored document vectors in the vector database.
Using cosine similarity, the system measures how close the query vector is to each document vector. The closest ones (top-k results) are selected and sent to the LLM.
Example: suppose the user asks:
What are the types of algorithms?
The system compares this query vector with stored chunks like:
- What is AI
- Types of Algorithms
- History of Computers
The chunk Types of Algorithms will have the highest similarity score, so it gets selected and passed to the LLM for generating the final answer.
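A minimal sketch of this retrieval step, reusing sentence-transformers from the indexing side (the exact scores depend on the model):

import numpy as np
from sentence_transformers import SentenceTransformer

chunks = ["What is AI", "Types of Algorithms", "History of Computers"]

model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = model.encode(chunks, normalize_embeddings=True)
query_vec = model.encode(["What are the types of algorithms?"],
                         normalize_embeddings=True)[0]

# With unit-length vectors, cosine similarity reduces to a dot product.
scores = doc_vecs @ query_vec
top_k = np.argsort(scores)[::-1][:2]           # indices of the 2 best chunks
for i in top_k:
    print(f"{scores[i]:.3f}  {chunks[i]}")     # "Types of Algorithms" should rank first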
Step 3: LLM Response Generation
The LLM takes:
- The original user query
- The retrieved document chunks (if found)
It appends the retrieved content to the query as context and generates the final answer.
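Putting step 3 together as a sketch (the prompt template is illustrative, and call_llm is a placeholder for whichever LLM API you use):

def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Append retrieved context to the user query.
    (The template wording is illustrative; prompt formats vary by model.)"""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = build_prompt(
    "What are the types of algorithms?",
    ["Types of Algorithms: supervised, unsupervised, and reinforcement learning."],
)

# `call_llm` is a placeholder for whichever LLM API you use
# (OpenAI, Anthropic, a local model, and so on).
# answer = call_llm(prompt)
print(prompt)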
I’m currently learning more about RAG and Agentic AI step by step. If this helped you understand the pipeline better, feel free to like or follow for more as I share my journey.