From Documents to Answers: How RAG Works

IFRAH ASHRAF — Sun, 22 Feb 2026 18:48:17 +0000

Building a RAG pipeline involves two major processes:

RAG Indexing
RAG Query

RAG INDEXING

The indexing phase converts raw documents into structured vector representations so they can be efficiently retrieved using similarity search later.

Architecture diagram

1) Document ingestion and preprocessing

The first process starts with ingesting the data, cleaning it, and converting it into a consistent format. In medallion-architecture terms, this means moving the data from the raw Bronze layer to the curated Gold layer.

This first step is also the most crucial one: any noise left in the text here carries through chunking, embedding, and retrieval, so it deserves proper care before moving to the next stages.

Example: Suppose the raw data is in bullet points like this:

RAW DATA

INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world  
• MACHINE learning helps in prediction  
• tools like PYTHON , R , SQL are used

AFTER PREPROCESSING AND NORMALISATION

Section: Introduction to Data Science  
Content: Data is everywhere in today's world. Machine learning helps in prediction. Tools like Python, R, and SQL are used.
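The cleanup above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a heading on the first line followed by bullet lines. The function names and the CANONICAL allowlist are hypothetical, and a real pipeline would need far more robust rules. Note that this sketch does not insert the "and" before the last list item, so its output differs slightly from the hand-written version above.

```python
import re

# Hypothetical allowlist of terms whose casing should be preserved.
CANONICAL = {"python": "Python", "r": "R", "sql": "SQL"}

def clean_heading(line: str) -> str:
    """Drop trailing punctuation and title-case the heading."""
    text = line.strip().rstrip("!?. ").title()
    # Keep short connecting words lowercase when they are not the first word.
    return re.sub(r"(?<=\s)(To|Of|And|In)\b", lambda m: m.group(0).lower(), text)

def clean_bullet(line: str) -> str:
    """Turn one bullet line into a normalised sentence."""
    text = line.strip().lstrip("•-* ").strip()
    text = re.sub(r"\s+,", ",", text)  # "PYTHON , R" -> "PYTHON, R"
    words = []
    for word in text.split():
        core = word.rstrip(",")
        tail = word[len(core):]
        if core.lower() in CANONICAL:
            core = CANONICAL[core.lower()]      # restore known casing
        elif core.isupper() and len(core) > 1:
            core = core.lower()                 # tone down shouted words
        words.append(core + tail)
    sentence = " ".join(words)
    if not sentence:
        return ""
    return sentence[0].upper() + sentence[1:] + "."

def normalize_section(raw: str) -> dict:
    """Return a {"section", "content"} record from a heading plus bullets."""
    lines = [ln for ln in raw.splitlines() if ln.strip()]
    heading, bullets = lines[0], lines[1:]
    return {
        "section": clean_heading(heading),
        "content": " ".join(clean_bullet(b) for b in bullets),
    }

raw = """INTRODUCTION TO DATA SCIENCE!!!
• DATA is everywhere in today's world
• MACHINE learning helps in prediction
• tools like PYTHON , R , SQL are used"""

record = normalize_section(raw)
print("Section:", record["section"])
print("Content:", record["content"])
```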

2) Chunking

Chunking means breaking large text into smaller pieces so the computer can understand and search it more effectively.

Imagine you have a 5000-page book and you want to perform Q&A on top of it. To process the context properly, you split the text content based on:

Topics.
Headings.
Paragraphs.
Recursive patterns using delimiters like "\n\n" and "."
Other chunking strategies.
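As a rough sketch of the recursive strategy from the list above: split on the coarsest delimiter first and fall back to finer ones only for pieces that are still too long. The function name, the max_chars limit, and the separator order are assumptions made for illustration, and this is not how any particular library implements it.

```python
def recursive_split(text: str, max_chars: int,
                    separators=("\n\n", ". ", " ")) -> list[str]:
    """Split on the coarsest separator first, recursing to finer ones
    only for pieces that are still longer than max_chars."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate      # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)   # the separator is dropped at the boundary
        if len(piece) <= max_chars:
            current = piece
        else:
            # This piece alone is too long: retry with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks

sample = ("Data is everywhere. Machine learning helps in prediction.\n\n"
          "Tools like Python, R, and SQL are used.")
chunks = recursive_split(sample, max_chars=40)
for chunk in chunks:
    print(len(chunk), repr(chunk))
```

With this sample, the first paragraph is too long for one chunk and gets split at the sentence boundary, while the second paragraph fits as is. Production splitters usually also add overlap between neighbouring chunks, which is left out here to keep the sketch short.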

Once the chunks are ready, each one is tagged with metadata such as:

chunk_id.
chunk_index.

The chunks are then stored in a chunk.json file (or Parquet for large-scale data), or they can be fed directly from memory into the embedding model.

Example: chunk.json file.

[
  {
    "chunk_id": "ml_intro_chunk_0001",
    "chunk_index": 0,
    "doc_id": "machine_learning_basics",
    "section": "Introduction to Machine Learning",
    "content": "What is AI, Types of Algorithms",
    "page_start": 1,
    "page_end": 1,
    "char_start": 0,
    "word_count": 6,
    "language": "en"
  }
]
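A record like the one above could be produced with a small helper along these lines. This is a sketch under assumptions: build_chunk_records and its parameters are hypothetical names, char_start is taken as a running offset over the chunk texts, and the page fields are left out because they depend on the source document's layout.

```python
import json

def build_chunk_records(doc_id: str, section: str,
                        chunks: list[str], prefix: str) -> list[dict]:
    """Attach metadata to each chunk so it can be traced back later."""
    records, offset = [], 0
    for index, content in enumerate(chunks):
        records.append({
            "chunk_id": f"{prefix}_chunk_{index + 1:04d}",  # 1-based, zero-padded
            "chunk_index": index,                           # 0-based position
            "doc_id": doc_id,
            "section": section,
            "content": content,
            "char_start": offset,        # assumption: running offset over chunks
            "word_count": len(content.split()),
            "language": "en",
        })
        offset += len(content)
    return records

records = build_chunk_records(
    doc_id="machine_learning_basics",
    section="Introduction to Machine Learning",
    chunks=["What is AI, Types of Algorithms"],
    prefix="ml_intro",
)

# Serialise to the same shape as chunk.json; write it to disk if needed.
serialized = json.dumps(records, indent=2)
print(serialized)
```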

3) Embeddings

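The real juice lies here: every chunk is converted into a list of numbers (a vector) that captures its meaning, so the computer can compare pieces of text by meaning rather than by exact wording.

Let’s say we embed the following sentence into a 3D space (real embedding models use far more dimensions, typically hundreds to a few thousand):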

The dog and cat are friends

As shown in the image, the vectors for the words "dog" and "cat" point in a similar direction: the angle between them is small, so their cosine similarity is high. In contrast, the vector for an unrelated word such as "cricket" points in a different direction from both "dog" and "cat".
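To make the idea of "similar direction" concrete, here is a minimal sketch of cosine similarity on 3D vectors. The three vectors are hand-picked for illustration and are not the output of any real embedding model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors chosen by hand, purely illustrative.
dog = [0.9, 0.8, 0.1]
cat = [0.8, 0.9, 0.2]
cricket = [0.1, 0.2, 0.9]

print("dog vs cat:    ", round(cosine_similarity(dog, cat), 2))      # close to 1
print("dog vs cricket:", round(cosine_similarity(dog, cricket), 2))  # much lower
```

A value near 1 means the vectors point almost the same way, and a value near 0 means they are nearly unrelated.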

At this stage, all document chunks are converted into vector embeddings and stored inside the vector database.

The indexing phase is now complete. The system has built a searchable semantic space.

Now let’s understand how the system responds when a user submits a query.

RAG QUERY

When a user submits a query, it is first converted into an embedding using the same model used during indexing. That embedding is then used to retrieve the most similar chunks from the vector database, and the retrieved results are passed to an LLM for reasoning and output generation.

Query Processing Diagram

Step 1: Convert User Query to Embedding

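Convert the user query into a vector embedding using the same model that was used to embed the documents during indexing. Vectors produced by different models are not comparable, so the query and the documents must live in the same vector space.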

Step 2: Similarity Search

Once the user query is converted into a vector, it is compared with all the stored document vectors in the vector database.

Using cosine similarity, the system measures how close the query vector is to each document vector. The closest ones (top-k results) are selected and sent to the LLM.

Example: Suppose the user asks:

What are the types of algorithms?

The system compares this query vector with stored chunks like:

What is AI
Types of Algorithms
History of Computers

The chunk "Types of Algorithms" will have the highest similarity score, so it gets selected and passed to the LLM for generating the final answer.
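The example above can be reproduced end to end with a small sketch. One loud assumption: toy_embed below is a simple word-count vector that stands in for a real embedding model, so it only captures word overlap, not meaning. The function names are hypothetical, and a real vector database would use an approximate index rather than scoring every chunk.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Stand-in for an embedding model: one word count per vocabulary entry."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocabulary]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Embed chunks and query the same way, then keep the k best scores."""
    vocabulary = sorted({word for chunk in chunks for word in tokenize(chunk)})
    index = [(chunk, toy_embed(chunk, vocabulary)) for chunk in chunks]
    query_vector = toy_embed(query, vocabulary)   # same "model" as indexing
    scored = [(cosine_similarity(query_vector, vec), chunk) for chunk, vec in index]
    return sorted(scored, reverse=True)[:k]

chunks = ["What is AI", "Types of Algorithms", "History of Computers"]
results = top_k("What are the types of algorithms?", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

Here "Types of Algorithms" ranks first because it shares the most words with the query. With a real embedding model, the ranking would instead reflect semantic closeness, even when no words overlap.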

Step 3: LLM Response Generation

The LLM takes:

The original user query
The retrieved document chunks (if found)

The retrieved chunks are placed in the prompt as context alongside the query, and the LLM generates the final answer from that combined input.
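A minimal sketch of how the query and the retrieved chunks might be combined into one prompt is shown below. The build_prompt name and the prompt wording are assumptions for illustration, and the actual LLM call is left out because it depends on the provider being used.

```python
def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Combine the user query with retrieved context into a single prompt."""
    if retrieved_chunks:
        context = "\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
        )
    else:
        # Nothing was retrieved: say so explicitly instead of leaving a gap.
        context = "(no relevant context found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = build_prompt("What are the types of algorithms?", ["Types of Algorithms"])
print(prompt)
```

Numbering the chunks makes it easier to ask the model to cite which piece of context it relied on.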

I’m currently learning more about RAG and Agentic AI step by step. If this helped you understand the pipeline better, feel free to like or follow for more as I share my journey.

DEV Community: IFRAH ASHRAF