MakenaKinyua

RAGs for beginners

Advances in Artificial Intelligence have made it possible for humans to interact with AI in new ways, such as holding coherent conversations with LLMs like GPT-4.

Despite these strides, one frustration when working with LLMs is querying them and getting responses that make no sense. For instance, in the medical field, you might ask about the mode of action of a new drug on the market. Instead of answering correctly, the model gives you information that is outdated or made up.

This is one of the limitations of traditional LLMs: they have knowledge gaps because they were trained on data with a fixed cutoff date. LLMs also tend to provide generic answers, and they hallucinate, confidently producing answers that are out of context or factually wrong. To address this, RAG systems were introduced.

RAG

RAG stands for Retrieval-Augmented Generation, a technique that supports LLMs by letting them access information from an external knowledge base. As a result, the models can generate responses that are grounded in relevant context and up to date, addressing the limitations of traditional Large Language Models.

How does it work?

A simple RAG pipeline has six stages: data collection, chunking the extracted data, embedding the chunks, storing the embeddings in a vector database, retrieval when a user submits a query, and generating a response.

i) Data collection

This is the first stage, which involves extracting data from a source, such as a PDF, that will be fed into the system. After extraction, the data is loaded and ready for the next step.
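As a minimal sketch, the collection step can look like the function below. It assumes the source is a plain-text file to stay dependency-free; for a real PDF you would swap the body for a PDF parser such as pypdf.

```python
from pathlib import Path


def load_document(path: str) -> str:
    """Read a source document and return its contents as one string.

    Assumes a plain-text file for simplicity; for PDFs you would
    replace this with a PDF parsing library (e.g. pypdf).
    """
    return Path(path).read_text(encoding="utf-8")
```

The output is one long string, which is exactly the shape the chunking step expects.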

ii) Chunking

Data is usually extracted as one long string. Chunking breaks it down into smaller pieces using text splitters such as LangChain's RecursiveCharacterTextSplitter. You define the chunk size (the number of characters in each chunk) and the chunk overlap (how many characters consecutive chunks share) in order to preserve context across chunk boundaries.
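To make the idea concrete, here is a simplified character splitter. Real splitters like RecursiveCharacterTextSplitter also try to break on natural boundaries (paragraphs, sentences), but the size/overlap mechanics are the same:

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks.

    Each chunk is at most `chunk_size` characters, and consecutive
    chunks share `chunk_overlap` characters so context is preserved.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

With `chunk_size=200` and `chunk_overlap=50`, the last 50 characters of one chunk are the first 50 of the next.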

iii) Embedding

The chunks are not interpretable by the machine in their raw text form, so an embedding model is used to transform them into vector representations that the machine can work with. There are different embedding models, for example from the SentenceTransformers library, that can be used depending on your needs.
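With SentenceTransformers this is typically a single call, e.g. `SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)`. To keep this sketch dependency-free, here is a toy hashed bag-of-words embedding that only illustrates the idea of mapping text to a fixed-size vector; a real embedding model captures semantic meaning far better:

```python
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each word into one of `dim` buckets,
    then L2-normalise the resulting count vector.

    Illustrative only; real models like all-MiniLM-L6-v2 produce
    dense semantic vectors, not word-count buckets.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

The important property is that every chunk, regardless of length, becomes a vector of the same dimension, which is what the vector database stores and compares.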

iv) Vector Database

Once the chunks have been embedded, they are stored in a vector database such as Chroma. The vector database indexes the embeddings so that it can conduct a fast similarity search when queried.
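A vector database boils down to "store (vector, chunk) pairs, rank them by similarity to a query vector". The in-memory class below is a minimal stand-in for what Chroma does at a much larger scale (Chroma adds persistence, metadata filtering, and efficient indexing):

```python
class InMemoryVectorStore:
    """Minimal stand-in for a vector database such as Chroma."""

    def __init__(self):
        self.entries = []  # list of (embedding, chunk) pairs

    def add(self, embedding: list[float], chunk: str) -> None:
        self.entries.append((embedding, chunk))

    def query(self, query_embedding: list[float], top_k: int = 1) -> list[str]:
        # Rank by dot product; assuming embeddings are L2-normalised,
        # this equals cosine similarity.
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))

        ranked = sorted(self.entries,
                        key=lambda e: dot(e[0], query_embedding),
                        reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]
```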

v) Retrieval

The user interacts with the system at this point by submitting a query. The query is embedded into a vector, then a similarity search runs in the database, comparing the query vector against the stored chunk vectors. The most relevant chunks are retrieved and passed on to the next step.
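The comparison at the heart of that search is usually cosine similarity: the cosine of the angle between the query vector and each stored vector. A minimal version:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same
    direction (very similar), 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The database scores every stored chunk this way and returns the highest-scoring ones.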

vi) Generating a response

After retrieving the relevant chunks from the database, the LLM generates a response to the user that is grounded in the information available to the system.
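This is the "augmented" part of Retrieval-Augmented Generation: the retrieved chunks are stitched into the prompt alongside the user's question before the LLM is called. A sketch of that prompt assembly (the actual LLM call depends on your provider, so it is left as a placeholder):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble the augmented prompt: retrieved context first,
    then the user's question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# The prompt would then be sent to an LLM, for example:
# answer = llm.generate(build_rag_prompt(question, chunks))  # hypothetical client
```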

RAG systems can be used in various ways, such as:

  • Providing customer support that is helpful to users rather than frustrating.
  • Market research, where the system has access to a vast pool of data.
  • Recommendation systems.

It is incredible how much adding RAG to an LLM improves the quality of the model's responses and the satisfaction of its users. In our ever-growing tech space, it will be exciting to see how RAG systems are improved upon and incorporated further into our day-to-day activities.
