RAG for Dummies

Retrieval Augmented Generation (RAG) is a machine learning technique that enhances the capabilities of Large Language Models so they can provide more accurate and up-to-date responses.

How RAG works:

(i) The user asks the Large Language Model (LLM) a question.

(ii) Retrieval - the RAG system uses the question to search an external knowledge base for relevant information. This step relies on three techniques: chunking, embedding, and a vector database. Chunking breaks the information in the knowledge base into smaller pieces for efficient searching; embedding converts those chunks into numerical representations that capture their meaning; finally, the system searches a vector database to find the chunks most similar to the question asked.

(iii) Augmentation - the most relevant information from the retrieval step is added to the original question to form an ‘augmented prompt’.

(iv) Generation - the LLM receives the augmented prompt and uses both the original question and the retrieved context to generate a more comprehensive and accurate response (a minimal sketch of the whole pipeline follows below).
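
Putting the four steps together, here is a minimal sketch in Python. The sentence-transformers package, the 'all-MiniLM-L6-v2' model, and the toy knowledge base are illustrative assumptions, and a plain NumPy array stands in for a real vector database such as FAISS or Pinecone.

```python
# A minimal RAG pipeline sketch, assuming the sentence-transformers package.
# The model name and the knowledge base below are illustrative choices only.
import numpy as np
from sentence_transformers import SentenceTransformer

# Knowledge base: toy documents already split into chunks (step ii, chunking).
chunks = [
    "The company refund policy allows returns within 30 days of purchase.",
    "Support is available by email from Monday to Friday, 9am to 5pm.",
    "Premium subscribers get priority access to new features.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Embedding: convert each chunk into a numerical vector (step ii, embedding).
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Find the chunks most similar to the question (step ii, vector search)."""
    query_vector = embedder.encode([question], normalize_embeddings=True)[0]
    # Cosine similarity reduces to a dot product on normalized vectors.
    scores = chunk_vectors @ query_vector
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

def augment(question: str, context: list[str]) -> str:
    """Combine retrieved context with the original question (step iii)."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"

# Step i: the user asks a question.
question = "How long do I have to return a product?"
augmented_prompt = augment(question, retrieve(question))

# Step iv: the augmented prompt would now be sent to an LLM for generation.
print(augmented_prompt)
```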

Models used in RAG

(i) Retrieval Models - these act as detectives that gather relevant documents from the external knowledge base before the LLM generates an answer. Retrievers come in two types: sparse retrievers (for example BM25 and TF-IDF) and dense retrievers, which use learned embeddings; frameworks such as LlamaIndex and Haystack provide ready-made retrievers of both kinds. A small sparse-retrieval sketch follows after this list.

(ii) Language Models (LLMs) - the generation component takes the user's original prompt together with the retrieved information and uses its learned knowledge to create a coherent, natural language response. Examples include Transformer-based models such as GPT-2, GPT-3, BART (Bidirectional and Auto-Regressive Transformers), and Flan-T5, which are used for the generation part. A generation sketch also follows below.
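
For the retriever side, here is a minimal sketch of a sparse retriever using TF-IDF with scikit-learn (the documents and query are made-up examples). BM25 works the same way with a different scoring formula, and the dense approach corresponds to the embedding-based search sketched earlier.

```python
# A sparse retriever sketch using TF-IDF from scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The refund policy allows returns within 30 days.",
    "Support hours are Monday to Friday, 9am to 5pm.",
    "Premium subscribers get early access to new features.",
]

# Sparse retrieval: each document becomes a sparse vector of term weights.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

query = "When can I return my order?"
query_vector = vectorizer.transform([query])

# Rank documents by cosine similarity between the query and each document.
scores = cosine_similarity(query_vector, doc_matrix).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```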
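
On the generation side, here is a hedged sketch of passing an augmented prompt to one of the models mentioned above, Flan-T5, via the Hugging Face transformers library. The model checkpoint and the hard-coded prompt are illustrative choices, not the only way to do this.

```python
# Generation step sketch, assuming the Hugging Face transformers library
# and the google/flan-t5-base checkpoint (an illustrative choice).
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

# The augmented prompt is hard-coded here; in a real system it would come
# from the retrieval and augmentation steps described above.
augmented_prompt = (
    "Answer using only this context:\n"
    "- The refund policy allows returns within 30 days of purchase.\n\n"
    "Question: How long do I have to return a product?"
)

result = generator(augmented_prompt, max_new_tokens=64)
print(result[0]["generated_text"])
```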

RAG is applied in medical AI, chatbots, chat engines, and legal assistance. It bridges the gap between an LLM's static training knowledge and dynamic, up-to-date information, which reduces ambiguity and increases precision, transparency, and accuracy.
