Retrieval augmented generation (RAG) is an artificial intelligence (AI) architecture that incorporates external knowledge sources to enhance the capabilities of large language models (LLMs). RAG pulls relevant information from external databases and adds it to the LLM's input so that the output is more relevant, accurate, and contextually appropriate. This makes LLMs more powerful by combining their generative ability with access to real-time data.
How does RAG work?
The process begins with an input query or prompt. This could be a question, a statement, or any text that requires a response. The model first analyzes this input to understand its context and intent.
When prompted, the system searches a large collection of documents (such as PDFs, FAQs, web pages, or databases) using a retriever model, often based on semantic similarity or keyword matching, and selects the most relevant pieces of content.
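As a rough illustration, here is a minimal retriever sketch in Python. It uses a toy hashed bag-of-words embedding and cosine similarity; a real system would use a trained embedding model and a vector database, and every name and document here is purely illustrative.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: hashed bag-of-words, L2-normalized.
    A production retriever would use a trained embedding model instead."""
    vec = np.zeros(64)
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by cosine similarity to the query; return the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: float(q @ embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "RAG combines retrieval with text generation.",
    "Bananas are rich in potassium.",
    "Embeddings map text to vectors for semantic search.",
]
print(retrieve("How does retrieval augmented generation work?", docs))
```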
RAG then integrates the retrieved information with the original input: the query and the retrieved documents (or their embeddings) are combined to form a comprehensive context for the generative model.
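One simple way to combine the two is to inline the retrieved passages into the prompt itself. The format below (a bulleted context block followed by the question) is just one common convention, not a fixed standard.

```python
def build_prompt(query: str, passages: list[str]) -> str:
    """Combine retrieved passages with the original query so the
    generator can ground its answer in them."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

retrieved = [
    "RAG combines retrieval with text generation.",
    "Embeddings map text to vectors for semantic search.",
]
print(build_prompt("How does RAG work?", retrieved))
```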
The augmented prompt is passed to a generator model, which uses the retrieved documents to craft a coherent, contextually accurate response. In this way, responses are both plausible and grounded in real data.
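To close the loop, the augmented prompt is sent to an LLM. The sketch below assumes the OpenAI Python SDK with an API key already configured; the model name is a placeholder, and any chat-capable model would work the same way.

```python
from openai import OpenAI  # assumes the openai SDK is installed and a key is set

client = OpenAI()

def generate(prompt: str) -> str:
    """Send the augmented prompt to a chat model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; substitute any chat model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# `augmented_prompt` would come from the prompt-building step above.
augmented_prompt = (
    "Context:\n- RAG combines retrieval with text generation.\n\n"
    "Question: How does RAG work?\nAnswer:"
)
print(generate(augmented_prompt))
```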