Arum Puri

Posted on Oct 24

What is RAG in AI? How It Combines Retrieval with Generation for Accurate Results

#rag #genai #llm #ai

Imagine a world where AI-generated legal arguments are so convincing that even seasoned lawyers are fooled. This isn't science fiction; it's a reality that lawyer Steven Schwartz experienced firsthand when he unknowingly submitted a court filing that cited six fake cases generated by ChatGPT.

To be fair, ChatGPT was a new technology at the time. Even the judge was uncertain how to handle the situation, as nothing like this had happened before. In the end, Mr. Schwartz was fined $5,000 and had to issue letters of apology to the real judges whose names were falsely cited in the GPT-generated cases.

Schwartz’s story is a stark reminder of the risks of using AI without fully understanding its limitations. ChatGPT is undeniably useful and helps many professionals with day-to-day tasks, but it is not without flaws. Careless use, as we saw with Schwartz, can lead to serious consequences.

The reason ChatGPT gave Schwartz fake cases lies in a well-known limitation of large language models (LLMs) called hallucination. This occurs when the AI generates answers that seem convincing but are actually fabricated. Another flaw is that LLMs like ChatGPT aren't always up to date. For example, if you asked GPT-4 about a recent event, such as the death of pop singer Liam Payne on 16 October 2024, it wouldn't know, because at the time of writing its training data only went up to December 2023.

What is RAG?

One promising solution to the problem of AI hallucinations is RAG. RAG, or retrieval-augmented generation, is a technique that improves LLM performance by retrieving relevant external data and supplying it to the LLM along with the user's query. The process starts by transforming the external data into high-dimensional vectors (embeddings), storing them in a vector database, and then retrieving the most relevant pieces when a query arrives. RAG depends mainly on two components: the retriever and the generator.
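
The flow described above (embed, store, retrieve, generate) can be sketched in a few lines. This is only an illustration under loud assumptions: `embed` is a toy word-count vector rather than a learned embedding model, `ToyVectorStore` is a hypothetical in-memory stand-in for a vector database, and `fake_generator` is a placeholder where a real LLM call would go.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items = []  # list of (vector, original text) pairs

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def fake_generator(query: str, context: list) -> str:
    """Placeholder for an LLM call; it only echoes what it was given."""
    return f"Q: {query}\nGrounded in: {context[0]}"


# 1. Index the external data.
store = ToyVectorStore()
store.add("RAG combines a retriever with a generator.")
store.add("BM25 ranks documents by keyword overlap.")
store.add("Bananas are rich in potassium.")

# 2. Retrieve the most relevant text, 3. hand it to the generator.
question = "What does RAG combine?"
retrieved = store.search(question, k=1)
answer = fake_generator(question, retrieved)
print(answer)
```

The point of the sketch is the division of labor: the store decides what the generator gets to see, and the generator only works with what was retrieved.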

By Turtlecrown - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=150390279

Retriever Component

The retriever is responsible for fetching the most relevant information from the external data. The quality of this retrieval directly affects the generator, so several techniques are available to make sure the most relevant data is chosen. Here are some examples:

- Self-Query Retriever
The Self-Query Retriever is a method where the system uses an LLM to reformulate the original user query so that it better matches the documents in the database. In implementations such as LangChain's, the reformulated query is structured: a search string plus filters on document metadata. For example, in the query “impact of AI on job markets,” the word “impact” is vague and can mean many things, so the system might automatically generate a more specific query to retrieve more relevant documents.

- BM25Retriever
BM25 is a retrieval technique built on term frequency and inverse document frequency (TF-IDF) principles. It scores documents based on how often the query terms appear in them, giving more weight to rarer terms and adjusting for document length. Unfortunately, this retriever matches exact terms and does not capture the semantic meaning of queries, so it is best suited to keyword-based search, such as in search engines and e-commerce platforms.
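
To make the scoring concrete, here is a minimal sketch of one common BM25 variant (the Okapi form with a smoothed IDF). The parameter defaults `k1=1.5` and `b=0.75`, the tokenizer, and the product-catalog documents are all assumptions chosen for illustration, not values from any particular library.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list, k1: float = 1.5, b: float = 0.75) -> list:
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only exact term matches contribute
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "cheap running shoes for men",
    "waterproof hiking boots",
    "running shorts and running socks",
]
scores = bm25_scores("running shoes", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])

# A synonym with no shared term gets no credit at all.
print(bm25_scores("sneakers", docs))
```

The second call illustrates the limitation mentioned above: "sneakers" shares no token with "running shoes", so every document scores zero even though the first one is clearly relevant.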

- VectorStore Retriever
A VectorStore Retriever uses vector-based retrieval. Queries and documents are converted into high-dimensional vectors in a semantic space, and the retriever fetches documents based on the closeness (cosine similarity or Euclidean distance) between the query vector and the document vectors. This approach is effective when understanding the context and meaning behind words is crucial. For example, in a recommendation system where a user's query describes a book, the system can suggest similar books by measuring the similarity between the query embedding and a database of book embeddings.
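
The book example can be sketched as follows. The three-dimensional embeddings and the book titles are made up by hand for illustration; a real system would get vectors with hundreds or thousands of dimensions from an embedding model.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Higher means more similar; depends on direction, not vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a: list, b: list) -> float:
    """Lower means more similar; straight-line distance between the points."""
    return math.dist(a, b)


# Hypothetical embeddings with axes (fantasy, romance, science).
book_embeddings = {
    "Dragon saga": [0.9, 0.1, 0.0],
    "Love in Paris": [0.1, 0.9, 0.0],
    "Intro to physics": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of the query "an epic story about wizards and dragons".
query_embedding = [0.8, 0.2, 0.1]

by_cosine = max(
    book_embeddings,
    key=lambda title: cosine_similarity(query_embedding, book_embeddings[title]),
)
by_distance = min(
    book_embeddings,
    key=lambda title: euclidean_distance(query_embedding, book_embeddings[title]),
)
print(by_cosine, "|", by_distance)
```

Both measures pick the same book here. They can disagree when vectors differ a lot in length, which is one reason embeddings are often normalized before being stored.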

- EnsembleRetriever
An EnsembleRetriever combines the results from multiple retrieval methods to improve the overall quality of the retrieved documents. It uses several retrieval techniques (such as BM25, dense retrieval, and others) and merges their results.
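
The text above does not say how the results are merged. One widely used option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the definitive method. The document IDs and the two input rankings are hypothetical, and `k=60` is just a commonly quoted default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical outputs of a keyword retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF only looks at rank positions, so it sidesteps the problem that BM25 scores and cosine similarities live on different scales.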

- MultiQuery Retriever
The Multi-Query Retriever breaks the original query into several distinct subqueries, each covering a separate topic related to the query. It then runs a retrieval for each subquery and combines the results.
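
A rough sketch of the idea follows. In practice an LLM would generate the subqueries; here `generate_subqueries` is a hypothetical stub that returns canned values, and `keyword_retrieve` is a deliberately naive word-overlap retriever used only to keep the example self-contained.

```python
def generate_subqueries(query: str) -> list:
    """Hypothetical stand-in for an LLM call that splits a query into sub-topics."""
    canned = {
        "impact of AI on job markets": [
            "AI automation and job losses",
            "new jobs created by AI",
        ]
    }
    return canned.get(query, [query])


def keyword_retrieve(query: str, docs: list, k: int = 1) -> list:
    """Naive retriever: rank documents by the number of words shared with the query."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def multi_query_retrieve(query: str, docs: list, k: int = 1) -> list:
    """Run one retrieval per subquery and merge the results without duplicates."""
    seen, merged = set(), []
    for sub in generate_subqueries(query):
        for doc in keyword_retrieve(sub, docs, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


docs = [
    "automation is causing job losses in manufacturing",
    "demand for prompt engineers shows new jobs created by ai tools",
    "a recipe for sourdough bread",
]
results = multi_query_retrieve("impact of AI on job markets", docs)
print(results)
```

Each subquery surfaces a different document, so the merged result covers both sides of the topic, which a single retrieval with `k=1` could not do.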

Which retriever to choose depends on the task the system will perform. Some techniques work better with structured data, while others excel with unstructured text. The retrievers listed above are not the only ones available; there are many other options.

Generator Component

The generator component comes into play once the retriever has selected the relevant documents. The generator is an LLM; at the time of writing, GPT-4 and Gemini Pro were favorites of many developers. It takes the user's query together with the retrieved content and produces a new, coherent, and contextually appropriate answer.
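
How the retrieved content reaches the generator is usually just prompt construction. The sketch below shows one plausible template; the wording of the instructions and the `build_prompt` helper are assumptions for illustration, and the actual call to a model such as GPT-4 or Gemini Pro is deliberately left out.

```python
def build_prompt(question: str, passages: list) -> str:
    """Combine the user's question with retrieved passages into a single prompt."""
    # Number the passages so the model (and the reader) can refer back to them.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


passages = [
    "RAG retrieves external documents before generating an answer.",
    "The retriever selects documents; the generator writes the response.",
]
prompt = build_prompt("What does the retriever do in RAG?", passages)
print(prompt)
```

Telling the model to rely only on the supplied context, and to admit when it is insufficient, is one simple way to discourage fabricated answers of the kind described at the start of this article.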

Advantages & Challenges of RAG

Using RAG reduces the risk of incidents like Steven Schwartz’s courtroom disaster. By retrieving current and relevant information, RAG can significantly improve the accuracy of the generated content, especially in fields where up-to-date knowledge is critical, such as medical and scientific research.

Using RAG to build chatbots is also common these days. The ability to retrieve domain-specific information from external knowledge sources makes responses more accurate and context-aware. This ability significantly improves the user experience, making interactions with chatbots more natural and satisfying.

Despite these advantages, RAG is not without challenges. Ensuring that the retrieved documents are highly relevant to the user query while still generating fluent and contextually correct responses is difficult.

The quality of the retrieved documents also matters. If the retriever pulls irrelevant or noisy data, the generator’s output will suffer. As the dataset grows over time, the computational load for retrieval grows with it. Keeping the system efficient while processing large-scale information is an ongoing challenge in RAG research.

Conclusion

Retrieval Augmented Generation (RAG) represents a significant advancement in the field of artificial intelligence. By combining the power of large language models (LLMs) with external data, RAG addresses the limitations of traditional AI systems and offers a more reliable and informative approach. As researchers continue to refine RAG techniques and explore new applications, we can anticipate a future where AI plays an even more significant role in our lives, providing valuable assistance and insights across a wide range of domains.
