Author's Note: Greetings everyone! I am currently involved in an exciting project with a company that has strategic partnerships with market leaders, including Nvidia. In addition, I plan to write an article addressing the high-performance hardware industry in the near future. One of my responsibilities in this project is to develop artificial intelligence to assist our internal team in understanding and applying company policies and standards, as well as learning and contributing to international procedures. The purpose of this article is to share my recent research, aiming to improve the natural language model we are developing, and also to discuss, theoretically, a technology that is being widely adopted by giants such as OpenAI, Microsoft, and Tesla.
What is RAG?
RAG, or Retrieval-Augmented Generation, is, at a simple level, an information retrieval technique that aims to increase the accuracy of responses within a specific domain. For example, when building on the ChatGPT API an assistant that must answer from constantly updated information, RAG can be a solution. It operates as a mechanism that searches for data in a knowledge repository – similar to a vast digital library – to answer questions or fulfill specific requests. RAG works in three simple steps:
- Retrieval: In this step, RAG examines a specific knowledge base or domain, and can even access external sources, such as Wikipedia pages, for example.
- Prompt Analysis: Here, an analysis of the initial text entered by the user is performed to better understand their intent.
- Generation: Finally, detailed information is generated based on the previous steps and the context provided by the user.
Essentially, RAG integrates a search engine with text generation capabilities to provide more accurate and relevant answers in specific contexts.
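The three steps above can be sketched in plain JavaScript. This is a minimal illustration, not a real system: the knowledge base is a toy in-memory array, retrieval is naive keyword matching instead of vector search, and generation is a stub where a production system would call an LLM API.

```javascript
// Toy knowledge base: in practice this would be a vector store.
const knowledgeBase = [
  { id: 1, text: "Our refund policy allows returns within 30 days." },
  { id: 2, text: "Support is available Monday through Friday." },
];

// Step 1 – Retrieval: find documents that share words with the query.
function retrieve(query) {
  const words = query.toLowerCase().split(/\W+/);
  return knowledgeBase.filter((doc) =>
    words.some((w) => w.length > 3 && doc.text.toLowerCase().includes(w))
  );
}

// Step 2 – Prompt analysis / augmentation: combine the user's
// question with the retrieved context.
function buildPrompt(query, docs) {
  const context = docs.map((d) => `- ${d.text}`).join("\n");
  return `Context:\n${context}\n\nQuestion: ${query}`;
}

// Step 3 – Generation: a real system would send the augmented
// prompt to an LLM; here we just echo it.
function generate(prompt) {
  return `Answer based on:\n${prompt}`;
}

const question = "What is the refund policy?";
const answer = generate(buildPrompt(question, retrieve(question)));
console.log(answer);
```

The key idea the sketch captures is that the model never answers from the raw question alone: the prompt that reaches the generation step already carries the retrieved context.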
Vector database: The key to efficient data retrieval.
Vector databases are commonly used to power vector search use cases, such as visual, semantic, and multimodal search. More recently, they have been combined with generative artificial intelligence (AI) text models to create intelligent agents that provide conversational search experiences. They can also help prevent generative AI models from hallucinating, that is, from causing chatbots to give answers that sound plausible but are not factual.
The vector database is the crucial component here, providing critical support for these use cases. Early approaches represented each concept as a sparse vector, with one dimension per category, but researchers quickly found themselves limited when trying to capture the complex relationships and meanings of the data. How do you express that "football" and "basketball" are both sports, yet distinct from each other? Or that "red" and "blue" are both colors, yet do not share the same hue? Adding a new dimension for each new category soon proved unfeasible due to its ever-increasing complexity.
The solution came in the form of dense vectors, where each concept, such as "sport," "color," or "feeling," would be represented by a single vector with multiple distinct values, i.e., attributes. For example, instead of [1, 0, 0] for "football," the vector could be [0.8, 0.6, -0.2, ...], capturing a wide range of characteristics of the concept. However, manually creating these dense vectors for all possible categories was impractical due to their diversity and complexity.
Furthermore, these dense vectors initially lacked a clear meaning. Although the machine could distinguish between different concepts, how could it be taught that "football" is more similar to "basketball" than to "tennis"? Applying notions of similarity between concepts is fundamental to modeling our world.
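A toy example makes this concrete. The three vectors below are invented for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions. Cosine similarity, a standard measure for comparing dense vectors, captures the intuition that "football" sits closer to "basketball" than to an unrelated concept.

```javascript
// Made-up 3-dimensional "embeddings" (real ones are learned, not hand-written).
const embeddings = {
  football:   [0.9, 0.8, 0.1],
  basketball: [0.8, 0.9, 0.2],
  red:        [0.1, 0.0, 0.9],
};

// Cosine similarity: close to 1 for vectors pointing the same way,
// close to 0 for unrelated directions.
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

console.log(cosineSimilarity(embeddings.football, embeddings.basketball)); // high
console.log(cosineSimilarity(embeddings.football, embeddings.red));        // low
```

This is exactly the kind of comparison a vector database runs at scale: given a query vector, it returns the stored vectors with the highest similarity (or smallest distance).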
As an example, here is JavaScript code that runs a LanceDB query. LanceDB is an open-source vector database that is easy to install and configure.
// Embedded in your app, no servers to manage!
const lancedb = require("vectordb");

async function main() {
  // Persist your embeddings, metadata, text, images, video, audio & more
  const db = await lancedb.connect("./data/my_db");
  const table = await db.openTable("my_table");

  // Production-ready, scalable vector search with optional SQL filters
  const query = await table.search([0.1, 0.3, 0.2])
    .where("item != 'item foo'")
    .limit(2)
    .execute();

  console.log(query);
}

main();
This JavaScript code uses the vectordb library to interact with a database of vector embeddings. Initially, it connects to the database located at "./data/my_db" and opens a table called "my_table" within that database. This table can contain a variety of data associated with vector embeddings, metadata, text, images, videos, audio, and more.
Next, the code searches the table using a specific vector, [0.1, 0.3, 0.2], which would typically be the embedding of a query. It also applies an optional SQL-like filter that excludes rows whose "item" column equals "item foo", and limits the output to the two nearest results.
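To make the filter and limit concrete, here is what that query does, expressed over a plain JavaScript array. The rows and distance values are invented for illustration (they mimic the shape of a vector search result), but the logic mirrors the chained .where() and .limit() calls.

```javascript
// Invented rows, as if returned with a distance to the query vector.
const rows = [
  { item: "item foo", distance: 0.0 },
  { item: "item bar", distance: 0.05 },
  { item: "item baz", distance: 0.7 },
];

const filtered = rows
  .filter((r) => r.item !== "item foo") // .where("item != 'item foo'")
  .sort((a, b) => a.distance - b.distance) // nearest first
  .slice(0, 2); // .limit(2)

console.log(filtered.map((r) => r.item)); // → [ 'item bar', 'item baz' ]
```

Note that the filter runs even though "item foo" is the closest row: the SQL-like predicate removes it before the limit is applied.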
Use Cases
RAG has many applications in the business and research fields; I will mention some that I researched and found quite useful.
Using RAG for repository updates
Alice is a developer who wants to learn about specific changes in a GitHub repository (in this case, the TimescaleDB repository). Instead of manually inspecting the code or commit history, she finds an application called "Talk to my GitHub repository" and uses it to streamline her research. Alice wants to know about recent changes to a feature called "continuous aggregates" and asks the following question on November 29, 2023:
What's new with the continuous aggregates feature in the last 6 months?
Answer from a non-time-aware RAG LLM 👎
The feature called continuous aggregates has undergone some changes. The most recent change was made by Sven Klemm on January 24, 2020. This change added a new mode called real-time aggregates, which combines materialized data with new data received after the last update. This new mode is now the default behavior for newly created continuous aggregates. To update existing continuous aggregates to the new behavior, a specific command needs to be run.
LLM Answer with Time-Aware RAG ✅
In the last 6 months, two changes were made to the feature called continuous aggregates. The first change was made on August 7, 2023, by Fabrízio de Royes Mello. This change relaxed the strong table locking when updating a continuous aggregate, allowing the update procedure to run across multiple sessions with fewer locks. The second change was made on August 29, 2023, by Jan Nidzwetzki. This change made the update/downgrade test deterministic by adding an ORDER BY specification to two queries in post.continuous_aggs.v3.sql.
The answer using time-aware RAG is much more helpful—it's within the timeframe specified by Alice and is relevant to the topic. The difference between the two answers lies in the retrieval step.
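That retrieval-step difference can be sketched as a time filter applied before ranking. The commit texts and dates below are invented stand-ins, and real time-aware RAG would combine this filter with vector similarity ranking; the sketch shows only the windowing that the non-time-aware answer was missing.

```javascript
// Invented candidate documents, each carrying a timestamp.
const commits = [
  { date: new Date("2020-01-24"), text: "Add real-time aggregates mode." },
  { date: new Date("2023-08-07"), text: "Relax locking on refresh." },
  { date: new Date("2023-08-29"), text: "Make update test deterministic." },
];

// Keep only documents inside the requested window; a real system
// would then rank the survivors by vector similarity.
function timeAwareRetrieve(docs, since, until) {
  return docs.filter((d) => d.date >= since && d.date <= until);
}

const recent = timeAwareRetrieve(
  commits,
  new Date("2023-05-29"), // 6 months before the question date
  new Date("2023-11-29")
);
console.log(recent.map((d) => d.text));
```

Without the window, the 2020 commit can outrank the recent ones on pure similarity, which is exactly how the non-time-aware answer went wrong.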
RAG + Notion
Suppose a company's knowledge base isn't stored in a database or some other technology that requires coding, but in something simpler to use, like Notion or Google Docs; it is still possible to integrate it with information from other applications. I found a very interesting article on Medium that addresses exactly this scenario.
Conclusion and References
Well, that's it, folks. I'm still in the testing phase with these AI-related technologies before I actually implement them in code at work. As for AI in general, some companies are quite disappointed with its results so far, but I believe this is common for any innovation entering the market. In fact, artificial intelligence follows the Gartner hype cycle.
But to be honest, I'm quite excited about the things we'll see in the coming years :D

