In the ever-evolving world of Large Language Models (LLMs), Retrieval Augmented Generation (RAG) has emerged as a technique for combining search and generation. Take this further by adding context, memory, and the power to call custom tools, and you get agents.
Understanding RAG: Combining Search and Generation
Quick recap of techniques that come into play here:
- Large language models - in particular, embedding models used to vectorize text, which enables...
- Semantic search - used to find semantically relevant results across different data sets and retrieve the documents (or chunks) that will serve as context.
- Generative AI - a type of LLM used to generate text, built on statistical models that predict the most likely next word.
- RAG - Retrieval Augmented Generation - putting the techniques above together to build retrieval-based answering tools (see the sketch after this list).
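To make this concrete, here is a minimal RAG sketch, assuming the sentence-transformers package; the documents and question are made-up examples, and the final prompt is what you would send to a generative LLM of your choice.

```python
# A minimal RAG sketch: embed, retrieve, then build a prompt for generation.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model

documents = [
    "Semantic search retrieves results by meaning rather than keywords.",
    "RAG combines retrieval with text generation.",
    "Agents can call tools such as search queries or calculators.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

question = "What is RAG?"
query_vector = model.encode([question], normalize_embeddings=True)[0]

# Semantic search: with normalized vectors, cosine similarity is a dot product.
scores = doc_vectors @ query_vector
best_doc = documents[int(np.argmax(scores))]

# Retrieval Augmented Generation: the retrieved chunk becomes context
# in the prompt we would send to a generative LLM.
prompt = f"Answer using this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)
```

At scale, the dot-product search would be replaced by a vector database query, but the three steps - embed, retrieve, generate - stay the same.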
Why use RAG over simply asking an LLM a question?
- LLMs are frozen in time and domain at their latest training data; they don't know about data outside their training set. You would need to constantly re-train an LLM to include new information. This would not only be very costly in terms of infrastructure and time, but there is also:
- Data privacy - some private data simply cannot be included in LLM training; instead, we want to reference this information within a trusted architecture.
Beyond RAG: Agentic Workflows and Their Potential
Going further, even RAG has its limitations, namely:
- Conversational flow and memory - as you continue to explore a solution, you may need to add the context of previous conversations with the LLM, or your single prompt will grow too large for the model's token limit in a single call.
- Tools - rather than just searching, the LLM may also need to perform some computations or tasks, or independently collect other data, in order to answer our question.
Introducing Tools
We can define tools as pieces of code that agents can use (for example: a search query, a database lookup, running plugins, executing small code snippets, making calculations, sending messages, or even generating graphics). We can build data access control or certain API best practices into these tools, which also gives us more control over how the LLM performs these tasks.
At the same time, it is the LLM's job to choose which of these tools to run and in what order, so we don't have to hardcode the entire logic of the process. Instead, we can rely on the LLM's understanding of what needs to be done, detecting the intent from the prompt.
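As a toy illustration of that idea: the tool names and the `choose_tool` stand-in below are hypothetical; in a real agent, the LLM itself returns the tool choice and arguments (for example via structured or function-calling output).

```python
# A toy agent step, with hypothetical tools. choose_tool is only a
# stand-in for the LLM detecting intent and picking a tool.
def search_docs(query: str) -> str:
    # Access control or API best practices could be enforced here.
    return f"Top search result for '{query}'"

def calculate(expression: str) -> str:
    return str(eval(expression))  # toy only - never eval untrusted input

TOOLS = {"search_docs": search_docs, "calculate": calculate}

def choose_tool(prompt: str) -> tuple[str, str]:
    # Stand-in for the LLM's decision; a real agent gets this from the model.
    if any(ch.isdigit() for ch in prompt):
        return "calculate", prompt
    return "search_docs", prompt

name, arg = choose_tool("2 + 2")
print(TOOLS[name](arg))  # -> 4
```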
We also introduce memory or context - agents are able to continue conversations and remember user preferences or previous discussions.
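A minimal sketch of that memory, assuming a simple list of messages (the `ask` helper and its canned reply are made up): real agents send this history back to the model with each turn, trimming or summarizing it to stay within token limits.

```python
# A toy conversation memory: each turn is appended so the next prompt
# carries earlier context. The canned reply stands in for a real LLM call.
history: list[dict] = []

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    # In a real agent, the full history would be sent to the LLM here.
    reply = f"(reply that can see {len(history)} prior messages)"
    history.append({"role": "assistant", "content": reply})
    return reply

ask("I prefer Python examples.")
print(ask("Use my earlier preference."))  # the model would see turn 1 too
```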
To summarize, agents are the latest development within the NLP field, standing on the shoulders of other impressive techniques such as semantic search and RAG. They can change how we interact with LLMs, and even how we browse the internet or perform automatable tasks.
In this new paradigm, we're seeing LLMs able to make API calls, search and retrieve data, perform aggregations or simple tasks, and become a new form of user interface between us and many other tools and programs.
