CrewAI meets RAG: built-in and custom solutions

(Originally published on my blog: https://roby73.hashnode.dev/crewai-meets-rag-built-in-and-custom-solutions)

Generative AI offers immense possibilities; however, to be truly effective in real-world applications, it needs to overcome two key hurdles: ensuring accuracy and integrating with private, up-to-date company information. A standard Large Language Model (LLM) on its own often isn't enough for tasks demanding specific facts and deep domain knowledge. LLMs generate responses based solely on their training data; if they lack a direct answer, they may still attempt one, sometimes producing convincing yet fabricated outputs (often referred to as hallucinations). For serious applications that demand reliable, grounded results, a smarter approach is essential.

Understanding RAG: The secret to accurate AI

This is where Retrieval Augmented Generation (RAG) comes in. Think of RAG as giving the LLM a set of specific notes to read before it answers a question. This makes its responses accurate and based on real data. The RAG process can be broadly divided into two main phases: Ingestion and Retrieval, both crucial for augmenting the LLM's knowledge.

RAG Ingestion Phase

This initial phase focuses on preparing your knowledge base.

Document Loading: Documents from various sources (PDFs, text files, database schemas, etc.) are loaded into the system.

Transformation (Chunking): Loaded documents are often too large to be processed all at once. They are broken down into smaller, manageable "chunks" or segments. This process might also involve cleaning or pre-processing the text.

Embedding: Each text chunk is then converted into a numerical representation called an embedding. This transformation is performed by an embedding model (e.g., a Sentence Transformer or a model like OpenAI's text-embedding-3-small). Embeddings capture the semantic meaning of the text, allowing for comparisons based on context rather than just keywords.

Persistence (Vector Database): These embeddings, along with references back to their original text chunks, are stored in a Vector Database (Vector DB). A Vector DB is optimized for efficiently storing and querying high-dimensional vectors, enabling fast similarity searches.
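
To make the flow concrete, here is a minimal, framework-free sketch of the ingestion phase in Python, using the openai client and ChromaDB (the same vector store CrewAI's Knowledge feature uses by default). The file name, chunk sizes, and collection name are illustrative:

    import chromadb
    from openai import OpenAI

    client = OpenAI()
    chroma = chromadb.PersistentClient(path="db")          # local Vector DB
    collection = chroma.get_or_create_collection("docs")

    def embed(text: str) -> list[float]:
        # The same embedding model must be reused at query time
        response = client.embeddings.create(input=text, model="text-embedding-3-small")
        return response.data[0].embedding

    document = open("Easy_recipes.txt").read()             # any loaded document text
    chunk_size, overlap = 2000, 200
    chunks = [document[i:i + chunk_size]                   # naive fixed-size chunking
              for i in range(0, len(document), chunk_size - overlap)]

    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=[embed(chunk) for chunk in chunks],
    )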

RAG Retrieval Phase

This phase occurs when a user submits a query to the Generative AI application.

Query Embedding: The user's input query is also converted into an embedding using the same embedding model that was used during the Ingestion phase. This ensures consistency in the vector space.

Similarity Search: The query embedding is then used to perform a similarity search within the Vector DB. The goal is to find the "closest" or most semantically similar text chunks (and their corresponding original content) to the user's query.

Contextual Augmentation: The relevant text chunks retrieved from the Vector DB serve as precise, up-to-date information. This retrieved context is then used to enrich the original user query, forming a new, more informed prompt.

LLM Interrogation: Finally, this augmented prompt (containing both the user's original request and the relevant retrieved context) is fed to the Large Language Model. The LLM then generates a factual and grounded response, drawing directly from the specific knowledge extracted from your knowledge base, significantly reducing hallucinations and increasing reliability.
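
Continuing the sketch above, the retrieval phase reuses the same embed() helper to query the collection and build the augmented prompt (the model name and prompt wording are illustrative):

    query = "How long should the pasta cook?"

    # Similarity search: find the chunks closest to the query in vector space
    results = collection.query(query_embeddings=[embed(query)], n_results=3)
    context = "\n\n".join(results["documents"][0])

    # Contextual augmentation + LLM interrogation
    prompt = (
        f"Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(completion.choices[0].message.content)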

Why RAG is a Game-Changer

RAG isn't just about accuracy; it solves critical problems:

Precision and Reliability: RAG ensures the LLM's answers are factually correct and you can trace them back to your trusted data. This significantly reduces the chance of hallucinations.

Cost Optimization: Because RAG supplies the LLM with only the relevant context, prompts stay short. Computing embeddings and retrieving chunks is generally far cheaper than stuffing huge amounts of text into every prompt and paying for the extra LLM tokens.

Overcoming Token Limits: All LLMs have a maximum amount of text they can handle at once (their context window). RAG smartly gets around this by only giving the most relevant bits of information, instead of trying to stuff an entire knowledge base into the prompt. This lets LLMs work with much bigger and more complex data sets.

CrewAI: Orchestrating Smart Agents for flexible RAG

CrewAI is a powerful Python framework for creating and managing teams of autonomous AI agents. You can define specialized agents (such as a "Researcher" or an "Analyst") with specific roles, goals, and tools, and have them collaborate and share tasks, which makes the RAG process robust and efficient. Tools are the crucial link, connecting your agents to all your different knowledge sources. Flexibility is key: CrewAI lets you use both built-in RAG features that ship with the framework (like PDFSearchTool and Knowledge) and custom tools you build yourself, so you can connect to any specific data source you need, which is essential for real-world projects.
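
To give a feel for the framework, here is a minimal, self-contained crew; the roles, strings, and topic are illustrative, not taken from the project:

    from crewai import Agent, Task, Crew

    researcher = Agent(
        role="Researcher",
        goal="Find relevant facts about {topic}",
        backstory="A meticulous analyst who digs up precise information.",
    )
    research = Task(
        description="Research {topic} and list the key facts.",
        expected_output="A bullet list of key facts about {topic}.",
        agent=researcher,
    )
    crew = Crew(agents=[researcher], tasks=[research])
    result = crew.kickoff(inputs={"topic": "pasta recipes"})
    print(result)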

Implementing RAG with CrewAI: Three Practical Approaches

Here's how you can use CrewAI to implement RAG, showing its versatility.

Native RAG Tools

This is how CrewAI handles RAG natively using its built-in tools. The PDFSearchTool is a prime example, designed for semantic searches directly within PDF content. A CrewAI agent can use this out-of-the-box tool to efficiently find and retrieve specific passages based on a search query within a PDF document. By default, it uses OpenAI for both embeddings (to understand meaning) and summarization, though this can be customized. This automates knowledge extraction from your existing PDF archives directly within the CrewAI ecosystem.
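
As a sketch of that customization, providers can be swapped via a nested config; treat the exact keys and model names below as assumptions to verify against the crewai_tools documentation for your version:

    from crewai_tools import PDFSearchTool

    # Default behavior: OpenAI for both the LLM and the embeddings
    tool = PDFSearchTool(pdf="Easy_recipes.pdf")

    # Customized providers (keys and models are illustrative)
    tool = PDFSearchTool(
        pdf="Easy_recipes.pdf",
        config=dict(
            llm=dict(provider="ollama", config=dict(model="llama3")),
            embedder=dict(
                provider="huggingface",
                config=dict(model="BAAI/bge-small-en-v1.5"),
            ),
        ),
    )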

Knowledge for Document-Based RAG

CrewAI offers a native "Knowledge" feature that allows agents and crews to directly access and utilize various external information sources. This acts as a built-in reference library for your agents. You can provide knowledge sources in formats like raw strings, .txt, .pdf, .csv, .xlsx, and .json documents by placing them in specified directories. CrewAI handles the storage (using ChromaDB by default) and embedding automatically. This approach empowers agents with direct access to a curated document set, ensuring their responses are grounded in your specific data.
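
Besides files, even a raw string can act as a knowledge source. A minimal sketch, with illustrative content and agent definition:

    from crewai import Agent
    from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource

    policy = StringKnowledgeSource(
        content="Refunds are accepted within 30 days of purchase."
    )
    support_agent = Agent(
        role="Support Agent",
        goal="Answer policy questions accurately",
        backstory="Knows the company's policies inside out.",
        knowledge_sources=[policy],
    )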

Custom tool that uses Vector Search (FAISS in our case)

FAISS (Facebook AI Similarity Search) excels at very fast similarity search over huge collections of vector embeddings. Here, a custom tool lets CrewAI agents query a FAISS index: they retrieve semantically similar passages and feed them to the LLM. This provides fast, scalable semantic search for large, evolving knowledge bases.

Real-World Example: Demonstrating RAG Flexibility with Multi-Crew Setup

In a real-world project, we implemented a system to analyze PDF documents. This setup, whose code is available in the CrewAI-RAG-Sample repository, effectively demonstrated CrewAI's versatility. Here is the project layout:

├── db
│   ├── 455d85ec-a7f9-4c9c-82d8-d3567d9263aa
│   └── chroma.sqlite3
├── Easy_recipes.pdf
├── knowledge
│   └── Easy_recipes.pdf
├── LICENSE
├── main.py
├── Pipfile
├── Pipfile.lock
├── ragcrew
│   ├── config
│   │   ├── agents.yaml
│   │   └── tasks.yaml
│   ├── faiss_rag_crew.py
│   ├── pdf_knowledge_crew.py
│   ├── tool_rag_crew.py
│   └── tools
│       └── custom_tool.py
├── README.md
├── report.md
└── setup.txt

We achieved this by employing three distinct crews, executed one after the other, to reach the same analytical goal. What made each crew unique was the specific RAG approach it utilized, while the core agents and tasks remained consistent. This allowed for a direct comparison of RAG strategies in action.

Each of these three crews was designed with the same set of specialized agents, for instance, a PDF Researcher and a Content Analyst. Similarly, the tasks they performed were identical across all crews: first, Information Retrieval to locate and extract relevant data from the PDF based on a user query; then, Content Refinement to process and summarize the retrieved information for clarity and conciseness; and finally, Output Generation to format and save the final, refined content into a Markdown (.md) file.
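
The agents.yaml referenced by agents_config in the snippets below could look roughly like this; the sketch is illustrative, while the actual file lives in the repository:

pdf_researcher:
  role: >
    PDF Researcher
  goal: >
    Find the most relevant information about {topic} in the PDF
  backstory: >
    An expert at searching documents for precise, well-sourced answers.

content_analyst:
  role: >
    Content Analyst
  goal: >
    Refine and summarize the retrieved content into a clear report
  backstory: >
    A meticulous editor focused on clarity and conciseness.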

The key differentiator lay in how each crew performed the Information Retrieval task declared in the tasks.yaml file. This task relies on the pdf_researcher agent.

research_task:
  description: >
    Search for specific information requested in the {topic}.
  expected_output: >
    Most relevant information about {topic}
  agent: pdf_researcher

content_task:
  description: >
    Review the context you got.
    Make sure the report is detailed and contains any and all relevant information.
  expected_output: >
    A fully fledged report with the main topics. The language to use is the language of the request "{topic}".

Let’s see how the three crews implement the different RAG approaches.

ToolRagCrew: This crew leveraged CrewAI's out-of-the-box PDFSearchTool for its RAG mechanism. Agents directly queried PDF documents, showcasing a straightforward, native approach to unstructured document RAG. Here is a Python code snippet from tool_rag_crew.py:

    . . . 
    from crewai_tools import PDFSearchTool
    . . .
    @CrewBase
    class ToolRagCrew:
        agents: List[BaseAgent]
        tasks: List[Task]
        . . .
        def getPDFRagTool(self):
            # Lazy initialization: build (and index) the tool only once
            if self.pdf_tool is None:
                self.pdf_tool = PDFSearchTool(pdf=self.pdf_path,
                                              #chunker=dict(chunk_size=2000,chunk_overlap=50)
                                              )
            return self.pdf_tool
        @agent
        def pdf_researcher(self) -> Agent:
            return Agent(
                config=self.agents_config['pdf_researcher'],
                verbose=True,
                tools=[self.getPDFRagTool()],
            )
        . . .

As you can see, the built-in PDFSearchTool is configured as a tool for the PDF Researcher agent.

PDFKnowledgeCrew: Here, the crew utilized CrewAI's native "Knowledge" feature. PDF documents were pre-loaded into the agents' knowledge base by placing them in a designated directory, allowing agents to access and integrate this curated information during their tasks. Here is a Python code snippet from pdf_knowledge_crew.py:

    . . . 
    from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
    . . .
    @CrewBase
    class PDFKnowledgeCrew:
        agents: List[BaseAgent]
        tasks: List[Task]
        . . .
        def get_knowledge_sources(self):
            if self.pdf_source is None:
                self.pdf_source = PDFKnowledgeSource(file_paths=self.pdf_paths)
            return self.pdf_source

        @agent
        def pdf_researcher(self) -> Agent:
            return Agent(
                config=self.agents_config['pdf_researcher'],  # type: ignore[index]
                verbose=True,
                knowledge_sources=[self.get_knowledge_sources()]
            )
        . . . 

As you can see, it is very simple to use the Knowledge feature with CrewAI: set the knowledge_sources attribute within the Agent declaration. It accepts a list of knowledge source instances, such as PDFKnowledgeSource. That’s it!

FAISSRagCrew: For this crew, a custom tool integrated with FAISS was implemented. This represented a more scalable solution for large datasets: agents used the custom tool to perform semantic searches on vector embeddings derived from the PDF content. I used FAISS (Facebook AI Similarity Search) because the library excels at similarity search and clustering of dense vectors, and I also wanted to show how open the CrewAI framework is to integration with third-party libraries and tools. Here is a Python code snippet from faiss_rag_crew.py:

    . . .
    import pdfplumber
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import FAISS
    from openai import OpenAI
    from typing import List
    from langchain_openai import OpenAIEmbeddings  
    from ragcrew.tools.custom_tool import PDFFAISSTool

    @CrewBase
    class FAISSRagCrew:

        agents: List[BaseAgent]
        tasks: List[Task]

        def __init__(self, pdf_path):
            self.pdf_path = pdf_path
            self.vector_store = None
            self.client = OpenAI()
            self.search_tool = None

        def get_openai_embedding(self, text):
            # Embed a single string with the OpenAI embedding model
            response = self.client.embeddings.create(
                input=text,
                model="text-embedding-3-small"
            )
            return response.data[0].embedding

        def load_pdf(self):
            with pdfplumber.open(self.pdf_path) as pdf:
                # extract_text() can return None for image-only pages
                return " ".join(page.extract_text() or "" for page in pdf.pages)

        def prepare_rag(self, pdf_text):
            # Ingestion: split into overlapping chunks, embed, and index in FAISS
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=2000,
                chunk_overlap=200
            )
            chunks = text_splitter.split_text(pdf_text)
            return FAISS.from_texts(chunks, OpenAIEmbeddings())

        def initFAISS(self):
            pdf_text = self.load_pdf()
            self.vector_store = self.prepare_rag(pdf_text)
            self.search_tool = PDFFAISSTool(self.vector_store)

        @agent
        def pdf_researcher(self) -> Agent:
            self.initFAISS()
            return Agent(
                config=self.agents_config['pdf_researcher'],  # type: ignore[index]
                verbose=True,
                tools=[self.search_tool]
            )
        . . .

As you can see, the initFAISS() method is invoked within the Agent declaration. It executes the ingestion phase of the RAG process: loading the PDF file, persisting the embedded data into the vector store, and initializing the PDFFAISSTool, a custom tool defined in the project.
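
The article doesn't show custom_tool.py, so here is a plausible sketch of what a PDFFAISSTool built on CrewAI's BaseTool could look like; the class shape and method names are assumptions, and the real implementation lives in the repository:

    from typing import Any, Type
    from crewai.tools import BaseTool
    from pydantic import BaseModel, Field

    class PDFSearchInput(BaseModel):
        query: str = Field(..., description="Semantic search query for the PDF")

    class PDFFAISSTool(BaseTool):
        name: str = "PDF FAISS Search"
        description: str = "Finds passages in the ingested PDF similar to the query."
        args_schema: Type[BaseModel] = PDFSearchInput
        vector_store: Any = None

        def __init__(self, vector_store: Any, **kwargs):
            # Accept the FAISS store positionally, matching PDFFAISSTool(self.vector_store)
            super().__init__(vector_store=vector_store, **kwargs)

        def _run(self, query: str) -> str:
            # Retrieval: top-3 semantically similar chunks from the FAISS index
            docs = self.vector_store.similarity_search(query, k=3)
            return "\n\n".join(doc.page_content for doc in docs)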

This multi-crew sample effectively highlights CrewAI's ability to maintain consistent operational logic (same agents, same tasks) while allowing for flexible and interchangeable RAG strategies.

Conclusion: Key Takeaways

In this article, we looked at different ways to set up Retrieval-Augmented Generation (RAG) within a CrewAI agent system. We saw how each method (the built-in PDFSearchTool, CrewAI's native Knowledge feature, and a custom tool built with FAISS) offers a different way to help agents find and use outside information. All three methods aim to give Large Language Models (LLMs) the right information to produce accurate, grounded answers and avoid hallucinations, but they differ in purpose, in how the agent accesses the data, and in how much control you have. What is clear is that CrewAI is a capable framework for building multi-agent systems. Alongside other agent frameworks like LangChain, AutoGen, and LlamaIndex, CrewAI provides a valuable alternative: it is easy to use and focuses on how agents work together, what their roles are, and what they aim to achieve.
