Hemant Jawale

Posted on Nov 17

Achieve RAG in 15 Minutes: A Step-by-Step Agentforce Data Library Tutorial

#salesforce #agentforce #rag #ai

Generative AI is a revolution, but let's be honest: out-of-the-box Large Language Models (LLMs) are "book smart," not "company smart." They can write a sonnet about a sales process, but they don't know your sales process, your specific product SKUs, or your internal HR policies. This gap leads to generic answers and, worse, confident-sounding "hallucinations".

The solution to this is Retrieval-Augmented Generation (RAG). RAG is the technique of "grounding" an LLM by feeding it your own trusted, proprietary data before it answers a question. This ensures the AI provides relevant, accurate, and safe responses based on your company's truth.

Until recently, building a RAG pipeline was a massive data engineering project. You had to manually connect data sources, "chunk" (split) the text, create vector embeddings, store them in a vector database, and build a custom retriever.

This post will show you how to build a production-ready RAG solution in Salesforce in about 15 minutes, with zero code. We're going to use the Agentforce Data Library (ADL) from Salesforce to build an AI agent that can instantly and accurately answer questions from a PDF document.

Ready to build? Let's get started.

What is RAG, and Why is it Suddenly So Easy in Salesforce?

First, let's demystify the magic.
Retrieval-Augmented Generation (RAG) is a simple, two-step process:

Retrieve: When you ask a question (e.g., "What's our international travel policy?"), the system first retrieves relevant snippets of information from your private data (like your company's employee handbook PDF).
Augment: It then augments the LLM's prompt, stuffing that retrieved context in alongside your original question.

The prompt to the AI effectively becomes: "Using only the following information: '[retrieved snippets]', answer this question: What's our international travel policy?" This forces the LLM to base its answer on your data, not its general, pre-trained knowledge.
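To make the "augment" step concrete, here is a minimal Python sketch of how retrieved snippets and a question could be combined into one prompt. The function name `build_augmented_prompt` and the exact wording of the template are assumptions for illustration; the actual prompt template Salesforce generates is not shown in this post.

```python
def build_augmented_prompt(question: str, snippets: list[str]) -> str:
    """Combine retrieved snippets and the user's question into one prompt."""
    # Each retrieved snippet becomes one bullet of context.
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Using only the following information:\n"
        f"{context}\n"
        f"Answer this question: {question}"
    )


# Hypothetical retrieval result, used only to show the shape of the prompt.
snippets = [
    "All international travel requires managerial approval at least 30 days in advance.",
]
prompt = build_augmented_prompt("What's our international travel policy?", snippets)
print(prompt)
```

The LLM never sees your whole document, only the snippets the retrieval step selected, which is why retrieval quality matters as much as the model.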

So, what is the Agentforce Data Library (ADL)?

The ADL is the low-code "easy button" that Salesforce has built for RAG. It's a deliberate abstraction layer that hides all the complex data engineering.

The ADL is powered by Data 360 (formerly Data Cloud). When you create a new Data Library and point it at a data source (like an uploaded PDF), you trigger a complex automated pipeline without having to build any of it yourself. In the background, Data 360 automatically:

Ingests the content
Chunks the text into small, searchable pieces
Vectorizes those chunks (converts them into numeric representations)
Indexes them in a vector store, creating a search index and a retriever

In effect, this becomes the fastest way to set up your RAG solution. It automates all the components like data streams, objects, vector store, search index, retriever, prompt template, and even the agent action that you would otherwise have to build manually.
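For intuition about what those four steps do, here is a toy Python sketch of chunking, vectorizing, indexing, and retrieving. It is not how Data 360 works internally: real pipelines use dense embeddings from a model, whereas this sketch assumes a simple bag-of-words count vector and cosine similarity. All function names are hypothetical.

```python
import math
import re
from collections import Counter

POLICY = (
    "All international travel requires managerial approval at least 30 days in advance. "
    "Employees must book flights and hotels through the company's official portal, 'ConcurTravel'. "
    "Use of personal credit cards for international bookings is not permitted and will not be reimbursed. "
    "For visa requirements, contact the HR Operations team at least 6 weeks before your travel date."
)


def chunk_text(text: str) -> list[str]:
    """Chunk: split the text into one chunk per sentence."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def vectorize(text: str) -> Counter:
    """Vectorize: a toy 'embedding' that counts lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(text: str) -> list[tuple[str, Counter]]:
    """Index: store each chunk next to its vector."""
    return [(chunk, vectorize(chunk)) for chunk in chunk_text(text)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Retrieve: return the k chunks most similar to the query."""
    q = vectorize(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


index = build_index(POLICY)
top = retrieve(index, "Who do I contact about visa requirements?")
print(top[0])
```

Only the visa sentence shares words with the query, so it ranks first. A real embedding model would also match paraphrases that share no words at all.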

The Goal: Build a PDF-Based Q&A Agent

Let's define our simple use case.
Our Mission: We will build an Agentforce Service Agent.
The Task: This agent's job is to answer one common question: "What is the policy for international travel?"
The Test: We will create one PDF document on our computer that contains our specific, fictional international travel policy.
The Result: When we ask the agent, "How can I book international travel?", it will bypass generic LLM answers and give us the exact policy from our PDF, citing its source.

The Build: A Step-by-Step Hands-On Tutorial

This is the full, end-to-end process. Follow along, and you'll have a working RAG agent in minutes.

Part 1: Get Set Up

Make sure you have a Salesforce org that has Agentforce licenses, and set up Einstein Generative AI and Agentforce before you continue.

Part 2: Create Your Source of Truth (The PDF)

The RAG agent needs trusted data to "read" from. Let's create it.

Open any text editor (like Notepad, TextEdit, Microsoft Word, or Google Docs).

In the blank document, paste this exact text:

Official International Travel Policy:

All international travel requires managerial approval at least 30 days in advance. Employees must book flights and hotels through the company's official portal, 'ConcurTravel'. Use of personal credit cards for international bookings is not permitted and will not be reimbursed. For visa requirements, contact the HR Operations team at least 6 weeks before your travel date.

Use the "Save As" or "Export" function of your editor to save this file to your desktop.

Name the file: Corporate_Travel_Policy.pdf.

Pro Tip: The Agentforce Data Library can handle PDF files up to 100 MB. It also supports .txt and .html files (up to 4 MB).
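If you plan to upload many files, a quick pre-check can save failed uploads. The Python sketch below encodes the limits quoted in the tip above; the helper name `check_upload` is hypothetical, the choice of 1 MB = 1024 × 1024 bytes is an assumption, and the limits themselves should be confirmed against current Salesforce documentation.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: binary megabytes

# Limits as quoted in this post, keyed by file extension.
MAX_BYTES_BY_EXTENSION = {
    ".pdf": 100 * MB,
    ".txt": 4 * MB,
    ".html": 4 * MB,
}


def check_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for a candidate file, based on extension and size."""
    ext = Path(filename).suffix.lower()
    limit = MAX_BYTES_BY_EXTENSION.get(ext)
    if limit is None:
        return False, f"unsupported file type: {ext or '(none)'}"
    if size_bytes > limit:
        return False, f"{ext} files must be at most {limit // MB} MB"
    return True, "ok"


print(check_upload("Corporate_Travel_Policy.pdf", 50_000))
print(check_upload("notes.txt", 5 * MB))
```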

Part 3: The "RAG" Step – Create the Agentforce Data Library

Now we'll build a RAG pipeline by merely uploading our file.

Step 1: Create an Agent

Go to Setup > Agentforce Agents. Click "+ New Agent".
You can select a template or Create with Gen AI. I chose the "Agentforce Service Agent" template.
Select the default topics. Make sure to select the "General FAQ" topic, which ensures our question is routed correctly.
Add details on the "Customize your agent" page.
Leave "Select data sources (Optional)" blank for now and click the "Create" button.

Step 2: Navigate to the Data Library
When you are in the Agent Builder view of the agent you just created, click into the "Data" subtab on the left panel.

Step 3: Leverage an Existing Library or Create Your New Library
If you are using a Service Agent, it should create a library by default called "Agentforce Service Agent Library". You can click into that or create a new library if you prefer. I selected the default one.

Step 4: Choose Your Data Source (The Key Moment)
The system will now ask you to choose a data source.

Select "Files" for the Data Type and keep the default Data Space.
Under "Add Files Data", click the "Upload Files" button and select the Corporate_Travel_Policy.pdf file you created earlier.
Click Done. You will see your file in the list.

Pro Tip: An agent feature (like the Q&A action) can only use one data library at a time. By selecting our library here, we are telling the "General FAQ" topic to get its answers only from the search index built from our uploaded file.


Step 5: Wait for Indexing
You will see the status "In progress" next to your file. Wait for this status to change to "Ready". This is the indexing process in action.

What's Happening Now? The moment you uploaded that file, you kicked off the "Auto-RAG" pipeline. In the background, Data 360 is reading your PDF, breaking it into chunks, and building a search index.

Pro Tip: This indexing is not instantaneous. The first time you create a library and upload a file, the initial indexing can take 15 to 30 minutes. If you test your agent too soon, it will fail.
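The "wait until Ready" rule is a plain polling pattern. In this tutorial you simply watch the status in the UI, but the logic can be sketched in Python as below. The `get_status` callable is entirely hypothetical (this post does not describe any API for reading the file status); the status strings "In progress" and "Ready" and the 30-minute ceiling come from the steps above.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    timeout_s: float = 30 * 60,
    poll_interval_s: float = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "Ready" or the timeout elapses."""
    waited = 0.0
    while True:
        if get_status() == "Ready":
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_interval_s)
        waited += poll_interval_s


# Simulated statuses stand in for the UI; sleep is stubbed so the demo runs instantly.
statuses = iter(["In progress", "In progress", "Ready"])
ready = wait_until_ready(lambda: next(statuses), sleep=lambda s: None)
print(ready)  # True
```

The point of the timeout is the same as in the manual process: if the status never reaches "Ready", stop waiting and investigate rather than testing an agent that has nothing to retrieve.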

Once the uploaded file's status updates to Ready, click the Activate button on the top right in the Agent Builder to make your agent live.

Part 4: The "Moment of Truth": Test in the Builder

Let's see it in action.

In the test chat window on the right, type in your question prompt:
How can I book international travel?

If it worked: You won't get a generic "To book travel, you typically..." answer. You will get the specific, grounded text from your PDF that includes information about the use of ConcurTravel and requirements around Manager Approval, Payment, and Visa Requirements.

If it failed: It will say it can't help or give a generic, non-grounded answer. Troubleshooting: the cause is almost always one of these three things:

The source isn't published. If your library is backed by Salesforce Knowledge rather than an uploaded file, the article may still be a Draft. Go back and publish it.
You didn't wait long enough. The initial Data 360 indexing is still in progress. Wait another 15 minutes and try again.
Permissions. The Answer Questions with Knowledge action respects all Salesforce sharing settings. The Agent User (or your test user) must have permission to see the underlying content, such as the Knowledge article.
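A simple way to tell a grounded answer from a generic one is to check whether it mentions the specifics that only exist in your PDF. The Python sketch below does that with a case-insensitive substring check; the helper name `missing_facts` and both sample answers are made up for illustration, and the expected facts are taken from the policy text in Part 2.

```python
def missing_facts(answer: str, expected_facts: list[str]) -> list[str]:
    """Return the expected facts that do not appear in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [fact for fact in expected_facts if fact.lower() not in lowered]


# Specifics that appear only in our fictional travel policy.
EXPECTED = ["ConcurTravel", "30 days", "6 weeks"]

# Hypothetical agent responses, written by hand for this example.
grounded = (
    "Book through ConcurTravel after getting manager approval 30 days ahead; "
    "contact HR Operations 6 weeks before travel for visas."
)
generic = "To book travel, you typically compare flights online and pick the best fare."

print(missing_facts(grounded, EXPECTED))  # []
print(missing_facts(generic, EXPECTED))   # ['ConcurTravel', '30 days', '6 weeks']
```

An empty list suggests the answer came from your document. A full list points back to the three troubleshooting causes above.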

How Did That Work? A Look Under the Hood

For the technical folks in the room, let's break down what actually just happened.

Agentforce is an orchestrator (the "brain") that uses Actions (the "hands") to do work.

Here is the data flow you just triggered:

Query: You typed: "How can I book international travel?"
Intent Classification: The Atlas Reasoning Engine analyzed your query and matched it to the "General FAQ" Topic.
Action Invocation: The "General FAQ" Topic is pre-configured to invoke the standard "Answer Questions with Knowledge" action. (Yes, it's called "with Knowledge" even when using files!)
RAG Retrieval: Because we assigned our Data Library to the agent, the AnswerQuestionsWithKnowledge action automatically used its ADL Retriever.
Data 360 Search: The ADL Retriever queried the Data 360 Search Index (which contains the "chunks" from our PDF) and found the most relevant text.
Prompt Augmentation: The system built an augmented prompt for the LLM, which looked something like: "Context: [...All international travel requires managerial approval...] Question: How can I book international travel?".
Grounded Response: The LLM generated the answer based only on that context and returned the knowledgeSummary, which you saw in the chat window.
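The flow above can be sketched as a small Python program: classify the query to a topic, look up that topic's action, and let the action retrieve, augment, and generate. Everything here is a stand-in under stated assumptions: the classifier always returns "General FAQ", the retriever and LLM are stubs, and the function names are hypothetical. Only the topic name, the prompt shape, and the `knowledgeSummary` field name come from the steps described above.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]
LLM = Callable[[str], str]


def answer_questions_with_knowledge(query: str, retriever: Retriever, llm: LLM) -> dict:
    """Retrieve context, augment the prompt, and ask the model for a grounded answer."""
    context = retriever(query)
    prompt = f"Context: {' '.join(context)} Question: {query}"
    return {"knowledgeSummary": llm(prompt), "prompt": prompt}


# Topic name -> action to invoke (a single hard-coded route for this sketch).
TOPIC_ACTIONS = {"General FAQ": answer_questions_with_knowledge}


def classify_topic(query: str) -> str:
    """Stand-in for intent classification; a real engine would reason over the query."""
    return "General FAQ"


def handle(query: str, retriever: Retriever, llm: LLM) -> str:
    """Orchestrate: pick the topic, invoke its action, return the summary."""
    topic = classify_topic(query)
    action = TOPIC_ACTIONS[topic]
    return action(query, retriever, llm)["knowledgeSummary"]


# Stubs so the sketch runs without any external service.
def fake_retriever(query: str) -> list[str]:
    return ["All international travel requires managerial approval at least 30 days in advance."]


def fake_llm(prompt: str) -> str:
    # Echo the context back so the flow is visible; a real LLM would generate text.
    return prompt.split("Context: ", 1)[1].split(" Question: ", 1)[0]


result = handle("How can I book international travel?", fake_retriever, fake_llm)
print(result)
```

Passing the retriever and LLM in as parameters mirrors the idea that the action stays the same while the data library behind it can change.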

You didn't write a single line of code, but you just executed a complex, end-to-end Agentic RAG pattern.

The ROI: From Weeks to Minutes

This is why the ADL is a game-changer.

Productivity Gains

Building a custom RAG pipeline that includes ingestion, chunking, a vector DB, a retriever, and LLM integration is a multi-week project for a team of data engineers and developers. You just did it in 15 minutes!

The Business Value

Reduce Hallucinations: Your AI is now grounded in your trusted data, which makes it safer to put in front of employees and customers.
Instant Case Deflection: This agent can now handle common FAQs covered by your documents, deflecting simple support tickets.
Trust and Accuracy: Answers are drawn from your own source material, so they are specific and can be checked against it.

Future Enhancements: Where to Go From Here

This was the "Hello, World!" of RAG in Salesforce. Now you can get serious.

Use Salesforce Knowledge

The process is identical. Create a new ADL, but this time select "Salesforce Knowledge" as the data source. Now your agent can read both your PDF library and your Knowledge Base (by assigning the new library to a different topic).

Use Web Search

Create another ADL and select "Web Search". Now your agent can answer questions with live, up-to-the-minute data from the internet.

Customizations

The Answer Questions with Knowledge action is just one of many actions. You can also have your agent call a Salesforce Flow or an Apex class. For example, the agent could understand "I need to book international travel," and instead of telling you how, it could launch a Flow that starts the approval process.
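As an illustration of the kind of logic such a custom action might run, the sketch below computes the two deadlines from our fictional policy (approval 30 days in advance, HR contact 6 weeks in advance) for a given travel date. It is written in Python purely to show the arithmetic; it is not Flow or Apex code, and the function and key names are hypothetical.

```python
from datetime import date, timedelta


def travel_deadlines(travel_date: date) -> dict[str, date]:
    """Latest dates to act on, per the fictional policy in Part 2."""
    return {
        # Managerial approval is required at least 30 days in advance.
        "request_approval_by": travel_date - timedelta(days=30),
        # HR Operations must be contacted at least 6 weeks before travel.
        "contact_hr_for_visa_by": travel_date - timedelta(weeks=6),
    }


deadlines = travel_deadlines(date(2025, 6, 15))
for name, due in deadlines.items():
    print(name, due.isoformat())
```

An action like this turns the policy from something the agent quotes into something the agent acts on, which is the shift the Flow example describes.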

You now have a RAG-powered AI agent running securely on your private Salesforce data. This is the new frontier of automation, and you've just seen how fast it is to get started.

What's the first RAG-powered agent you're going to build? Share your use cases in the comments below!
