How to Test Agentforce Agents Before They Hit Production
You've built your first Agentforce agent. It handles case routing, answers customer questions, and even pulls data from your knowledge base. Everything looks great in the preview panel. Then you deploy it, and within a day, the agent is confidently giving customers wrong answers about your return policy.
Sound familiar? If you're working with Agentforce in 2026, testing isn't optional anymore - it's the difference between an agent that helps your team and one that creates more work than it saves. Salesforce knows this, which is why they've been investing heavily in the Agentforce Testing Center. Let me walk you through how to actually use it.
What Is the Agentforce Testing Center?
The Testing Center lives inside Agentforce Studio as its own dedicated tab, sitting right alongside Agent Builder and Observability. If you've been building agents but haven't clicked over to that tab yet, you're missing out on one of the most useful tools Salesforce shipped this year.
At its core, Testing Center lets you create automated test scenarios that simulate real user interactions with your agent. Think of it like writing unit tests for your Apex code, but instead of testing methods and classes, you're testing conversations and outcomes.
You can access it two ways: search for "Testing Center" in Setup, or from inside Agent Builder, hit the Batch Test button above the Conversation Preview panel. I prefer the second approach since it keeps me in context while I'm building.
The Five-Step Testing Loop
Salesforce recommends a structured approach to testing agents, and honestly, it works pretty well once you get the rhythm down. Here's the loop:
1. Create your test scenarios. Start by defining the kinds of interactions your agent will handle. What questions will customers ask? What edge cases exist? The Testing Center can actually auto-generate synthetic interactions for you - hundreds of them - simulating the types of requests a customer might throw at your Service Agent. This is a huge time-saver.
2. Pick your evaluation metrics. Decide what "good" looks like. Are you checking that the agent routes to the correct topic? That it pulls the right knowledge article? That it stays within its guardrails? You can use built-in metrics or create custom evaluations (more on that in a second).
3. Run the tests in parallel. Testing Center executes your scenarios at scale, so you're not sitting there manually typing messages one at a time. It runs them simultaneously and shows you pass/fail rates across the batch.
4. Validate the results. Look at where your agent stumbled. Did it misinterpret a question? Pick the wrong action? The results surface exactly where things went sideways.
5. Refine and re-test. Update your agent's topics, instructions, or guardrails based on what you found, then run the batch again. Rinse and repeat until you're happy with the numbers.
My recommendation? Start small. Generate 10 to 20 test scenarios, download the CSV, and review them against your agent's actual parameters. You can revise those scenarios, add new ones, and gradually scale up as your agent's accuracy improves.
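To make that CSV review step concrete, here's a minimal Python sketch of what it can look like. The column names and topic values below are illustrative assumptions, not Testing Center's actual export schema - the point is simply to parse the exported scenarios and flag any that don't match the topics your agent actually covers before you run a big batch.

```python
import csv
import io

# Hypothetical CSV export of synthetic test scenarios. The columns
# (utterance, expected_topic) are illustrative, not the real schema.
SCENARIOS_CSV = """utterance,expected_topic
"Where is my order?",Order Status
"I want to return these shoes",Returns
"What is your refund window?",Returns
"""

def load_scenarios(text):
    """Parse exported scenarios into dicts for manual review."""
    return list(csv.DictReader(io.StringIO(text)))

def review(scenarios, valid_topics):
    """Flag rows whose expected topic isn't one the agent covers."""
    return [s for s in scenarios if s["expected_topic"] not in valid_topics]

scenarios = load_scenarios(SCENARIOS_CSV)
suspect = review(scenarios, valid_topics={"Order Status", "Returns"})
print(len(scenarios), len(suspect))  # → 3 0
```

Flagged rows are exactly the ones worth deleting or rewriting before the batch run - that's the cleanup step that saves you from chasing false failures later.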
Conversation-Level Testing Changes Everything
Here's where things get really interesting. Until recently, testing an agent meant turn-by-turn evaluation - one user message, one agent response, one check. That told you something, but it didn't tell you how the agent performs across a whole conversation where context builds up over multiple exchanges.
Conversation-level testing fixes that. You can now simulate full multi-turn conversations, and even better, you can assign personas to the simulated user. Salesforce includes options like "frustrated customer," "non-native English speaker," and "distracted user." Each persona changes how the simulated messages come in, which helps you catch issues you'd never find with clean, perfectly worded test inputs.
This is closer to what actually happens when real people interact with your agent. Customers don't always use complete sentences. They change topics mid-conversation. They get frustrated and repeat themselves. If your agent can only handle textbook-perfect questions, you'll find that out fast with persona-based conversation testing.
I've found that running conversation tests with the "distracted user" persona reveals the most surprising failures. Agents that seem solid on paper often struggle when the user switches context or asks a follow-up question that references something from three messages ago.
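If you want intuition for why personas surface different failures, here's a toy sketch. The transforms below are stand-ins I made up to mimic how a persona might perturb a clean scripted utterance - Testing Center does this for you, but seeing it as code makes the idea obvious.

```python
# Toy persona transforms - illustrative stand-ins for how personas like
# "frustrated customer" or "distracted user" might perturb clean input.
def frustrated(utterance):
    return utterance.upper() + "!! I already asked this."

def distracted(utterance):
    return "wait, also - " + utterance.lower()

PERSONAS = {"frustrated": frustrated, "distracted": distracted}

def build_turns(clean_turns, persona):
    """Apply a persona transform to every user turn in a scripted conversation."""
    perturb = PERSONAS[persona]
    return [perturb(t) for t in clean_turns]

turns = build_turns(["Where is my order?", "Can I change the address?"], "distracted")
print(turns[0])  # → wait, also - where is my order?
```

The same scripted conversation, run once per persona, gives you several stress-tested variants of one scenario for free.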
For anyone building out Agentforce agents, it's worth bookmarking salesforcedictionary.com as a quick reference for all the terminology that comes up - topics, actions, guardrails, instructions - the vocab can get confusing when you're deep in the weeds.
Custom Evaluations for Specific Requirements
The built-in evaluation metrics cover common scenarios, but every business has unique requirements. That's where custom evaluations come in.
Custom evaluations let you test for very specific things in your agent's responses. There are two main types:
String comparison checks whether the agent's response contains (or doesn't contain) a specific piece of text. For example, you might verify that your agent always includes a case number when creating a support ticket, or that it never mentions a competitor's product name.
Numeric comparison tests for specific numbers in the response. You could check that the agent quotes the correct pricing tier, or that response latency stays under a threshold you've defined - say, under 10 seconds per response.
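Both check types boil down to simple predicates over the response text. Here's a sketch of each in plain Python - the function names and the sample response are my own, not Testing Center's API, but the logic mirrors the two evaluation styles described above.

```python
import re

# Sketch of the two custom-evaluation styles as plain predicates.
def string_check(response, must_contain=None, must_not_contain=None):
    """Pass if required text appears and forbidden text doesn't (case-insensitive)."""
    ok = True
    if must_contain:
        ok = ok and must_contain.lower() in response.lower()
    if must_not_contain:
        ok = ok and must_not_contain.lower() not in response.lower()
    return ok

def numeric_check(response, expected):
    """Pass if the expected number appears anywhere in the response."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", response)]
    return expected in numbers

resp = "Your case number is 00412 and the Pro tier is $25 per month."
print(string_check(resp, must_contain="case number"))  # True
print(numeric_check(resp, 25.0))                       # True
```

The string check covers the "always include a case number, never mention a competitor" cases; the numeric check covers pricing tiers and thresholds.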
The practical advice here: don't rely on exact string matching for open-ended responses. The agent's wording will vary between runs because that's how LLMs work. Instead, use semantic checks or look for key terms that should appear in any correct answer. If you're testing whether the agent correctly identifies a billing issue, check for the presence of terms like "invoice" or "payment" rather than expecting an exact sentence match.
This approach is similar to how you'd test any non-deterministic system, and if you're coming from a traditional Salesforce admin background, it's a bit of a mindset shift. The Salesforce glossary at salesforcedictionary.com has good breakdowns of these newer AI-related concepts if you need to get up to speed.
Practical Tips From the Trenches
After spending a good amount of time with Testing Center, here are some things I wish I'd known from the start:
Create a process diagram first. Before you write a single test, map out what your agent is supposed to do. Which topics does it cover? What are the decision points? What should it hand off to a human? This diagram becomes your testing blueprint. If you can't draw it, you can't test it.
Involve your business users early. The people who talk to customers every day know the weird edge cases better than anyone. Get them to help you define test scenarios. They'll think of questions you never would.
Preload conversation history for context. If your agent handles support cases, some conversations will start with context from previous interactions. You can set up test scenarios that include prior conversation history so the agent has to work with existing context, just like it would in production.
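One way to think about a preloaded-history scenario is as prior turns plus a next message plus an expectation. The dict shape below is a hypothetical stand-in for however you structure this in your own test data, not a Testing Center format:

```python
# Hypothetical shape for a test scenario that starts mid-conversation.
scenario = {
    "history": [
        {"role": "user", "text": "My order 8841 arrived damaged."},
        {"role": "agent", "text": "Sorry to hear that - I've opened case 00977."},
    ],
    "next_user_message": "Any update on that case?",
    "expectation": "response references case 00977",
}

def history_as_prompt(scenario):
    """Flatten prior turns so the simulated agent has existing context."""
    lines = [f"{t['role']}: {t['text']}" for t in scenario["history"]]
    lines.append(f"user: {scenario['next_user_message']}")
    return "\n".join(lines)

print(history_as_prompt(scenario))
```

The expectation here is the interesting part: a follow-up like "Any update on that case?" is only answerable if the agent actually uses the preloaded context.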
Use the CSV export wisely. When Testing Center generates synthetic test scenarios, download the CSV and review it manually before running a big batch. I've caught scenarios that didn't match our actual use cases, and cleaning those out before testing saved me from chasing false failures.
Track your results over time. Testing Center keeps run history, so you can compare results across multiple test runs. This is gold for showing stakeholders that your agent is actually improving, not just "seems better."
If you want to stay current on Agentforce features and other Salesforce updates, salesforcedictionary.com regularly covers new releases and terminology changes.
Don't Skip Testing - Your Users Will Thank You
The temptation with Agentforce is to build fast and ship faster. The tools make it easy, and the preview panel can give you false confidence that everything works. But preview testing with a few hand-typed messages isn't the same as systematic testing at scale with diverse personas and edge cases.
Testing Center exists because Salesforce learned - probably the hard way - that AI agents need a different quality bar than traditional automation. Flows either work or they don't. Agents work most of the time, and it's the "sometimes they don't" part that causes problems.
Take the time to set up proper test scenarios, run conversation-level tests, and build custom evaluations for your specific business rules. Your users, your support team, and your boss will all thank you.
What's your experience been with testing Agentforce agents? Have you tried conversation-level testing yet? Drop a comment - I'd love to hear what's working and what's not.