Picture a four-agent pipeline left running over a long weekend. Two of the agents start calling each other in a loop, Analyzer calling Verifier, Verifier calling Analyzer, because their stopping conditions are vague and their budget enforcement is asynchronous. Nobody is watching the meter. By the time anyone notices on Monday morning, the bill is well into five figures.
The tools existed. The agents existed. The pattern was wrong.
Multi-agent orchestration isn't hard because the API is complicated. It's hard because picking the right pattern matters more than writing the right code. The Laravel AI SDK, and specifically the sub-agent feature added on May 12, 2026, gives you the PHP-native primitives for each pattern. But it doesn't tell you when to reach for which one, what the costs look like under the hood, or what breaks in production that doesn't break in a demo.
That's what this article is for.
How Prism's Agentic Loop Works Under the Hood
Before you can understand sub-agents, you need to understand the loop they run inside.
When you call ->prompt() on a Laravel AI agent (or ->asText() on a Prism request), what you get back isn't necessarily the response from the first API call. If you've configured #[MaxSteps(10)] on your agent class, the SDK will loop.
Here's what that loop looks like at each cycle:
- The model receives your prompt and system instructions.
- If the model decides to call a tool, it returns a tool_use block instead of a final text response.
- Prism executes the tool, or all concurrent tools in parallel if you've marked them .concurrent().
- Prism appends the result as a ToolResultMessage to the conversation.
- The full conversation (original prompt, tool calls, and tool results) is re-sent to the model.
- Repeat until the model returns a text response, or you hit MaxSteps.
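The cycle above can be sketched in plain PHP. This is a toy model of the loop, not Prism's implementation: the `$model` callable, the message array shape, and `runAgentLoop()` are all hypothetical stand-ins, chosen only to show where the history grows and where the step cap bites.

```php
<?php

/**
 * Toy agentic loop. $model is a stand-in callable that receives the message
 * history and returns either ['type' => 'tool_use', 'name' => ..., 'input' => ...]
 * or ['type' => 'text', 'text' => ...]. None of this is Prism's real API.
 */
function runAgentLoop(callable $model, array $tools, string $prompt, int $maxSteps): array
{
    $messages = [['role' => 'user', 'content' => $prompt]];

    for ($step = 1; $step <= $maxSteps; $step++) {
        // The whole history is passed to the model on every cycle.
        $reply = $model($messages);

        if ($reply['type'] === 'text') {
            return ['text' => $reply['text'], 'steps' => $step];
        }

        // Run the requested tool, then append both the call and its result.
        $result = $tools[$reply['name']]($reply['input']);
        $messages[] = ['role' => 'assistant', 'tool_use' => $reply];
        $messages[] = ['role' => 'tool', 'content' => $result];
    }

    // Step cap reached without a final text answer.
    return ['text' => null, 'steps' => $maxSteps];
}

// Stub model: asks for one search, then answers from the tool result.
$model = function (array $messages): array {
    $last = end($messages);
    if ($last['role'] === 'tool') {
        return ['type' => 'text', 'text' => 'Answer based on: ' . $last['content']];
    }
    return ['type' => 'tool_use', 'name' => 'search_docs', 'input' => 'property hooks'];
};

$tools = ['search_docs' => fn (string $query): string => "docs for {$query}"];

$result = runAgentLoop($model, $tools, 'What are PHP 8.4 property hooks?', 5);
echo $result['text'], ' (steps: ', $result['steps'], ')', PHP_EOL;
```

With the stub model, the loop finishes on the second step: one cycle to request the tool, one to answer from its result.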
In Prism directly:
use Prism\Prism\Prism;
use Prism\Prism\Enums\Provider;
use Prism\Prism\Facades\Tool;

$searchTool = Tool::as('search_docs')
    ->for('Search the documentation for a PHP topic')
    ->withStringParameter('query', 'The search query')
    ->using(fn(string $query) => $this->searchDocs($query));

$response = Prism::text()
    ->using(Provider::Anthropic, 'claude-sonnet-4-6')
    ->withMaxSteps(5)
    ->withPrompt('What are PHP 8.4 property hooks?')
    ->withTools([$searchTool])
    ->asText();

foreach ($response->steps as $step) {
    // Each step: one model call + all its tool executions + all results
}
Two things to understand here. First, withMaxSteps(5) is the total number of request-response cycles, not the number of tool calls. Each step includes one model call, all its tool calls, and all the results from those tools. Second, the growing conversation history is re-sent on every step. By step 5, the model is receiving the full thread of every tool call and every result from the previous four steps. The input sent at each step grows with everything before it, so the total input tokens billed across the loop grow faster than linearly: roughly with the square of the step count when each step adds a similar amount.
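To put rough numbers on that growth, here is a back-of-the-envelope sketch. It assumes a fixed base prompt and that every step appends the same number of tokens (tool call plus result). Both figures, and the function name, are illustrative rather than measured.

```php
<?php

/**
 * Estimate total input tokens billed across an agentic loop, assuming each
 * step re-sends the base prompt plus everything appended by earlier steps.
 * $basePromptTokens and $tokensAddedPerStep are illustrative inputs.
 */
function cumulativeInputTokens(int $basePromptTokens, int $tokensAddedPerStep, int $steps): int
{
    $total = 0;
    for ($step = 1; $step <= $steps; $step++) {
        // Step N carries the base prompt plus the output of the N-1 earlier steps.
        $total += $basePromptTokens + ($step - 1) * $tokensAddedPerStep;
    }
    return $total;
}

echo cumulativeInputTokens(1000, 2000, 1), PHP_EOL;  // 1000
echo cumulativeInputTokens(1000, 2000, 5), PHP_EOL;  // 25000
echo cumulativeInputTokens(1000, 2000, 10), PHP_EOL; // 100000
```

Under these assumptions, doubling the step count from 5 to 10 quadruples the input tokens, which is why a generous step cap is not a free safety margin.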
The Laravel AI SDK's class-based agents use the same loop, controlled by PHP attributes:
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\MaxTokens;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Attributes\Timeout;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Enums\Lab;

#[Provider(Lab::Anthropic)]
#[Model('claude-sonnet-4-6')]
#[MaxSteps(5)]
#[MaxTokens(2048)]
#[Timeout(60)]
class DocumentationAgent implements Agent, HasTools
{
    public function instructions(): string
    {
        return 'You are a PHP documentation assistant. Search and synthesize answers from the docs.';
    }

    public function tools(): iterable
    {
        return [new SearchDocsTool, new FetchPageTool];
    }
}
That loop is the foundation of every agent in this SDK. Sub-agents are just a way to delegate one step of it to another agent, one with its own instructions, its own tools, and its own context.
Turning Agents Into Tools
The sub-agent feature is deceptively simple: you return an agent class instance from another agent's tools() method.
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Enums\Lab;

#[Provider(Lab::Anthropic)]
#[Model('claude-opus-4-8')]
#[MaxSteps(10)]
class SupportOrchestratorAgent implements Agent, HasTools
{
    public function instructions(): string
    {
        return <<<PROMPT
        You are a customer support orchestrator. Analyze the customer's request and
        delegate it to the appropriate specialist agent. Always call the right specialist
        before composing your final response to the customer.
        PROMPT;
    }

    public function tools(): iterable
    {
        return [
            new RefundsAgent,
            new ShippingAgent,
            new BillingAgent,
        ];
    }
}
The orchestrator receives a customer message. The model reads the instructions and the tool definitions, including the name and description of each sub-agent, and decides which specialist to invoke. The sub-agent runs, returns its response, and that response comes back to the orchestrator as a tool result. The orchestrator incorporates it into its final reply.
For the SDK to know what to call the sub-agent and when to call it, the sub-agent implements CanActAsTool:
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\CanActAsTool;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Enums\Lab;

#[Provider(Lab::Anthropic)]
#[Model('claude-haiku-4-5-20251001')]
#[MaxSteps(3)]
class RefundsAgent implements Agent, HasTools, CanActAsTool
{
    public function name(): string
    {
        return 'process_refund';
    }

    public function description(): string
    {
        return 'Handles refund requests. Call with the order ID and the reason for the refund. Returns confirmation or a list of issues preventing the refund.';
    }

    public function instructions(): string
    {
        return 'You are a refunds specialist. Check order eligibility, calculate refund amounts, and process or decline the request with a clear explanation.';
    }

    public function tools(): iterable
    {
        return [new LookupOrderTool, new ProcessRefundTool, new CheckEligibilityTool];
    }
}
The name() and description() methods are what the orchestrator's model sees in its tool list. This is where most developers get tripped up: if the description is vague ("Handles refunds"), the model won't know what input to pass or what to expect back. The description should read like a well-written docstring: what does this agent take, what does it return, when should you call it instead of a sibling tool.
There's one design decision that catches people off guard: a sub-agent doesn't receive the parent's conversation history. When the orchestrator calls RefundsAgent, it gets a clean context. Whatever the customer said, whatever the orchestrator has already learned, none of that flows down automatically. The orchestrator must extract the relevant context and pass it as a self-contained task description when it calls the sub-agent.
This is intentional. If the orchestrator has accumulated 20,000 tokens of conversation history and calls three sub-agents, blindly passing the full history to each one triples your context spend before a single worker has done any work. The isolation keeps costs predictable. But it shifts responsibility: your orchestrator's prompts need to be explicit about what it's handing off.
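The arithmetic behind that trade-off, as a minimal sketch. The 20,000-token history and the three sub-agents come from the example above. The 500-token handoff size is an assumed figure, and `subAgentContextTokens()` is a hypothetical helper, not part of the SDK.

```php
<?php

/**
 * Input tokens spent just on giving each sub-agent its starting context.
 * $contextTokensPerCall is whatever gets passed down: the full history
 * or a short, self-contained task description.
 */
function subAgentContextTokens(int $contextTokensPerCall, int $subAgentCalls): int
{
    return $contextTokensPerCall * $subAgentCalls;
}

$fullHistory = subAgentContextTokens(20000, 3); // pass everything down
$handoff     = subAgentContextTokens(500, 3);   // pass only a task description (assumed size)

echo "Full history: {$fullHistory} tokens", PHP_EOL; // Full history: 60000 tokens
echo "Handoff: {$handoff} tokens", PHP_EOL;          // Handoff: 1500 tokens
```

The saving only holds if the handoff really is self-contained. A sub-agent that has to ask for missing details costs an extra round trip.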
Each sub-agent is fully independent in terms of configuration. You can pin RefundsAgent to Anthropic while the orchestrator runs on OpenAI. You can give BillingAgent a generous #[MaxSteps(5)] for complex lookups while keeping ShippingAgent at #[MaxSteps(2)] for simple queries. The settings don't inherit and don't interfere.
The Orchestrator-Workers Pattern
The orchestrator-workers pattern is what you reach for when the task's execution path isn't known upfront.
A customer support ticket might require checking order status, processing a partial refund, escalating to a human, or all three. You won't know which until you've read the ticket. A static code path can't handle that. A hardcoded match ($type) chain falls apart the moment requests get ambiguous or require multiple steps. The orchestrator-workers pattern delegates both the routing decision and the sequencing decision to the model.
The orchestrator gets a high-level goal. It has access to a set of worker agents as tools. It plans and executes: "Call LookupOrderTool first, then based on the result, call RefundsAgent or escalate." The workers don't know the plan. They each receive a task, execute it, and return a result.
Here's a complete orchestrator for a code review workflow:
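use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\MaxTokens;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasTools;
use Laravel\Ai\Enums\Lab;

#[Provider(Lab::Anthropic)]
#[Model('claude-opus-4-8')]
#[MaxSteps(8)]
#[MaxTokens(4096)]
class CodeReviewOrchestratorAgent implements Agent, HasTools
{
    public function instructions(): string
    {
        return <<<PROMPT
        You are a code review orchestrator. When given a pull request diff:
        1. Run security, performance, and style reviews.
        2. Synthesize the findings into a structured report.
        3. Flag critical security issues separately from style suggestions.
        PROMPT;
    }

    // Each worker agent is exposed to the orchestrator's model as a tool.
    public function tools(): iterable
    {
        return [
            new SecurityReviewAgent,
            new PerformanceReviewAgent,
            new StyleReviewAgent,
        ];
    }
}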
Each worker is its own agent class, pinned to the model that fits its job:
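use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\CanActAsTool;
use Laravel\Ai\Enums\Lab;

#[Provider(Lab::Anthropic)]
#[Model('claude-haiku-4-5-20251001')]
#[MaxSteps(2)]
class SecurityReviewAgent implements Agent, CanActAsTool
{
    // Tool name shown to the orchestrator's model.
    public function name(): string
    {
        return 'review_security';
    }

    // Tool description: what to pass in and what comes back.
    public function description(): string
    {
        return 'Review the provided code diff for security vulnerabilities: SQL injection, auth bypasses, unsafe deserialization, exposed secrets. Return a list of issues with severity and line numbers.';
    }

    public function instructions(): string
    {
        return 'You are a security code reviewer. Focus only on security issues. Be specific about file names and line numbers. Do not comment on style or performance.';
    }

    // This worker reasons over the diff directly and needs no tools of its own.
    public function tools(): iterable
    {
        return [];
    }
}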
Haiku on each worker, Opus on the orchestrator: the workers do narrow, focused tasks; the orchestrator does the planning and synthesis. That model hierarchy matters for cost. Opus is roughly 5x more expensive per token than Haiku. Running the workers on Haiku and reserving Opus for the orchestrator keeps the expensive model doing only the work that needs it.
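A rough sketch of why the split matters. Prices are expressed in relative units using the roughly 5x ratio above (Haiku = 1, Opus = 5). The token counts per role are made-up illustrative numbers, and `relativeCost()` is a hypothetical helper.

```php
<?php

/** Cost in relative units: tokens multiplied by a per-token price multiplier. */
function relativeCost(int $tokens, int $priceMultiplier): int
{
    return $tokens * $priceMultiplier;
}

const HAIKU = 1; // baseline price unit
const OPUS  = 5; // roughly 5x Haiku per token, per the ratio above

$orchestratorTokens = 10000;     // assumed planning + synthesis tokens
$workerTokens       = 3 * 15000; // assumed: three workers each reading the diff

$allOpus = relativeCost($orchestratorTokens, OPUS) + relativeCost($workerTokens, OPUS);
$mixed   = relativeCost($orchestratorTokens, OPUS) + relativeCost($workerTokens, HAIKU);

echo "All Opus: {$allOpus}", PHP_EOL; // All Opus: 275000
echo "Mixed: {$mixed}", PHP_EOL;      // Mixed: 95000
```

With these assumed volumes the mixed setup costs about a third of the all-Opus one, because most tokens are spent in the workers rather than in the orchestrator.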
When the three reviews are independent and you want them to run in parallel rather than waiting for the orchestrator to sequence them, you can pull the parallelization out of the agentic loop and into your PHP code directly:
use Illuminate\Support\Facades\Concurrency;
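
// Cast each response to a string inside its closure so only plain text comes back from the concurrent task.
[$security, $performance, $style] = Concurrency::run([
    fn() => (string) (new SecurityReviewAgent)->prompt($diff),
    fn() => (string) (new PerformanceReviewAgent)->prompt($diff),
    fn() => (string) (new StyleReviewAgent)->prompt($diff),
]);

// A single synthesis call merges the three independent reviews.
$report = (new SynthesisAgent)->prompt(
    "Security:\n{$security}\n\nPerformance:\n{$performance}\n\nStyle:\n{$style}"
);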
This is faster for static, pre-planned workflows. The orchestrator-workers pattern, where the orchestrator calls workers via tools, is for when the plan itself needs to be computed at runtime.
The Evaluator-Optimizer Pattern
The evaluator-optimizer pattern solves a different problem. Instead of asking "which agent should handle this?" it asks "is this output good enough to use?"
The structure is a loop: generate something, evaluate it against explicit criteria, and if it doesn't pass, regenerate with the evaluator's specific feedback. Repeat until it passes or you hit a maximum. It works well for tasks with measurable quality: code that must pass tests, content that must meet editorial standards, translations that must preserve technical accuracy. It falls apart when your criteria are vague or when iterating doesn't actually improve the output.
A content generation loop with structured evaluation:
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Contracts\HasStructuredOutput;
use Laravel\Ai\Enums\Lab;
use Laravel\Ai\Schema\JsonSchema;

#[Provider(Lab::Anthropic)]
#[Model('claude-sonnet-4-6')]
#[MaxSteps(3)]
class TechnicalWriterAgent implements Agent, HasStructuredOutput
{
    public function instructions(): string
    {
        return 'Write a clear, accurate technical explanation of the given topic. Use concrete code examples. Avoid jargon.';
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'content' => $schema->string()->required(),
            'title' => $schema->string()->required(),
        ];
    }
}

#[Provider(Lab::Anthropic)]
#[Model('claude-haiku-4-5-20251001')]
#[MaxSteps(1)]
class EditorialEvaluatorAgent implements Agent, HasStructuredOutput
{
    public function instructions(): string
    {
        return <<<PROMPT
        You are an editorial reviewer for a technical blog. Evaluate submissions against these criteria:
        - Technical accuracy: no invented APIs, no incorrect syntax
        - Clarity: a senior engineer unfamiliar with the topic can follow it
        - Concrete examples: at least one code example per major concept
        If any criterion fails, list the specific, actionable issues on separate lines.
        PROMPT;
    }

    // 'approved' is the loop's exit condition; 'issues' is the feedback passed back to the writer.
    public function schema(JsonSchema $schema): array
    {
        return [
            'approved' => $schema->boolean()->required(),
            'score' => $schema->integer()->min(1)->max(10)->required(),
            'issues' => $schema->string()->required(),
        ];
    }
}
And the control loop:
$writer = new TechnicalWriterAgent;
$evaluator = new EditorialEvaluatorAgent;

$draft = $writer->prompt('Explain PHP 8.4 property hooks.');
$maxIterations = 3;
$approved = false;

for ($iteration = 0; $iteration < $maxIterations; $iteration++) {
    $evaluation = $evaluator->prompt(
        "Evaluate this article draft:\n\n{$draft['content']}"
    );

    if ($evaluation['approved']) {
        $approved = true;
        break;
    }

    // Rejected: regenerate using the evaluator's specific feedback.
    $draft = $writer->prompt(
        "Rewrite this article, fixing these issues:\n{$evaluation['issues']}\n\nOriginal:\n{$draft['content']}"
    );
}

// If $approved is still false, the cap was hit and the last rewrite was never evaluated: treat it as unreviewed.
A few things make this work. The evaluator returns structured output: approved: bool gives a clean exit condition, and issues: string gives the writer actionable feedback rather than vague criticism. The max iterations cap prevents runaway loops even if the evaluator never approves. The evaluator runs on Haiku with #[MaxSteps(1)] because scoring doesn't require the same reasoning depth as writing, and a single pass is enough to produce a verdict.
The failure mode is vague evaluation criteria. If your evaluator's instructions say "evaluate whether it's good," you get feedback like "needs improvement," which produces a slightly different but equally mediocre second draft. Every quality criterion needs to be falsifiable: "at least one code example per major concept" is falsifiable; "clear and engaging" is not. Vague criteria are the most common reason evaluator-optimizer loops spiral: they iterate without converging.
Orchestrator-Workers vs Evaluator-Optimizer: When to Use Which
The two patterns solve different problems. Picking the wrong one hurts quality and cost simultaneously.
Use orchestrator-workers when the execution path is unknown upfront and the model needs to plan it. The customer support example is the textbook case: different inputs route to different specialists in an order you can't determine statically. Also use it when your subtasks have genuinely different requirements (different models, different instruction sets, different tool access) and one fat agent with fifty tools would cause more problems than it solves (more on that shortly).
Use evaluator-optimizer when you have a single task with measurable quality criteria and the task benefits from iteration. Writing and translation improve with feedback loops. Code generation improves when the evaluator can check that the output actually compiles or passes tests. If you can write three falsifiable acceptance criteria before you run the loop, evaluator-optimizer is worth trying. If you can't, the loop will iterate without converging.
Where they complement each other: An orchestrator-workers setup can include evaluator-optimizer loops inside individual workers. The orchestrator routes the refund request to RefundsAgent; internally, RefundsAgent uses an evaluator loop to verify its eligibility check before returning. Patterns compose.
Where they diverge: Orchestrator-workers scales horizontally: more specialized workers, more routing paths, more complex tasks. Evaluator-optimizer scales vertically: more refinement passes, higher quality per output. The cost structures are different: orchestrator-workers multiplies by the number of workers called per request; evaluator-optimizer multiplies by the number of iterations until approval. If your evaluator rejects 80% of first drafts and those rejected drafts typically run to a three-generation cap, you're paying about 0.2 x 1 + 0.8 x 3 = 2.6x the single-draft generation cost on average, before counting the evaluator's own calls.
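The 2.6x figure depends on how rejections behave after the first draft, which a small sketch makes explicit. Both functions are hypothetical helpers. The first assumes a rejected first draft always runs to the cap (the reading that yields 2.6x). The second assumes every draft is rejected independently at the same rate.

```php
<?php

/**
 * Expected generations if a rejected first draft always runs to the cap.
 */
function expectedGenerationsToCap(float $firstRejectRate, int $maxGenerations): float
{
    return (1 - $firstRejectRate) * 1 + $firstRejectRate * $maxGenerations;
}

/**
 * Expected generations if every draft is rejected independently with the
 * same probability, capped at $maxGenerations: 1 + p + p^2 + ...
 */
function expectedGenerationsIndependent(float $rejectRate, int $maxGenerations): float
{
    $expected = 0.0;
    for ($k = 0; $k < $maxGenerations; $k++) {
        // Generation k+1 only happens if the previous k drafts were all rejected.
        $expected += $rejectRate ** $k;
    }
    return $expected;
}

echo round(expectedGenerationsToCap(0.8, 3), 2), PHP_EOL;       // 2.6
echo round(expectedGenerationsIndependent(0.8, 3), 2), PHP_EOL; // 2.44
```

Either way the multiplier sits well above 2x at an 80% rejection rate, so the evaluator's strictness is a cost parameter as much as a quality one.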
One practical guide for choosing: if you can draw the complete flowchart before running the system ("first call A, then call B, then call C"), you don't need model-driven orchestration at all. Just use Concurrency::run() for parallel tasks or Laravel's Pipeline for sequential ones. Reserve the orchestrator-workers pattern for workflows where the model's judgment about which step to take next is actually the hard part.
The Non-Obvious Costs
That runaway loop from the opening happened because the agents had budget alerts, not budget enforcement. There's a gap between those two things, and during that gap, the meter runs.
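The difference between an alert and enforcement can be shown with a small in-process guard. This is a minimal sketch, not an SDK feature: `BudgetGuard`, `BudgetExceeded`, and the per-call cost estimate are all hypothetical, and a real version would need shared storage so that every worker sees the same running total.

```php
<?php

/** Thrown synchronously, before the call that would exceed the budget. */
final class BudgetExceeded extends RuntimeException {}

/** Hypothetical guard; the unit is whatever you meter in (cents here). */
final class BudgetGuard
{
    private int $spentCents = 0;

    public function __construct(private readonly int $limitCents) {}

    /** Call this before every model request with its estimated cost. */
    public function reserve(int $estimatedCents): void
    {
        if ($this->spentCents + $estimatedCents > $this->limitCents) {
            throw new BudgetExceeded(
                "Budget of {$this->limitCents} cents would be exceeded."
            );
        }
        $this->spentCents += $estimatedCents;
    }

    public function spent(): int
    {
        return $this->spentCents;
    }
}

$guard = new BudgetGuard(limitCents: 100);
$calls = 0;

try {
    // Simulated runaway loop: each call is estimated at 30 cents.
    while (true) {
        $guard->reserve(30);
        $calls++;
    }
} catch (BudgetExceeded $e) {
    echo "Stopped after {$calls} calls, {$guard->spent()} cents spent", PHP_EOL;
}
```

An alert would have let the fourth call through and told someone afterwards. The guard refuses it before any money is spent.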
In the Laravel AI SDK, you have three hard limits per agent class:
#[MaxSteps(5)] // caps the agentic loop at 5 request-response cycles
#[MaxTokens(2048)] // caps generation length per model call
#[Timeout(60)] // terminates requests that stall
But here's the footgun: none of these prevent cost multiplication from sub-agent calls. If your orchestrator has #[MaxSteps(10)] and calls three sub-agents per step, and each sub-agent has #[MaxSteps(3)], you're looking at up to 10 x 3 x 3 = 90 sub-agent model calls in the worst case, on top of the orchestrator's own 10, each billed as a separate API call. The orchestrator's #[MaxTokens] caps its own generation, not the sub-agents' tokens. Every sub-agent invocation is an independent billing event.
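The multiplication above is easy to turn into a check you can run against your own attribute values. The helper below is a hypothetical sketch. It assumes every orchestrator step fans out to the same number of sub-agents and that each sub-agent uses its full step budget, and it counts the orchestrator's own calls as well.

```php
<?php

/**
 * Worst-case model calls when every orchestrator step fans out to sub-agents
 * and every sub-agent uses its full step budget.
 */
function worstCaseModelCalls(int $orchestratorSteps, int $subAgentsPerStep, int $subAgentSteps): array
{
    $subAgentCalls = $orchestratorSteps * $subAgentsPerStep * $subAgentSteps;

    return [
        'orchestrator' => $orchestratorSteps,
        'sub_agents'   => $subAgentCalls,
        'total'        => $orchestratorSteps + $subAgentCalls,
    ];
}

$calls = worstCaseModelCalls(10, 3, 3);
echo "Sub-agent calls: {$calls['sub_agents']}", PHP_EOL; // Sub-agent calls: 90
echo "Total calls: {$calls['total']}", PHP_EOL;          // Total calls: 100
```

Dropping the orchestrator to five steps halves the worst case, which is usually a cheaper fix than tightening every sub-agent.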
Taylor Otwell put the cost plainly when announcing the SDK: "You shouldn't have like 50, 60, 70 tools exposed to the LLM because you get what's called context bloat. You also have to send all of the tool definitions, what they do, on every message." Every tool definition, including the name and description of every sub-agent returned from tools(), consumes tokens on every request, regardless of whether that tool gets called. An orchestrator with ten sub-agents in its tools() sends all ten descriptions to the model at every step of the loop.
The practical limit before this becomes expensive sits around 10-15 tools per agent, depending on description length. If you're routing between 20+ specialists, a two-level hierarchy keeps it manageable: a dispatcher that routes to one of three category orchestrators, each of which exposes its own narrower set of specialists.
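A rough sketch of the overhead a hierarchy avoids. The figures are assumptions for illustration only: 21 specialists, 150 tokens per tool definition, and the step counts shown. `toolDefinitionTokens()` is a hypothetical helper.

```php
<?php

/**
 * Tokens spent re-sending tool definitions: every definition goes out on
 * every step of the loop, whether or not the tool is called.
 */
function toolDefinitionTokens(int $toolCount, int $tokensPerDefinition, int $steps): int
{
    return $toolCount * $tokensPerDefinition * $steps;
}

// Flat: one agent sees all 21 specialists on each of 4 steps.
$flat = toolDefinitionTokens(21, 150, 4);

// Two levels: a dispatcher sees 3 category orchestrators for 2 steps,
// then one category orchestrator sees its 7 specialists for 4 steps.
$hierarchy = toolDefinitionTokens(3, 150, 2) + toolDefinitionTokens(7, 150, 4);

echo "Flat: {$flat} tokens", PHP_EOL;           // Flat: 12600 tokens
echo "Hierarchy: {$hierarchy} tokens", PHP_EOL; // Hierarchy: 5100 tokens
```

The hierarchy adds a routing hop, so it pays off only once the flat tool list is long enough for the definition overhead to outweigh the extra model call.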
Three more constraints worth knowing before you commit to this stack:
PHP 8.3 and Laravel 12 are required. That is a hard minimum for the laravel/ai package, with no backward compatibility for Laravel 10 or 11. If your app hasn't upgraded, you're working directly with Prism and building the agent loop manually, which is doable, but you lose the agent class abstractions, CanActAsTool, RemembersConversations, and the testing fakes.
Vector stores require PostgreSQL. If you're building RAG into a sub-agent (embedding search before responding), you need PostgreSQL with the pgvector extension. The SDK has no MySQL vector support out of the box.
Long-running loops block HTTP workers. PHP's synchronous execution model means a 10-step orchestrator loop ties up a worker process for its full duration. For anything non-trivial, use ->queue() instead of ->prompt() to push the agentic work to the queue. For crash-safe execution with mid-loop checkpoint recovery, the community-built Laravel Workflow package can wrap your orchestration so a server restart doesn't mean re-billing for completed steps.
When This Belongs in PHP
Before the laravel/ai package launched in February 2026, PHP developers building AI orchestration had three options: learn Python, call a Python microservice over HTTP, or accept that their agent layer would be flat tool-calling with no real delegation. The SDK closes most of that gap for web-application use cases.
Where PHP orchestration wins is straightforward: your existing business logic, Eloquent models, queues, events, and auth are already in Laravel. A Python sidecar means duplicating or exposing all of that over an internal HTTP API: two codebases, two deploy pipelines, two places where your data access logic can drift. In PHP, your sub-agents have direct access to your models, your queues, your cache, and your notification infrastructure. The orchestration layer and the application layer are the same layer.
Tight Laravel integration also means your agents can stream to the frontend using the Vercel AI SDK protocol via ->asStream(), push long-running work to Horizon via ->queue(), and broadcast intermediate progress over WebSockets, none of which requires any Python bridge.
Where you still need Python:
- Direct PyTorch or Hugging Face model inference. The SDK calls LLM provider APIs; it doesn't run models locally.
- Observability tooling that's Python-native. LangSmith and Weave have no PHP equivalent. Neuron AI's Inspector.dev integration is the closest PHP option for agent tracing, but the ecosystem is a fraction of Python's.
- Workflows that depend on Python's scientific stack (NumPy, pandas, scikit-learn) alongside the agent.
The practical test is about where your tools live. If your agent's tools call your own database, your own queues, or your own internal APIs, build the orchestration in PHP. If your agent's tools need to execute Python code, train models, or connect to Python-native ML infrastructure, build that piece in Python and expose it over HTTP for the PHP orchestrator to call.
Multi-agent orchestration in PHP stopped being an experiment in May 2026. Sub-agents, structured output, parallel tool execution, conversation persistence, model failover: it's all in the official first-party SDK. The question isn't whether you can build it. It's whether you understand the patterns well enough to build it in a way that doesn't wake you up to a five-figure bill.
Originally published at nazarboyko.com.

