DEV Community

Dewald Hugo

Posted on • Originally published at origin-main.com

Building Agentic Laravel Apps with Prism PHP

Stack: Laravel 13, PHP 8.2+, PostgreSQL 16+ with pgvector, Prism PHP

Laravel 13 ships a capable first-party AI SDK (laravel/ai) covering text generation, embeddings, and basic completions across OpenAI, Gemini, and Anthropic. For most prompt-to-text workflows, it is the right starting point and the right long-term answer. Prism PHP is the layer you reach for when your requirements move past that baseline: multi-provider tool calling, agentic loop control with step limits, RAG pipelines with pgvector, and SSE streaming across providers that laravel/ai does not yet expose at full depth. These are complementary tools, not competing ones.

This guide covers the agentic side of that boundary. Everything here sits inside the broader Laravel AI architecture module, which frames how these implementation decisions relate to one another across the full stack.

laravel/ai vs Prism PHP: Knowing Which Tool to Reach For

Before adding Prism PHP to your dependency list, be clear on what you are solving for. The Laravel 13 AI features breakdown covers the first-party SDK in full. The short version:

Feature | laravel/ai | Prism PHP
Text generation | ✓ | ✓
Embeddings | ✓ | ✓
Tool calling | Limited | ✓ Full
Agentic loop control (withMaxSteps) | ✗ | ✓
Multi-provider support | OpenAI, Gemini, Anthropic | The same, plus Ollama and Mistral
SSE streaming (unified across providers) | Provider-dependent | ✓
Broadcast via WebSocket (asBroadcast) | ✗ | ✓

Single provider, no tool calling, straightforward prompt-to-text: use laravel/ai. Agents that call real APIs, route across providers, or stream to a frontend: Prism PHP closes those gaps. The contract-based abstraction approach that makes provider-swapping deterministic in production applies regardless of which layer sits underneath.

Installation

composer require prism-php/prism
php artisan vendor:publish --tag=prism-config

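This drops a config/prism.php file. Keep your provider API keys in .env and reference them through config rather than calling env() in application code. If you plan to register a shared Prism instance in Laravel’s Service Container, put the singleton binding in a service provider and register that provider in bootstrap/app.php:

// bootstrap/app.php (excerpt)
return Illuminate\Foundation\Application::configure(basePath: dirname(__DIR__))
    ->withProviders([
        App\Providers\AiServiceProvider::class,
    ])
    // ...your existing withRouting(), withMiddleware() and withExceptions() calls
    ->create();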

Keep provider resolution out of controllers. Bind once, inject everywhere.

Multi-Provider Configuration

Hardcoding a single AI provider is a production risk. OpenAI has outages. Anthropic rate-limits at peak. A single config change should be enough to swap providers without touching application code.

OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-api03-...
GEMINI_API_KEY=...

Switching provider is one line:

Prism::text()->using('anthropic', 'claude-sonnet-4-6')
Prism::text()->using('openai', 'gpt-4o')
Prism::text()->using('gemini', 'gemini-2.0-flash')

[Architect’s Note] Build against a provider-agnostic interface from day one. Store the active provider and model string in config, not scattered across call sites. When you need to run a cost comparison between claude-sonnet-4-6 and gpt-4o in production, you will thank yourself for having one place to change. The Laravel AI service layer guide covers the interface pattern that makes this clean.
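A minimal, framework-free sketch of that idea follows. TextGenerator, AiModelConfig and FakeTextGenerator are hypothetical names, not part of Prism or the guide mentioned above; in a Laravel app the config array would come from a config file and the binding would live in a service provider. The fake driver stands in for a Prism-backed one so the example runs without network access.

```php
<?php

// Hypothetical contract: application code depends on this, never on a provider SDK.
interface TextGenerator
{
    public function generate(string $prompt, int $maxTokens): string;
}

// Hypothetical config object: the single place the provider and model are chosen.
final class AiModelConfig
{
    public function __construct(
        public readonly string $provider,
        public readonly string $model,
    ) {}

    /** @param array{provider: string, model: string} $config */
    public static function fromArray(array $config): self
    {
        return new self($config['provider'], $config['model']);
    }
}

// Offline stand-in driver. A real driver would build the request with
// Prism::text()->using($this->config->provider, $this->config->model).
final class FakeTextGenerator implements TextGenerator
{
    public function __construct(private AiModelConfig $config) {}

    public function generate(string $prompt, int $maxTokens): string
    {
        return sprintf(
            '[%s/%s max=%d] %s',
            $this->config->provider,
            $this->config->model,
            $maxTokens,
            $prompt
        );
    }
}

// Swapping providers means changing this array (in practice, a config file), nothing else.
$generator = new FakeTextGenerator(
    AiModelConfig::fromArray(['provider' => 'anthropic', 'model' => 'claude-sonnet-4-6'])
);

echo $generator->generate('Summarise our return policy.', 512), PHP_EOL;
```

Because callers type-hint only TextGenerator, a cost comparison between providers becomes a config change, not a search-and-replace across call sites.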

Basic Text Generation

use Prism\Prism\Facades\Prism;

$response = Prism::text()
    ->using('anthropic', 'claude-sonnet-4-6')
    ->withMaxTokens(1024)
    ->withSystemPrompt('You are a luxury fashion consultant with access to our product catalogue.')
    ->withPrompt($userMessage)
    ->asText();

return $response->text;

asText() blocks until the full response arrives. Use it for short, server-side generation tasks. Stream anything user-facing.

[Production Pitfall] Not setting withMaxTokens() means the request falls back to a default (Prism’s or the provider’s), which for some models allows output up to the model’s maximum length. On a busy application, a handful of runaway completions can consume a significant portion of your monthly budget in minutes. Set an explicit ceiling on every call and track token consumption per request, for example in middleware or an event listener.
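To see why the ceiling matters, here is the arithmetic as a small PHP sketch. The per-million-token prices are placeholders, not current rates for any model; substitute your provider's published pricing and the token counts Prism reports on each response (check the usage data exposed by your Prism version). estimateCostUsd() is a hypothetical helper.

```php
<?php

/**
 * Estimate the USD cost of one call from token counts and per-million-token prices.
 * Prices are inputs, not constants: read them from your provider's current price sheet.
 */
function estimateCostUsd(
    int $inputTokens,
    int $outputTokens,
    float $inputPricePerMillion,
    float $outputPricePerMillion
): float {
    return ($inputTokens * $inputPricePerMillion + $outputTokens * $outputPricePerMillion) / 1_000_000;
}

// Placeholder prices for illustration only (USD per million tokens).
$inputPrice = 3.00;
$outputPrice = 15.00;

// Worst-case output cost per call is bounded by the max-token ceiling you set.
foreach ([1024, 8192] as $ceiling) {
    $perCall = estimateCostUsd(0, $ceiling, $inputPrice, $outputPrice);
    printf(
        "ceiling %5d tokens -> worst-case output cost \$%.5f per call, \$%.2f per 10,000 calls\n",
        $ceiling,
        $perCall,
        $perCall * 10_000
    );
}
```

Output cost scales linearly with the cap, so letting the effective ceiling drift from 1,024 to 8,192 tokens multiplies the worst-case output cost of every call by eight.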

Building a Tool-Calling Agent with Prism PHP

Agents in Prism are text requests that include tools. The model decides when to invoke a tool, Prism executes the closure, injects the result back into the conversation, and the loop continues until the model reaches a final answer or withMaxSteps() fires. The diagram below shows the full cycle.

[Diagram: the agent loop cycle, labelled “LLM Inference”]
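The loop in that diagram can be sketched in plain PHP. This is a conceptual model under stated assumptions, not Prism's internal implementation: the scripted $scriptedModel closure stands in for the LLM, the message arrays are a made-up shape, and each call to the model counts as one step.

```php
<?php

/**
 * Conceptual model of a step-limited agent loop. Not Prism's implementation:
 * $model is any callable that returns either ['text' => string] (final answer)
 * or ['tool' => string, 'args' => list<mixed>] (a tool request).
 *
 * @param array<string, callable> $tools
 * @return array{text: ?string, steps: int}
 */
function runAgentLoop(callable $model, array $tools, string $prompt, int $maxSteps): array
{
    $messages = [['role' => 'user', 'content' => $prompt]];

    for ($step = 1; $step <= $maxSteps; $step++) {
        $reply = $model($messages);

        // The model produced a final answer: the loop ends here.
        if (isset($reply['text'])) {
            return ['text' => $reply['text'], 'steps' => $step];
        }

        // Otherwise execute the requested tool and feed the result back in.
        $name = $reply['tool'];
        $result = isset($tools[$name])
            ? $tools[$name](...$reply['args'])
            : "Error: unknown tool {$name}.";

        $messages[] = ['role' => 'tool', 'name' => $name, 'content' => $result];
    }

    // Step budget spent without a final answer.
    return ['text' => null, 'steps' => $maxSteps];
}

// Scripted stand-in for the LLM: asks for one tool call, then answers with its result.
$scriptedModel = function (array $messages): array {
    $last = $messages[array_key_last($messages)];

    return $last['role'] === 'tool'
        ? ['text' => "Done. {$last['content']}"]
        : ['tool' => 'process_refund', 'args' => ['jane@example.com']];
};

$tools = ['process_refund' => fn (string $email): string => "Refund queued for {$email}."];

$outcome = runAgentLoop($scriptedModel, $tools, 'Refund my last order.', 5);
echo $outcome['text'], " (steps used: {$outcome['steps']})", PHP_EOL;
```

In this model, a single tool call followed by an answer consumes two steps, so a step limit of one would never let the tool result reach the model.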

The Stripe Refund Tool

use Prism\Prism\Facades\Tool;
use Stripe\StripeClient;

$refundTool = Tool::as('process_refund')
    ->for('Refund the most recent charge for a customer by their email address.')
    ->withStringParameter('email', 'The customer email address.')
    ->using(function (string $email): string {
        try {
            $stripe = new StripeClient(config('services.stripe.secret'));

            $customer = $stripe->customers->all(['email' => $email, 'limit' => 1])->data[0] ?? null;
            if (!$customer) {
                return 'Error: Customer not found.';
            }

            $charge = $stripe->charges->all(['customer' => $customer->id, 'limit' => 1])->data[0] ?? null;
            if (!$charge) {
                return 'Error: No charges found for this account.';
            }

            $stripe->refunds->create(['charge' => $charge->id]);

            return "Success: Refunded charge {$charge->id} for {$email}.";

        } catch (\Stripe\Exception\ApiErrorException $e) {
            \Log::error('Stripe refund failed', ['email' => $email, 'error' => $e->getMessage()]);
            return 'Error: Payment processor unavailable. Please try again later.';
        }
    });

[Production Pitfall] This tool executes a live financial transaction. As written, it fires without any human confirmation step. Never let an LLM trigger a refund autonomously in production. Wire in a confirmation queue, a webhook callback, or at minimum a human-approval flag before this runs against real Stripe credentials.
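One way to add that human-approval step, sketched in plain PHP: the tool closure records a refund request instead of executing it, and a staff member approves it later, outside the agent loop. RefundApprovalQueue is a hypothetical class; the in-memory array only illustrates the flow, and a real application would persist requests in a database table and run the approved refund as a queued job.

```php
<?php

/**
 * Hypothetical human-approval gate. The agent's tool calls request(), which only
 * records the refund; staff call approve() later, outside the agent loop, and
 * only then does the real side effect run.
 */
final class RefundApprovalQueue
{
    /** @var array<string, array{email: string, status: string}> */
    private array $requests = [];

    /** @param Closure(string): string $executeRefund the real refund logic, e.g. the Stripe calls above */
    public function __construct(private Closure $executeRefund) {}

    public function request(string $email): string
    {
        $id = 'rr_' . bin2hex(random_bytes(4));
        $this->requests[$id] = ['email' => $email, 'status' => 'pending'];

        // This string is what the model sees as the tool result.
        return "Refund request {$id} for {$email} is awaiting staff approval.";
    }

    public function approve(string $id): string
    {
        if (($this->requests[$id]['status'] ?? null) !== 'pending') {
            return "Error: no pending refund request {$id}.";
        }

        // Mark before executing so a repeated approval cannot refund twice.
        $this->requests[$id]['status'] = 'approved';

        return ($this->executeRefund)($this->requests[$id]['email']);
    }

    /** @return list<string> */
    public function pendingIds(): array
    {
        return array_keys(array_filter(
            $this->requests,
            fn (array $r): bool => $r['status'] === 'pending'
        ));
    }
}

$queue = new RefundApprovalQueue(fn (string $email): string => "Success: refunded latest charge for {$email}.");

echo $queue->request('jane@example.com'), PHP_EOL; // returned to the agent
$pending = $queue->pendingIds();
echo $queue->approve($pending[0]), PHP_EOL;        // triggered by a human reviewer
```

Inside Tool::as('process_refund'), the using() closure would return $queue->request($email), so the model tells the customer the refund is pending rather than done.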

The Twilio SMS Tool

use Prism\Prism\Facades\Tool;
use Twilio\Rest\Client;

$smsTool = Tool::as('send_sms')
    ->for('Send an SMS notification to a phone number.')
    ->withStringParameter('to', 'Recipient phone number in E.164 format.')
    ->withStringParameter('message', 'The message body.')
    ->using(function (string $to, string $message): string {
        try {
            $twilio = new Client(
                config('services.twilio.sid'),
                config('services.twilio.token')
            );

            $twilio->messages->create($to, [
                'from' => config('services.twilio.from'),
                'body' => $message,
            ]);

            return "SMS sent to {$to}.";

        } catch (\Twilio\Exceptions\RestException $e) {
            \Log::error('Twilio SMS failed', ['to' => $to, 'error' => $e->getMessage()]);
            return 'Error: Could not deliver SMS. Notification skipped.';
        }
    });
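The to parameter description asks the model for E.164, but nothing enforces it. A cheap guard is to normalise and validate the number before calling Twilio, returning an error string the model can relay, in the same style as the tool's existing error handling. normaliseE164() is a hypothetical helper; the pattern checks shape only (a plus sign, a non-zero leading digit, at most 15 digits), not whether the number is in service.

```php
<?php

/**
 * Normalise a model-supplied phone number and check it looks like E.164
 * ("+", country code, up to 15 digits total). Shape check only: it does not
 * prove the number exists. Returns null when the input should be rejected.
 */
function normaliseE164(string $raw): ?string
{
    // Drop common formatting characters the model may include: whitespace, hyphens, parentheses, dots.
    $candidate = preg_replace('/[\s\-().]/', '', $raw);

    return is_string($candidate) && preg_match('/^\+[1-9]\d{1,14}$/', $candidate) === 1
        ? $candidate
        : null;
}

$to = normaliseE164('+27 82 123 4567');
echo $to ?? 'Error: invalid phone number, expected E.164 such as +27821234567.', PHP_EOL;
```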

Running the Agent

$response = Prism::text()
    ->using('anthropic', 'claude-sonnet-4-6')
    ->withMaxTokens(2048)
    ->withSystemPrompt('You are a customer support agent. You can process refunds and send SMS notifications.')
    ->withTools([$refundTool, $smsTool])
    ->withMaxSteps(5)
    ->withPrompt($userMessage)
    ->asText();

return $response->text;

withMaxSteps(5) is your safety valve. Each step is one round trip to the model, so a single tool call plus the model’s answer already uses two, and five steps leave room for a few tool calls before Prism stops the loop and returns what it has. For most support workflows, that is more than enough. Raising the limit raises the ceiling on spend: every step resends the growing conversation, and a model stuck retrying a failing tool will use every step you allow. Bump it only with a verified reason.
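A rough model of why higher step limits get expensive: if every step resends the conversation so far, input tokens grow faster than linearly with the number of steps. The figures below (a 1,000-token starting prompt, about 300 tokens added per step) are hypothetical; measure your own. cumulativeInputTokens() is an illustrative helper.

```php
<?php

/**
 * Rough model of input-token growth in an agent loop: every step resends the
 * conversation so far, and each step adds roughly $addedPerStep tokens to it.
 * Both inputs are assumptions you would replace with measured values.
 */
function cumulativeInputTokens(int $basePromptTokens, int $addedPerStep, int $steps): int
{
    $total = 0;
    for ($k = 0; $k < $steps; $k++) {
        $total += $basePromptTokens + $k * $addedPerStep;
    }

    return $total;
}

foreach ([2, 5, 10, 20] as $maxSteps) {
    printf("%2d steps -> %6d input tokens\n", $maxSteps, cumulativeInputTokens(1000, 300, $maxSteps));
}
```

Under these assumptions, 5 steps cost at most 8,000 input tokens while 10 steps cost up to 23,500, nearly three times as much for double the limit.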

RAG with pgvector

LLMs cannot reliably answer questions about data they were never trained on, such as your product catalogue, and tend to hallucinate instead. RAG addresses this by retrieving semantically relevant chunks from your own database and injecting them into the prompt before the model sees the question. The conceptual foundation (how embeddings work and why pgvector is the right store for production RAG) is covered in the Laravel embeddings and vector database guide. For direct Anthropic API integration without the Prism abstraction layer, including raw HTTP, streaming, and token accounting, see the Laravel Claude API integration guide.

Step 1: Enable the Extension

-- Run once against your database
CREATE EXTENSION IF NOT EXISTS vector;

Step 2: Migration

OpenAI’s text-embedding-3-small outputs 1,536 dimensions. Your vector column must match exactly. Dimensions are not interchangeable across embedding models.

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
use Illuminate\Support\Facades\DB;

return new class extends Migration
{
    public function up(): void
    {
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

        Schema::create('products', function (Blueprint $table) {
            $table->id();
            $table->string('name');
            $table->text('description');
            $table->string('category');
            $table->decimal('price', 10, 2);
            $table->timestamps();
        });

        // Blueprint does not support vector columns natively; add with a raw statement
        DB::statement('ALTER TABLE products ADD COLUMN embedding vector(1536)');

        // HNSW index for fast approximate nearest-neighbour search
        DB::statement(
            'CREATE INDEX products_embedding_idx ON products USING hnsw (embedding vector_cosine_ops)'
        );
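    }

    public function down(): void
    {
        // Dropping the table also drops its HNSW index; the vector extension stays installed
        Schema::dropIfExists('products');
    }
};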
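Because a dimension mismatch only surfaces as a database error at write time, it can help to validate vectors in PHP before building the '[x,y,z]' literal that both the embedding job and the similarity query send with ?::vector. A minimal sketch, assuming 1,536 dimensions as above; toPgVectorLiteral() is a hypothetical helper. On PHP 8 and later, float-to-string conversion is locale-independent, so a decimal comma cannot sneak into the literal.

```php
<?php

const EMBEDDING_DIMENSIONS = 1536; // text-embedding-3-small default output size

/**
 * Build the '[x,y,z]' text literal that the queries in this guide cast with ?::vector,
 * failing fast in PHP instead of at the database when the vector is malformed.
 *
 * @param array<int, float|int> $values
 */
function toPgVectorLiteral(array $values, int $expectedDimensions = EMBEDDING_DIMENSIONS): string
{
    if (count($values) !== $expectedDimensions) {
        throw new InvalidArgumentException(sprintf(
            'Expected %d dimensions, got %d. Did the embedding model change?',
            $expectedDimensions,
            count($values)
        ));
    }

    foreach ($values as $i => $value) {
        // pgvector rejects NaN and infinity, so catch them here with a clearer message.
        if ((!is_int($value) && !is_float($value)) || !is_finite((float) $value)) {
            throw new InvalidArgumentException("Component {$i} is not a finite number.");
        }
    }

    return '[' . implode(',', $values) . ']';
}

echo toPgVectorLiteral([0.5, -0.25, 1.0], 3), PHP_EOL;
```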

Step 3: Generating and Storing Embeddings

Wrap embedding generation in a dedicated service class. This makes it injectable, mockable, and swappable when you change embedding providers.

// app/Services/EmbeddingService.php

namespace App\Services;

use Illuminate\Support\Facades\Http;
use RuntimeException;

class EmbeddingService
{
    public function generate(string $text): array
    {
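        // throw: false returns the last failed response instead of throwing,
        // so the explicit failed() check below handles exhausted retries
        $response = Http::withToken(config('services.openai.api_key'))
            ->timeout(30)
            ->retry(3, 1000, throw: false)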
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $text,
            ]);

        if ($response->failed()) {
            throw new RuntimeException(
                'Embedding generation failed: ' . $response->status()
            );
        }

        $embedding = $response->json('data.0.embedding');

        if (!is_array($embedding) || empty($embedding)) {
            throw new RuntimeException('Invalid embedding response from OpenAI.');
        }

        return $embedding;
    }
}

[Efficiency Gain] Embedding generation costs tokens and adds latency. Dispatch document embeddings as a queued job rather than generating them inline during a web request; the query embedding in Step 4 is the one call that has to happen at request time. If you re-save a product after a minor edit, store a hash of the source text alongside the embedding and skip the API call when the text has not changed.
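A minimal sketch of that hash check, assuming a hypothetical embedding_hash column on products that the embedding job writes alongside the vector. Collapsing whitespace before hashing is a design choice so that reformatting alone does not trigger a paid API call; embeddingSourceHash() and needsReembedding() are illustrative helpers.

```php
<?php

/**
 * Hash the text that gets embedded. Whitespace is collapsed first so that
 * reformatting alone does not trigger a paid re-embed.
 */
function embeddingSourceHash(string $text): string
{
    $normalised = trim((string) preg_replace('/\s+/', ' ', $text));

    return hash('sha256', $normalised);
}

/** Re-embed only when there is no stored hash or the source text really changed. */
function needsReembedding(?string $storedHash, string $text): bool
{
    return $storedHash === null || $storedHash !== embeddingSourceHash($text);
}

$stored = embeddingSourceHash("Storm Shell\nWaterproof jacket  outerwear");

var_dump(needsReembedding($stored, 'Storm Shell Waterproof jacket outerwear')); // whitespace-only edit
var_dump(needsReembedding($stored, 'Storm Shell Windproof jacket outerwear'));  // real edit
```

In the job, you would return early when needsReembedding($this->product->embedding_hash, $text) is false, and otherwise update embedding and embedding_hash in the same statement.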

Dispatch the job rather than calling the service directly from your controller:

// Controller or action class
$product = Product::create([
    'name'        => $name,
    'description' => $description,
    'category'    => $category,
    'price'       => $price,
]);

GenerateProductEmbeddingJob::dispatch($product);
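
// app/Jobs/GenerateProductEmbeddingJob.php

namespace App\Jobs;

use App\Models\Product;
use App\Services\EmbeddingService;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;
use Illuminate\Support\Facades\DB;

class GenerateProductEmbeddingJob implements ShouldQueue
{
    use Queueable; // bundles Dispatchable and SerializesModels in Laravel 11+

    public int $tries = 3;
    public int $backoff = 60;

    public function __construct(public Product $product) {}

    public function handle(EmbeddingService $embeddingService): void
    {
        $text = "{$this->product->name} {$this->product->description} {$this->product->category}";

        $vector = $embeddingService->generate($text);

        // pgvector parses the '[x,y,z]' text literal once it is cast with ::vector
        DB::statement(
            'UPDATE products SET embedding = ?::vector WHERE id = ?',
            ['[' . implode(',', $vector) . ']', $this->product->id]
        );
    }
}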

Step 4: Similarity Search

$queryVector = $embeddingService->generate("What are your best waterproof jackets?");
$vectorString = '[' . implode(',', $queryVector) . ']';

$products = DB::select(
    "SELECT id, name, description, price,
            embedding <=> ?::vector AS distance
     FROM products
     ORDER BY distance
     LIMIT 3",
    [$vectorString]
);

[Edge Case Alert] The <=> operator computes cosine distance (1 minus cosine similarity), so lower values are closer matches. The HNSW index is only used when its operator class matches the operator in your ORDER BY: vector_cosine_ops for <=>, vector_l2_ops for <->. A mismatch raises no error; Postgres silently falls back to a sequential scan, which still returns correct results but destroys query performance on large tables. Run EXPLAIN on the query to confirm the index is actually in use.
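For intuition, and for unit-testing ranking logic without a database, here is what <=> computes, written in plain PHP. The 2-D vectors and product names are toy data; in production the calculation runs inside Postgres against the HNSW index. cosineDistance() is an illustrative helper.

```php
<?php

/**
 * Cosine distance as pgvector's <=> defines it: 1 - cosine similarity.
 * 0 means same direction, 1 orthogonal, 2 opposite.
 *
 * @param list<float|int> $a
 * @param list<float|int> $b
 */
function cosineDistance(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same number of dimensions.');
    }

    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for a zero vector.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

// Toy 2-D "embeddings"; real ones have 1,536 dimensions.
$query = [1.0, 0.0];
$products = [
    'Sun Hat' => [0.0, 1.0],
    'Storm Shell' => [0.9, 0.1],
    'Trail Fleece' => [0.5, 0.5],
];

// Equivalent of ORDER BY embedding <=> :query (ascending: closest first).
uasort($products, fn (array $p, array $q): int => cosineDistance($query, $p) <=> cosineDistance($query, $q));

foreach ($products as $name => $vector) {
    printf("%-12s %.4f\n", $name, cosineDistance($query, $vector));
}
```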

Step 5: Inject Context into the Prompt

$context = collect($products)
    ->map(fn($p) => "{$p->name}: {$p->description} | R{$p->price}")
    ->implode("\n");

$response = Prism::text()
    ->using('anthropic', 'claude-sonnet-4-6')
    ->withMaxTokens(1024)
    ->withSystemPrompt('You are a product assistant. Use only the provided product context to answer questions. Do not speculate beyond it.')
    ->withPrompt("Context:\n{$context}\n\nQuestion: What are your best waterproof jackets?")
    ->asText();

Constraining the model to the provided context is not just a prompt best practice. It is the operational difference between a useful RAG system and a confidently wrong one.
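A practical companion to that constraint is keeping the injected context bounded, so a large result set cannot crowd out the question or inflate token costs. A minimal sketch: buildRagContext() is a hypothetical helper that reproduces the line format used above, stops adding rows once a byte budget is reached (a cheap stand-in for a token count), and returns an explicit marker when nothing fits, which gives the model something concrete to decline against. The product rows are sample data.

```php
<?php

/**
 * Build the "Context:" block from search results under a size budget.
 * strlen() counts bytes, used here as a cheap proxy for tokens.
 *
 * @param list<array{name: string, description: string, price: string}> $rows
 */
function buildRagContext(array $rows, int $maxBytes): string
{
    $lines = [];
    $used = 0;

    foreach ($rows as $row) {
        $line = "{$row['name']}: {$row['description']} | R{$row['price']}";
        $cost = strlen($line) + ($lines === [] ? 0 : 1); // +1 for the joining newline

        if ($used + $cost > $maxBytes) {
            break; // rows arrive closest-first, so the least relevant ones are dropped
        }

        $lines[] = $line;
        $used += $cost;
    }

    // Explicit marker for "no rows" or "nothing fits", so the model can decline instead of guess.
    return $lines === [] ? 'No product context available.' : implode("\n", $lines);
}

$rows = [
    ['name' => 'Storm Shell', 'description' => 'Fully seam-sealed waterproof jacket.', 'price' => '2499.00'],
    ['name' => 'Trail Poncho', 'description' => 'Packable waterproof poncho.', 'price' => '699.00'],
];

echo buildRagContext($rows, 2000), PHP_EOL;
```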

Real-Time Streaming with SSE

asEventStreamResponse() returns a streamed HTTP response over Server-Sent Events. Before choosing SSE, read the breakdown of Livewire, SSE, and WebSockets for AI streaming transports. SSE is the right default for one-way AI output streams, but not every use case fits that pattern.

Backend

use Prism\Prism\Facades\Prism;
use Illuminate\Http\Request;

Route::get('/chat/stream', function (Request $request) {
    $validated = $request->validate([
        'message' => 'required|string|max:2000',
    ]);

    return Prism::text()
        ->using('anthropic', 'claude-sonnet-4-6')
        ->withMaxTokens(1024)
        ->withSystemPrompt('You are a helpful product assistant.')
        ->withPrompt($validated['message'])
        ->asEventStreamResponse();
});

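The validate() call here is non-optional in production. Length and type rules do not stop prompt injection on their own (a well-formed 2,000-character message can still carry hostile instructions), but they reject malformed input and cap how much text, and therefore how many tokens, a single request can push to the provider.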

Frontend

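The scoping issue in a naive implementation: this.output referenced inside a plain <script> function does not bind to the Alpine component. Depending on how the function is called, this points at window or at the EventSource instance, so your streamed content is silently written somewhere nothing renders it. The corrected version below keeps all state and handlers inside an Alpine component factory and uses arrow functions so this stays bound to the component.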

<div x-data="chatStream()">
    <textarea x-model="output" class="w-full h-64 p-4 bg-gray-100 rounded" readonly></textarea>
    <input x-model="message" type="text" placeholder="Ask something..."
           class="w-full p-2 border rounded mt-2" />
    <button @click="startStream()"
            class="mt-2 px-4 py-2 bg-blue-600 text-white rounded">Ask</button>
</div>

<script>
    function chatStream() {
        return {
            output: '',
            message: '',
            source: null,
            startStream() {
                if (this.source) this.source.close();
                this.output = '';
                this.source = new EventSource(
                    `/chat/stream?message=${encodeURIComponent(this.message)}`
                );
                this.source.onmessage = (event) => {
                    if (event.data === '[DONE]') {
                        this.source.close();
                        return;
                    }
                    try {
                        const parsed = JSON.parse(event.data);
                        this.output += parsed.text ?? '';
                    } catch (e) {
                        console.error('SSE parse error:', e);
                    }
                };
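                this.source.onerror = () => {
                    // Also fires when the server ends the stream; closing here stops
                    // EventSource from auto-reconnecting and re-running the prompt
                    this.source.close();
                };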
            }
        };
    }
</script>
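The frontend above assumes unnamed events whose data is JSON with a text field, followed by a literal [DONE] sentinel. The exact event names and payload fields Prism emits can differ between Prism versions, so inspect the raw stream in your browser's network tab before relying on either. For reference, the sketch below shows the text/event-stream wire format itself; formatSseEvent() is a hypothetical helper, not a Prism API.

```php
<?php

/**
 * Serialise one Server-Sent Event in the text/event-stream format.
 * Events without an "event:" line reach EventSource.onmessage in the browser;
 * named events only reach listeners registered with addEventListener(name, ...).
 */
function formatSseEvent(string $data, ?string $event = null): string
{
    $frame = $event !== null ? "event: {$event}\n" : '';

    // Multi-line payloads need one "data:" line per line; the browser rejoins them with "\n".
    foreach (explode("\n", $data) as $line) {
        $frame .= "data: {$line}\n";
    }

    return $frame . "\n"; // a blank line terminates the event
}

echo formatSseEvent(json_encode(['text' => 'Hello'], JSON_THROW_ON_ERROR));
echo formatSseEvent('[DONE]');
```

The distinction matters on the client: an event with an event: line never reaches onmessage, only a listener registered with addEventListener() under that name.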

For pushing AI output to a specific authenticated user rather than a public GET endpoint, Prism also supports asBroadcast() via Laravel Reverb. That is the pattern to reach for when you need session-scoped streaming.

Caching Repeated Prompts

AI API calls are expensive. If you are generating the same output repeatedly (summarising a static policy document, answering a common FAQ), cache the result.

$policy = cache()->remember('ai.return_policy.v1', now()->addDay(), function () {
    return Prism::text()
        ->using('anthropic', 'claude-sonnet-4-6')
        ->withMaxTokens(512)
        ->withPrompt('Summarise our return policy in three sentences.')
        ->asText()
        ->text;
});

[Word to the Wise] Cache keying strategy matters more than most developers realise, usually right up until they serve one user’s AI output to another. Only cache responses generated from prompts with no user-specific context. The moment you interpolate a user ID, a session variable, or any personalised data into the prompt, that response must never be shared across users. Name your cache keys clearly and version them: ai.return_policy.v1 is a better key than policy because bumping it to v2 busts the cache cleanly when the underlying document changes.
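One way to apply that naming advice consistently is a small key builder. aiCacheKey() is a hypothetical helper: it keeps the human-readable ai.<name>.v<version> prefix and appends a fingerprint of the model and prompt, so changing either produces a new key automatically. The fingerprint only covers what you pass in, so anything that shapes the response, such as the system prompt or retrieved context, belongs in $prompt too. It does not make personalised prompts safe to share.

```php
<?php

/**
 * Build a versioned cache key for a shared (non-personalised) AI response.
 * The fingerprint covers the model and the full prompt, so editing either
 * produces a new key instead of serving the old answer.
 */
function aiCacheKey(string $name, int $version, string $model, string $prompt): string
{
    $fingerprint = substr(hash('sha256', $model . "\n" . $prompt), 0, 16);

    return "ai.{$name}.v{$version}.{$fingerprint}";
}

$key = aiCacheKey(
    'return_policy',
    1,
    'claude-sonnet-4-6',
    'Summarise our return policy in three sentences.'
);

echo $key, PHP_EOL;
```

In the caching example above, cache()->remember(aiCacheKey('return_policy', 1, 'claude-sonnet-4-6', $prompt), now()->addDay(), fn () => ...) would replace the hand-written key.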

Operational Metrics You Should Track in Production

AI systems fail operationally long before they fail technically. Monitoring latency, tool reliability, token usage, and streaming performance becomes essential once AI workloads move into production.

Metric | What It Measures | Suggested Target
P50 Latency | Median end-to-end AI request latency | < 2.5 s
P95 Latency | Tail latency under peak load | < 7 s
Time To First Token (TTFT) | How quickly streaming responses begin | < 1.2 s
Tool Call Success Rate | Percentage of valid tool executions | > 95%
Tokens Per Request | Input + output token consumption | Monitor trends
Cache Hit Rate | Requests served from cache | > 30%
Cost Per 1K Requests | Normalised AI infrastructure cost | Optimise continuously
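The latency and cache targets above are simple statistics over request logs. A minimal sketch of the arithmetic, using the nearest-rank percentile method (one of several common definitions; your APM tool may interpolate instead). The latency samples are illustrative, not measurements, and percentile() and cacheHitRate() are illustrative helpers.

```php
<?php

/**
 * Nearest-rank percentile: sort the samples and take the value at rank ceil(p/100 * n).
 *
 * @param list<float|int> $samples
 */
function percentile(array $samples, float $p): float
{
    if ($samples === [] || $p <= 0 || $p > 100) {
        throw new InvalidArgumentException('Need at least one sample and 0 < p <= 100.');
    }

    sort($samples);
    $rank = (int) ceil($p * count($samples) / 100);

    return (float) $samples[$rank - 1];
}

function cacheHitRate(int $hits, int $misses): float
{
    $total = $hits + $misses;

    return $total === 0 ? 0.0 : $hits / $total;
}

// Sample end-to-end latencies in milliseconds (illustrative, not measured).
$latenciesMs = [1200, 800, 3100, 950, 2100, 700, 1500, 6400, 1100, 900];

printf("P50 %.0f ms, P95 %.0f ms\n", percentile($latenciesMs, 50), percentile($latenciesMs, 95));
printf("Cache hit rate %.0f%%\n", cacheHitRate(45, 105) * 100);
```

With only ten samples, P95 is simply the slowest request, which is why tail-latency targets only become meaningful over a large window.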

Production AI systems should be treated like distributed infrastructure, not simple API integrations. Reliability, latency, observability, and cost management become architectural concerns very quickly.


References and Further Reading

  • Prism PHP Documentation – Official docs covering all providers, tool schemas, and streaming options.
  • OpenAI Embeddings API Reference – Model dimensions, pricing, and batching guidance for text-embedding-3-small and text-embedding-3-large.
  • Laravel Reverb – Laravel’s first-party WebSocket server, relevant if you move beyond SSE to bidirectional AI communication.
