<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dewald Hugo</title>
    <description>The latest articles on DEV Community by Dewald Hugo (@dewaldhugo).</description>
    <link>https://dev.to/dewaldhugo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3773709%2F196ae154-46de-4b83-adb1-087bb96959d7.jpg</url>
      <title>DEV Community: Dewald Hugo</title>
      <link>https://dev.to/dewaldhugo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dewaldhugo"/>
    <language>en</language>
    <item>
      <title>Laracon EU 2026: What Amsterdam told us about the future of Laravel</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Fri, 15 May 2026 06:22:54 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/laracon-eu-2026-what-amsterdam-told-us-about-the-future-of-laravel-1b59</link>
      <guid>https://dev.to/dewaldhugo/laracon-eu-2026-what-amsterdam-told-us-about-the-future-of-laravel-1b59</guid>
      <description>&lt;p&gt;Laracon EU 2026 was the largest European Laravel conference on record. Roughly 1,000 developers descended on the Passenger Terminal Amsterdam on March 2 and 3, a glass-and-steel venue perched directly on the River IJ. What they witnessed across those two days was not a routine framework update. It was a restatement of intent.&lt;/p&gt;

&lt;p&gt;Taylor Otwell didn’t just announce Laravel 13 at Laracon EU 2026. He released a new vision: “The clean stack for Artisans and agents.” That framing matters. “Agents” is not marketing copy here. It is a concrete architectural direction. If you work on AI-adjacent Laravel applications (and increasingly, that means most of us) what happened in Amsterdam directly shapes the decisions you make this quarter. This piece covers every headline announcement, the talks worth watching, and the practical implications for production teams.&lt;/p&gt;

&lt;p&gt;This article is also part of our broader &lt;a href="https://origin-main.com/laravel-ai-architecture/" rel="noopener noreferrer"&gt;AI Architecture coverage&lt;/a&gt;. If you’re thinking through how AI fits your Laravel stack at a structural level, that is where to start.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Venue, the Scale, and the Energy
&lt;/h2&gt;

&lt;p&gt;The Passenger Terminal is not a conventional conference hall. It is a working cruise terminal with a main hall large enough to seat a thousand people without feeling cramped. The energy was palpable from the moment Nuno Maduro opened the conference. This year’s Laracon EU, with around 1,000 attendees, was the largest European Laravel conference ever.&lt;/p&gt;

&lt;p&gt;That attendance figure matters beyond the headline number. The community that showed up was not uniform. Teams from agencies, product companies, and freelancers. Developers who had never attended a Laracon before alongside people who have been at every edition. As always with Laracons, Laracon EU 2026 was more than a conference. It’s not just about the talks. It’s about spending time with like-minded people and sharing an experience as well as celebrating the Laravel community. The pre-party the evening before was where half the real conversations happened. The dinner conversations after Taylor’s keynote ran until midnight.&lt;/p&gt;

&lt;p&gt;The practical takeaway here: if you are weighing whether to attend in 2027, the value is not just the stage content. It is the corridor conversations with people debugging the same problems you are.&lt;/p&gt;

&lt;h2&gt;
  
  
  Taylor Otwell’s Keynote: The “Merge It” Moment
&lt;/h2&gt;

&lt;p&gt;By far the most discussed moment of the conference was Taylor Otwell’s keynote on the evening of the first day. It was scheduled to close Day 1, and by 9pm the main hall was still packed. What followed was genuinely hard to describe without sounding like you are exaggerating.&lt;/p&gt;

&lt;p&gt;If there was one single moment that made the vision behind the entire Laravel ecosystem tangible, it was Taylor’s live demo on stage: The complete Software Development Lifecycle, from error detection to deployment, in under two minutes, live in front of 1,000 attendees.&lt;/p&gt;

&lt;p&gt;The sequence was deliberate. A bug occurs in the application. Nightwatch detects the error automatically. An AI agent with Nightwatch MCP integration analyses and fixes the error autonomously. A pull request is automatically opened. Laravel Cloud creates a preview environment. Taylor reviews, says “Merge it”. The change goes live.&lt;/p&gt;

&lt;p&gt;Voice-driven. No manually written code. Two minutes.&lt;/p&gt;

&lt;p&gt;The jawdropping moment was when the AI agent called Taylor and told him “There’s a PR in the repo, do you want me to merge it?” And then it happened: bug fixed and auto-deployed, all with just voice, without writing any code.&lt;/p&gt;

&lt;p&gt;The room reaction was not the usual polite applause of a conference keynote. It was the particular silence that precedes a standing ovation, because people needed a few seconds to process what they had just seen. We’ve all talked about AI-assisted development in the abstract. This was a functioning, live demonstration of the entire loop closing: monitoring, diagnosis, remediation, review, and deployment, as a single integrated workflow.&lt;/p&gt;

&lt;p&gt;Taylor himself put it succinctly: Laravel, with its clear conventions, comprehensive documentation, and 15 years of example code, is ideally positioned for an era in which AI agents are increasingly becoming part of the development process.&lt;/p&gt;

&lt;p&gt;That is not a boast. It is a defensible structural argument. The reason LLM agents work well with Laravel is precisely because the framework has consistent, well-documented conventions that agents can reason about reliably. Convention over configuration turns out to be excellent training data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Laravel 13: What Actually Shipped
&lt;/h2&gt;

&lt;p&gt;Laravel 13 was released on March 17, 2026, announced by Taylor Otwell at Laracon EU 2026, with zero breaking changes, making it the smoothest upgrade in Laravel’s history. The only requirement change is PHP 8.3 minimum.&lt;/p&gt;

&lt;p&gt;The headline features are worth taking individually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PHP Attributes across 15+ locations.&lt;/strong&gt; Laravel 13 will include PHP attributes across 15+ locations. These are optional, so nothing breaks if you don’t adopt them immediately. Middleware registration, authorisation policies, and job configuration can all now be expressed declaratively on the class itself rather than in separate registration files. Whether you adopt them on Day 1 or wait for the ecosystem to settle is a judgement call, but the direction is clear: Laravel is moving toward the attribute syntax as its idiomatic configuration style. If you are still registering middleware in long arrays, your code is already going to read as dated within a release cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reverb database driver.&lt;/strong&gt; A Reverb database driver arrives with Laravel 13, eliminating the need for Redis in smaller broadcasting setups. This is a significant quality-of-life improvement for teams that run Reverb but do not need Redis for anything else. It reduces infrastructure surface area. Whether it holds up under real concurrency pressure is something we will find out over the next few months as teams hit it in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Passkey authentication.&lt;/strong&gt; Passkey authentication is coming to all starter kits and Laravel Fortify, complemented by a client-side companion package. This is the right move, and it is overdue. Password-based authentication has a long tail of support burden attached to it. Passkeys remove that tail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cache::touch().&lt;/strong&gt; A small addition, but one that solves a genuine annoyance: extending a cache entry’s TTL without re-fetching or re-computing the value. We have implemented this manually more than once using custom cache wrappers. Having it built in is welcome.&lt;/p&gt;

&lt;p&gt;For the complete feature analysis from an AI development perspective, see &lt;a href="https://origin-main.com/laravel/laravel-13-ai-sdk-features-breakdown/" rel="noopener noreferrer"&gt;what Laravel 13 actually changes for AI development&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Laravel AI SDK Goes Stable
&lt;/h2&gt;

&lt;p&gt;This is the announcement with the longest tail. On the same date as Laravel 13’s release, the Laravel AI SDK was tagged as officially stable and out of beta.&lt;/p&gt;

&lt;p&gt;The SDK ships with several capabilities that matter in production:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provider-agnostic design.&lt;/strong&gt; The SDK abstracts the interface, allowing you to switch between different LLM providers without rewriting application logic. &lt;strong&gt;Model failover:&lt;/strong&gt; if one model goes down, a fallback automatically kicks in, critical for production applications. &lt;strong&gt;Agent Classes with schema-defined output:&lt;/strong&gt; structured AI responses that integrate in a type-safe manner into existing application logic. &lt;strong&gt;Vector embedding and similarity search:&lt;/strong&gt; integrated directly into Eloquent, significantly simplifying semantic search in Laravel applications. &lt;strong&gt;Queuing of AI tasks:&lt;/strong&gt; AI tasks can be processed through Laravel’s queue system as usual.&lt;/p&gt;

&lt;p&gt;The provider-agnostic design is the decision that matters most here. Any team currently hard-coding an OpenAI or Anthropic SDK directly into their service layer is building technical debt. The Laravel AI SDK gives you the abstraction layer that most teams were building manually anyway. Our &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-integration-architecture-guide/" rel="noopener noreferrer"&gt;production guide to provider-agnostic AI integration in Laravel&lt;/a&gt; covers the architectural patterns that the SDK now formalises at the framework level. If you are migrating existing AI integration code, that is the reference.&lt;/p&gt;

&lt;p&gt;[Architect’s Note] The SDK’s Eloquent-integrated vector search is the quiet headline most teams will underestimate. Semantic search without a separate vector database layer is not a complete RAG replacement for every use case, but for the majority of content-search scenarios most Laravel apps actually need, it removes an entire infrastructure dependency. Assess whether you still need pgvector or Pinecone before reaching for them by default.&lt;/p&gt;

&lt;h2&gt;
  
  
  NativePHP Super Native: Blade Goes Truly Native
&lt;/h2&gt;

&lt;p&gt;If one talk generated the most feverish corridor conversations after it ended, it was the NativePHP thread across both conference days.&lt;/p&gt;

&lt;p&gt;Simon Hamp presented the current state of NativePHP for Mobile: the modular plugin architecture, the free open-source core “NativePHP Air,” and the tooling around Jump and Bifrost for App Store deployments. That was already interesting. Then Shane Rosenthal took the stage on Day 2 and went further.&lt;/p&gt;

&lt;p&gt;“Super Native” is an entirely new approach where Blade components and Tailwind CSS classes are translated directly into native UI elements through a custom parser. No WebView, no bridge, no overhead. The benchmarks showed UI interactions in the microsecond range, in some tests faster than React Native and Flutter. The live demo showed a functional to-do app with Eloquent, various native UI components, and app clones generated by an LLM in a single pass that rendered as 100% native UI.&lt;/p&gt;

&lt;p&gt;We were sceptical on first pass. Then the demo ran. The implications are real: PHP teams could build native mobile and desktop applications without leaving the Laravel ecosystem. Whether Super Native holds up under complex UI requirements in production apps is an open question. The honest answer is that nobody knows yet, because the benchmarks shown were controlled environments. But the proof of concept is convincing enough that it warrants serious evaluation for internal tools and MVPs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Talks That Will Shape Your Next Sprint
&lt;/h2&gt;

&lt;p&gt;Beyond the keynote, Laracon EU 2026 had one of the strongest community talk lineups in recent memory. A few that are worth watching back:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Yannick Kupferschmidt: AI Won’t Fail Loudly, It’ll Fail Quietly.&lt;/strong&gt; The danger of AI agents doesn’t lie in obvious errors, but in subtle bugs and gradual architecture drift. The antidote is “Context Engineering”: not just better prompts, but structured guidelines files, reusable skills, and deliberate control over what an agent receives as context. Larger context windows don’t automatically mean better results. Targeted context does. This is the talk most teams building AI features need to watch first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Peter Suhm: Unblocking Your Users with AI.&lt;/strong&gt; A pragmatic framework for integrating AI into existing Laravel applications rather than greenfield projects. Three practical entry points depending on where users get stuck, with real-world examples. No vague “AI-first” positioning. Concrete integration points.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marcel Pociot: Refactoring to Parallel.&lt;/strong&gt; Marcel’s talk on asynchronous refactoring patterns paired well with the AI SDK’s queue integration story. If you are building AI pipelines and not yet thinking about concurrency, this one is relevant immediately.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tobias Petry: One Billion Records with Laravel.&lt;/strong&gt; One of the more technically dense talks of the conference. Query optimisation, indexing strategy, and Eloquent behaviour at scale. If your application is approaching data volumes where default Eloquent patterns start to break, this is required watching.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inertia v3.&lt;/strong&gt; The beta for Inertia v3 was due to be tagged at any time following the conference. Simpler, slimmer, and with reduced client-side overhead. If your team runs Inertia-based frontends, the upgrade path is worth evaluating now rather than waiting for it to become urgent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dan Harrin: Abstractions through description, not instruction.&lt;/strong&gt; One of the underrated talks of the event. The core argument: better abstractions emerge from describing the problem well, not from instructing your way to a solution. Directly applicable to AI prompt design and to API contract design in equal measure.&lt;/p&gt;

&lt;p&gt;For teams looking to &lt;a href="https://origin-main.com/ai-agents/laravel-prism-php-agentic-apps/" rel="noopener noreferrer"&gt;build agentic Laravel applications&lt;/a&gt; that hold up in production, Kupferschmidt’s talk on quiet AI failure is the one to pair with your architecture decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI as Infrastructure, Not a Feature
&lt;/h2&gt;

&lt;p&gt;The most important thing about Laracon EU 2026 was not any single announcement. It was the tone.&lt;/p&gt;

&lt;p&gt;AI was not presented as an add-on or a differentiator you bolt onto an existing application. It was treated as infrastructure, the same category as queues, caching, and broadcasting. The Laravel AI SDK ships with queue integration. Vector search ships inside Eloquent. Model failover ships as a first-class SDK concern.&lt;/p&gt;

&lt;p&gt;That framing carries a practical implication: teams that are building AI features as isolated packages or service classes bolted onto otherwise conventional Laravel applications are going to find themselves refactoring again in twelve months. The framework is standardising around conventions for AI integration. Teams that adopt those conventions early will carry less technical debt.&lt;/p&gt;

&lt;p&gt;If you are currently evaluating &lt;a href="https://origin-main.com/stack/laravel-ai-development-stack-setup/" rel="noopener noreferrer"&gt;the current AI development stack for Laravel&lt;/a&gt;, the SDK stable release changes several of the decisions that were previously in play.&lt;/p&gt;

&lt;p&gt;The production deployment implications deserve equal attention. The “Merge It” demo was not just about AI. It was about what a closed-loop deployment pipeline looks like when every component in the ecosystem integrates at the API level: Nightwatch, Cloud, the AI agent, the queue system. If you are still running manual deployment workflows, the &lt;a href="https://origin-main.com/guides/deploy-laravel-to-production/" rel="noopener noreferrer"&gt;production deployment guide&lt;/a&gt; is the right place to close that gap before the tooling gets further ahead of your process.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Verdict
&lt;/h2&gt;

&lt;p&gt;Laracon EU 2026 was not a conference about what might be possible. What unfolded in real time wasn’t some future scenario. It’s the concrete direction the Laravel ecosystem is heading: A fully integrated workflow where monitoring, AI-powered debugging, code review, and deployment seamlessly interlock.&lt;/p&gt;

&lt;p&gt;The announcements are real. The SDK is stable. Laravel 13 is live. The “Merge It” demo ran without edits in front of a thousand people.&lt;/p&gt;

&lt;p&gt;What Laracon EU 2026 made clear is that the question for production Laravel teams is no longer “should we integrate AI?” The question is “how close to the framework’s conventions are we building when we do it?” The teams that answer that question well in the next six months will be significantly further ahead by the time Laracon EU 2027 arrives.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>php</category>
      <category>eu</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Laravel Horizon in Production: Configuring AI Queue Workloads That Actually Hold</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Fri, 15 May 2026 00:02:44 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/laravel-horizon-in-production-configuring-ai-queue-workloads-that-actually-hold-53do</link>
      <guid>https://dev.to/dewaldhugo/laravel-horizon-in-production-configuring-ai-queue-workloads-that-actually-hold-53do</guid>
      <description>&lt;p&gt;Laravel Horizon in production looks deceptively simple until your first LLM inference job times out silently and your users start receiving empty responses. Standard queue jobs (sending emails, processing images, syncing records), complete in milliseconds. AI inference jobs do not. A cold &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; call with a dense system prompt can run for 45 seconds. A &lt;code&gt;gemini-2.5-pro&lt;/code&gt; batch summarisation job can breach two minutes under load. Horizon’s defaults were not built for this, and the failure modes are nasty: jobs that disappear without landing in &lt;code&gt;failed_jobs&lt;/code&gt;, rate limit retries that exhaust the &lt;code&gt;tries&lt;/code&gt; budget in under 30 seconds, and expensive inference work discarded mid-completion.&lt;/p&gt;

&lt;p&gt;This guide is part of the &lt;a href="https://origin-main.com/ai-deployment-production-operations/" rel="noopener noreferrer"&gt;AI Deployment &amp;amp; Production Operations module&lt;/a&gt;, which covers the full surface area of running Laravel AI applications in production. If you are still wiring up the surrounding deployment infrastructure, the &lt;a href="https://origin-main.com/guides/deploy-laravel-to-production/" rel="noopener noreferrer"&gt;complete production deployment guide&lt;/a&gt; is the right starting point.&lt;/p&gt;

&lt;p&gt;What follows covers the three layers where AI queue workloads require deliberate configuration: supervisor setup, job class design, and operational monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Jobs Break Standard Horizon Assumptions
&lt;/h2&gt;

&lt;p&gt;Standard Horizon configuration assumes workers cycle through jobs in seconds. The defaults reflect that: 60-second timeouts, three retries with no backoff configuration, and supervisor settings tuned for throughput. Those assumptions collapse the moment you start queuing LLM inference.&lt;/p&gt;

&lt;p&gt;Three failure modes come up repeatedly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silent timeout kills.&lt;/strong&gt; Horizon’s default &lt;code&gt;timeout&lt;/code&gt; of 60 seconds is aggressive for AI inference. A &lt;code&gt;gpt-4o&lt;/code&gt; call with a large context window can sit at 50 seconds before returning its first token. Add network variance and the worker process receives a &lt;code&gt;SIGKILL&lt;/code&gt; mid-call. No exception is logged. The job does not land in &lt;code&gt;failed_jobs&lt;/code&gt;. It just vanishes. This is the most common support ticket pattern we see from teams that have not tuned Horizon for AI: “jobs are disappearing.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limit mishandling.&lt;/strong&gt; Provider 429 responses from OpenAI, Anthropic, and Google are not errors in the traditional sense. They are expected, temporary, and recoverable. Retrying immediately burns through the &lt;code&gt;tries&lt;/code&gt; budget in seconds. Without a &lt;code&gt;backoff&lt;/code&gt; array defined on the job, Laravel uses zero delay between retries by default. A job hitting a rate limit five times in 15 seconds has failed just as permanently as one that hit a genuine error.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial output loss.&lt;/strong&gt; AI jobs often do useful work before failing. A document summarisation job might process 80% of its input before hitting a context limit. Standard job failure handling discards that state entirely. For expensive inference workloads on long documents, that is a measurable cost.&lt;/p&gt;

&lt;p&gt;The fix requires changes at three levels: supervisor configuration, job class design, and monitoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring Laravel Horizon in Production for AI Workloads
&lt;/h2&gt;

&lt;p&gt;Install Horizon if you have not already:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require laravel/horizon
php artisan horizon:install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The critical configuration lives in &lt;code&gt;config/horizon.php&lt;/code&gt;. The default supervisor configuration is intentionally generic. For AI workloads, you need a dedicated supervisor pool with materially different settings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// config/horizon.php&lt;/span&gt;

&lt;span class="s1"&gt;'environments'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'production'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;

        &lt;span class="s1"&gt;'supervisor-ai-inference'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'connection'&lt;/span&gt;          &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'redis'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'queue'&lt;/span&gt;               &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'ai-high'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'ai-default'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'ai-low'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'balance'&lt;/span&gt;             &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'auto'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'autoScalingStrategy'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'time'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'minProcesses'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'maxProcesses'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'balanceMaxShift'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'balanceCooldown'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'timeout'&lt;/span&gt;             &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// 5 minutes — covers streaming completions&lt;/span&gt;
            &lt;span class="s1"&gt;'sleep'&lt;/span&gt;               &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'tries'&lt;/span&gt;               &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'nice'&lt;/span&gt;                &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;

        &lt;span class="s1"&gt;'supervisor-default'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'connection'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'redis'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'queue'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'default'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'notifications'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'mail'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'balance'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'simple'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'minProcesses'&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'maxProcesses'&lt;/span&gt;&lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'timeout'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'sleep'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'tries'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few decisions here that are worth explaining.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;autoScalingStrategy: time&lt;/code&gt; scales workers based on queue wait time rather than queue size. For AI workloads, queue size is a poor signal: three jobs waiting sounds manageable, but if each takes 90 seconds, you are looking at a 4-minute tail for the last user. Time-based scaling catches this earlier.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;timeout: 300&lt;/code&gt; gives generous headroom for streaming completions and large context calls. This is not a ceiling you should approach routinely; it is a safety net. If jobs are regularly running past 120 seconds, that is a prompt engineering problem, not a timeout problem.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;balanceCooldown: 5&lt;/code&gt; prevents the auto-balancer from thrashing worker counts during a burst of short AI calls followed by a trough. Default is 3 seconds, which is too reactive for inference workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supervisord server configuration&lt;/strong&gt; is equally important. The &lt;code&gt;stopwaitsecs&lt;/code&gt; value must exceed Horizon’s &lt;code&gt;timeout&lt;/code&gt; value, or the process manager will kill a running Horizon worker before it finishes a long inference job during deployments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[program:laravel-horizon]&lt;/span&gt;
&lt;span class="py"&gt;process_name&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;%(program_name)s&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;php /var/www/html/artisan horizon&lt;/span&gt;
&lt;span class="py"&gt;autostart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;autorestart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;user&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;www-data&lt;/span&gt;
&lt;span class="py"&gt;redirect_stderr&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;stdout_logfile&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/var/www/html/storage/logs/horizon.log&lt;/span&gt;
&lt;span class="py"&gt;stopwaitsecs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;360&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;stopwaitsecs&lt;/code&gt; to at least &lt;code&gt;timeout + 60&lt;/code&gt;. We have seen rolling deployments silently truncate in-flight inference calls because this was left at the default 10 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designing the Job Class: Timeout, Retry, and Rate Limit Handling
&lt;/h2&gt;

&lt;p&gt;The supervisor configuration sets the outer boundary. The job class defines behaviour within it. For AI inference jobs, three properties are non-negotiable: &lt;code&gt;$timeout&lt;/code&gt;, &lt;code&gt;$tries&lt;/code&gt;, and &lt;code&gt;$backoff&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Jobs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Bus\Queueable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Contracts\Queue\ShouldQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Foundation\Bus\Dispatchable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\Middleware\RateLimited&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GenerateAIInsightJob&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldQueue&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Dispatchable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Queueable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cd"&gt;/**
     * Hard kill threshold. Horizon's supervisor timeout is the outer wall;
     * this property is the job's own declaration to the queue system.
     * Set it below the supervisor timeout to allow graceful error handling.
     */&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;240&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cd"&gt;/**
     * Total attempts before the job is moved to failed_jobs.
     * 5 attempts with exponential backoff covers transient provider outages.
     */&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$tries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="cd"&gt;/**
     * Seconds to wait before each retry attempt.
     * Indices correspond to attempt number: attempt 1 waits 30s, attempt 2 waits 60s, etc.
     */&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$backoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;180&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;240&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="nv"&gt;$documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'claude-sonnet-4-6'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RateLimited&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-inference'&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;findOrFail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;\Anthropic&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'model'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'max_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'messages'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'ai_insight'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'insight_model'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'insight_token_count'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;inputTokens&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;outputTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\Throwable&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;isRateLimitException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// Release back to queue with the appropriate backoff delay.&lt;/span&gt;
                &lt;span class="c1"&gt;// Do NOT throw — throwing counts as a failed attempt.&lt;/span&gt;
                &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;240&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'AI insight generation failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'document_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'attempt'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="s1"&gt;'error'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\Throwable&lt;/span&gt; &lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Preserve whatever partial work exists rather than nulling it.&lt;/span&gt;
        &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'ai_insight_status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'ai_insight_error'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'AI insight job exhausted all retries'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'document_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'model'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;retryUntil&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;\DateTime&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Absolute deadline. Even with $tries remaining, the job will not&lt;/span&gt;
        &lt;span class="c1"&gt;// retry after this point. Critical for time-sensitive inference pipelines.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;addHours&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;isRateLimitException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\Throwable&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bool&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str_contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'429'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nf"&gt;str_contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'rate_limit'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nf"&gt;str_contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'Too Many Requests'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;$this-&amp;gt;release()&lt;/code&gt; pattern inside &lt;code&gt;isRateLimitException&lt;/code&gt; is the correct approach for provider rate limits. Throwing the exception counts as a failed attempt and triggers a retry cycle. Calling &lt;code&gt;release()&lt;/code&gt; puts the job back on the queue with a delay and does not decrement the &lt;code&gt;tries&lt;/code&gt; counter. Rate limits are not job failures; they are scheduling signals.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;retryUntil()&lt;/code&gt; is the safety valve. Without it, a job sitting in exponential backoff across five attempts could theoretically retry for hours after the result is no longer needed. Set this to match the actual business requirement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Registering the Rate Limiter
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;RateLimited&lt;/code&gt; middleware on the job references a named rate limiter. Register it in your &lt;code&gt;AppServiceProvider&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Providers/AppServiceProvider.php&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Cache\RateLimiting\Limit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\RateLimiter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;boot&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-inference'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;object&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Adjust this to match your provider tier.&lt;/span&gt;
        &lt;span class="c1"&gt;// Anthropic Tier 2: ~1,000 RPM. OpenAI Tier 3: ~5,000 RPM.&lt;/span&gt;
        &lt;span class="c1"&gt;// Start conservative and increase as you verify throughput.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'global'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For multi-tenant applications with per-tenant provider keys, scope the limiter by tenant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-inference'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;object&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$tenantId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;tenantId&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;'global'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"tenant:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$tenantId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are building the broader governance layer around token budgets and per-tenant limits, the &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-middleware-token-tracking/" rel="noopener noreferrer"&gt;AI middleware article covering token tracking&lt;/a&gt; covers the HTTP request layer equivalent of this pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Job Duration: Why This Visualisation Matters
&lt;/h2&gt;

&lt;p&gt;The chart below illustrates why AI inference jobs cannot share a supervisor pool with standard queue jobs. Standard jobs cluster almost entirely below 200ms. AI inference jobs distribute across a 5-to-90-second range, with a meaningful tail. A shared 60-second timeout kills the tail of the AI distribution silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bar chart comparing job duration distribution between standard queue jobs and AI inference jobs, illustrating why a shared 60-second timeout is insufficient for AI workloads.
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F102oonzqavuc49zx253j.PNG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F102oonzqavuc49zx253j.PNG" alt="Job duration distribution between standard queue jobs and AI inference jobs" width="769" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring What Actually Matters for AI Queues
&lt;/h2&gt;

&lt;p&gt;The Horizon dashboard gives you throughput, wait time, and recent job runtime out of the box. For standard workloads, those three numbers tell you most of what you need. For AI inference workloads, the signal you need most is not surfaced by default: the ratio of jobs that exit via &lt;code&gt;SIGKILL&lt;/code&gt; versus jobs that complete normally.&lt;/p&gt;

&lt;p&gt;The table below outlines which Horizon metrics carry real weight for AI workloads, and the thresholds worth building alerts around.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvhpxas7auey12bgwjsb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnvhpxas7auey12bgwjsb.png" alt="Horizon metrics" width="752" height="516"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the silent timeout kill problem, there is no native Horizon alert. The symptom is a job that leaves no trace in &lt;code&gt;failed_jobs&lt;/code&gt; despite not completing. You can detect this indirectly by tracking job dispatch counts against completion counts in your application layer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Dispatch side — record intent&lt;/span&gt;
&lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"jobs:dispatched:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;GenerateAIInsightJob&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;onQueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-default'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Handle side — record completion&lt;/span&gt;
&lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;increment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"jobs:completed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A growing gap between those two counters, without a corresponding growth in &lt;code&gt;failed_jobs&lt;/code&gt;, is the fingerprint of silent SIGKILL kills. Add a scheduled command to &lt;code&gt;routes/console.php&lt;/code&gt; that alerts when the gap exceeds a threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// routes/console.php&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Schedule&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Schedule&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$dispatched&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'jobs:dispatched:total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$completed&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'jobs:completed:total'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$failed&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;DB&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'failed_jobs'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                     &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'queue'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'like'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'ai-%'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                     &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'failed_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;gt;='&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;subHour&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
                     &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;count&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nv"&gt;$missing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$dispatched&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;$completed&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nv"&gt;$failed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$missing&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'AI jobs disappearing without trace'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'dispatched'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$dispatched&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'completed'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$completed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'failed'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$failed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'missing'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$missing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;everyFiveMinutes&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Production Pitfall]&lt;/strong&gt; The silent SIGKILL kill is by far the most dangerous failure mode in AI queue workloads, because it produces no actionable output. Teams routinely run in this state for weeks without realising it, attributing the missing outputs to “the AI being slow.” Check your &lt;code&gt;stopwaitsecs&lt;/code&gt; and &lt;code&gt;timeout&lt;/code&gt; alignment before anything else. If &lt;code&gt;stopwaitsecs&lt;/code&gt; in your supervisord config is lower than Horizon’s &lt;code&gt;timeout&lt;/code&gt;, every deployment is silently truncating in-flight inference calls.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you are building the broader observability layer around your AI architecture, the governance and telemetry patterns in the &lt;a href="https://origin-main.com/laravel-architecture/production-grade-ai-architecture-in-laravel/" rel="noopener noreferrer"&gt;production AI architecture guide&lt;/a&gt; cover how to centralise this kind of cross-cutting operational signal across providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failed Job Strategy for LLM Inference
&lt;/h2&gt;

&lt;p&gt;When a job does land in &lt;code&gt;failed_jobs&lt;/code&gt;, the default response is to re-dispatch it manually via &lt;code&gt;php artisan queue:retry&lt;/code&gt; and hope the error was transient. For AI inference, that is rarely sufficient. Inference failures tend to cluster around specific causes (provider outages, malformed prompts, context window overflows, or inference parameters that produce invalid output), and each warrants a different response.&lt;/p&gt;

&lt;p&gt;Structure your failed job handling to capture enough context to triage correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\Throwable&lt;/span&gt; &lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$reason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;str_contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'context_length_exceeded'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'context_overflow'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nf"&gt;str_contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'429'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'rate_limit_exhausted'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nf"&gt;str_contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'invalid_request_error'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'bad_prompt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;default&lt;/span&gt;                                                           &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'unknown'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'ai_insight_status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'ai_failure_reason'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'ai_insight_error'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'AI inference job failed permanently'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'document_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;documentId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'model'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'reason'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$reason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'attempts'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;attempts&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;reason&lt;/code&gt; field is the important addition. &lt;code&gt;context_overflow&lt;/code&gt; failures need prompt truncation logic, not a retry. &lt;code&gt;bad_prompt&lt;/code&gt; failures need a developer looking at the prompt template, not an automated re-queue. Retrying them blindly burns provider quota for no benefit.&lt;/p&gt;

&lt;p&gt;For agentic pipelines where the inference job is one step in a multi-step chain, the failure handling becomes more complex. The question of what to do with downstream jobs that depend on the failed step, and how to validate the partial output that does exist, is covered in depth in the &lt;a href="https://origin-main.com/laravel-architecture/laravel-agentic-workflow-schema-validation/" rel="noopener noreferrer"&gt;agentic workflow schema validation guide&lt;/a&gt;. The core principle applies here too: validate what you have before deciding whether a retry is warranted.&lt;/p&gt;

&lt;p&gt;One final note on the Horizon dashboard and Laravel Telescope together: Telescope’s job watcher captures the full job payload, exception stack trace, and timing for every failed job. For AI workloads, the job payload includes the prompt, which makes post-mortem analysis significantly faster. Enable the job watcher in non-production environments at minimum, and consider enabling it in production with payload scrubbing for anything containing PII. See the &lt;a href="https://laravel.com/docs/horizon" rel="noopener noreferrer"&gt;Laravel Horizon documentation&lt;/a&gt; for tag-based filtering, which lets you isolate AI job failures without noise from other queue traffic. Anthropic’s &lt;a href="https://docs.anthropic.com/en/api/rate-limits" rel="noopener noreferrer"&gt;rate limit documentation&lt;/a&gt; is the authoritative reference for per-tier RPM and token-per-minute limits if you are calibrating your &lt;code&gt;RateLimiter::for('ai-inference')&lt;/code&gt; ceiling.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>php</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Building Fail-Safes for Incomplete LLM Responses in Laravel Echo</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Thu, 14 May 2026 18:35:48 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/building-fail-safes-for-incomplete-llm-responses-in-laravel-echo-31mg</link>
      <guid>https://dev.to/dewaldhugo/building-fail-safes-for-incomplete-llm-responses-in-laravel-echo-31mg</guid>
      <description>&lt;p&gt;Broadcasting LLM token streams through Laravel Echo feels elegant right up until the moment it silently dies halfway through a response. No error. No terminal event. Just a client sitting there, waiting for tokens that will never arrive.&lt;/p&gt;

&lt;p&gt;We hit this in production during a multi-user document generation feature. The Pusher connection degraded during a particularly long Anthropic response. The queue job had no idea the client had disconnected, and the user stared at a spinner for three minutes before refreshing. The partial response was gone. No retry surface. No recovery path. The incident report had three action items and none of them were obvious beforehand.&lt;/p&gt;

&lt;p&gt;That experience shaped the *&lt;strong&gt;&lt;em&gt;Laravel LLM streaming fail-safe&lt;/em&gt;&lt;/strong&gt;* architecture this article covers. Every pattern below is oriented toward the specific failure modes that broadcasting-based LLM streams introduce, and in several cases those failure modes are different from what you encounter with SSE.&lt;/p&gt;

&lt;p&gt;One framing note before we start. Laravel Echo names a client-side JavaScript library for subscribing to broadcast channels. It is not an SSE wrapper. The architecture here is: a queued job calls the LLM API, iterates the stream, and broadcasts each token as a private channel event. Echo subscribes on the client. This pattern earns its weight when you need multiple subscribers on the same stream (team collaboration, agent monitoring, admin oversight), or when you are already running Reverb or Pusher in your stack. &lt;a href="https://origin-main.com/ai-foundations/laravel-ai-streaming-livewire-sse-websockets/" rel="noopener noreferrer"&gt;The tradeoffs between Livewire, SSE, and WebSockets for AI streaming&lt;/a&gt; are worth understanding before committing to this path. If you only have one client per stream, SSE is simpler and has materially lower infrastructure overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Incomplete Streams Are More Dangerous in Echo
&lt;/h2&gt;

&lt;p&gt;SSE gives you a persistent, unidirectional HTTP connection with browser-native reconnect semantics. When the connection drops, the browser reconnects and sends &lt;code&gt;Last-Event-ID&lt;/code&gt;. You have backpressure. You know whether the connection is alive.&lt;/p&gt;

&lt;p&gt;Echo over WebSockets gives you none of that for free. The WebSocket channel stays open even when the underlying LLM stream on the server has failed. The client shows “connected” while the queue job is dead or retrying. Pusher and Reverb will accept broadcast events and deliver them on arrival, but they will not tell your client that no more events are coming.&lt;/p&gt;

&lt;p&gt;That gap is exactly where incomplete responses live. Four categories to prepare for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token limit truncation.&lt;/strong&gt; The LLM hits &lt;code&gt;max_tokens&lt;/code&gt; and stops mid-sentence. The &lt;code&gt;finish_reason&lt;/code&gt; comes back as &lt;code&gt;max_tokens&lt;/code&gt;, not &lt;code&gt;end_turn&lt;/code&gt;. If your client only listens for a generic “done” signal without inspecting the finish reason, a truncated response renders as complete. Users won’t know. You won’t know until they complain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Queue job failure.&lt;/strong&gt; The job throws after token 47. Laravel retries it. The client has already rendered tokens 1 through 47 and will now receive them again from the retry run, duplicated, in the same Echo channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Silent connection drop.&lt;/strong&gt; The WebSocket drops between the server and Pusher, or between Pusher and the client. Events broadcast during the gap are gone. The job may complete successfully on the server and broadcast a terminal event that the client never receives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orphaned streams.&lt;/strong&gt; The queue worker is killed mid-flight by an OOM event or a forced deployment restart. The stream record in your database stays in &lt;code&gt;streaming&lt;/code&gt; status indefinitely, because nothing transitions it out.&lt;/p&gt;

&lt;p&gt;Each one needs a different fix. Build the layers in order.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Architecture Baseline
&lt;/h2&gt;

&lt;p&gt;The core pattern is a queued job that streams from the LLM API and broadcasts each token as a private channel event scoped to a unique &lt;code&gt;stream_id&lt;/code&gt;. Every event carries a monotonically increasing sequence number. That sequence number is the foundation of every fail-safe that follows.&lt;/p&gt;

&lt;p&gt;Start with the broadcast event. The &lt;a href="https://laravel.com/docs/broadcasting" rel="noopener noreferrer"&gt;Laravel Broadcasting documentation&lt;/a&gt; covers private channel authorization in full, but the event structure below is what drives the client-side recovery logic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Events&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Broadcasting\PrivateChannel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Contracts\Broadcasting\ShouldBroadcast&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LlmTokenReceived&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldBroadcast&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;  &lt;span class="nv"&gt;$streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;  &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;     &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;  &lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;// 'streaming' | 'complete' | 'truncated' | 'error' | 'dead'&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$finishReason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;broadcastOn&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;PrivateChannel&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PrivateChannel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"stream.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;broadcastAs&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s1"&gt;'token.received'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;broadcastWith&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'stream_id'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'token'&lt;/span&gt;         &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'sequence'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'status'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'finish_reason'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;finishReason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;status&lt;/code&gt; field carries the terminal state explicitly. &lt;code&gt;complete&lt;/code&gt; means the LLM finished naturally (&lt;code&gt;end_turn&lt;/code&gt;). &lt;code&gt;truncated&lt;/code&gt; means it hit a token limit. &lt;code&gt;error&lt;/code&gt; and &lt;code&gt;dead&lt;/code&gt; mean infrastructure or retry failure. The client does not guess. The server tells it.&lt;/p&gt;

&lt;p&gt;The database layer makes recovery possible at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Database\Migrations\Migration&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Database\Schema\Blueprint&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Schema&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Migration&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;up&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'llm_streams'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;foreignId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;constrained&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;cascadeOnDelete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prompt'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'streaming'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'complete'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'truncated'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dead'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;longText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'partial_content'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;longText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'final_content'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;unsignedInteger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'last_sequence'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'finish_reason'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'error_message'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'started_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'last_checkpoint_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'completed_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'failed_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'last_checkpoint_at'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;// orphan detection&lt;/span&gt;
            &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Server-Side Fail-Safes
&lt;/h2&gt;

&lt;p&gt;The queue job carries most of the server-side protection. Three responsibilities: stream tokens and broadcast them with sequence numbers, write periodic checkpoints so recovery is possible, and guarantee a terminal broadcast event regardless of how the job ends.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://docs.anthropic.com/en/api/messages-streaming" rel="noopener noreferrer"&gt;Anthropic Messages Streaming documentation&lt;/a&gt; defines the event types used in the iterator below. The &lt;code&gt;message_stop&lt;/code&gt; event is the authoritative signal that the LLM has finished, and its &lt;code&gt;stop_reason&lt;/code&gt; field is what you map to your own &lt;code&gt;status&lt;/code&gt; enum.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Jobs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Events\LlmTokenReceived&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\LlmStream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Anthropic\Laravel\Facades\Anthropic&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Bus\Queueable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Contracts\Queue\ShouldQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Foundation\Bus\Dispatchable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Throwable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StreamLlmResponseJob&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldQueue&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Dispatchable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Queueable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$tries&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$backoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="nv"&gt;$userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$sequence&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$buffer&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$terminated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'streaming'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'started_at'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()]);&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'model'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'claude-sonnet-4-5'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'max_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'messages'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$stream&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'content_block_delta'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;isset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="nv"&gt;$token&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="nv"&gt;$buffer&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                    &lt;span class="nf"&gt;broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmTokenReceived&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s1"&gt;'streaming'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;));&lt;/span&gt;

                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$sequence&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'message_stop'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="nv"&gt;$finishReason&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$event&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;stop_reason&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="s1"&gt;'unknown'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="nv"&gt;$status&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$finishReason&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'end_turn'&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'complete'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'truncated'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="nv"&gt;$terminated&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                    &lt;span class="nf"&gt;broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmTokenReceived&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;        &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;       &lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;finishReason&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$finishReason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;));&lt;/span&gt;

                    &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                        &lt;span class="s1"&gt;'status'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'finish_reason'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$finishReason&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'final_content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'last_sequence'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'completed_at'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="p"&gt;]);&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Throwable&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$terminated&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

                &lt;span class="nf"&gt;broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmTokenReceived&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;        &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;       &lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;finishReason&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'exception'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;));&lt;/span&gt;

                &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                    &lt;span class="s1"&gt;'status'&lt;/span&gt;          &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'error_message'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="s1"&gt;'last_sequence'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'partial_content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'failed_at'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// preserve queue retry behaviour&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;checkpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'partial_content'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'last_sequence'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'last_checkpoint_at'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Throwable&lt;/span&gt; &lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'status'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'dead'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'error_message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$exception&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="s1"&gt;'failed_at'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="c1"&gt;// Last-resort broadcast after all retries are exhausted.&lt;/span&gt;
        &lt;span class="c1"&gt;// PHP_INT_MAX sequence guarantees the client accepts this regardless&lt;/span&gt;
        &lt;span class="c1"&gt;// of how many gaps have accumulated.&lt;/span&gt;
        &lt;span class="nf"&gt;broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmTokenReceived&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;        &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="kc"&gt;PHP_INT_MAX&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;       &lt;span class="s1"&gt;'dead'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;finishReason&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'retries_exhausted'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three decisions in this code are worth calling out explicitly.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;$terminated&lt;/code&gt; flag prevents double-broadcasting. If &lt;code&gt;message_stop&lt;/code&gt; fires and then the foreach cleanup throws, you do not want a second terminal event with &lt;code&gt;status: 'error'&lt;/code&gt; landing after you have already broadcast &lt;code&gt;status: 'complete'&lt;/code&gt;. Clients will not handle that gracefully.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;throw $e&lt;/code&gt; at the end of the catch block is intentional. Removing it tells Laravel the job succeeded, killing retry behaviour. The partial broadcast for &lt;code&gt;error&lt;/code&gt; is informational to the client. The throw is what keeps the queue system honest.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;failed()&lt;/code&gt; hook runs after all retry attempts are exhausted and is your last-resort client notification. This method is called by Laravel’s Queue Worker, not by your own code.&lt;/p&gt;

&lt;p&gt;[Edge Case Alert] If you are running Laravel Reverb rather than Pusher, broadcasts inside &lt;code&gt;failed()&lt;/code&gt; may not deliver if Reverb is also under load or restarting at the same time as your queue worker. &lt;code&gt;failed()&lt;/code&gt; is best-effort in that scenario. The orphan detection command below is your backstop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Client-Side Fail-Safes
&lt;/h2&gt;

&lt;p&gt;Most teams spend three lines on the Echo subscription and zero lines on what happens when it goes quiet. The watchdog and sequence tracking below are the part that actually makes this production-safe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LlmStreamClient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;streamId&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastSeq&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gaps&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;watchdog&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;silenceMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;silenceMs&lt;/span&gt;  &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onToken&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onToken&lt;/span&gt;    &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onComplete&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onComplete&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onError&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onError&lt;/span&gt;    &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Echo&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;private&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`stream.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.token.received&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resetWatchdog&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;

        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startWatchdog&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;handleEvent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;sequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;finish_reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;finishReason&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sequence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastSeq&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastSeq&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;received&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sequence&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastSeq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastSeq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sequence&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;streaming&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clearWatchdog&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;partial&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;complete&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onComplete&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;partial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;finishReason&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;finishReason&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;teardown&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;startWatchdog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;watchdog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;client_timeout&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;partial&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`No event received for &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;silenceMs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;ms`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;});&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;teardown&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;silenceMs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;resetWatchdog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clearWatchdog&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startWatchdog&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;clearWatchdog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;watchdog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;watchdog&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;watchdog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nf"&gt;teardown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clearWatchdog&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Echo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;leave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`stream.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;streamId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;channel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usage with the full error surface handled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmStreamClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;abc-123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;silenceMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="na"&gt;onToken&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;full&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getElementById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;output&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;full&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;onComplete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;finishReason&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Stream gaps detected — some tokens may be missing.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;gaps&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;saveResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;finishReason&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;savePartial&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;partial&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// always persist what arrived&lt;/span&gt;

        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;truncated&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;showRetryButton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Response was cut off. Retry to continue.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;client_timeout&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;showRetryButton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Connection timed out. Retry when ready.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;showError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Something went wrong. Your partial response has been saved.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The watchdog fires after &lt;code&gt;silenceMs&lt;/code&gt; of silence. Ten seconds without a token means something is wrong: the job died, the Pusher connection degraded, or the LLM API paused for an unusually long time between tokens. All three warrant user feedback.&lt;/p&gt;

&lt;p&gt;[Production Pitfall] Do not set &lt;code&gt;silenceMs&lt;/code&gt; below 8000. Claude’s API pauses several seconds between tokens during complex structured outputs before streaming resumes. We set ours to 5000 initially and got false timeout fires on longer document generation tasks. 10000 is the minimum safe floor for production. If you are generating very long documents, consider 15000.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;gaps&lt;/code&gt; array is diagnostic, not restorative. Pusher and Reverb do not offer event replay for private channels. Gap detection tells you whether a broadcast delivery problem contributed to an incomplete response, which helps you distinguish API truncation from infrastructure failure. Log it to your analytics or error tracker on every &lt;code&gt;onComplete&lt;/code&gt; call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Detecting and Recovering Orphaned Streams
&lt;/h2&gt;

&lt;p&gt;The orphan case does not surface at runtime. The queue worker is killed. The stream stays in &lt;code&gt;streaming&lt;/code&gt; status. The client timed out via the watchdog and showed the retry UI. But nothing tells the database the stream is dead.&lt;/p&gt;

&lt;p&gt;Checkpointing every 50 tokens gives you a heartbeat. An active stream always updates &lt;code&gt;last_checkpoint_at&lt;/code&gt; within a predictable window. Silence beyond that window is evidence of death.&lt;/p&gt;

&lt;p&gt;Register the cleanup command in &lt;code&gt;routes/console.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Schedule&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Schedule&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'streams:reap-orphans'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;everyFiveMinutes&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The command itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Console\Commands&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Events\LlmTokenReceived&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\LlmStream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Console\Command&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReapOrphanedStreams&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="nv"&gt;$signature&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'streams:reap-orphans'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="nv"&gt;$description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Mark streams with no checkpoint activity as orphaned.'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$orphans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'streaming'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$q&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nv"&gt;$q&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'last_checkpoint_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;subMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                  &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;orWhere&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$q2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                      &lt;span class="nv"&gt;$q2&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;whereNull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'last_checkpoint_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                         &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'started_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;subMinutes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
                  &lt;span class="p"&gt;});&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$orphans&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$stream&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'status'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'error_message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Orphaned: no checkpoint activity for 2+ minutes.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'failed_at'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="nf"&gt;broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LlmTokenReceived&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;streamId&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$stream&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;stream_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;        &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;sequence&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$stream&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;last_sequence&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;       &lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;finishReason&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'orphaned'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;));&lt;/span&gt;

            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;line&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Reaped: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$stream&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;stream_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Reaped &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$orphans&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; orphaned stream(s)."&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two minutes is conservative. On a fast model with short prompts, 50 tokens arrive in seconds. If you are streaming long documents, increase the reap window to match, or reduce the checkpoint interval from 50 to 20 tokens.&lt;/p&gt;

&lt;p&gt;[Architect’s Note] The &lt;code&gt;last_checkpoint_at&lt;/code&gt; column is a poor man’s distributed heartbeat, and that is not a criticism. This is the same liveness-detection pattern event-driven systems have used for decades. If you later move to Laravel Horizon with job event hooks, you can correlate orphan detection directly with actual queue worker failures and remove the scheduler command. For most setups, the scheduled Artisan command approach costs nothing and requires no additional infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retry Strategy
&lt;/h2&gt;

&lt;p&gt;When a stream ends in &lt;code&gt;error&lt;/code&gt;, &lt;code&gt;truncated&lt;/code&gt;, or &lt;code&gt;dead&lt;/code&gt;, your user needs a retry button. Always generate a new &lt;code&gt;stream_id&lt;/code&gt; for each retry. Reusing the original &lt;code&gt;stream_id&lt;/code&gt; means the retry job broadcasts into the same channel the client already abandoned, and the partial content from attempt one gets mixed with attempt two in your database record. Neither is recoverable cleanly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Http\Controllers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Jobs\StreamLlmResponseJob&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\LlmStream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\JsonResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\Request&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Str&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StreamRetryController&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Controller&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$streamId&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$original&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$streamId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;whereIn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'truncated'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'dead'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;firstOrFail&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="nv"&gt;$newStreamId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nc"&gt;Str&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="nc"&gt;LlmStream&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'stream_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$newStreamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'user_id'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$original&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'status'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'pending'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nc"&gt;StreamLlmResponseJob&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$newStreamId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$original&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'stream_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$newStreamId&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Route registration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Http\Controllers\StreamRetryController&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Route&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/streams/{streamId}/retry'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StreamRetryController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The original stream record stays in the database with its &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;partial_content&lt;/code&gt; intact for audit purposes. The retry creates a clean record. The client receives the new &lt;code&gt;stream_id&lt;/code&gt; in the JSON response and calls &lt;code&gt;client.subscribe()&lt;/code&gt; fresh.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://origin-main.com/guides/laravel-sanctum-api-authentication/" rel="noopener noreferrer"&gt;If your streaming endpoints are not locked down with proper token-based authentication&lt;/a&gt;, the retry endpoint becomes an unauthenticated job dispatch surface. Sanctum’s API token middleware is the right guard here, not session auth, because the retry call typically originates from JavaScript, not a form submission.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Service Layer Is Not This Job’s Problem
&lt;/h2&gt;

&lt;p&gt;The job above handles the transport layer. It does not handle token budget management, provider fallback, cost attribution, or prompt construction. Those belong one layer up.&lt;/p&gt;

&lt;p&gt;If you are building this into a larger system, the &lt;a href="https://origin-main.com/laravel-architecture/production-grade-ai-architecture-in-laravel/" rel="noopener noreferrer"&gt;production AI architecture for Laravel&lt;/a&gt; covers the contracts and driver abstraction layer that should wrap around the streaming job described here. The job should be thin: stream, checkpoint, broadcast. Everything else belongs in the service it calls.&lt;/p&gt;

&lt;p&gt;That separation also matters when your streamed output is structured. &lt;a href="https://origin-main.com/laravel-architecture/laravel-agentic-workflow-schema-validation/" rel="noopener noreferrer"&gt;Schema validation against LLM hallucinations&lt;/a&gt; belongs at the service layer. A truncated JSON response is harder to recover from than truncated prose. Catching a &lt;code&gt;max_tokens&lt;/code&gt; truncation in a partial JSON structure before it reaches your client is far cheaper than handling the parse error downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring What Actually Matters
&lt;/h2&gt;

&lt;p&gt;Before going live, instrument three metrics from your &lt;code&gt;llm_streams&lt;/code&gt; table:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stream completion rate.&lt;/strong&gt; &lt;code&gt;status = 'complete'&lt;/code&gt; divided by total streams initiated, measured hourly. Below 95% in production warrants investigation. Below 90% is an incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Orphan rate.&lt;/strong&gt; Streams reaped by the Artisan command as a percentage of total streams. A spike here usually correlates with deployment restarts or queue worker OOM events. Check your Horizon metrics or system logs alongside it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gap frequency.&lt;/strong&gt; Log the &lt;code&gt;gaps&lt;/code&gt; array from client-side &lt;code&gt;onComplete&lt;/code&gt; callbacks via a lightweight analytics endpoint. Persistent gaps from specific geographic regions point to Pusher or Reverb delivery problems, not LLM API issues. The distinction matters: one is your infrastructure, the other is your vendor’s.&lt;/p&gt;

&lt;p&gt;Connecting these to &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-middleware-token-tracking/" rel="noopener noreferrer"&gt;Laravel’s AI middleware for token tracking&lt;/a&gt; gives you cost visibility alongside reliability metrics. That combination is what you need to have an honest conversation about whether Echo-based streaming is worth its operational complexity for your specific use case.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>php</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Zero-Downtime Laravel Deployments with GitHub Actions: A Complete CI/CD Pipeline for Production</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Sat, 02 May 2026 17:30:48 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/zero-downtime-laravel-deployments-with-github-actions-a-complete-cicd-pipeline-for-production-37jf</link>
      <guid>https://dev.to/dewaldhugo/zero-downtime-laravel-deployments-with-github-actions-a-complete-cicd-pipeline-for-production-37jf</guid>
      <description>&lt;p&gt;Most Laravel teams have a deployment ritual that sounds reasonable until you think about it carefully. Push to &lt;code&gt;main&lt;/code&gt;. SSH into the server. Run &lt;code&gt;git pull &amp;amp;&amp;amp; composer install&lt;/code&gt;. Fire off &lt;code&gt;php artisan migrate&lt;/code&gt;. Restart the queue. Hope the app comes back cleanly. It often does. When it does not, you are debugging in production, in front of users, mid-request.&lt;/p&gt;

&lt;p&gt;Zero-downtime Laravel deployments with GitHub Actions break that cycle. The strategy borrows Envoyer’s core mechanic (timestamped release directories and an atomic symlink swap) and pairs it with a CI/CD pipeline that runs your full test suite before anything reaches production, verifies the live deployment against a real health endpoint, and rolls back automatically when that check fails. No Envoyer license. No Laravel Forge dependency. Full control over every step.&lt;/p&gt;

&lt;p&gt;One scoping note before we get into it: this article assumes a server that is already provisioned, Nginx-configured, and accessible over SSH with a deploy user. If you are building that foundation, start with the &lt;a href="https://origin-main.com/guides/deploy-laravel-to-production/" rel="noopener noreferrer"&gt;complete Laravel production deployment guide&lt;/a&gt; first. This article picks up where that one leaves off.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why the “Pull and Pray” Approach Eventually Breaks
&lt;/h2&gt;

&lt;p&gt;The problem with a naive &lt;code&gt;git pull + composer install&lt;/code&gt; deploy is not that it is lazy. The problem is that it is a long, destructive operation performed against a live directory. Composer can take 30 to 90 seconds to resolve and install packages. During that window, your application is in an inconsistent state. Old PHP files from the previous release sit alongside whatever files git has already updated. If a new class is referenced by code that has already landed on disk but Composer has not yet installed its package, you get a fatal. Users get a 500.&lt;/p&gt;

&lt;p&gt;Migrations compound the problem. Running &lt;code&gt;php artisan migrate&lt;/code&gt; against a live database means the schema is changing while requests are in flight. An &lt;code&gt;ALTER TABLE&lt;/code&gt; that adds a &lt;code&gt;NOT NULL&lt;/code&gt; column can acquire a table lock on MySQL. Even on PostgreSQL, you are relying on the migration being fast enough that no concurrent request trips over the constraint change mid-transaction.&lt;/p&gt;

&lt;p&gt;Queue workers are the third failure mode, and the one teams notice last. Workers running old code against a migrated schema produce type errors, silent write failures, or jobs that retry indefinitely. We have seen this cause lost inference jobs (queued API calls to Anthropic or OpenAI) that were mid-retry when the schema changed underneath them. By the time someone noticed, the queue had thousands of failed jobs and no useful error message beyond a column-not-found exception.&lt;/p&gt;

&lt;p&gt;The fix is conceptually simple: prepare the new release while the old one is still serving traffic, then swap in a single atomic operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Zero-Downtime Deployments Actually Work
&lt;/h2&gt;

&lt;p&gt;Everything depends on the directory structure on the server. Instead of a single &lt;code&gt;/var/www/myapp&lt;/code&gt; directory modified in place, you maintain three:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/var/www/myapp/
├── current -&amp;gt; releases/20260502143012_a1b2c3d   (symlink)
├── releases/
│   ├── 20260502143012_a1b2c3d/
│   ├── 20260430091201_f4e5d6a/
│   └── 20260428181345_9c8b7a6/
└── shared/
    ├── .env
    └── storage/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;current&lt;/code&gt; is a symlink. Nginx’s document root points at &lt;code&gt;/var/www/myapp/current/public&lt;/code&gt;. Each deployment creates a new timestamped directory in &lt;code&gt;releases/&lt;/code&gt;, prepares it fully (dependencies installed, assets built, framework caches generated), and then swaps &lt;code&gt;current&lt;/code&gt; to point to the new directory. The swap itself is a single filesystem rename call. From Nginx’s perspective, it is instantaneous.&lt;/p&gt;

&lt;p&gt;Shared state lives outside all release directories in &lt;code&gt;shared/&lt;/code&gt;. Your &lt;code&gt;.env&lt;/code&gt; file and the &lt;code&gt;storage/&lt;/code&gt; directory are symlinked into each release. Uploads, runtime cache files, and logs persist across every deployment.&lt;/p&gt;

&lt;p&gt;The Nginx configuration for this pattern uses &lt;code&gt;$realpath_root&lt;/code&gt; rather than &lt;code&gt;$document_root&lt;/code&gt;. This is not optional:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="k"&gt;server&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;listen&lt;/span&gt; &lt;span class="mi"&gt;443&lt;/span&gt; &lt;span class="s"&gt;ssl&lt;/span&gt; &lt;span class="s"&gt;http2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;server_name&lt;/span&gt; &lt;span class="s"&gt;myapp.com&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;root&lt;/span&gt; &lt;span class="n"&gt;/var/www/myapp/current/public&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kn"&gt;index&lt;/span&gt; &lt;span class="s"&gt;index.php&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;try_files&lt;/span&gt; &lt;span class="nv"&gt;$uri&lt;/span&gt; &lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="n"&gt;/&lt;/span&gt; &lt;span class="n"&gt;/index.php?&lt;/span&gt;&lt;span class="nv"&gt;$query_string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kn"&gt;location&lt;/span&gt; &lt;span class="p"&gt;~&lt;/span&gt; &lt;span class="sr"&gt;\.php$&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kn"&gt;fastcgi_pass&lt;/span&gt; &lt;span class="s"&gt;unix:/var/run/php/php8.3-fpm.sock&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;fastcgi_param&lt;/span&gt; &lt;span class="s"&gt;SCRIPT_FILENAME&lt;/span&gt; &lt;span class="nv"&gt;$realpath_root$fastcgi_script_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kn"&gt;include&lt;/span&gt; &lt;span class="s"&gt;fastcgi_params&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;$realpath_root&lt;/code&gt; resolves the symlink before handing the path to PHP-FPM. Without it, OPcache may hold inode references to files in the old release directory after the swap, serving stale bytecode until the cache expires or the process restarts. Reloading PHP-FPM (not restarting it) after the symlink swap flushes OPcache’s resolved paths gracefully, with no dropped connections.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the GitHub Actions Pipeline
&lt;/h2&gt;

&lt;p&gt;The pipeline has two jobs: &lt;code&gt;test&lt;/code&gt; and &lt;code&gt;deploy&lt;/code&gt;. The deploy job does not run unless tests pass. GitHub’s &lt;code&gt;environment&lt;/code&gt; feature gives you an optional manual approval gate for production, useful for regulated contexts. The &lt;a href="https://docs.github.com/en/actions/deployment/targeting-different-environments/using-environments-for-deployment" rel="noopener noreferrer"&gt;GitHub Actions documentation on deployment environments&lt;/a&gt; covers protection rules in detail.&lt;/p&gt;

&lt;p&gt;A reliable test suite is the actual quality gate between your code and production. &lt;a href="https://origin-main.com/laravel/building-robust-test-factories-in-laravel/" rel="noopener noreferrer"&gt;Building robust test factories&lt;/a&gt; gives you the factory-backed integration test coverage that makes a CI gate meaningful rather than ceremonial.&lt;/p&gt;

&lt;p&gt;Here is the complete workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/deploy.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Test and Deploy&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;PHP_VERSION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;8.3'&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run Test Suite&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mysql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql:8.0&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;MYSQL_ROOT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
          &lt;span class="na"&gt;MYSQL_DATABASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testing&lt;/span&gt;
        &lt;span class="na"&gt;options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;-&lt;/span&gt;
          &lt;span class="s"&gt;--health-cmd="mysqladmin ping"&lt;/span&gt;
          &lt;span class="s"&gt;--health-interval=10s&lt;/span&gt;
          &lt;span class="s"&gt;--health-timeout=5s&lt;/span&gt;
          &lt;span class="s"&gt;--health-retries=3&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;3306:3306&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Setup PHP ${{ env.PHP_VERSION }}&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;shivammathur/setup-php@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;php-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ env.PHP_VERSION }}&lt;/span&gt;
          &lt;span class="na"&gt;extensions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mbstring, pdo, pdo_mysql, redis, pcov&lt;/span&gt;
          &lt;span class="na"&gt;coverage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pcov&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Cache Composer dependencies&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/cache@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vendor&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;composer-${{ hashFiles('**/composer.lock') }}&lt;/span&gt;
          &lt;span class="na"&gt;restore-keys&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;composer-&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Composer dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;composer install --no-progress --prefer-dist --no-interaction&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prepare test environment&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;cp .env.testing .env&lt;/span&gt;
          &lt;span class="s"&gt;php artisan key:generate&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run migrations against test database&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;php artisan migrate --force&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;DB_CONNECTION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
          &lt;span class="na"&gt;DB_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
          &lt;span class="na"&gt;DB_PORT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;
          &lt;span class="na"&gt;DB_DATABASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testing&lt;/span&gt;
          &lt;span class="na"&gt;DB_USERNAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
          &lt;span class="na"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Execute test suite&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;php artisan test --parallel&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;DB_CONNECTION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;mysql&lt;/span&gt;
          &lt;span class="na"&gt;DB_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
          &lt;span class="na"&gt;DB_PORT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3306&lt;/span&gt;
          &lt;span class="na"&gt;DB_DATABASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testing&lt;/span&gt;
          &lt;span class="na"&gt;DB_USERNAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
          &lt;span class="na"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;

  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to Production&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
    &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;group&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production-deploy&lt;/span&gt;
      &lt;span class="na"&gt;cancel-in-progress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout code&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run deployment script via SSH&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appleboy/ssh-action@v1.0.3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_HOST }}&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_USER }}&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_PORT }}&lt;/span&gt;
          &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/home/deploy/scripts/deploy.sh ${{ github.sha }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Verify health check&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;sleep 15&lt;/span&gt;
          &lt;span class="s"&gt;HTTP_STATUS=$(curl --silent --output /dev/null \&lt;/span&gt;
            &lt;span class="s"&gt;--write-out "%{http_code}" \&lt;/span&gt;
            &lt;span class="s"&gt;--max-time 10 \&lt;/span&gt;
            &lt;span class="s"&gt;https://${{ secrets.APP_DOMAIN }}/up)&lt;/span&gt;
          &lt;span class="s"&gt;if [ "$HTTP_STATUS" -ne 200 ]; then&lt;/span&gt;
            &lt;span class="s"&gt;echo "Health check returned HTTP $HTTP_STATUS — triggering rollback"&lt;/span&gt;
            &lt;span class="s"&gt;exit 1&lt;/span&gt;
          &lt;span class="s"&gt;fi&lt;/span&gt;
          &lt;span class="s"&gt;echo "Deployment verified: HTTP $HTTP_STATUS"&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rollback on failure&lt;/span&gt;
        &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;failure()&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appleboy/ssh-action@v1.0.3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_HOST }}&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_USER }}&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_PORT }}&lt;/span&gt;
          &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/home/deploy/scripts/rollback.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to highlight. The &lt;code&gt;concurrency&lt;/code&gt; block with &lt;code&gt;cancel-in-progress: false&lt;/code&gt; prevents a rapid second push from interrupting a deployment that is already in flight. Two concurrent deploys racing to swap the same symlink is a scenario you do not want to debug at 2am. The &lt;code&gt;needs: test&lt;/code&gt; key creates a hard dependency: no green tests, no deploy job runs, full stop.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deployment Script: Symlink-Based Releases
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;deploy.sh&lt;/code&gt; script does the actual work on the production server. It accepts the commit SHA from GitHub Actions so the release directory is traceable back to a specific commit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;COMMIT_SHA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;1&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;HEAD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;APP_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/www/myapp"&lt;/span&gt;
&lt;span class="nv"&gt;RELEASES_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_DIR&lt;/span&gt;&lt;span class="s2"&gt;/releases"&lt;/span&gt;
&lt;span class="nv"&gt;SHARED_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_DIR&lt;/span&gt;&lt;span class="s2"&gt;/shared"&lt;/span&gt;
&lt;span class="nv"&gt;CURRENT_LINK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_DIR&lt;/span&gt;&lt;span class="s2"&gt;/current"&lt;/span&gt;
&lt;span class="nv"&gt;RELEASE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASES_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;COMMIT_SHA&lt;/span&gt;:0:7&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;KEEP_RELEASES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Creating release directory"&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASES_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Cloning at &lt;/span&gt;&lt;span class="nv"&gt;$COMMIT_SHA&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
git clone &lt;span class="nt"&gt;--depth&lt;/span&gt; 1 git@github.com:yourorg/yourapp.git &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
git checkout &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$COMMIT_SHA&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Linking shared resources"&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASE_PATH&lt;/span&gt;&lt;span class="s2"&gt;/storage"&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-sfn&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SHARED_DIR&lt;/span&gt;&lt;span class="s2"&gt;/storage"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASE_PATH&lt;/span&gt;&lt;span class="s2"&gt;/storage"&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-sfn&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SHARED_DIR&lt;/span&gt;&lt;span class="s2"&gt;/.env"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASE_PATH&lt;/span&gt;&lt;span class="s2"&gt;/.env"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Installing Composer dependencies"&lt;/span&gt;
composer &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-dev&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-interaction&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--prefer-dist&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--optimize-autoloader&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Building frontend assets"&lt;/span&gt;
npm ci &lt;span class="nt"&gt;--silent&lt;/span&gt;
npm run build

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Running database migrations"&lt;/span&gt;
php artisan migrate &lt;span class="nt"&gt;--force&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Warming framework caches"&lt;/span&gt;
php artisan optimize

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Swapping symlink atomically"&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CURRENT_LINK&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.next"&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; &lt;span class="nt"&gt;--no-target-directory&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CURRENT_LINK&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.next"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CURRENT_LINK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Reloading PHP-FPM"&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl reload php8.3-fpm

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Restarting long-running services"&lt;/span&gt;
php &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CURRENT_LINK&lt;/span&gt;&lt;span class="s2"&gt;/artisan"&lt;/span&gt; reload

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Pruning old releases (keeping &lt;/span&gt;&lt;span class="nv"&gt;$KEEP_RELEASES&lt;/span&gt;&lt;span class="s2"&gt;)"&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASES_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"+&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;KEEP_RELEASES &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="k"&gt;))&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | xargs &lt;span class="nt"&gt;--no-run-if-empty&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Release complete: &lt;/span&gt;&lt;span class="nv"&gt;$RELEASE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The symlink swap line deserves its own explanation because the naive version contains a subtle bug. &lt;code&gt;ln -sfn "$RELEASE_PATH" "$CURRENT_LINK"&lt;/code&gt; looks correct, but when &lt;code&gt;$CURRENT_LINK&lt;/code&gt; already resolves to a directory (which it does after the first deploy), some &lt;code&gt;ln&lt;/code&gt; implementations create the new symlink inside the target directory rather than replacing the link itself. The &lt;code&gt;ln&lt;/code&gt; followed by &lt;code&gt;mv --no-target-directory&lt;/code&gt; pattern sidesteps this entirely. On Linux, &lt;code&gt;mv&lt;/code&gt; on the same filesystem is a kernel-level atomic rename. There is no window between the unlink and the new symlink creation where a request could resolve to a missing path.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;php artisan optimize&lt;/code&gt; is the Laravel 12 shorthand for running &lt;code&gt;config:cache&lt;/code&gt;, &lt;code&gt;route:cache&lt;/code&gt;, &lt;code&gt;view:cache&lt;/code&gt;, and &lt;code&gt;event:cache&lt;/code&gt; in sequence. Run it inside the new release directory, before the symlink swap, so the first request to the new release hits a fully warm cache rather than triggering cold compilation on a live request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migrations, Cache Warming, and Service Restarts
&lt;/h2&gt;

&lt;p&gt;Notice that &lt;code&gt;php artisan migrate --force&lt;/code&gt; runs before the symlink swap. This is deliberate, and it introduces a constraint that every team using this pattern must internalise.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Production Pitfall]&lt;/strong&gt; When migrations run before the symlink swap, both old and new application code handle requests simultaneously during the migration window. The old release is still &lt;code&gt;current&lt;/code&gt; while the database schema changes. This means every migration you write must be strictly backward-compatible with the previous release. Adding a nullable column: safe. Dropping a column the old code still reads: immediate 500s in production on every affected query during the migration window. Renaming a column that old code references: same result, every time.&lt;/p&gt;

&lt;p&gt;The pattern that works is two-phase migration: deploy one adds the new column (old code ignores it, new code writes to it), deploy two removes the old column after the transition is complete and no live code references it. This discipline is enforced by services like PlanetScale’s branching workflow and documented explicitly in the &lt;a href="https://laravel.com/docs/12.x/deployment" rel="noopener noreferrer"&gt;Laravel deployment documentation&lt;/a&gt;. Treat it as a hard rule, not a guideline.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For service restarts, &lt;code&gt;php artisan reload&lt;/code&gt; is the correct command in Laravel 12. It sends a graceful termination signal to queue workers, Laravel Reverb servers, and Octane processes in a single call. Workers finish their current job before exiting. Supervisor (or your process monitor of choice) detects the exit and starts a fresh worker process, which picks up &lt;code&gt;/var/www/myapp/current/artisan&lt;/code&gt; — now pointing at the new release.&lt;/p&gt;

&lt;p&gt;Supervisor configuration for your worker pool should use the &lt;code&gt;current&lt;/code&gt; symlink path, not a hardcoded release directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[program:laravel-worker]&lt;/span&gt;
&lt;span class="py"&gt;process_name&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;%(program_name)s_%(process_num)02d&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;php /var/www/myapp/current/artisan queue:work &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="s"&gt;--sleep=3 &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="s"&gt;--tries=3 &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;  &lt;span class="s"&gt;--max-time=3600&lt;/span&gt;
&lt;span class="py"&gt;autostart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;autorestart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;stopasgroup&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;killasgroup&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;user&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;www-data&lt;/span&gt;
&lt;span class="py"&gt;numprocs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;4&lt;/span&gt;
&lt;span class="py"&gt;redirect_stderr&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;stdout_logfile&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/var/log/supervisor/laravel-worker.log&lt;/span&gt;
&lt;span class="py"&gt;stopwaitsecs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3620&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;stopwaitsecs&lt;/code&gt; must exceed &lt;code&gt;--max-time&lt;/code&gt;. If Supervisor sends SIGTERM and the worker has not exited within &lt;code&gt;stopwaitsecs&lt;/code&gt; seconds, it escalates to SIGKILL. A SIGKILL mid-job is a lost job. Set &lt;code&gt;stopwaitsecs&lt;/code&gt; to &lt;code&gt;--max-time + 20&lt;/code&gt; at minimum.&lt;/p&gt;

&lt;p&gt;If you are running Laravel Horizon for queue management, the &lt;code&gt;reload&lt;/code&gt; command handles it. Horizon monitors its own worker pool and restarts processes automatically after they exit gracefully. No Supervisor configuration changes are needed for Horizon workers specifically, though you will still want Supervisor managing the &lt;code&gt;horizon&lt;/code&gt; process itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Health Checks and Automatic Rollback
&lt;/h2&gt;

&lt;p&gt;By default, the health check route is served at &lt;code&gt;/up&lt;/code&gt; and returns HTTP 200 when the application has booted without exceptions, and HTTP 500 otherwise. It is registered in &lt;code&gt;bootstrap/app.php&lt;/code&gt; via the &lt;code&gt;withRouting()&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// bootstrap/app.php&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Application&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;basePath&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;__DIR__&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withRouting&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt;&lt;span class="mf"&gt;.&lt;/span&gt;&lt;span class="s1"&gt;'/../routes/web.php'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;commands&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt;&lt;span class="mf"&gt;.&lt;/span&gt;&lt;span class="s1"&gt;'/../routes/console.php'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;health&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'/up'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Middleware&lt;/span&gt; &lt;span class="nv"&gt;$middleware&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withExceptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Exceptions&lt;/span&gt; &lt;span class="nv"&gt;$handler&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;//&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When HTTP requests are made to this route, Laravel dispatches a &lt;code&gt;DiagnosingHealth&lt;/code&gt; event, allowing you to perform additional health checks. Within a listener for this event, you can check your application’s database or cache status, and throwing an exception will cause the endpoint to return 500.&lt;/p&gt;

&lt;p&gt;For production, register a listener that verifies database connectivity and your primary cache store. A bootstrap-only health check catches a broken Service Container but misses a misconfigured database driver, which is a common failure mode after an environment variable change.&lt;/p&gt;

&lt;p&gt;The pipeline waits 15 seconds after the SSH deploy step before hitting &lt;code&gt;/up&lt;/code&gt;. That window absorbs PHP-FPM’s reload time and the brief period before Nginx resolves the new symlink on the next request cycle. Adjust it upward if your server is under high load during deployments.&lt;/p&gt;

&lt;p&gt;The rollback script is intentionally simple. It does not run migrations, does not install dependencies, and does not build assets. It only swaps the symlink back and restarts services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;APP_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/www/myapp"&lt;/span&gt;
&lt;span class="nv"&gt;RELEASES_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_DIR&lt;/span&gt;&lt;span class="s2"&gt;/releases"&lt;/span&gt;
&lt;span class="nv"&gt;CURRENT_LINK&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_DIR&lt;/span&gt;&lt;span class="s2"&gt;/current"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Identifying previous release"&lt;/span&gt;
&lt;span class="nv"&gt;PREVIOUS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASES_DIR&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s1"&gt;'2p'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PREVIOUS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ERROR: No previous release found. Cannot roll back."&lt;/span&gt;
  &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Rolling back to: &lt;/span&gt;&lt;span class="nv"&gt;$PREVIOUS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RELEASES_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$PREVIOUS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CURRENT_LINK&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.rollback"&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; &lt;span class="nt"&gt;--no-target-directory&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CURRENT_LINK&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.rollback"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CURRENT_LINK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Reloading PHP-FPM"&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl reload php8.3-fpm

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Restarting services on previous release"&lt;/span&gt;
php &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$APP_DIR&lt;/span&gt;&lt;span class="s2"&gt;/current/artisan"&lt;/span&gt; reload

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"==&amp;gt; Rollback complete"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After rollback, &lt;code&gt;current&lt;/code&gt; points to the second-most-recent release. Workers restart and load that release’s code. If the health check failure was caused by a broken migration (not just a code error), the rollback restores application behaviour but the database schema remains in its post-migration state. Plan for this. Automated rollback is a traffic restoration tool, not a database recovery mechanism. If you need genuine point-in-time database state restored, you need a pre-migration snapshot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Secrets, Environments, and the .env Problem
&lt;/h2&gt;

&lt;p&gt;Never commit &lt;code&gt;.env&lt;/code&gt; to your repository. The shared &lt;code&gt;.env&lt;/code&gt; file on the server should be provisioned once, manually or through a secrets management tool, and then left in place across all deployments. This is one area where a well-defined infrastructure provisioning strategy pays off directly. The &lt;a href="https://origin-main.com/devops/laravel-infrastructure-as-code-reliable-systems/" rel="noopener noreferrer"&gt;Infrastructure as Code guide for Laravel teams&lt;/a&gt; covers managing server state deterministically, including initial environment file provisioning and ongoing drift prevention.&lt;/p&gt;

&lt;p&gt;Configure these secrets in GitHub under &lt;strong&gt;Settings &amp;gt; Secrets and variables &amp;gt; Actions&lt;/strong&gt;, scoped to the production environment:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Secret&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DEPLOY_HOST&lt;/td&gt;
&lt;td&gt;Production server IP or hostname&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DEPLOY_USER&lt;/td&gt;
&lt;td&gt;SSH user with write access to releases dir&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DEPLOY_KEY&lt;/td&gt;
&lt;td&gt;Private SSH key (Ed25519 recommended)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DEPLOY_PORT&lt;/td&gt;
&lt;td&gt;SSH port (default: 22)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;APP_DOMAIN&lt;/td&gt;
&lt;td&gt;Application domain for the health check curl&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The deploy user should own &lt;code&gt;/var/www/myapp/releases/&lt;/code&gt; and have a narrow &lt;code&gt;sudoers&lt;/code&gt; entry permitting only the &lt;code&gt;systemctl reload php8.3-fpm&lt;/code&gt; command. Do not deploy as &lt;code&gt;root&lt;/code&gt;. This is non-negotiable.&lt;/p&gt;

&lt;p&gt;For teams running multiple environments (staging and production), create separate GitHub environments with separate secret sets. Keeping your local development stack as close to production as possible reduces the chance that an environment-specific failure makes it to the pipeline at all. The &lt;a href="https://origin-main.com/workspace/laravel-ai-development-stack-setup/" rel="noopener noreferrer"&gt;Laravel AI development stack setup guide&lt;/a&gt; covers environment parity for teams running AI-integrated Laravel applications, where model API keys, inference configuration, and queue drivers all need consistent configuration across environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running the Full Pipeline
&lt;/h2&gt;

&lt;p&gt;With &lt;code&gt;.github/workflows/deploy.yml&lt;/code&gt; committed, &lt;code&gt;deploy.sh&lt;/code&gt; and &lt;code&gt;rollback.sh&lt;/code&gt; on the server at &lt;code&gt;/home/deploy/scripts/&lt;/code&gt;, and GitHub secrets configured, the deployment cycle is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Developer pushes to &lt;code&gt;main&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; GitHub Actions starts the &lt;code&gt;test&lt;/code&gt; job with a MySQL service container&lt;/li&gt;
&lt;li&gt; Pest/PHPUnit runs against a fresh migrated test database in CI&lt;/li&gt;
&lt;li&gt; On green: the &lt;code&gt;deploy&lt;/code&gt; job starts (pending any environment approval rules you have set)&lt;/li&gt;
&lt;li&gt; SSH executes &lt;code&gt;deploy.sh&lt;/code&gt; with the commit SHA on the production server&lt;/li&gt;
&lt;li&gt; A new timestamped release directory is created, the repository cloned at that exact commit&lt;/li&gt;
&lt;li&gt; Composer installs production dependencies, assets build&lt;/li&gt;
&lt;li&gt; Migrations run against the production database (old release still serving)&lt;/li&gt;
&lt;li&gt; &lt;code&gt;php artisan optimize&lt;/code&gt; warms the framework cache inside the new release directory&lt;/li&gt;
&lt;li&gt; Symlink atomically swaps to the new release&lt;/li&gt;
&lt;li&gt; PHP-FPM reloads, services restart gracefully via &lt;code&gt;php artisan reload&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; GitHub Actions waits 15 seconds, then hits &lt;code&gt;https://yourdomain.com/up&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; HTTP 200: deployment succeeds, pipeline passes&lt;/li&gt;
&lt;li&gt; Non-200: &lt;code&gt;rollback.sh&lt;/code&gt; executes, symlink reverts, services restart on the previous release&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The cycle from push to verified production deployment runs in roughly 3 to 5 minutes, depending on Composer install time and asset build complexity. That is fast enough to ship multiple times per day without ceremony or coordination overhead.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;git pull&lt;/code&gt; approach worked for years. But it is a manual process dressed up as a system, and manual processes fail the same way every time: at the worst possible moment, on the highest-traffic day of the quarter. This pipeline is not perfect either. It is a single-server pattern that needs adaptation when you scale horizontally, and at that point, purpose-built tools like Deployer PHP or a blue-green deployment model via your load balancer justify the added complexity. For teams running one to three application servers, this handles the vast majority of production deployments cleanly, automatically, and with a recoverable failure path. That is the bar to clear.&lt;/p&gt;

&lt;p&gt;If your deployment pipeline grows to include pre-deploy validation steps such as AI inference middleware health checks or token budget verification, the job structure here accommodates additional pipeline steps naturally. The &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-middleware-token-tracking/" rel="noopener noreferrer"&gt;Laravel AI middleware guide covering token tracking and rate limiting&lt;/a&gt; describes middleware patterns that translate directly into pre-deployment validation hooks for applications that depend on AI provider availability.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>github</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>OpenAI Structured Outputs in Laravel: Enforcing JSON Schema for Production AI Pipelines</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Sat, 02 May 2026 16:13:19 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/openai-structured-outputs-in-laravel-enforcing-json-schema-for-production-ai-pipelines-5f1p</link>
      <guid>https://dev.to/dewaldhugo/openai-structured-outputs-in-laravel-enforcing-json-schema-for-production-ai-pipelines-5f1p</guid>
      <description>&lt;p&gt;If you've been using OpenAI's JSON mode and calling it "structured output," you're building on a guarantee that's weaker than it looks. JSON mode promises syntactically valid JSON. It does not promise that &lt;code&gt;status&lt;/code&gt; will be a string, that &lt;code&gt;confidence&lt;/code&gt; will be a float, or that &lt;code&gt;items&lt;/code&gt; won't quietly disappear from the response entirely. In production, that distinction costs you. We've watched pipelines collapse at 3am because a field the downstream queue worker expected simply wasn't there, and JSON mode had no complaint.&lt;/p&gt;

&lt;p&gt;Using OpenAI structured outputs in Laravel, specifically &lt;code&gt;response_format&lt;/code&gt; with &lt;code&gt;type: json_schema&lt;/code&gt; and &lt;code&gt;strict: true&lt;/code&gt;, closes that gap. Schema compliance is enforced at the generation level using constrained decoding. The model literally cannot emit a token that would violate your schema. This article covers how to wire that up correctly in Laravel, what the real failure modes are (they're not what you'd expect), and how structured outputs slot into an agentic pipeline where downstream steps depend on typed, predictable payloads.&lt;/p&gt;

&lt;p&gt;This article focuses specifically on OpenAI's implementation. Gemini exposes the same concept via &lt;code&gt;generationConfig.responseSchema&lt;/code&gt;, and Claude approaches it through tool definitions. The JSON Schema structure is largely portable across all three. What differs is the implementation layer: how each provider receives the schema, the refusal mechanism, strict mode constraints, and the error surface. Those differences warrant their own focused articles. If you want the cross-provider picture first, the &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-integration-architecture-guide/" rel="noopener noreferrer"&gt;Laravel AI integration architecture guide&lt;/a&gt; covers how they compare at the architectural level.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why JSON Mode Is No Longer Good Enough
&lt;/h2&gt;

&lt;p&gt;The table below is worth keeping. The differences are subtle on paper and catastrophic under load.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;json_object&lt;/code&gt; mode&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;json_schema&lt;/code&gt; strict mode&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Guarantees valid JSON&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guarantees all required keys present&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guarantees correct types&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enforces enum values&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enforces &lt;code&gt;additionalProperties: false&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exposes &lt;code&gt;refusal&lt;/code&gt; field&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliant with JSON Schema spec subset&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;JSON mode guarantees syntactic correctness but does not guarantee schema adherence, even fields you mark as "required" in your prompt can go missing from the response. OpenAI's own documentation now treats &lt;code&gt;json_object&lt;/code&gt; mode as legacy. Strict mode with &lt;code&gt;json_schema&lt;/code&gt; is the production default for data extraction and agentic workflows.&lt;/p&gt;

&lt;p&gt;If your Laravel application is making decisions (routing jobs, updating records, triggering downstream agents) based on AI-generated JSON, you need the stronger guarantee.&lt;/p&gt;

&lt;h2&gt;
  
  
  How OpenAI Structured Outputs Work Under the Hood
&lt;/h2&gt;

&lt;p&gt;Understanding the mechanism matters because it explains both the capability and the constraints.&lt;/p&gt;

&lt;p&gt;OpenAI uses a Context-Free Grammar engine to mask invalid tokens before generation, the model literally cannot produce a non-conforming response. This is not prompt-level enforcement. No amount of instruction-following or fine-tuning does what constrained decoding does. The schema is compiled into the generation process itself.&lt;/p&gt;

&lt;p&gt;Safety policies still apply. The model will abide by its existing safety rules and may refuse an unsafe request. When it does, it returns a &lt;code&gt;refusal&lt;/code&gt; string value on the response, allowing you to programmatically detect that the model generated a refusal instead of schema-conforming output.&lt;/p&gt;

&lt;p&gt;Two things to internalise before writing a line of code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If the model &lt;em&gt;can&lt;/em&gt; answer, it &lt;em&gt;will&lt;/em&gt; conform to your schema. That's the guarantee.&lt;/li&gt;
&lt;li&gt;If the model &lt;em&gt;won't&lt;/em&gt; answer (safety refusal), you get a &lt;code&gt;refusal&lt;/code&gt; field instead of &lt;code&gt;content&lt;/code&gt;. You must handle both branches explicitly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Defining Your JSON Schema in Laravel
&lt;/h2&gt;

&lt;p&gt;Schema definitions scattered across service classes are a maintenance problem. Keep them in a dedicated config file: the Laravel-native home for static, environment-aware definitions that require no class instantiation to access.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="c1"&gt;// config/ai-schemas.php&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;

    &lt;span class="s1"&gt;'review_analysis'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'name'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'review_analysis'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'strict'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'schema'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'type'&lt;/span&gt;                 &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'object'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'additionalProperties'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'required'&lt;/span&gt;             &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'sentiment'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'confidence'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'summary'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'flags'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'escalate'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'properties'&lt;/span&gt;           &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'sentiment'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'enum'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'positive'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'negative'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'neutral'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'mixed'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="s1"&gt;'confidence'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'number'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="s1"&gt;'summary'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="s1"&gt;'flags'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'type'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'array'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'items'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="s1"&gt;'escalate'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'boolean'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'null'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull it into your service with a single config call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'response_format'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'type'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'json_schema'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'json_schema'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-schemas.review_analysis'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Config is publishable, overridable per environment, and version-controlled alongside your codebase. When your schema changes, you change one file. No class to instantiate, no namespace to import.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strict Mode Constraints You Must Know
&lt;/h3&gt;

&lt;p&gt;OpenAI's strict mode is enforced at the generation level using constrained decoding, not at the prompt level. That distinction matters because it means the constraints are hard, not advisory. The API will reject your request outright if your schema violates them, so know the rules before you deploy.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;additionalProperties&lt;/code&gt; must be set to &lt;code&gt;false&lt;/code&gt; for every object in the schema, and all fields in &lt;code&gt;properties&lt;/code&gt; must appear in &lt;code&gt;required&lt;/code&gt;. If your schema violates either rule, the API rejects the request immediately. You won't get a gracefully degraded response, you'll get an HTTP error.&lt;/p&gt;

&lt;p&gt;The following JSON Schema keywords are not supported and will cause the API to reject your request:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;default&lt;/code&gt; values&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;minLength&lt;/code&gt; / &lt;code&gt;maxLength&lt;/code&gt; on strings&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;minimum&lt;/code&gt; / &lt;code&gt;maximum&lt;/code&gt; on numbers&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pattern&lt;/code&gt; for regex constraints&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;if&lt;/code&gt; / &lt;code&gt;then&lt;/code&gt; / &lt;code&gt;else&lt;/code&gt; conditional schemas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Work within these constraints, not around them. Post-API validation in Laravel (covered below) handles the domain rules that OpenAI won't enforce.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;em&gt;&lt;strong&gt;[Edge Case Alert]&lt;/strong&gt; JSON Schema &lt;code&gt;$defs&lt;/code&gt; are supported by strict mode for reusable sub-schemas. If you reference a &lt;code&gt;$def&lt;/code&gt; that isn't declared in the same schema object, the API will reject the request silently in some SDK versions rather than returning a useful error. Always validate complex schemas locally before deploying. A unit test that calls &lt;code&gt;json_validate()&lt;/code&gt; on your schema definition is cheap insurance.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling Optional Fields
&lt;/h3&gt;

&lt;p&gt;This is where developers get tripped up. You cannot simply omit a field from &lt;code&gt;required&lt;/code&gt;, strict mode requires every property to be required. The idiomatic solution is to make the type nullable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'escalate'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'type'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'boolean'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'null'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model can now return &lt;code&gt;null&lt;/code&gt; for &lt;code&gt;escalate&lt;/code&gt;, which your PHP code handles cleanly. Do not try to work around this by removing optional fields from your schema entirely; you lose the benefit of a complete, predictable contract.&lt;/p&gt;

&lt;h2&gt;
  
  
  Making the API Call with Strict Schema Enforcement
&lt;/h2&gt;

&lt;p&gt;Wrap the API call in a dedicated service. This is not optional in production. You need a single place to apply retry logic, log token usage, and handle the response contract.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\AI\Services&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Exceptions\AI\ModelRefusalException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Exceptions\AI\TruncatedResponseException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;OpenAI\Laravel\Facades\OpenAI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;OpenAI\Exceptions\TransporterException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;OpenAI\Exceptions\ErrorException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReviewAnalysisService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="no"&gt;MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="no"&gt;MAX_RETRIES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="no"&gt;RETRY_DELAY_MS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;analyse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$reviewText&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                    &lt;span class="s1"&gt;'model'&lt;/span&gt;           &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'messages'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="p"&gt;[&lt;/span&gt;
                            &lt;span class="s1"&gt;'role'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'system'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'You are a review analysis assistant. Analyse the provided customer review and respond according to the schema.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="p"&gt;],&lt;/span&gt;
                        &lt;span class="p"&gt;[&lt;/span&gt;
                            &lt;span class="s1"&gt;'role'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$reviewText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="s1"&gt;'response_format'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                        &lt;span class="s1"&gt;'type'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'json_schema'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'json_schema'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-schemas.review_analysis'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="s1"&gt;'max_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;]);&lt;/span&gt;

                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;parseResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ErrorException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// Rate limit: 429, server error: 5xx&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getCode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getCode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="nb"&gt;usleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RETRY_DELAY_MS&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nv"&gt;$attempt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                        &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;

                &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'OpenAI structured output error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="s1"&gt;'code'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getCode&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="p"&gt;]);&lt;/span&gt;

                &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TransporterException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="c1"&gt;// Network-level failure, retry&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="nb"&gt;usleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RETRY_DELAY_MS&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nv"&gt;$attempt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                    &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Max retries exceeded for structured output request.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;parseResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;mixed&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="c1"&gt;// Handle truncation before checking refusal&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$choice&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;finishReason&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'length'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TruncatedResponseException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s1"&gt;'Structured output response was truncated. Increase max_tokens or simplify the schema.'&lt;/span&gt;
            &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nv"&gt;$message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$choice&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$message&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;refusal&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ModelRefusalException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$message&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;refusal&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;json_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$message&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;JSON_THROW_ON_ERROR&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register this service in the Service Container via a provider, and inject it where needed. Never reach for the facade directly in controllers if you can avoid it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Failure Modes You Must Handle
&lt;/h2&gt;

&lt;p&gt;Schema enforcement eliminates the &lt;em&gt;malformed JSON&lt;/em&gt; failure class entirely. But two failure modes survive strict mode, and both require explicit handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refusals
&lt;/h3&gt;

&lt;p&gt;When the model declines to answer due to a safety policy, &lt;code&gt;message.content&lt;/code&gt; is &lt;code&gt;null&lt;/code&gt; and &lt;code&gt;message.refusal&lt;/code&gt; contains the refusal explanation. Do not treat this as a generic exception, it's a first-class response state. Log it separately, because a spike in refusals often signals a prompt injection attempt or a shift in input data quality.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Exceptions\AI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ModelRefusalException&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;\RuntimeException&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$refusalReason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;parent&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Model refused the request: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$refusalReason&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In production, we've routed these to a separate monitoring channel. A refusal on a data-extraction job is almost always either a malformed upstream payload or a policy edge case worth reviewing manually.&lt;/p&gt;

&lt;h3&gt;
  
  
  Truncated Responses
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;finish_reason: 'length'&lt;/code&gt; means the model ran out of output tokens before completing the schema. In strict mode, a truncated response is always invalid JSON. The constrained decoder cannot produce partial schema-conforming output, so the token budget cuts the response mid-stream. The fix is almost always to increase &lt;code&gt;max_tokens&lt;/code&gt; or to reduce schema complexity.&lt;/p&gt;

&lt;p&gt;[Production Pitfall] Truncation in structured outputs is not recoverable at the application level. You cannot repair a half-written JSON object. Always set &lt;code&gt;max_tokens&lt;/code&gt; generously and monitor finish reasons in your telemetry. A &lt;code&gt;length&lt;/code&gt; finish reason on a structured output call is a configuration bug, not an edge case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Laravel Validation Beyond the Schema
&lt;/h2&gt;

&lt;p&gt;Strict mode guarantees structural compliance. It does not guarantee semantic correctness. A &lt;code&gt;confidence&lt;/code&gt; field of &lt;code&gt;999.5&lt;/code&gt; is perfectly valid JSON, schema-compliant, and completely wrong for your business logic.&lt;/p&gt;

&lt;p&gt;A dedicated Laravel Data Transfer Object with validation handles the gap cleanly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\AI\DTOs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Validator&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Validation\ValidationException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ReviewAnalysisResult&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$sentiment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;  &lt;span class="nv"&gt;$confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;  &lt;span class="nv"&gt;$flags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;?bool&lt;/span&gt;  &lt;span class="nv"&gt;$escalate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;fromArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;self&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$validator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Validator&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'sentiment'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'in:positive,negative,neutral,mixed'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'confidence'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'numeric'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'between:0,1'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'summary'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'min:10'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'max:1000'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'flags'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'array'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'flags.*'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'escalate'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'nullable'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'boolean'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$validator&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;fails&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ValidationException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$validator&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;self&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'sentiment'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'confidence'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'summary'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'flags'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;escalate&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'escalate'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This follows the same validation patterns documented in our &lt;a href="https://origin-main.com/guides/laravel-openai-integration-guide/" rel="noopener noreferrer"&gt;complete guide to Laravel OpenAI integration&lt;/a&gt;. The schema enforces structure; the DTO enforces domain rules. Both layers earn their place.&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;em&gt;&lt;strong&gt;[Architect's Note]&lt;/strong&gt; The combination of strict schema + DTO validation gives you something genuinely new: an AI response you can type-hint. Downstream code that receives a &lt;code&gt;ReviewAnalysisResult&lt;/code&gt; can trust its shape as reliably as any other value object in your application. This is how you stop writing defensive null-checks six layers deep.&lt;/em&gt;&lt;/small&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating Structured Outputs Into an Agentic Pipeline
&lt;/h2&gt;

&lt;p&gt;This is where the real value materialises. In an agentic workflow, one AI call produces the input that triggers the next step. Without schema guarantees, you're writing defensive code at every handoff. With strict mode, the handoff is a typed contract.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Jobs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\AI\DTOs\ReviewAnalysisResult&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\AI\Services\ReviewAnalysisService&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Exceptions\AI\ModelRefusalException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Exceptions\AI\TruncatedResponseException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\Review&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Bus\Queueable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Contracts\Queue\ShouldQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AnalyseReviewJob&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldQueue&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Queueable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$tries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Retries are handled inside the service&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$reviewId&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;ReviewAnalysisService&lt;/span&gt; &lt;span class="nv"&gt;$service&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$review&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Review&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;findOrFail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;reviewId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$raw&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$service&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;analyse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$review&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ReviewAnalysisResult&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fromArray&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$raw&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="nv"&gt;$review&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'sentiment'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;sentiment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'ai_confidence'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'ai_summary'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'flags'&lt;/span&gt;         &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="c1"&gt;// Gate the next pipeline step on a semantic condition&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;escalate&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EscalateReviewJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;reviewId&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ModelRefusalException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Review analysis refused by model'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'review_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;reviewId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'reason'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="nv"&gt;$review&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'ai_status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'refused'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TruncatedResponseException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Review analysis truncated'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'review_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;reviewId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice what we're not doing: we're not writing &lt;code&gt;isset($result['escalate'])&lt;/code&gt; before every downstream action. The DTO guarantees the key exists and has the correct type. The agentic branching logic is clean because the data contract is clean. This is what our &lt;a href="https://origin-main.com/laravel-architecture/production-grade-ai-architecture-in-laravel/" rel="noopener noreferrer"&gt;production AI architecture contracts and governance guide&lt;/a&gt; refers to as treating AI outputs as typed system interfaces rather than raw text.&lt;/p&gt;

&lt;p&gt;For heavier pipelines where multiple agents hand off between each other, pair this approach with schema validation against LLM hallucinations. Structured outputs remove structural hallucinations, but semantic drift (a &lt;code&gt;confidence&lt;/code&gt; value that's technically valid but meaninglessly high) still needs an application-level assertion layer.&lt;/p&gt;

&lt;p&gt;On the infrastructure side, every structured output job should run through &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-middleware-token-tracking/" rel="noopener noreferrer"&gt;Laravel AI middleware for token tracking&lt;/a&gt; so you capture token consumption per job class. Structured outputs tend to have stable, predictable token counts once your schema is stable, which makes anomaly detection on token spend genuinely useful. And if you want to understand how the &lt;code&gt;temperature&lt;/code&gt; and &lt;code&gt;max_tokens&lt;/code&gt; parameters interact with constrained decoding, the &lt;a href="https://origin-main.com/ai-foundations/laravel-llm-inference-control/" rel="noopener noreferrer"&gt;Laravel LLM inference control guide&lt;/a&gt; covers the mechanics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using a Laravel Abstraction Layer
&lt;/h2&gt;

&lt;p&gt;Everything covered in this article is the raw implementation. For teams who don't want to manage schema definitions, response parsing, and retry logic manually, &lt;a href="https://origin-main.com/ai-agents/laravel-prism-php-agentic-apps/" rel="noopener noreferrer"&gt;Prism PHP&lt;/a&gt; handles it through a unified SDK that wraps OpenAI, Anthropic, and Gemini behind a consistent interface.&lt;/p&gt;

&lt;p&gt;The low-level approach earns its place for two reasons. First, Prism's structured output support is built on exactly these primitives, so understanding what the abstraction does underneath makes you a better consumer of it. Second, when the abstraction breaks (and it will, at a provider API boundary, under load, with an edge-case schema), you need to know what to reach for.&lt;/p&gt;

&lt;p&gt;The decision point is straightforward. Greenfield project with multiple AI providers, or a team that shouldn't be managing provider-specific quirks: use Prism. Single-provider integration where you want explicit control over the request lifecycle, retry behaviour, and token telemetry: go direct.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>openai</category>
      <category>json</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Laravel AI Integration: A Production-Ready Architecture Guide (OpenAI vs Gemini vs Claude)</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Mon, 27 Apr 2026 00:47:06 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/laravel-ai-integration-a-production-ready-architecture-guide-openai-vs-gemini-vs-claude-lcp</link>
      <guid>https://dev.to/dewaldhugo/laravel-ai-integration-a-production-ready-architecture-guide-openai-vs-gemini-vs-claude-lcp</guid>
      <description>&lt;p&gt;Every Laravel AI integration starts the same way. You install a package, copy an API key into &lt;code&gt;.env&lt;/code&gt;, write a controller method that fires a prompt, and it works. That approach is fine for a proof of concept. It is not fine for production.&lt;/p&gt;

&lt;p&gt;The first incident makes that clear, a 429 rate limit silences a customer-facing feature at 2AM, a model deprecation breaks a workflow three sprints before anyone noticed the announcement, or a runaway token loop burns through your monthly budget in four hours. Unlike a payment or CRM API where breaking changes arrive with a migration guide and six months notice, LLM providers will deprecate model strings mid-sprint, shift safety filter behaviour between deployments, and bury the changelog entry in a developer newsletter. The maintenance surface is larger and less predictable than any other API category most Laravel developers have worked with. That changes what the architecture needs to do.&lt;/p&gt;

&lt;p&gt;This guide assumes you have already decided to integrate AI into your Laravel application. The question it answers is how to do that without it becoming a production liability. No authentication walkthroughs, no step-by-step setup. What you get is the system-level map: the problem space, the architectural patterns that address it, a clear-eyed comparison of OpenAI, Gemini, and Claude, and a decision framework you can put in front of your team before the next kickoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Laravel AI Integration Problem Space
&lt;/h2&gt;

&lt;p&gt;Before picking a provider or drawing a service diagram, you need to be honest about what you are actually building. The problems that make AI difficult in production are not unique to any one provider. They are structural.&lt;/p&gt;

&lt;h3&gt;
  
  
  Provider Abstraction
&lt;/h3&gt;

&lt;p&gt;You will almost certainly swap or combine providers over the lifecycle of a product. The model that makes sense at launch is rarely the model that makes sense at scale, or after a pricing change, or when your use case evolves. If your application code couples directly to OpenAI’s SDK or Anthropic’s client, a provider change means rewriting feature logic, not just swapping a configuration value.&lt;/p&gt;

&lt;p&gt;The correct answer is a provider interface your application depends on, with concrete adapters behind it. We cover the interface contract below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token Usage and Cost Control
&lt;/h3&gt;

&lt;p&gt;Language models charge by the token. That sounds simple. In practice it means a single misconfigured prompt, multiplied by your queue throughput, can produce a five-figure bill in an afternoon. We have seen this happen.&lt;/p&gt;

&lt;p&gt;Token management is not optional in production. You need budget enforcement before the API call, usage logging at the response level, and alerting when spend crosses threshold. This belongs in the service layer – not in a controller, not in a scheduled job that runs after the damage is done.&lt;/p&gt;

&lt;h3&gt;
  
  
  Latency and Streaming
&lt;/h3&gt;

&lt;p&gt;A synchronous AI call in an HTTP handler is almost always wrong. Inference latency runs from 800ms on a lightweight model to 30 seconds on a long-form reasoning task. PHP-FPM holds a connection open for that entire window. Under any meaningful traffic, that collapses your worker pool.&lt;/p&gt;

&lt;p&gt;Streaming helps on the UX side, but it does not fix the connection pressure problem. Background jobs fix that. The streaming question is a separate decision, covered in its own section below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error Handling and Retries
&lt;/h3&gt;

&lt;p&gt;AI APIs fail. They return 429s during traffic spikes, 500s during model instability, and occasionally time out with no response body at all. None of these are exceptional conditions. They are routine.&lt;/p&gt;

&lt;p&gt;Your retry logic must distinguish between transient failures (worth retrying with backoff) and structural failures (wrong API key, malformed payload, context length exceeded). Treating all failures the same is one of the most common production bugs we see in AI-integrated Laravel applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vendor Lock-in
&lt;/h3&gt;

&lt;p&gt;Every provider has proprietary extensions – function calling formats, system prompt conventions, streaming event shapes, vision attachment schemas. The moment you hardcode those shapes into your application logic, you own a migration project the next time you need to switch.&lt;/p&gt;

&lt;p&gt;Abstraction is the mitigation. But abstraction has a cost. Build it at the right layer, and keep provider-specific behaviour in the adapter, not in your domain services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Provider Comparison: OpenAI vs Gemini vs Claude
&lt;/h2&gt;

&lt;p&gt;No provider is the right answer for every use case. Here is an honest comparison of the three dominant options for Laravel AI integration.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;OpenAI (GPT-4o)&lt;/th&gt;
&lt;th&gt;Google (Gemini 2.5 Pro)&lt;/th&gt;
&lt;th&gt;Anthropic (Claude Sonnet 4.5)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;128K tokens&lt;/td&gt;
&lt;td&gt;1M+ tokens&lt;/td&gt;
&lt;td&gt;200K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming support&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Function / tool calling&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision / multimodal&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Native embeddings&lt;/td&gt;
&lt;td&gt;✓ (text-embedding-3)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗ (use OpenAI or Gemini)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Laravel SDK support&lt;/td&gt;
&lt;td&gt;First-party&lt;/td&gt;
&lt;td&gt;Via laravel/ai&lt;/td&gt;
&lt;td&gt;Via laravel/ai&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning quality&lt;/td&gt;
&lt;td&gt;Strong, broad&lt;/td&gt;
&lt;td&gt;Strong, long-context&lt;/td&gt;
&lt;td&gt;Exceptional, safety-aware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limit tolerance&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;Conservative&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing tier&lt;/td&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;td&gt;Mid&lt;/td&gt;
&lt;td&gt;Mid-High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Honest notes on each:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI&lt;/strong&gt; has the most mature Laravel ecosystem. The first-party package is stable, the API surface is well-documented, and function calling is production-tested at scale. It is the right default for teams building general-purpose AI features with limited architectural budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini’s&lt;/strong&gt; one-million-token context window is not a marketing number, it genuinely changes what is possible for document processing, long-session memory, and RAG-less retrieval patterns. The &lt;code&gt;laravel/ai&lt;/code&gt; SDK wraps it cleanly. The downside is that rate limits are inconsistently enforced and the model behaviour at the boundary of that context window is less predictable than Anthropic’s or OpenAI’s.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude&lt;/strong&gt; is the most consistent model for reasoning-heavy tasks, structured output extraction, code review pipelines, and anything that requires following complex multi-step instructions reliably. The safety constraints are real and occasionally frustrating for edge cases, but the instruction-following quality justifies the slightly higher cost for high-stakes features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Architecture for Laravel AI Systems
&lt;/h2&gt;

&lt;p&gt;The pattern that holds up across every provider, every scale, and every use case is the same: isolate AI behind a typed contract, implement concrete adapters per provider, and register the active provider via the Service Container.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Contracts\AI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;AIProviderInterface&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="nc"&gt;\Generator&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each provider - OpenAI, Gemini, Claude - gets its own adapter class that implements this interface. Your application services, queued jobs, and controllers depend on &lt;code&gt;AIProviderInterface&lt;/code&gt;. They never know which provider sits behind it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AppServiceProvider or bootstrap/app.php&lt;/span&gt;
&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nc"&gt;\App\Contracts\AI\AIProviderInterface&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;\App\AI\Adapters\ClaudeAdapter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Swapping providers is now a one-line configuration change. No domain logic moves.&lt;/p&gt;

&lt;p&gt;On top of this interface sits your governance layer – budget enforcement, telemetry, prompt versioning. &lt;a href="https://origin-main.com/laravel-architecture/production-grade-ai-architecture-in-laravel/" rel="noopener noreferrer"&gt;Our guide on production-grade AI architecture in Laravel&lt;/a&gt; covers that layer in full, including typed &lt;code&gt;AiResult&lt;/code&gt; data objects, decorator chains, and audit trail patterns. Read that guide before you build the adapter layer, the contract it defines will inform how you structure the interface above.&lt;/p&gt;

&lt;p&gt;[Architect’s Note] The three-method interface above is intentionally minimal. Resist the temptation to add provider-specific capabilities - vision input, citation formats, tool schemas - to the shared interface. Those belong in extended interfaces or provider-specific services consumed explicitly. The shared interface exists to make the 80% case portable. The 20% case is allowed to know which provider it is talking to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration Management
&lt;/h3&gt;

&lt;p&gt;Provider credentials live in &lt;code&gt;config/ai.php&lt;/code&gt;, not scattered across service provider constructors. Each adapter reads from &lt;code&gt;config('ai.providers.claude.api_key')&lt;/code&gt; and so on. Environment variables map cleanly through the config layer, and you can define per-provider defaults - temperature, max tokens, retry budget - without touching application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Management and Cost Control
&lt;/h2&gt;

&lt;p&gt;This is the section most architectural guides skip. It is also the section that determines whether your AI feature is financially viable.&lt;/p&gt;

&lt;p&gt;Token costs compound. A feature that runs 500 times per day at 2,000 tokens per call is one million tokens daily. Multiply by your model’s per-token rate, multiply by 30, and you have a monthly infrastructure line item that belongs in your budget conversations from day one.&lt;/p&gt;

&lt;p&gt;The enforcement point must be pre-dispatch. That means checking the estimated token count against a configured budget before the API call fires, not after the response arrives. You cannot claw back tokens you have already consumed.&lt;/p&gt;

&lt;p&gt;For systems that incorporate retrieval, the token problem gets harder. Injecting retrieved documents into a prompt is the right architectural move for accuracy, but each document chunk contributes to input token count. RAG systems need token-aware chunking at retrieval time, not just at prompt assembly time. If you are building retrieval into your Laravel application, the &lt;a href="https://origin-main.com/ai-foundations/laravel-vector-database-embeddings-pgvector-rag/" rel="noopener noreferrer"&gt;RAG and vector retrieval implementation guide&lt;/a&gt; covers the chunking and embedding strategy in detail.&lt;/p&gt;

&lt;p&gt;Telemetry is the other half of cost control. Log prompt tokens, completion tokens, model, latency, and user or tenant identity on every response. Without that data, you cannot attribute cost, identify expensive call patterns, or make informed decisions about model selection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Streaming vs Standard Responses
&lt;/h2&gt;

&lt;p&gt;The choice between streaming and synchronous responses is a UX decision that has infrastructure consequences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synchronous calls&lt;/strong&gt; are appropriate for background jobs, scheduled tasks, and any context where the result is written to storage before the user sees it. They are simpler to implement, easier to test, and do not require any special transport infrastructure. If your AI call feeds a Queue Worker writing to Eloquent, synchronous is the right default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Streaming&lt;/strong&gt; is appropriate when a human is waiting and the output is long enough that a full-response delay degrades the experience. The threshold in practice is around 1.5–2 seconds. Below that, streaming adds complexity for no perceivable gain.&lt;/p&gt;

&lt;p&gt;The transport mechanism behind streaming is a separate decision. Server-Sent Events are the right default for Laravel, they require no additional infrastructure, work through standard HTTP, and integrate cleanly with Livewire. WebSockets are justified when you need bidirectional communication or multi-client broadcast. Livewire polling is a reasonable fallback for teams that want to avoid streaming infrastructure entirely, at the cost of perceived responsiveness.&lt;/p&gt;

&lt;p&gt;The tradeoffs between these three approaches are significant enough to warrant their own analysis. Our &lt;a href="https://origin-main.com/ai-foundations/laravel-ai-streaming-livewire-sse-websockets/" rel="noopener noreferrer"&gt;streaming transport decision guide&lt;/a&gt; covers all three with production code and explicit recommendations for each scenario.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing the Right Provider: A Decision Framework
&lt;/h2&gt;

&lt;p&gt;Stop treating provider selection as a preference. Treat it as an engineering decision with clear inputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use OpenAI when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  You need native embeddings alongside generation (text-embedding-3-small covers most retrieval use cases)&lt;/li&gt;
&lt;li&gt;  Your team needs the most widely documented integration path&lt;/li&gt;
&lt;li&gt;  You are building function-calling pipelines where tool reliability matters more than reasoning depth&lt;/li&gt;
&lt;li&gt;  You want the lowest overhead Laravel AI integration for an MVP that may evolve later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://origin-main.com/guides/laravel-openai-integration-guide/" rel="noopener noreferrer"&gt;complete OpenAI integration guide&lt;/a&gt; covers transport, function calling, and retry patterns against the GPT-4o and GPT-4o-mini endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Gemini when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Your use case involves documents or sessions that exceed 128K tokens&lt;/li&gt;
&lt;li&gt;  You are processing multimodal inputs — documents, images, mixed media — at scale&lt;/li&gt;
&lt;li&gt;  You want to experiment with RAG-less patterns using Gemini’s long context as a retrieval proxy&lt;/li&gt;
&lt;li&gt;  Cost at volume is a primary concern (Gemini 2.0 Flash is competitive at high call rates)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;a href="https://origin-main.com/guides/laravel-gemini-integration/" rel="noopener noreferrer"&gt;Gemini setup via the Laravel AI SDK&lt;/a&gt;, including the &lt;code&gt;GeminiManager&lt;/code&gt; failover pattern and model string verification, that guide covers everything from provider registration to streaming response handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use Claude when:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  The task requires following multi-step instructions precisely and reliably&lt;/li&gt;
&lt;li&gt;  You are extracting structured output from unstructured text (JSON extraction, data normalisation)&lt;/li&gt;
&lt;li&gt;  Safety and refusal behaviour matter — regulated industries, content moderation pipelines&lt;/li&gt;
&lt;li&gt;  Code generation or review is a significant part of the workload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://origin-main.com/guides/laravel-claude-api-integration-guide/" rel="noopener noreferrer"&gt;Laravel Claude API integration guide&lt;/a&gt; covers the full Anthropic client setup, system prompt patterns, and streaming implementation with Laravel’s HTTP client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mixed-provider architectures
&lt;/h3&gt;

&lt;p&gt;The interface pattern described earlier makes multi-provider setups practical. Routing by task type, OpenAI for embeddings, Claude for structured extraction, Gemini for long-document summarisation, is a legitimate production architecture, not over-engineering. The Service Container resolves the correct adapter per task type, and your domain code stays clean.&lt;/p&gt;

&lt;p&gt;[Production Pitfall] Mixed-provider setups introduce a subtle budget tracking problem. If you log token costs per provider separately, you lose the aggregate view. Make sure your telemetry table records the provider name alongside every call, and your cost dashboards aggregate across providers by feature or tenant, not just by provider account.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;If you take one thing from this guide, make it this: the providers are interchangeable. The architecture is not.&lt;/p&gt;

&lt;p&gt;OpenAI, Gemini, and Claude will all improve, deprecate models, change pricing, and introduce capabilities over the next twelve months. Any of the three can be the right choice today and the wrong choice in two quarters. The teams that weather those changes without incident are the teams that built the abstraction layer before they needed it, not the teams that coupled their application to a specific SDK and are now rewriting feature logic to swap providers.&lt;/p&gt;

&lt;p&gt;Production Laravel AI integration is a system design problem. The interface contract, the adapter layer, the token budget enforcement, the streaming transport decision, the telemetry pipeline — those are the decisions that determine whether your AI feature is operationally sustainable, or just a demo that scaled badly.&lt;/p&gt;

&lt;p&gt;Build the layer. Then pick the provider.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>ai</category>
      <category>php</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Laravel OpenAI Integration: The Complete Production Guide</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Sun, 26 Apr 2026 20:59:38 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/laravel-openai-integration-the-complete-production-guide-la1</link>
      <guid>https://dev.to/dewaldhugo/laravel-openai-integration-the-complete-production-guide-la1</guid>
      <description>&lt;p&gt;Laravel OpenAI integration is not simpler than Claude integration. The APIs differ, the model families differ — but the architectural requirements are nearly identical once you care about production stability.&lt;/p&gt;

&lt;p&gt;Laravel works well here because it already solves the problems AI integrations create: dependency boundaries, environment isolation, background work, failure handling, and observability. You’re not fighting the framework. You’re leaning on it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;On model names:&lt;/strong&gt; This guide uses &lt;code&gt;gpt-4o&lt;/code&gt; and &lt;code&gt;gpt-4o-mini&lt;/code&gt; as examples. These are current as of early 2026. OpenAI’s model catalogue evolves quickly — always verify against &lt;a href="https://platform.openai.com/docs/models" rel="noopener noreferrer"&gt;OpenAI’s official models reference&lt;/a&gt; before deploying. &lt;code&gt;gpt-3.5-turbo&lt;/code&gt; is deprecated; &lt;code&gt;gpt-4o-mini&lt;/code&gt; is its cost-effective replacement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before committing to OpenAI as your sole provider, it is worth understanding where it sits in the broader Laravel AI provider landscape. Our &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-integration-architecture-guide/" rel="noopener noreferrer"&gt;provider comparison and architecture decision guide&lt;/a&gt; covers how OpenAI stacks up against Gemini and Claude for different production use cases — useful context before you build the service layer around any single provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the OpenAI API Surface
&lt;/h2&gt;

&lt;p&gt;OpenAI’s platform has evolved faster than its documentation. Many tutorials conflate multiple API generations or mix incompatible approaches.&lt;/p&gt;

&lt;p&gt;For Laravel applications, think in terms of three distinct capabilities: the &lt;strong&gt;Chat Completions API&lt;/strong&gt; for text generation and reasoning, the &lt;strong&gt;Embeddings API&lt;/strong&gt; for semantic search and retrieval, and the &lt;strong&gt;Audio/Vision APIs&lt;/strong&gt; for transcription and multimodal inputs.&lt;/p&gt;

&lt;p&gt;For image generation specifically – gpt-image-1, queued jobs, S3 storage, and per-request cost tracking — see the &lt;a href="https://origin-main.com/api-integration/laravel-openai-image-generation-gpt-image-1/" rel="noopener noreferrer"&gt;Laravel OpenAI image generation guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OpenAI is not “one API”. Your Laravel architecture should reflect that separation early. Controllers that call the Completions and Embeddings endpoints directly will become unmaintainable fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Foundations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Authentication and Environment Configuration
&lt;/h3&gt;

&lt;p&gt;OpenAI uses API keys issued at the account or project level. Treat them as production secrets — same as database credentials or payment tokens.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sk-...
OPENAI_ORG_ID=optional
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Never read these values directly from &lt;code&gt;env()&lt;/code&gt; outside of configuration files. This is one of the most common structural mistakes we see in Laravel AI codebases. Expose them through a dedicated config file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// config/openai.php&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'api_key'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'organization'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_ORG_ID'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'base_url'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_BASE_URL'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'https://api.openai.com/v1'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This centralizes OpenAI configuration and allows overrides for testing, proxying, or self-hosted gateways. Skipping this step is how vendor logic leaks across the codebase and makes rotation painful.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Security:&lt;/strong&gt; Never commit API keys to git. Use Laravel’s encrypted environment handling in production, and rotate keys periodically. Treat a leaked OpenAI key with the same urgency as a leaked database password — it has a direct billing impact.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Why You Should Not Use the OpenAI SDK Directly
&lt;/h3&gt;

&lt;p&gt;OpenAI provides official SDKs, including PHP community implementations. Injecting these directly into controllers or jobs is a mistake.&lt;/p&gt;

&lt;p&gt;SDKs change. Model names change. Response formats evolve. Your application should not absorb that churn.&lt;/p&gt;

&lt;p&gt;Access OpenAI through a service boundary you control:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAiClient&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="kt"&gt;HttpClient&lt;/span&gt; &lt;span class="nv"&gt;$http&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAiService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="kt"&gt;OpenAiClient&lt;/span&gt; &lt;span class="nv"&gt;$client&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents your controllers from becoming vendor adapters. It also makes the Laravel &lt;strong&gt;Service Container&lt;/strong&gt; work for you — you can bind a mock in tests without touching a single controller.&lt;/p&gt;

&lt;h3&gt;
  
  
  HTTP Client Configuration and Timeouts
&lt;/h3&gt;

&lt;p&gt;AI requests can be long-running, bursty, expensive, and sensitive to retries. A default &lt;code&gt;Http::post()&lt;/code&gt; is insufficient. Configure a dedicated client with explicit behaviour:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;withHeaders&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s1"&gt;'Authorization'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Bearer '&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'openai.api_key'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'Content-Type'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'application/json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;timeout(60)&lt;/code&gt; matters because OpenAI requests can legitimately take tens of seconds with large prompts or slow models. The &lt;code&gt;retry(0, 0)&lt;/code&gt; matters even more — retrying AI requests blindly can double your cost and produce inconsistent outputs. If you need retries, implement them at the application level with full context awareness, not at the transport layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error Handling as a First-Class Concern
&lt;/h3&gt;

&lt;p&gt;OpenAI’s Chat Completions API fails in ways standard REST APIs rarely do: partial responses, rate limit throttling, model overload, malformed outputs despite 200 responses.&lt;/p&gt;

&lt;p&gt;Your service layer must treat OpenAI responses as untrusted input, even when HTTP status codes indicate success.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;...&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;successful&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAiRequestFailed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But that’s only half the problem. You must also validate semantic correctness. Did the model return text when you expected JSON? Did it ignore system instructions? Did it hallucinate fields? That validation logic does not belong in a controller.&lt;/p&gt;

&lt;h2&gt;
  
  
  Service Architecture: Designing for Change
&lt;/h2&gt;

&lt;p&gt;Your OpenAI service should not expose low-level API primitives unless absolutely necessary.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$openai&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;chatCompletion&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mf"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prefer intent-based methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$openai&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;$openai&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;$openai&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;extractStructuredData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Internally, these methods may all use the same API endpoint. Externally, they express business intent. This gives you three architectural wins: prompts become versionable assets, you can A/B test models without touching callers, and you can migrate providers with minimal surface-area changes.&lt;/p&gt;

&lt;p&gt;If you later introduce Claude alongside OpenAI — a decision many teams make as they want provider redundancy — this design pays for itself immediately. The service boundary is what makes multi-model routing tractable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dependency Injection and Testability
&lt;/h3&gt;

&lt;p&gt;Every OpenAI-facing class should be injectable and mockable. No static helpers, no facades hiding HTTP calls, no &lt;code&gt;env()&lt;/code&gt; reads at runtime.&lt;/p&gt;

&lt;p&gt;In tests, you should be able to swap the OpenAI client with a fake implementation that returns deterministic responses. If you cannot do that without modifying production code, you are not integrating OpenAI — you are coupling your application to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Integration Patterns
&lt;/h2&gt;

&lt;p&gt;Once your foundation is in place, OpenAI becomes a capability your application depends on. These patterns are not OpenAI-specific — they’re standard Laravel patterns applied to probabilistic systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Simple Request–Response (One-Shot Tasks)
&lt;/h3&gt;

&lt;p&gt;The simplest OpenAI interaction is still the most common: you send a prompt, you receive a response, the request lifecycle ends. Classification, extraction, summarization, content drafting — all one-shot.&lt;/p&gt;

&lt;p&gt;Most tutorials implement this directly in controllers. Don’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Designing the Service Method&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAiService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// implementation&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Implementing the API Call&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/chat/completions'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'model'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'messages'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'role'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Model selection is explicit. Never rely on defaults — model choice is an architectural decision with cost and quality implications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extracting the Result Safely&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nv"&gt;$content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;data_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'choices.0.message.content'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;UnexpectedOpenAiResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;data_get()&lt;/code&gt; here. Future models may return multiple content blocks, safety filters may suppress output, and partial responses can occur under load. Your service method should either return valid output or fail loudly — silent nulls are debugging nightmares.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: Streaming Responses (Long-Running Output)
&lt;/h3&gt;

&lt;p&gt;Streaming is not about novelty. It’s about user experience under latency. Large outputs can take seconds. Streaming turns that wait into visible progress.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When Streaming Makes Sense&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use streaming when output length is unbounded and perceived latency matters. Avoid it when you need structured JSON output, must validate the full response before use, or results are consumed only by background jobs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server-Side Streaming with OpenAI&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenAI supports streaming via Server-Sent Events (SSE). On the Laravel side:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;openAi&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"data: "&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nb"&gt;ob_flush&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nb"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'Content-Type'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'text/event-stream'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Cache-Control'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'no-cache'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'X-Accel-Buffering'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'no'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Implementing the Stream Method&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;The &lt;code&gt;sink&lt;/code&gt; Option Defeats Streaming Entirely&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A common implementation mistake is using &lt;code&gt;-&amp;gt;withOptions(['stream' =&amp;gt; true, 'sink' =&amp;gt; fopen('php://temp', 'w+')])&lt;/code&gt; in Laravel’s HTTP client. The &lt;code&gt;sink&lt;/code&gt; option tells Guzzle to buffer the &lt;em&gt;entire&lt;/em&gt; response to that destination before returning. Your &lt;code&gt;while (!feof($handle))&lt;/code&gt; loop runs after the connection is already closed. You are not streaming — you are reading a completed buffer with extra steps. The UX benefits you expect don’t exist.&lt;/p&gt;

&lt;p&gt;For true streaming, use Guzzle directly with &lt;code&gt;stream =&amp;gt; true&lt;/code&gt; and read the PSR-7 body incrementally:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;callable&lt;/span&gt; &lt;span class="nv"&gt;$onChunk&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$client&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\GuzzleHttp\Client&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'https://api.openai.com/v1/chat/completions'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'headers'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'Authorization'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Bearer '&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'openai.api_key'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s1"&gt;'Content-Type'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'application/json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'json'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'model'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'messages'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
            &lt;span class="s1"&gt;'stream'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'stream'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nv"&gt;$body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getBody&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$body&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;detach&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;feof&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$resource&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;rtrim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;stream_get_line&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$resource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;empty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nv"&gt;$line&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'data: [DONE]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str_starts_with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'data: '&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;json_decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;isset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'choices'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'delta'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nv"&gt;$onChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'choices'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'delta'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Senior Dev Tip:&lt;/strong&gt; If you need to decide between Livewire polling, SSE, and WebSockets for your AI UI, that architectural choice has more impact than the streaming implementation itself. We compared all three approaches in &lt;a href="https://origin-main.com/ai-foundations/laravel-ai-streaming-livewire-sse-websockets/" rel="noopener noreferrer"&gt;Livewire vs SSE vs WebSockets for AI UIs&lt;/a&gt; — worth reading before you commit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Pattern 3: Conversation Memory (Multi-Turn Interactions)
&lt;/h3&gt;

&lt;p&gt;Chatbots are one example. Multi-step reasoning, guided forms, iterative content refinement — all of these are multi-turn workflows. The key challenge is state management, not prompt engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Representing Conversation State&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'conversations'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'context'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'conversation_messages'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;foreignId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'conversation_id'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 'user' or 'assistant'&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Rehydrating Context&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before sending a request, load the full message history via &lt;strong&gt;Eloquent&lt;/strong&gt; and build the message array:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$conversation&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'role'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$msg&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$msg&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toArray&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then append the new user input and send the full list to OpenAI. Large language models do not remember anything between API calls. Memory is entirely your responsibility. Persisting messages also lets you truncate context intelligently and summarize older turns when token counts grow large.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 4: Background Processing (Queued AI Work)
&lt;/h3&gt;

&lt;p&gt;Not every AI task belongs in the request–response cycle. Bulk processing, document analysis, scheduled generation — these belong in queued jobs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GenerateSummaryJob&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldQueue&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;OpenAiService&lt;/span&gt; &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'summary'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$summary&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Laravel’s queue system provides retry semantics, failure hooks, and concurrency control — all essential when working with a rate-limited, cost-based API. A background job that silently retries three times on a large document can triple your bill. Jobs must cap token usage, log cost metadata, and fail deterministically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting the Patterns
&lt;/h3&gt;

&lt;p&gt;These four patterns combine in real applications: a Livewire UI streams output, backed by a conversation record, which occasionally offloads heavy steps to queues, all of which reuse the same OpenAI service layer. That consistency is the real win — one abstraction, composable in every direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a Document Q&amp;amp;A Feature
&lt;/h2&gt;

&lt;p&gt;Chatbots are not the most interesting application of LLMs in production. The real leverage comes from augmenting your existing data — documents, records, internal knowledge — with natural language access.&lt;/p&gt;

&lt;p&gt;We’ll implement a Document Q&amp;amp;A feature: a user uploads a document, the system extracts and stores its content, and the user can ask questions grounded strictly in that document’s content. This exercises nearly every core pattern: one-shot generation, multi-turn context, background processing, and defensive prompt design.&lt;/p&gt;

&lt;p&gt;We deliberately skip embeddings and vector databases here. That’s a scoped decision, not a shortcut. The simpler architecture works today and degrades predictably when it eventually outgrows itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem Definition and Constraints
&lt;/h3&gt;

&lt;p&gt;What we want: natural language questions over uploaded documents, answers limited to document content, reasonable performance for mid-sized files, and clear failure modes.&lt;/p&gt;

&lt;p&gt;What we’re not solving yet: semantic search across large corpora, cross-document reasoning, or high-recall retrieval at scale. Many “RAG tutorials” prematurely introduce vector databases when a simpler architecture would suffice.&lt;/p&gt;

&lt;h3&gt;
  
  
  High-Level Architecture
&lt;/h3&gt;

&lt;p&gt;Four stages: ingestion (document uploaded and stored), extraction (text extracted and normalized), context preparation (content transformed into prompt context), and question answering (user questions answered via OpenAI). Each maps cleanly to a Laravel responsibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 1: Document Upload and Persistence
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'original_name'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'path'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;longText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At upload time, do not call OpenAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'document'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'documents'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nv"&gt;$document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'original_name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getClientOriginalName&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="s1"&gt;'path'&lt;/span&gt;          &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nc"&gt;ExtractDocumentContent&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dispatch a job. Extraction may be slow. Upload requests should return quickly. Extraction failures should be isolated — not cascade into upload failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 2: Text Extraction as a Background Job
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ExtractDocumentContent&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldQueue&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;extractText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;Storage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No AI calls, no tokens consumed. Failures are deterministic. You want extraction bugs to be boring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage 3: Chunking Document Content
&lt;/h3&gt;

&lt;p&gt;Here’s where many implementations fail. The naive approach — stuffing the entire document into a prompt — works briefly, then collapses under token limits and cost pressure.&lt;/p&gt;

&lt;p&gt;Split the document into chunks based on size:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="nv"&gt;$length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;mb_strlen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offset&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;$length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$chunks&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;mb_substr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$offset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$size&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$offset&lt;/span&gt;  &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nv"&gt;$size&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$chunks&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Gotcha — &lt;code&gt;str_split()&lt;/code&gt; Will Corrupt Multibyte Text&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The PHP standard library’s &lt;code&gt;str_split($text, $size)&lt;/code&gt; splits by &lt;em&gt;bytes&lt;/em&gt;, not characters. Any document with UTF-8 content — French accents, German umlauts, Arabic, CJK — will produce chunks with broken characters at boundaries. Always use &lt;code&gt;mb_substr()&lt;/code&gt; and &lt;code&gt;mb_strlen()&lt;/code&gt; for text chunking. This is a silent production bug that only surfaces with real-world content.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Stage 4: Answering Questions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;answerQuestion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Document&lt;/span&gt; &lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunkText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$question&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;openAi&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prompt Construction&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;buildPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$chunks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;implode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$chunks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;PROMPT
You are answering questions based ONLY on the following document content.
If the answer cannot be found in the document, respond with:
"I don't know based on the provided document."

Document:
{$context}

Question:
{$question}
PROMPT;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prompt constrains hallucination, defines a fallback response, and makes incorrect answers detectable. The goal is not perfection — it’s bounded failure. You’ll eventually outgrow this approach, but it fails gracefully, which makes it an excellent foundation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Handling Multi-Turn Follow-Ups
&lt;/h3&gt;

&lt;p&gt;Users rarely ask one question. Instead of re-sending the entire document every time, summarize prior context, append follow-ups as messages, and re-inject only relevant chunks. The conversation schema from Pattern 3 handles this directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Observability
&lt;/h3&gt;

&lt;p&gt;Every OpenAI call in this feature should log document ID, token count, model, and latency. Not for analytics — for debugging. When a user reports “the AI gave a wrong answer,” you need the exact inputs that produced it. Without that log, you’re guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Concerns
&lt;/h2&gt;

&lt;p&gt;By the time an OpenAI integration reaches production, the question shifts from “Can we generate an answer?” to “How often does this fail? How expensive is it over time? What happens under load?”&lt;/p&gt;

&lt;p&gt;AI integrations are probabilistic, rate-limited, and cost-based. Treating them like traditional REST APIs produces brittle systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rate Limiting as a System Behaviour
&lt;/h3&gt;

&lt;p&gt;OpenAI enforces rate limits at the account and model level. Most developers respond to sporadic 429s by adding retries. That’s the wrong mental model.&lt;/p&gt;

&lt;p&gt;Rate limits are not an error condition. They’re a signal that your system’s demand exceeds its contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Centralizing Rate Awareness&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your OpenAI service layer should be the only place where rate-limit responses are interpreted:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAiRateLimited&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;retryAfter&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Retry-After'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This exception should not be caught silently. Its consumers — controllers, jobs — must decide what to do next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Different Strategies for Different Contexts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;User-facing requests should fail fast with a clear message or degraded experience. Background jobs can be delayed and rescheduled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// OpenAI call&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OpenAiRateLimited&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;release&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;retryAfter&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This aligns retries with system capacity instead of guessing at backoff timing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost as an Operational Metric
&lt;/h3&gt;

&lt;p&gt;Token usage is not an abstract billing detail. It’s a runtime characteristic of your application. Two requests that look identical in code can differ by orders of magnitude in cost depending on prompt length, document size, conversation history, and model choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recording Token Usage&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every OpenAI response includes token metadata. Capture it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;data_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'usage'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;OpenAiUsage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s1"&gt;'model'&lt;/span&gt;             &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;     &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$usage&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;      &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This data is not for dashboards at first. It’s for answering questions like: Why did costs spike yesterday? Which feature is expensive? Which users generate the most load? You cannot optimize what you do not measure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-Aware Design Decisions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once usage is visible, architectural decisions become concrete: summarizing older conversation turns to stay within token budgets, truncating document context for cheaper models, and routing simple tasks to &lt;code&gt;gpt-4o-mini&lt;/code&gt; instead of &lt;code&gt;gpt-4o&lt;/code&gt;. These are the difference between sustainable and runaway costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caching AI Outputs (Carefully)
&lt;/h3&gt;

&lt;p&gt;Caching AI responses is tempting. It’s also dangerous if done naively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Is Safe to Cache&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Caching works well for deterministic prompts, non-user-specific inputs, and expensive but repeatable operations — document summaries, extracted metadata, classification labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;"doc:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:summary"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;addDay&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$document&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What Should Not Be Cached&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Avoid caching conversational replies, user-personalized outputs, or anything derived from mutable state. If you cannot confidently answer “When should this be invalidated?”, don’t cache it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Senior Dev Tip:&lt;/strong&gt; Use &lt;strong&gt;Redis&lt;/strong&gt; for AI output caching, not the file driver. Redis gives you atomic expiry, explicit invalidation by key pattern, and the ability to inspect what’s cached without reading from disk. File-based caches become unmanageable quickly when you’re caching per-document, per-user outputs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Testing OpenAI Integrations Without Lying
&lt;/h3&gt;

&lt;p&gt;This is where many teams get stuck. Mocking the OpenAI client to return a fixed string proves nothing useful. You need two distinct kinds of tests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Contract Tests (Fast, Deterministic)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These verify that your service layer calls OpenAI correctly, that failures are handled, and that parsing logic works. Mocking is appropriate here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s1"&gt;'api.openai.com/*'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'choices'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Test output'&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'usage'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;]),&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’re testing plumbing, not intelligence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Behavioural Tests (Slow, Real)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These hit the real OpenAI API, use real prompts, and validate constraints — not specific wording:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$service&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;answerQuestion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$doc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'What is the refund policy?'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertStringContainsString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'refund'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;strtolower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These tests should run infrequently, be clearly labelled, use dedicated API keys, and have cost limits. They catch failures no mock ever will.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dealing With Non-Determinism&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI output is not stable across runs. Tests must reflect that. Assert structure, presence or absence, and refusal behaviour. If a model is instructed to say “I don’t know” under certain conditions, test for that branch explicitly. You’re testing behavioural boundaries, not string values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment Considerations
&lt;/h3&gt;

&lt;p&gt;Increase PHP and proxy timeouts for streaming routes. Ensure queue workers can handle long-running jobs. If you haven’t locked down your production environment yet, &lt;a href="https://origin-main.com/guides/deploy-laravel-to-production/" rel="noopener noreferrer"&gt;deploy Laravel to production&lt;/a&gt; covers the infrastructure baseline these integrations depend on. Separate API keys across local, staging, and production — never reuse them. Wrap major prompt changes behind feature flags; a prompt change can alter behaviour as dramatically as a code change.&lt;/p&gt;

&lt;p&gt;At minimum, you should be able to answer: Which OpenAI calls failed today? Which features are most expensive? Which prompts changed recently? This doesn’t require a full observability stack. It requires discipline: structured logs, request IDs, and consistent metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Patterns
&lt;/h2&gt;

&lt;p&gt;Your Laravel application now has a clean OpenAI service layer with request/response, streaming, memory, queues, cost tracking, rate limiting, testability, and observability. Let’s extend it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Livewire Integration: Real-Time AI Without a Frontend Framework
&lt;/h3&gt;

&lt;p&gt;Not every Laravel team wants a JavaScript SPA. Livewire allows server-driven interactivity that pairs naturally with AI workflows.&lt;/p&gt;

&lt;p&gt;Livewire is request-driven, not socket-driven. Each interaction is an HTTP round trip, and long-running calls block the request lifecycle unless handled intentionally. For a complete working implementation, see &lt;a href="https://origin-main.com/coding-tutorials/laravel-livewire-claude-api-real-time-chat/" rel="noopener noreferrer"&gt;Laravel Livewire + Claude Integration&lt;/a&gt; — the architectural patterns are identical for OpenAI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic Livewire AI Component&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Livewire\Component&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AiAssistant&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Component&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;   &lt;span class="nv"&gt;$loading&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;OpenAiService&lt;/span&gt; &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;loading&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;loading&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;render&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'livewire.ai-assistant'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works immediately but it’s blocking. If generation takes 10 seconds, the request takes 10 seconds. For short responses, acceptable. For longer outputs, you have three realistic strategies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Polling:&lt;/strong&gt; Trigger generation in a queued job, persist partial output, and poll from Livewire until complete. Simple and resilient.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Hybrid Route for Streaming:&lt;/strong&gt; Livewire initiates the request, a separate streaming endpoint handles chunked output, and a thin JavaScript layer listens and updates the DOM.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Accept Blocking:&lt;/strong&gt; For short, predictable outputs, it may be entirely acceptable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The decision depends on latency tolerance, not preference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inertia + Vue: SPA-Style AI Interfaces
&lt;/h3&gt;

&lt;p&gt;If your application already uses Inertia, you can leverage SSE natively on the client side.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Laravel Route:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/ai/stream'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;OpenAiService&lt;/span&gt; &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$openAi&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prompt'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"data: "&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="nb"&gt;ob_flush&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
                &lt;span class="nb"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'Content-Type'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'text/event-stream'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'Cache-Control'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'no-cache'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'X-Accel-Buffering'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'no'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Vue Client:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;eventSource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;EventSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/ai/stream?prompt=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;encodeURIComponent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;eventSource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;onmessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Streaming is an architectural decision, not a cosmetic enhancement. It changes how users perceive your application’s responsiveness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Retrieval-Augmented Generation (RAG) Fundamentals
&lt;/h3&gt;

&lt;p&gt;Eventually, chunking entire documents into prompts stops scaling. You need selective context retrieval.&lt;/p&gt;

&lt;p&gt;RAG solves this by converting text into embeddings, storing them in a vector store, and injecting only semantically relevant chunks into the prompt at query time — rather than the full document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Generating Embeddings&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/embeddings'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'model'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'text-embedding-3-small'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'input'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$textChunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="nv"&gt;$vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;data_get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'data.0.embedding'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Storing Vectors&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At small scale, PostgreSQL with the &lt;code&gt;pgvector&lt;/code&gt; extension is sufficient. If you need a deeper introduction to how embeddings actually work before choosing your storage strategy, &lt;a href="https://origin-main.com/ai-foundations/laravel-vector-database-embeddings-pgvector-rag/" rel="noopener noreferrer"&gt;Embeddings and Vector Databases: The Foundation of Semantic AI Systems&lt;/a&gt; covers the concepts and storage trade-offs in detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Retrieving Relevant Chunks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At query time, convert the question to an embedding, perform a cosine similarity search, and retrieve the top N chunks. Only those chunks enter the OpenAI prompt. This reduces token usage, latency, and hallucination. RAG is not magic — it’s controlled context injection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Model Strategies
&lt;/h3&gt;

&lt;p&gt;Assuming one model should do everything is one of the most expensive architectural mistakes in LLM-backed applications. Large reasoning models are slow and costly. Smaller models are fast and cheap. Embedding models serve an entirely different purpose.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent-Based Model Routing&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$taskType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="s1"&gt;'summarization'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o-mini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'complex_reasoning'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'embedding'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'text-embedding-3-small'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;default&lt;/span&gt;            &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o-mini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents overpaying for simple tasks. A document classification job does not need the same model as a multi-step reasoning pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fallback Strategies&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also implement primary → fallback model routing, high-accuracy → fast mode switching, and user-tier-based model selection. These are business decisions with direct cost and SLA implications, not technical hacks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Senior Dev Tip:&lt;/strong&gt; If you want to go further and treat your AI layer as a genuinely provider-agnostic abstraction — with tool calling, multi-model orchestration, and structured outputs built in — take a look at &lt;a href="https://origin-main.com/ai-agents/laravel-prism-php-agentic-apps/" rel="noopener noreferrer"&gt;Building Agentic Laravel Apps with Prism PHP&lt;/a&gt;. Prism handles the abstraction layer so you’re not maintaining it yourself.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Where This Leaves You
&lt;/h2&gt;

&lt;p&gt;Your Laravel + OpenAI system now supports real-time interfaces, retrieval-based document reasoning, multi-model routing, and a structured architecture ready to scale.&lt;/p&gt;

&lt;p&gt;The service boundary protects you from API changes, model shifts, and provider swaps. OpenAI becomes infrastructure — replaceable, measurable, and contained. That’s the outcome worth building toward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ready to build with real conversation memory and streaming?&lt;/strong&gt; &lt;a href="https://origin-main.com/coding-tutorials/laravel-claude-streaming-chatbot/" rel="noopener noreferrer"&gt;Building a Claude API Chatbot in Laravel&lt;/a&gt; demonstrates both patterns in a complete, working implementation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hit rate limits in production?&lt;/strong&gt; The queue-based throttling and backpressure strategies in &lt;a href="https://origin-main.com/guides/laravel-claude-api-integration-guide/" rel="noopener noreferrer"&gt;Laravel Claude API Integration Guide&lt;/a&gt; apply directly to OpenAI workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Want a server-driven real-time UI without JavaScript frameworks?&lt;/strong&gt; &lt;a href="https://origin-main.com/coding-tutorials/laravel-livewire-claude-api-real-time-chat/" rel="noopener noreferrer"&gt;Laravel Livewire + Claude Integration&lt;/a&gt; shows how to build AI features that stay in PHP.&lt;/p&gt;

&lt;p&gt;Questions? &lt;a href="https://origin-main.com/questions/" rel="noopener noreferrer"&gt;Ask in our Developer Q&amp;amp;A&lt;/a&gt; or reach out directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Official References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://platform.openai.com/docs/guides/text-generation" rel="noopener noreferrer"&gt;OpenAI Chat Completions API&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://platform.openai.com/docs/models" rel="noopener noreferrer"&gt;OpenAI Models Reference&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://laravel.com/docs/http-client" rel="noopener noreferrer"&gt;Laravel HTTP Client Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://laravel.com/docs/queues" rel="noopener noreferrer"&gt;Laravel Queues Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>laravel</category>
      <category>ai</category>
      <category>openai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>RAG-less Architecture in Laravel: Long-Context Caching with Gemini</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Thu, 23 Apr 2026 03:01:13 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/rag-less-architecture-in-laravel-long-context-caching-with-gemini-5h36</link>
      <guid>https://dev.to/dewaldhugo/rag-less-architecture-in-laravel-long-context-caching-with-gemini-5h36</guid>
      <description>&lt;p&gt;For the past few years, RAG has been presented as the only serious approach to grounding AI on large private datasets. Build a vector database, chunk your documents, embed the chunks, retrieve the closest matches, inject them into the prompt. Repeat for every query. It works, and for certain use cases it is genuinely the right call.&lt;/p&gt;

&lt;p&gt;But for others — particularly repeated, deep-analysis tasks on the same corpus — RAG is an expensive workaround for a context window problem that the models have largely outgrown.&lt;/p&gt;

&lt;p&gt;Google’s Gemini 2.5 Flash and 2.5 Pro both ship with a 1-million-token context window. That is approximately 750,000 words, or a mid-sized Laravel monolith with room to spare. More importantly, the Gemini API now supports &lt;strong&gt;Explicit Context Caching&lt;/strong&gt;: you upload that corpus once, pay full input price once, and reference the cached context on every subsequent query at up to 75% off the standard input rate. For workloads that involve many questions against the same codebase, document set, or knowledge base, this is a fundamentally different cost model — and a fundamentally different architecture.&lt;/p&gt;

&lt;p&gt;This guide walks you through building a &lt;code&gt;GeminiContextCacheService&lt;/code&gt; in Laravel 11/12, binding it to the Service Container, and building a production-ready audit endpoint around it. We will cover the full cache lifecycle, including the storage cost trap that most tutorials skip entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  RAG vs. Long-Context Caching with Gemini: The Architectural Trade-Off
&lt;/h2&gt;

&lt;p&gt;Before touching code, we need to be honest about what each approach actually is. Neither is universally superior.&lt;/p&gt;

&lt;p&gt;RAG trades context completeness for cost predictability. You pay only for the retrieved chunks, not the full corpus — which is excellent when your dataset is enormous (hundreds of GB) and any single query only needs a narrow slice of it. The problem is that retrieval is fundamentally lossy. A semantic search might pull the right controller method and miss the &lt;code&gt;BaseModel&lt;/code&gt; that overrides &lt;code&gt;save()&lt;/code&gt;. It will find the public interface and skip the private helper that silently transforms the input. That retrieval gap is not a bug you can patch; it is the architecture.&lt;/p&gt;

&lt;p&gt;Long-context caching with Gemini trades retrieval precision for corpus completeness. The model sees everything. There is no retrieval step, no chunking pipeline, no embedding model to maintain. The trade-off is cost structure: the first query is expensive (you pay to ingest the full context), and you carry a per-hour storage fee while the cache lives. Beyond the break-even point — which arrives faster than you expect — repeated queries cost significantly less than the RAG equivalent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Curve: Why Caching Is the Financial Lever
&lt;/h2&gt;

&lt;p&gt;Let us put real numbers on this. Using Gemini 2.5 Flash, which offers the best price-to-context ratio for this workload:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Standard input rate (long context &amp;gt;128K tokens):&lt;/strong&gt; $0.30/MTok&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cached input rate:&lt;/strong&gt; $0.03/MTok (90% reduction at Flash long-context tier)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cache storage:&lt;/strong&gt; billed per hour per MTok while the cache lives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a 500K-token codebase audit with 20 questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without caching:&lt;/strong&gt; Each query sends 500K tokens to the API. That is $0.15 per question × 20 = &lt;strong&gt;$3.00 total&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With caching:&lt;/strong&gt; Initial cache creation: $0.15 (one-time). Each subsequent query references the cache at $0.015. Twenty queries: $0.15 + ($0.015 × 19) = &lt;strong&gt;$0.435 total&lt;/strong&gt; — plus a small storage fee for however long the cache lives.&lt;/p&gt;

&lt;p&gt;The break-even is query 2. Every question after that is 90% cheaper than the RAG alternative that re-sends the same context.&lt;/p&gt;

&lt;h2&gt;
  
  
  When RAG Still Wins
&lt;/h2&gt;

&lt;p&gt;We should be direct about this before anyone rips out a working vector pipeline.&lt;/p&gt;

&lt;p&gt;If your corpus exceeds 1M tokens — a multi-year ticket archive, a large documentation set, a multi-repo monorepo — the long-context window is not a solution. You physically cannot fit the data. RAG is the architecture. Our existing guide on &lt;a href="https://origin-main.com/ai-foundations/laravel-vector-database-embeddings-pgvector-rag/" rel="noopener noreferrer"&gt;Laravel Embeddings, Vector Databases, and RAG&lt;/a&gt; covers pgvector and the full embedding pipeline if that is your situation.&lt;/p&gt;

&lt;p&gt;Similarly, if your corpus changes frequently — user-generated content, live database records — cache invalidation becomes expensive. Re-creating a 1M-token cache on every data change defeats the cost benefit.&lt;/p&gt;

&lt;p&gt;Long-context caching targets a specific sweet spot: a corpus that is large enough to matter (above RAG’s retrieval quality threshold), small enough to fit in 1M tokens, and stable enough that you will run many queries before the next update. B2B codebase auditing, legal document analysis, product catalog deep-dives — these are the use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation: The Stateful Expert Pattern
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Add your Gemini API key to &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your_key_here&lt;/span&gt;
&lt;span class="py"&gt;GEMINI_MODEL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;
&lt;span class="py"&gt;GEMINI_CACHE_TTL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;7200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then register the configuration values in &lt;code&gt;config/services.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'gemini'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'api_key'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'base_url'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'https://generativelanguage.googleapis.com/v1beta'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'model'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_MODEL'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'gemini-2.5-flash'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="s1"&gt;'cache_ttl'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_CACHE_TTL'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The minimum content size for Gemini context caching is 32,768 tokens — roughly 25,000 words. You cannot cache a handful of files. You need to be pushing a full module, a full codebase, or a substantial document set. Keep that constraint in mind when designing your corpus preparation step.&lt;/p&gt;

&lt;p&gt;If you have not yet set up your Gemini client or worked through model selection on your Laravel stack, our &lt;a href="https://origin-main.com/guides/laravel-gemini-integration/" rel="noopener noreferrer"&gt;Integrating Gemini into the Laravel AI SDK&lt;/a&gt; guide covers the foundational setup decisions before you start hitting the API directly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: The Context Caching Service
&lt;/h3&gt;

&lt;p&gt;This service owns two responsibilities: cache lifecycle (create, retrieve, delete) and query dispatch. We keep them in one class because the cache name is required for query — splitting them would force unnecessary dependency injection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Services&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Cache&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Http&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GeminiContextCacheService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$apiKey&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$baseUrl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="nv"&gt;$ttl&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'services.gemini.api_key'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;baseUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'services.gemini.base_url'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'services.gemini.model'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'services.gemini.cache_ttl'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cd"&gt;/**
     * Return an existing live cache name from Redis, or create a new one.
     * The cache key is your application's identifier (e.g. "audit:project-42").
     * The Gemini cache name is what the API issues back — it looks like
     * "cachedContents/abc123xyz" and is what you reference in generation requests.
     */&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getOrCreateCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;array&lt;/span&gt;  &lt;span class="nv"&gt;$documents&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$geminiCacheName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"gemini_cache:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$geminiCacheName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini cache hit'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$geminiCacheName&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$documents&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;createCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;array&lt;/span&gt;  &lt;span class="nv"&gt;$documents&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Each document becomes a separate part in the cached content&lt;/span&gt;
        &lt;span class="nv"&gt;$parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;array_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="k"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$doc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'text'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$doc&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="nv"&gt;$documents&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;withHeaders&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'x-goog-api-key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'Content-Type'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'application/json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;baseUrl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/cachedContents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'model'&lt;/span&gt;            &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"models/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'displayName'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'systemInstruction'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'parts'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'text'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$systemPrompt&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'contents'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'parts'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$parts&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'ttl'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini cache creation failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="s1"&gt;'body'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="s1"&gt;'key'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s1"&gt;'Failed to create Gemini context cache: '&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nv"&gt;$cacheName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// Store with a 60-second buffer before the Gemini TTL expires&lt;/span&gt;
        &lt;span class="c1"&gt;// so we never reference a cache the API has already deleted&lt;/span&gt;
        &lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s2"&gt;"gemini_cache:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$cacheName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini context cache created'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'key'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$cacheName&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cd"&gt;/**
     * Run a user question against an existing cached context.
     */&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$cacheName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$question&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;withHeaders&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'x-goog-api-key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'Content-Type'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'application/json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;baseUrl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/models/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:generateContent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'cachedContent'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'contents'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'parts'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'text'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$question&lt;/span&gt;&lt;span class="p"&gt;]]],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="s1"&gt;'Gemini rate limit exceeded. Implement exponential backoff before retrying.'&lt;/span&gt;
                &lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini query failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="s1"&gt;'cache'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini query failed: '&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nv"&gt;$text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'candidates.0.content.parts.0.text'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$text&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s1"&gt;'Unexpected Gemini response structure: '&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="cd"&gt;/**
     * Explicitly delete a cache when the session ends.
     * Do not leave caches running — storage costs accumulate per hour.
     */&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;deleteCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$geminiCacheName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"gemini_cache:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$geminiCacheName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;withHeaders&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'x-goog-api-key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;baseUrl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$geminiCacheName&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;Cache&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;forget&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"gemini_cache:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini context cache deleted'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cacheKey&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Architect’s Note]&lt;/strong&gt; The 60-second TTL buffer on the Redis key is not optional. If your Laravel cache expiry aligns exactly with Gemini’s TTL, a query that arrives in the final seconds will find a valid Redis key but hit a deleted Gemini cache. The API returns a 404, your &lt;code&gt;query()&lt;/code&gt; method throws, and your user gets a 503. Build the buffer in. If you are concerned about token governance and want to log every token consumed by cached vs. standard queries separately, our &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-middleware-token-tracking/" rel="noopener noreferrer"&gt;Laravel AI Middleware: Token Tracking &amp;amp; Rate Limiting&lt;/a&gt; guide covers the instrumentation layer you would wrap around this service.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 2: Binding to the Service Container
&lt;/h3&gt;

&lt;p&gt;Register the service as a singleton in &lt;code&gt;AppServiceProvider&lt;/code&gt;. Because the service reads from config in its constructor, a singleton is correct here — we want one instance per request cycle, not one per injection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Providers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Services\GeminiContextCacheService&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\ServiceProvider&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AppServiceProvider&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;ServiceProvider&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;singleton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;GeminiContextCacheService&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Laravel’s Service Container handles the rest. Any controller, job, or command that type-hints &lt;code&gt;GeminiContextCacheService&lt;/code&gt; receives the same instance within a given request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The Audit Controller
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Http\Controllers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Services\GeminiContextCacheService&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\JsonResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\Request&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CodeAuditController&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Controller&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;GeminiContextCacheService&lt;/span&gt; &lt;span class="nv"&gt;$gemini&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$validated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'project_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'max:100'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'question'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'max:2000'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;loadProjectDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'project_id'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="nv"&gt;$systemPrompt&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'You are a senior Laravel architect reviewing a production codebase. '&lt;/span&gt;
                &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'When answering questions, cite specific files and line numbers. '&lt;/span&gt;
                &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'Flag security risks, legacy patterns, and architectural concerns explicitly.'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

            &lt;span class="nv"&gt;$cacheName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getOrCreateCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;cacheKey&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="s2"&gt;"audit:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'project_id'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;systemPrompt&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="nv"&gt;$documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="nv"&gt;$answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$cacheName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'question'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'answer'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$answer&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RuntimeException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Code audit failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'project_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'project_id'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="s1"&gt;'error'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Audit service temporarily unavailable.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="mi"&gt;503&lt;/span&gt;
            &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;loadProjectDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$projectId&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;storage_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"app/projects/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$projectId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/codebase.txt"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;file_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s2"&gt;"Codebase not found for project: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$projectId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Return as array — each element becomes a separate Part in the cached content.&lt;/span&gt;
        &lt;span class="c1"&gt;// Split large codebases into logical chunks (per-module) if approaching 1M tokens.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;file_get_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;)];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register the route with Sanctum auth and a conservative throttle. Twenty questions per minute is generous for a codebase audit endpoint — adjust to match your billing ceiling.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// routes/api.php&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Http\Controllers\CodeAuditController&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'throttle:20,1'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/audit'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;CodeAuditController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'audit'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Securing this endpoint properly is not optional — an unauthenticated audit endpoint is a direct vector for API cost abuse. Our &lt;a href="https://origin-main.com/guides/laravel-sanctum-api-authentication/" rel="noopener noreferrer"&gt;Laravel Sanctum API Authentication&lt;/a&gt; guide covers token scoping and rate-limit configuration if you need to tighten this down further.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cache Lifecycle Management
&lt;/h2&gt;

&lt;p&gt;This is the section most tutorials omit. Skip it and you will find $800 in unexpected Gemini storage charges at the end of the month.&lt;/p&gt;

&lt;p&gt;Gemini charges per MTok per hour for every active cache. The pricing varies by model — check &lt;a href="https://ai.google.dev/pricing" rel="noopener noreferrer"&gt;Google’s official API pricing page&lt;/a&gt; for current rates. The pattern you want is: create the cache when the audit session starts, delete it when the session ends, and run a scheduled cleanup job to catch anything that leaked through.&lt;/p&gt;

&lt;p&gt;The cleanup command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Console\Commands&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Carbon\Carbon&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Console\Command&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Http&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CleanExpiredGeminiCaches&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="nv"&gt;$signature&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'gemini:clean-caches'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="nv"&gt;$description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Delete expired or stale Gemini context caches'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$apiKey&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'services.gemini.api_key'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$baseUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'services.gemini.base_url'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;withHeaders&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'x-goog-api-key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$apiKey&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$baseUrl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/cachedContents"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Failed to list caches: '&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;FAILURE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nv"&gt;$caches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cachedContents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]);&lt;/span&gt;

        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$caches&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$cache&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$expireTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Carbon&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'expireTime'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$expireTime&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;isPast&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Http&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;withHeaders&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'x-goog-api-key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$apiKey&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$baseUrl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

                &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Deleted stale Gemini cache'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]]);&lt;/span&gt;
                &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Deleted: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;SUCCESS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Schedule it in &lt;code&gt;bootstrap/app.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withSchedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Schedule&lt;/span&gt; &lt;span class="nv"&gt;$schedule&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$schedule&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gemini:clean-caches'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;hourly&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Edge Case Alert]&lt;/strong&gt; If a queue worker crashes mid-audit before &lt;code&gt;deleteCache()&lt;/code&gt; is called, the Gemini cache lives until its TTL expires and storage charges accumulate. The scheduled cleanup command is your safety net, not a replacement for explicit deletion. Consider also dispatching a &lt;code&gt;DeleteGeminiCacheJob&lt;/code&gt; via &lt;code&gt;defer()&lt;/code&gt; at the end of each audit session as a belt-and-suspenders measure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Use Case: The Persistent Auditor in Practice
&lt;/h2&gt;

&lt;p&gt;Here is where the architecture earns its keep. You are auditing a legacy Laravel 6 application for a client. The codebase is 400K tokens — controllers, models, migrations, the lot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The RAG approach:&lt;/strong&gt; You ask about &lt;code&gt;UserController::store()&lt;/code&gt;. The retrieval pulls three relevant chunks. The answer looks correct. But the client’s &lt;code&gt;BaseModel&lt;/code&gt; has an overridden &lt;code&gt;save()&lt;/code&gt; method that intercepts every Eloquent write and transforms certain fields. RAG did not retrieve &lt;code&gt;BaseModel&lt;/code&gt; — it was not semantically close enough to the query. The AI gives you an answer that describes the surface behaviour and misses the hack underneath. The audit is wrong.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cached-context approach:&lt;/strong&gt; You upload the full codebase. When you ask about &lt;code&gt;UserController::store()&lt;/code&gt;, Gemini has already read &lt;code&gt;BaseModel&lt;/code&gt;. It answers that the apparent save logic is intercepted downstream, identifies the override, and flags it as a maintenance risk. The context is complete. The answer is complete.&lt;/p&gt;

&lt;p&gt;This is what we mean by Architectural Integrity. RAG optimises for retrieval cost. Long-context caching optimises for answer correctness. For an audit product where a wrong answer has professional liability implications, correctness wins.&lt;/p&gt;

&lt;p&gt;The cost math, using our earlier figures: 20 audit questions on a 500K-token codebase costs $3.00 without caching and approximately $0.44 with caching — plus storage for however long the session runs. For a B2B audit priced at $200 per report, the infrastructure cost is negligible either way. But the quality difference is not.&lt;/p&gt;

&lt;p&gt;For production agentic pipelines — where the “questions” are not human prompts but tool calls from an orchestrated agent — the cost advantage compounds quickly. Our guide on &lt;a href="https://origin-main.com/laravel-architecture/laravel-agentic-workflow-schema-validation/" rel="noopener noreferrer"&gt;Hardening Laravel Agentic Workflows: Schema Validation Against LLM Hallucinations&lt;/a&gt; covers the validation layer you will want around those agent outputs, regardless of which AI architecture you choose.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing Your Corpus for the Cache
&lt;/h2&gt;

&lt;p&gt;The code examples above load a pre-processed &lt;code&gt;codebase.txt&lt;/code&gt; file. In production, you need a pipeline that generates this file from your actual source. A simple Artisan command handles it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Console\Commands&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Console\Command&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\File&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Symfony\Component\Finder\Finder&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PrepareCodebaseCorpus&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="nv"&gt;$signature&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'corpus:prepare {project_id} {path}'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="nv"&gt;$description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Concatenate a project codebase into a single corpus file'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$projectId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'project_id'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$path&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'path'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$finder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Finder&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;files&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;name&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'*.php'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'*.json'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'*.yaml'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'*.env.example'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;notPath&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'vendor'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'node_modules'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'storage'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'.git'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortByName&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="nv"&gt;$corpus&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$finder&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$relativePath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;str_replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$path&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getRealPath&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
            &lt;span class="nv"&gt;$corpus&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;// FILE: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$relativePath&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="nv"&gt;$corpus&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$file&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getContents&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="nv"&gt;$counter&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nv"&gt;$outputDir&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;storage_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"app/projects/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$projectId&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$outputPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$outputDir&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/codebase.txt'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;ensureDirectoryExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$outputDir&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$outputPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$corpus&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$sizeKb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;strlen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$corpus&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Corpus prepared: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$counter&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; files, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$sizeKb&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; KB → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$outputPath&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;SUCCESS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it before starting an audit session: &lt;code&gt;php artisan corpus:prepare project-42 /path/to/legacy-app&lt;/code&gt;. The file headers (&lt;code&gt;// FILE: app/Models/BaseModel.php&lt;/code&gt;) give Gemini the file path context it needs to cite locations in its answers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Efficiency Gain]&lt;/strong&gt; Strip comments and docblocks from PHP files before creating the corpus. A typical Laravel application loses 20–30% of its token count after comment stripping, which can push a borderline 1.2M-token codebase under the 1M limit and save meaningfully on cache creation costs. PHP’s &lt;code&gt;token_get_all()&lt;/code&gt; makes this straightforward; a simple token filter removes &lt;code&gt;T_COMMENT&lt;/code&gt; and &lt;code&gt;T_DOC_COMMENT&lt;/code&gt; tokens before concatenation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Conclusion: Context Is the New Database
&lt;/h2&gt;

&lt;p&gt;RAG will not disappear. For certain corpus shapes — enormous, dynamic, multi-tenant — it remains the correct architecture. But it has been treated as the default answer for too long, applied reflexively to problems that long-context models can now solve more cleanly and more accurately.&lt;/p&gt;

&lt;p&gt;Explicit context caching in Gemini changes the trade-off. The vector database, the chunking strategy, the embedding model, the retrieval tuning — for a well-bounded corpus, all of that overhead collapses into a single cached upload and a TTL you manage in Redis. The model sees the whole codebase. The answers reflect the whole codebase.&lt;/p&gt;

&lt;p&gt;The architecture we have built here — &lt;code&gt;GeminiContextCacheService&lt;/code&gt; bound as a singleton, TTL-buffered cache keys in Redis, explicit lifecycle management, a corpus preparation command — is production-ready. Wire up your Sanctum-protected audit endpoint, run the corpus preparation pipeline, and you have a stateful expert that knows your codebase better than any retrieval pipeline can.&lt;/p&gt;

&lt;p&gt;For official context caching documentation and current pricing, refer directly to &lt;a href="https://ai.google.dev/gemini-api/docs/caching" rel="noopener noreferrer"&gt;Google’s Gemini API Context Caching guide&lt;/a&gt; — pricing has shifted frequently in 2026 and the source of truth is Google’s pricing page, not any third-party aggregator.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>gemini</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Integrating Gemini into the Laravel AI SDK</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Thu, 16 Apr 2026 13:45:05 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/beyond-the-api-key-integrating-gemini-into-the-laravel-ai-sdk-325o</link>
      <guid>https://dev.to/dewaldhugo/beyond-the-api-key-integrating-gemini-into-the-laravel-ai-sdk-325o</guid>
      <description>&lt;p&gt;If your Laravel application already speaks to OpenAI or Anthropic through the first-party &lt;code&gt;laravel/ai&lt;/code&gt; SDK, adding Gemini feels like it should be a five-minute job. Set a new API key. Point to a new provider. Deploy. Done. That assumption is mostly correct, but the gap between “mostly” and “done” is where production systems fall apart. This guide closes that gap.&lt;/p&gt;

&lt;p&gt;We will cover the full &lt;strong&gt;Laravel Gemini integration&lt;/strong&gt; path: the OpenAI-compatible soft entry that lets you validate the connection in under ten minutes, the native Gemini provider configuration that actually belongs in a long-lived codebase, and a &lt;code&gt;GeminiManager&lt;/code&gt; service pattern that gives you model failover, rate-limit awareness, and a response contract your Blade templates never have to think about.&lt;/p&gt;

&lt;p&gt;By the end, Gemini will be a first-class citizen in your AI stack, not an afterthought bolted to a controller.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Add Gemini At All?
&lt;/h2&gt;

&lt;p&gt;Fair question. If you are already running &lt;code&gt;claude-sonnet-4-6&lt;/code&gt; or &lt;code&gt;gpt-4o&lt;/code&gt; in production, the case for a third provider needs to be more compelling than “Google exists.” Here is the honest pitch.&lt;/p&gt;

&lt;p&gt;Gemini 2.5 Flash is fast, cheap, and ships with a one-million-token context window as standard. For high-throughput tasks—document summarisation, classification pipelines, RAG over large corpora—it undercuts the competition on cost without giving up much capability. Gemini 2.5 Pro sits at the other end: serious reasoning, multimodal by default, and Vertex AI integration if you are already inside the Google Cloud ecosystem.&lt;/p&gt;

&lt;p&gt;The second reason is resilience. Single-provider AI architectures have a hard failure mode. When OpenAI rate-limits your 2 AM batch job, the entire feature goes dark. Gemini as a fallback provider changes that calculation. We are not talking about a hot standby you never test; we are talking about a properly designed failover that routes intelligently.&lt;/p&gt;

&lt;p&gt;The third reason is the &lt;code&gt;laravel/ai&lt;/code&gt; SDK itself. As covered in detail in &lt;a href="https://origin-main.com/laravel/laravel-13-ai-sdk-features-breakdown/" rel="noopener noreferrer"&gt;What Laravel 13 Actually Changes for AI Development&lt;/a&gt;, the SDK was designed from the start to be provider-agnostic. Gemini is a first-class supported provider. The architecture already expects you to run multiple models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Route 1: The OpenAI-Compatible Endpoint (The Soft Entry)
&lt;/h2&gt;

&lt;p&gt;Before committing to the native integration, you can validate that Gemini works for your use case using Google’s OpenAI-compatible endpoint. This is genuinely useful for a quick proof-of-concept, or when you are migrating from an existing &lt;code&gt;openai-php/client&lt;/code&gt; setup rather than the &lt;code&gt;laravel/ai&lt;/code&gt; SDK.&lt;/p&gt;

&lt;p&gt;Google maintains an endpoint at &lt;code&gt;https://generativelanguage.googleapis.com/v1beta/openai/&lt;/code&gt; that accepts Chat Completions API requests in the standard OpenAI format. The swap is mechanical: change the base URL, swap the API key, and change the model string.&lt;/p&gt;

&lt;p&gt;Start with your &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your-google-ai-studio-key&lt;/span&gt;
&lt;span class="py"&gt;GEMINI_BASE_URL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;https://generativelanguage.googleapis.com/v1beta/openai/&lt;/span&gt;
&lt;span class="py"&gt;GEMINI_MODEL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are on an existing OpenAI configuration in &lt;code&gt;config/ai.php&lt;/code&gt;, you can add a second connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// config/ai.php&lt;/span&gt;
&lt;span class="s1"&gt;'connections'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'openai'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'openai'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'api_key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;

    &lt;span class="s1"&gt;'gemini_compat'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'openai'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// still using the OpenAI driver&lt;/span&gt;
        &lt;span class="s1"&gt;'api_key'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'base_url'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_BASE_URL'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now call Gemini through any code that already uses the &lt;code&gt;openai&lt;/code&gt; driver, just by swapping the connection name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Illuminate\Support\Facades\AI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;AI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gemini_compat'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'gemini-2.5-flash'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This works well enough for a smoke test. Resist the urge to ship it as your production integration.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;[Production Pitfall]&lt;/strong&gt; The OpenAI-compatible endpoint has real coverage gaps. Gemini-native features—grounding via Google Search, the thinking budget parameter, native multimodal tool use, and fine-grained safety settings—are not available through the compat layer. If your feature depends on any of those, the compat approach will silently drop them and return a degraded response with no error. You will not find out until QA, or worse, production.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Route 2: The Native Gemini Provider (The Right Way)
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;laravel/ai&lt;/code&gt; SDK ships with a first-party Gemini driver. Use it. The compat approach is a migration crutch; the native driver is what belongs in your &lt;code&gt;config/ai.php&lt;/code&gt; long-term.&lt;/p&gt;

&lt;p&gt;Install the SDK if you have not already:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require laravel/ai
php artisan vendor:publish &lt;span class="nt"&gt;--provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Laravel&lt;/span&gt;&lt;span class="se"&gt;\A&lt;/span&gt;&lt;span class="s2"&gt;i&lt;/span&gt;&lt;span class="se"&gt;\A&lt;/span&gt;&lt;span class="s2"&gt;iServiceProvider"&lt;/span&gt;
php artisan migrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then register the native Gemini connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// config/ai.php&lt;/span&gt;
&lt;span class="s1"&gt;'connections'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'openai'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'openai'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'api_key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'OPENAI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;

    &lt;span class="s1"&gt;'anthropic'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'anthropic'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'api_key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ANTHROPIC_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;

    &lt;span class="s1"&gt;'gemini'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'api_key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;

&lt;span class="s1"&gt;'default'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'AI_DEFAULT_CONNECTION'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'openai'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your &lt;code&gt;.env&lt;/code&gt; stays clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;your-google-ai-studio-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A basic request through the native driver looks identical to your existing OpenAI calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Illuminate\Support\Facades\AI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;AI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'gemini-2.5-flash'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the surface. Now let’s build something that actually runs in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Selection: Which Gemini Model?
&lt;/h2&gt;

&lt;p&gt;Not all Gemini models are created equal, and Google’s naming conventions have become genuinely confusing over the last year. Here is the practical breakdown for production use.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gemini-2.5-flash&lt;/code&gt; is your default. It hit general availability in June 2025, it is fast, it is cost-efficient, and it handles up to one million tokens of context. For most text generation, summarisation, and classification tasks, it is the correct choice.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;gemini-2.5-pro&lt;/code&gt; is for heavy lifting. Complex reasoning chains, code generation over large codebases, long-form synthesis tasks. It is meaningfully more expensive than Flash and slower under load, so do not reach for it by default.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;[Edge Case Alert]&lt;/strong&gt; Do not use &lt;code&gt;gemini-2.0-flash&lt;/code&gt; in new integrations. Google has announced that Gemini 2.0 Flash and Flash-Lite models will be shut down on June 1, 2026. Any production system pointing at those model strings will start throwing &lt;code&gt;404&lt;/code&gt; errors at that date with no warning beyond the deprecation announcement. If you have any existing references to &lt;code&gt;gemini-2.0-flash&lt;/code&gt; in your codebase, change them now.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For preview models in the Gemini 3.x family, treat them exactly as you would treat &lt;code&gt;preview&lt;/code&gt; Claude models: useful for internal testing, not for customer-facing production paths until they have a stable designation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The GeminiManager Pattern
&lt;/h2&gt;

&lt;p&gt;A single AI connection configured in &lt;code&gt;config/ai.php&lt;/code&gt; is fine for prototyping. Production systems need something more deliberate: model selection logic, rate-limit handling, failover between providers, and a response contract that the rest of your application can depend on regardless of which model actually ran.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;GeminiManager&lt;/code&gt; service class is that layer. It sits between the &lt;code&gt;laravel/ai&lt;/code&gt; facade and your application code, owns the failover logic, and returns a typed response object your Blade templates or API controllers can render without knowing which provider answered the request.&lt;/p&gt;

&lt;p&gt;Start by creating the service in &lt;code&gt;app/Services/AI/&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Services\AI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="no"&gt;Illuminate\Support\Facades\AI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Log&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\DTOs\AiResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GeminiManager&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$modelPriority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'gemini-2.5-flash'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'gemini-2.5-pro'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$fallbackConnection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'openai'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$fallbackModel&lt;/span&gt;      &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o-mini'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="kt"&gt;AiResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;modelPriority&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;attemptGemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AiResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;   &lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;inputTokens&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;inputTokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;outputTokens&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;outputTokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\Exception&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini model failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'model'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'error'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="s1"&gt;'prompt_length'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;strlen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;]);&lt;/span&gt;

                &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// All Gemini models exhausted. Fall back to configured provider.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;attemptGemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;mixed&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;AI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
            &lt;span class="mf"&gt;...&lt;/span&gt;&lt;span class="nv"&gt;$options&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;AiResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'All Gemini models exhausted. Falling back.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'fallback_connection'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fallbackConnection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'fallback_model'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fallbackModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nv"&gt;$response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;AI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fallbackConnection&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fallbackModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AiResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;     &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;       &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fallbackModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;fallbackConnection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;inputTokens&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;inputTokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;outputTokens&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;outputTokens&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;AiResponse&lt;/code&gt; DTO keeps your response contract stable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\DTOs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AiResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="nv"&gt;$inputTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="nv"&gt;$outputTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bind it in your &lt;code&gt;bootstrap/app.php&lt;/code&gt;, using Laravel 11/12 syntax:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Services\AI\GeminiManager&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withProviders&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;app&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;singleton&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;GeminiManager&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;&lt;strong&gt;[Architect’s Note]&lt;/strong&gt; This pattern mirrors exactly what we described in &lt;a href="https://origin-main.com/laravel-architecture/production-grade-ai-architecture-in-laravel/" rel="noopener noreferrer"&gt;Production-Grade AI Architecture in Laravel: Contracts, Governance &amp;amp; Telemetry&lt;/a&gt;—the response DTO is your contract, and nothing outside the service layer should care which provider produced the response. The moment your Blade template starts conditionally rendering based on &lt;code&gt;$response-&amp;gt;provider&lt;/code&gt;, you have an architecture problem, not a template problem.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate Limit Handling
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;GeminiManager&lt;/code&gt; above catches all exceptions and continues to the next model. That is fine for a first pass but too broad for production. You want to distinguish between rate-limit errors, which should trigger immediate failover, and authentication or configuration errors, which should not silently retry.&lt;/p&gt;

&lt;p&gt;Gemini rate-limit responses return HTTP &lt;code&gt;429&lt;/code&gt;. A tighter catch block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\Client\RequestException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;attemptGemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$options&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;mixed&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;AI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$prompt&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
            &lt;span class="mf"&gt;...&lt;/span&gt;&lt;span class="nv"&gt;$options&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RequestException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Rate limited. Try next model.&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nv"&gt;$status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Auth failure. Do not retry silently.&lt;/span&gt;
            &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini authentication failure'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'status'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$status&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;\RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Gemini authentication failed. Check GEMINI_API_KEY.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Anything else: surface it.&lt;/span&gt;
        &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For consistent rate-limit and token tracking across all your AI providers, the middleware pattern we detailed in &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-middleware-token-tracking/" rel="noopener noreferrer"&gt;Laravel AI Middleware: Token Tracking &amp;amp; Rate Limiting&lt;/a&gt; plugs directly into this architecture. Wire it as HTTP middleware on your AI routes and your &lt;code&gt;AiResponse&lt;/code&gt; DTO’s token counts feed it automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping Your Blade Templates Clean
&lt;/h2&gt;

&lt;p&gt;This is the part most tutorials ignore. It is also where the design falls apart in real applications.&lt;/p&gt;

&lt;p&gt;Your Blade templates should receive an &lt;code&gt;AiResponse&lt;/code&gt; object and render &lt;code&gt;$response-&amp;gt;content&lt;/code&gt;. That is it. They should not know the provider is Gemini, they should not branch on &lt;code&gt;$response-&amp;gt;model&lt;/code&gt;, and they should never contain try/catch logic. If your failover logic is working correctly, the view never knows a failover happened.&lt;/p&gt;

&lt;p&gt;In a controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Services\AI\GeminiManager&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ArticleSummaryController&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Controller&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;GeminiManager&lt;/span&gt; &lt;span class="nv"&gt;$ai&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Article&lt;/span&gt; &lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;View&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"Summarise this article in three sentences: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;view&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'articles.show'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'article'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'summary'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Blade template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;div&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"summary"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    {{ $summary-&amp;gt;content }}
&lt;span class="nt"&gt;&amp;lt;/div&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done. No provider logic. No model names. No error handling. The controller injects the service, the service handles everything messy, and the view renders a string.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;[Word to the Wise]&lt;/strong&gt; The discipline to keep AI provider logic out of your views pays off when you swap models six months from now. And you will swap models. Gemini 2.5 Flash will eventually give way to whatever Google releases next. If your template has &lt;code&gt;@if ($summary-&amp;gt;model === 'gemini-2.5-flash')&lt;/code&gt; buried inside it, that migration becomes a grep exercise across your entire view layer. Design the abstraction correctly now.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Registering Gemini as a Queued Job Provider
&lt;/h2&gt;

&lt;p&gt;Synchronous AI calls in web requests are a bad idea at scale. Wrap your &lt;code&gt;GeminiManager&lt;/code&gt; calls in a queued job for anything that is not a real-time user interaction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;?php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Jobs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Services\AI\GeminiManager&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\Article&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Bus\Queueable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Contracts\Queue\ShouldQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Foundation\Bus\Dispatchable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GenerateArticleSummary&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldQueue&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Dispatchable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Queueable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$tries&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$backoff&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// seconds&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$articleId&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;GeminiManager&lt;/span&gt; &lt;span class="nv"&gt;$ai&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$article&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Article&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;findOrFail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;articleId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$ai&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"Summarise this article in three sentences: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$article&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'summary'&lt;/span&gt;          &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$summary&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'summary_model'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$summary&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'summary_provider'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$summary&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;failed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\Throwable&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Log&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Article summary generation failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'article_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;articleId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'error'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;$tries = 3&lt;/code&gt; and &lt;code&gt;$backoff = 30&lt;/code&gt; give you three attempts with a 30-second gap between them, which is usually enough to ride out a transient Gemini rate limit without wasting queue resources. The failed hook gives you an audit trail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Official Documentation
&lt;/h2&gt;

&lt;p&gt;Two references worth bookmarking as you build this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://ai.google.dev/gemini-api/docs/openai" rel="noopener noreferrer"&gt;Google Gemini OpenAI Compatibility Documentation&lt;/a&gt; — the canonical reference for what is and is not supported through the compat layer, including current model strings and endpoint coverage.&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://laravel.com/blog/introducing-the-laravel-ai-sdk" rel="noopener noreferrer"&gt;Laravel AI SDK Documentation&lt;/a&gt; — the official introduction covering provider setup, agent workflows, and the provider-agnostic design philosophy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Production Deployment Checklist
&lt;/h2&gt;

&lt;p&gt;A few things that catch teams off-guard when they move this from staging to production.&lt;/p&gt;

&lt;p&gt;Google AI Studio API keys are rate-limited by default. For production workloads, you need to request increased quota through Google Cloud, or switch to Vertex AI. The Vertex AI endpoint requires a different authentication mechanism (service account credentials rather than a plain API key) and a slightly different base URL, so plan for that migration if your volume grows.&lt;/p&gt;

&lt;p&gt;Environment-specific model configuration matters. You might run &lt;code&gt;gemini-2.5-flash&lt;/code&gt; in production for cost reasons and &lt;code&gt;gemini-2.5-pro&lt;/code&gt; in staging for testing purposes. Keep that in your &lt;code&gt;.env&lt;/code&gt; and out of your &lt;code&gt;GeminiManager&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env.production
&lt;/span&gt;&lt;span class="py"&gt;GEMINI_MODEL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;

&lt;span class="c"&gt;# .env.staging
&lt;/span&gt;&lt;span class="py"&gt;GEMINI_MODEL&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-pro&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read it in the service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$modelPriority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;modelPriority&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'services.gemini.model'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'gemini-2.5-flash'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="s1"&gt;'gemini-2.5-pro'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// fallback within Gemini&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And add to &lt;code&gt;config/services.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'gemini'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'model'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_MODEL'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'gemini-2.5-flash'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are deploying to Laravel Forge or a similar managed environment, confirm your queue workers restart after deployment. A worker that caches the old &lt;code&gt;GeminiManager&lt;/code&gt; configuration will ignore model changes until it restarts. This is not Gemini-specific, but it catches teams every time they change provider configuration without restarting workers.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;[Efficiency Gain]&lt;/strong&gt; Gemini 2.5 Flash’s one-million-token context window is genuinely useful for RAG pipelines. If you are building document search or retrieval-augmented generation, the ability to stuff a much larger context window means fewer chunking edge cases and simpler retrieval logic. If you are already running embeddings and vector search in Laravel, the integration pattern described in &lt;a href="https://origin-main.com/ai-foundations/laravel-vector-database-embeddings-pgvector-rag/" rel="noopener noreferrer"&gt;Laravel Embeddings, Vector Databases, and RAG: A Production Implementation Guide&lt;/a&gt; maps directly to Gemini’s native multimodal context handling.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to Go From Here
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;GeminiManager&lt;/code&gt; pattern above is a solid foundation. The logical next step is versioning your prompts so that model upgrades do not introduce silent regressions—something that becomes critical the moment you start comparing output quality between &lt;code&gt;gemini-2.5-flash&lt;/code&gt; and whatever Gemini 3.x stable lands in your production window. The &lt;a href="https://origin-main.com/laravel-architecture/laravel-prompt-migrations-version-control-ai-prompts/" rel="noopener noreferrer"&gt;Prompt Migrations: Bringing Determinism to AI in Laravel&lt;/a&gt; tutorial covers exactly that workflow.&lt;/p&gt;

&lt;p&gt;The Google AI ecosystem moves fast. &lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt; is already in the preview tier as of this writing, and Google is actively iterating. The architecture we have built here isolates model selection to a single config value and a single service class. When the next stable Gemini model lands, your migration is a one-line change.&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>gemini</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Laravel Filament Admin Dashboard for AI Applications: Token Costs, Prompt Management, and Agent Audit Trails</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Thu, 16 Apr 2026 03:34:17 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/laravel-filament-admin-dashboard-for-ai-applications-token-costs-prompt-management-and-agent-10g9</link>
      <guid>https://dev.to/dewaldhugo/laravel-filament-admin-dashboard-for-ai-applications-token-costs-prompt-management-and-agent-10g9</guid>
      <description>&lt;p&gt;You have the AI pipeline. You have the service classes, the middleware, the queued jobs. The backend is instrumented. What you do not have is any way to &lt;em&gt;look&lt;/em&gt; at it without writing a SQL query at 2am because costs just spiked.&lt;/p&gt;

&lt;p&gt;That is the gap this article closes.&lt;/p&gt;

&lt;p&gt;A Laravel Filament admin dashboard is not the glamorous part of AI development. But the teams that skip it are the ones who discover they have been over-spending on tokens for six weeks, or that an agent ran a destructive tool call nobody authorised, or that a prompt change from last Tuesday quietly broke output quality across the board. You cannot govern what you cannot see.&lt;/p&gt;

&lt;p&gt;The Laravel Filament admin dashboard built in this guide surfaces four things: real-time token cost monitoring, usage analytics broken down by user and model, a prompt version management panel, and a full agent audit trail. Each section is a Filament v3 Resource or Widget, built on top of the telemetry data your existing middleware and service layer already generate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Filament, and Why Now
&lt;/h2&gt;

&lt;p&gt;There is no serious competition in the Laravel ecosystem for building admin interfaces right now. Filament v3 ships with a full panel system, Livewire-powered tables, stats widgets, chart widgets, custom pages, and a role-based authorization layer. It works inside your existing Laravel application. It does not require a separate process or a Node frontend.&lt;/p&gt;

&lt;p&gt;The Livewire-driven rendering keeps things reactive without JavaScript complexity. The Eloquent-backed Resources mean your existing models map directly to admin panels with almost no boilerplate. And because it is all Laravel under the hood, your Service Container, policies, events, and observers all still apply — you are not learning a new paradigm, just adding a UI layer on top of architecture you have already built.&lt;/p&gt;

&lt;p&gt;Filament v3 requires PHP 8.1+ and Laravel 10+. On Laravel 11 or 12 you will be fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and Panel Setup
&lt;/h2&gt;

&lt;p&gt;Install Filament with the panel scaffolding command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require filament/filament:&lt;span class="s2"&gt;"^3.0"&lt;/span&gt; &lt;span class="nt"&gt;-W&lt;/span&gt;
php artisan filament:install &lt;span class="nt"&gt;--panels&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This generates &lt;code&gt;app/Providers/Filament/AdminPanelProvider.php&lt;/code&gt; and registers it via &lt;code&gt;bootstrap/app.php&lt;/code&gt;. The panel provider is where you wire discovery, colours, middleware, and authentication. Here is a clean starting configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Providers/Filament/AdminPanelProvider.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Providers\Filament&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Http\Middleware\Authenticate&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Http\Middleware\DisableBladeIconComponents&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Http\Middleware\DispatchServingFilamentEvent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Panel&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\PanelProvider&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Support\Colors\Color&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Cookie\Middleware\AddQueuedCookiesToResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Cookie\Middleware\EncryptCookies&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Foundation\Http\Middleware\VerifyCsrfToken&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Routing\Middleware\SubstituteBindings&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Session\Middleware\AuthenticateSession&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Session\Middleware\StartSession&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\View\Middleware\ShareErrorsFromSession&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AdminPanelProvider&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;PanelProvider&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;panel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Panel&lt;/span&gt; &lt;span class="nv"&gt;$panel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Panel&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$panel&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'admin'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'admin'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;login&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;colors&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'primary'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Color&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nc"&gt;Violet&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;discoverResources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;app_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Filament/Resources'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'App\\Filament\\Resources'&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;discoverPages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;app_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Filament/Pages'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'App\\Filament\\Pages'&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;discoverWidgets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;app_path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Filament/Widgets'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'App\\Filament\\Widgets'&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;EncryptCookies&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;AddQueuedCookiesToResponse&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;StartSession&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;AuthenticateSession&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;ShareErrorsFromSession&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;VerifyCsrfToken&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;SubstituteBindings&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;DisableBladeIconComponents&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;DispatchServingFilamentEvent&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;authMiddleware&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;Authenticate&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;authorization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;is_admin&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last &lt;code&gt;-&amp;gt;authorization()&lt;/code&gt; call is the line most tutorials omit. Without it, any authenticated user can reach your admin panel. We will come back to access control later — but set that guard from day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modeling the Data Your Dashboard Will Surface
&lt;/h2&gt;

&lt;p&gt;Before you build a single Filament Resource, you need to decide what data you are collecting. On a production AI application, there are two primary log types worth persisting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Usage Logs&lt;/strong&gt; capture every inference call: which provider, which model, how many tokens, what it cost, which feature triggered it, and whether it succeeded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent Audit Logs&lt;/strong&gt; capture agentic actions: which agent class ran, what action it performed, which tool it called, what the input and output were, and how long it took.&lt;/p&gt;

&lt;p&gt;Both should be Eloquent models backed by indexed tables. Start with the usage log migration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// database/migrations/2026_04_15_000001_create_ai_usage_logs_table.php&lt;/span&gt;

&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai_usage_logs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;foreignId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;constrained&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullOnDelete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'provider'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;           &lt;span class="c1"&gt;// 'anthropic', 'openai'&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'model'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feature'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// 'chat', 'summarize', 'agent'&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cost_usd'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prompt_version'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'metadata'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'error_message'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'provider'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'model'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'feature'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compound indexes on &lt;code&gt;created_at&lt;/code&gt; are non-negotiable. Dashboard widgets run aggregation queries constantly, and without them you will feel the difference under real traffic.&lt;/p&gt;

&lt;p&gt;The agent audit log has a slightly different shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// database/migrations/2026_04_15_000002_create_agent_audit_logs_table.php&lt;/span&gt;

&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'agent_audit_logs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;foreignId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;constrained&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullOnDelete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'agent_class'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'action'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tool_used'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'input_payload'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'output_payload'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tokens_used'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'duration_ms'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'agent_class'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both models should cast their JSON columns and use standard Eloquent relationships back to &lt;code&gt;User&lt;/code&gt;. Nothing exotic required.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Cost Monitoring: The Stats Overview Widget
&lt;/h2&gt;

&lt;p&gt;The first thing an operator needs to see when they open the dashboard is a top-level summary of what the AI layer has cost in the last 24 hours. Filament’s &lt;code&gt;StatsOverviewWidget&lt;/code&gt; handles this cleanly.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php artisan make:filament-widget AiStatsOverview &lt;span class="nt"&gt;--stats-overview&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Filament/Widgets/AiStatsOverview.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Filament\Widgets&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\AiUsageLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Widgets\StatsOverviewWidget&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;BaseWidget&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Widgets\StatsOverviewWidget\Stat&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AiStatsOverview&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;BaseWidget&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?int&lt;/span&gt; &lt;span class="nv"&gt;$sort&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getStats&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$today&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiUsageLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;whereDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
        &lt;span class="nv"&gt;$month&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiUsageLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;whereMonth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;whereYear&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;Stat&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Today's Tokens"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;number_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$today&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Across all providers and models'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'primary'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="nc"&gt;Stat&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Today's Cost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$'&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nb"&gt;number_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$today&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cost_usd'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'USD — live tally'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'warning'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="nc"&gt;Stat&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Monthly Spend'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'$'&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nb"&gt;number_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$month&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cost_usd'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'F Y'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'danger'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

            &lt;span class="nc"&gt;Stat&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Failures Today'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;number_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$today&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;description&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Inference errors across all features'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'gray'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register this widget on your dashboard in the panel provider’s &lt;code&gt;-&amp;gt;widgets()&lt;/code&gt; array. It polls on a configurable interval using Livewire’s polling mechanism — add &lt;code&gt;protected static ?string $pollingInterval = '30s';&lt;/code&gt; if you want it to refresh automatically.&lt;/p&gt;

&lt;p&gt;If you have not yet built the middleware layer that feeds these logs, the &lt;a href="https://origin-main.com/laravel-architecture/laravel-ai-middleware-token-tracking/" rel="noopener noreferrer"&gt;Laravel AI Middleware guide on token tracking and rate limiting&lt;/a&gt; covers the exact pattern for intercepting inference calls and persisting usage data before the response reaches the controller.&lt;/p&gt;

&lt;h2&gt;
  
  
  Daily Token Usage: The Chart Widget
&lt;/h2&gt;

&lt;p&gt;Raw totals are useful. Trend lines are what actually surface problems — a cost spike on Tuesday, a failure rate that spiked after a deployment. Add a chart widget to give operators a 30-day view:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php artisan make:filament-widget TokenUsageChart &lt;span class="nt"&gt;--chart&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Filament/Widgets/TokenUsageChart.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Filament\Widgets&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\AiUsageLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Widgets\ChartWidget&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenUsageChart&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;ChartWidget&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$heading&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Daily Token Usage — Last 30 Days'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?int&lt;/span&gt; &lt;span class="nv"&gt;$sort&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$filter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'tokens'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getFilters&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;?array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Total Tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'cost'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Cost (USD)'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getData&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$column&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'cost'&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'cost_usd'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nv"&gt;$label&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'cost'&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'Cost (USD)'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Total Tokens'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiUsageLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;selectRaw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"DATE(created_at) as date, SUM(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$column&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;) as value"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'&amp;gt;='&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;subDays&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;groupBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;orderBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'datasets'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="s1"&gt;'label'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$label&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'data'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pluck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'value'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toArray&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                    &lt;span class="s1"&gt;'borderColor'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'#8b5cf6'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'tension'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s1"&gt;'fill'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'labels'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;pluck&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'date'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toArray&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getType&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s1"&gt;'line'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;$filter&lt;/code&gt; toggle lets operators switch between token volume and cost in a single widget, which keeps the dashboard clean without multiplying panels.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Usage Log Resource: Per-User and Per-Model Breakdown
&lt;/h2&gt;

&lt;p&gt;Stats and charts give you the summary. When you need to investigate — which user ran 400,000 tokens in one session, which model is responsible for the cost spike — you need a sortable, filterable table.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php artisan make:filament-resource AiUsageLog &lt;span class="nt"&gt;--generate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then replace the generated table definition with something that has real operator value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Filament/Resources/AiUsageLogResource.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Filament\Resources&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Filament\Resources\AiUsageLogResource\Pages&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\AiUsageLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Resources\Resource&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Columns\IconColumn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Columns\TextColumn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Filters\SelectFilter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Filters\Filter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Database\Eloquent\Builder&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AiUsageLogResource&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;           &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiUsageLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationIcon&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'heroicon-o-chart-bar'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationGroup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'AI Governance'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationLabel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Usage Logs'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Table&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Table&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user.name'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'User'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;searchable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'—'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'provider'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'model'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;searchable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;copyable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feature'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'info'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'—'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Tables\Columns\Summarizers\Sum&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Total'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;

                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cost_usd'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;money&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'usd'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Tables\Columns\Summarizers\Sum&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Total'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;

                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prompt_version'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'warning'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'—'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;

                &lt;span class="nc"&gt;IconColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;

                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;dateTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;since&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toggleable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;SelectFilter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'provider'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                        &lt;span class="s1"&gt;'anthropic'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Anthropic'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'openai'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'OpenAI'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;]),&lt;/span&gt;

                &lt;span class="nc"&gt;SelectFilter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                        &lt;span class="s1"&gt;'1'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Successful'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="s1"&gt;'0'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Failed'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;]),&lt;/span&gt;

                &lt;span class="nc"&gt;Filter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'today'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Today only'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Builder&lt;/span&gt; &lt;span class="nv"&gt;$query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$query&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;whereDate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;())),&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;defaultSort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'desc'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;paginated&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getPages&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'index'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Pages\ListAiUsageLogs&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;canCreate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to note here. First, &lt;code&gt;-&amp;gt;summarize()&lt;/code&gt; on the token and cost columns gives you running totals in the table footer — incredibly useful when you have applied filters to isolate a user or a date range. Second, &lt;code&gt;canCreate(): false&lt;/code&gt; disables the “New record” button. Usage logs are system-generated. Nobody should be hand-entering them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://origin-main.com/laravel-architecture/production-grade-ai-architecture-in-laravel/" rel="noopener noreferrer"&gt;Production-Grade AI Architecture in Laravel: Contracts, Governance &amp;amp; Telemetry&lt;/a&gt; details the telemetry layer that populates this data. If your AI services are not already emitting structured usage events to an observable logging interface, that article is the right place to start before wiring up this dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt Version Management
&lt;/h2&gt;

&lt;p&gt;Operators need to know which prompt is active. They also need to see when it changed, what the previous version said, and whether cost or quality shifted after a change. This is where the dashboard and your version control system connect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://origin-main.com/laravel-architecture/laravel-prompt-migrations-version-control-ai-prompts/" rel="noopener noreferrer"&gt;Prompt Migrations: Bringing Determinism to AI in Laravel&lt;/a&gt; covers building the underlying prompt migration system. Assuming you have a &lt;code&gt;prompt_versions&lt;/code&gt; table from that guide, the Filament Resource surfaces it with minimal effort:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php artisan make:filament-resource PromptVersion &lt;span class="nt"&gt;--generate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Filament/Resources/PromptVersionResource.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Filament\Resources&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Filament\Resources\PromptVersionResource\Pages&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\PromptVersion&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Forms\Components\Textarea&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Forms\Components\TextInput&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Forms\Components\Toggle&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Forms\Form&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Resources\Resource&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Columns\IconColumn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Columns\TextColumn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PromptVersionResource&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;           &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PromptVersion&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationIcon&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'heroicon-o-document-text'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationGroup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'AI Governance'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationLabel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Prompt Versions'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;form&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Form&lt;/span&gt; &lt;span class="nv"&gt;$form&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Form&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$form&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;TextInput&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;required&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;disabled&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;TextInput&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'version'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;required&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;disabled&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;Textarea&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'system_prompt'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;columnSpanFull&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;Textarea&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'description'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;columnSpanFull&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;Toggle&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'is_active'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Active'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;disabled&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Table&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Table&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'key'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;searchable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'version'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'primary'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'description'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'—'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;IconColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'is_active'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Active'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;dateTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;defaultSort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'desc'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getPages&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'index'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Pages\ListPromptVersions&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s1"&gt;'view'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Pages\ViewPromptVersion&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/{record}'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;canCreate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;canEdit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;canDelete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three &lt;code&gt;can*&lt;/code&gt; methods return false. Prompt changes belong in migrations under version control. The admin panel gives you read access only. This is intentional. The moment you allow operators to edit prompt content via a UI form, you have created an unversioned, un-reviewable, un-rollbackable change path that will cause you pain.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Architect’s Note]&lt;/strong&gt; Keep your admin panel read-only for prompt content. The dashboard is a &lt;em&gt;governance tool&lt;/em&gt;, not a content editor. Every prompt mutation should flow through a migration, be committed to version control, and be deployable like code. If your team is asking for a UI to edit prompts live, that is a process problem masquerading as a tooling request. Fix the process.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Agent Audit Trails
&lt;/h2&gt;

&lt;p&gt;This is the section most teams skip. It is also the one that saves them when something goes wrong.&lt;/p&gt;

&lt;p&gt;Agentic workflows make decisions. They call tools. They read data and write data. When a user complains that the agent deleted something it should not have, or an integration breaks because the agent started formatting its output differently, you need a record of exactly what happened.&lt;/p&gt;

&lt;p&gt;If you are building agentic pipelines, the &lt;a href="https://origin-main.com/laravel-architecture/laravel-agentic-workflow-schema-validation/" rel="noopener noreferrer"&gt;Hardening Laravel Agentic Workflows guide&lt;/a&gt; covers schema validation and output hardening. The audit trail built here is the operational companion to that — it answers “what did it do?” as opposed to “did it do it correctly?”&lt;/p&gt;

&lt;p&gt;Emit an audit log from your base agent class on every action:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/AI/Agents/BaseAgent.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\AI\Agents&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\AgentAuditLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;abstract&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BaseAgent&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;logAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="nv"&gt;$action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;array&lt;/span&gt; &lt;span class="nv"&gt;$output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$toolUsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$tokensUsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="nv"&gt;$durationMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="nv"&gt;$success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;AgentAuditLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'user_id'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="s1"&gt;'agent_class'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'action'&lt;/span&gt;         &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$action&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'tool_used'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$toolUsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'input_payload'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'output_payload'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'tokens_used'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$tokensUsed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'duration_ms'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$durationMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'success'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$success&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'error'&lt;/span&gt;          &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Filament Resource for the audit log is where the detail view earns its keep. Operators need to be able to click into a single agent run and see the full input/output payload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Filament/Resources/AgentAuditLogResource.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Filament\Resources&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Filament\Resources\AgentAuditLogResource\Pages&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\AgentAuditLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Infolists\Components\KeyValueEntry&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Infolists\Components\Section&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Infolists\Components\TextEntry&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Infolists\Infolist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Resources\Resource&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Columns\IconColumn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Columns\TextColumn&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Filament\Tables\Filters\SelectFilter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentAuditLogResource&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;           &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentAuditLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationIcon&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'heroicon-o-cpu-chip'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationGroup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'AI Governance'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;?string&lt;/span&gt; &lt;span class="nv"&gt;$navigationLabel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Agent Audit Trail'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Table&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Table&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user.name'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'User'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;searchable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'System'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'agent_class'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;searchable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'action'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;searchable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tool_used'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'info'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'—'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tokens_used'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'duration_ms'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ms'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;IconColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;TextColumn&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;dateTime&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;sortable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;since&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;SelectFilter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'1'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Successful'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'0'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Failed'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="nc"&gt;SelectFilter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'agent_class'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'agentClass'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'agent_class'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;defaultSort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'desc'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;recordUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;getUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'view'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'record'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;]));&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;infolist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Infolist&lt;/span&gt; &lt;span class="nv"&gt;$infolist&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Infolist&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$infolist&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="nc"&gt;Section&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Run Details'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                        &lt;span class="nc"&gt;TextEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'agent_class'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                        &lt;span class="nc"&gt;TextEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'action'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="nc"&gt;TextEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tool_used'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'info'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'None'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                        &lt;span class="nc"&gt;TextEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'tokens_used'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                        &lt;span class="nc"&gt;TextEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'duration_ms'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Duration (ms)'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;numeric&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                        &lt;span class="nc"&gt;TextEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'success'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;badge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getStateUsing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                            &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'Success'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'Failed'&lt;/span&gt;
                        &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$state&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'Success'&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="s1"&gt;'success'&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'danger'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="p"&gt;]),&lt;/span&gt;

                &lt;span class="nc"&gt;Section&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Input Payload'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                        &lt;span class="nc"&gt;KeyValueEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'input_payload'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="p"&gt;]),&lt;/span&gt;

                &lt;span class="nc"&gt;Section&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Output Payload'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                        &lt;span class="nc"&gt;KeyValueEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'output_payload'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;label&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="p"&gt;]),&lt;/span&gt;

                &lt;span class="nc"&gt;Section&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Error'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nc"&gt;TextEntry&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;make&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'None'&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
                    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;visible&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;getPages&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'index'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Pages\ListAgentAuditLogs&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="s1"&gt;'view'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;Pages\ViewAgentAuditLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/{record}'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;canCreate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;canEdit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$record&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;Infolist&lt;/code&gt; view is the right choice here over a form, because you are displaying data, not editing it. The &lt;code&gt;KeyValueEntry&lt;/code&gt; component renders JSON payloads cleanly without any custom Blade work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Locking Down the Admin Panel
&lt;/h2&gt;

&lt;p&gt;Your Filament admin is a high-value target. It contains cost data, user data, prompt intellectual property, and agent decision history. Basic access control is not optional.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;-&amp;gt;authorization()&lt;/code&gt; callback in your panel provider is your first line of defence. The example earlier used &lt;code&gt;is_admin&lt;/code&gt; on the User model. In practice, most teams reach for Spatie’s Laravel Permission package for this. Your panel provider call then becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;authorization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;hasRole&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'admin'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For more granular control over individual Resources, use Laravel Policies. Filament automatically discovers and respects a policy for any model that has one. A &lt;code&gt;AiUsageLogPolicy&lt;/code&gt; with &lt;code&gt;viewAny&lt;/code&gt; returning &lt;code&gt;$user-&amp;gt;hasRole('admin')&lt;/code&gt; gives you fine-grained control without putting access logic inside the Resource class itself.&lt;/p&gt;

&lt;p&gt;One pattern worth enforcing: scope your AI admin panel to a separate &lt;code&gt;admin&lt;/code&gt; guard, and ensure it sits behind your Sanctum token authentication if your application issues API tokens to admin users. The &lt;a href="https://origin-main.com/guides/laravel-sanctum-api-authentication/" rel="noopener noreferrer"&gt;Laravel Sanctum API Authentication guide&lt;/a&gt; covers token scoping and ability verification, and the same principles apply when you are protecting admin panel access programmatically.&lt;/p&gt;

&lt;p&gt;Do not expose the Filament panel under the default &lt;code&gt;/admin&lt;/code&gt; path in production. Change the path in your panel provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ops-internal'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// or any non-obvious string&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not security through obscurity as a replacement for authentication. It is a minor but free win against automated scanning.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying Filament Alongside Your AI Stack
&lt;/h2&gt;

&lt;p&gt;Filament adds Livewire, Alpine.js, and its own asset pipeline to your application. Running &lt;code&gt;php artisan filament:assets&lt;/code&gt; publishes vendor assets to your &lt;code&gt;public/&lt;/code&gt; directory. Make sure your deployment script runs this command after every Filament update — if you skip it, operators will see unstyled panels or JavaScript errors after a version bump.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php artisan filament:assets
php artisan filament:optimize  &lt;span class="c"&gt;# Filament v3.1+&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;filament:optimize&lt;/code&gt; caches component discovery. On applications with many Resources and Widgets, it makes a measurable difference to panel load time.&lt;/p&gt;

&lt;p&gt;If your application uses Redis for caching, you may want to cache the stats queries powering your overview widgets. Filament’s widgets do not cache their data by default. For a stats widget that runs four &lt;code&gt;SUM()&lt;/code&gt; aggregations on every page load, wrapping them in &lt;code&gt;cache()-&amp;gt;remember()&lt;/code&gt; with a 60-second TTL is worth doing before you go to production.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://origin-main.com/guides/deploy-laravel-to-production/" rel="noopener noreferrer"&gt;complete Laravel production deployment guide&lt;/a&gt; covers the full deployment pipeline in detail. When you add Filament, extend the deploy script to include &lt;code&gt;filament:assets&lt;/code&gt; and &lt;code&gt;filament:optimize&lt;/code&gt; alongside your standard &lt;code&gt;migrate&lt;/code&gt;, &lt;code&gt;config:cache&lt;/code&gt;, and &lt;code&gt;queue:restart&lt;/code&gt; steps. Missing any of these on a Filament upgrade is a common source of post-deploy surprises.&lt;/p&gt;




&lt;h2&gt;
  
  
  Organising the Navigation: The AI Governance Group
&lt;/h2&gt;

&lt;p&gt;With three or more Filament Resources all related to AI observability, you want them grouped in the sidebar. Every Resource in this article already carries &lt;code&gt;protected static ?string $navigationGroup = 'AI Governance';&lt;/code&gt;. That single property is enough to group them under a collapsible sidebar section.&lt;/p&gt;

&lt;p&gt;You can control group ordering in the panel provider:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;navigationGroups&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="s1"&gt;'AI Governance'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'User Management'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'System'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Groups listed here appear in that order in the sidebar. Groups not listed appear at the bottom. Keep AI Governance at the top — it is why operators are opening this panel.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Cache, What to Queue
&lt;/h2&gt;

&lt;p&gt;Two performance considerations you will hit once real traffic arrives.&lt;/p&gt;

&lt;p&gt;The stats overview widget runs database queries on every page render. If 10 admins have this dashboard open, that is 10 concurrent aggregation queries on your &lt;code&gt;ai_usage_logs&lt;/code&gt; table. Cache the results. Override &lt;code&gt;getStats()&lt;/code&gt; to use &lt;code&gt;cache()-&amp;gt;remember('ai_stats_today', 60, fn() =&amp;gt; ...)&lt;/code&gt; for each query group.&lt;/p&gt;

&lt;p&gt;Bulk insert your usage logs via a queued job rather than synchronously inside the middleware response cycle. A single queued insert dispatched after the AI response returns adds zero latency to the user-facing request and prevents your &lt;code&gt;ai_usage_logs&lt;/code&gt; table from becoming a write bottleneck under load. This is especially relevant if you are logging agent audit trails — agent runs that call multiple tools can generate a burst of log writes in a short window.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;[Production Pitfall]&lt;/strong&gt; If you are writing &lt;code&gt;AiUsageLog::create()&lt;/code&gt; synchronously inside an AI middleware or a service class, you will see that call become a bottleneck under sustained load. A single slow database write can add 20–50ms to every AI response. Dispatch a queued job instead. Your usage logs do not need to be written before the response returns — they need to be written &lt;em&gt;eventually and reliably&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Extending the Dashboard Over Time
&lt;/h2&gt;

&lt;p&gt;This guide builds the foundation. Here is what comes next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget alerts.&lt;/strong&gt; Add an Eloquent Observer on &lt;code&gt;AiUsageLog&lt;/code&gt; that triggers a notification when daily spend crosses a configurable threshold. Filament’s database notifications work well here — the panel will show the alert in the notification bell without any additional UI work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-feature cost attribution.&lt;/strong&gt; The &lt;code&gt;feature&lt;/code&gt; column on &lt;code&gt;ai_usage_logs&lt;/code&gt; lets you break cost down by product feature. A second chart widget comparing cost by feature over time gives product teams the data they need to make informed decisions about which AI features are worth their current token spend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt performance correlation.&lt;/strong&gt; Once you have both usage logs (with &lt;code&gt;prompt_version&lt;/code&gt;) and an output quality signal — whether that is a user rating, a downstream validation pass/fail, or a schema validation score from your agentic pipeline — you can join them in a custom Filament page and show quality versus cost per prompt version. That is a genuinely differentiated internal tool that most teams never build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model comparison.&lt;/strong&gt; If you are testing multiple models for the same feature (Claude versus GPT-4o, for example), the &lt;code&gt;model&lt;/code&gt; column combined with a quality signal gives you a cost-per-quality-point comparison you can act on. That decision gets much easier when you can see it in a table rather than inferring it from CloudWatch logs.&lt;/p&gt;




&lt;p&gt;Refer to the &lt;a href="https://filamentphp.com/docs/3.x/panels/getting-started" rel="noopener noreferrer"&gt;official Filament v3 documentation&lt;/a&gt; for the full API surface on widgets, infolists, and panel configuration. The patterns in this guide are production-tested starting points, not exhaustive implementations.&lt;/p&gt;




</description>
      <category>laravel</category>
      <category>php</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Laravel Sanctum API Authentication: The Complete Production Guide</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 22:06:40 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/laravel-sanctum-api-authentication-the-complete-production-guide-58ac</link>
      <guid>https://dev.to/dewaldhugo/laravel-sanctum-api-authentication-the-complete-production-guide-58ac</guid>
      <description>&lt;p&gt;There's a quiet assumption baked into almost every Laravel AI integration tutorial: authentication exists. Routes are protected. Tokens are issued. The API is locked down.&lt;/p&gt;

&lt;p&gt;That assumption breaks the moment you sit down to build something real.&lt;/p&gt;

&lt;p&gt;Laravel Sanctum is the framework's answer to lightweight API token authentication. It ships with Laravel, it integrates cleanly with Eloquent, and it handles the two most common authentication patterns - SPA cookie-based sessions and mobile/external API token issuance, without pulling in a full OAuth server. This guide covers both patterns, but it leans hard into the personal access token model, because that's what you need when you're building an API that your own frontend, mobile app, or third-party client will consume.&lt;/p&gt;

&lt;p&gt;By the end, you'll have a production-ready authentication layer: token issuance with ability scoping, protected routes, revocation endpoints, rate limiting via Redis, and a multi-tenant token pattern that holds up under real load. We're also covering the Laravel 11/12 &lt;code&gt;bootstrap/app.php&lt;/code&gt; configuration style throughout, no legacy &lt;code&gt;Kernel.php&lt;/code&gt; references.&lt;/p&gt;

&lt;p&gt;Let's get into it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Sanctum Actually Does
&lt;/h2&gt;

&lt;p&gt;Before writing a single line of code, you need to understand where Sanctum fits in the ecosystem. Sanctum is not OAuth. It doesn't issue refresh tokens. It doesn't support third-party authorization flows. If you need those things, reach for &lt;a href="https://laravel.com/docs/passport" rel="noopener noreferrer"&gt;Laravel Passport&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What Sanctum does exceptionally well is issue hashed personal access tokens tied to your &lt;code&gt;users&lt;/code&gt; table via a polymorphic &lt;code&gt;personal_access_tokens&lt;/code&gt; table. Each token can carry a set of &lt;em&gt;abilities&lt;/em&gt; - scoped permissions that your application checks at runtime. The token itself is a random string; only the SHA-256 hash lives in the database. That's a sensible default.&lt;/p&gt;

&lt;p&gt;For SPAs on the same domain, Sanctum piggybacks on Laravel's existing session authentication via a cookie. This is the &lt;code&gt;EnsureFrontendRequestsAreStateful&lt;/code&gt; middleware doing its job. We'll touch on this briefly, but the majority of this guide focuses on token-based auth for API clients.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and Setup (Laravel 11 / 12)
&lt;/h2&gt;

&lt;p&gt;Sanctum ships with Laravel 11 and 12. If you're starting a fresh project, it's already in your &lt;code&gt;composer.json&lt;/code&gt;. Confirm it's there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer show laravel/sanctum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're on an older install that upgraded to Laravel 11, you may need to install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require laravel/sanctum
php artisan vendor:publish &lt;span class="nt"&gt;--provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Laravel&lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;&lt;span class="s2"&gt;anctum&lt;/span&gt;&lt;span class="se"&gt;\S&lt;/span&gt;&lt;span class="s2"&gt;anctumServiceProvider"&lt;/span&gt;
php artisan migrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That publish step drops a &lt;code&gt;config/sanctum.php&lt;/code&gt; file and a migration for the &lt;code&gt;personal_access_tokens&lt;/code&gt; table. Run the migration before anything else.&lt;/p&gt;

&lt;h3&gt;
  
  
  bootstrap/app.php - The Laravel 11/12 Way
&lt;/h3&gt;

&lt;p&gt;In Laravel 11, middleware registration moved out of &lt;code&gt;Kernel.php&lt;/code&gt; and into &lt;code&gt;bootstrap/app.php&lt;/code&gt;. This is where you register the Sanctum stateful middleware for SPA authentication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// bootstrap/app.php&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Laravel\Sanctum\Http\Middleware\EnsureFrontendRequestsAreStateful&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Middleware&lt;/span&gt; &lt;span class="nv"&gt;$middleware&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$middleware&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;statefulApi&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calling &lt;code&gt;$middleware-&amp;gt;statefulApi()&lt;/code&gt; is the clean Laravel 11 way to register Sanctum's SPA middleware. Avoid manually listing the middleware class inline, &lt;code&gt;statefulApi()&lt;/code&gt; handles the correct ordering.&lt;/p&gt;

&lt;p&gt;For token-only APIs (no SPA), you can skip this entirely. Your token-protected routes will use the &lt;code&gt;auth:sanctum&lt;/code&gt; guard, which handles everything through the &lt;code&gt;Authorization: Bearer&lt;/code&gt; header.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring the Guard
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;config/auth.php&lt;/code&gt;, make sure your &lt;code&gt;api&lt;/code&gt; guard points to Sanctum:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'guards'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'web'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'session'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'provider'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'users'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;

    &lt;span class="s1"&gt;'api'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'sanctum'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'provider'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'users'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means &lt;code&gt;auth:api&lt;/code&gt; and &lt;code&gt;auth:sanctum&lt;/code&gt; both resolve to the same thing. Pick one and be consistent across your routes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing the User Model
&lt;/h2&gt;

&lt;p&gt;Add the &lt;code&gt;HasApiTokens&lt;/code&gt; trait to your &lt;code&gt;User&lt;/code&gt; model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Laravel\Sanctum\HasApiTokens&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Authenticatable&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;HasApiTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;HasFactory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Notifiable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That trait gives Eloquent's &lt;code&gt;User&lt;/code&gt; model the &lt;code&gt;createToken()&lt;/code&gt;, &lt;code&gt;tokens()&lt;/code&gt;, and &lt;code&gt;currentAccessToken()&lt;/code&gt; methods. Everything downstream depends on this being in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Issuing Personal Access Tokens
&lt;/h2&gt;

&lt;p&gt;Token issuance happens through a dedicated endpoint. Keep it clean: one controller, one responsibility.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Http/Controllers/Auth/TokenController.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Http\Controllers\Auth&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Http\Controllers\Controller&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\Request&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Hash&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Validation\ValidationException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\User&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenController&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Controller&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;\Illuminate\Http\JsonResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'email'&lt;/span&gt;       &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'email'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'password'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'device_name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'max:255'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'email'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nc"&gt;Hash&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nc"&gt;ValidationException&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;withMessages&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'email'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'The provided credentials are incorrect.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nv"&gt;$token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'api:read'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'api:write'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'token'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;plainTextToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;revoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;\Illuminate\Http\JsonResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;currentAccessToken&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Token revoked.'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;device_name&lt;/code&gt; field is part of Sanctum's design, it lets users see which device or client issued each token. The &lt;code&gt;ValidationException&lt;/code&gt; approach returns a clean 422 with field-level errors rather than a 401, which is what most frontend clients expect.&lt;/p&gt;

&lt;p&gt;Route registration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// routes/api.php&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Http\Controllers\Auth\TokenController&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/auth/token'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TokenController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'issue'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/auth/token'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TokenController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'revoke'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Architect's Note:&lt;/strong&gt; Never issue tokens from inside a route closure. The moment you need to add logging, rate limiting, or ability scoping, you're refactoring. Always use a dedicated controller from day one, the Service Container will thank you when you need to inject dependencies later.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Token Abilities: Scoped Permissions
&lt;/h2&gt;

&lt;p&gt;Token abilities are Sanctum's lightweight alternative to OAuth scopes. You define them as strings at issuance time, then check them at the route or controller level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Read-only token&lt;/span&gt;
&lt;span class="nv"&gt;$token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'mobile-client'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'api:read'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="c1"&gt;// Full-access token&lt;/span&gt;
&lt;span class="nv"&gt;$token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'admin-panel'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'api:read'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'api:write'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'api:delete'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="c1"&gt;// AI feature token&lt;/span&gt;
&lt;span class="nv"&gt;$token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-assistant'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'ai:query'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'api:read'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check abilities in your controllers using &lt;code&gt;tokenCan()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;tokenCan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'api:write'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Insufficient token abilities.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// proceed with write operation&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or apply the &lt;code&gt;abilities&lt;/code&gt; middleware directly on route groups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'abilities:api:read,api:write'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;DocumentController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'store'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'ability:api:read'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;DocumentController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'index'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the difference: &lt;code&gt;abilities&lt;/code&gt; (plural) requires the token to have &lt;em&gt;all&lt;/em&gt; listed abilities. &lt;code&gt;ability&lt;/code&gt; (singular) requires &lt;em&gt;at least one&lt;/em&gt;. This distinction catches people off guard in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Protecting Routes
&lt;/h2&gt;

&lt;p&gt;Standard protected route group for a JSON API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// routes/api.php&lt;/span&gt;

&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;apiResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DocumentController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/query'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;AiQueryController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'query'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ability:ai:query'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/history'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;AiQueryController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'history'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ability:api:read'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;auth:sanctum&lt;/code&gt; guard resolves the authenticated user from the Bearer token, making &lt;code&gt;$request-&amp;gt;user()&lt;/code&gt; available throughout the request lifecycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Revocation
&lt;/h2&gt;

&lt;p&gt;You need at minimum three revocation endpoints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Revoke current token (logout this device)&lt;/span&gt;
&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/auth/token'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TokenController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'revoke'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Revoke a specific token by ID&lt;/span&gt;
&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/auth/tokens/{tokenId}'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TokenController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'revokeById'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Revoke all tokens (logout everywhere)&lt;/span&gt;
&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/auth/tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TokenController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'revokeAll'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The controller methods:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;revokeById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nv"&gt;$tokenId&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$deleted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;$tokenId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$deleted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Token not found.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Token revoked.'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;revokeAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nb"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'All tokens revoked.'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-&amp;gt;where('id', $tokenId)&lt;/code&gt; scoping on the tokens relationship is critical. Without it, a user could delete another user's token by guessing IDs — an IDOR vulnerability. The relationship scope ensures only that user's tokens are in play before the delete fires.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Production Pitfall:&lt;/strong&gt; Token revocation is only as fast as your database query. Under heavy concurrent traffic, a "revoke all" operation that hits an unindexed &lt;code&gt;tokenable_id&lt;/code&gt; column will cause measurable latency spikes. Add an index to &lt;code&gt;personal_access_tokens.tokenable_id&lt;/code&gt;, Laravel's migration doesn't include it by default in all versions.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In a new migration&lt;/span&gt;
&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'personal_access_tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'tokenable_type'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'tokenable_id'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Token Expiry
&lt;/h2&gt;

&lt;p&gt;By default, Sanctum tokens don't expire. Set an expiry in &lt;code&gt;config/sanctum.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'expiration'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// 30 days, in minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pair this with a scheduled prune command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// routes/console.php (Laravel 11)&lt;/span&gt;
&lt;span class="nc"&gt;Schedule&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'sanctum:prune-expired --hours=24'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;daily&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Left unchecked, the &lt;code&gt;personal_access_tokens&lt;/code&gt; table grows indefinitely and starts dragging down every authenticated request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rate Limiting with Redis
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;auth:sanctum&lt;/code&gt; middleware doesn't include rate limiting. You're responsible for adding it. This is not optional for any API that calls an external AI provider - without it, a single misbehaving client can burn through your budget in minutes.&lt;/p&gt;

&lt;p&gt;Define rate limiters in &lt;code&gt;bootstrap/app.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Cache\RateLimiting\Limit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\RateLimiter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'api'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply them to route groups:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'throttle:api'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// standard API routes&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'auth:sanctum'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'throttle:ai'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// AI query routes&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Switch your cache driver to Redis. The default &lt;code&gt;file&lt;/code&gt; driver won't share rate limit counts across multiple server instances.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="py"&gt;CACHE_DRIVER&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;redis&lt;/span&gt;
&lt;span class="py"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;
&lt;span class="py"&gt;REDIS_PORT&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;6379&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Efficiency Gain:&lt;/strong&gt; Redis handles both rate limiting and Sanctum's token lookup cache on the same connection. At scale, Sanctum resolves the user on every request by hashing the incoming token and hitting the database. If you're seeing measurable auth overhead in production profiling, an application-level cache keyed to the token hash eliminates that DB hit - but profile first, optimize second.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Multi-Tenant Token Scoping
&lt;/h2&gt;

&lt;p&gt;A user who belongs to multiple organizations shouldn't be able to use a token issued for Organization A to access Organization B's data. The cleanest fix: store a &lt;code&gt;team_id&lt;/code&gt; on the token itself using a custom token model.&lt;/p&gt;

&lt;p&gt;Extend the default token model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Models/PersonalAccessToken.php&lt;/span&gt;

&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Models&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Laravel\Sanctum\PersonalAccessToken&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;SanctumToken&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PersonalAccessToken&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;SanctumToken&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;protected&lt;/span&gt; &lt;span class="nv"&gt;$fillable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'token'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'abilities'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'expires_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'team_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it in &lt;code&gt;AppServiceProvider&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Laravel\Sanctum\Sanctum&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\PersonalAccessToken&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;boot&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Sanctum&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;usePersonalAccessTokenModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;PersonalAccessToken&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'personal_access_tokens'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;unsignedBigInteger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'team_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Issue tokens with team context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;createToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;device_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'api:read'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'api:write'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;accessToken&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;forceFill&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'team_id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$currentTeam&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Authorize against both user and team on every request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$tokenTeamId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;currentAccessToken&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$tokenTeamId&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nv"&gt;$resource&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;team_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Token not authorized for this team.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Listing Tokens for the Authenticated User
&lt;/h2&gt;

&lt;p&gt;Give users visibility into their active tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'name'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'abilities'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'last_used_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'expires_at'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'created_at'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;latest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'id'&lt;/span&gt;           &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'name'&lt;/span&gt;         &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'abilities'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;abilities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'last_used_at'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;last_used_at&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toDateTimeString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="s1"&gt;'expires_at'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;expires_at&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toDateTimeString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
                &lt;span class="s1"&gt;'created_at'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$token&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;toDateTimeString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$tokens&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The raw token value is never returned here, that's only available at issuance via &lt;code&gt;plainTextToken&lt;/code&gt;. After that, only the hash is stored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Sanctum Authentication
&lt;/h2&gt;

&lt;p&gt;Sanctum ships with a clean testing helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Laravel\Sanctum\Sanctum&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DocumentApiTest&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;TestCase&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;RefreshDatabase&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;test_authenticated_user_can_list_documents&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;Sanctum&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;actingAs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'api:read'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/api/documents'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;test_token_without_write_ability_cannot_create_document&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nc"&gt;Sanctum&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;actingAs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'api:read'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt; &lt;span class="c1"&gt;// no api:write&lt;/span&gt;

        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;postJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/api/documents'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'title'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Test'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Test content'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;test_unauthenticated_request_is_rejected&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/api/documents'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Sanctum::actingAs()&lt;/code&gt; bypasses the actual token lookup and tells the guard to treat the request as authenticated with those abilities. Your tests stay fast; no real tokens are issued.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Word to the Wise:&lt;/strong&gt; Test the ability checks explicitly. Developers regularly ship applications where the happy path tests pass but the authorization boundaries are never verified. A token with &lt;code&gt;api:read&lt;/code&gt; silently passing a &lt;code&gt;api:write&lt;/code&gt; endpoint is a data integrity problem, not just a security one. Write the negative cases, they catch the middleware misconfiguration you didn't know you'd made.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Returning Consistent Auth Error Responses
&lt;/h2&gt;

&lt;p&gt;By default, an unauthenticated request to a Sanctum-protected route returns a redirect to the login page. That's wrong for a JSON API. Fix it in &lt;code&gt;bootstrap/app.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withExceptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Exceptions&lt;/span&gt; &lt;span class="nv"&gt;$exceptions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$exceptions&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;AuthenticationException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;expectsJson&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Unauthenticated.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nv"&gt;$exceptions&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;AuthorizationException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;expectsJson&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Forbidden.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A 302 redirect from an API client that expected a 401 is a confusing failure mode that causes hours of frontend debugging.&lt;/p&gt;

&lt;h2&gt;
  
  
  SPA Authentication (Cookie-Based)
&lt;/h2&gt;

&lt;p&gt;For a Vue or React SPA on the same top-level domain, Sanctum's stateful middleware handles auth through the session cookie — no tokens needed.&lt;/p&gt;

&lt;p&gt;The flow: the SPA GETs &lt;code&gt;/sanctum/csrf-cookie&lt;/code&gt; to initialize the CSRF cookie, POSTs credentials to your login endpoint, then includes the &lt;code&gt;X-XSRF-TOKEN&lt;/code&gt; header on subsequent requests.&lt;/p&gt;

&lt;p&gt;Configure your stateful domains in &lt;code&gt;config/sanctum.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="s1"&gt;'stateful'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;explode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;','&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'SANCTUM_STATEFUL_DOMAINS'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s1"&gt;'%s%s'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'localhost,localhost:3000,127.0.0.1,127.0.0.1:8000,::1'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Sanctum&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;currentApplicationUrlWithPort&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;))),&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;SANCTUM_STATEFUL_DOMAINS&lt;/code&gt; explicitly in production. Wildcard domains are not supported.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Edge Case Alert:&lt;/strong&gt; If your SPA and API are on different subdomains (&lt;code&gt;app.example.com&lt;/code&gt; and &lt;code&gt;api.example.com&lt;/code&gt;), the session cookie won't work across domains due to browser SameSite restrictions. Use token-based auth even for your SPA in this case. The cookie approach only works when both are served from the same domain.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Production Deployment Checklist
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;personal_access_tokens&lt;/code&gt; migration applied&lt;/li&gt;
&lt;li&gt;Index on &lt;code&gt;tokenable_type&lt;/code&gt; and &lt;code&gt;tokenable_id&lt;/code&gt; exists&lt;/li&gt;
&lt;li&gt;Token expiry configured and &lt;code&gt;sanctum:prune-expired&lt;/code&gt; scheduled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;APP_KEY&lt;/code&gt; is set and rotated from the default&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SESSION_DRIVER&lt;/code&gt; is &lt;code&gt;redis&lt;/code&gt; or &lt;code&gt;database&lt;/code&gt; in multi-server setups&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;CACHE_DRIVER&lt;/code&gt; is &lt;code&gt;redis&lt;/code&gt; for rate limiting across instances&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SANCTUM_STATEFUL_DOMAINS&lt;/code&gt; explicitly set if using SPA auth&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Rate Limiting&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Named rate limiters defined for all API route groups&lt;/li&gt;
&lt;li&gt;Stricter limiters applied to AI query endpoints&lt;/li&gt;
&lt;li&gt;Rate limit responses return &lt;code&gt;Retry-After&lt;/code&gt; headers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Authorization&lt;/code&gt; header stripped at the CDN for non-API routes&lt;/li&gt;
&lt;li&gt;HTTPS enforced — tokens over HTTP is a non-starter&lt;/li&gt;
&lt;li&gt;Tokens are never logged — audit your logging configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track &lt;code&gt;personal_access_tokens&lt;/code&gt; table row count as a metric&lt;/li&gt;
&lt;li&gt;Alert on unusual token issuance spikes&lt;/li&gt;
&lt;li&gt;Log token revocation events for audit trails&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Sanctum Doesn't Cover
&lt;/h2&gt;

&lt;p&gt;Sanctum doesn't do: OAuth2 authorization flows, refresh tokens, third-party client authorization, dynamic scope negotiation, or JWT issuance. If any of those are requirements, use Passport.&lt;/p&gt;

&lt;p&gt;For the overwhelming majority of Laravel APIs — including every AI integration pattern — Sanctum's personal access tokens, SPA sessions, ability scoping, and revocation model cover the full authentication lifecycle. Don't reach for Passport's complexity if Sanctum's model fits your use case. The operational overhead isn't trivial.&lt;/p&gt;

&lt;h2&gt;
  
  
  Official Documentation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://laravel.com/docs/sanctum" rel="noopener noreferrer"&gt;Laravel Sanctum Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://laravel.com/docs/http-tests" rel="noopener noreferrer"&gt;Laravel HTTP Tests Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://origin-main.com/guides/laravel-sanctum-api-authentication" rel="noopener noreferrer"&gt;origin-main.com&lt;/a&gt; — Laravel + AI integration guides for production developers.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>laravel</category>
      <category>php</category>
      <category>authentication</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Laravel AI Middleware: Token Tracking &amp; Rate Limiting</title>
      <dc:creator>Dewald Hugo</dc:creator>
      <pubDate>Sun, 05 Apr 2026 00:10:17 +0000</pubDate>
      <link>https://dev.to/dewaldhugo/laravel-ai-middleware-token-tracking-rate-limiting-1k6c</link>
      <guid>https://dev.to/dewaldhugo/laravel-ai-middleware-token-tracking-rate-limiting-1k6c</guid>
      <description>&lt;p&gt;Integrating Claude or GPT-4o into a Laravel app is deceptively easy. One Http::post() and you feel like a genius. Then a single user loops your endpoint, or a bot scripts it, and your OpenAI bill climbs $300 before you wake up. The answer is a dedicated Laravel AI middleware layer, and most tutorials skip it entirely.&lt;/p&gt;

&lt;p&gt;That’s not a hypothetical. It happens.&lt;/p&gt;

&lt;p&gt;The fix isn’t smarter prompts, it’s infrastructure. In this guide, we build a proper AI management layer: tiered rate limiting, per-user token tracking with async logging, and a Pest test suite that actually validates the guards are working. We’re not demoing a toy. We’re building something you can ship. For &lt;a href="https://origin-main.com/laravel-architecture/laravel-prompt-migrations-version-control-ai-prompts/" rel="noopener noreferrer"&gt;prompt versioning&lt;/a&gt; and deployment discipline on the prompt side of that infrastructure, see the prompt migrations guide.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Strategy: Why a Laravel AI Middleware Layer?
&lt;/h2&gt;

&lt;p&gt;Middleware in the Laravel lifecycle is the right place for this because it separates &lt;strong&gt;infrastructure concerns&lt;/strong&gt; (cost, rate limits) from &lt;strong&gt;business logic&lt;/strong&gt; (the prompt itself). This decoupling means you can swap &lt;code&gt;gpt-4o&lt;/code&gt; for a self-hosted Llama instance without touching a single Controller.&lt;/p&gt;

&lt;p&gt;For the output side of that separation, validating what the model returns before your application acts on it, see the guide on &lt;a href="https://origin-main.com/laravel-architecture/laravel-agentic-workflow-schema-validation/" rel="noopener noreferrer"&gt;schema validation against LLM hallucinations&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We’re tracking cost using the standard token pricing formula:&lt;/p&gt;

&lt;p&gt;$$Total Cost = \sum_{i=1}^{n} (Tokens_{in} \cdot Price_{in} + Tokens_{out} \cdot Price_{out})$$&lt;/p&gt;

&lt;p&gt;Centralise that calculation once. Never scatter it across Controllers.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Database Schema: An Audit Trail, Not Just a Counter
&lt;/h2&gt;

&lt;p&gt;Don’t fall into the trap of incrementing a single &lt;code&gt;total_tokens&lt;/code&gt; integer on your &lt;code&gt;users&lt;/code&gt; table. You need a row per request so you can audit, refund, and debug.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// database/migrations/xxxx_xx_xx_create_ai_usage_logs_table.php&lt;/span&gt;
&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;up&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai_usage_logs'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Blueprint&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;foreignId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'user_id'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;constrained&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;onDelete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'cascade'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'model_used'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;        &lt;span class="c1"&gt;// e.g., 'gpt-4o', 'claude-sonnet-4-6'&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feature_name'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;      &lt;span class="c1"&gt;// e.g., 'blog_generator', 'chat'&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;decimal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'estimated_cost'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;timestamps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Tiered Rate Limiting
&lt;/h2&gt;

&lt;p&gt;Define your tiers in &lt;code&gt;AppServiceProvider&lt;/code&gt;. Free users get 5 requests per minute. Premium gets 100. Adjust to whatever your unit economics demand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Providers/AppServiceProvider.php&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Cache\RateLimiting\Limit&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\RateLimiter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\Request&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;boot&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-api'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;?-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;is_premium&lt;/span&gt;
            &lt;span class="o"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Limit&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;perMinute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;by&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Senior Dev Tip:&lt;/strong&gt; Back this with Redis, not your database. Set &lt;code&gt;CACHE_DRIVER=redis&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt;. If you’re running the default &lt;code&gt;file&lt;/code&gt; or &lt;code&gt;database&lt;/code&gt; driver, every AI request triggers a DB read-write cycle under the rate limiter — that’s a quiet bottleneck that won’t show up until you’re under load. See the &lt;a href="https://laravel.com/docs/cache" rel="noopener noreferrer"&gt;Laravel Cache documentation&lt;/a&gt; for driver configuration.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  4. Building the Middleware
&lt;/h2&gt;

&lt;p&gt;Generate the class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;php artisan make:middleware EnsureUserHasAiCredits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This middleware enforces auth and your per-user spending ceiling before the request ever reaches the Controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Http/Middleware/EnsureUserHasAiCredits.php&lt;/span&gt;
&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Http\Middleware&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Closure&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\Request&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Symfony\Component\HttpFoundation\Response&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;EnsureUserHasAiCredits&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Closure&lt;/span&gt; &lt;span class="nv"&gt;$next&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;Response&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Unauthorized'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;monthly_spend&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;spending_limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'error'&lt;/span&gt;   &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Monthly spending limit reached.'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'upgrade'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'billing.upgrade'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;402&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Registering the Middleware (Laravel 11+)
&lt;/h3&gt;

&lt;p&gt;Laravel 11 dropped &lt;code&gt;app/Http/Kernel.php&lt;/code&gt;. Register your middleware in &lt;code&gt;bootstrap/app.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// bootstrap/app.php&lt;/span&gt;
&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;withMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Middleware&lt;/span&gt; &lt;span class="nv"&gt;$middleware&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$middleware&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;appendToGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'api'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;\App\Http\Middleware\EnsureUserHasAiCredits&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then apply the rate limiter on your route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// routes/api.php&lt;/span&gt;
&lt;span class="nc"&gt;Route&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/generate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;AiController&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'generate'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'throttle:ai-api'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Externalising Model Pricing
&lt;/h2&gt;

&lt;p&gt;Hardcoding &lt;code&gt;0.000005&lt;/code&gt; per token inside your Job is a maintenance trap. One pricing change from OpenAI and your cost accounting silently breaks. Put rates in &lt;code&gt;config/ai.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// config/ai.php&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'models'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'gpt-4o'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt_rate'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.000005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'completion_rate'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.000015&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="s1"&gt;'claude-sonnet-4-5'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt_rate'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.000003&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'completion_rate'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.000015&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Async Usage Logging with a Queued Job
&lt;/h2&gt;

&lt;p&gt;Once the API responds, log asynchronously. Don’t make the user wait for a DB write.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Jobs/LogAiUsage.php&lt;/span&gt;
&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Jobs&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\AiUsageLog&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Bus\Queueable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Contracts\Queue\ShouldQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Foundation\Bus\Dispatchable&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Queue\SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LogAiUsage&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;ShouldQueue&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Dispatchable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;InteractsWithQueue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Queueable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;SerializesModels&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;__construct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;    &lt;span class="nv"&gt;$userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="kt"&gt;array&lt;/span&gt;  &lt;span class="nv"&gt;$usageData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$model&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'model'&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
        &lt;span class="nv"&gt;$rates&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"ai.models.&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt_rate'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.000005&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'completion_rate'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.000015&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nv"&gt;$cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nv"&gt;$rates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'prompt_rate'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
              &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nv"&gt;$rates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'completion_rate'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="nc"&gt;AiUsageLog&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'user_id'&lt;/span&gt;           &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'model_used'&lt;/span&gt;        &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'feature_name'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'feature_name'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'estimated_cost'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Notice &lt;code&gt;feature_name&lt;/code&gt; is included in both the Job payload and the &lt;code&gt;create()&lt;/code&gt; call. The original migration defines that column, if you omit it from the insert, Laravel will either throw a &lt;code&gt;NOT NULL&lt;/code&gt; constraint violation or silently ignore it if it’s nullable. Either way, you’ve lost attribution data. Always match your payload to your schema.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  7. The Controller: Gluing It Together
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/Http/Controllers/AiController.php&lt;/span&gt;
&lt;span class="kn"&gt;namespace&lt;/span&gt; &lt;span class="nn"&gt;App\Http\Controllers&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Jobs\LogAiUsage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\Request&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Http\JsonResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;OpenAI\Laravel\Facades\OpenAI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Validation\ValidationException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AiController&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;Controller&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;Request&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;JsonResponse&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$validated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$request&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'max:4000'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'feature'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'required'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'string'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'in:blog_generator,chat,summariser'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nv"&gt;$result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
                &lt;span class="s1"&gt;'model'&lt;/span&gt;    &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'messages'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'user'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'prompt'&lt;/span&gt;&lt;span class="p"&gt;]]],&lt;/span&gt;
            &lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\OpenAI\Exceptions\TransporterException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'AI service unavailable. Please try again.'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;\OpenAI\Exceptions\ErrorException&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Catches 429s, quota exceeded, invalid requests from OpenAI&lt;/span&gt;
            &lt;span class="nf"&gt;report&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getMessage&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt; &lt;span class="nv"&gt;$e&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;getCode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="nc"&gt;LogAiUsage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;dispatch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;id&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'model'&lt;/span&gt;             &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gpt-4o'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'feature_name'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$validated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'feature'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;promptTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;completionTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;response&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nv"&gt;$result&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re integrating Anthropic’s SDK instead, the pattern is identical — swap the client call and update the model string to &lt;code&gt;claude-sonnet-4-5&lt;/code&gt;. See the &lt;a href="https://docs.anthropic.com/en/api/getting-started" rel="noopener noreferrer"&gt;Anthropic API documentation&lt;/a&gt; for the current SDK reference.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Testing the Architecture with Pest
&lt;/h2&gt;

&lt;p&gt;A production system is only as solid as the test suite backing it. We need to verify three things independently:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; The rate limiter blocks users who’ve exhausted their quota.&lt;/li&gt;
&lt;li&gt; The middleware halts users who’ve hit their spending ceiling.&lt;/li&gt;
&lt;li&gt; The &lt;code&gt;LogAiUsage&lt;/code&gt; Job is dispatched with the correct payload — &lt;strong&gt;without hitting the real API&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last point is where most AI testing tutorials fail you. &lt;code&gt;Bus::fake()&lt;/code&gt; queues the Job but does nothing about the &lt;code&gt;OpenAI&lt;/code&gt; facade. If you don’t mock it, your test suite is burning real API credits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// tests/Feature/AiGenerationTest.php&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Jobs\LogAiUsage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;App\Models\User&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\Bus&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;Illuminate\Support\Facades\RateLimiter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;OpenAI\Laravel\Facades\OpenAI&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;OpenAI\Responses\Chat\CreateResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;beforeEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Bus&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'blocks a free user who has hit their rate limit'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'is_premium'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Simulate 5 prior requests — the free-tier ceiling&lt;/span&gt;
    &lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;hit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'ai-api:'&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;actingAs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;postJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/api/generate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Hello AI'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'feature'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'chat'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'blocks a user who has exceeded their monthly spending limit'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="s1"&gt;'monthly_spend'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;50.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'spending_limit'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;50.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;actingAs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;postJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/api/generate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Hello AI'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'feature'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'chat'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;402&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertJsonPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'error'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Monthly spending limit reached.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'dispatches a LogAiUsage job with correct token data on success'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;$user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="s1"&gt;'is_premium'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Mock the OpenAI facade — never hit the real API in tests&lt;/span&gt;
    &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="nc"&gt;CreateResponse&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;fake&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="s1"&gt;'choices'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'message'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'role'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'assistant'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'content'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Mocked response'&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s1"&gt;'usage'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;     &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s1"&gt;'total_tokens'&lt;/span&gt;      &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;57&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="nv"&gt;$this&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;actingAs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;postJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'/api/generate'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'prompt'&lt;/span&gt;  &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'Test prompt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'feature'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'chat'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertOk&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;assertJsonPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'content'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Mocked response'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;Bus&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;assertDispatched&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;LogAiUsage&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;class&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;function&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;LogAiUsage&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nv"&gt;$user&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'prompt_tokens'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'completion_tokens'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;
            &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$job&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;usageData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'feature_name'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'chat'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you’re building out the Eloquent side of this and want to understand how to test model relationships and factory states at scale, the patterns covered in &lt;a href="https://origin-main.com/laravel/building-robust-test-factories-in-laravel/" rel="noopener noreferrer"&gt;building robust Laravel test factories&lt;/a&gt; translate directly to the &lt;code&gt;User&lt;/code&gt; and &lt;code&gt;AiUsageLog&lt;/code&gt; models here.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. What You Now Have
&lt;/h2&gt;

&lt;p&gt;Shipping this architecture gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Cost protection&lt;/strong&gt; — tiered rate limiting via the Laravel &lt;code&gt;RateLimiter&lt;/code&gt; facade backed by Redis, plus a hard monthly ceiling enforced in middleware.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Full audit trail&lt;/strong&gt; — every token consumed is recorded per-user, per-feature, per-model via an Eloquent model in the &lt;strong&gt;Service Container&lt;/strong&gt;-resolved &lt;code&gt;LogAiUsage&lt;/code&gt; Job.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Zero latency impact&lt;/strong&gt; — async logging means your users never wait for a DB write.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;A test suite that doesn’t cheat&lt;/strong&gt; — the OpenAI facade is mocked, so your CI pipeline stays free and your assertions are actually trustworthy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The gap between a weekend project and a production system isn’t the AI integration. It’s everything around it. This is that layer.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;References:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://laravel.com/docs/routing#rate-limiting" rel="noopener noreferrer"&gt;Laravel Rate Limiting&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://docs.anthropic.com/en/api/getting-started" rel="noopener noreferrer"&gt;Anthropic API Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/openai-php/client" rel="noopener noreferrer"&gt;openai-php/client on GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>laravel</category>
      <category>ai</category>
      <category>webdev</category>
      <category>php</category>
    </item>
  </channel>
</rss>
