You've probably seen the demos — a button click, a spinner, and then a fully formed blog post appears. But how does that actually work in a production Laravel app? Not the happy-path tutorial version, but the real thing: with queued jobs, streaming responses, rate limiting, and token cost awareness. That's what this article covers.
Why Build This in Laravel?
Laravel's ecosystem makes it surprisingly well-suited for AI content pipelines. You get queues for async processing, events for real-time feedback, caching for expensive API calls, and Eloquent for storing generation history. Pair that with OpenAI's API, and you have a solid foundation without reaching for a separate microservice.
Let's build a practical content generation feature — one that could power a blog draft tool, product description generator, or internal documentation assistant.
Setting Up the OpenAI Client
Start by pulling in the official OpenAI PHP client:
composer require openai-php/client
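The client also needs an API key. A common convention (the key names below are my assumption, not something the package requires) is to read it from `.env` via `config/services.php`:

```php
// config/services.php — add an entry for OpenAI.
// The 'openai' / 'api_key' names are a convention, not required by the package.
'openai' => [
    'api_key' => env('OPENAI_API_KEY'),
],
```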
Create a dedicated service class rather than scattering API calls across your controllers:
// app/Services/ContentGeneratorService.php

namespace App\Services;

use Illuminate\Support\Facades\Cache;
use OpenAI\Client;

class ContentGeneratorService
{
    public function __construct(private readonly Client $client) {}

    public function generateDraft(string $topic, string $tone = 'professional'): string
    {
        // Separator avoids cache-key collisions between e.g. ('ab', 'c') and ('a', 'bc')
        $cacheKey = 'draft_' . md5($topic . '|' . $tone);

        return Cache::remember($cacheKey, now()->addHours(6), function () use ($topic, $tone) {
            $response = $this->client->chat()->create([
                'model' => 'gpt-4o-mini',
                'messages' => [
                    [
                        'role' => 'system',
                        'content' => "You are a professional content writer. Write in a {$tone} tone. Return only the article body — no titles or meta commentary.",
                    ],
                    [
                        'role' => 'user',
                        'content' => "Write a detailed 400-word article draft about: {$topic}",
                    ],
                ],
                'max_tokens' => 800,
                'temperature' => 0.7,
            ]);

            return $response->choices[0]->message->content;
        });
    }
}
Bind the client in a service provider — the container can't auto-resolve OpenAI\Client on its own, because it needs your API key at construction time. The caching layer here is important: identical requests won't burn API tokens twice.
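A minimal binding sketch, assuming the key lives at `config('services.openai.api_key')` (that config path is a convention, not something the package dictates):

```php
// app/Providers/AppServiceProvider.php
namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use OpenAI;
use OpenAI\Client;

class AppServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        // Singleton so one configured client is shared per request lifecycle
        $this->app->singleton(Client::class, function () {
            return OpenAI::client(config('services.openai.api_key'));
        });
    }
}
```

With this in place, type-hinting `ContentGeneratorService` anywhere lets the container build the whole chain.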
Processing Generation Asynchronously with Jobs
For anything beyond a quick demo, you want generation happening in a queue — not blocking a request cycle for 5-10 seconds.
php artisan make:job GenerateContentJob
// app/Jobs/GenerateContentJob.php

namespace App\Jobs;

use App\Models\ContentDraft;
use App\Services\ContentGeneratorService;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class GenerateContentJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;
    public int $backoff = 30; // seconds between retries

    public function __construct(
        private readonly int $draftId,
        private readonly string $topic,
        private readonly string $tone,
    ) {}

    public function handle(ContentGeneratorService $generator): void
    {
        $draft = ContentDraft::findOrFail($this->draftId);
        $draft->update(['status' => 'processing']);

        try {
            $content = $generator->generateDraft($this->topic, $this->tone);

            $draft->update([
                'content' => $content,
                'status' => 'completed',
                'generated_at' => now(),
            ]);
        } catch (\Exception $e) {
            $draft->update(['status' => 'failed']);

            throw $e; // let the queue handle retries
        }
    }
}
This pattern keeps your controller thin and lets you monitor jobs via Laravel Horizon.
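Wiring it into a controller is then a create-and-dispatch. This is a sketch: the `ContentDraft` columns, validation rules, and route shape are assumptions based on the job above, not a prescribed API:

```php
// app/Http/Controllers/ContentController.php (sketch)
public function generate(Request $request)
{
    $validated = $request->validate([
        'topic' => ['required', 'string', 'max:200'],
        'tone'  => ['sometimes', 'string', 'max:50'],
    ]);

    $draft = ContentDraft::create([
        'user_id' => $request->user()->id,
        'status'  => 'pending',
    ]);

    GenerateContentJob::dispatch(
        $draft->id,
        $validated['topic'],
        $validated['tone'] ?? 'professional',
    );

    // 202 Accepted: the work happens on the queue; the client
    // polls the draft status or listens for a broadcast event.
    return response()->json(['draft_id' => $draft->id], 202);
}
```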
Tracking Token Usage and Costs
This is the part most tutorials skip — but in production, token usage directly maps to money. Store it:
$response = $this->client->chat()->create([...]);

$usage = $response->usage;

TokenUsageLog::create([
    'model' => 'gpt-4o-mini',
    'prompt_tokens' => $usage->promptTokens,
    'completion_tokens' => $usage->completionTokens,
    'total_tokens' => $usage->totalTokens,
    // Input and output tokens are priced differently; adjust rates per model
    'estimated_cost_usd' => ($usage->promptTokens * 0.15 + $usage->completionTokens * 0.60) / 1_000_000,
    'feature' => 'content_draft',
]);
This table becomes invaluable when you're reviewing monthly API bills or deciding whether to switch models.
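If you'd rather centralize the math, a small pure helper keeps the per-model rates in one place. The rates below are the published per-million-token prices for these models at the time of writing; treat them as an assumption and verify against OpenAI's current pricing page:

```php
// Minimal cost estimator. Rates are USD per 1,000,000 tokens and will drift over time.
function estimateCostUsd(string $model, int $promptTokens, int $completionTokens): float
{
    $rates = [
        // model => [input rate, output rate]
        'gpt-4o-mini' => [0.15, 0.60],
        'gpt-4o'      => [2.50, 10.00],
    ];

    [$in, $out] = $rates[$model]
        ?? throw new InvalidArgumentException("Unknown model: {$model}");

    return ($promptTokens * $in + $completionTokens * $out) / 1_000_000;
}
```

A usage log row then becomes `'estimated_cost_usd' => estimateCostUsd('gpt-4o-mini', $usage->promptTokens, $usage->completionTokens)`.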
Streaming for Better UX
If you want that typewriter effect users have come to expect, OpenAI's streaming API combined with a StreamedResponse (via Laravel's response()->stream() helper) works cleanly:
public function stream(Request $request): StreamedResponse
{
    return response()->stream(function () use ($request) {
        $stream = $this->client->chat()->createStreamed([
            'model' => 'gpt-4o-mini',
            'messages' => [
                ['role' => 'user', 'content' => $request->input('prompt')],
            ],
        ]);

        foreach ($stream as $response) {
            if (connection_aborted()) {
                break; // stop paying for tokens nobody is reading
            }

            $text = $response->choices[0]->delta->content ?? '';
            echo "data: " . json_encode(['text' => $text]) . "\n\n";

            if (ob_get_level() > 0) {
                ob_flush(); // only flush a buffer that actually exists
            }
            flush();
        }

        echo "data: [DONE]\n\n";
    }, 200, [
        'Content-Type' => 'text/event-stream',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]);
}
On the frontend, a small Alpine.js component can consume this SSE stream and update the UI token by token — no WebSockets needed.
Rate Limiting and Abuse Prevention
Don't skip this step. Wrap your generation endpoints with Laravel's rate limiter:
// In a service provider's boot() method (RouteServiceProvider on older Laravel versions)
RateLimiter::for('ai-generation', function (Request $request) {
    return Limit::perMinute(5)->by($request->user()?->id ?: $request->ip());
});

// routes/api.php
Route::middleware(['auth', 'throttle:ai-generation'])
    ->post('/generate', [ContentController::class, 'generate']);
Five requests per minute is generous for most use cases. Adjust based on your cost model.
Prompt Engineering Matters More Than You Think
The quality of your system prompt determines 80% of your output quality. Treat prompts like code — version them, test them, and iterate. Store them in your database or config files rather than hardcoding strings:
// config/prompts.php

return [
    'content_draft' => [
        'system' => 'You are an expert content strategist writing for a technical audience. Be specific, avoid filler phrases, and structure your content with clear paragraphs.',
        'user_template' => 'Write a :word_count-word article draft about :topic. Target audience: :audience.',
    ],
];
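To fill the `:placeholder` slots, a small pure helper is enough. The function name and placeholder syntax below mirror the template above; this is a sketch, not a framework API:

```php
// Replace :name placeholders in a prompt template. Longer keys are substituted
// first so :word_count is never clobbered by a shorter key such as :word.
function renderPrompt(string $template, array $params): string
{
    uksort($params, fn (string $a, string $b) => strlen($b) <=> strlen($a));

    foreach ($params as $key => $value) {
        $template = str_replace(':' . $key, (string) $value, $template);
    }

    return $template;
}
```

In the service, you'd pass `config('prompts.content_draft.user_template')` as `$template` along with the per-request values.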
This approach also makes it easy to A/B test different prompt variations — something we do regularly when building AI features for client projects.
Conclusion
Building AI content generation into Laravel isn't magic — it's just good software architecture applied to a new type of I/O. Queue your jobs, cache aggressively, log your token usage, and treat your prompts as first-class configuration. The real differentiator between a toy demo and a production feature is everything around the API call: error handling, retry logic, cost visibility, and UX during async processing. Nail those, and you've got something genuinely useful.