If your Laravel application already speaks to OpenAI or Anthropic through the first-party laravel/ai SDK, adding Gemini feels like it should be a five-minute job. Set a new API key. Point to a new provider. Deploy. Done. That assumption is mostly correct, but the gap between “mostly” and “done” is where production systems fall apart. This guide closes that gap.
We will cover the full Laravel Gemini integration path: the OpenAI-compatible soft entry that lets you validate the connection in under ten minutes, the native Gemini provider configuration that actually belongs in a long-lived codebase, and a GeminiManager service pattern that gives you model failover, rate-limit awareness, and a response contract your Blade templates never have to think about.
By the end, Gemini will be a first-class citizen in your AI stack, not an afterthought bolted to a controller.
Why Add Gemini At All?
Fair question. If you are already running claude-sonnet-4-6 or gpt-4o in production, the case for a third provider needs to be more compelling than “Google exists.” Here is the honest pitch.
Gemini 2.5 Flash is fast, cheap, and ships with a one-million-token context window as standard. For high-throughput tasks—document summarisation, classification pipelines, RAG over large corpora—it undercuts the competition on cost without giving up much capability. Gemini 2.5 Pro sits at the other end: serious reasoning, multimodal by default, and Vertex AI integration if you are already inside the Google Cloud ecosystem.
The second reason is resilience. Single-provider AI architectures have a hard failure mode. When OpenAI rate-limits your 2 AM batch job, the entire feature goes dark. Gemini as a fallback provider changes that calculation. We are not talking about a hot standby you never test; we are talking about a properly designed failover that routes intelligently.
The third reason is the laravel/ai SDK itself. As covered in detail in What Laravel 13 Actually Changes for AI Development, the SDK was designed from the start to be provider-agnostic. Gemini is a first-class supported provider. The architecture already expects you to run multiple models.
Route 1: The OpenAI-Compatible Endpoint (The Soft Entry)
Before committing to the native integration, you can validate that Gemini works for your use case using Google’s OpenAI-compatible endpoint. This is genuinely useful for a quick proof-of-concept, or when you are migrating from an existing openai-php/client setup rather than the laravel/ai SDK.
Google maintains an endpoint at https://generativelanguage.googleapis.com/v1beta/openai/ that accepts Chat Completions API requests in the standard OpenAI format. The swap is mechanical: change the base URL, swap the API key, and change the model string.
Start with your .env:
GEMINI_API_KEY=your-google-ai-studio-key
GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
GEMINI_MODEL=gemini-2.5-flash
If you are on an existing OpenAI configuration in config/ai.php, you can add a second connection:
// config/ai.php
'connections' => [
    'openai' => [
        'driver' => 'openai',
        'api_key' => env('OPENAI_API_KEY'),
    ],

    'gemini_compat' => [
        'driver' => 'openai', // still using the OpenAI driver
        'api_key' => env('GEMINI_API_KEY'),
        'base_url' => env('GEMINI_BASE_URL'),
    ],
],
You can now call Gemini through any code that already uses the openai driver, just by swapping the connection name:
use Illuminate\Support\Facades\AI;
$response = AI::connection('gemini_compat')->text(
    model: 'gemini-2.5-flash',
    messages: [
        ['role' => 'user', 'content' => $prompt],
    ]
);
This works well enough for a smoke test. Resist the urge to ship it as your production integration.
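Before wiring anything into Laravel at all, you can sanity-check the key and endpoint from the command line. This mirrors Google's documented usage of the compat endpoint; substitute your own key for the GEMINI_API_KEY variable:

```shell
curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $GEMINI_API_KEY" \
  -d '{
        "model": "gemini-2.5-flash",
        "messages": [{"role": "user", "content": "Reply with the word pong."}]
      }'
```

If this returns a standard Chat Completions JSON body, the key and endpoint are good and any remaining problems are in your Laravel configuration, not your Google account.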
[Production Pitfall] The OpenAI-compatible endpoint has real coverage gaps. Gemini-native features—grounding via Google Search, the thinking budget parameter, native multimodal tool use, and fine-grained safety settings—are not available through the compat layer. If your feature depends on any of those, the compat approach will silently drop them and return a degraded response with no error. You will not find out until QA, or worse, production.
Route 2: The Native Gemini Provider (The Right Way)
The laravel/ai SDK ships with a first-party Gemini driver. Use it. The compat approach is a migration crutch; the native driver is what belongs in your config/ai.php long-term.
Install the SDK if you have not already:
composer require laravel/ai
php artisan vendor:publish --provider="Laravel\Ai\AiServiceProvider"
php artisan migrate
Then register the native Gemini connection:
// config/ai.php
'connections' => [
    'openai' => [
        'driver' => 'openai',
        'api_key' => env('OPENAI_API_KEY'),
    ],

    'anthropic' => [
        'driver' => 'anthropic',
        'api_key' => env('ANTHROPIC_API_KEY'),
    ],

    'gemini' => [
        'driver' => 'gemini',
        'api_key' => env('GEMINI_API_KEY'),
    ],
],

'default' => env('AI_DEFAULT_CONNECTION', 'openai'),
Your .env stays clean:
GEMINI_API_KEY=your-google-ai-studio-key
A basic request through the native driver looks identical to your existing OpenAI calls:
use Illuminate\Support\Facades\AI;
$response = AI::connection('gemini')->text(
    model: 'gemini-2.5-flash',
    messages: [
        ['role' => 'user', 'content' => $prompt],
    ]
);

return $response->text;
That’s the surface. Now let’s build something that actually runs in production.
Model Selection: Which Gemini Model?
Not all Gemini models are created equal, and Google’s naming conventions have become genuinely confusing over the last year. Here is the practical breakdown for production use.
gemini-2.5-flash is your default. It hit general availability in June 2025, it is fast, it is cost-efficient, and it handles up to one million tokens of context. For most text generation, summarisation, and classification tasks, it is the correct choice.
gemini-2.5-pro is for heavy lifting. Complex reasoning chains, code generation over large codebases, long-form synthesis tasks. It is meaningfully more expensive than Flash and slower under load, so do not reach for it by default.
[Edge Case Alert] Do not use gemini-2.0-flash in new integrations. Google has announced that the Gemini 2.0 Flash and Flash-Lite models will be shut down on June 1, 2026. Any production system pointing at those model strings will start throwing 404 errors on that date, with no warning beyond the deprecation announcement. If you have any existing references to gemini-2.0-flash in your codebase, change them now.
For preview models in the Gemini 3.x family, treat them exactly as you would treat preview Claude models: useful for internal testing, not for customer-facing production paths until they have a stable designation.
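When several features share one codebase, the decision rules above tend to get re-litigated at every call site. A small helper keeps the choice in one place — `chooseGeminiModel` and its task labels are illustrative, not part of any SDK:

```php
<?php

// Hypothetical helper: map a task profile to a model string so the
// Flash-vs-Pro decision lives in one function instead of at every call site.
function chooseGeminiModel(string $task): string
{
    return match ($task) {
        // High-throughput work: Flash is the cost-efficient default.
        'summarise', 'classify', 'extract' => 'gemini-2.5-flash',
        // Heavy reasoning and large-codebase work: reach for Pro deliberately.
        'reasoning', 'codegen', 'synthesis' => 'gemini-2.5-pro',
        default => 'gemini-2.5-flash',
    };
}

echo chooseGeminiModel('classify'); // prints "gemini-2.5-flash"
```

Call sites then ask for a task profile rather than hardcoding a model string, which turns the next forced model migration into a one-function change.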
The GeminiManager Pattern
A single AI connection configured in config/ai.php is fine for prototyping. Production systems need something more deliberate: model selection logic, rate-limit handling, failover between providers, and a response contract that the rest of your application can depend on regardless of which model actually ran.
The GeminiManager service class is that layer. It sits between the laravel/ai facade and your application code, owns the failover logic, and returns a typed response object your Blade templates or API controllers can render without knowing which provider answered the request.
Start by creating the service in app/Services/AI/:
<?php

namespace App\Services\AI;

use Illuminate\Support\Facades\AI;
use Illuminate\Support\Facades\Log;
use App\DTOs\AiResponse;

class GeminiManager
{
    private array $modelPriority = [
        'gemini-2.5-flash',
        'gemini-2.5-pro',
    ];

    private string $fallbackConnection = 'openai';
    private string $fallbackModel = 'gpt-4o-mini';

    public function generate(string $prompt, array $options = []): AiResponse
    {
        foreach ($this->modelPriority as $model) {
            try {
                $response = $this->attemptGemini($prompt, $model, $options);

                return new AiResponse(
                    content: $response->text,
                    model: $model,
                    provider: 'gemini',
                    inputTokens: $response->usage?->inputTokens ?? 0,
                    outputTokens: $response->usage?->outputTokens ?? 0,
                );
            } catch (\Exception $e) {
                Log::warning('Gemini model failed', [
                    'model' => $model,
                    'error' => $e->getMessage(),
                    'prompt_length' => strlen($prompt),
                ]);

                continue;
            }
        }

        // All Gemini models exhausted. Fall back to configured provider.
        return $this->fallback($prompt, $options);
    }

    private function attemptGemini(string $prompt, string $model, array $options): mixed
    {
        return AI::connection('gemini')->text(
            model: $model,
            messages: [['role' => 'user', 'content' => $prompt]],
            ...$options
        );
    }

    private function fallback(string $prompt, array $options): AiResponse
    {
        Log::error('All Gemini models exhausted. Falling back.', [
            'fallback_connection' => $this->fallbackConnection,
            'fallback_model' => $this->fallbackModel,
        ]);

        $response = AI::connection($this->fallbackConnection)->text(
            model: $this->fallbackModel,
            messages: [['role' => 'user', 'content' => $prompt]],
        );

        return new AiResponse(
            content: $response->text,
            model: $this->fallbackModel,
            provider: $this->fallbackConnection,
            inputTokens: $response->usage?->inputTokens ?? 0,
            outputTokens: $response->usage?->outputTokens ?? 0,
        );
    }
}
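Because the failover logic lives in one service class, it is straightforward to exercise in a test. A sketch, assuming a standard Laravel test case and Mockery through the facade — the anonymous class fakes only the `text` and `usage` fields the manager actually reads:

```php
use App\Services\AI\GeminiManager;
use Illuminate\Support\Facades\AI;
use Tests\TestCase;

class GeminiFailoverTest extends TestCase
{
    public function test_falls_back_to_openai_when_gemini_is_down(): void
    {
        // Fake fallback connection returning the minimal response shape.
        $fallback = new class {
            public function text(mixed ...$args): object
            {
                return (object) ['text' => 'fallback answer', 'usage' => null];
            }
        };

        AI::shouldReceive('connection')->with('gemini')
            ->andThrow(new \RuntimeException('simulated outage'));
        AI::shouldReceive('connection')->with('openai')
            ->andReturn($fallback);

        $response = app(GeminiManager::class)->generate('ping');

        $this->assertSame('openai', $response->provider);
        $this->assertSame('fallback answer', $response->content);
    }
}
```

If this test goes red after a refactor, your "hot standby you never test" problem from earlier has quietly returned.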
The AiResponse DTO keeps your response contract stable:
<?php

namespace App\DTOs;

readonly class AiResponse
{
    public function __construct(
        public string $content,
        public string $model,
        public string $provider,
        public int $inputTokens,
        public int $outputTokens,
    ) {}
}
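One immediate payoff of carrying token counts in the DTO is cost telemetry at the call site. A minimal sketch — the per-million-token rates are placeholder arguments you would load from config, not real Gemini pricing:

```php
<?php

// Hypothetical helper: turn AiResponse token counts into a dollar figure
// for logging. Rates are expressed per million tokens.
function estimateCostUsd(int $inputTokens, int $outputTokens, float $inputPerMillion, float $outputPerMillion): float
{
    return ($inputTokens / 1_000_000) * $inputPerMillion
         + ($outputTokens / 1_000_000) * $outputPerMillion;
}

// Usage: estimateCostUsd($response->inputTokens, $response->outputTokens, 0.30, 2.50)
```

Logging this next to `$response->model` gives you per-model cost curves for free, which is exactly the data you need when deciding whether a task deserves Pro over Flash.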
Register it as a singleton. In Laravel 11/12, the conventional place is the register method of app/Providers/AppServiceProvider.php:

use App\Services\AI\GeminiManager;

public function register(): void
{
    $this->app->singleton(GeminiManager::class);
}
[Architect’s Note] This pattern mirrors exactly what we described in Production-Grade AI Architecture in Laravel: Contracts, Governance & Telemetry—the response DTO is your contract, and nothing outside the service layer should care which provider produced the response. The moment your Blade template starts conditionally rendering based on $response->provider, you have an architecture problem, not a template problem.
Rate Limit Handling
The GeminiManager above catches all exceptions and continues to the next model. That is fine for a first pass but too broad for production. You want to distinguish between rate-limit errors, which should trigger immediate failover, and authentication or configuration errors, which should not silently retry.
Gemini rate-limit responses return HTTP 429. A tighter catch block:
use Illuminate\Http\Client\RequestException;
private function attemptGemini(string $prompt, string $model, array $options): mixed
{
    try {
        return AI::connection('gemini')->text(
            model: $model,
            messages: [['role' => 'user', 'content' => $prompt]],
            ...$options
        );
    } catch (RequestException $e) {
        $status = $e->response->status();

        if ($status === 429) {
            // Rate limited. Try next model.
            throw $e;
        }

        if ($status === 401 || $status === 403) {
            // Auth failure. Do not retry silently.
            Log::critical('Gemini authentication failure', ['status' => $status]);
            throw new \RuntimeException('Gemini authentication failed. Check GEMINI_API_KEY.');
        }

        // Anything else: surface it.
        throw $e;
    }
}
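Rate-limit responses often indicate how long to wait before retrying. Whether Gemini sets a Retry-After header is an assumption worth verifying against the live API, so this sketch of the 429 branch defaults to a short fixed wait when the header is absent:

```php
// Inside the 429 branch: read the server's suggested wait, if any.
// Assumption: the Retry-After header may be absent; default to 5 seconds.
$retryAfter = (int) ($e->response->header('Retry-After') ?: 5);

Log::info('Gemini rate limited', [
    'model' => $model,
    'retry_after_seconds' => $retryAfter,
]);

// In a queued job, prefer $this->release($retryAfter) over sleeping in-process;
// in a web request, fail over to the next model immediately.
```

The distinction in the final comment matters: a queue worker can afford to wait out the limit, but a synchronous web request cannot.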
For consistent rate-limit and token tracking across all your AI providers, the middleware pattern we detailed in Laravel AI Middleware: Token Tracking & Rate Limiting plugs directly into this architecture. Wire it as HTTP middleware on your AI routes and your AiResponse DTO’s token counts feed it automatically.
Keeping Your Blade Templates Clean
This is the part most tutorials ignore. It is also where the design falls apart in real applications.
Your Blade templates should receive an AiResponse object and render $response->content. That is it. They should not know the provider is Gemini, they should not branch on $response->model, and they should never contain try/catch logic. If your failover logic is working correctly, the view never knows a failover happened.
In a controller:
use App\Services\AI\GeminiManager;
class ArticleSummaryController extends Controller
{
    public function __construct(
        private readonly GeminiManager $ai
    ) {}

    public function show(Article $article): View
    {
        $summary = $this->ai->generate(
            prompt: "Summarise this article in three sentences: {$article->body}"
        );

        return view('articles.show', [
            'article' => $article,
            'summary' => $summary,
        ]);
    }
}
The Blade template:
<div class="summary">
    {{ $summary->content }}
</div>
Done. No provider logic. No model names. No error handling. The controller injects the service, the service handles everything messy, and the view renders a string.
[Word to the Wise] The discipline to keep AI provider logic out of your views pays off when you swap models six months from now. And you will swap models. Gemini 2.5 Flash will eventually give way to whatever Google releases next. If your template has @if ($summary->model === 'gemini-2.5-flash') buried inside it, that migration becomes a grep exercise across your entire view layer. Design the abstraction correctly now.
Registering Gemini as a Queued Job Provider
Synchronous AI calls in web requests are a bad idea at scale. Wrap your GeminiManager calls in a queued job for anything that is not a real-time user interaction:
<?php

namespace App\Jobs;

use App\Services\AI\GeminiManager;
use App\Models\Article;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class GenerateArticleSummary implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 3;
    public int $backoff = 30; // seconds

    public function __construct(
        private readonly int $articleId
    ) {}

    public function handle(GeminiManager $ai): void
    {
        $article = Article::findOrFail($this->articleId);

        $summary = $ai->generate(
            prompt: "Summarise this article in three sentences: {$article->body}"
        );

        $article->update([
            'summary' => $summary->content,
            'summary_model' => $summary->model,
            'summary_provider' => $summary->provider,
        ]);
    }

    public function failed(\Throwable $e): void
    {
        Log::error('Article summary generation failed', [
            'article_id' => $this->articleId,
            'error' => $e->getMessage(),
        ]);
    }
}
The $tries = 3 and $backoff = 30 give you three attempts with a 30-second gap between them, which is usually enough to ride out a transient Gemini rate limit without wasting queue resources. The failed hook gives you an audit trail.
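If a fixed 30-second gap proves too aggressive, Laravel also accepts staggered waits: swap the $backoff property for a backoff() method that returns an array.

```php
// Replaces `public int $backoff = 30;` on the job class.
// Waits 30 seconds before the second attempt and 60 before the third.
public function backoff(): array
{
    return [30, 60];
}
```

This gives rate-limited Gemini requests progressively more room to recover without holding a worker hostage on the first retry.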
Official Documentation
Two references worth bookmarking as you build this:
- Google Gemini OpenAI Compatibility Documentation — the canonical reference for what is and is not supported through the compat layer, including current model strings and endpoint coverage.
- Laravel AI SDK Documentation — the official introduction covering provider setup, agent workflows, and the provider-agnostic design philosophy.
Production Deployment Checklist
A few things that catch teams off-guard when they move this from staging to production.
Google AI Studio API keys are rate-limited by default. For production workloads, you need to request increased quota through Google Cloud, or switch to Vertex AI. The Vertex AI endpoint requires a different authentication mechanism (service account credentials rather than a plain API key) and a slightly different base URL, so plan for that migration if your volume grows.
Environment-specific model configuration matters. You might run gemini-2.5-flash in production for cost reasons and gemini-2.5-pro in staging for testing purposes. Keep that in your .env and out of your GeminiManager:
# .env.production
GEMINI_MODEL=gemini-2.5-flash
# .env.staging
GEMINI_MODEL=gemini-2.5-pro
Read it in the service:
private array $modelPriority = [];

public function __construct()
{
    $this->modelPriority = [
        config('services.gemini.model', 'gemini-2.5-flash'),
        'gemini-2.5-pro', // fallback within Gemini
    ];
}
And add to config/services.php:
'gemini' => [
    'model' => env('GEMINI_MODEL', 'gemini-2.5-flash'),
],
If you are deploying to Laravel Forge or a similar managed environment, confirm your queue workers restart after deployment. A worker that caches the old GeminiManager configuration will ignore model changes until it restarts. This is not Gemini-specific, but it catches teams every time they change provider configuration without restarting workers.
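A deploy-script line covers both failure modes. queue:restart signals each worker to exit gracefully after finishing its current job, so the process supervisor boots fresh workers with the new configuration:

```shell
php artisan config:cache && php artisan queue:restart
```

On Forge, add this to the deployment script; on a bare supervisord setup, make sure the supervisor is configured to restart exited workers automatically, or the restart signal leaves you with no workers at all.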
[Efficiency Gain] Gemini 2.5 Flash’s one-million-token context window is genuinely useful for RAG pipelines. If you are building document search or retrieval-augmented generation, the ability to stuff a much larger context window means fewer chunking edge cases and simpler retrieval logic. If you are already running embeddings and vector search in Laravel, the integration pattern described in Laravel Embeddings, Vector Databases, and RAG: A Production Implementation Guide maps directly to Gemini’s native multimodal context handling.
Where to Go From Here
The GeminiManager pattern above is a solid foundation. The logical next step is versioning your prompts so that model upgrades do not introduce silent regressions—something that becomes critical the moment you start comparing output quality between gemini-2.5-flash and whatever Gemini 3.x stable lands in your production window. The Prompt Migrations: Bringing Determinism to AI in Laravel tutorial covers exactly that workflow.
The Google AI ecosystem moves fast. gemini-3.1-flash-lite-preview is already in the preview tier as of this writing, and Google is actively iterating. The architecture we have built here isolates model selection to a single config value and a single service class. When the next stable Gemini model lands, your migration is a one-line change.