If you have ever built something with an LLM API like Gemini or OpenAI, you know the experience people expect: text that streams in word by word, like the model is thinking in real time. Achieving that in PHP has traditionally meant a lot of plumbing. You end up wrestling with raw curl handles, manual buffer flushing, or even reaching for a Node.js sidecar just to handle the stream.
I built Hibla HTTP Client to make this feel natural in PHP. It is an async-first, fiber-based HTTP client with first-class SSE support baked in. This post walks through how SSE works in the library and how you can use it today to consume streaming AI responses cleanly.
What is SSE and why does it matter for AI?
Server-Sent Events (SSE) is a simple HTTP-based protocol in which the server keeps a connection open and pushes data to the client as it becomes available. Every major LLM API (Gemini, OpenAI, Anthropic, Mistral) uses SSE to stream generated tokens back to you incrementally instead of making you wait for the full response.
The wire format looks like this:
data: {"candidates":[{"content":{"parts":[{"text":"Hello"}]}}]}
data: {"candidates":[{"content":{"parts":[{"text":" world"}]}}]}
Parsing that manually, reconnecting on drops, and handling backpressure are tedious. That is exactly what Hibla handles for you.
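To see what gets abstracted away, here is a minimal hand-rolled parse of the frames above in plain PHP, roughly what you would otherwise write on top of a raw curl stream. This is an illustration of the wire format only, not the library's internals:

```php
<?php
// Split a raw SSE buffer into decoded "data:" payloads.
// Real SSE also carries "event:", "id:", "retry:" and comment lines,
// plus reconnection state -- all of which the library handles for you.

/** @return array<int, mixed> decoded JSON payloads, in arrival order */
function parseSseBuffer(string $buffer): array
{
    $payloads = [];
    foreach (explode("\n", $buffer) as $line) {
        if (str_starts_with($line, 'data:')) {
            $json = trim(substr($line, strlen('data:')));
            $payloads[] = json_decode($json, true);
        }
    }
    return $payloads;
}

$raw = "data: {\"candidates\":[{\"content\":{\"parts\":[{\"text\":\"Hello\"}]}}]}\n\n"
     . "data: {\"candidates\":[{\"content\":{\"parts\":[{\"text\":\" world\"}]}}]}\n\n";

foreach (parseSseBuffer($raw) as $chunk) {
    echo $chunk['candidates'][0]['content']['parts'][0]['text']; // prints "Hello world"
}
```

And that is before you deal with chunks split mid-line across TCP reads, reconnects, and Last-Event-ID bookkeeping.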
Installation
The library is currently in beta. Add this to your composer.json first:
{
"minimum-stability": "beta",
"prefer-stable": true
}
Then install:
composer require hiblaphp/http-client
Requirements: PHP 8.4+ and ext-curl.
A real example: streaming Gemini
Here is a complete working example that streams a response from Google Gemini and prints each token as it arrives:
<?php
declare(strict_types=1);
require 'vendor/autoload.php';
use Hibla\HttpClient\Http;
use Hibla\HttpClient\SSE\SSEControl;
use Hibla\HttpClient\SSE\SSEDataFormat;
use function Hibla\await;
function ask(string $prompt): void
{
await(
Http::client()
->withHeader('x-goog-api-key', getenv('GEMINI_API_KEY'))
->withMethod('POST')
->withJson([
'contents' => [[
'role' => 'user',
'parts' => [['text' => $prompt]],
]],
])
->sse('https://generativelanguage.googleapis.com/v1beta/models/gemma-3-27b-it:streamGenerateContent?alt=sse')
->withoutReconnection()
->withDataFormat(SSEDataFormat::DecodedJson)
->onEvent(function (array|string $data, SSEControl $control) {
echo $data['candidates'][0]['content']['parts'][0]['text'] ?? '';
})
->connect()
);
}
ask('Why is PHP still relevant in 2025?');
That is it. No manual curl setup, no buffer parsing, no event loop wiring. The onEvent callback fires for every chunk the server sends, and SSEDataFormat::DecodedJson means the payload is already decoded for you, so no json_decode() needed.
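To make concrete what DecodedJson is saving you, here is the same token extraction done by hand on one raw chunk. The payload shape is the Gemini response format from the example above:

```php
<?php
// One raw "data:" payload from a Gemini streaming response.
$rawChunk = '{"candidates":[{"content":{"parts":[{"text":"Hello"}]}}]}';

// Without DecodedJson, every callback starts with this boilerplate:
$data = json_decode($rawChunk, true);
$token = $data['candidates'][0]['content']['parts'][0]['text'] ?? '';

echo $token; // prints "Hello"
```

With DecodedJson the callback receives the equivalent of `$data` directly, so the body shrinks to the single lookup.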
How the SSE builder works
The sse() method returns a fluent builder. You chain your configuration onto it and call connect() at the end, which returns a promise that resolves once the HTTP handshake completes. Events keep flowing after that, since the connection stays open.
Http::client()
->sse('https://api.example.com/events')
->withDataFormat(SSEDataFormat::DecodedJson)
->onEvent(function (array|string $data, SSEControl $control) {
echo $data['message'];
// stop the stream from inside the callback
if ($data['done'] ?? false) {
$control->cancel();
}
})
->onError(function (Throwable $e) {
echo "Connection error: " . $e->getMessage();
})
->connect();
A few things worth knowing about how this behaves:
- The promise resolves as soon as the server responds with a 2xx status and headers; events arrive after that. If the server returns a non-2xx status at handshake time, the promise rejects immediately and no events are delivered.
- The operation timeout is disabled by default for SSE connections, because they are meant to be long-lived. The connection timeout still applies to the initial handshake, which is exactly what you want.
Data formats
One small thing that makes a real difference in practice is the withDataFormat() option. By default the callback receives a full SSEEvent object with all the raw fields, but for AI APIs that send JSON payloads you almost never care about anything other than the data field, already decoded.
use Hibla\HttpClient\SSE\SSEDataFormat;
// Full event object, useful when you need event type, id, or retry hint
->withDataFormat(SSEDataFormat::Event)
->onEvent(function (SSEEvent $event, SSEControl $control) {
echo $event->data;
echo $event->id;
})
// Already decoded JSON, the sweet spot for LLM APIs
->withDataFormat(SSEDataFormat::DecodedJson)
->onEvent(function (array|string $data, SSEControl $control) {
echo $data['choices'][0]['delta']['content'] ?? '';
})
// Raw string, for when you want to handle deserialization yourself
->withDataFormat(SSEDataFormat::Raw)
->onEvent(function (string $data, SSEControl $control) {
$decoded = json_decode($data, true);
})
Mapping events to typed objects
If you want to keep your onEvent callback clean and strongly typed, you can use map() to transform each event before it reaches the callback. The mapper runs after format conversion, so you get the already-decoded value:
Http::client()
->sse('https://api.example.com/stream')
->withDataFormat(SSEDataFormat::DecodedJson)
->map(fn(array $data) => new ChatChunk($data))
->onEvent(function (ChatChunk $chunk, SSEControl $control) {
echo $chunk->text();
})
->connect();
This is particularly useful if you are building a wrapper around an LLM API and want clean domain objects flowing through your application rather than raw arrays.
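The ChatChunk class in that example is not part of the library; it is whatever domain object you want. For Gemini-style payloads, a minimal version could be as small as this (name and shape are illustrative only):

```php
<?php
// A hypothetical ChatChunk value object matching the map() example above.
// It wraps the decoded payload array and exposes the one field we care about.

final readonly class ChatChunk
{
    public function __construct(private array $data) {}

    /** The text delta carried by this chunk, or '' for non-text chunks. */
    public function text(): string
    {
        return $this->data['candidates'][0]['content']['parts'][0]['text'] ?? '';
    }
}

$chunk = new ChatChunk(json_decode(
    '{"candidates":[{"content":{"parts":[{"text":"Hi"}]}}]}',
    true,
));
echo $chunk->text(); // prints "Hi"
```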
Reconnection
For production use cases where you cannot afford to silently drop a stream, Hibla has built-in reconnection with exponential backoff. The Last-Event-ID header is forwarded automatically on reconnect so the server can resume from where it left off:
Http::client()
->sse('https://api.example.com/events')
->reconnect(
maxAttempts: 10,
initialDelay: 1.0,
maxDelay: 30.0,
backoffMultiplier: 2.0,
jitter: true,
)
->onEvent(fn($data, $ctrl) => handleEvent($data))
->connect();
For AI streaming use cases you typically want withoutReconnection(), since a dropped generation is not resumable anyway. But for persistent event feeds like webhooks, live dashboards, or notification streams, the reconnection support is genuinely useful.
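To make the reconnect parameters concrete, here is the delay schedule they describe, computed with the usual exponential-backoff formula. This is my own sketch of the math, not the library's internals, and it omits jitter (which just randomizes each delay slightly to avoid thundering herds):

```php
<?php
// delay(attempt) = min(maxDelay, initialDelay * multiplier^(attempt - 1))

function backoffDelay(
    int $attempt,
    float $initialDelay = 1.0,
    float $maxDelay = 30.0,
    float $multiplier = 2.0,
): float {
    return min($maxDelay, $initialDelay * ($multiplier ** ($attempt - 1)));
}

// Attempts 1..7 with the values from the example: 1, 2, 4, 8, 16, 30, 30
foreach (range(1, 7) as $attempt) {
    printf("attempt %d: %.0fs\n", $attempt, backoffDelay($attempt));
}
```

Note how the cap kicks in at attempt 6: the uncapped value would be 32 seconds, but maxDelay holds it at 30.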
Cancellation
You can cancel a stream from inside the callback via SSEControl, or from outside via the promise or connection object:
// From inside, a clean way to stop after a sentinel value
->onEvent(function (array|string $data, SSEControl $control) {
if (($data['finish_reason'] ?? '') === 'stop') {
$control->cancel();
}
})
// From outside
$promise = Http::client()->sse('...')->onEvent(...)->connect();
$connection = await($promise);
$connection->close();
// or equivalently
$promise->cancel();
Cancellation immediately closes the connection and suppresses any further reconnection attempts.
Running multiple streams concurrently
Because every connect() call returns a promise, you can run multiple SSE connections at the same time using Promise::all(). This is useful if you want to fan out a prompt to multiple models and compare their outputs:
use Hibla\Promise\Promise;
use function Hibla\await;
$stream = fn(string $model, string $prompt) => Http::client()
->withHeader('x-goog-api-key', getenv('GEMINI_API_KEY'))
->withMethod('POST')
->withJson(['contents' => [['role' => 'user', 'parts' => [['text' => $prompt]]]]])
->sse("https://generativelanguage.googleapis.com/v1beta/models/{$model}:streamGenerateContent?alt=sse")
->withoutReconnection()
->withDataFormat(SSEDataFormat::DecodedJson)
->onEvent(function (array|string $data, SSEControl $control) use ($model) {
echo "[{$model}] " . ($data['candidates'][0]['content']['parts'][0]['text'] ?? '');
})
->connect();
await(Promise::all([
$stream('gemini-2.5-flash', 'Explain fibers in PHP'),
$stream('gemma-3-27b-it', 'Explain fibers in PHP'),
]));
Both streams run concurrently under the same event loop. No threads, no forking, no extra processes.
Beyond SSE
SSE is one piece of what Hibla HTTP Client covers. The same async-first design applies across the whole library: regular requests, file downloads, uploads, streaming responses, and a full interceptor pipeline for auth, logging, caching, and retry logic. If you have used Laravel's HTTP client, the fluent builder API will feel familiar.
// Concurrent requests
[$users, $posts] = await(Promise::all([
Http::get('https://api.example.com/users'),
Http::get('https://api.example.com/posts'),
]));
// Sliding window concurrency, keeping 10 requests in flight at a time
$results = await(Promise::concurrent($tasks, concurrency: 10));
// File download with progress tracking
await(Http::download(
'https://files.example.com/model-weights.bin',
'/tmp/weights.bin',
fn(DownloadProgress $p) => printf("%.1f%%\n", $p->percent)
));
Try it out
The library is on GitHub and Packagist. If you are building anything with LLM APIs in PHP, the SSE support is worth a look: it cuts out a lot of the boilerplate that usually comes with streaming responses.
- GitHub: https://github.com/hiblaphp/http-client
- Packagist: hiblaphp/http-client
Feedback, issues, and contributions are very welcome. The library is young and there is still a lot to build out, including alternative transports beyond cURL. If you try it and run into something unexpected, open an issue.
Have you worked with SSE in PHP before? What was your approach? Let me know in the comments.