Manychat Engineering for Manychat

Posted on Jun 24 • Originally published at Medium on Apr 16

PHP Fibers: Simplifying Async Code and Speeding Up Development

#phpdevelopers #php #softwaredevelopment #softwareengineering

How serialization overhead, a surprise OpenSSL upgrade, and idle workers pushed us toward PHP 8.1 Fibers, and what changed when we did.

I’m Max, Infrastructure Team Lead at Manychat. This is the next part of our PHP series.

In the previous article, we built concurrent HTTP requests in PHP without threads — using curl_multi_exec to let a single worker handle multiple external calls at once. It worked. Then our AI features expanded, external calls multiplied, and the model started buckling under its own complexity.

This article is about what we did next: PHP 8.1 Fibers, and how they changed the way our workers process payloads.

What exactly was wrong with concurrent requests?

The curl_multi_exec architecture came with a steep price. To make pseudo-concurrency work, we had to explicitly serialize and deserialize requests, responses and exceptions at every async boundary. That meant a significant refactor, new internal tooling, and conventions developers had to follow just to write new code correctly. As AI features grew in scope and number, the cognitive overhead became impossible to ignore.

Error Handling Complexity. Handling exceptions, timeouts, and corner cases got increasingly painful. Every new scenario (retries, network failures, edge cases) required explicit handling, and since context had to survive serialization boundaries, each one added another layer of boilerplate.

Scattered Context. The hardest part wasn’t writing the code — it was reading it afterward. Business logic was split across serialization points: some state lived before the async boundary, some after. Tracing a single payload through the system meant mentally jumping between the sync worker, the async queue, and back. Code reviews became genuinely hard.

Testing Overhead. Testing also became more complicated. Tests had to account for the full serialization/deserialization chain. Even a simple mock meant verifying multiple intermediate steps instead of a single function call.

Idle Workers. Before Fibers, Meta API calls stayed synchronous — serializing and deserializing state at the point of the call would have required even more refactoring, so we just didn’t touch it. The average response time is around 250ms. Not slow enough to panic over — but not fast enough to ignore at Manychat’s scale. During that time, the worker just sat there.

Bottom line: the code was getting harder to read, harder to test, and harder to extend. Development was slowing down — and everyone felt it.

Three more things that made us rethink

While we were sitting with those outcomes, three things happened in parallel:

1. We solved our memory problem via PCNTL fork. By using pcntl_fork() to spawn workers, we enabled OPcache sharing and Linux copy-on-write — significantly reducing the memory footprint of each worker. In theory, we could almost stop worrying about idle workers; they were no longer consuming nearly as much memory. But they still consumed network connections. So the problem wasn’t fully gone.

2. An Ubuntu upgrade revealed a new bottleneck. We migrated from Ubuntu 20.04 to the new LTS, and CPU load jumped 10%. Nothing in our code had changed.

We dug in. The problem was OpenSSL 3.0 — shipped with the new Ubuntu — which made SSL handshakes significantly more expensive. OpenSSL’s root certificate store on Linux is one large concatenated file — and the new version introduced mutex-style locking when iterating through it. Even Facebook’s own optimization of using a single root certificate file didn’t fully absorb the hit.

The cause was our still-synchronous calls to the Meta API. Each payload opened a new TCP connection. At Manychat’s scale, that added up fast — and that 10% CPU overhead became the trigger for the next step.

So Anton Gorin, Chief Architect of Manychat, and I decided to combine our existing async worker — built on curl_multi_exec — with Fibers introduced in PHP 8.1.

What is a Fiber?

Fibers are a low-level mechanism for cooperative multitasking: pause execution at any point and resume it later from exactly the same spot, with no threads and no extra processes.

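```php
<?php

$fiber = new Fiber(function () {
    echo "Suspending…\n";
    // Pause the fiber and hand 16 back to whoever called start().
    $last = Fiber::suspend(16);
    // Execution continues here once resume() is called; $last is then 42.
    echo "Resuming with last value {$last}\n";
});

// start() runs the fiber until its first suspend() and returns the suspended value.
$last = $fiber->start();
echo "Suspended with last value {$last}\n";

$fiber->resume(42);
```

Running it prints:

```
Suspending…
Suspended with last value 16
Resuming with last value 42
```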

Unlike true multithreading, Fibers run within a single OS thread and don’t execute in parallel. Instead, they switch context explicitly via Fiber::suspend() and resume. That makes them well-suited for I/O-bound work: yield control while waiting for a response, do something else, come back when it’s ready.

The new payload processing flow with fibers

Previously, every HTTP request meant serialize, hand off, wait, deserialize, restore. Here’s what that looked like in practice:

1. The sync worker picks a payload from the queue and starts processing it.
2. When execution hits an external HTTP call, the worker serializes the request along with the current business-logic state and writes it to the async task queue.
3. The async worker reads from the queue, deserializes multiple requests, and executes them concurrently via curl_multi_exec.
4. When a response is ready, the async worker serializes it together with the updated state and writes it back to the sync task queue.
5. A sync worker picks it up, deserializes everything, restores business-logic state, and continues from where it left off.
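
To make the cost of one such boundary concrete, here is a minimal sketch of the round trip described in the steps above. It is not our internal tooling: the queue is a plain array, the transport is a stub, and the names `PendingCall`, `suspendAtBoundary`, `executeAsync` and `resumeFromBoundary` are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical envelope: everything the business logic needs later must be
// packed next to the request, because the sync worker keeps no memory of it.
final class PendingCall
{
    public function __construct(
        public string $url,
        public array $state,       // business-logic state collected so far
        public string $resumeStep, // which handler continues afterwards
    ) {}
}

// Sync worker: stop processing and push the envelope to the async queue.
function suspendAtBoundary(array &$asyncQueue, PendingCall $call): void
{
    $asyncQueue[] = serialize($call);
}

// Async worker: unpack, execute the call, pack the result back up.
function executeAsync(string $message, callable $transport): string
{
    /** @var PendingCall $call */
    $call = unserialize($message, ['allowed_classes' => [PendingCall::class]]);
    $response = $transport($call->url);

    return serialize([
        'step' => $call->resumeStep,
        'state' => $call->state,
        'response' => $response,
    ]);
}

// Sync worker again: restore the state and dispatch to the right step.
function resumeFromBoundary(string $message, array $handlers): mixed
{
    $data = unserialize($message, ['allowed_classes' => false]);

    return $handlers[$data['step']]($data['state'], $data['response']);
}

// Usage with a stubbed transport (assumption: no real network involved).
$asyncQueue = [];
suspendAtBoundary(
    $asyncQueue,
    new PendingCall('https://api.example.test/send', ['userId' => 7], 'afterSend')
);
$reply = executeAsync($asyncQueue[0], fn (string $url): string => 'ok');
$result = resumeFromBoundary($reply, [
    'afterSend' => fn (array $state, string $response): string
        => "user {$state['userId']}: {$response}",
]);
```

Note how the logic that follows the call has to live in a separately named handler, and how every piece of state it needs must be carried through the envelope by hand.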

This is the diagram of this very complex flow:

With Fibers, the logic was simpler:

1. The worker starts a fiber and begins processing a payload.
2. When execution hits an external HTTP call (Meta API, LLM, whatever), the fiber suspends, returning the request that needs to be executed.
3. Control passes to the Guzzle request loop, which executes the request. If no response is ready yet, the worker immediately starts the next fiber and begins processing another payload.
4. When a response becomes available in the Guzzle loop, the corresponding fiber resumes from exactly where it stopped.
5. If that fiber produces another request, it suspends again and goes back into the loop.

Within a single worker, several fibers may be suspended at once, each waiting for its own response, while at most one is actively executing at any given moment. How many fibers a worker keeps in flight depends on configuration.
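
The flow above can be sketched in a few dozen lines. This is a simplified illustration, not our production loop: `HttpRequest`, `httpGet` and `runWorker` are hypothetical names, the `$transport` callable stands in for the Guzzle request loop, and for brevity all fibers are started up front rather than one at a time as responses are awaited.

```php
<?php
declare(strict_types=1);

// Hypothetical marker: what a fiber hands to the loop when it suspends.
final class HttpRequest
{
    public function __construct(public string $url) {}
}

// Business code calls this like a normal blocking function.
function httpGet(string $url): string
{
    // Suspend the current fiber; the loop resumes it with the response body.
    return Fiber::suspend(new HttpRequest($url));
}

/**
 * Runs every payload in its own fiber. $transport receives the pending
 * requests keyed by payload id and returns the completed bodies, same keys.
 *
 * @param array<int|string, callable(): mixed> $payloads
 * @return array<int|string, mixed> results keyed by payload id
 */
function runWorker(array $payloads, callable $transport): array
{
    $fibers = [];
    $pending = [];
    $results = [];

    // Record what a fiber did after being started or resumed.
    $settle = function (int|string $id, mixed $yielded) use (&$fibers, &$pending, &$results): void {
        if ($fibers[$id]->isTerminated()) {
            $results[$id] = $fibers[$id]->getReturn();
            unset($fibers[$id]);
        } else {
            $pending[$id] = $yielded; // the fiber is waiting on this request
        }
    };

    foreach ($payloads as $id => $handler) {
        $fibers[$id] = new Fiber($handler);
        $settle($id, $fibers[$id]->start()); // runs until the first HTTP call
    }

    while ($pending !== []) {
        foreach ($transport($pending) as $id => $body) {
            unset($pending[$id]);
            $settle($id, $fibers[$id]->resume($body)); // continue where it stopped
        }
    }

    return $results;
}

// Stub transport (assumption): every pending request completes immediately.
$fakeTransport = fn (array $pending): array
    => array_map(fn (HttpRequest $r): string => "body of {$r->url}", $pending);

$results = runWorker([
    'a' => fn (): string => httpGet('https://example.test/1') . ' + ' . httpGet('https://example.test/2'),
    'b' => fn (): string => strtoupper(httpGet('https://example.test/3')),
], $fakeTransport);
```

Payload `a` suspends twice and payload `b` once, yet both handlers read as ordinary sequential code.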

What are the wins? And one trade-off

More cases to make async — like API calls via Meta SDK

Before Fibers, making Meta API calls async meant serializing and deserializing business state around every call. We just didn’t bother. With Fibers, we added a suspend point and called it done. A single Meta API call takes ~250ms — small individually, but Manychat makes billions of them. The compound effect is massive.

Savings on resources: CPU and connections

We rewrote part of the Facebook SDK to reuse connections. One HTTP/2 connection per worker, multiplexed across multiple requests. No repeated TCP handshakes. No OpenSSL overhead per request.
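
For readers who want to try connection reuse themselves, the sketch below shows one way to ask libcurl for HTTP/2 multiplexing through PHP's curl extension. It is an assumption about how such a setup could look, not the code from our SDK rewrite; `createMultiplexedPool` and `addRequest` are hypothetical helpers, and nothing is sent until the multi handle is actually driven with curl_multi_exec.

```php
<?php
declare(strict_types=1);

// One multi handle per worker; libcurl keeps its connection cache on it.
function createMultiplexedPool(): CurlMultiHandle
{
    $multi = curl_multi_init();
    // Allow several HTTP/2 streams to share a single connection.
    curl_multi_setopt($multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
    // Cap connections per host at one, so requests queue onto the same socket.
    curl_multi_setopt($multi, CURLMOPT_MAX_HOST_CONNECTIONS, 1);

    return $multi;
}

// Registers a request on the pool without executing it yet.
function addRequest(CurlMultiHandle $multi, string $url): CurlHandle
{
    $easy = curl_init($url);
    curl_setopt_array($easy, [
        CURLOPT_RETURNTRANSFER => true,
        // Prefer HTTP/2 over TLS; libcurl falls back to HTTP/1.1 if needed.
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_2TLS,
        // Wait for the shared connection instead of opening a new one.
        CURLOPT_PIPEWAIT => true,
    ]);
    curl_multi_add_handle($multi, $easy);

    return $easy;
}

$pool = createMultiplexedPool();
$first = addRequest($pool, 'https://graph.facebook.com/');
$second = addRequest($pool, 'https://graph.facebook.com/');
```

Because both handles target the same host and the pool allows only one connection to it, the second request is expected to ride on the first one's TLS session rather than pay for a new handshake.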

CPU usage returned to previous levels.

Asynchronous Sleep

Sometimes we need to wait — for example, before retrying after an HTTP 500, or to ensure correct message order before sending the next one. A regular sleep() blocks the entire process. If API errors spike and retry logic misbehaves, you’ve put the whole server to sleep.

With fibers, we can implement an asynchronous sleep. A specific fiber sleeps for a defined interval while the worker continues processing other fibers.
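
A minimal sketch of the idea, assuming a toy scheduler rather than our real event loop: the fiber suspends with a wake-up time, and the loop resumes it once that time has passed. `SleepUntil`, `asyncSleep` and `runWithTimers` are hypothetical names.

```php
<?php
declare(strict_types=1);

// Hypothetical marker: "wake me up at this timestamp".
final class SleepUntil
{
    public function __construct(public float $wakeAt) {}
}

// Looks like sleep() to the caller, but only pauses the current fiber.
function asyncSleep(float $seconds): void
{
    Fiber::suspend(new SleepUntil(microtime(true) + $seconds));
}

/** @param array<int|string, callable(): void> $tasks */
function runWithTimers(array $tasks): void
{
    $sleeping = []; // id => [fiber, wakeAt]

    // Remember a fiber that suspended itself via asyncSleep().
    $park = function (int|string $id, Fiber $fiber, mixed $yielded) use (&$sleeping): void {
        if (!$fiber->isTerminated() && $yielded instanceof SleepUntil) {
            $sleeping[$id] = [$fiber, $yielded->wakeAt];
        }
    };

    foreach ($tasks as $id => $task) {
        $fiber = new Fiber($task);
        $park($id, $fiber, $fiber->start());
    }

    while ($sleeping !== []) {
        $now = microtime(true);
        foreach ($sleeping as $id => [$fiber, $wakeAt]) {
            if ($wakeAt <= $now) {
                unset($sleeping[$id]);
                $park($id, $fiber, $fiber->resume());
            }
        }
        usleep(1000); // nothing else to do in this toy loop: idle briefly
    }
}

$log = [];
runWithTimers([
    'retry' => function () use (&$log): void {
        $log[] = 'retry: first attempt failed';
        asyncSleep(0.05); // back off without blocking the worker
        $log[] = 'retry: second attempt';
    },
    'other' => function () use (&$log): void {
        $log[] = 'other: processed';
    },
]);
```

The second task is processed while the first one is still backing off, which is exactly what a plain sleep() would prevent.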

Simpler code

No more serialization. No more deserialization. Business context stays where it belongs — inside the fiber. Developers don’t even need to know they’re inside a fiber. The code looks like regular PHP — because for all practical purposes, it is.

In practice: instead of pushing a request back into the queue on retry, you just do an asynchronous sleep.

Simpler tests

Testing async code with Guzzle required enormous effort — the full serialization/deserialization chain had to be accounted for, and even a simple mock meant verifying multiple intermediate steps. With Fibers, the code reads linearly and tests follow naturally. That said, some things are hard to reproduce outside production — but in practice, if it worked in dev, it worked in prod.
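
As an illustration of why tests get shorter, here is a sketch of a test helper that drives a fiber by hand and answers each suspended request from a canned map. The business function `buildGreeting` and the helper `runWithCannedResponses` are hypothetical, and requests are modeled as plain strings to keep the example small.

```php
<?php
declare(strict_types=1);

// Hypothetical business code: reads linearly, suspends at each external call.
function buildGreeting(int $userId): string
{
    $name = Fiber::suspend("GET /users/{$userId}");
    $plan = Fiber::suspend("GET /plans/{$userId}");

    return "Hello {$name}, you are on the {$plan} plan";
}

/**
 * Runs $code in a fiber and answers every suspended request from
 * $cannedResponses, recording the requests in the order they were made.
 *
 * @param array<string, mixed> $cannedResponses
 * @return array{0: mixed, 1: list<string>} [return value, requests seen]
 */
function runWithCannedResponses(callable $code, array $cannedResponses): array
{
    $fiber = new Fiber($code);
    $seen = [];
    $request = $fiber->start();

    while (!$fiber->isTerminated()) {
        $seen[] = $request;
        $response = $cannedResponses[$request]
            ?? throw new LogicException("Unexpected request: {$request}");
        $request = $fiber->resume($response);
    }

    return [$fiber->getReturn(), $seen];
}

[$greeting, $requests] = runWithCannedResponses(
    fn (): string => buildGreeting(7),
    ['GET /users/7' => 'Max', 'GET /plans/7' => 'Pro'],
);
```

The test asserts on one return value and one ordered list of requests, with no queues or serialized envelopes in between.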

One trade-off: blast radius

Fibers came with one compromise. Previously, our guiding principle was “better to crash hard than to suffer silently from an error”: when even a non-fatal warning occurred, we logged it and terminated the worker. One payload lost, clean slate.

With multiple fibers suspended simultaneously, that no longer works. Terminating the worker interrupts all in-flight payloads at once. We redesigned exception handling so that catchable errors terminate only the affected fiber while the worker continues processing others. Fatal errors — like out-of-memory — still take down the entire process. If five payloads are in flight, all five are lost.
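
The containment idea can be sketched as follows, under the assumption that every start() and resume() call is wrapped in the same try/catch. `processIsolated` is a hypothetical helper, and each suspend is simply answered with the string 'ok' instead of a real response.

```php
<?php
declare(strict_types=1);

/**
 * Runs each payload in its own fiber and contains catchable failures.
 *
 * @param array<int|string, callable(): mixed> $payloads
 * @return array{done: array, failed: array}
 */
function processIsolated(array $payloads): array
{
    $fibers = $done = $failed = [];

    // Advance one fiber by one step; anything catchable stays inside this call.
    $step = function (int|string $id, callable $advance) use (&$fibers, &$done, &$failed): void {
        try {
            // An exception thrown inside a fiber surfaces at start()/resume().
            $advance();
            if ($fibers[$id]->isTerminated()) {
                $done[$id] = $fibers[$id]->getReturn();
                unset($fibers[$id]);
            }
        } catch (Throwable $e) {
            // Only this payload is lost; the other fibers keep running.
            $failed[$id] = $e::class . ': ' . $e->getMessage();
            unset($fibers[$id]);
        }
    };

    foreach ($payloads as $id => $handler) {
        $fibers[$id] = new Fiber($handler);
        $step($id, fn () => $fibers[$id]->start());
    }

    while ($fibers !== []) {
        foreach ($fibers as $id => $fiber) {
            $step($id, fn () => $fiber->resume('ok'));
        }
    }

    return ['done' => $done, 'failed' => $failed];
}

$outcome = processIsolated([
    'p1' => fn (): string => 'sent after ' . Fiber::suspend('wait'),
    'p2' => function (): never {
        throw new RuntimeException('Meta API returned 500');
    },
    'p3' => function (): void {
        Fiber::suspend('wait');
        throw new DomainException('bad payload');
    },
]);
```

Fatal errors such as out-of-memory are not Throwable instances, so no try/catch of this kind can contain them; they still end the whole process.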

This meant working through existing technical debt and committing to treating critical errors as critical — actually reacting to them, not letting them slide. Migrating to PHP 8.5 helped: it introduced stack traces for fatal errors, which made them significantly easier to diagnose and fix.

Could we have done less work?

Probably. Revolt, ReactPHP, AMPHP and OpenSwoole all solve similar problems and would have saved us from building a custom event loop. AMPHP in particular goes further — async SQL queries, not just HTTP, and battle-tested error handling out of the box.

But we didn’t start from a blank slate. We already had a Guzzle-based event loop from the earlier proof of concept, and adding Fibers on top was the natural next step. Starting over today, we’d look at Revolt first and skip the custom event loop entirely.

What we’d keep regardless: developers don’t need to know they’re inside a fiber. The wrapping happens under the hood. That was a deliberate choice — and it’s the part that matters most in a large codebase with many contributors.
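
One way to get that kind of transparency, shown here as a sketch rather than as our actual wrapper, is to let the client decide at call time whether it is running inside a fiber. `HttpClient` and its blocking transport are hypothetical, and the request is reduced to a URL string.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical client. Callers use get() the same way in both modes;
 * only the client knows whether a fiber is involved.
 */
final class HttpClient
{
    // $blockingTransport performs a plain synchronous request.
    public function __construct(private Closure $blockingTransport) {}

    public function get(string $url): string
    {
        if (Fiber::getCurrent() === null) {
            // Not inside a fiber (script, test, cron job): just block.
            return ($this->blockingTransport)($url);
        }

        // Inside a worker fiber: hand the URL to the loop, wait to be resumed.
        return Fiber::suspend($url);
    }
}

// Stub transport (assumption): no real network involved.
$client = new HttpClient(fn (string $url): string => "sync:{$url}");

// Called outside a fiber, the client takes the blocking path.
$direct = $client->get('/me');

// Called inside a fiber, the same method suspends and is resumed by the loop.
$fiber = new Fiber(fn (): string => $client->get('/me'));
$pendingUrl = $fiber->start();
$fiber->resume("async:{$pendingUrl}");
$viaFiber = $fiber->getReturn();
```

The calling code is identical in both cases, which is what lets contributors write business logic without thinking about the scheduler.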

This article is based on a talk I gave at PHP Talks #7. If you’d rather watch than read, the video is [here](https://www.youtube.com/watch?v=in_XaE0T5IY).