From Arrays to GPU - how the PHP ecosystem is moving toward real ML

Samuel Akopyan
Why PHP arrays are a poor fit for math, how Tensor and NDArray emerged, and why RubixML eventually moved toward GPU.


Whenever I talk to someone or write about "machine learning in PHP", the reaction is usually predictable: people either smirk, point to Python, or downvote (except for those who are genuinely interested).

The general conclusion is simple - this is just not what the language is for.

And honestly, there is some truth in that. But…

For several years now, I've been trying to push the idea of "ML in PHP". Why? I'm not entirely sure. I've asked myself that question many times and still don't have a clear answer - even for myself.

I'm not crazy, and I'm not obsessed with some fixed idea.

I'm just a regular developer who enjoys writing PHP.
This language has paid my bills for years (and still does). It's not perfect, but overall I like how it evolves and where it's heading.

So it feels like the very idea of "ML in PHP" is a kind of challenge that keeps me up at night - trying to understand math formulas or implement yet another ML algorithm in PHP. And honestly - I enjoy it.

Of course, PHP was never designed for numerical computing. It has no built-in support for vector operations, no real control over memory, and no access to low-level optimizations that are standard in scientific computing.

And yet - machine learning in PHP exists.

Not as an experiment, but as part of real systems:

  • inference directly inside web applications
  • data processing inside SaaS
  • automation and embedded analytics

And every month - the use of ML in PHP keeps growing.

If you're skeptical, take a look here:
👉 https://github.com/apphp/awesome-php-ml

This is one of the most complete curated lists of machine learning, AI, NLP, LLM, and data analysis libraries for PHP - over 140 resources. The PHP ML ecosystem is not loud, but it's fairly mature. It consists of four layers:

  • classical ML libraries
  • mathematical foundation
  • integration tools with modern ML systems
  • integration with external ML services

In this article, we'll focus on the first two layers, and only briefly touch the other two.

Despite the ecosystem being alive and growing, if you look deeper, it becomes clear: it evolved not thanks to the language, but despite its limitations - driven by a handful of enthusiasts (I'll try to mention as many as I know).

Let me repeat a cliché - this article is not a library overview and not a comparison of PHP vs Python.

It's a story of how the PHP community gradually arrived at the same conclusion:

the problem is not the algorithms - it's the runtime model itself.

At first, vectors and matrices were implemented as plain PHP arrays. Then came optimization attempts. Then native structures written in C and Rust. And at some point it became obvious: even that is not enough.

The next step was inevitable - moving computation to the GPU.

We'll walk through this path step by step: from naive implementations to modern solutions like Tensor, NDArray, and GPU backends in RubixML.

Along the way, we'll see how not only the API changed, but also the understanding of where PHP ends - and where a real computational system or orchestration layer begins.

In short: this is not a story about how to do ML in PHP, but why it had to be done differently.


2. First attempt: when a matrix is just an array

If we ignore theory, a matrix is just a table of numbers. And since PHP already has arrays, you can represent a matrix as an array of arrays:

$matrix = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
];

This idea was the foundation of the first ML libraries in PHP.

Everything was written in pure PHP - no extensions, no native code (meaning all computations were done by the interpreter, without low-level optimizations).

From a developer's perspective, this was extremely convenient: no compilation required, easy debugging with familiar tools, and fully transparent code.

What it looked like in code

Even operations like matrix multiplication were implemented very directly - using three nested loops:

function matmul(array $a, array $b): array {
    $result = [];

    $rowsA = count($a);
    $colsA = count($a[0]);
    $colsB = count($b[0]);

    for ($i = 0; $i < $rowsA; $i++) {
        for ($j = 0; $j < $colsB; $j++) {
            $sum = 0.0;
            for ($k = 0; $k < $colsA; $k++) {
                $sum += $a[$i][$k] * $b[$k][$j];
            }
            $result[$i][$j] = $sum;
        }
    }

    return $result;
}

Or, for example, a dot product:

function dot(array $a, array $b): float {
    $sum = 0.0;

    for ($i = 0, $n = count($a); $i < $n; $i++) {
        $sum += $a[$i] * $b[$i];
    }

    return $sum;
}

From an algorithmic point of view - everything is correct.
From a PHP point of view - everything is "by the book".

For a human - it's clear and readable. And most importantly, it works.

In fact, this approach never really disappeared.

Even today, you can find fresh examples where neural networks are implemented in pure PHP - without external libraries or extensions. Usually not for production, but for learning: to understand how things work "under the hood" and how complex algorithms are built from simple operations.

For example (manual forward pass of a neural network layer):

function forward(array $inputs, array $weights, array $biases): array {
    $outputs = [];

    foreach ($weights as $i => $neuronWeights) {
        $sum = 0.0;

        foreach ($neuronWeights as $j => $weight) {
            $sum += $inputs[$j] * $weight;
        }

        $sum += $biases[$i];
        $outputs[$i] = 1 / (1 + exp(-$sum));
    }
    return $outputs;
}

This is an important point.

On one hand, it shows that:

  • the approach is still understandable
  • it's still useful for learning and experimentation

On the other hand - its limitations become obvious very quickly when you try to scale or move to production.

If you're interested in looking ahead, I recommend watching Alexey Nechaev's relatively recent talk (it's in Russian, but I promise you'll have a blast watching it).

It walks you through the implementation of a neural network step by step, directly in PHP, using the same concepts: arrays, loops, and basic mathematical operations.

Why it was convenient

The main advantage - simplicity.

If you're motivated, you can implement in one evening:

  • k-NN
  • simple linear regression
  • a basic classifier
  • etc.

All without leaving PHP.

For example, a piece of logistic regression:

function sigmoid(float $x): float {
    return 1 / (1 + exp(-$x));
}

function predict(array $weights, array $features): float {
    return sigmoid(dot($weights, $features));
}

As I like to say - "everything is neat and old-school".

On small datasets, this works fine.
For learning, experiments, and prototypes - more than enough.
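
For instance, the k-NN item from that list fits in a couple of functions. Here is a minimal 1-nearest-neighbor sketch; the training data and function names are made up for illustration:

```php
<?php
// Minimal nearest-neighbor classifier in pure PHP (k = 1).
// $train is a list of [features, label] pairs.
function euclidean(array $a, array $b): float {
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $d = $v - $b[$i];
        $sum += $d * $d;
    }
    return sqrt($sum);
}

function nearestNeighbor(array $train, array $query): string {
    $bestLabel = '';
    $bestDist = INF;
    foreach ($train as [$features, $label]) {
        $d = euclidean($features, $query);
        if ($d < $bestDist) {
            $bestDist = $d;
            $bestLabel = $label;
        }
    }
    return $bestLabel;
}

$train = [
    [[1.0, 1.0], 'red'],
    [[9.0, 9.0], 'blue'],
];
echo nearestNeighbor($train, [2.0, 1.5]); // red
```

Arrays, loops, sqrt() - nothing beyond the language core.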

Problems start a bit later.


3. Why it stops working

The problems don't appear immediately.

At first everything looks fine. Then the data grows - and the code starts slowing down. First a little, then dramatically, and eventually it becomes unusable.

The first instinct is: "we need to optimize the loops". But very quickly it becomes clear - the issue is not the algorithms or the code itself. Loop optimizations don't help much.

The problem is deeper - in how PHP works internally.

A number is not just a number

In PHP, a number is not a primitive - it's a zval structure that stores:

  • type
  • value
  • metadata (e.g., reference count)

So instead of compact 4–8 bytes like in C, each element takes much more memory.

If you have a million elements - that's a million such structures.

An array is a hash table

A PHP array is not contiguous memory - it's typically a hash table (except for packed arrays).

That means elements are not stored next to each other. Access is not just "index lookup", but a sequence of operations:

  • key lookup
  • traversal of internal structures
  • extraction of the zval

For example:

$value = $matrix[10][5];

Under the hood, this is a series of operations, not a direct memory access.

This immediately impacts several things:

  • CPU cache efficiency drops
  • data access becomes more expensive
  • no automatic SIMD vectorization at the PHP level

In numerical computing, this is critical.

Copy-on-write

There is another less obvious issue - copy-on-write.

PHP may implicitly copy arrays when modified:

$b = $a;
$b[0][0] = 42;

You don't always control when the copy happens.

In matrix-heavy workloads, this can:

  • significantly increase memory usage
  • introduce extra allocations
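
The effect is easy to demonstrate with memory_get_usage(). A sketch - the exact byte counts depend on the PHP version:

```php
<?php
// Sketch: copy-on-write in action. The assignment itself is cheap;
// the first write to the copy triggers duplication of the whole array.
$a = array_fill(0, 1_000_000, 1.0);

$before = memory_get_usage();
$b = $a;                  // no copy yet: shared storage, refcount++
$afterAssign = memory_get_usage();

$b[0] = 42.0;             // this write forces the actual copy
$afterWrite = memory_get_usage();

printf("assignment: +%d bytes, first write: +%d bytes\n",
    $afterAssign - $before,
    $afterWrite - $afterAssign);
```

The assignment costs almost nothing; the first modification suddenly doubles the memory for that array.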

No vectorization

All operations are done via loops:

for (...) {
    $c[$i] = $a[$i] + $b[$i];
}

Whereas in NumPy this is a single operation executed in C using SIMD (Single Instruction, Multiple Data).
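
To be fair, PHP does have array-at-once helpers, but they are not vectorization: array_map still invokes the callback once per element inside the interpreter, so the cost model is the same as the explicit loop. A quick sketch:

```php
<?php
// Sketch: array_map looks "vectorized" but is not - the closure
// runs once per element, one zval at a time.
$a = [1.0, 2.0, 3.0];
$b = [10.0, 20.0, 30.0];

$c = array_map(fn (float $x, float $y): float => $x + $y, $a, $b);
// $c === [11.0, 22.0, 33.0]
```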

So you end up in a strange situation: the algorithm is correct, the code is simple - but every operation is too expensive.

And that's why everything suddenly breaks when data grows.

What it looks like in practice

Let's compare a simple operation - matrix multiplication.

Option 1: pure PHP

function matmul(array $a, array $b): array {
    $result = [];

    $rowsA = count($a);
    $colsA = count($a[0]);
    $colsB = count($b[0]);

    for ($i = 0; $i < $rowsA; $i++) {
        for ($j = 0; $j < $colsB; $j++) {
            $sum = 0.0;
            for ($k = 0; $k < $colsA; $k++) {
                $sum += $a[$i][$k] * $b[$k][$j];
            }
            $result[$i][$j] = $sum;
        }
    }

    return $result;
}

Option 2: Tensor

(https://github.com/RubixML/Tensor)

use Tensor\Matrix;

$a = Matrix::rand(500, 500);
$b = Matrix::rand(500, 500);

$c = $a->matmul($b);

Option 3: NumPower (with optional GPU backend)

(https://github.com/RubixML/numpower)

$c = NumPower::multiply($a, $b);
// or
$c = $a * $b;

Results (approximate)

Time to multiply two 500x500 matrices, by approach:

  • PHP arrays: ~10–20 sec
  • Tensor (CPU): ~0.3–0.8 sec
  • NumPower (GPU): ~0.05–0.2 sec

The exact numbers depend on hardware, but the order of magnitude remains.
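
If you want to check the pure-PHP row yourself, here is a minimal harness. A sketch: matmul() is the same triple-loop function shown above, randMatrix() is a helper introduced here, and the size is reduced to 100x100 so it completes quickly:

```php
<?php
// Timing the pure-PHP triple-loop multiplication on a 100x100 case.
function matmul(array $a, array $b): array {
    $result = [];
    $rowsA = count($a);
    $colsA = count($a[0]);
    $colsB = count($b[0]);
    for ($i = 0; $i < $rowsA; $i++) {
        for ($j = 0; $j < $colsB; $j++) {
            $sum = 0.0;
            for ($k = 0; $k < $colsA; $k++) {
                $sum += $a[$i][$k] * $b[$k][$j];
            }
            $result[$i][$j] = $sum;
        }
    }
    return $result;
}

// Helper (not in the article): fill a matrix with random floats.
function randMatrix(int $rows, int $cols): array {
    $m = [];
    for ($i = 0; $i < $rows; $i++) {
        for ($j = 0; $j < $cols; $j++) {
            $m[$i][$j] = mt_rand() / mt_getrandmax();
        }
    }
    return $m;
}

$a = randMatrix(100, 100);
$b = randMatrix(100, 100);

$start = hrtime(true);
$c = matmul($a, $b);
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("100x100 matmul in pure PHP: %.1f ms\n", $elapsedMs);
```

Scaling the size up to 500 multiplies the work by 125x - which is exactly where the table's numbers come from.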

The key point is not absolute values - it's the orders of magnitude difference.

And this is where it becomes clear: the issue is not that PHP is "slow" or "bad" - it's that we're using the wrong tool for the job.


4. Optimization attempts - and why they don't save it

For example, you can remove repeated count() calls and reduce array access:

$count = count($a);

for ($i = 0; $i < $count; $i++) {
    $ai = $a[$i];
    $bi = $b[$i];
    $sum += $ai * $bi;
}

This:

  • calls count() once instead of every iteration
  • reduces array access
  • uses local variables, which are faster

It does give a small speedup, especially in large loops. But the gain is modest and depends heavily on the code and PHP version.

You quickly hit the limits of such micro-optimizations.

The core issue is that this is optimization at the syntax level.

The fundamental limitations remain:

  • no memory control
  • no contiguous storage
  • no SIMD
  • no BLAS

The bottleneck is not the loop - it's PHP's memory model.

Still, optimizations didn't stop there.

Optimization: transposition

A common trick is to transpose the second matrix in advance.

The issue with naive multiplication in PHP is frequent column access:

$sum += $a[$i][$k] * $b[$k][$j];

Accessing $a[$i][$k] is fine – it's row-wise.
But $b[$k][$j] is column access – inefficient in PHP arrays.

As a result, we are constantly jumping around in memory.

👉 For a fun way to pass the time, I highly recommend watching the video of Andrew DalPino (author of the RubixML library).

The solution here is dead simple - transpose the second matrix in advance:

function transpose(array $matrix): array {
    $result = [];

    foreach ($matrix as $i => $row) {
        foreach ($row as $j => $value) {
            $result[$j][$i] = $value;
        }
    }
    return $result;
}

Then:

function matmulOptimized(array $a, array $b): array {
    $bT = transpose($b);
    $result = [];

    $rowsA = count($a);
    $colsA = count($a[0]);
    $colsB = count($bT);
    for ($i = 0; $i < $rowsA; $i++) {
        for ($j = 0; $j < $colsB; $j++) {
            $sum = 0.0;
            for ($k = 0; $k < $colsA; $k++) {
                $sum += $a[$i][$k] * $bT[$j][$k];
            }
            $result[$i][$j] = $sum;
        }
    }
    return $result;
}

Now we always work with rows:

$bT[$j][$k] // instead of $b[$k][$j]

Why it helps:

  • fewer memory "jumps"
  • better CPU cache usage
  • more predictable access

This can noticeably improve performance even in pure PHP.

But - there's a catch.

We still:

  • use zval
  • use hash tables
  • run inside the interpreter

So we're making things less bad - but not truly efficient.

At some point, you have to admit: even advanced PHP tricks won't replace a proper data model.

PHP was not designed for high-performance numerical computing.

Optimization: CPU cache

Another idea - improve cache locality.

In low-level languages and libraries (C, C++, NumPy), performance heavily depends on how data is laid out in memory.

If data is contiguous, the CPU can read it in blocks (cache lines):

for ($i = 0; $i < $n; $i++) {
    $sum += $a[$i] * $b[$i];
}

This is fast in C not because it's simple, but because:

  • data is contiguous
  • CPU reads it in blocks
  • access is predictable

So naturally, you try:

  • row-based access
  • minimize randomness
  • use linear structures

Where it breaks in PHP

Even with a "flat" array:

$a = [1.0, 2.0, 3.0, 4.0];

In memory this is:

  • not a sequence of floats
  • but an array of zval
  • still backed by internal structures

So:

  • data is not compact
  • there is overhead between elements
  • CPU cache is not used efficiently

You end up optimizing as if cache matters - but getting little benefit.
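
One structure from that era worth mentioning is SplFixedArray from the SPL: it removes the hash-table machinery, but every slot still holds a zval, so the data is still not a contiguous block of floats. A sketch (the size here is arbitrary):

```php
<?php
// SplFixedArray: integer-indexed, fixed-size storage from the SPL.
// It drops the hash-table overhead, but each slot is still a zval.
$n = 100_000;

$fixed = new SplFixedArray($n);
for ($i = 0; $i < $n; $i++) {
    $fixed[$i] = $i * 1.0;
}

$sum = 0.0;
for ($i = 0; $i < $n; $i++) {
    $sum += $fixed[$i];
}
```

It saves some memory compared to a plain array, but the cache-locality problem stays.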

Optimization: packed arrays

PHP 7+ introduced packed arrays for simple cases:

$a = [1, 2, 3, 4, 5];

These are more compact and faster.
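
The difference is easy to observe with memory_get_usage(). A sketch - absolute numbers vary by PHP version, but the ordering does not:

```php
<?php
// Sketch: memory cost of a packed list vs. a string-keyed hash.
$n = 100_000;

$before = memory_get_usage();
$packed = [];
for ($i = 0; $i < $n; $i++) {
    $packed[] = $i;           // sequential integer keys: stays packed
}
$packedBytes = memory_get_usage() - $before;

$before = memory_get_usage();
$hash = [];
for ($i = 0; $i < $n; $i++) {
    $hash['k' . $i] = $i;     // string keys: full hash table + key storage
}
$hashBytes = memory_get_usage() - $before;

printf("packed: %d bytes, string-keyed: %d bytes\n",
    $packedBytes, $hashBytes);
```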

But for matrices:

$matrix = [
    [1, 2, 3],
    [4, 5, 6],
];

You still get an array of arrays - not truly contiguous.

And:

  • each element is still a zval
  • no real contiguous memory

Plus, packed arrays are fragile:

$a = [1, 2, 3];
$a[10] = 4; // breaks packing

In ML workloads (filtering, reshaping, indexing), this happens all the time.


If you combine everything - transposition, cache tricks, packed arrays - you see the same pattern:

we can speed things up a bit - but we can't change the foundation.

And again we arrive at the key insight:

the problem is not how we write loops –
the problem is how data is represented in memory.

A subtle but important point

There's another aspect that's rarely discussed.

Many libraries from that era simply stopped evolving. The internet is full of abandoned PHP ML pet projects.

Not because the idea was bad - quite the opposite. It was logical.

But developers quickly hit the same ceiling.

They could:

  • rewrite internals
  • add more optimizations
  • improve APIs

But it didn't lead to fundamental gains.

At some point, authors realized:

no matter how much you optimize PHP arrays, they won't become suitable for numerical computing.

And then there were only two options:

  • rewrite everything in C / Rust
  • lose motivation

Often, it was the second.

Enthusiasm fades when each step brings diminishing returns, while the core problem remains.

And this is one of the clearest signals:

  • the issue was not in the libraries –
  • it was in the model itself.

5. The turning point: when matrices stop being arrays

The next stage was simple - stop treating matrices as PHP arrays.

Libraries and extensions appeared that store data outside PHP structures.

For example: Tensor (implemented in C) and NDArray (implemented in Rust).

Now data is stored in contiguous memory, like in NumPy. This is a completely different level.

The code looks similar:

use Tensor\Matrix;

$a = Matrix::quick([
    [1, 2],
    [3, 4],
]);
$b = Matrix::quick([
    [5, 6],
    [7, 8],
]);
$c = $a->matmul($b);

But internally everything changes:

  • no zval per element
  • no hash table
  • tightly packed data
  • efficient CPU usage

Rust plays an interesting role here. In NDArray, it provides performance without typical C memory pitfalls. It's a good example of PHP delegating heavy work to other languages.

And it works - speedups can be significant.

But even this is not the end.

Architecture shift

Simplified evolution:
PHP arrays
    ↓
(realization of limitations)
    ↓
Native structures (Tensor, ndarray)
    ↓
C / Rust extensions

More broadly - this is a shift in roles:

  • before: PHP did computations
  • now: PHP orchestrates

And interestingly, code becomes not just faster - but less "PHP-like" and closer to actual math.

Higher-level operations

$c = $a->add($b);
$c = $a->divide($a->sum());

This reads almost like formulas.

But something still feels missing.

The API is still object-oriented, not fully mathematical.

NumPower and a new abstraction level

NumPower (originally created by Henrique Borba, now part of RubixML) goes further:

$c = NumPower::multiply($a, $b);
// or
$c = $a * $b;

Even:

$y = $a / $b;

This is not just syntax sugar - it's making math expressions native to the language.


6. A new ceiling: when CPU is not enough

As workloads grow, it becomes clear - the problem is not just PHP.
Even with perfect data structures and fast native code, CPU has limits: cores and bandwidth.

ML tasks are essentially massive matrix operations.

At some scale, even optimized CPU implementations struggle.

So we move to the next logical step.


7. Why everything leads to GPU

GPU is built for parallel computation: thousands of threads, high memory bandwidth - perfect for linear algebra.

Today it's obvious: squeezing more out of CPU is a dead end for many ML workloads.

You need to change the architecture.

What's happening now

RubixML v3 is moving toward GPU via projects like NumPower:

👉 https://github.com/RubixML/numpower

This is a paradigm shift:

  • PHP → orchestration
  • Tensor / NumPower → computation
  • GPU → heavy math

This is still driven by enthusiasts and under active development.

If you're interested - this is a great time to get involved:

👉 https://github.com/RubixML/ML/tree/3.0

This is no longer about "optimizing computation". It's about changing the model: PHP becomes an orchestrator, delegating heavy work to native layers.


8. What actually happened

Looking at the evolution, the picture is quite logical.

First, PHP was used as a compute environment. Then it became clear it's not suited for that. Then computation moved outward: first to native structures, then extensions, and now GPU.

Today, PHP in ML is mostly an orchestration layer. It connects components, manages processes, handles data - while heavy math happens elsewhere:

  • GPU
  • Rust
  • native libraries

And that's probably the key takeaway.

A practical example

We have projects like:

👉 https://github.com/CodeWithKyrian/transformers-php (by Kyrian Obikwelu)

This library gives access to modern models (like Hugging Face transformers) in PHP.

But under the hood, PHP is not doing heavy computation.

It:

  • manages the pipeline
  • loads models
  • processes results

Computation happens outside PHP - via native libraries or external runtimes.

Example:

use function Codewithkyrian\Transformers\Pipelines\pipeline;

// Allocate a pipeline for sentiment analysis
$classifier = pipeline('sentiment-analysis');

$out = $classifier(['I love transformers!']);
echo print_r($out, true) . PHP_EOL;

$out = $classifier(['I hate transformers!']);
echo print_r($out, true);

This shows how PHP's role changed:

before:

// we implement the mathematics ourselves
$sum += $inputs[$i] * $weights[$i];

now:

// we use a ready-made ML infrastructure
$result = $pipeline($text);

Similarly, libraries like:

👉 https://github.com/llphant/llphant (by Franco Lombardo, Fabrizio Balliano and others)

LLPhant moves even higher - working with full LLM scenarios:

  • text generation
  • embeddings
  • retrieval (RAG)
  • chat interfaces

Example:

use LLPhant\Chat\Enums\ChatRole;
use LLPhant\Chat\Message;
use LLPhant\Chat\OpenAIChat;
use LLPhant\OpenAIConfig;

$chat = new OpenAIChat(new OpenAIConfig(...));

$message = new Message();
$message->role = ChatRole::User;
$message->content = 'What is the capital of France? ';

$response = $chat->generateText((string)$message);
echo $response;

Embeddings:

use LLPhant\Embeddings\OpenAI\EmbeddingGenerator;

$generator = new EmbeddingGenerator();

$embedding = $generator->embed("PHP and machine learning");



Now we no longer think about:

  • matrices
  • loops
  • memory

We think about:

  • data
  • requests
  • system behavior

And the transformation doesn't stop here.

Higher-level tools are emerging where PHP is not just a wrapper, but an environment for building AI logic.

For example:

👉 https://github.com/neuron-core/neuron-ai (by Valerio Barbera)

Neuron AI focuses on:

  • agents
  • chains
  • LLM integrations

Example:

$message = MyAgent::make()
    ->chat(new UserMessage("Hi!"))
    ->getMessage();

echo "Q1: Hi!\n";
echo "A: " . $message->getContent() . "\n\n";
echo "Q2: Explain who you are?\n";
echo "A: ". MyAgent::make()
    ->chat(new UserMessage("Explain who you are?"))
    ->getMessage()
    ->getContent();

Here it's especially clear how PHP's role evolved:

  • before - we implemented math
  • then - we moved it to native structures
  • now - we describe system behavior

9. Conclusion

Looking at the full journey over the last ten years, it becomes clear: this is not just an evolution of tools - it's a gradual displacement of PHP from computation.

  • First, we honestly multiply matrices in arrays.
  • Then we optimize and hit a ceiling.
  • Then we move to native structures.
  • Then to C / Rust.
  • And eventually… to GPU.

PHP arrays
    ↓
(realization of limitations)
    ↓
Native structures (Tensor, ndarray)
    ↓
C / Rust extensions
    ↓
GPU (NumPower, RubixML v3)

And here's a somewhat uncomfortable conclusion:

PHP is not becoming faster at ML.
It's just stopping doing the math.

It's no longer a compute engine - it's a dispatcher, an orchestrator, glue between components.

And maybe that's exactly its chance to stay relevant.

But the question remains:

Is the game still going on?
Or has PHP been playing a different game for a while now?

If you're interested in the topic, you can dive deeper in my free book:
https://apphp.gitbook.io/ai-for-php-developers/

And explore interactive examples:
https://aiwithphp.org/books/ai-for-php-developers/examples/
